Introduction

According to the European Union, benzene is not allowed in concentrations equal to or greater than 0.1 % by weight in substances and preparations placed on the market (European Directive 2008/50/CE). The prohibition does not apply to fuels in which the benzene level does not exceed 1 % by volume; therefore, the major source of environmental exposure is the exhaust gas from motor vehicles fueled with gasoline. Benzene is still one of the most dangerous contaminants of urban air; it is a ubiquitous environmental contaminant as a component of automobile emissions, gasoline, and cigarette smoke. Occupational exposure to benzene, as a side product, can occur in the petrochemical industry, in coke oven and steel plants, in the chemical industry, in laboratories using the chemical for research or analysis, and in workers with prolonged exposure to traffic pollution. Benzene is classified as a human carcinogen with hematotoxicity (class I, IARC 1982). The European Community set a limit value of 5 μg/m3 for benzene in urban outdoor air as an annual average (D Lgs n. 155 2010), while for workplaces the threshold limit for an average working shift of 8 h is 3.25 mg/m3 (1 ppm) (D Lgs 81/08). A time-weighted average threshold limit value (TLV-TWA) for occupational exposure of 1.60 mg/m3 (0.5 ppm) is recommended by the American Conference of Governmental Industrial Hygienists (ACGIH 2014). The risk of exposure to benzene can be assessed by monitoring the environmental and occupational airborne concentrations of this compound and/or by biological monitoring of the exposed subjects. Urinary trans,trans-muconic acid (t,t-MA) and S-phenylmercapturic acid (SPMA) are recommended by the ACGIH as sensitive and specific biomarkers for occupational benzene exposure (ACGIH 2014), but they are also suitable markers for environmental exposure.
Cigarette smoking is a major source of exposure to benzene in active smokers (Fustinoni et al. 2005) and can also affect the levels of biological markers of exposure to benzene in non-smokers exposed to passive smoking (Protano et al. 2010; Fustinoni et al. 2012); for this reason, the assessment of occupational or environmental benzene exposure through biological monitoring in smokers is complicated by the confounding effect of smoking. To overcome this problem, a method is needed to evaluate separately the contribution of smoking to the formation of benzene metabolites, based on the biological monitoring of smoke exposure: cotinine, the major metabolite of the nicotine contained in tobacco, is excreted in the urine and is the biomarker used to measure exposure to cigarette smoke (Fustinoni et al. 2013). The time to complete elimination of cotinine from the body is about 36 h. In the literature, the median value of cotinine in the urine of non-smokers exposed to environmental tobacco smoke (ETS) is reported as about 1.3 ng/mL (5th–95th percentile 0.2–38), while in the urine of smokers the median cotinine level is 687 ng/mL (5th–95th percentile 3.7–2717). While several studies have shown that urinary cotinine levels are always <100 μg/L in non-smokers, proposed cut-off values to identify an active smoking habit have ranged from 20 to 100 μg/L (Fustinoni et al. 2013). Human exposure to benzene, independently of its source, causes oxidative damage to DNA and RNA (Lai et al. 2005; Manini et al. 2010) that can contribute to inflammatory tissue damage, cell aging, diabetes, neurodegenerative, cardiovascular, and other age-related diseases, and even the development of some cancers. Oxidative stress is an imbalance between the production of reactive oxygen species (ROS) and the ability of the biological system to repair the damage.
The oxidative damage/repair markers of DNA and RNA, 8-hydroxyguanine (8-oxoGua), 8-hydroxy-2′-deoxyguanosine (8-oxodGuo), and 8-oxo-7,8-dihydroguanosine (8-oxoGuo), can be determined in urine. 8-oxo-7,8-dihydro-2′-deoxyguanosine (8-oxodGuo) is the oxidized form of the nucleoside formed by guanine attached to a deoxyribose, originating from the 2′-deoxyribonucleotide pool; neither cell death nor diet contributes considerably to urinary 8-oxodGuo, and its levels are not influenced by long-term storage of urine specimens at −20 °C. 8-oxo-7,8-dihydroguanosine (8-oxoGuo) is the oxidized form of the nucleoside formed by guanine attached to a ribose in RNA. 8-oxo-7,8-dihydroguanine (8-oxoGua) is the oxidized form of guanine, coming predominantly from DNA (Il’yasova et al. 2012; Jacob et al. 2013; Valavanidis et al. 2009). In this study, an HPLC-MS/MS method for the determination of benzene and nicotine metabolites in urine was validated, and the correlation between exposure to benzene and oxidative stress was studied in smokers and non-smokers by determining the oxidative stress biomarkers in the same urine samples.

Materials and methods

Chemicals and equipment

The analytical reference standards of DL-S-phenylmercapturic acid (DL-SPMA) (98 %), 8-oxo-7,8-dihydroguanine (8-oxoGua) (98 %), 8-oxo-7,8-dihydro-2′-deoxyguanosine (8-oxodGuo) (99 %), and 8-oxo-7,8-dihydroguanosine (8-oxoGuo) (99 %) were purchased from Spectra 2000 s.r.l (Rome, Italy). t,t-MA (97 %) and cotinine (>99.5 %) were supplied by Sigma-Aldrich (Milan, Italy). The isotope-labeled internal standards DL-SPMA-3,3-d2 (99 %), t,t-MA-d4 (>99 %), cotinine-d3 (99 %), [13C 15N2] 8-oxodGuo, and [13C 15N2] 8-oxoGuo were obtained from CDN Isotopes Inc. (Pointe-Claire, Quebec, Canada). [13C15N]8-oxoGua (98 %) was obtained from Cambridge Isotope Laboratories, Inc. (MA, USA). 6 N hydrochloric acid, glacial acetic acid, and CHROMASOLV® gradient grade (≥99.9 %) methanol and acetonitrile for LC/MS were obtained from Sigma-Aldrich (Saint Louis, MO, USA). Ammonium acetate (98 %; Merck, Darmstadt, Germany) buffer was prepared in water; purified water was obtained from a Milli-Q Plus system (Millipore, Milford, MA, USA), and concentrated ammonia (Merck) was used for urine pH adjustment. The SPE cartridges, Sep-Pak Plus C18 (10 mL, 500 mg), were supplied by Waters. Anotop 10LC syringe filter devices (0.2-μm pore size, 10-mm diameter) were purchased from Whatman Inc. (Maidstone, England). A Synergi Fusion C18 column (150 × 4.6 mm, 4 μm) and a Synergi 4U Polar RP C18 column (150 × 4.6 mm, 4 μm), both supplied by Phenomenex (USA), were used throughout the study. Urinary creatinine was determined by the method of Jaffé (Henry 1974), using the alkaline picrate test with UV/VIS detection at 490 nm.

All other urinary determinations were performed by HPLC/MS-MS on a Series 200 LC quaternary pump (PerkinElmer, Norwalk, CT, USA), coupled with an AB/Sciex API 4000 triple-quadrupole mass spectrometry detector equipped with a Turbo Ion Spray (TIS) probe. The instrument was calibrated using polypropylene glycol, and the resolution was adjusted to a peak width (FWHM) of 0.7 Th over the range of m/z 100–1000. Detection was in the MRM mode, and parameters were optimized for the analytes by the automated “infusion quantitative optimization” procedure and subsequently refined by flow injection analysis (FIA) using the pure standards.

Human urine samples for standard calibration curves and quality control samples were obtained from healthy, non-smoking volunteers.

Subjects

The study involved 446 healthy volunteers resident in an area of central Italy (66 % of the total number invited); selection criteria were age between 35 and 69 years at the beginning of the study and residence in the same area for at least 10 years. Each subject who agreed to participate in the study and signed an informed consent form was given a questionnaire collecting the following information: gender, age, residence address, occupation, smoking status, and health status. Smoking is the most important source of benzene exposure; compared with it, occupational and environmental benzene exposure are small contributions and minor confounding factors. The subjects provided a first morning urine sample in which the benzene and nicotine metabolites were determined. In addition, the biomarkers of oxidative stress were determined in a subsample of 131 subjects, representative for gender (58 % female), age, and smoking habit (28 %). The study was performed in the 2-year period 2013–2014. As our study was considered an observational study on the basis of the definitions of the European Directive 2001/20/EC, the approval of an Ethics Committee was not required.

Analytical procedure

Biomarkers of oxidative stress

Urine samples were collected in sterile polypropylene containers, divided into three aliquots, and frozen at −20 °C in polypropylene screw-cap tubes until analysis. Biomarkers of nucleic acid oxidation, 8-oxodGuo, 8-oxoGuo, and 8-oxoGua, were determined on one aliquot of each urine sample by isotopic dilution LC-MS/MS using an AB-Sciex API 4000 triple-quadrupole mass spectrometer, according to the method described by Andreoli et al. (2010) with some modifications: the use of commercial standards for 8-oxodGuo and 8-oxoGuo, of a Synergi 4U Polar RP C18 column, and of acetic acid (instead of formic acid) for mobile phase acidification. Under these conditions, the LODs, calculated using the approach based on the standard deviation of the response and the slope and expressed as 3.3 σ/S, were 2.99 nmol/L for 8-oxoGua, 1.69 nmol/L for 8-oxodGuo, and 2.34 nmol/L for 8-oxoGuo. The variability of the method, expressed as % CV, was in the range 2–17 % for intra-day and 1–21 % for inter-day determinations.

Preparation of standard, calibration, and quality control solutions of t,t-MA, SPMA, and cotinine

Stock standard solutions of 10 mg/L for t,t-MA, 1 mg/L for SPMA, and 100 mg/L for cotinine, and stock standard solutions of the internal standards of 10 mg/L for t,t-MA-d4, 1 mg/L for SPMA-d2, and 100 mg/L for cotinine-d3, were obtained by weighing and dissolving in methanol. Calibration standard mixtures were then obtained by further dilution of the above solutions with urine of healthy non-smoking subjects, in the concentration range of 10–500 μg/L for t,t-MA, 0.1–25 μg/L for SPMA, and 50–2500 μg/L for cotinine. A non-spiked urine sample represents the 0 point of the calibration curve. The concentrations of the internal standards were 100 μg/L for t,t-MA-d4, 5 μg/L for SPMA-d2, and 500 μg/L for cotinine-d3 in each sample; the samples were then subjected to the urine purification procedure and HPLC-MS/MS analysis in order to build the calibration curves.

Three quality control samples (solutions of known concentration) were prepared at low, medium, and high concentrations (see Table 1) in urinary matrix, all containing the same concentration of the internal standard, and were analyzed as unknown samples on five different days, three of them not consecutive. The results were used to establish the performance of the method.

Table 1 Validation results of the analytical method for: SPMA, t,tMA, and Cotinine

Urine purification procedure for t,t-MA, SPMA, and cotinine determination

Three milliliters of urine was treated with 6 N HCl until pH = 2 in order to hydrolyze the precursor of SPMA (Paci et al. 2007), and the deuterium-labeled internal standard solutions in methanol were added: 30 μL of t,t-MA-d4, 15 μL of SPMA-d2, and 15 μL of cotinine-d3 at concentrations of 10, 1, and 100 mg/L, respectively. SPE purification was carried out on Sep-Pak C18 cartridges preconditioned with 3 mL of methanol and 3 mL of 2.0 % (v/v) acetic acid in water; after loading the samples, the cartridges were washed with 3 mL of 2.0 % (v/v) acetic acid in water. To the washing fraction (containing the cotinine), 160 μL of 30 % NH4OH and 1 mL of 0.1 M ammonium acetate buffer were added to reach pH = 8, and the fraction was stored. The acidic metabolites (t,t-MA and SPMA) were then eluted with 1.5 mL of methanol, and the eluate was filtered on Anotop 10LC 0.2-μm filters and injected into the HPLC/MS-MS system. The same cartridges were further washed with 3 mL of methanol and 3 mL of water, and afterwards the stored fraction containing the cotinine was loaded; the cartridges were then washed with 3 mL of water and eluted with 1.5 mL of methanol. After filtration on Anotop syringe filters, the eluate was analyzed by HPLC/MS-MS (see Fig. 1). Each sample was tested in duplicate. The final concentration of each analyte was divided by the urinary creatinine concentration and expressed as μg/g of creatinine. Samples with creatinine concentrations lower than 0.3 g/L or higher than 3.0 g/L were excluded from statistical analysis according to the recommendation of the American Conference of Governmental Industrial Hygienists (ACGIH 2014).
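The arithmetic behind the internal-standard spiking and the creatinine correction can be checked with a short sketch. The function names are hypothetical, and the sample volume of 3 mL is an assumption: it is the volume that reproduces the nominal internal-standard concentrations (100, 5, and 500 μg/L) from the 10, 1, and 100 mg/L stock solutions described in the standard-preparation section. The volume added by the spike is neglected.

```python
def spiked_concentration_ug_per_l(stock_mg_per_l, spike_ul, sample_ml):
    """Concentration (ug/L) reached when a stock solution is spiked into a sample.

    The spike volume is neglected with respect to the sample volume
    (an approximation of this sketch).
    """
    mass_ug = stock_mg_per_l * spike_ul / 1000.0   # mg/L * uL = ng; ng / 1000 = ug
    sample_l = sample_ml / 1000.0
    return mass_ug / sample_l


def creatinine_corrected(conc_ug_per_l, creatinine_g_per_l, low=0.3, high=3.0):
    """Return the concentration in ug/g creatinine.

    Samples with creatinine outside [low, high] g/L are excluded (None),
    following the exclusion rule stated in the text.
    """
    if creatinine_g_per_l < low or creatinine_g_per_l > high:
        return None
    return conc_ug_per_l / creatinine_g_per_l


# Assumed sample volume: 3 mL (see lead-in).
for name, stock, spike in [("t,t-MA-d4", 10, 30), ("SPMA-d2", 1, 15), ("cotinine-d3", 100, 15)]:
    print(name, round(spiked_concentration_ug_per_l(stock, spike, 3.0), 3), "ug/L")

# Hypothetical sample: 2.4 ug/L of analyte, 1.2 g/L of creatinine.
print(creatinine_corrected(2.4, 1.2), "ug/g creatinine")
```

A sample with, for instance, 0.2 g/L of creatinine returns `None` and would be dropped before the statistical analysis.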

Fig. 1 Steps of the SPE urine purification process

HPLC/MS/MS analysis

The HPLC analyses of urine samples and calibration standards for t,t-MA, SPMA, and cotinine were performed on a Series 200 LC quaternary pump (PerkinElmer, Norwalk, CT, USA) using a 150 × 4.6-mm, 4-μm Synergi Fusion C18 analytical column. The mobile phase program was as follows: 2-min equilibration with 10 % acetonitrile (phase A) and 90 % of 1.0 % (v/v) acetic acid in water (phase B), then a linear gradient up to 77 % of phase A and 23 % of phase B in 7 min, followed by 1 min with a linear gradient back to the starting conditions, at a flow rate of 600 μL/min. Under these conditions, the retention times of t,t-MA and its internal standard are about 5.48 min, and those of SPMA and its internal standard are about 7.8 min. The total run time was 10 min. For urinary cotinine analysis, the same column and mobile phases were used, with 60 % of phase A and 40 % of phase B at a flow rate of 800 μL/min. The retention times of cotinine and its internal standard are about 2.8 min. The total run time was 5 min.

The precursor → product ionic transitions monitored are, in the negative ion mode 141.0 → 97.00 for t,t-MA, 145.0 → 100.0 for t,t-MAd4, 238.1 → 109.1 for SPMA, and 240.1 → 109.1 for SPMAd2 and in the positive ion mode 177.3 → 80.10 for cotinine and 180.3 → 80.10 for cotinine-d3. The 1.5 version of Analyst® software was employed for instrument control. The chromatograms of a urine sample of a smoker are reported in Fig. 2a for the determination of SPMA and t,t MA and in Fig. 2b for the determination of cotinine.

Fig. 2 (a) HPLC-MS/MS chromatogram of a urine sample of a smoker spiked with SPMA and t,t-MA isotope-labeled internal standards. (b) HPLC-MS/MS chromatogram of a urine sample of a smoker spiked with cotinine isotope-labeled internal standard

Data processing

For each sample tested, the arithmetic mean of the peak areas of two replicate injections was used. For each analyte, the peak area of a blank urine sample was subtracted from the areas of the corresponding urine calibration standards. The calibration curves were generated using linear regression analysis according to the equation y = ax + b for t,t-MA and SPMA, where y is the ratio of the area of the analyte calibration standard (after subtraction of the blank) to that of the corresponding internal standard, a is the slope of the regression line, x is the concentration of the analyte, and b is the intercept. For urinary cotinine, the best interpolation proved to be a second-order polynomial calibration curve, as suggested by the International Conference on Harmonization (ICH) guidelines for method validation (ICH 1996). This solution has been used in other studies that used an MS/MS detector (Picó et al. 2007; Fayed et al. 2007; Carrieri et al. 2009). The concentrations of the analytes in the unknown or quality control samples were calculated from the regression equation of the calibration curve and expressed as micrograms per liter of urine. The concentrations of the biomarkers of oxidative stress in the unknown or quality control samples were calculated from the regression equation of the calibration curve and expressed as nmol/L of urine. Concentrations of all urinary metabolites were finally expressed in micrograms per gram of creatinine.
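The linear calibration described above can be sketched as follows for t,t-MA or SPMA. This is a minimal illustration using ordinary least squares in plain Python; the function names and all peak areas are hypothetical and are not data from the study, and the second-order polynomial used for cotinine is not covered.

```python
def area_ratios(analyte_areas, istd_areas, blank_area):
    """Blank-subtracted analyte area divided by the internal-standard area."""
    return [(a - blank_area) / i for a, i in zip(analyte_areas, istd_areas)]


def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


def quantify(ratio, a, b):
    """Invert the calibration line to obtain the concentration (ug/L)."""
    return (ratio - b) / a


# Hypothetical calibration data (concentrations in ug/L, areas in counts).
levels = [10, 50, 100, 250, 500]
analyte = [60, 140, 240, 540, 1040]      # spiked urine, includes the blank signal
istd = [1000, 1000, 1000, 1000, 1000]
blank = 40                                # non-spiked urine

a, b = fit_line(levels, area_ratios(analyte, istd, blank))
print("slope:", a, "intercept:", b)
print("unknown with area ratio 0.3:", quantify(0.3, a, b), "ug/L")
```

Working with area ratios rather than raw areas is what lets the isotope-labeled internal standard compensate for losses during purification and for signal suppression.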

Uncertainty of measurement

The uncertainty of measurement was evaluated according to EURACHEM/CITAC guidelines (Ellison et al. 2000). Two independent sources of uncertainty were recognized: the component associated with the precision u (p) and the component associated with the calibration curve u (cal).

The contribution to uncertainty due to the precision was calculated from the intraday precision measured on three different days at the lowest concentration. The component associated with the calibration curve at the concentration c0 is

$$ u\ \left({c}_0\right) = \frac{S}{slope}\sqrt{\frac{1}{p}+\frac{1}{n} + \frac{{\left({c}_0-{c}_m\right)}^2}{S_{xx}}} $$

where S is the residual standard deviation of the linear regression of the calibration curve, p is the number of calibration curves used, n is the number of observations (number of calibration curves × number of concentration levels), c0 is the concentration considered, cm is the median of the concentrations of the standards used to build the calibration curve, and Sxx is calculated as follows:

$$ {S}_{xx} = {\displaystyle \sum_{j=1}^n}{\left({c}_j-{c}_m\right)}^2 $$

The two components were combined according to the following equation, obtaining the relative combined uncertainty:

$$ {u}_{(comb)} = \sqrt{{u_{(p)}}^2+{u_{(cal)}}^2} $$
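The three expressions above translate directly into code. The sketch below is illustrative only: the function names and numerical inputs are hypothetical, and the central value of the standard concentrations is computed here as their arithmetic mean, which is an assumption of this sketch. The calibration component is converted to a relative value by dividing by the concentration considered before it is combined with the relative precision component.

```python
import math


def u_calibration(c0, concentrations, s_res, slope, p):
    """Calibration component u(c0), in concentration units.

    concentrations: all n standard concentrations used in the regression
    s_res: residual standard deviation of the regression
    p: number of calibration curves used
    The central value c_m is taken as the arithmetic mean (assumption).
    """
    n = len(concentrations)
    c_m = sum(concentrations) / n
    s_xx = sum((c - c_m) ** 2 for c in concentrations)
    return (s_res / slope) * math.sqrt(1.0 / p + 1.0 / n + (c0 - c_m) ** 2 / s_xx)


def u_combined(u_p_rel, u_cal_rel):
    """Relative combined uncertainty from the two relative components."""
    return math.sqrt(u_p_rel ** 2 + u_cal_rel ** 2)


# Hypothetical example: three curves of five levels each (n = 15).
standards = [10, 50, 100, 250, 500] * 3
c0 = 10.0
u_cal = u_calibration(c0, standards, s_res=0.004, slope=0.002, p=3)
u_cal_rel = u_cal / c0            # relative calibration component
u_p_rel = 0.08                    # hypothetical relative precision (CV = 8 %)
print("u(c0) =", round(u_cal, 3), "ug/L")
print("relative combined uncertainty =", round(u_combined(u_p_rel, u_cal_rel), 3))
```

The term (c0 − cm)² / Sxx grows as c0 moves away from the centre of the calibration range, which is why the uncertainty is evaluated at the lowest concentration.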

Statistical analysis and chemometrics methodology

Statistical analyses were performed using the Analysis ToolPak, a Microsoft Office Excel add-in program. Prior to performing any other statistical analysis, the normality of the distribution of the concentrations of the analytes was evaluated. Data below LOD have been replaced with one half of LOD; data between LOD and LOQ have been replaced with one half of LOQ. Statistical analyses were always performed using parametric methods (Pearson’s correlation, t test for independent variables). When transformation of the data to obtain a normal distribution was necessary, parametric tests were applied on the log values. Pearson’s coefficients were used for correlation analyses between the various urinary metabolites concentrations. Values of p < 0.01 are considered significant.
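The substitution rule for values below the LOD or LOQ and the correlation on log-transformed values can be sketched as follows. This is a plain-Python illustration, not the Excel workflow used in the study; the function names and all numerical values are hypothetical.

```python
import math


def substitute_censored(value, lod, loq):
    """Replace values below LOD with LOD/2 and values between LOD and LOQ with LOQ/2."""
    if value < lod:
        return lod / 2.0
    if value < loq:
        return loq / 2.0
    return value


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical SPMA and cotinine values (ug/g creatinine), with assumed
# LOD = 0.05 and LOQ = 0.2 for SPMA.
spma_raw = [0.02, 0.15, 0.8, 2.5, 6.0]
cotinine = [150.0, 300.0, 900.0, 1800.0, 2500.0]

spma = [substitute_censored(v, lod=0.05, loq=0.2) for v in spma_raw]
log_spma = [math.log10(v) for v in spma]
log_cot = [math.log10(v) for v in cotinine]
print("r on log values:", round(pearson_r(log_spma, log_cot), 3))
```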

In order to understand whether the smoker/non-smoker condition could translate into a difference in the nucleic acids and oxidative stress-markers between the subjects, a multivariate classification approach based on linear discriminant analysis (LDA; Fisher 1936) was adopted. LDA, introduced by Fisher in 1936, was the first multivariate classification method proposed in the literature. It is a parametric method which assumes that, for each class, the measurements are normally distributed and that the within-class scatter matrix is the same for all the categories. Under such assumptions, the decision boundaries separating the categories in the multivariate space assume the form of linear surfaces, i.e., hyperplanes.

In detail, under the method’s assumptions, the probability p(g|xi) that a sample characterized by the measurement vector xi belongs to the gth class may be expressed, up to a normalization factor common to all classes, as:

$$ p\left(g\left|{\mathrm{x}}_i\right.\right)\propto \frac{p_{0,g}}{{\left(2\pi \right)}^{\frac{d}{2}}{\left|\mathrm{S}\right|}^{\frac{1}{2}}}{e}^{-\frac{1}{2}{\left({\mathrm{x}}_i-{\mathrm{x}}_g\right)}^T{\mathrm{S}}^{-1}\left({\mathrm{x}}_i-{\mathrm{x}}_g\right)} $$

where xg is the centroid of category g, S is the variance-covariance matrix, which is common to all the groups, p0,g is the a priori probability of class g (i.e., the probability of observing a sample from class g before having carried out any measurement), and d is the number of variables. Classification is then accomplished according to Bayes’ rule, which states that a sample should be assigned to the class to which it has the highest probability of belonging. It must be stressed that, even if LDA postulates that the probability densities are normally distributed with the same within-class scatter matrix for each category, as described above, it is nevertheless rather robust against violations of these assumptions.
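The classification rule can be illustrated with a minimal sketch restricted to two variables, so that the pooled covariance matrix can be inverted by hand. It uses the standard multivariate normal density with a common covariance matrix, weights it by the prior, and normalizes over the classes. All names and data points are hypothetical; this is not the software used in the study.

```python
import math


def centroid(rows):
    """Mean vector of a list of two-variable samples."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_covariance(groups):
    """Within-class (pooled) 2x2 variance-covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    total = 0
    for rows in groups:
        m = centroid(rows)
        total += len(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = total - len(groups)
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def posteriors(x, centroids, cov, priors):
    """Posterior probability of each class for the sample x (Bayes' rule)."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    weighted = []
    for m, p0 in zip(centroids, priors):
        d = [x[0] - m[0], x[1] - m[1]]
        maha = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
        # (2*pi)^(d/2) equals 2*pi for d = 2 variables
        weighted.append(p0 * math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det)))
    norm = sum(weighted)
    return [w / norm for w in weighted]


# Hypothetical autoscaled data for two classes.
class_a = [(1, 1), (2, 1), (1, 2), (2, 2)]
class_b = [(5, 5), (6, 5), (5, 6), (6, 6)]
cov = pooled_covariance([class_a, class_b])
post = posteriors((2, 2), [centroid(class_a), centroid(class_b)], cov, [0.5, 0.5])
print("assigned class:", "A" if post[0] > post[1] else "B")
```

Because the covariance matrix is shared, the factor in front of the exponential is the same for every class apart from the prior, so only the Mahalanobis distance to each centroid and the prior decide the assignment.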

Considering the relatively low number of smokers (24) in the subset of 131 subjects which constituted the basis of multivariate data analysis, a Monte Carlo cross-validation approach with 1000 iterations was used to estimate the predictive ability of the model.
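The Monte Carlo cross-validation loop can be sketched as repeated random splits into a training and a test set. To keep the example short, the classifier is a stand-in, not LDA: a single-variable rule that places the decision threshold midway between the two class means. All function names and data are hypothetical.

```python
import random


def fit_midpoint(values, labels):
    """Stand-in classifier: threshold midway between the two class means."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    m0 = sum(class0) / len(class0)
    m1 = sum(class1) / len(class1)
    return (m0 + m1) / 2.0, m1 > m0


def predict_midpoint(model, value):
    threshold, class1_is_upper = model
    return int((value > threshold) == class1_is_upper)


def monte_carlo_cv(values, labels, fit, predict, test_size, iterations, seed=0):
    """Average correct classification rate over repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(len(values)))
    correct = 0
    total = 0
    for _ in range(iterations):
        rng.shuffle(indices)
        test, train = indices[:test_size], indices[test_size:]
        model = fit([values[i] for i in train], [labels[i] for i in train])
        for i in test:
            correct += int(predict(model, values[i]) == labels[i])
            total += 1
    return correct / total


# Hypothetical, well-separated data: 10 samples per class.
values = [1.0 + 0.1 * k for k in range(10)] + [5.0 + 0.1 * k for k in range(10)]
labels = [0] * 10 + [1] * 10
rate = monte_carlo_cv(values, labels, fit_midpoint, predict_midpoint,
                      test_size=5, iterations=200)
print("correct classification rate:", rate)
```

In the study, the corresponding split placed 31 samples in the test set and 100 in the training set at each of the 1000 iterations; tracking the rate separately for each class would additionally give the per-class figures.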

Results and discussion

Validation of HPLC-MS/MS method for SPMA, t,t-MA and cotinine

The validation results are reported in Table 1. The calibration curves for each analyte showed a linear response in the tested concentration range. The linear regression coefficients (R2) for all calibration curves are around 0.999. The total recovery of t,t-MA, SPMA, and cotinine is the result of two factors: the extraction recovery due to the sample preparation procedure, and the matrix effect, which can influence the instrumental response of the analyte. A matrix effect lower than 100 % indicates a phenomenon called ion suppression. The contribution of the matrix effect was evaluated with the procedure suggested by Matuszewski et al. (2003) in three experiments performed using urine from different donors, and was found to be in the range 30–60 % for the three analytes. Total recovery ranges from a minimum of 45 % for t,t-MA to 55 % for SPMA and 60 % for cotinine. The use of the isotope-labeled internal standards compensates both for the matrix effect and for the extraction recovery, rendering the method performance in terms of accuracy and precision independent of the total recovery. The coefficient of variation (% CV) of the results ranged between 1.5 and 15 %, while accuracy, calculated as the % ratio between found and theoretical value, ranged from 85 to 105 % for each QC concentration, for each compound, and for all intra- and inter-day determinations.
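The relation between matrix effect, extraction recovery, and total recovery can be made explicit with the three-set scheme of Matuszewski et al. (2003): peak area of a neat standard (A), of an extract spiked after extraction (B), and of a sample spiked before extraction (C). The sketch below is illustrative; the function name and the peak areas are hypothetical.

```python
def matuszewski(area_neat, area_post_spike, area_pre_spike):
    """Matrix effect, extraction recovery, and total recovery, in percent.

    area_neat:       analyte in pure solvent (set A)
    area_post_spike: blank matrix spiked after extraction (set B)
    area_pre_spike:  matrix spiked before extraction (set C)
    """
    matrix_effect = 100.0 * area_post_spike / area_neat
    extraction_recovery = 100.0 * area_pre_spike / area_post_spike
    total_recovery = 100.0 * area_pre_spike / area_neat
    return matrix_effect, extraction_recovery, total_recovery


# Hypothetical peak areas.
me, re, total = matuszewski(area_neat=1000.0, area_post_spike=500.0, area_pre_spike=250.0)
print("matrix effect %:", me)
print("extraction recovery %:", re)
print("total recovery %:", total)
```

By construction, the total recovery equals the product of the matrix effect and the extraction recovery divided by 100, which is the two-factor decomposition described in the text.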

The LOD was calculated using the approach based on the standard deviation of the response and the slope, and expressed as 3.3 σ/S, where σ is the standard deviation of the response and S is the slope of the calibration curve of the standard analytes. The estimate of σ was carried out on the calibration curve as the residual standard deviation of the regression line of each analyte. The LOQ is expressed as 10 σ/S.
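As a worked illustration of this approach, the sketch below fits a calibration line, takes σ as the residual standard deviation of the regression (with n − 2 degrees of freedom), and derives LOD and LOQ from the slope. The function names and data points are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def residual_sd(x, y, slope, intercept):
    """Residual standard deviation of the regression line."""
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (len(x) - 2))


def lod_loq(sigma, slope):
    """LOD = 3.3*sigma/S and LOQ = 10*sigma/S."""
    return 3.3 * sigma / slope, 10.0 * sigma / slope


# Hypothetical calibration points (concentration, response ratio).
conc = [0, 1, 2, 3]
resp = [0.1, 0.9, 2.1, 2.9]
slope, intercept = fit_line(conc, resp)
sigma = residual_sd(conc, resp, slope, intercept)
lod, loq = lod_loq(sigma, slope)
print("LOD:", round(lod, 3), "LOQ:", round(loq, 3))
```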

The relative combined uncertainty calculated for the lowest QC concentration is reported for each analyte with the validation results in Table 1 together with detection limits (LOD) and quantitation limits (LOQ) for each analyte.

SPMA, t,t-MA, and cotinine levels in all subjects

In the questionnaire, subjects declared themselves to be smokers (n = 93), non-smokers (n = 197), or ex-smokers (n = 156). With reference to the urinary cotinine concentration, the cutoff value for the definition of smoker was set at urinary cotinine >100 μg/g of creatinine. Cotinine was detectable in all samples. The mean and median urinary concentrations of SPMA, t,t-MA, and cotinine are reported in Table 2, expressed in micrograms per gram of creatinine, for all subjects (n = 446) and separately for subjects having cotinine concentration >100 (n = 110) and <100 (n = 336). In the group with cotinine >100, there are subjects who declared themselves to be non-smokers (n = 3) or ex-smokers (n = 18), while in the group having cotinine <100 μg/g of creatinine, there are three subjects who declared themselves to be smokers. The cotinine concentration in subjects who declared themselves to be non-smokers is much lower than 100 (mean 11.48/median 5.70 μg/g of creatinine), while that of ex-smokers is slightly higher (mean 149.32/median 7.88 μg/g of creatinine): this could be due to occasional smoking but most probably to exposure to passive smoking. If we plot the concentration of SPMA as a function of that of urinary cotinine for subjects having cotinine >100 μg/g of creatinine (smokers), we find that in the linear regression equation, the intercept of SPMA for cotinine = 0 corresponds to the median SPMA value of the subjects having cotinine <100 μg/g of creatinine (non/ex-smokers), about 0.2 μg/g of creatinine, confirming that 100 is a good cutoff value for the definition of smoker.

Table 2 t,t-MA, SPMA, and cotinine concentrations in all subjects

The urinary metabolites of benzene and nicotine followed a log normal distribution. Subjects with cotinine >100 showed significantly higher median concentrations of SPMA than those with values <100 (t test on log values, p < 0.0001). The t,t-MA value of subjects with cotinine >100 is about twice the <100 group value, but it is known that t,t-MA is not a specific benzene exposure biomarker as it is also a metabolite of sorbic acid; therefore, other sources contribute to the biomarker level.

SPMA, t,t-MA, and cotinine levels in the subsample

The mean and median urinary concentrations of SPMA, t,t-MA and cotinine are reported in Table 3 expressed in micrograms per gram creatinine for all subjects (n = 131) and separately for the cotinine levels. For this group, all the considerations made for the 446 are still valid.

Table 3 t,t-MA, SPMA, and cotinine in a sub sample of 131 subjects

Oxidative stress markers values in the subsample

The distributions of nucleic acid oxidation biomarkers expressed as median (and median standard deviations, SD) in the subgroup (n = 131) classified according to urinary cotinine concentration are summarized in Table 4. These biomarkers followed a log normal distribution: we did not find any statistically significant differences between the two groups, but a Pearson’s correlation analysis on log values shows that in subjects having cotinine >100, urinary 8-oxodGuo is correlated with SPMA and cotinine, and SPMA is strongly correlated with cotinine; this last result was to be expected, as smoking is well known as a source of benzene exposure. Urinary 8-oxodGuo is positively correlated with 8-oxoGuo in subjects with cotinine <100, but not in those having cotinine >100. The correlation analysis is reported in Table 5; significant correlations are reported in bold. Scatter plots of 8-oxodGuo vs. SPMA and vs. cotinine are reported in Fig. 3 and Fig. 4, respectively, for subjects having cotinine >100 μg/g of creatinine.

Fig. 3 Scatter plot for the log values of 8-oxodGuo vs. SPMA for smokers (cotinine >100 μg/g creatinine)

Table 4 Oxidative stress markers values in the subsample of 131 subjects
Table 5 Pearson’s correlation of exposure and effect biomarkers (log transformed values)
Fig. 4 Scatter plot for the log values of 8-oxodGuo vs. cotinine for smokers (cotinine >100 μg/g creatinine)

Multivariate analysis of exposure and effect biomarkers correlation by chemometrics

The dataset composed of the measurements of both the exposure and effect biomarkers on the subset of 131 samples was used to build and validate a multivariate classification model by means of linear discriminant analysis (LDA). For this purpose, the ex-smokers and the non-smokers were gathered in a single category, so that only two classes were considered, namely, smokers and non-smokers + ex-smokers. Before building the classification model, the data were pretreated by first dividing the metabolite concentrations of each individual by the creatinine content of the same subject (to compensate for the spurious inter-individual variation due to differences in excreted urine volume) and then by applying column autoscaling. LDA was then applied to the pretreated matrix, and an average correct classification rate of 93.1 % was obtained (75 % of smokers and 96.3 % of non- and ex-smokers). In order to validate the predictive ability of the model, a Monte Carlo cross-validation approach with 1000 iterations was adopted, in which 31 samples were randomly put in the test set and the remaining 100 were used as training set. An overall 91.6 % correct classification rate was obtained, where 75.2 % of smokers and 95.9 % of non- and ex-smokers were correctly predicted. To interpret the observed separation in terms of the markers contributing most to the discrimination between the classes, one can inspect the values of the coefficients of the single canonical variate. The canonical variate (CV) is the direction in the multivariate space along which the separation between the two categories is maximum, i.e., it is the direction orthogonal to the classification boundary. For the studied data subset, the projection of the samples onto the single canonical variate (CV1) is reported in Fig. 5 as a bar plot: it may be observed how smokers, with only a few exceptions (corresponding to the wrongly classified individuals), are characterized by a positive score on the canonical variate, while the non- or ex-smokers (also in this case, with few exceptions corresponding to the misclassified individuals) have negative ones. Therefore, the plot clearly shows a very good separation between the classes and the high classification accuracy of the model.
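The two pretreatment steps applied before LDA, creatinine normalization and column autoscaling, can be sketched as follows. Autoscaling is taken here in its usual sense of subtracting the column mean and dividing by the column standard deviation (computed with n − 1); the function names and values are hypothetical.

```python
import math


def normalize_by_creatinine(rows, creatinine):
    """Divide every metabolite concentration of a subject by that subject's creatinine."""
    return [[v / c for v in row] for row, c in zip(rows, creatinine)]


def autoscale(rows):
    """Mean-center each column and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
           for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical data: three subjects, two metabolites (ug/L), creatinine in g/L.
raw = [[1.0, 20.0], [4.0, 40.0], [1.5, 15.0]]
creatinine = [1.0, 2.0, 0.5]

per_gram = normalize_by_creatinine(raw, creatinine)
scaled = autoscale(per_gram)
for row in scaled:
    print([round(v, 3) for v in row])
```

After autoscaling, every variable has zero mean and unit variance, so the magnitudes of the canonical weights can be compared across biomarkers measured on very different concentration scales.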

Fig. 5 Projection of the samples of the reduced subset onto the first (and only) canonical variate, indicating the very good separation among the groups of individuals

When inspecting the canonical weights, i.e., the coefficients of the six experimental variables (8oxoGua, 8oxodGuo, 8oxoGuo, t,t-MA, SPMA, cotinine) in the definition of the variate, it is possible to affirm that the experimental variables contributing the most to the discrimination (i.e., the variables having the highest values of the squared weight) are cotinine and t,t-MA, both with a positive coefficient: since positive scores on the canonical variate correspond to smokers (as shown in Fig. 5), this observation indicates, as expected, that these two metabolites have higher concentration in the urine samples coming from smoking people.

Conclusions

The validation results show that the HPLC-MS/MS analytical method is sensitive, accurate, and reproducible in determining the biomarkers of exposure to benzene and nicotine both in smokers and in non/ex-smokers; isotopic dilution is essential for compensating the matrix effect. The median SPMA value in smokers is about ten times that of non-smokers, while the t,t-MA value of smokers is about twice that of non- and ex-smokers. In addition, the concentration of urinary SPMA is linearly correlated with that of cotinine in subjects over the cutoff value of 100 μg/g of creatinine (smokers), and the SPMA value extrapolated to cotinine = 0 corresponds to the median SPMA value of the subjects below the cutoff value (non-/ex-smokers), confirming the validity of this cutoff value. The chemometric analysis performed on the subgroup also found no difference between non- and ex-smokers, and indicated that the experimental variables contributing the most to the discrimination are cotinine and t,t-MA, both with a positive coefficient. SPMA appears to be more sensitive to smoking (approximately tenfold difference in median levels between smokers and non-smokers) than t,t-MA (only twofold difference). As SPMA is recommended by the ACGIH as a biomarker for occupational benzene exposure and occupational exposures continue to be better controlled, it is becoming increasingly difficult to distinguish occupational exposure from cigarette smoke exposure in smokers.

With reference to nucleic acid oxidation, we found no significant differences in the subgroup for the three biomarkers concentrations in the urine between smokers and non- or ex-smokers, while there is a significant correlation for smokers of urinary 8-oxodGuo with SPMA and cotinine, indicating that an effect of smoking on nucleic acid oxidation exists, even if it is probably masked by other factors depending on environmental and occupational exposure to oxidative stress agents. Further investigations increasing the number of subjects could reveal a significant difference.