1 Introduction

Non-targeted analyses have wide applicability in diverse research fields, such as food safety, metabolomics, and environmental analyses [1–3]. Mass spectrometry (MS), often coupled with a separation technique, is commonly used for chemical screening because a wide range of chemical compound classes can be detected and identified within a single, highly complex sample. Despite numerous advantages, non-targeted MS analyses are quite challenging, in part due to ion suppression, isobaric compounds, and changes in retention time, which can complicate automated detection, identification, and the development and application of data analysis strategies.

Seven Golden Rules were proposed by Kind and Fiehn to aid in the correct generation of chemical formulas for unknown compounds [4]. These include suggestions for restrictions for the number of elements, LEWIS and SENIOR chemical rules, isotopic pattern thresholds, and elemental ratios and their probabilities. To elucidate the correct formula of a completely resolved compound with an 80 to 99% probability, the mass accuracy should be within 3 ppm and a maximum of 5% absolute isotope ratio deviation should be observed [4]. High-resolution mass spectrometry (HR-MS) is certainly capable of obtaining the required mass accuracy to achieve these desired figures of merit, which has led to the use of Orbitrap and Q-TOF mass analyzers for non-targeted analyses [2, 3]. However, deviations in mass accuracy can occur and instrumental performance with regard to isotopic ratios, especially in complicated sample matrices and at varying analyte ion intensities, has not been fully characterized.

Mass accuracy measurements for Q-TOF and Orbitrap mass analyzers have been previously reported [5–11]. Reported Q-TOF mass accuracies vary; values are generally below 5 ppm, but larger mass accuracy errors can occur when low or high ion counts are encountered [5, 6]. Similarly, mass accuracies lower than 5 ppm have been demonstrated for Orbitrap mass analyzers, including characterizing a single compound in a complex sample matrix [10] or by measuring instrument performance using a standard mixture of greater than 200 compounds [11]. Different resolution settings were also tested for a four-compound mixture, where resolution was not found to greatly affect mass accuracy [7]. However, Orbitrap mass accuracies can worsen because of peak coalescence and ion suppression [8, 9], which can occur when monitoring many analytes within a chemically complex sample.

Similarly, relative isotopic abundance values for both Q-TOF and Orbitrap instruments have been reported. Experimentally observed TOF isotopic ratio errors were within the 5% threshold for a standard mixture at a given concentration [12]. However, high ion counts can cause saturation of the detector, which would affect the experimental isotopic ratio. In a separate investigation involving a Q-TOF, the isotopic pattern fit was independent of acquisition rate [13]. Isotopic ratios have been reported for Orbitrap instruments [14–16], although it is difficult to make a direct comparison with the 5% threshold because the data are not reported in this manner. Additionally, the Orbitrap and Q-TOF have also been characterized with regard to peak shape using MassWorks, rather than using relative abundance ratios [17, 18], which may aid in improved molecular formula generation. Investigations regarding resolution-related effects have also been reported, where increased resolution appears to result in increased isotopic distribution error [15, 17].

While these previous studies have demonstrated useful aspects of Q-TOF and Orbitrap capabilities, we aim to complement these studies by determining typical/expected instrument performance when analyzing different classes of compounds covering a large mass and retention time range under varying degrees of ion abundance within complicated sample matrices. This is especially critical in food safety screening, where adulteration of a food source may result in high abundance molecular species (e.g., melamine) or contamination of the food supply where the analyte of interest would be expected to be in low abundance in comparison with the compounds native to the sample matrix (e.g., pesticides). Similarly, these metrics would be useful for other research fields. For example, the goal in a metabolomics workflow is to characterize all compounds within a given sample matrix, which often have a large dynamic range (nM to mM levels).

The goal of this work is to determine experimental data quality and the conditions under which it would fulfill the requirements for accurate formula generation. Thus, we aim to characterize two state-of-the-art instruments in terms of their mass accuracy and relative isotopic abundance (RIA) performance using complex sample matrices spiked with varying concentrations of a 48-compound analytical standard mixture. The stated figures of merit for the Q-Exactive include 140,000 resolving power with an expected mass accuracy less than 5 ppm, whereas the maXis is capable of 60,000 resolving power with a mass accuracy of 1 ppm. The comprehensive measurements of mass accuracy and RIA values provide reasonable expectations of performance for these two instruments, and also generate specific examples of impaired data quality that would impact the success of a high-throughput non-targeted workflow. Experimental design and data analysis strategies can be modified or developed based on the measurements described here.

2 Experimental

2.1 Chemicals and Food Matrices

All solvents used were Optima Grade (Thermo Fisher Scientific, Pittsburgh, PA, USA). Analytical standards were purchased from Sigma-Aldrich (St. Louis, MO, USA), Thermo Fisher Scientific (San Jose, CA, USA), and Cerilliant (Round Rock, TX, USA) and are listed in Supplementary Table S1. These compounds were chosen to cover a mass range of approximately m/z 100–1000 and a 50 min retention time range. The compound classes include antibiotics, poisons, toxins, steroids, pesticides, erectile dysfunction and weight loss drugs, carcinogens, and parabens. Food matrices were purchased from a local grocery store and included apple juice, plain low fat yogurt, banana baby food, and powdered infant formula.

2.2 Sample Preparation

The analytical standard mixture (compounds listed in Supplementary Table S1) was prepared in 90/10 (v/v) water/acetonitrile. The food matrices were prepared using 2 mL or 2 g of matrix in 10 mL acetonitrile. These were rotated at 33 rpm on a roller mixer for 1 h. The samples were centrifuged for 10 min at 3900 rcf at 10°C. The samples were then filtered using 25 mm PVDF, 0.45 μm pore size, Luer Lock syringe filters (Grace, Deerfield, IL, USA). The samples were then diluted 1:1 with water and the analytical standard mixture was spiked into these food matrices at a final concentration of 200, 50, 10, and 1 pg/μL. The analytical standard mixture was also prepared at these concentrations for LC/MS analysis.

2.3 Instrumentation

Two ESI HR-MS instruments were characterized for analytical performance: Q-Exactive Orbitrap (Thermo Fisher Scientific) and maXis Qq-TOF (Bruker Daltonics, Billerica, MA, USA). The Q-Exactive settings were: 140,000 resolution, 1e6 AGC target, and maximum IT 60 ms; the settings for the heated electrospray ionization probe (HESI-II) were: 4 kV spray voltage, 50 psi sheath gas, 15 (arbitrary units) auxiliary gas, 380°C capillary temperature, and 300°C heater temperature. The source settings for the maXis were 4.5 kV capillary voltage, −0.5 kV end plate offset, 1.6 bar nebulizer pressure, and 8 L/min drying gas flow at 200°C. The data were acquired with a 1.5 Hz acquisition rate under the version 1 calibration mode. The lock mass spray head was utilized for the maXis using Hexakis(1H,1H,2H-difluoroethoxy)phosphazene (m/z 622; SynQuest Laboratories, Alachua, FL, USA) as the lock mass. Both instruments were run in full-scan, positive ion mode, and the scan range was m/z 100–1200. Both MS instruments were calibrated prior to a sample set; a sample set included all concentrations and technical replicates within a given food matrix (e.g., all yogurt concentration spikes).

The UHPLC system utilized for both MS instruments was a Shimadzu Nexera (Columbia, MD, USA). Chromatographic separations were performed on a Kinetex C18 2.1 mm × 100 mm, 1.7 μm, 100 Å column (Phenomenex, Torrance, CA, USA). The separation was performed with a column temperature of 60°C and a flow rate of 400 μL/min using water with 0.1% formic acid (v/v) and acetonitrile with 0.1% formic acid (v/v) with the following gradient: 5 min hold at 95% water, 50 min linear gradient from 95% to 5% water, 5 min equilibration at 95% water. An injection volume of 10 μL of each sample resulted in 10, 100, 500, and 2000 pg of the analytical standard mixture on column. Each prepared sample was injected and analyzed five times for a total of 100 individual data files for both the Orbitrap and Q-TOF. The varying concentrations were analyzed in random order with a blank in between each sample.
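The on-column amounts and the number of data files follow directly from the values given above. The short sketch below reproduces that arithmetic; it assumes that the five sample sets are the four food matrices plus the neat standard mixture, which is inferred from the text rather than stated explicitly, and all variable names are illustrative.

```python
# Spiked concentrations (pg/uL) and injection volume (uL) as stated in the text.
CONCENTRATIONS_PG_PER_UL = [200, 50, 10, 1]
INJECTION_VOLUME_UL = 10

# Assumption: five sample sets = four food matrices plus the neat standard mixture.
SAMPLE_SETS = ["standard", "apple juice", "yogurt", "baby food", "infant formula"]
REPLICATES = 5


def on_column_pg(conc_pg_per_ul, volume_ul=INJECTION_VOLUME_UL):
    """Amount of each standard loaded on column for one injection, in pg."""
    return conc_pg_per_ul * volume_ul


amounts = sorted(on_column_pg(c) for c in CONCENTRATIONS_PG_PER_UL)
n_files = len(SAMPLE_SETS) * len(CONCENTRATIONS_PG_PER_UL) * REPLICATES

print(amounts)  # [10, 100, 500, 2000]
print(n_files)  # 100
```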

2.4 Data Analysis

The experimental m/z values and peak intensities for the A, A + 1, and A + 2 ions were obtained using ToxID (Thermo) and DataAnalysis (Bruker). The settings for ToxID were a 15 s retention time window and a 5 ppm exact mass window for analyte screening. A lock mass calibration was applied to a subset of the Thermo data using RecalOffline (Thermo) using a mass tolerance of ± 5 ppm, searching by intensity, recalibrating per scan, and using Param C. This was performed for the yogurt data set for a side-by-side comparison of the applied lock mass calibration and without recalibration. The m/z value used was 391.28429 (diisooctyl phthalate ion), a background ion that is present for the majority of the chromatographic run [19]. The mass accuracy drift for the Q-TOF instrument was monitored using the lock mass ion, m/z 622.0290. Q-TOF data was then recalibrated using this lock mass, and scripting was utilized in DataAnalysis to plot and integrate the chromatograms of interest for each compound (10 mDa window) and to extract the experimental m/z value and intensity (30 s retention time window) from the centroid compound spectrum. Peak intensities were excluded for the isotopic peak ratios if the monoisotopic peak intensity was less than 100 counts.

A number of metrics were used to analyze the data. Mass accuracy was calculated by:

$$ \text{mass accuracy error (ppm)} = \frac{m/z_{\text{theor}} - m/z_{\text{obs}}}{m/z_{\text{theor}}} \times 10^{6} $$
(1)
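As a minimal illustration of Equation 1, the sketch below computes the signed mass accuracy error and the m/z window that corresponds to a given ppm tolerance. The function names and the observed m/z value are hypothetical; only the sign convention (theoretical minus observed) and the theoretical m/z of the diisooctyl phthalate ion are taken from the text.

```python
def mass_accuracy_ppm(mz_theor, mz_obs):
    """Equation 1: (theoretical - observed) / theoretical, expressed in ppm."""
    return (mz_theor - mz_obs) / mz_theor * 1e6


def ppm_window(mz_theor, tol_ppm):
    """Lower and upper m/z bounds for a symmetric +/- tol_ppm window."""
    half_width = mz_theor * tol_ppm * 1e-6
    return mz_theor - half_width, mz_theor + half_width


# Hypothetical observation of the m/z 391.28429 background ion.
error = mass_accuracy_ppm(391.28429, 391.28468)
low, high = ppm_window(391.28429, 5.0)
print(f"{error:.2f} ppm, 5 ppm window: {low:.5f} to {high:.5f}")
```

With this convention, an observed m/z that is higher than the theoretical value gives a negative error, which matters when interpreting a signed bias.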

To calculate the absolute isotope ratio deviation, the following formula was used for both A + 1 and A + 2 ions:

$$ \text{absolute isotope ratio deviation} = \left| \mathrm{RIA}_{\text{theor}} - \mathrm{RIA}_{\text{obs}} \right| \times 100 $$
(2)

where

$$ \mathrm{RIA}_{A+1} = \frac{\text{Intensity}_{A+1}}{\text{Intensity}_{A}} $$
(3)

We also wanted to compare the data to previously reported data, so RIA error (%) calculations were performed using:

$$ \text{RIA error}\ (\%) = 100 \times \frac{\mathrm{RIA}_{\text{obs}} - \mathrm{RIA}_{\text{theor}}}{\mathrm{RIA}_{\text{theor}}} $$
(4)

After these individual calculations were performed for each analyte in each replicate, averages and standard deviations were calculated for each sample set.
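The isotope metrics in Equations 2–4 and the per-sample-set summary can be sketched as follows. This is an illustrative reading of the description above, not the scripts used in this work: the function names and the example intensities are hypothetical, and the only values taken from the text are the equations themselves and the 100-count monoisotopic intensity cutoff.

```python
from statistics import mean, stdev

MIN_MONO_INTENSITY = 100  # counts; exclusion rule stated in Section 2.4


def ria(intensity_isotope, intensity_mono):
    """Equation 3: isotope peak intensity relative to the monoisotopic (A) peak."""
    return intensity_isotope / intensity_mono


def abs_isotope_ratio_deviation(ria_theor, ria_obs):
    """Equation 2: absolute deviation, in percentage points."""
    return abs(ria_theor - ria_obs) * 100


def ria_error_percent(ria_theor, ria_obs):
    """Equation 4: signed error relative to the theoretical RIA, in percent."""
    return 100 * (ria_obs - ria_theor) / ria_theor


def summarize(measurements, ria_theor):
    """Mean and standard deviation of Equation 2 over (mono, isotope) intensity pairs.

    Pairs whose monoisotopic intensity is below the cutoff are excluded.
    At least two pairs must remain for the standard deviation to be defined.
    """
    deviations = [
        abs_isotope_ratio_deviation(ria_theor, ria(iso, mono))
        for mono, iso in measurements
        if mono >= MIN_MONO_INTENSITY
    ]
    return mean(deviations), stdev(deviations)


# Hypothetical replicates for an A + 1 peak with a theoretical RIA of 0.10.
replicates = [(1.0e6, 1.02e5), (1.0e6, 0.97e5), (50.0, 20.0)]
avg, sd = summarize(replicates, ria_theor=0.10)
print(f"{avg:.2f} +/- {sd:.2f}")
```

The two metrics are not interchangeable: a 0.2 percentage-point deviation on a theoretical RIA of 0.10 corresponds to an RIA error of about 2%, and the same absolute deviation on a smaller theoretical ratio gives a proportionally larger RIA error (%).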

3 Results and Discussion

Four different concentrations of a standard mixture that contained 48 compounds were analyzed for mass accuracy and relative isotopic abundance. These measurements were replicated in four different sample matrices, including apple juice, baby food, yogurt, and infant formula. Each experimental variation included five technical replicates; the combined data set comprised 4800 data points for each instrument platform.

3.1 Mass Accuracy

First, the impact of the sample matrix and the amount loaded on column was investigated for the observed mass accuracy. In Figure 1, the mass accuracy is plotted against the m/z value of the corresponding compound for all replicates in all concentrations and food matrices, with each point corresponding to the mass accuracy value for an individual compound. In general, the mass accuracy was independent of the amount of the standard mixture loaded on column. The mass accuracy is within ± 5 ppm for the Orbitrap data, and the overall average values and the average values for each individual food matrix (shown on the top of each condition) were less than 3 ppm, which agrees with the expected specifications for this instrument. This mass accuracy was consistent during the 1.5 d required to analyze a sample set. The standard mixture and infant formula matrix yielded the best overall average mass accuracy followed by apple juice and baby food, whereas the worst was from yogurt. Although the temperature of the mass analyzer will impact the observed mass-to-charge ratios, it does not appear that this is the source of the drift (Supplementary Figure S1). To determine if the deviations were matrix-related, the mass accuracy of the diisooctyl phthalate ion (m/z 391) was monitored prior to sample elution (the first 9 s of each analysis) and is shown in Supplementary Figure S2. A similar pattern between mass accuracies is observed in Figure 1 and Supplementary Figure S2, where both data sets are negatively biased and have similar average/standard deviation values. Thus, it appears that this is an instrument characteristic and not attributable to matrix effects. This mass accuracy bias, data not being centered around 0 ppm, has been observed elsewhere [10, 20].

Figure 1

Mass accuracy measurements for the 48 compounds in the analytical standard mixture with varying amounts of the standard mixture spiked in the individual food matrices. Each individual data point corresponds to the calculated mass accuracy for an individual detected compound, where five measurements were taken for each condition. The numerical values listed at the top of each plot for each matrix for the two instruments represent the average ± standard deviation of the absolute values of the experimental mass accuracy. The overall values for the Orbitrap and Q-TOF are 1.06 ± 0.76 and 1.62 ± 1.88, respectively. The m/z range displayed for each matrix is m/z 120–1130, with each increment along the x-axis corresponding to 100 Da. The color of the individual data points represents the amount loaded on column (red-2000 pg, green-500 pg, purple-100 pg, and navy-10 pg on column). The Q-TOF mass accuracy is shown with lock mass calibration applied.

It is worth noting that the observable mass accuracy can worsen if peak coalescence or ion suppression of a nearby, co-eluting peak occurs [8]. Although this may be contributing to the larger deviations in mass accuracy observed in some of the sample matrices, the overall effect is minimal. However, the probability of encountering this issue could increase with faster chromatography or under non-ideal separation conditions. Furthermore, the Orbitrap mass accuracy does not appear to be influenced by intensity, as a plot of mass accuracy versus intensity yields randomly scattered data (Supplementary Figure S3).

To obtain an 80% to 99% probability of determining the correct molecular formula from an unknown compound, the mass accuracy should be within a 3 ppm window [4]. A lock mass calibration was not initially applied to the Orbitrap analyses to determine the extent of instrument drift for continuous LC runs, which spanned approximately 36 h. Because the yogurt matrix data set yielded the worst mass accuracy, a lock mass calibration was applied post-acquisition to this subset of Orbitrap data files to determine the extent of improvement. As expected, the mass accuracy improved, with the majority of signals within the suggested ± 3 ppm window (Supplementary Figure S4). However, there is a small section of the chromatographic run (<6 s window) where melamine elutes and the lock mass ion is not detected because of electrospray ion suppression. This emphasizes the importance of chromatography; despite high peak capacity, ion suppression due to interfering chemical species present in the complex sample matrix resulted in an insufficient intensity for the lock mass ion. In the absence of the lock mass ion, the software selected an incorrect peak as the lock mass instead of reverting to the original instrument mass calibration, resulting in a worsened mass calibration where the mass accuracy for melamine approached 4 ppm (Supplementary Figure S4). Ensuring that the lock mass is present during the entirety of the chromatographic gradient and/or is present while analytes of interest are eluting should yield data within the suggested 3 ppm mass window for the Orbitrap.
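The failure mode described above, in which an incorrect peak is taken as the lock mass when the true lock mass ion is suppressed, suggests a simple guard. The sketch below is a hypothetical single-point recalibration written for illustration only; it is not the algorithm used by RecalOffline or any vendor software, and the intensity floor and function names are assumptions. If no acceptable lock peak is found, the scan is left on its original calibration and flagged.

```python
LOCK_MASS = 391.28429  # diisooctyl phthalate ion used as the lock mass in the text


def find_lock_peak(peaks, lock_mz=LOCK_MASS, tol_ppm=5.0, min_intensity=1e4):
    """Return the most intense (mz, intensity) peak within tol_ppm of lock_mz, or None.

    min_intensity is a hypothetical floor that rejects noise peaks when the
    lock mass ion itself is suppressed.
    """
    tol = lock_mz * tol_ppm * 1e-6
    candidates = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - lock_mz) <= tol and intensity >= min_intensity
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda peak: peak[1])


def recalibrate_scan(peaks, lock_mz=LOCK_MASS, **kwargs):
    """Rescale all m/z values so that the lock peak lands on lock_mz.

    Returns (peaks, applied). When no acceptable lock peak is found, the
    input peaks are returned unchanged and applied is False.
    """
    lock = find_lock_peak(peaks, lock_mz, **kwargs)
    if lock is None:
        return list(peaks), False
    scale = lock_mz / lock[0]
    return [(mz * scale, intensity) for mz, intensity in peaks], True


# Hypothetical scans: one with a clear lock peak, one where it is suppressed.
normal_scan = [(127.07290, 5.0e6), (391.28468, 1.0e6)]
suppressed_scan = [(127.07290, 5.0e6), (391.28600, 500.0)]
print(recalibrate_scan(normal_scan)[1], recalibrate_scan(suppressed_scan)[1])
```

Flagging unrecalibrated scans, rather than silently correcting against whatever peak is nearest, keeps the resulting mass errors interpretable during stretches of the gradient where the lock mass ion is absent.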

The Q-TOF mass accuracy was monitored with the m/z value of the lock mass for each analysis. As shown in Supplementary Figure S2, the mass accuracy drift varied, including a more than 40 ppm shift for the standard sample set. These plots trend similarly to the Q-TOF mass analyzer temperature data shown in Supplementary Figure S1. Therefore, the application of a lock mass calibration was necessary prior to processing the Q-TOF data to account for temperature fluctuations. After the lock mass calibration was applied, the mass accuracy generally improved to less than 10 ppm, with the majority of ions within ± 5 ppm (Figure 1). However, the 100 pg data points resulted in >10 ppm mass accuracy for one compound because of an incorrectly centroided peak resulting from an interfering background ion. The erroneous centroid only occurred at the 100 pg level because larger amounts on column resulted in higher ion counts that were sufficient to dominate the background ion, whereas the peak was undetected at the 10 pg level. Centroiding could be improved by changing the summation width used for creating centroid data. The average plus standard deviation listed at the top of the Q-TOF data in Figure 1 for all matrices is less than 4 ppm. Although the average mass accuracies for both instrument platforms appear to be sufficient, this does not yield a complete view of expected experimental mass accuracy; information regarding specific examples of when and how often deviations can occur is useful in determining areas of improvement when designing, optimizing, and/or choosing appropriate data analysis workflows.

Surprisingly, the Q-TOF detector did not saturate, as evidenced by the mass accuracy not worsening with increased ion counts; however, the large deviations of mass accuracy (>10 ppm) did occur at low ion counts (Supplementary Figure S3), which has been observed previously [6]. Although the mass accuracy of the Q-TOF data was not within the 3 ppm threshold, the mass accuracy could be improved by incorporating a calibrant at the beginning of every run with a loop injection for internal calibration. A lock mass calibration could then be applied post-acquisition to maintain sufficient mass accuracy during long chromatographic analyses, which has resulted in an average sub-ppm mass accuracy [13].

3.2 Isotopic Ratios

The absolute isotope ratio deviation was calculated for all A + 1 and A + 2 peaks of each detected compound in the standard mixture. The calculated average values for each concentration and matrix are listed in Table 1, along with the standard deviation. These average values worsen as the amount loaded on the column decreases. The Orbitrap data is generally within the 5% threshold for both A + 1 and A + 2 ions, except for the 10 pg on column analyses. In contrast, the combined average and standard deviations for the Q-TOF values were over the 5% threshold in most cases.

Table 1 Average Absolute Isotope Ratio Deviation Values

The absolute isotope ratio deviation values worsen as signal intensity decreases, as shown in a plot of the absolute isotope ratio deviation versus monoisotopic peak intensity (Figure 2). The largest absolute error for the Orbitrap was close to 30% for A + 1, whereas the largest error was greater than 100% for A + 2. Because the Orbitrap exhibits the largest deviations in isotopic abundance ratios under increased resolution settings [15, 17], the highest resolution setting was used for this investigation to determine the extent of this deviation. Lower resolution settings could improve upon these data, but at the risk of including interfering chemical species with similar molecular weight. The Q-TOF data show a similar trend, with lower intensity signals resulting in increased absolute isotope ratio deviation. Because the Q-TOF deviation at low intensity is much greater than that of the Orbitrap, it also skews the average data shown in Table 1. Nevertheless, the observed deviations in RIA are within the 5% threshold for both instrument platforms, given sufficient monoisotopic peak intensity.

Figure 2

Absolute isotope ratio deviation (Equation 2) for the A + 1 and A + 2 peaks versus the intensity of the monoisotopic peak. The red line marks the 5% absolute isotope threshold suggested by the Seven Golden Rules

Although the current work was focused on how well modern HR-MS instrumentation performs with respect to requirements outlined by the Kind and Fiehn Seven Golden Rules, we also wanted to include data with respect to RIA error (%) so that a comparison could be made to previously collected data [16]. The trend for worsening RIA errors (%) with decreased peak intensity is similar to previously published data (Supplementary Figure S5). RIA errors (%) greater than 100% were encountered for both instruments for A + 1. RIA errors (%) are lower than 20% for the Orbitrap (including the standard deviation) for 500 and 2000 pg on column for all matrices (Supplementary Table S2), although this is not true for A + 2 (data not shown). The average RIA errors (%) are higher for the Q-TOF, but again this is due to the larger RIA errors (%) that are observed at decreased monoisotopic peak intensities that skew the average values.

To determine the extent of acceptable experimental RIA values, the percentages of compounds below the 5% absolute ratio deviation threshold, above the threshold, and with undetected isotopic peaks were calculated for each order of magnitude of intensity (Figure 3). For example, if the monoisotopic peak for the Orbitrap or Q-TOF is greater than 1E7 or 1E5, respectively, it is highly likely that the relative isotopic distribution will be within the 5% threshold. The trends are quite similar for the two instrument platforms. For Orbitrap intensities less than 1E5, there is an approximately equal percentage of compounds that are or are not within the threshold, if the A + 1 peak is detected. The confidence in the Q-TOF A + 1 data deteriorates at intensity levels below 1E3. For Orbitrap and Q-TOF monoisotopic peaks less than 1E5 and 1E3, respectively, there is a high probability that the A + 2 ratio will be outside the threshold, if an A + 2 peak is detected at all. Thus, the A + 2 peak is not sufficient for formula generation when monoisotopic ion counts are low, except in cases where heteroatoms, such as bromine or sulfur, are present.
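The per-decade breakdown described above can be reproduced with a small binning routine. The following is a sketch under stated assumptions: the record layout and function names are hypothetical, the example values are invented, and a deviation of exactly 5% is counted as above the threshold, a boundary case that the text does not specify.

```python
import math
from collections import Counter, defaultdict

THRESHOLD = 5.0  # percent, absolute isotope ratio deviation


def classify(deviation):
    """Map a deviation (or None for an undetected isotope peak) to a category."""
    if deviation is None:
        return "ND"
    return "<5" if deviation < THRESHOLD else ">5"


def percent_by_decade(records):
    """Percentage of records per category for each order of magnitude of intensity.

    records: iterable of (monoisotopic_intensity, deviation_or_None).
    Returns {decade_exponent: {category: percent}}, where decade 5 covers
    intensities from 1E5 up to, but not including, 1E6.
    """
    counts = defaultdict(Counter)
    for intensity, deviation in records:
        decade = math.floor(math.log10(intensity))
        counts[decade][classify(deviation)] += 1
    return {
        decade: {cat: 100 * n / sum(c.values()) for cat, n in c.items()}
        for decade, c in sorted(counts.items())
    }


# Hypothetical A + 1 records: (monoisotopic intensity, deviation in percent).
records = [(2e5, 1.2), (3e5, 7.5), (4e5, None), (5e5, 2.0), (2e7, 0.4)]
print(percent_by_decade(records))
```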

Figure 3

Percentage of compounds that are or are not within the 5% absolute isotopic ratio deviation threshold or are not detected. Data are listed for the A + 1 (Top) and A + 2 (Bottom) ions. ND: not detected, <5: less than the 5% isotopic threshold, >5: greater than the 5% isotopic threshold

While analyzing the data, we found a couple of specific examples of impaired data quality where automated non-targeted screening may be hampered. In the first example, the A + 2 peak for ricinine was within the 5% absolute ratio deviation threshold when the monoisotopic peak had sufficient intensity (Figure 4a). However, a peak with an identical mass to the A + 2 peak that was present in some of the food matrices eluted at the same retention time as ricinine, which resulted in a significant RIA error. This is in spite of UPLC separations with long chromatographic gradients and high peak capacity where interfering species should be minimized. Furthermore, with sufficient analyte monoisotopic peak intensity, this effect should be less likely, as shown in Figure 2. In the second example, a co-eluting compound impedes the detectability of amoxicillin (Figure 4b). Despite analyzing the samples with the highest resolution setting for the Orbitrap instrument (140,000), it was not sufficient to separate amoxicillin from a co-eluting matrix peak. This impairs the ability to detect this compound even when 2000 pg is loaded on column. This emphasizes the need for sufficient chromatographic resolution combined with HR-MS instrumentation, especially in non-targeted screening.

Figure 4

Specific examples of complicating factors in automated non-targeted screening. (a) Example of incorrect isotopic distribution due to A + 2 peak interference for ricinine. (b) Interfering matrix peak co-eluting with amoxicillin at 140,000 resolution on the Orbitrap

4 Conclusion

When developing strategies for automated non-targeted screening with HR-MS instrumentation, knowledge of the experimental data quality is critical. The ability to achieve the 3 ppm/5% RIA thresholds will be affected by choice of instrument and signal intensity, although if ion signals are sufficient, both types of analyzers used here can meet the target values. The Orbitrap demonstrated sufficient mass accuracy for non-targeted analyses with respect to the Seven Golden Rules when correct lock mass calibration was applied; lock mass calibration should be implemented, especially with long sample sequences (>1 d instrumental analysis). The Q-TOF mass accuracy without lock mass was outside of this 3 ppm threshold; thus, internal mass calibration and application of a lock mass calibration post-acquisition are recommended. Mass accuracy values obtained at insufficient monoisotopic peak intensity on a Q-TOF will require a larger m/z window, and these detected ions may not result in correct formula generation. If an analyte signal is insufficient to yield confidence in the generated formula, concentrating the sample to increase the intensity of the analyte of interest is necessary. Because the Orbitrap isotopic abundance ratio errors increase with greater resolution, the highest resolution setting was used for this investigation. Lower resolution settings could improve upon the experimental isotopic ratios, but at the risk of including interfering chemical species.

Peak capacity is also crucial when analyzing complex sample matrices; ion suppression can impede the ability to detect compounds, but can also affect measured mass accuracy and isotopic ratios if the monoisotopic peak intensity is insufficient. This is emphasized by the two specific examples shown in Figure 4. Minimizing effects due to ion suppression and interfering chemical species is essential for better accuracy of automated data analysis workflows, even with >60,000 resolving power. Sample preparation strategies that can reduce sample complexity without removing any potential compounds of interest are beneficial. Similarly, efficient, long chromatographic analyses (>30 min) will minimize co-eluting molecular species.

Data analysis strategies can now be developed with these instrument performance metrics in mind; however, designing a non-targeted workflow is not trivial. Vendor-specific software is an important consideration when choosing an optimal platform, especially in regard to user control over the data analysis workflow and the ability to analyze complex data sets in a high-throughput manner; data analysis is often the rate-limiting step in a non-targeted workflow. Additionally, the specific algorithms for formula generation are typically unknown, so it is difficult to determine how well or poorly the software is performing without first interrogating the software with a well-characterized standard mixture or sample; this should be done with any unfamiliar software package.

Based on the data shown here, several metrics are critical for correct formula generation and can be implemented in a data analysis workflow or in software design. Monoisotopic peak intensity that falls below a given threshold should be indicated or not considered. Similarly, while we did not observe saturation of the detector, other software packages do identify peaks where the ion count is too high, resulting in decreased mass accuracy (e.g., Agilent’s MassHunter). Denoting a split peak or, alternatively, a peak that has a decreased peak resolution could indicate if an interfering peak is present. Furthermore, bin widths for peak extraction can be reduced to the experimental mass accuracy of the instrument. The number of generated formulas can be limited to matches that are below 3 ppm of the experimental mass accuracy to increase identification throughput, which is advantageous if a large number of compounds of interest is present in the sample. The mass accuracy window could be decreased further if the instrument platform has been thoroughly interrogated for the expected mass accuracy, similarly to what we have demonstrated here.
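Two of the checks suggested above, restricting candidate formulas to a mass accuracy window and flagging low monoisotopic intensity, can be sketched as a single filter. Everything in this example is illustrative: the `Feature` class, the function name, the candidate labels and m/z values, and the default intensity threshold are hypothetical placeholders that would need to be set from measured instrument performance.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    mz_obs: float           # observed monoisotopic m/z
    mono_intensity: float   # monoisotopic peak intensity (counts)


def candidate_formulas(feature, formulas, tol_ppm=3.0, min_intensity=1e5):
    """Filter candidates by mass accuracy and flag low-intensity features.

    formulas: dict mapping a candidate label to its theoretical m/z.
    Returns (hits, low_intensity), where hits is a list of (label, ppm_error)
    sorted by absolute error and ppm_error follows Equation 1.
    """
    low_intensity = feature.mono_intensity < min_intensity
    hits = []
    for label, mz_theor in formulas.items():
        ppm = (mz_theor - feature.mz_obs) / mz_theor * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((label, ppm))
    hits.sort(key=lambda hit: abs(hit[1]))
    return hits, low_intensity


# Hypothetical feature and candidate list.
feature = Feature(mz_obs=300.1003, mono_intensity=2.0e4)
candidates = {"candidate_A": 300.1000, "candidate_B": 300.1030}
hits, low_intensity = candidate_formulas(feature, candidates)
print(hits, low_intensity)
```

Returning the flag alongside the hits, instead of discarding the feature, leaves the decision to exclude low-intensity ions to the downstream workflow.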

Interrogating experimental mass accuracy and RIA with a standard mixture composed of different compound classes that cover a large retention time, molecular weight, and concentration range creates a more complete view of instrument performance. The combined data provide an expectation of data quality when analyzing chemical species in complex sample matrices, which is critical in experimental and data analysis workflow design. Knowledge of the experimental data quality imparts criteria that should be incorporated into non-targeted data analysis workflows, which should increase the probability of generating the correct molecular formula.