1 Introduction

The proteomics field has witnessed tremendous growth over the last decade, fueled by a combination of advancements in mass spectrometry instrumentation and technology [1–4], software [5], and bioinformatics [6–8]. Each new generation of instruments is faster, more sensitive, and more accurate than the versions introduced the year before. As a result, mass spectrometry has become an indispensable tool in biological studies, with the number of protein identifications from a complex tissue lysate routinely in the several thousands [9, 10]. When applied to complex biological problems, mass spectrometry can provide critical information at the systems level by investigating overall protein dynamics, including site-specific post-translational modifications, relative quantification among different states, half-life, and evaluation of molecular interactions. However, this key information relies on the quality of the fundamental measurements made by LC MS/MS, including retention time, chromatographic peak width, mass measurement, and ion intensity of precursor and fragment ions. Data quality and robustness depend on a clear standard operating procedure and the systematic evaluation of the entire workflow. With the continued goals of biomarker discovery, clinical application of established biomarkers, and integration of large multi-omic datasets (i.e., systems biology), more accessible tools are needed to monitor data quality throughout a proteomics experiment.

An experimental design should not only be statistically sound but should also incorporate a quality control procedure that consistently evaluates appropriate performance metrics. However, descriptions of performance procedures are rarely included in the method sections of manuscripts. This omission is probably due to a lack of both tools and knowledge of suitable quality metrics. Many investigators have begun to realize the importance of assessing proteomic data quality [11], driven largely by the irreproducibility of peptide and protein identifications [12, 13]. While implementing robust quality control (QC) protocols will likely improve reproducibility within a laboratory, the limiting factors between laboratories will continue to be the absence of a standard operating procedure [12] and the stochastic sampling inherent in data-dependent acquisition (DDA) [13–15]. Regardless, for extended biomarker discovery experiments and clinical applications, it is essential that tools continue to be developed for monitoring the performance of a specific instrument within a specific experiment.

The National Institute of Standards and Technology (NIST), working in collaboration with the National Cancer Institute-supported Clinical Proteomic Technology Assessment for Cancer (CPTAC) Network, developed a protocol termed MSQC for evaluating 46 system performance metrics in a shotgun proteomic experiment [16]. This list is extensive and covers everything from basic chromatographic peak widths to dynamic sampling in DDA. However, the NIST platform has not seen widespread implementation, likely because of its complexity and the difficulty of incorporating different instrument and database search platforms [17]. QuaMeter performs a similar analysis to MSQC and computes 42 performance metrics for each raw data file [17]. QuaMeter circumvented several of the original limitations of MSQC, including support for different vendor data formats via the ProteoWizard library [18]. While both MSQC and QuaMeter are comprehensive, the large number of metrics reported makes it difficult for the non-expert to objectively interpret the state of the experiment.

Pichler et al. developed a semi-automated approach for quality control termed simple automatic quality control (SIMPATIQCO) [19], in which a designated server calculates a variety of metrics based on two types of quality control samples: (1) a sensitivity test, and (2) a performance and speed test. SIMPATIQCO provides an interface in which one can store QC results and longitudinally track the performance of instruments; however, this software is currently limited to data acquired on Thermo Scientific instruments. Because the majority of existing QC metrics depend on database searching, most QC tools are currently used to assess system performance prior to and/or after an experiment, but not systematically throughout a study. Recently, Abbatiello et al. presented a system suitability protocol (SSP) for evaluating liquid chromatography stable isotope dilution selected reaction monitoring mass spectrometry (LC-SID-SRM-MS) workflows [20]. Owing to the targeted nature of SRM, this QC method does not rely on peptide identification. However, there remains a need for a simple but powerful vendor-neutral tool that quickly assesses system performance in either a targeted or discovery experiment and guides the user to problematic areas, such that valuable instrument time and sample are not wasted on a suboptimally performing system.

Statistical process control (SPC) is a proven set of tools that provides an objective method to monitor the transformation of a set of inputs into desired outputs (i.e., a process). It was first routinely implemented in the United States in the automobile industry in the early 1980s, and principles from SPC are now used in fields ranging from manufacturing, sales, marketing, finance, and technology to clinical diagnostics [21, 22]. Its power as a quality control procedure lies in its primary focus on early detection of a process performing outside defined thresholds and the subsequent determination of the cause of that variation, such that the process is continually improved. SPC aims to separate random error that users have little control over (referred to as common cause variation) from error that can be assigned (referred to as special cause variation). If the benefit-to-cost ratio is high enough, special cause variation can be examined and addressed such that its effect on product quality is eliminated or at least mitigated. SPC has seen limited use in proteomic studies; however, a recent article by Bramwell et al. summarizes the power of SPC as a quality control procedure and its application to a 2-D DIGE proteomics experiment [23].

The staple of SPC is the control chart described by Shewhart [24, 25] in the mid-1920s (also referred to as the Levey-Jennings chart [26] in the clinical regime). The main purpose of the control chart is to track outputs from a process over time and to determine departure from what is considered “statistical control.” The basic principle of the control chart [24] derives from the central limit theorem: the distribution of averages of samples tends toward normal regardless of the distribution of the original population. An efficient method to summarize the data in the control chart across several metrics is via a Pareto chart. The Pareto principle, first noted by Vilfredo Pareto and first implemented in quality management by Joseph Juran [27] in the 1940s, states that most problems (~80 %) arise from only a few (~20 %) of the possible causes. This idea can be applied to proteomics in that it is unlikely that all problems (e.g., retention time drift, mass analyzer out of calibration, sensitivity issues) will occur simultaneously. The Pareto chart is a combination of a bar and line graph, which displays the number of nonconformers from each metric category along with the cumulative percentage. It provides the user instant feedback on which metrics are most variable and may require immediate attention or optimization.
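
To make the construction concrete, the following base R sketch builds a Pareto chart from nonconformer counts per metric. The metric names and counts are hypothetical, chosen only for illustration; SProCoP produces this chart automatically.

```r
# A minimal Pareto chart in base R: bars show nonconformer counts per
# metric and the overlaid line shows the cumulative percentage.
# The metric names and counts below are hypothetical.
counts <- c(retention.time = 30, peak.area = 5, FWHM = 2, peak.symmetry = 1)
counts <- sort(counts, decreasing = TRUE)       # Pareto: largest cause first
cum.pct <- 100 * cumsum(counts) / sum(counts)   # cumulative percentage

par(mar = c(7, 4, 2, 4))                        # room for labels + right axis
mids <- barplot(counts, ylim = c(0, sum(counts)),
                ylab = "Nonconformers", las = 2)
lines(mids, cum.pct / 100 * sum(counts), type = "b", pch = 16)
axis(4, at = seq(0, sum(counts), length.out = 5),
     labels = paste0(seq(0, 100, by = 25), " %"))
mtext("Cumulative %", side = 4, line = 2.5)
```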

To help the proteomics community implement principles of SPC in their own laboratories, we developed a tool called Statistical Process Control in Proteomics (SProCoP). The LC-MS process is systematically evaluated with a QC standard, and the data are imported into the freely available Skyline software [5]. The source code is written in the R statistical language, runs directly from Skyline, takes user input, and produces a control chart matrix, box plots, and a Pareto chart. It monitors the performance of five identification-free metrics: signal intensity, mass measurement accuracy (MMA), retention time reproducibility, full width at half maximum (FWHM), and peak symmetry. The utility of this QC tool to detect systematic changes in a proteomics experiment via LC MS/MS is shown in two case studies.

2 Methods

2.1 Materials

Formic acid, ammonium bicarbonate, DTT, and iodoacetamide were obtained from Sigma Aldrich (St. Louis, MO, USA). Proteomics grade trypsin was purchased from Promega (Madison, WI, USA). HPLC grade acetonitrile was from Burdick and Jackson (Muskegon, MI, USA). The six-protein digest of bovine proteins was from Bruker-Michrom (Auburn, CA, USA) and was used throughout.

2.2 SProCoP

The tool can be downloaded from http://proteome.gs.washington.edu/software/skyline/tools/sprocop.html.

This zipped file contains everything needed to run SProCoP from the Skyline interface. A detailed tutorial with step-by-step installation instructions is available on the website. Five quality control metrics are monitored: (1) targeted peak areas (from MS1 filtering [28], selected reaction monitoring, or targeted MS/MS [29]) reported by Skyline; (2) mass measurement accuracy of targeted peptides (for high resolving power instruments); (3) full width at half maximum (FWHM) of targeted peptides; (4) peak asymmetry of targeted peptides; and (5) retention time of targeted peptides. A further description of how each metric is calculated can be found in the Supplemental Information. Three types of charts are produced: control charts, box plots (for high resolution data), and Pareto charts.
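
As an illustration of two of these metrics, the sketch below computes FWHM and a peak asymmetry factor from a single chromatographic peak using common definitions: the width at 50 % of the apex intensity, and the trailing half-width divided by the leading half-width at that height. These definitions are assumptions for the sake of illustration; the exact SProCoP calculations are given in the Supplemental Information.

```r
# Hedged sketch of two chromatographic QC metrics (common definitions,
# assumed here; not necessarily the exact SProCoP formulas):
#   FWHM      - peak width at 50 % of the apex intensity
#   asymmetry - trailing half-width / leading half-width at 50 % height
peak.metrics <- function(rt, intensity) {
  apex <- which.max(intensity)
  half <- intensity[apex] / 2
  # linearly interpolate the half-height crossing time on each side
  left  <- approx(intensity[1:apex], rt[1:apex], xout = half)$y
  right <- approx(intensity[apex:length(rt)], rt[apex:length(rt)],
                  xout = half)$y
  c(FWHM = right - left,
    asymmetry = (right - rt[apex]) / (rt[apex] - left))
}

# Example: a slightly tailing Gaussian-like peak (simulated data)
rt  <- seq(20, 22, by = 0.01)
int <- exp(-(rt - 21)^2 / (2 * 0.05^2)) +
       0.3 * exp(-(rt - 21.1)^2 / (2 * 0.1^2))
peak.metrics(rt, int)   # asymmetry > 1 indicates tailing
```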

2.3 Detection of Gross Errors in the System

Quality controls were monitored within the context of a larger study assessing the time course of trypsin digestion of proteins reconstituted from dried blood spots. All experimental data were collected on a Q-Exactive tandem mass spectrometer (ThermoFisher, Bremen, Germany) coupled to a nanoAcquity UPLC (Waters, Milford, MA, USA). The mass spectrometer settings and the targeted peptides used for these experiments are summarized in Supplemental Figure 1. Full scan MS data were collected in a cycle with eight targeted MS/MS scans. Quality controls were run every four experimental injections. For each QC experiment, a 3 μL aliquot of the diluted bovine protein digest (50 fmol/μL) was loaded from a 5 μL sample loop onto a 40 × 0.1 mm polyimide coated fused silica trap column (Polymicro Technologies, Phoenix, AZ, USA) packed with Jupiter 4 μm Proteo 90 Å reverse phase beads (Phenomenex, Torrance, CA, USA) at a flow rate of 1 μL/min for 8 min. Peptides were then resolved on a 150 × 0.075 mm polyimide coated fused silica column (Polymicro Technologies, Phoenix, AZ, USA) packed with ReproSil-Pur 3 μm C18-AQ 120 Å reverse phase beads (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) using a 30-min linear gradient from 2 % acetonitrile in 0.1 % formic acid to 40 % acetonitrile in 0.1 % formic acid at a flow rate of 300 nL/min. The initial gradient was followed by a steeper 5-min linear gradient from 40 % acetonitrile in 0.1 % formic acid to 60 % acetonitrile in 0.1 % formic acid, also at 300 nL/min. The column was then washed for 5 min at 95 % acetonitrile in 0.1 % formic acid and finally re-equilibrated to initial conditions at 500 nL/min.

2.4 Detection of Known Changes to the System

Data from a previously published report were used to evaluate the sensitivity of these analyses to detect known disturbances to the system (i.e., a positive control). These experiments are described in detail elsewhere [30]. In brief, the reproducibility of changing the trap cartridge was evaluated using a novel modular LC-MS source design in which the trap, column, and emitter can be replaced independently of one another. Five QC standards were analyzed on each of three traps of similar length (4.0 ± 0.1 cm). Thresholds for the various QC metrics were determined from the first set of five QC standards analyzed on the first trap.

3 Results and Discussion

Figure 1 summarizes the overall process of using SPC in a proteomics experiment. In LC-MS, as with any process, the inputs affect the measured outputs. As illustrated in Figure 1a, these input parameters can be separated into three categories: (1) electrospray ionization inputs; (2) liquid chromatography inputs; and (3) mass spectrometry inputs. Unlike traditional industries that use SPC, these inputs are more variable and often user-, laboratory-, and/or experiment-specific. However, for any one set of inputs, the measured outputs throughout an experiment should be within statistical control. The QC data are automatically imported into Skyline, where the metrics for the various targeted peptides are extracted. From the Skyline interface, the SProCoP tool runs and displays the metrics for each individual peptide in a control chart matrix. The control chart allows visual determination of the point at which the process moves outside the empirically defined thresholds. A detailed explanation of the control chart implemented within SProCoP is given in Supplemental Figure 2. In addition, a screenshot of the exact output the user can expect from SProCoP is available in Supplemental Figure 3.

Figure 1

(a) A diagram summarizing the workflow for using SPC in a proteomics experiment. QC data are imported into Skyline, where a control chart matrix, Pareto analysis, and box plots are output. These charts illustrate a representative output from SProCoP when using a high resolving power mass spectrometer; the box plots of MMA would not be displayed with a low resolving power instrument. (b) The process is systematically evaluated approximately every 4–5 injections. (c) The scan cycle consists of an MS1 full scan followed by a series of targeted MS/MS or SRM scans

Defining the state of statistical control (i.e., the thresholds) is not straightforward and is done empirically, as shown in Figure 1b. In practice, we often establish the thresholds from a reference set consisting of at least three to five standards run before the start of an experiment and three or more standards run after it has begun. Because there is no ideal number of standards, the script allows the user to specify the number of standards (n ≥ 3) from which the thresholds are determined. The LC-MS process is then systematically evaluated every fourth or fifth injection with the same standard used to establish the thresholds, and these QC standards are imported into Skyline. The scan cycle consists of a single MS1 scan followed by targeted MS/MS scans (or SRM scans) monitoring a user-determined number of peptides distributed across the gradient (Figure 1c). It is worth noting that peptide stability is critical for the successful implementation of this QC procedure. For this reason, targeting peptides with amino acids that are easily modified in vitro (e.g., methionine, asparagine, glutamine) could lead to false positives and should be avoided. We recommend a thorough investigation of the user's standard to ensure that the chosen peptides are stable across several days. A list of peptides from the bovine protein digest standard routinely used in our laboratory for QC analysis is available in Supplemental Figure 1.
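
A minimal sketch of this thresholding step is shown below, assuming the QC measurements arrive as a simple numeric vector with one value per injection. The function and variable names are hypothetical, not taken from the SProCoP source.

```r
# Establish empirical control limits from the first n.ref reference runs:
# mean of the reference runs plus/minus k standard deviations.
qc.limits <- function(x, n.ref = 5, k = 3) {
  stopifnot(n.ref >= 3)                  # SProCoP requires n >= 3
  ref <- x[seq_len(n.ref)]               # reference runs at the start
  m <- mean(ref); s <- sd(ref)
  c(lower = m - k * s, center = m, upper = m + k * s)
}

# e.g., flag retention times falling outside mean +/- 3 SD
rt  <- c(21.02, 21.05, 20.98, 21.01, 21.04, 21.03, 21.55)  # hypothetical
lim <- qc.limits(rt, n.ref = 5, k = 3)
which(rt < lim["lower"] | rt > lim["upper"])               # nonconformers
```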

To flag a nonconformer, a threshold of ±3 standard deviations (SD) from the mean is used for all metrics except mass measurement accuracy (MMA), which is user-specified. The width of these thresholds determines the average run length, the expected number of runs before a nonconformer is flagged by chance alone. For example, a threshold of ±2 SD would yield an average run length of ~20 (1 in 20 runs would be a false positive). Others have used a cutoff of 1 SD below the mean for QC metrics (e.g., number of phosphopeptide identifications) [31]. This threshold is overly stringent, considering that by chance ~16 % (about one in six) of measurements will fall below −1 SD. In the implementation of SPC presented here, which requires systematic evaluation, a 1 or 2 SD threshold would yield a large number of failures due simply to random chance. It is worth noting that while the ±3 SD threshold decreases the false positive rate (i.e., type I error) to ~1 in 370 runs, it also theoretically increases the false negative rate (type II error).
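
These rates follow directly from the normal distribution, and the arithmetic can be verified in a few lines of R:

```r
# Two-sided tail probability outside +/- k SD and the corresponding
# average run length (expected runs per false alarm).
k     <- 1:3
p.out <- 2 * pnorm(-k)   # P(|z| > k): ~0.317, ~0.046, ~0.0027
arl   <- 1 / p.out       # ~3, ~22 ("1 in 20"), ~370 runs
round(data.frame(k, p.out, arl), 4)

# One-sided: P(z < -1) is ~0.16, i.e., ~1 in 6 runs below -1 SD by chance
pnorm(-1)
```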

3.1 Case Study 1—Detecting Gross Errors

Figure 2a shows a bar graph of the integrated peak areas from a targeted MS/MS experiment for one peptide across the study. As is often the case, the start of the LC MS/MS experiment is in statistical control, and any variation in the peak areas can be attributed to common causes. This subset of runs is then used to establish the empirical thresholds for each metric for the remainder of the study, as described above. The bar graph does show a systematic shift in peak areas; however, it is difficult to quantify this shift in a statistical manner. The control chart of these data (Figure 2b) provides a clear visual indication of the systematic shift to lower peak areas and of the point in time at which the peptide abundance is repeatedly out of statistical control (i.e., outliers > ±3 SD). The other targeted peptides show a similar systematic shift (Figure 2c). The first 10 quality control runs were used to establish the thresholds, with the blue line representing the average and the green, brown, and red lines representing ±1 SD, ±2 SD, and ±3 SD from the mean, respectively. It is worth noting that unless a metric is extreme (e.g., very low or no signal observed), less emphasis should be placed on single random occurrences for a specific peptide/metric than on systematic shifts and repeated observations of individual metrics for multiple or all peptides performing outside the empirical thresholds. Supplemental Figure 4 illustrates this point further.
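
The chart layout described here is straightforward to reproduce. The base R sketch below draws a Shewhart-style control chart with the same color scheme from simulated peak areas; all values are hypothetical and the simulated drop simply mimics a systematic shift.

```r
# Shewhart-style control chart with limits from the first 10 runs.
set.seed(1)
area <- c(rnorm(20, 1e7, 5e5), rnorm(10, 7e6, 5e5))  # simulated shift down
m <- mean(area[1:10]); s <- sd(area[1:10])           # reference runs

plot(area, type = "b", pch = 16, xlab = "QC run", ylab = "Peak area")
abline(h = m, col = "blue", lwd = 2)                 # mean
abline(h = m + c(-1, 1) * s, col = "green")          # +/- 1 SD
abline(h = m + c(-2, 2) * s, col = "brown")          # +/- 2 SD
abline(h = m + c(-3, 3) * s, col = "red", lwd = 2)   # +/- 3 SD
```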

Figure 2

(a) A bar graph of the integrated peak areas of a representative peptide (FFVAPFPEVFGK) across the study. (b) Use of a control chart to visualize and quantify the systematic shift of the peptide abundance to outside the empirically defined threshold. (c) Other peptides that were monitored show a similar trend. The black arrow marks the first QC standard run after the bath gas was replenished

At QC standard #31, the operator noticed that the nitrogen C-trap bath gas on the Q-Exactive mass spectrometer was low; the gas was replenished, and seven more standards (QC #32–38) were run to finish the experiment. The arrow in Figure 2b and c marks the first standard run with the new nitrogen supply. The peak areas in QC #32 were significantly greater for all peptides because of the replenished nitrogen supply. The abundance of six of the seven peptides came back within the empirically defined thresholds (±3 SD). Interestingly, the signal from most of the peptides did not increase enough to return within 1 SD of the original mean; in fact, one peptide (YSTDVSVDEVK) did not return within even the 3 SD threshold. We hypothesize that this failure of some peptide signals to return to the expected intensity reflects other confounding factors (e.g., a dirty emitter) that affect peptide abundance apart from the identified principal cause of low bath gas. It is important to note that although the bath gas was low, peptide signal was still observed and data were collected. This point emphasizes the importance of systematically evaluating LC MS/MS performance throughout an experiment with a known concentration of standard. Had performance been judged from peptide spectral matches before and after the experiment, it is likely that this problem would not have been corrected as quickly.

3.2 Case Study 2—Detecting Known Changes to the System

Figure 3 shows the effects of changing the trap cartridge on the CorConnex source [30] as reflected in the identification-free figures of merit. For simplicity, only the peak area and retention time control charts for five different targeted peptides are shown. Thresholds were determined from QC #1–5, run on the first trap. The blue line represents the mean of the first five runs, the green lines ±1 SD, the brown lines ±2 SD, and the bold red lines ±3 SD from the mean. For four of the five peptides monitored, changing the trap cartridge had a negligible effect on peak areas: the subsequent runs (QC #6–15) were within ±3 SD of the mean. However, the peak areas for the peptide R.GASIVEDK.L on the third trap (QC #11–15) were well above the 3 SD threshold. This peptide is extremely hydrophilic, and the third trap was apparently more efficient at retaining it. The retention time control charts, in contrast, indicate a significant shift toward later elution for all peptides on the third trap. Finally, the Pareto chart reports the number of nonconformers (> ±3 SD) and the cumulative percentage as a function of the four different metrics. One can easily identify that the major cause of variation in the system is retention time reproducibility, as 79 % of the nonconformers were peptides whose retention time shifted significantly from the first trap. Peak areas, FWHM, and peak symmetry contributed 13 %, 5 %, and 3 % of the nonconformers, respectively.

Figure 3

Control charts for peak areas (first row) and retention time (second row) for five targeted peptides in an experiment where the traps (n = 3) were changed every 5 QC runs. Thresholds were established on the first trap (n = 5 QC standards). The majority of peak areas were within the thresholds (±3 SD); however, the abundance of the GASIVEDK peptide increased significantly on the final trap. The retention times shifted significantly for all peptides, as is easily seen in the control charts and summarized in the Pareto chart. The majority of nonconformers were related to retention time reproducibility (80 %)

4 Conclusions

We have reported a vendor-neutral QC tool termed Statistical Process Control in Proteomics (SProCoP) that uses SPC techniques to monitor the performance of an LC MS/MS experiment and allows evaluation of both targeted and discovery experiments. The tool uses freely available software and can be implemented in any laboratory with access to a Windows computer running Skyline and R. Future studies will focus on the addition of more identification-free metrics; for example, variation in the total ion current could be used to monitor spray stability. We would also like to implement a set of multi-rules, analogous to the Westgard rules [21, 32, 33] used in the clinical regime, to determine when an experiment has statistically failed. These endeavors will face key challenges, chief among them the huge variation in users' analytical expertise, instrumentation, experimental set-ups, experimental goals, and preconceived notions of acceptable data quality. For example, more variation in metrics may be acceptable in a referenced design experiment (e.g., SILAC, PC-IDMS) than in a label-free spectral counting or peak intensity experiment. Regardless, as proteomics continues to find its way into the clinical regime, we believe it is important to be able to readily implement these charts and, as a community, to continue exploring strategies for benchmarking data quality within an experiment, an instrument, a laboratory, and across laboratories. Although the emphasis of this tool has been on real-time monitoring of an experiment, it can also be used retrospectively, provided the user ran some type of peptide standard in a systematic fashion and performed MS1 profiling, targeted MS/MS, or SRM of targeted peptides. Currently, this implementation is experiment-specific, but these analyses will be supported by our online repository Panorama [34] and used to track all QC standards from all experiments as a function of user and instrument. This combined approach will provide even greater power to monitor the performance of an instrument over days, weeks, and years.
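
As a sketch of what such multi-rules might look like (an assumption about future functionality, not part of the current SProCoP release), two classic Westgard rules applied to the z-scores of a QC metric could be coded as follows: 1_3s rejects when a single point falls beyond ±3 SD, and 2_2s rejects when two consecutive points fall beyond ±2 SD on the same side of the mean.

```r
# Hedged sketch of two classic Westgard rules (illustrative only):
#   1_3s: one point beyond +/- 3 SD           -> reject
#   2_2s: two consecutive points beyond +/- 2 SD, same side -> reject
westgard.flags <- function(x, m = mean(x), s = sd(x)) {
  z <- (x - m) / s
  rule.13s <- abs(z) > 3
  rule.22s <- c(FALSE, abs(z[-1]) > 2 & abs(z[-length(z)]) > 2 &
                       sign(z[-1]) == sign(z[-length(z)]))
  data.frame(run = seq_along(x), z = round(z, 2), rule.13s, rule.22s)
}

set.seed(2)
# Hypothetical z-scored metric; the last two runs trip the 2_2s rule
westgard.flags(c(rnorm(8), 2.4, 2.6), m = 0, s = 1)
```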