Introduction

A number of studies have demonstrated that results for the determination of serum total 25-hydroxyvitamin D (25(OH)D), which is defined as the sum of 25-hydroxyvitamin D2 (25(OH)D2) and 25-hydroxyvitamin D3 (25(OH)D3) and is the primary marker of vitamin D status, may vary significantly depending on the assay used, i.e., liquid chromatography – tandem mass spectrometry (LC-MS/MS) or various ligand binding assays [1,2,3,4]. The impact of these variations in assay performance on decision making has been discussed [5, 6], and recent reviews have described the analytical challenges and difficulties in assessing vitamin D status [4, 7,8,9,10,11,12,13,14,15]. To assist in standardization of measurements of 25(OH)D, the U.S. National Institutes of Health, Office of Dietary Supplements (NIH-ODS) organized the Vitamin D Standardization Program (VDSP) in 2010 [16] as a collaboration among the U.S. National Institute of Standards and Technology (NIST) [17], the U.S. Centers for Disease Control and Prevention (CDC), national survey laboratories in several countries, and vitamin D researchers worldwide [16]. The VDSP has established a reference measurement system that includes reference measurement procedures (RMPs) at NIST [18, 19], Ghent University [20], and CDC [21]; NIST Standard Reference Materials® (SRMs) [22,23,24]; the CDC Vitamin D Standardization – Certification Program (VDSCP) [25]; and collaborations with two accuracy-based proficiency testing/external quality assessment (PT/EQA) programs, i.e., the U.S. College of American Pathologists (CAP) accuracy-based vitamin D (ABVD) program [26] and the U.K.-based Vitamin D External Quality Assessment Scheme (DEQAS) [27, 28]. The VDSP also established assay performance criteria for measurement variability and bias for the determination of serum total 25(OH)D, i.e., coefficient of variation (CV) ≤ 10% and mean % bias ≤ |±5%| [29, 30].

In 2011, the VDSP coordinated an interlaboratory comparison study to assess measurement variability and bias for the two primary methodological approaches to measure 25(OH)D, i.e., ligand binding assays and LC-MS/MS assays, and to evaluate the commutability of SRMs and PT/EQA materials. The results of these initial VDSP studies, denoted as Intercomparison Study 1 and Commutability Study 1, were reported by Wise et al. [3] and Phinney et al. [31], respectively. Since these intercomparison/commutability studies, a number of studies have evaluated various ligand binding assays for the determination of serum total 25(OH)D and often compared results to an isotope dilution (ID) LC-MS/MS method [32,33,34,35,36,37,38,39,40,41,42]. However, other than the first VDSP Intercomparison Study, no studies have compared or assessed the performance of a significant number of LC-MS/MS assays for the determination of the two metabolites, 25(OH)D2 and 25(OH)D3, which ultimately determine total 25(OH)D.

The assessment of measurement performance in VDSP Intercomparison Study 1 was intended to provide a baseline for evaluation of future 25(OH)D standardization activities. Unfortunately, the interpretation of results from the first interlaboratory comparison was limited since some participants in the study did not agree that their results could be reported. A second round of VDSP intercomparison and commutability studies, denoted as Intercomparison Study 2 and Commutability Study 2, was undertaken with participating laboratories agreeing to laboratory and assay identification in all publications. The goals of VDSP Intercomparison Study 2 were as follows: (1) to assess the measurement performance of commonly used assays for determination of serum total 25(OH)D; (2) to identify any specific issues in the measurement of serum total 25(OH)D; (3) to assess potential contributions of 3-epi-25-hydroxyvitamin D3 [3-epi-25(OH)D3] and 24R,25-dihydroxyvitamin D3 [24R,25-(OH)2D3], as well as 25(OH)D2 and 25(OH)D3, to the assay measurement of serum total 25(OH)D; and (4) to determine whether there were any significant improvements in assay performance since 2011. Intercomparison Study 2 consisted of the combination of results from two studies: (1) an intralaboratory study evaluating 13 assays that focused primarily on assay variability and bias in a single-laboratory setting, and (2) a multi-laboratory commutability study among 28 laboratories using 34 assays to assess the commutability of SRMs and PT/EQA samples. In total, Intercomparison Study 2 assessed the comparability and bias for 47 assays including 13 unique ligand binding assays and 15 LC-MS/MS assays performed in 28 laboratories. In this paper, we describe Part 1 of Intercomparison Study 2 including the assignment of target values to the set of single-donor samples and the evaluation and comparison of the different LC-MS/MS assays. 
Part 2 of this study describes the results for the ligand binding assays and is reported in this same journal issue [43]. The single-laboratory assessment of assay measurement variability and bias [44] and the commutability assessment of SRMs and PT/EQA samples [45] are reported elsewhere.

For Intercomparison Study 2, a set of 50 single-donor serum samples was characterized for the content of 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3. To identify measurement challenges for determination of serum total 25(OH)D, 8 of the 50 single-donor samples had high concentrations of 25(OH)D2 (> 30 nmol/L). This interlaboratory comparison study is the first extensive comparison of the performance of different LC-MS/MS assays using a sample set characterized for both 3-epi-25(OH)D3 and 24R,25(OH)2D3 as well as 25(OH)D2 and 25(OH)D3.

Methods

Measurands

The measurand for the intercomparison study was serum total 25(OH)D in nmol/L, which is defined as the sum of the concentrations of 25(OH)D2 and 25(OH)D3, without the inclusion of the concentration of 3-epi-25(OH)D3. The concentrations of 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 were determined in the 50 single-donor samples by NIST. Several participating laboratories reported results for 3-epi-25(OH)D3 in addition to 25(OH)D2 and 25(OH)D3.

Intercomparison Study 2: Coordination and responsibilities

Intercomparison Study 2 was co-designed and coordinated by NIST and NIH-ODS through the VDSP, including acquisition and distribution of 50 single-donor serum samples, recruitment of participating laboratories, and compilation of the results [46]. Samples were distributed to the participating laboratories in November 2016 and results were received in January/February 2017. NIST was responsible for analysis of the 50 single-donor serum samples to assign target values for 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3. NIH-ODS and VDSP LLC were responsible for conducting the data analyses.

Single-donor serum samples

The single-donor serum samples used for Intercomparison Study 2 were procured from Solomon Park Research Laboratories (Seattle, WA). Serum samples were obtained from 50 healthy donors (i.e., no known disease states and no pregnant or renal failure patients). The samples contained only endogenous vitamin D metabolites and were requested with a distribution of serum total 25(OH)D concentrations across the clinically relevant range of 15 to 150 nmol/L [47, 48]. A number of serum samples were requested with high levels of 25(OH)D2 from donors who were taking vitamin D2 supplements. To achieve the requested distribution of 25(OH)D concentrations, potential single-donor serum samples were screened using an LC-MS/MS method at the University of Washington, a subcontractor to Solomon Park Research Laboratories. The results of these screening analyses were provided to NIST and were used as target estimates for the analyses using the RMPs. Solomon Park Research Laboratories acquired 200 mL of serum from each donor using the Clinical and Laboratory Standards Institute (CLSI) C37-A protocol [49, 50]. Each 200-mL single-donor sample was subsampled into 400 vials, each containing 0.5 mL of serum. Samples were shipped frozen on dry ice to NIST in four shipments from August through November 2015 and stored at -80 °C.

Results used for Intercomparison Study 2

Laboratories participating in the study agreed prior to receiving samples that their results, including the identification of laboratory and assay platform used, would be included, as appropriate, in publications reporting the results. Each participant received a set of the 50 single-donor serum samples with a protocol for the analyses. Participants were to analyze the 50 single-donor samples (DS) in duplicate on one day using a specified run order of first analyzing the samples in ascending (DS01-DS50) and then descending order (DS50-DS01). The SRM, CAP ABVD, and DEQAS samples were to be interspersed within these runs [45] for the commutability study. Participants were requested to use their routine laboratory operation procedures with normal internal QC criteria.

Participants were requested to provide their results using a data reporting template provided by NIST. Results for 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and serum total 25(OH)D were requested in units of nanomoles per liter with three significant figures. The following information was also requested: (1) instrumentation description, (2) method performance characteristics such as limit of detection and measurement range, and (3) lot numbers of reagent(s), calibrators, and controls used. Additional details specific to Commutability Study 2 and the analysis of the SRMs and PT/EQA samples are reported elsewhere [45]. Whereas the commutability assessment used only the first replicate measurement, Intercomparison Study 2 utilized the mean of the two replicates for all assessments of assay performance. One additional set of LC-MS/MS results was available from the single-laboratory study [44].

NIST value assignment of 50 single-donor serum samples

Mass fractions (ng/g) of 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 were determined in each of the 50 single-donor serum samples using the ID LC-MS/MS-based RMPs as described by Tai and coworkers for 25(OH)D2 [18], 25(OH)D3 [18], and 24R,25(OH)2D3 [19]. Although not included in the NIST RMPs, 3-epi-25(OH)D3 was determined using a similar ID LC-MS/MS approach using isotopically labeled 3-epi-25(OH)D3 as an internal standard. The approach for value assignment using the NIST RMPs has been described for the previous VDSP interlaboratory and commutability studies [31] and for providing an accuracy basis for the DEQAS program [28]. The details of the analyses are provided in the Supplementary Information (ESM).

The 50 single-donor serum samples and the DEQAS, CAP ABVD, and SRM samples were distributed among 15 sample sets (a total of 79 samples). SRM 972a Level 1 to Level 4 [23] and SRM 2973 [24] were analyzed as controls as well as unknown samples in the commutability study. Each of the 15 sample sets was analyzed separately; four different analysts at NIST were involved in the analyses. Three of the analysts performed the analyses for 25(OH)D2 and 25(OH)D3; the fourth analyst analyzed all sets for determination of 3-epi-25(OH)D3 and 24R,25(OH)2D3. Each of the samples in a set was prepared in duplicate and analyzed in duplicate by ID LC-MS/MS. The limit of quantification for 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 was 0.5 ng/g. The results were determined as mass fraction (ng/g), converted to mass concentration (ng/mL) using a universal serum sample density value of 1.02 g/mL for each sample, and finally converted to molar concentration (nmol/L) using the appropriate molecular mass conversion factors for 25(OH)D2 (2.42), 25(OH)D3 (2.50), 3-epi-25(OH)D3 (2.50), and 24R,25(OH)2D3 (2.40). The results for 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 as determined in nanograms per gram and converted to nanomoles per liter are provided in Tables S1 and S2 (ESM). A summary of the results in nanomoles per liter is provided in Table 1. The results of the analyses of SRM 972a (L1-L4) and SRM 2973, used as controls, are summarized in Table S3 for the determination of 25(OH)D2 and 25(OH)D3 and in Table S4 for determination of 3-epi-25(OH)D3 and 24R,25(OH)2D3. All results for the analysis of the SRMs as control samples were within the uncertainties of the certified and reference values assigned to the SRMs as shown in Tables S3 and S4.
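The two-step unit conversion described above can be illustrated with a minimal Python sketch. The density and conversion factors are the values quoted in the text; the function name and the input value are hypothetical and serve only as an illustration.

```python
# Conversion factors (nmol/L per ng/mL) as quoted in the text.
NMOL_PER_NG = {
    "25(OH)D2": 2.42,
    "25(OH)D3": 2.50,
    "3-epi-25(OH)D3": 2.50,
    "24R,25(OH)2D3": 2.40,
}
SERUM_DENSITY_G_PER_ML = 1.02  # universal serum density quoted in the text


def mass_fraction_to_nmol_per_l(metabolite, ng_per_g):
    """Convert a mass fraction (ng/g) to a molar concentration (nmol/L)."""
    ng_per_ml = ng_per_g * SERUM_DENSITY_G_PER_ML  # ng/g -> ng/mL
    return ng_per_ml * NMOL_PER_NG[metabolite]     # ng/mL -> nmol/L


# Hypothetical mass fraction, not a value from the study.
example_nmol_per_l = mass_fraction_to_nmol_per_l("25(OH)D3", 30.0)
```

For the hypothetical input of 30.0 ng/g of 25(OH)D3, the sketch gives 30.0 × 1.02 × 2.50 = 76.5 nmol/L.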

Table 1 Summary of results for determination of 25(OH)D2, 25(OH)D3, total 25(OH)D, 3-epi-25(OH)D3, and 24R,25(OH)2D3 in 50 single-donor serum samples

LC-MS/MS assays evaluated in study

For Intercomparison Study 2, results from 15 LC-MS/MS assays from 14 laboratories were evaluated as summarized in Table 2. Two sets of results were available using the Chromsystems assay (including the assay manufacturer’s laboratory); all other LC-MS/MS assays were specific to the laboratory with instrument and LC column details provided in Table 2. The reported calibrators and control materials used by the laboratories are summarized in Table S5 (ESM).

Table 2 Laboratories participating in VDSP Intercomparison Study 2 using LC-MS/MS assays

Data analysis

Mean bias (%) was determined using the following equation:

$$ \mathrm{Mean\ Bias}\ \left(\%\right)=\frac{\sum \limits_{j=1}^{50}\sum \limits_{i=1}^{2}\left[\left({X}_{\mathrm{Test\ Lab}}-{X}_{\mathrm{NIST}}\right)/{X}_{\mathrm{NIST}}\right]\times 100}{N_{\mathrm{Total}}} $$

where j = donor samples, i = sample replicates, XTest Lab represents the result reported by the test laboratory, XNIST represents the NIST-assigned value, and NTotal = the total number of assays performing the measurements. Calculations of % bias were performed using Stata software (College Station, TX). The ordinary Deming regression and 95% prediction intervals were calculated using Analyse-it, a statistical analysis add-in for Microsoft Excel (Analyse-it Software, Leeds, UK). The multivariable regression analysis was also performed using Analyse-it. The Deming regression was performed as described in Camara et al. [45] for the assessment of commutability using only the first replicate; however, in this study, we used the mean of the two replicate measurements. The Deming regression calculation requires input of the ratio of the variances of the NIST RMP and the test assay (X/Y = λ). For consistency in the assessment of all assays, we used λ = 0.1, which was approximately the mean of the individual λ values for the ligand binding assays (λ = 0.07) and for the LC-MS/MS assays (λ = 0.15) [45].
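As an illustration of the mean % bias calculation, the following Python sketch averages the relative differences over all samples and replicates. It is not the Stata code used in the study; the function name and the data are hypothetical, and dividing by the number of individual measurements is an assumption of this sketch.

```python
def mean_percent_bias(test_results, nist_targets):
    """Mean % bias of test-assay replicates relative to assigned values.

    test_results: one list of replicate results per sample, e.g. [[r1, r2], ...]
    nist_targets: one assigned target value per sample
    """
    biases = [
        (result - target) / target * 100.0
        for replicates, target in zip(test_results, nist_targets)
        for result in replicates
    ]
    # Assumption of this sketch: normalize by the number of measurements.
    return sum(biases) / len(biases)


# Hypothetical data for two samples measured in duplicate (nmol/L).
example_results = [[52.0, 54.0], [103.0, 101.0]]
example_targets = [50.0, 100.0]
example_bias = mean_percent_bias(example_results, example_targets)

# VDSP bias criterion quoted in the Introduction: mean % bias within +/-5%.
meets_bias_criterion = abs(example_bias) <= 5.0
```

In this hypothetical case the four individual biases are 4%, 8%, 3%, and 1%, so the mean % bias is about 4% and the assay would meet the bias criterion.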

Results and discussion

NIST assignment of target values for single-donor samples

Target concentrations for 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 were assigned for the 50 single-donor serum samples using the RMPs [18, 19] for 25(OH)D2, 25(OH)D3, and 24R,25(OH)2D3 and a similar ID LC-MS/MS method for the determination of 3-epi-25(OH)D3. The mean, standard deviation (SD), and CV of the measurements for the 50 samples are summarized in Table 1 for the molar concentrations (nmol/L) of the four vitamin D metabolites. The distribution of 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 concentrations in the 50 single-donor serum samples, arranged from low to high concentration of 25(OH)D, is shown in Fig. 1. The concentration of serum total 25(OH)D ranged from 16 to 148 nmol/L, with 25(OH)D3 ranging from 9 to 141 nmol/L. As shown in Fig. 1 and Table 1, most of the samples had concentrations of 25(OH)D2 between 0.3 and 5.0 nmol/L. However, 8 of the 50 samples had 25(OH)D2 concentrations of > 30 nmol/L, ranging from 32 to 137 nmol/L. Although levels of 25(OH)D2 > 30 nmol/L are rare in the healthy US population [47, 48] except where individuals supplement with ergocalciferol, these higher concentration 25(OH)D2 samples provided an opportunity to extend the range in assessing LC-MS/MS assay performance. Concentrations of 3-epi-25(OH)D3 ranged from 0.58 to 9.80 nmol/L with 10 of the 50 samples having concentrations of > 7 nmol/L. Concentrations of 24R,25(OH)2D3 ranged from 0.48 to 15.2 nmol/L with only 5 of the 50 samples having concentrations of > 10 nmol/L. The relationships between the concentrations of 3-epi-25(OH)D3 and 24R,25(OH)2D3 and the concentration of 25(OH)D3 are illustrated in Fig. S1 (see ESM). Regression analysis indicates that the correlation with 25(OH)D3 concentration is slightly greater for 24R,25(OH)2D3 than for 3-epi-25(OH)D3 (R2 values of 0.79 vs. 0.67, respectively).
A similar regression plot for 3-epi-25(OH)D3 in the 50 single-donor samples used in the first intercomparison study had a slope of 0.087 and R2 = 0.60 [51]. A correlation for 24R,25(OH)2D3 in a set of 30 DEQAS samples with values assigned by NIST plus SRM 972a and SRM 2973 had an R2 value of 0.97 and a slope of 0.100 [52].

Fig. 1

Distribution of single-donor serum samples from lowest to highest molar concentration (nmol/L) of serum total 25(OH)D. Orange bar represents 25(OH)D3 molar concentration, purple bar represents 25(OH)D2 molar concentration, yellow bar represents the 3-epi-25(OH)D3 molar concentration, and green bar represents the 24R,25(OH)2D3 molar concentration, all in nmol/L. Note that the total serum 25(OH)D includes only 25(OH)D2 and 25(OH)D3 (the orange plus purple envelope) and does not include the 3-epi-25(OH)D3 or 24R,25(OH)2D3

Assay performance assessments

For Intercomparison Study 2, the performance of the LC-MS/MS assays was evaluated and compared using the mean of two replicates for the 14 LC-MS/MS assays from the commutability study and the mean of four replicates for the one LC-MS/MS assay from the single-laboratory study (total of 15 assays). The results for the analysis of the 50 single-donor samples reported by all laboratories in Intercomparison Study 2 are provided as two Excel files in the ESM identified as Data VDSP Commutability Study 2 and VDSP Intralaboratory Study. Performance characteristics of the different LC-MS/MS assays were compared for the 50 single-donor samples and for the subset of the 42 single-donor samples remaining after omitting the 8 samples with concentrations of 25(OH)D2 > 30 nmol/L.

Descriptive statistics for assays

The mean, SD, minima, and maxima for total 25(OH)D for the 50 single-donor samples and for the 42-sample subset are summarized in Table S6 (ESM). Similar descriptive statistics for the set of eight high 25(OH)D2 concentration samples are provided in Table S7. The mean value for total 25(OH)D for the analysis of the 50 single-donor samples for each LC-MS/MS assay ranged from 74.4 to 88.6 nmol/L, whereas the mean value was 77.1 nmol/L with SD of 35.8 nmol/L using the NIST RMPs. For the eight samples with high 25(OH)D2 concentrations, the mean 25(OH)D concentration for the LC-MS/MS assays ranged from 110 to 138 nmol/L, with a mean of 120.9 nmol/L for the 15 LC-MS/MS assays (Table S7) compared to 97.3 nmol/L (SD = 37.7 nmol/L) for the NIST RMPs, i.e., all assays are biased high for these samples.

Regression analysis

Using the mean of replicates for each of the single-donor samples and using the NIST-assigned values for total 25(OH)D, the linear regression and 95% prediction intervals (PI) were calculated for each assay, and the results are summarized in Table 3 for both the 50 single-donor samples and the 42-sample subset. Regression analysis plots for four LC-MS/MS methods comparing the 50- and 42-sample sets are shown in Figs. 2 and 3. Similar plots for the remaining laboratories are included in the ESM as Figs. S2 through S7. The plots in Figs. 2 and 3 illustrate the changes in slope and width of the PI when removing the samples with high concentrations of 25(OH)D2. The plot for Lab 36 (Fig. 3C and D) shows an LC-MS/MS assay that changes minimally, whereas the plot for Lab 16 (Fig. 2C and D) illustrates the largest change in PI for the LC-MS/MS assays evaluated.
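The slope and intercept of an ordinary Deming regression can be sketched in Python as follows. This is not the Analyse-it implementation used in the study and it omits the prediction interval. It assumes that λ is the ratio of the error variance of the reference values (x) to that of the test assay (y), which is our reading of "X/Y = λ" in the Data analysis section; the function name and the data are hypothetical.

```python
from math import sqrt


def deming_fit(x, y, lam=0.1):
    """Return (slope, intercept) of an ordinary Deming regression.

    x: reference (target) values; y: test-assay values.
    lam: assumed ratio of error variance in x to error variance in y.
    The fit is undefined if x and y have zero covariance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    diff = lam * s_yy - s_xx
    slope = (diff + sqrt(diff ** 2 + 4.0 * lam * s_xy ** 2)) / (2.0 * lam * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical targets (nmol/L) and a noise-free assay reading 10% high plus 2.
targets = [20.0, 45.0, 70.0, 95.0, 120.0]
assay = [1.1 * value + 2.0 for value in targets]
example_slope, example_intercept = deming_fit(targets, assay, lam=0.1)
```

For noise-free data that lie exactly on a line, the fit recovers that line (here a slope of 1.1 and an intercept of 2, up to rounding) for any positive λ; as λ approaches zero the slope tends to the ordinary least-squares slope.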

Table 3 Ordinary Deming regression analysis based on the mean of replicates for LC-MS/MS assays

Fig. 2

Results for determination of serum total 25(OH)D in single-donor samples versus the NIST assigned target value for Lab 7 (Chromsystems Instruments & Chemicals) (A and B) and Lab 16 (Endoceutics Inc.) (C and D) using LC-MS/MS assays

Fig. 3

Results for determination of serum total 25(OH)D in single-donor samples versus the NIST assigned target value for Lab 28 (Penn State University) (A and B) and Lab 36 (University of Washington) (C and D) using LC-MS/MS assays

For the LC-MS/MS assays, the slopes from the regression analysis for the 50-sample set are within ±10% of 1.00 (with three exceptions, Labs 7, 22, and 28 at 1.14, 1.16, and 1.22, respectively) and changes in slope between the 50-sample and 42-sample sets are minimal (< 6%). Compared to the ligand binding assays [43], the slopes for the LC-MS/MS assays span a narrower range with significantly higher R2 values. The widths of the 95% prediction intervals for the 50-sample set are much narrower for the LC-MS/MS assays (14.6 to 28.1 nmol/L) compared with the ligand binding assays (17.6 to 166 nmol/L) as reported in Part 2 of this study [43]. The % change in the 95% PI width from the 50 samples to the 42-sample subset for the LC-MS/MS assays ranged from −5.3 to 41% compared with −13 to 72% for the ligand binding assays [43].

Bias analysis

The % bias for each assay was determined by comparison of the assay results with the NIST target values for each of the 50 single-donor samples. The mean % bias results are summarized in Table 4 for both the 50-sample and 42-sample sets and in Table S8 (see ESM) for the eight high 25(OH)D2 concentration samples. The mean % bias for only 9 of the 15 assays (only 8 for the 42-sample set) is within ±5%. There does appear to be a greater positive bias for the samples with high 25(OH)D2 concentrations (see Table S8 in ESM).

Table 4 Percent bias for total 25(OH)D for LC-MS/MS assays based on the mean of replicates

Plots of mean % bias for the determination of serum total 25(OH)D in the 50 single-donor samples compared with NIST target values are shown in Fig. 4 for four LC-MS/MS assays. Similar plots for the remaining LC-MS/MS assays are provided as Fig. S8 through S10 (ESM). The plots in Fig. 4A (Lab 37) and 4B (Lab 16) illustrate LC-MS/MS assays that meet the VDSP criterion of mean % bias ≤ |±5%|. However, both assays do have high positive bias for the eight samples with high concentrations of 25(OH)D2 (see Table S8). The results in Fig. 4C (Lab 7) and Fig. 4D (Lab 22) illustrate assays that have consistent positive bias, and the bias for the majority of the individual samples is > |±5%|. The results for these two assays are representative of additional LC-MS/MS assays with significant positive bias including Lab 6 (ESM Fig. S8A) and Lab 28 (ESM Fig. S9C). Three additional assays have sufficient positive bias to fail the VDSP criterion of ≤ |±5%| including Lab 12 (ESM Fig. S8C), Lab 36 (ESM Fig. S10A), and Lab 38 (ESM Fig. S10B). The positive bias may indicate that the assay does not differentiate between 3-epi-25(OH)D3 and 25(OH)D3. Only Lab 17 (ESM Fig. S8D) exhibited a consistent negative bias, particularly for the samples with concentrations of total 25(OH)D less than 70 nmol/L.

Fig. 4

Mean % bias for the determination of serum total 25(OH)D in 50 single-donor samples compared with the NIST target values for Lab 37 (University of Western Australia) (A), Lab 16 (Endoceutics Inc.) (B), Lab 7 (Chromsystems Instruments & Chemicals) (C), and Lab 22 (Imperial College Healthcare) (D) using LC-MS/MS assays

The percentage of single-donor samples with mean bias ≤ |±5%| is provided in Table 4. The percentage within ±5% mean bias is much greater for the LC-MS/MS assays compared to the ligand binding assays [43] except for three assays (Labs 6, 7, and 22) with only 0% or 6% of the individual mean results within ±5% mean bias due to the high positive bias. The percentage of individual samples within various limits from 5 to 50% is shown in ESM Table S9.
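The percentage of individual samples falling within a given bias limit can be computed as in the following Python sketch. The function name and the bias values are hypothetical; the limits loop only mirrors the idea of tabulating several limits and is not the study's tabulation.

```python
def percent_within(sample_biases, limit=5.0):
    """Percentage of samples whose % bias lies within +/- limit."""
    within = sum(1 for bias in sample_biases if abs(bias) <= limit)
    return 100.0 * within / len(sample_biases)


# Hypothetical per-sample mean % bias values for one assay.
example_biases = [-3.0, 4.9, 6.2, 12.0, -5.0]
example_within_5 = percent_within(example_biases, limit=5.0)

# The same helper can be applied to a series of limits.
example_by_limit = {
    limit: percent_within(example_biases, limit) for limit in (5.0, 10.0, 50.0)
}
```

In this hypothetical example, three of the five samples (60%) fall within ±5%.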

To illustrate the influence of high 25(OH)D2 concentrations on the LC-MS/MS assays, the mean % bias for the 42-sample subset with normal 25(OH)D2 concentrations and the 8 samples with high 25(OH)D2 concentrations are compared graphically in Fig. 5. Both the high 25(OH)D2 concentration samples and the normal 25(OH)D2 concentration samples have positive bias (with only two exceptions with minor deviations), and generally the high 25(OH)D2 concentration samples have higher mean % bias.

Fig. 5

Mean % bias for various LC-MS/MS assays for single-donor samples with normal concentrations of 25(OH)D2 (42 samples) (yellow bar) and with high concentrations (> 30 nmol/L) of 25(OH)D2 (8 samples) (green bar). Error bars are the SD of the % mean bias for the sample sets

Comparison of individual metabolites using LC-MS/MS assays

All laboratories using LC-MS/MS assays provided results for individual measurements of 25(OH)D2 and 25(OH)D3; six of these laboratories also provided results for 3-epi-25(OH)D3. Therefore, potential bias in the measurement of individual metabolites compared with the NIST target values was investigated. The results for the determination of 25(OH)D2, 25(OH)D3, and 3-epi-25(OH)D3 for these 15 laboratories are summarized in Table S10 through Table S14, and the % bias compared to the NIST target values is summarized in Table S15 through Table S19 (see ESM). The mean % bias for the determination of 25(OH)D3 for the nine laboratories not reporting 3-epi-25(OH)D3 and for the six laboratories reporting concentrations for 3-epi-25(OH)D3 is shown in Fig. 6A and B, respectively. All of the LC-MS/MS assays except one (two for the 42-sample set) have a positive bias relative to the NIST target values for total 25(OH)D (Table 4). Therefore, it is not unexpected that the individual metabolite values for 25(OH)D2 (ESM Table S15 and S17) and 25(OH)D3 (ESM Table S16 and S18) are predominantly biased high as shown in Fig. 6, particularly for the laboratories not reporting 3-epi-25(OH)D3 (Fig. 6A). For the 25(OH)D3 values, the six laboratories reporting the separation of 3-epi-25(OH)D3 had a lower mean % bias (range of 0.8 to 6.3%; mean of 3.1%, see Table S18, ESM) compared with the nine laboratories not reporting the epimer (eight laboratories with positive bias ranging from 0.1 to 17.3%; mean of 7.9%; one lab with negative bias). These results support the conclusion that positive bias for the determination of 25(OH)D3 in some LC-MS/MS assays may be due to the lack of separation of 3-epi-25(OH)D3.

Fig. 6

Percent bias for determination of 25(OH)D3 compared to the NIST target values by nine laboratories not reporting 3-epi-25(OH)D3 (A) and six laboratories reporting 3-epi-25(OH)D3 (B). Samples listed in order of increasing concentration of 25(OH)D3

Another observation is that a majority of the LC-MS/MS assays evaluated have insufficient sensitivity to detect 25(OH)D2 at normal population levels (i.e., non-supplemented donors). Of the 15 laboratories using LC-MS/MS, 10 laboratories reported 25(OH)D2 for fewer than 11 of the samples, i.e., they reported results only for samples with relatively high 25(OH)D2 concentrations. Only three laboratories reported concentrations of 25(OH)D2 in more than 40 samples, and two laboratories reported values for only 20 samples. For the determination of 3-epi-25(OH)D3, six labs found measurable levels in 50 to 100% of the 50 samples; results for five of the six laboratories had a significant negative bias compared to the NIST values for 3-epi-25(OH)D3 (ESM Table S19).

Interestingly, Laboratory 7 (Chromsystems) reported measurements for the 3-epimer of 25(OH)D2 with quantifiable levels found in the six samples with the highest concentrations of 25(OH)D2 (64.2 to 137 nmol/L 25(OH)D2) as follows (mean of two replicates): sample 1 (3.3 nmol/L), sample 2 (3.4 nmol/L), sample 3 (3.7 nmol/L), sample 31 (1.7 nmol/L), sample 34 (2.7 nmol/L), and sample 39 (2.7 nmol/L). Lab 7 also reported 3-epi-25(OH)D2 in a replicate of DEQAS 9A at a concentration of 1.7 nmol/L which is consistent with the 57 nmol/L of 25(OH)D2 in this sample.

Influence of concentrations of 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3 on assay performance

With target values determined in the 50 single-donor samples for 25(OH)D2, 25(OH)D3, 3-epi-25(OH)D3, and 24R,25(OH)2D3, the potential contributions of each of these metabolites to the LC-MS/MS assay response for serum total 25(OH)D were investigated. Multivariable regression analysis was performed for the test assay result for serum total 25(OH)D using the NIST values for each metabolite as independent variables for the following equation:

$$ \begin{aligned} Y\left[25(\mathrm{OH})\mathrm{D}\ \text{Test Assay}\right] ={}& \text{Constant} + X_1\left[25(\mathrm{OH})\mathrm{D}_2\ \text{NIST}\right] + X_2\left[25(\mathrm{OH})\mathrm{D}_3\ \text{NIST}\right] \\ &+ X_3\left[3\text{-epi-}25(\mathrm{OH})\mathrm{D}_3\ \text{NIST}\right] + X_4\left[24R,25(\mathrm{OH})_2\mathrm{D}_3\ \text{NIST}\right] \end{aligned} $$

where X1, X2, X3, and X4 are scaling parameters that when multiplied by the NIST metabolite concentrations provide the test assay result. Since the test assay result for serum total 25(OH)D should be the sum of only 25(OH)D2 and 25(OH)D3, in the ideal case, the scaling parameters X1 and X2 should be close to 1.0, and the parameters X3 and X4 should be negligible. The results of the multivariable regression are summarized in Table 5 with the values of the scaling parameters color coded (as indicated in Table 5 footnotes) to distinguish the level of individual metabolite contributions.
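A model of this form can be fitted by ordinary least squares, as in the following self-contained Python sketch. The study used Analyse-it; the use of ordinary least squares via the normal equations, the function names, and the data below are assumptions made for illustration only, and the sketch does not compute the statistical significance of the parameters.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_metabolite_contributions(d2, d3, epi, dihydroxy, assay_total):
    """Return [Constant, X1, X2, X3, X4] from an ordinary least-squares fit."""
    rows = [[1.0, a, b, c, d] for a, b, c, d in zip(d2, d3, epi, dihydroxy)]
    k = 5
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, assay_total)) for i in range(k)]
    return solve_linear_system(ata, aty)


# Hypothetical target values (nmol/L) for seven samples.
d2 = [1.0, 40.0, 3.0, 80.0, 2.0, 5.0, 60.0]
d3 = [30.0, 50.0, 90.0, 20.0, 120.0, 60.0, 75.0]
epi = [1.0, 2.5, 6.0, 0.8, 9.0, 3.0, 4.0]
dihydroxy = [2.0, 4.0, 9.5, 1.5, 14.0, 3.5, 8.0]
# Simulated assay that co-measures half of the 3-epimer with 25(OH)D3.
assay_total = [a + b + 0.5 * c for a, b, c in zip(d2, d3, epi)]
example_params = fit_metabolite_contributions(d2, d3, epi, dihydroxy, assay_total)
```

Because the simulated assay response is an exact linear combination, the fit returns values close to X1 = 1, X2 = 1, X3 = 0.5, and X4 = 0 with a constant near zero. An assay that fully resolves the epimer would instead give X3 near zero.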

Table 5 Multivariable linear regression analysis for LC-MS/MS assays for 50 single-donor samples using the mean of replicates

The results of the multivariable regression analyses indicate that the response to 25(OH)D2 and 25(OH)D3 is near unity for most of the LC-MS/MS assays with overestimations of 25(OH)D3 by Lab 22 and of 25(OH)D2 by four assays (Labs 7, 22, 28, and 37) and an underestimation of 25(OH)D3 for Lab 38. However, there is a statistically significant contribution from 3-epi-25(OH)D3 for 4 of 15 LC-MS/MS assays (Labs 7, 17, 28, and 38). If the SRM and PT/EQA samples are included in the multivariable regression analysis (Table S20, ESM), three additional laboratories (Labs 6, 22, and 33) are observed to have a contribution from 3-epi-25(OH)D3 for a total of 7 of the 15 LC-MS/MS assays, which may be attributed to the lack of separation of the 3-epimer from 25(OH)D3.

Performance of individual LC-MS/MS assays

Chromsystems assay

Two laboratories (Labs 6 and 7) used the Chromsystems LC-MS/MS assay, including the assay manufacturer’s laboratory (Lab 7). Labs 6 and 7 have positive mean biases of 21% and 12%, respectively (see Table 4). Lab 7 reported results for 3-epi-25(OH)D3, whereas Lab 6 did not. Because of the positive bias for Labs 6 and 7, almost no individual sample means fall within the ±5% criterion (see Table 4). Multivariable linear regression analysis (Table 5) also indicates that the 3-epimer influences the estimate for Lab 7 (even though it does separate the epimer as indicated in Table S14, ESM) and possibly for Lab 6 when the SRM and PT/EQA samples are included (Table S20, ESM). In the determination of individual metabolites, Lab 6 was positively biased for the measurement of 25(OH)D3 (17%).

Other LC-MS/MS assays

For the remaining LC-MS/MS assays, only five labs reported results for 3-epi-25(OH)D3 (Labs 11, 25-2, 36, 37, and 40-13). Five laboratories (Labs 12, 22, 28, 36, and 38) have bias > |±5%|, which may indicate a lack of separation of the epimer. Interestingly, Labs 25-1 and 25-2 use the same LC column but on different MS/MS systems, with Lab 25-2 reporting results for 3-epi-25(OH)D3 and Lab 25-1 not reporting results for the epimer. Multivariable regression analysis indicates that Lab 17 (which has a negative bias), Lab 28, and Lab 38 have contributions from the 3-epimer to the estimate of 25(OH)D. In Commutability Study 2, Lab 17 did not analyze SRM 972a L4 because of the known high level of 3-epi-25(OH)D3 and did not use the alternative LC-MS/MS method offered by their lab that does separate the epimers of both 25(OH)D2 and 25(OH)D3 [45]. The multivariable regression analysis using the 50 single-donor samples and the 29 SRM and PT/EQA samples adds Labs 6, 22, and 33 to the group of LC-MS/MS assays with a contribution from 3-epi-25(OH)D3 to the estimate of 25(OH)D. Shortly after participating in this study, Lab 22 retired the LC-MS/MS method used in this study, which utilized a BEH Phenyl column that they reported as not separating the epimer, and replaced it with a method using a PFP column to resolve the 3-epimer from 25(OH)D3.

Conclusions

Intercomparison Study 2 improved on the information obtained from the first VDSP intercomparison study with a greater number of LC-MS/MS assays evaluated. In Intercomparison Study 1, six of the eight LC-MS/MS assays evaluated met the criterion of ≤|±5%| mean % bias. In Intercomparison Study 2, 8 of 15 LC-MS/MS assays evaluated (53%) for the 50 single-donor samples were within ±5% bias. Based on consistent positive bias for some LC-MS/MS assays and the multivariable regression analysis that indicates a contribution from the 3-epimer, it appears that some assays do not separate 25(OH)D3 from the 3-epimer (i.e., Labs 6, 7, 17, 22, 28, and 38). Although the clinical consequences of not separating the 3-epimer may be insignificant, it is difficult to understand why laboratories persist in determining 25(OH)D using an LC-MS/MS assay that does not separate the 3-epimer when appropriate LC column selection and conditions may resolve this bias. Based on the results of this study and previous studies [3], the VDSP recommends that LC-MS/MS assays for 25(OH)D use appropriate LC columns and conditions that separate the 3-epimer from 25(OH)D3 to eliminate this significant source of bias. The results from VDSP Intercomparison Study 2 represent the most comprehensive comparison of different LC-MS/MS assays for the determination of serum total 25(OH)D and document the impact of the lack of separation of 3-epi-25(OH)D3 and 25(OH)D3 on assay performance with regard to mean % bias.