Introduction

The administration of synthetic steroids is one of the most important issues facing sports. Synthetic copies of endogenous steroids are pharmacologically and chemically identical to their endogenous analogs, but they differ slightly in their carbon isotope ratio (CIR). Natural variations in the abundance of 13C reflect the biological and/or chemical processes a compound has passed through: these transformations are accompanied by isotopic fractionation, which, because of the mass difference, results in a slight depletion or enrichment of 13C. CIRs are expressed as δ13C values against the international standard Vienna Pee Dee Belemnite (VPDB) [1]. To measure the CIR of a compound, doping control laboratories use gas chromatography combustion isotope ratio mass spectrometry (GC-C-IRMS) [2–14]. Many laboratories acetylate their compounds before analysis. The CIR of the acetylated compounds must then be corrected for the extra carbon atoms that have been added to the steroid skeleton. Therefore, the measured δ13C value is corrected using the following formula:

$$ {\delta}^{13}{C}_{\mathrm{steroid}}=\frac{\left({n}_{\mathrm{sa}}{\delta}^{13}{C}_{\mathrm{sa}}\right)-\left({n}_{\mathrm{a}}{\delta}^{13}{C}_{\mathrm{a}\mathrm{corr}}\right)}{n_{\mathrm{steroid}}} $$

Here, ‘n’ is the number of carbon atoms (moles of carbon per mole of compound), ‘sa’ stands for steroid acetate, ‘a’ for the acetate derivative group and ‘steroid’ for the underivatized steroid. δ13Ca,corr, the corrected δ13C value of the acetate group, is calculated indirectly by measuring the δ13C value of the reference material of each relevant analyte in both its free and its derivatized form.
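The correction above can be illustrated with a minimal Python sketch. It assumes one acetate group contributes two carbon atoms and uses a hypothetical androsterone example (19 C in A, 21 C in A-Ac); all δ13C values are invented for illustration and are not measured data. The second helper back-calculates the corrected acetate value (δ13Ca,corr) from a reference material measured in free and acetylated form, which is how the text says that value is obtained. The function names are hypothetical.

```python
def steroid_delta(n_sa: int, d_sa: float, n_a: int, d_a_corr: float) -> float:
    """Remove the acetate-carbon contribution from a measured steroid-acetate delta (per mil)."""
    n_steroid = n_sa - n_a  # carbon atoms belonging to the steroid itself
    return (n_sa * d_sa - n_a * d_a_corr) / n_steroid


def acetate_delta(n_sa: int, d_sa: float, n_steroid: int, d_free: float) -> float:
    """Back-calculate the acetate-group delta from a reference measured free and acetylated."""
    n_a = n_sa - n_steroid  # carbon atoms added by acetylation
    return (n_sa * d_sa - n_steroid * d_free) / n_a


# Hypothetical reference material: free A measured at -28.0, A-Ac at -28.5 (per mil).
d_a = acetate_delta(n_sa=21, d_sa=-28.5, n_steroid=19, d_free=-28.0)
print(round(d_a, 2))  # -33.25
# Applying the correction to the same A-Ac value recovers the free-steroid delta.
print(round(steroid_delta(n_sa=21, d_sa=-28.5, n_a=2, d_a_corr=d_a), 2))  # -28.0
```

For a diacetate such as ααβ-Ac2, the same function applies with four acetate carbons (n_a = 4).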

GC-C-IRMS remains laborious and expensive because the CIR can only be determined for a pure compound [15]. To minimize the number of samples that need to be analysed by GC-C-IRMS while maximizing the detection of doping violations, all urine samples that enter a doping control laboratory are first screened with a fast gas chromatography triple quadrupole mass spectrometry (GC-MS/MS) method that quantifies all important endogenous steroids [16]. Samples with suspicious endogenous steroid concentrations and/or ratios are then forwarded for IRMS analysis according to WADA regulations [17]. The testosterone/epitestosterone (T/E) ratio is the most important parameter for detecting administration of T or T prohormones; WADA currently sets its threshold at 4, as most individuals have values between 0.1 and 3 [18, 19].

After administration of a synthetic anabolic steroid, that steroid and its metabolites will be depleted in 13C relative to an endogenous reference compound (ERC). ERCs are steroids that are not involved in the metabolic pathway of anabolic steroids, so their CIR is not affected by administration of an exogenous anabolic steroid. Pregnanediol (PD, 5β-pregnane-3α,20α-diol) is used as the primary ERC, but others such as 5α-androst-16-en-3α-ol, 11β-hydroxyandrosterone and 11-oxo-etiocholanolone (11-oxoEt, 5β-androstane-3α-ol-11,17-dione) are also routinely used. ERCs normalize GC-C-IRMS results for the diet of the individual athlete, and WADA has set a minimum difference of 3 ‰ between ERC and target compound (TC) to establish an adverse analytical finding of synthetic steroid abuse [17]. TCs are steroids that can be administered as doping substances, or their metabolites (for example androsterone (A), etiocholanolone (Et), 5α-androstane-3α,17β-diol (ααβ-diol), 5β-androstane-3α,17β-diol (βαβ-diol),…).
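The ERC/TC comparison can be sketched as follows. This is a simplified illustration, assuming Δδ13C is defined as δ13C(ERC) − δ13C(TC) so that a 13C-depleted TC gives a positive value; it applies only the 3 ‰ difference criterion described above, and any further criteria in the WADA technical document are not modelled. The helper names and all δ13C values are hypothetical.

```python
ERC_TC_THRESHOLD = 3.0  # per mil, minimum ERC-TC difference quoted in the text


def delta_delta(d_erc: float, d_tc: float) -> float:
    """Δδ13C = δ13C(ERC) − δ13C(TC); positive when the TC is depleted relative to the ERC."""
    return d_erc - d_tc


def exceeds_threshold(d_erc: float, tcs: dict[str, float]) -> dict[str, bool]:
    """Flag each target compound whose Δδ13C is above the 3 per mil difference."""
    return {name: delta_delta(d_erc, d) > ERC_TC_THRESHOLD for name, d in tcs.items()}


# Hypothetical sample (per mil vs VPDB): PD as ERC and four TCs.
pd = -23.0
tcs = {"A": -24.1, "Et": -25.2, "aab-diol": -27.4, "bab-diol": -25.9}
print(exceeds_threshold(pd, tcs))
```

In this invented example only ααβ-diol exceeds the threshold, which mirrors the situation the text later describes where the androstanediols are more sensitive than A and Et.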

The main goal of this paper is to provide useful data for other doping control laboratories and to illustrate how we deal with GC-C-IRMS measurements, with an emphasis on the new WADA technical document and its implementation [17]. In an extensive method such as GC-C-IRMS, localizing a malfunction can be very time-consuming. We describe the different quality controls incorporated in the GC-C-IRMS method and explain how they can help identify the source of deviating δ13C values.

Materials and methods

Reagents and chemicals

All reagents and chemicals used are listed in previous papers [20, 21].

GC-C-IRMS confirmation procedure and its evolution

All samples were processed according to our routine IRMS confirmation procedure. The first samples were analysed in February 2011. Since then, the method has been adapted on several occasions to improve quality control, simplify the sample preparation, expand the number of TCs and ERCs and comply with WADA requirements (Table 1) [17]. The original method started with aliquoting a system blank, a negative quality control (QC Neg), a positive quality control (QC Pos) and the routine samples, followed by solid phase extraction (SPE), hydrolysis, liquid–liquid extraction (LLE) with diethyl ether, acetylation and high-performance liquid chromatography fraction collection (HPLC-FC). Four TCs (androsterone acetate (A-Ac), etiocholanolone acetate (Et-Ac), 5α-androstane-3α,17β-diol diacetate (ααβ-Ac2), 5β-androstane-3α,17β-diol diacetate (βαβ-Ac2)) and one ERC (5β-pregnane-3α,20α-diol diacetate, PD-Ac2) were collected, analysed by GC-MS to check for peak impurities, and finally their CIRs were determined by GC-C-IRMS with a single injection per fraction. The details of the procedure are described in a previous publication [20].

Table 1 Summary of the method evolution

In September 2011, the method was slightly extended: an additional fraction containing T-Ac and EpiT-Ac was collected during the HPLC-FC. Because of interferences affecting T-Ac, this fraction needed a second HPLC clean-up. The T-Ac/EpiT-Ac fraction was always collected during the first HPLC-FC, but to keep the sample preparation as simple as possible, it only underwent the second HPLC-FC, GC-MS and GC-C-IRMS analysis if suspicious CIRs were determined for the primary compounds (A-Ac, Et-Ac, βαβ-Ac2, ααβ-Ac2 and PD-Ac2). In addition, the diethyl ether LLE was replaced by an n-pentane LLE. More details are available in a previous work [21]. At this stage, a new QC Neg was collected and stored at −80 °C. The new QC Neg comprised more than 3 L, which allows us to use the same QC Neg for several years; should long-term shifts in CIR occur, they can be detected and visualized with this QC Neg. In December 2011, a new QC Pos was prepared and stored at −80 °C for the same purpose. It was obtained by spiking QC Neg with sufficient amounts of A, Et, βαβ-diol, ααβ-diol, T and EpiT.

In February 2012, we switched from 1-μL splitless injections to 10-μL solvent vent injections using a programmed temperature vaporizer (PTV) on our GC-MS and GC-C-IRMS instruments. These 10-μL solvent vent injections increased our IRMS sensitivity by a factor of 10 and allowed us to simplify the sample preparation by reducing the urine volume required for IRMS analysis by a factor of 5. For a TC with a urinary concentration of 50 ng/mL, we now need only 5 mL of urine instead of 25 mL. More information is available in our previous work [21]. Since then, we have used 7 mL of QC Neg and 3 mL of QC Pos per batch, prolonging the lifetime of our QCs.

From September 2012 on, we have actively recorded the δ13C values of all compounds in our acetate standard (acetate STD). The acetate STD is a mixture of the acetylated standards of all our TCs, PD-Ac2 and the internal standard (IS, 5α-ol-Ac) and is analysed by GC-C-IRMS before, during and after every batch. It has always been part of our IRMS confirmation procedure, but was previously used only to check and set the retention times of the compounds, to estimate the GC chromatographic performance (peak tailing and separation) and to evaluate the δ13C value of our IS, since a deviating δ13C value could indicate problems in the combustion process.

Since May 2013, a second ERC (11-oxoEt) has been included in the method. The sample preparation procedure remained the same, but an additional fraction containing 11-oxoEt-Ac was collected during the HPLC-FC (6.6–8.1 min). This fraction was always collected, but to keep the sample preparation as simple as possible, it was only analysed by GC-MS and GC-C-IRMS if suspicious CIRs were determined for the primary compounds or in other exceptional cases. The IRMS calibration procedure was also adjusted: instead of calibrating the CO2 reference gas with an n-alkane mixture (C17–C25), CU-USADA 33-1 was used. During calibration, an additional check was performed using CU-USADA 34-1 and 5α-androstane (IU 5α-A), provided by the Biogeochemical Laboratories of Indiana University (Bloomington, IN, USA). These adaptations were made to comply with the requirements of the new WADA technical document, which came into effect in September 2014 [17]. From January until September 2014, the WADA guidelines document was in force, but both documents are essentially the same [22].

Since January 2014, an extra QC (QC STD MIX) has been processed and analysed with every batch. QC STD MIX was prepared by spiking 3 mL of steroid-stripped urine with 150 μL of a 12.5 μg/mL standard solution containing A, Et, βαβ-diol, ααβ-diol, T, EpiT, PD and 11-oxoEt. It was added to the method to comply with WADA regulations [17]. However, a communication from WADA dated 8 September 2014 indicated that reference materials with known CIR need not be subjected to the full sample preparation procedure; they only have to be injected prior to sample analysis.
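The nominal concentration of each steroid in QC STD MIX follows from simple arithmetic. The sketch below assumes the 150 μL spike volume is negligible relative to the 3 mL of urine; the function name is hypothetical.

```python
def spiked_concentration_ng_per_ml(stock_ug_per_ml: float, spike_ul: float, urine_ml: float) -> float:
    """Nominal concentration of each steroid after spiking, ignoring the added spike volume."""
    mass_ug = stock_ug_per_ml * spike_ul / 1000.0  # µL -> mL, giving µg of each steroid
    return mass_ug * 1000.0 / urine_ml  # µg -> ng, divided by the urine volume in mL


print(spiked_concentration_ng_per_ml(12.5, 150.0, 3.0))  # 625.0
```

Each of the eight steroids is therefore present at roughly 625 ng/mL, i.e. about 1.9 μg per aliquot.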

Routine doping samples

From February 2011 until May 2014, we analysed 609 routine urine samples from athletes by GC-C-IRMS. Of these, 279 samples (45.8 %) were investigated because of an elevated T/E ratio; the other 330 samples (54.2 %) required IRMS investigation because of an elevated concentration of A, Et, βαβ-diol, ααβ-diol, T or DHEA (or combinations thereof), or because the corresponding sports federation requested it.

Negative reference population samples

The negative reference population samples were collected and analysed at different points in time during the period 2010–2013. In 2010, a first reference population of 27 male (aged 26.0 ± 5.5 years) and 25 female (aged 26.3 ± 8.0 years) volunteers who declared that they had not used any prohibited substance was analysed. The remaining negative reference population samples originated from healthy university staff undergoing the yearly health check at the university hospital; these were collected and analysed in 2011–2013.

Results and discussion

Calibration

Ideally, the CO2 reference pulse is calibrated against an isotopic standard whose chemical structure is very similar or identical to that of the analytes measured by IRMS. Similar chemical properties are important because the combustion efficiency may vary between compounds, changing the isotopic composition of the resulting gases [23]. To harmonize reported values and achieve uniform results between anti-doping laboratories worldwide, the new WADA regulations require the IRMS system to be calibrated against a steroid reference material [17].

In 2008, two steroid isotopic standards (SIS), CU-USADA 33-1 and CU-USADA 34-1, were assigned certified CIRs traceable to the international reference material [24]. CU-USADA 33-1 contains 5α-ol-Ac, A-Ac, 11-oxoEt-Ac and 5α-cholestane (5α-Chol); CU-USADA 34-1 contains Et, A and PD (Table 2).

Table 2 Measured and certified CIRs for CU-USADA 33-1, CU-USADA 34-1 and IU 5α-A

Since May 2013, our CO2 reference pulse has been calibrated against CU-USADA 33-1 eight times. This SIS was chosen because it contains acetylated steroids similar or identical to those analysed in our GC-C-IRMS method. We calibrate with CU-USADA 33-1 in triplicate and use the average assigned value as the new δ13C value of the CO2 reference pulse. Immediately before and after calibration, CU-USADA 33-1 is also analysed to check how far the measured δ13C values of the steroids deviate from their certified values when the old and the new δ13C value of the CO2 reference pulse are used, respectively (Table 2). Afterwards, as an additional check, CU-USADA 34-1 and IU 5α-A are analysed with the new reference pulse δ13C value for the same purpose.
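The text does not describe how the IRMS software assigns the reference gas value, so the following is only a minimal sketch of the underlying arithmetic, assuming each calibration steroid is reported as a δ value relative to the CO2 reference pulse. Because delta values chain multiplicatively, (1 + δsample/VPDB) = (1 + δsample/ref)(1 + δref/VPDB) with δ in per mil divided by 1000, each certified steroid yields one estimate of the reference gas δ13C, and averaging over compounds and the three replicate runs gives an assigned value. Compound labels, certified values and measured deltas below are all hypothetical.

```python
from statistics import mean


def ref_gas_delta(certified: float, measured_vs_ref: float) -> float:
    """Reference gas δ13C (per mil vs VPDB) implied by one certified compound.

    Delta values chain multiplicatively:
    (1 + d_sample_VPDB/1000) = (1 + d_sample_ref/1000) * (1 + d_ref_VPDB/1000).
    """
    return ((1 + certified / 1000) / (1 + measured_vs_ref / 1000) - 1) * 1000


def assign_ref_gas(runs: list[dict[str, float]], certified: dict[str, float]) -> float:
    """Average the implied reference gas value over all compounds and replicate runs."""
    return mean(ref_gas_delta(certified[name], d) for run in runs for name, d in run.items())


# Hypothetical certified values and compound-vs-reference-gas deltas for three replicate runs.
certified = {"steroid_1": -31.0, "steroid_2": -30.0, "steroid_3": -24.0}
runs = [
    {"steroid_1": -2.10, "steroid_2": -1.05, "steroid_3": 5.15},
    {"steroid_1": -2.05, "steroid_2": -1.10, "steroid_3": 5.20},
    {"steroid_1": -2.15, "steroid_2": -1.00, "steroid_3": 5.10},
]
print(f"assigned reference gas delta = {assign_ref_gas(runs, certified):.2f} per mil")
```

Averaging over several compounds is one way an atypical compound can be partly compensated by the others, as discussed for 11-oxoEt-Ac below.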

For CU-USADA 33-1, the bias relative to the certified values is acceptable, except for 11-oxoEt-Ac, for which the standard deviation (SD) and bias are unacceptably large for unknown reasons. Piper et al. made similar observations for 11-oxoEt-Ac in CU-USADA 33-1 [25]. The effect was partially attributed to an incomplete transfer from the GC column into the hot zone of the combustion furnace; other GC-related effects, such as losses during injection, were also indicated as possible contributors to the atypical behaviour. An amount-dependent shift was also observed, but this does not apply to our measurements because we always inject the same amount during calibration. 11-oxoEt-Ac is also present in our QC Neg, QC Pos and QC STD MIX, but there we do not see a similar phenomenon and normal deviations are obtained (Table 3). All four steroids in CU-USADA 33-1, including 11-oxoEt-Ac, are used to set the δ13C value of the CO2 reference pulse; despite the atypical behaviour of 11-oxoEt-Ac, a correct reference pulse value is obtained, as shown by the acceptable bias for the other compounds. We assume that during calibration the other three steroids compensate for the atypical behaviour of 11-oxoEt-Ac.

Table 3 Statistical data gathered for QC Neg, QC Pos, QC STD MIX and the Acetate STD

CU-USADA 34-1 consists of three non-acetylated standards. Free (underivatized) steroids show more tailing during GC analysis, which results in larger SDs, and overall the bias is slightly worse than for CU-USADA 33-1. The bias and SD for IU 5α-A were in line with those for CU-USADA 33-1.

Calibrating with the new reference material CU-USADA 33-1 did not induce a shift in the CO2 reference gas value. The obtained δ13C values were in agreement with previously obtained values (using the alkane reference material). Neither was there a shift in the CIR of the IS, QC Neg and QC Pos (Figs. 1 and 2).

Fig. 1 CIR of the IS from February 2011 until May 2014. The dashed line represents the least-squares linear fit (y = 0.0002x − 32.36)

Fig. 2 Quality control chart with the CIRs of Et, A, βαβ-diol, ααβ-diol and PD in QC Neg and QC Pos

System stability

As recommended by Piper et al., the CIR of the IS has been monitored continuously, because this compound is highly sensitive to long-term shifts and changes in the combustion process [26]. The data are presented in Fig. 1 with consecutive numbers instead of measurement dates, as data collection was carried out on a regular basis (n = 579, three to four measurements per week). The average value was −32.30 ‰ with an SD of 0.39 ‰, and according to the Shapiro–Wilk test (W = 0.996, p = 0.221) the values did not deviate significantly from a normal distribution. The first 30 measurements had an average of −32.38 ‰ and an SD of 0.47 ‰; the last 30 measurements had an average of −32.23 ‰ and an SD of 0.51 ‰. An independent-samples t test (α = 0.05) showed no significant difference between the first and last 30 measurements (p = 0.236), indicating that no significant shift occurred. Contrary to expectations, the changes made to the method (different injection type and different calibration procedure) did not induce a clear shift.
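The shift test can be roughly re-checked from the rounded summary statistics alone. This sketch assumes a pooled-variance (Student) t test, since the text does not state which variant was used; the reported p = 0.236 came from the raw values, so this recomputation is only expected to reach the same conclusion, not the identical p. The critical value for 58 degrees of freedom is an approximate textbook value, and the function name is hypothetical. The second print converts the fitted slope of Fig. 1 into a total drift over the series.

```python
from math import sqrt


def pooled_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple[float, int]:
    """Two-sample t statistic with pooled variance (equal variances assumed) and its degrees of freedom."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# IS summary statistics from the text: first vs last 30 measurements (per mil).
t, df = pooled_t(-32.38, 0.47, 30, -32.23, 0.51, 30)
T_CRIT = 2.002  # approx. two-sided 5 % critical value of Student's t for 58 degrees of freedom
print(f"t = {t:.2f}, df = {df}, significant shift: {abs(t) > T_CRIT}")

# Total change implied by the fitted slope in Fig. 1 (0.0002 per mil per measurement, 579 measurements).
print(f"fitted drift over the series = {0.0002 * 579:.2f} per mil")
```

With |t| ≈ 1.18 well below the critical value, and a fitted drift of only about 0.12 ‰ over all 579 measurements, both views agree that the IS shows no relevant long-term shift.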

Piper et al. attributed two shifts to two factors: replacement of the CO2 tank for the reference pulse and a change in the protocol for reoxidizing the combustion reactor. Since the launch of our method, it has not been necessary to replace the CO2 tank. Our oxidation protocol has also remained the same: we oxidize for 12 s before each measurement. If the IRMS has not been used for more than 3 days, we start the batch with five blanks to ensure that the combustion reactor is saturated with oxygen before the actual measurements begin. We do not perform any other oxidation protocol (e.g. a longer oxidation once a week). So far, we have had to replace the combustion reactor only once (November 2011), and this did not induce a measurable shift either.

Quality control

System blank

So far, there has not been a case where the system blank indicated any problems or contaminations.

QC Neg and QC Pos

QC Neg and QC Pos were prepared and analysed with every batch of samples. The CIRs obtained for the TCs and ERCs were plotted in a quality control chart and compared with the data from previous batches. If the CIR of one of the TCs or ERCs fell outside the 2-sigma limits, all data from that batch were rejected, the cause of the out-of-control data was sought and resolved, and the batch was reanalysed [27]. We have had out-of-control data five times. On one occasion, the combustion reactor was broken; on another, we used an HPLC column that had exceeded its lifetime. Its peaks had become so wide that they started to elute outside the collected fractions during HPLC-FC. This was especially the case for PD-Ac2 (a late-eluting compound) and was more pronounced in QC Neg than in QC Pos (same PD concentration, but a 7 mL aliquot compared with 3 mL for QC Pos). PD-Ac2 values were 0.5 and 0.3 ‰ outside the 2-sigma limit for QC Neg and QC Pos, respectively. On the three remaining occasions, the GC column had exceeded its lifetime.
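The 2-sigma acceptance rule can be sketched as below. This is a minimal illustration, assuming the limits are the mean ± 2 SD of previously accepted QC results for each compound; the text does not specify how many earlier batches enter the limits, and the helper names and QC values are hypothetical.

```python
from statistics import mean, stdev


def control_limits(history: list[float], k: float = 2.0) -> tuple[float, float]:
    """Lower and upper control limits (mean ± k·SD) from previously accepted QC results."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s


def batch_in_control(history: dict[str, list[float]], batch: dict[str, float]) -> bool:
    """Accept a batch only if every compound in the QC sample lies within its 2-sigma limits."""
    for compound, value in batch.items():
        low, high = control_limits(history[compound])
        if not low <= value <= high:
            return False  # out of control: reject the whole batch and reanalyse
    return True


# Hypothetical QC Neg history (per mil) for two compounds.
history = {
    "PD": [-22.1, -22.4, -21.9, -22.3, -22.0],
    "A": [-23.0, -23.2, -22.8, -23.1, -22.9],
}
print(batch_in_control(history, {"PD": -22.2, "A": -23.05}))  # True
print(batch_in_control(history, {"PD": -22.9, "A": -23.0}))  # False: PD below its lower limit
```

A single compound outside its limits is enough to reject the batch, which matches the all-or-nothing rule described above.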

The data collected for QC Neg (from September 2011) and QC Pos (from December 2011) are presented in Table 3 and visualized in Fig. 2. For QC Neg, the largest SD was 0.81 ‰ (ααβ-diol); for QC Pos, the largest SD was 0.87 ‰ (EpiT). Overall, the obtained precision was in line with what can be expected for GC-C-IRMS measurements [26, 28]. Differences in precision between compounds are partly caused by chemical and physical dissimilarities, which can, for example, lead to different chromatographic behaviour (tailing compounds are prone to more variation). However, precision is also sample dependent. QC Pos is the same urine as QC Neg (i.e. the same matrix) but spiked with the TCs. In QC Neg, ααβ-diol is present at quite low concentrations, leading to a low peak intensity (still within the acceptable linearity range) that is susceptible to more variation. In QC Pos, ααβ-diol is present at a higher concentration and has a lower SD (0.68 ‰). Likewise, a sample (and its TCs and ERCs) with a higher background will be prone to more variation. This has to be kept in mind when defining “the overall precision of a method” (see section “Uncertainty”).

QC STD MIX

QC STD MIX has essentially the same function as QC Neg and QC Pos: it allows the analyst to identify deviating CIRs for one or more compounds and to visualize long-term shifts. QC STD MIX, however, has the advantage that the standards used to spike the steroid-stripped urine are readily available and can also be analysed directly by GC-C-IRMS without sample preparation. The standards can be injected with or without acetylation, which allows the analyst, for example, to identify acetylation problems should they occur. QC STD MIX data are given in Table 3.

Acetate STD

The acetate STD has always been part of our IRMS confirmation procedure, but was previously used only to check and set the retention times of the compounds, to estimate the GC chromatographic performance (peak tailing and separation) and to evaluate the δ13C value of our IS, since a deviating δ13C value could indicate problems in the combustion process (see section “System stability”). When our IRMS method was launched, we assumed that if the δ13C value of the IS was within specifications, the GC-C-IRMS was also working properly for δ13C determination of the other compounds (provided there were no chromatographic issues, impurities,…). However, on some occasions, the δ13C value of one or more acetylated standards started to deviate from its normal value, and the deviation progressed over time, whereas the δ13C values of the IS and the other acetylated standards remained correct, indicating that incomplete combustion was not the problem. All compounds were still baseline separated, there was no peak tailing or other chromatographic issue, and no impurities were visible. In these cases, replacing the GC column solved the issue and restored the δ13C values to normal. We suspect that as the GC column ages, (undetectable) column bleed eluting at specific retention times starts to influence the combustion of compounds eluting at those retention times, resulting in deviating δ13C values. In our experience, a GC column rarely lasts more than 600–700 injections for GC-C-IRMS purposes. However, to minimize costs, we transfer such an “old” column to our GC-MS instrument, where it still fulfils the chromatographic requirements.

In this light, measuring the δ13C values of all compounds in our acetate STD greatly facilitates troubleshooting. Correct δ13C values for all compounds in the acetate STD show that the GC-C-IRMS is working properly for CIR determination of all TCs and ERCs, and not only of the IS. In the past, when deviating CIRs were obtained for QC Neg and QC Pos, we could not tell whether the problem lay in the sample preparation or in a malfunctioning GC-C-IRMS instrument. In our experience, 90 % of deviating CIRs are caused by an “old” GC column. The remaining 10 % had various causes: once an HPLC column that had exceeded its lifetime, once a broken combustion reactor, and once a malfunctioning IRMS detector for which an electronic control board had to be replaced.
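The troubleshooting reasoning above can be condensed into a small decision function. This is a hypothetical, simplified sketch of the logic described in the text, not a formal laboratory procedure: it assumes each control can be summarized as simply "within specifications" or not, and the function name and returned labels are illustrative.

```python
def likely_source(is_ok: bool, acetate_std_ok: bool, qcs_ok: bool) -> str:
    """Hypothetical triage of deviating δ13C values based on which controls fail."""
    if is_ok and acetate_std_ok and qcs_ok:
        return "no problem detected"
    if not is_ok:
        # A deviating internal standard points at the combustion process.
        return "combustion (check the reactor and oxidation)"
    if not acetate_std_ok:
        # Acetate STD deviates while the IS is fine: instrument side, most often an aged GC column.
        return "instrument (most often an old GC column)"
    # Instrument checks pass but QC Neg/QC Pos deviate: look at the sample preparation.
    return "sample preparation (e.g. HPLC column, fraction windows)"


print(likely_source(is_ok=True, acetate_std_ok=False, qcs_ok=False))
```

The order of the checks reflects the argument in the text: the acetate STD is injected directly, so it isolates the GC-C-IRMS from the sample preparation.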

Some laboratories include additional quality controls, for example to evaluate the efficiency of the hydrolysis or the derivatization. Because problems during hydrolysis or derivatization would be detected by our current quality controls, we have chosen not to add such controls to our already extensive method, and we consider the presented strategy sufficient.

Uncertainty

Since 2014, WADA has required a maximum combined measurement uncertainty (uc) of 1 ‰ [17]. To the best of our knowledge, only one publication has dealt with detailed uncertainty calculations for IRMS purposes [29]. Those calculations, however, focused on certifying a reference material rather than on the combined measurement uncertainty of an IRMS method. The basic rule for calculating the combined uncertainty is the “square root of the sum of the squares” rule. Uncertainty components u(x1)…u(xn) are combined as shown in the following equation [30]:

$$ {u}_c=\sqrt{u{\left({x}_1\right)}^2+u{\left({x}_2\right)}^2+\dots +u{\left({x}_n\right)}^2} $$

For GC-C-IRMS, there are no clear rules on how these separate uncertainty components should be defined. Consequently, doping control laboratories use different methodologies to calculate their uc: some laboratories have no problem keeping their uc below 1 ‰, whereas others have great difficulty. In such cases, the apparent difference in combined uncertainty is not necessarily related to the quality of the GC-C-IRMS method but to a difference in calculation methodology. One possible methodology is the following (Table 4):

$$ {u}_c=\sqrt{u{\left(\mathrm{calibration}\ \mathrm{bias}\right)}^2+u{\left(\mathrm{reference}\ \mathrm{value}\right)}^2 + u{\left(\mathrm{method}\ \mathrm{precision}\right)}^2} $$
Table 4 Combined uncertainty calculation according to the square root of the sum of the squares rule

The calibration bias is the difference between the certified SIS value and the measured SIS value, as calculated in Table 2 (−0.15 ‰; the largest bias, disregarding 11-oxoEt-Ac). The second uncertainty component is the SD of the certified SIS value as given in Table 2 (0.14 ‰; again the largest value). The largest contribution to uc is the method precision, which can, for example, be defined as the SDs for QC Neg (Table 3). As mentioned in the section “QC Neg and QC Pos”, these SDs depend not only on the compound but also on the sample (and matrix). Laboratories that use spiked steroid-stripped urine (or even water) instead of real urine to calculate this component will most likely obtain lower SDs and hence a lower combined measurement uncertainty. In our case, applying this methodology with the largest QC Neg SD (0.81 ‰ for ααβ-diol) results in a maximum uc of 0.84 ‰.
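The worked example can be reproduced with the root-sum-of-squares rule. This sketch uses the three component values quoted above (the bias enters as its absolute value, and 0.81 ‰ is the largest QC Neg SD, for ααβ-diol); it assumes, as the rule requires, that the components are independent. The function name is hypothetical.

```python
from math import sqrt


def combined_uncertainty(*components: float) -> float:
    """Root sum of squares of independent standard-uncertainty components (per mil)."""
    return sqrt(sum(u**2 for u in components))


calibration_bias = 0.15  # |largest bias| vs certified SIS value, 11-oxoEt-Ac disregarded (Table 2)
reference_value = 0.14  # largest SD of a certified SIS value (Table 2)
method_precision = 0.81  # largest QC Neg SD, ααβ-diol (Table 3)
u_c = combined_uncertainty(calibration_bias, reference_value, method_precision)
print(f"u_c = {u_c:.2f} per mil")  # u_c = 0.84 per mil
```

Method precision dominates: it alone contributes 0.81² ≈ 0.66 of the total 0.70 under the square root, which is why the choice of matrix for this component matters so much.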

In general, the u c will contain contributions from:

  1) Precision of measurements
  2) Bias of experimental processes
  3) The uncertainty of CIR in reference materials used to fix and normalize the δ-scale
  4) The algorithms applied to correct and normalize the data [23].

The root-sum-of-squares rule, however, requires the separate uncertainty components to be independent. IRMS works with values normalized to VPDB, so the uncertainty of a normalized sample result (δtrue(sample)) has contributions from the precision of the measurements of the reference standards and of the sample, and from the uncertainty of the reference values. Because these contributions are correlated, determining the uncertainty of δtrue(sample) is complicated. Fortunately, the correlation can be circumvented by calculating the uncertainty directly, as described below [23].

The equation for calculating δtrue(sample) can be written as:

$$ {\delta}_{\mathrm{true}\ \left(\mathrm{sample}\right)}={\delta}_{\mathrm{true}\ \left(\mathrm{R}\mathrm{S}1\right)}+\left[\left({\delta}_{\mathrm{raw}\ \left(\mathrm{sample}\right)}-{\delta}_{\mathrm{raw}\ \left(\mathrm{R}\mathrm{S}1\right)}\right)\times \left(\frac{\left({\delta}_{\mathrm{true}\ \left(\mathrm{R}\mathrm{S}1\right)}-{\delta}_{\mathrm{true}\ \left(\mathrm{R}\mathrm{S}2\right)}\right)}{\left({\delta}_{\mathrm{raw}\ \left(\mathrm{R}\mathrm{S}1\right)}-{\delta}_{\mathrm{raw}\ \left(\mathrm{R}\mathrm{S}2\right)}\right)}\right)\right] $$

RS1 and RS2 are the reference standards used for normalization, with their measured and true values. In our GC-C-IRMS method, CU-USADA 33-1 provides four RSs; RS1 represents the steroid with the highest CIR and RS2 the steroid with the lowest CIR (or vice versa), so that the entire calibration range is covered. δtrue is the true CIR relative to VPDB; δraw is the non-normalized value.
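The two-point normalization equation above translates directly into code. This is a minimal sketch using only RS1 and RS2, as in the equation; it does not reproduce whatever multi-point procedure the IRMS software applies with all four RSs. The function name and all δ values are hypothetical.

```python
def normalize(raw_sample: float, raw_rs1: float, raw_rs2: float, true_rs1: float, true_rs2: float) -> float:
    """Two-point normalization of a raw δ13C value to the VPDB scale (per mil)."""
    # Scale factor mapping raw differences onto the true (VPDB) scale.
    stretch = (true_rs1 - true_rs2) / (raw_rs1 - raw_rs2)
    return true_rs1 + (raw_sample - raw_rs1) * stretch


# Hypothetical values: RS1 true -20.0 / raw -19.0, RS2 true -30.0 / raw -29.5.
print(round(normalize(-25.0, -19.0, -29.5, -20.0, -30.0), 2))  # -25.71
```

By construction, a sample with the raw value of RS1 or RS2 is mapped exactly onto that standard's true value, and the scale is stretched linearly in between.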

Because the inputs enter this equation non-linearly and several of them appear more than once, the simple rules for combining uncertainties cannot be used, and the spreadsheet-based calculation described by Kragten is the most straightforward approach [31]. The Kragten spreadsheet for our IRMS method is presented in Table 5, and a clear explanation of how to set up such spreadsheets is given by Carter and Barwick [23].
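The Kragten approach can be sketched in a few lines: each input is shifted by its standard uncertainty while the others are held fixed, the resulting change in the output is taken as that input's contribution, and the contributions are combined by root sum of squares. Because the normalization equation is evaluated directly from its raw inputs, the correlation between terms that share RS values is handled automatically. This is a minimal illustration, not our Table 5 spreadsheet; the function names, input values and uncertainties are hypothetical.

```python
from collections.abc import Callable
from math import sqrt


def normalize(raw_sample: float, raw_rs1: float, raw_rs2: float, true_rs1: float, true_rs2: float) -> float:
    """Two-point normalization of a raw δ13C value to the VPDB scale (per mil)."""
    return true_rs1 + (raw_sample - raw_rs1) * (true_rs1 - true_rs2) / (raw_rs1 - raw_rs2)


def kragten(f: Callable[..., float], values: list[float], uncertainties: list[float]) -> tuple[float, float]:
    """Kragten spreadsheet: shift one input at a time by its standard uncertainty.

    The change in f produced by each shift is that input's uncertainty contribution;
    the contributions are combined as the root sum of squares.
    """
    y0 = f(*values)
    contributions = []
    for i, u in enumerate(uncertainties):
        shifted = list(values)
        shifted[i] += u
        contributions.append(f(*shifted) - y0)
    return y0, sqrt(sum(c**2 for c in contributions))


# Hypothetical inputs (per mil): raw sample, raw RS1, raw RS2, true RS1, true RS2.
values = [-25.0, -19.0, -29.5, -20.0, -30.0]
uncertainties = [0.80, 0.15, 0.15, 0.10, 0.10]
delta_true, u_c = kragten(normalize, values, uncertainties)
print(f"delta_true(sample) = {delta_true:.2f} per mil, u_c = {u_c:.2f} per mil")
```

Swapping which CU-USADA 33-1 steroids serve as RS1 and RS2, or which compound is the sample, only changes the `values` and `uncertainties` lists, which is how the four sheet variants described below differ.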

Table 5 Combined uncertainty calculation with a Kragten spreadsheet. A and B: RS1 = 11-oxoEt-Ac, RS2 = A-Ac; C and D: RS1 = 5α-Chol, RS2 = A-Ac. A and C: sample = ααβ-Ac2; B and D: sample = 11-oxoEt-Ac. The calculated u c is given in italics

In Kragten sheet A, RS1 and RS2 are 11-oxoEt-Ac and A-Ac from CU-USADA 33-1, respectively. Their δtrue values and uncertainties can be found in Table 2; δraw and its uncertainty were obtained from the same measurements as in Table 2, but before normalization with the reference value. For “sample”, ααβ-Ac2 from the QC Neg measurements was chosen (Table 3) because this compound had the largest SD. Its δraw was extracted from the data and its uncertainty calculated. The Kragten sheet then calculates δtrue(sample) and its uc. Note that δtrue(sample) is the value for ααβ-Ac2, whereas the value given in Table 3 is the corrected value for ααβ-diol. The original uncorrected value for ααβ-Ac2 generated by the IRMS software was −28.88 ‰, a difference of 0.65 ‰ from the value calculated by the Kragten sheet. This difference is probably due to the IRMS software using all four steroids in CU-USADA 33-1; in addition, we do not know exactly how its conversion algorithm works.

In Kragten sheet B, ααβ-Ac2 has been replaced by 11-oxoEt-Ac from QC Neg (lowest SD). The original non-corrected value for 11-oxoEt-Ac was −25.75 ‰, meaning that there is a difference of 0.79 ‰ between the value calculated by the Kragten sheet and the value generated by the IRMS software.

In Kragten sheets C and D, 11-oxoEt-Ac has been replaced by 5α-Chol as RS1. As mentioned in the section “Calibration”, 11-oxoEt-Ac showed atypical behaviour, so replacing it with 5α-Chol should give a better representation of uc. 5α-Chol was chosen because it has the second highest CIR in CU-USADA 33-1. In sheet C, “sample” is again ααβ-Ac2, and in sheet D it is 11-oxoEt-Ac. The difference between δtrue(sample) calculated by the Kragten sheet and the value generated by the IRMS software now drops to 0.29 and 0.17 ‰ for ααβ-Ac2 and 11-oxoEt-Ac, respectively.

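In all four sheets, uc remains below 1 ‰ (0.96, 0.72, 0.80 and 0.44 ‰ for sheets A, B, C and D, respectively). Sheet C, however, gives the most representative value, as it avoids the atypical 11-oxoEt-Ac as RS and uses the compound with the largest QC Neg SD as sample.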

Routine doping samples

Table 6 gives an overview of the number of samples analysed from February 2011 until May 2014. Of the 609 samples, 97 (15.9 %) were from female athletes. Eight samples (1.3 %) were positive, all from male athletes. From 2011 until 2013, 40 % of all IRMS investigations were due to an elevated T/E ratio (>4). In 2014, almost 80 % of all IRMS requests were triggered by a T/E ratio above 4. This increase was caused by the new WADA regulations, which require urine samples with a T/E above 4 in the screening to be analysed by IRMS (after approval of the corresponding sports federation). Under the old WADA rules, an elevated T/E (>4) first had to be confirmed with a second, independent analysis before the corresponding sports federation was asked whether IRMS should be performed on the suspicious sample [32]. Taking the measurement uncertainty of the T/E confirmation analysis into account, this meant that only T/E ratios above 4.43 were labelled as suspicious.

Table 6 Overview of routine samples confirmed by GC-C-IRMS

The highest T/E that resulted in a negative IRMS result was 18.5. For this sample, all Δδ13C values were between −0.2 and 0.4 ‰ and all δ13C values were above −23.4 ‰, making a false negative unlikely, as the most enriched synthetic preparation ever reported had a CIR of −22.9 ‰ [4, 5, 33, 34]. All samples with T/E > 19 were due to anabolic steroid abuse (Table 7). One sample with a T/E below 4 (T/E = 2.4) led to an adverse analytical finding; all other adverse analytical findings had T/E ratios between 10 and 83. From 2011 until 2013, 30 % (195 out of 609) of all IRMS requests were triggered by a T/E between 4 and 6; in 2014 this proportion increased to 50 % (37 out of 72), but in four years none of these samples exhibited Δδ13C values above the WADA IRMS threshold. These data support the proposal of Mareck et al. to raise the T/E threshold to 6 and to recommend IRMS analysis only for samples with T/E > 6 [19]. This would considerably reduce the time and money spent on negative IRMS confirmations and increase the overall efficiency of the IRMS confirmation procedure.

Table 7 Steroid profile and CIR of the positive samples

WADA has recently implemented the steroidal module of the athlete biological passport (ABP) [35]. The steroidal module monitors selected urinary steroid concentrations of an athlete over time to detect steroid doping. As the number of test records increases, the ABP progressively adapts the calculated upper and lower reference limits, evolving from population-based thresholds (i.e. T/E > 4) towards individual thresholds and thus a more sensitive criterion [36]. Over time, the ABP will change the types of samples submitted for IRMS analysis: more samples with T/E < 4 will undergo IRMS analysis, while some samples with T/E > 4 will no longer require it. This fundamental change will strongly influence IRMS analysis in the future.

Table 7 gives an overview of all positive samples. These adverse analytical findings were conclusive and in accordance with the WADA guidelines. Other samples were suspicious, with one or more values close to the 3 ‰ threshold. Nonetheless, these cases did not fulfil the criteria and were reported as negative.

As pointed out in our previous publication, βαβ-diol and ααβ-diol are more sensitive parameters than A and Et [20]. This can be attributed to lower endogenous dilution, longer detection times and a higher biological sensitivity due to their position in the metabolic pathway. In all positive samples, at least one of the androstanediols had a higher Δδ13C value than Et and A, and at least one of the androstanediols had a Δδ13C value above 3 ‰. In sample number 4, for example, neither A nor Et has a Δδ13C value above 3 ‰, whereas ααβ-diol does. This confirms that measuring only A and Et, without the androstanediols, compromises the effectiveness of the IRMS confirmation.
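The effect of omitting the androstanediols can be illustrated with a simplified version of the 3 ‰ criterion. This sketch assumes PD as the ERC and defines Δδ13C as δ13C(ERC) − δ13C(TC). A sample is flagged when any listed TC exceeds 3 ‰. The sketch ignores measurement uncertainty and the other conditions of the WADA technical document. The δ13C values are hypothetical, not those of sample 4.

```python
THRESHOLD = 3.0  # Δδ13C threshold in per mil, as cited in the text


def delta_values(erc: float, tcs: dict) -> dict:
    """Return Δδ13C = δ13C(ERC) − δ13C(TC) for each target compound (per mil)."""
    return {name: erc - d13c for name, d13c in tcs.items()}


def exceeds(deltas: dict, compounds, threshold: float = THRESHOLD) -> bool:
    """True if any of the listed target compounds has Δδ13C above the threshold."""
    return any(deltas[name] > threshold for name in compounds)


# Hypothetical δ13C values (per mil) for one urine sample, PD assumed as ERC
erc_pd = -21.0
tcs = {"A": -23.6, "Et": -23.9, "ααβ-diol": -24.8, "βαβ-diol": -23.8}
deltas = delta_values(erc_pd, tcs)

print(exceeds(deltas, ["A", "Et"]))  # False: A and Et alone stay below 3 ‰
print(exceeds(deltas, list(tcs)))    # True: ααβ-diol exceeds 3 ‰
```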

Table 8 gives an overview of the collected data for the negative urine samples. The data for T and EpiT have been omitted because these TCs were only measured in cases that had already yielded suspicious IRMS results. Although these samples were eventually reported as negative, the likelihood of false negatives among them is high. For the remaining TCs and ERCs, the occurrence of false negatives cannot be excluded either.

Table 8 Overview of CIR obtained for the negative routine samples

In a reference population of 56 subjects (36 males and 20 females), Flenker et al. showed that the difference between PD and A scattered approximately symmetrically around zero. In contrast, Et was on average about 1 ‰ more depleted than PD and A [7]. The same phenomenon is observed in our data. The effect was attributed to isotopic fractionation between the 5α- and 5β-pathways, in which the 5β-compound becomes more depleted. Kinetic isotope effects originate from rate limitation, branching or a combination of both. Reduction of the A-ring is rate limiting and generates two branches: 5α- and 5β-metabolites [37]. However, our data show only a marginal difference (0.1 ‰) between the average CIRs of ααβ-diol and βαβ-diol. The average CIRs of Et, ααβ-diol and βαβ-diol are very similar to each other. The average CIRs of A and PD are also very similar to each other, but about 1 ‰ more enriched than those of the androstanediols and Et. Piper et al. analysed a population twice, once with an earlier and once with an optimized HPLC-FC method [25]. Interestingly, with the earlier HPLC-FC, a difference of about 1 ‰ between ααβ-diol and βαβ-diol was observed. With the optimized HPLC-FC, the average CIRs of Et, ααβ-diol and βαβ-diol were very similar to each other, as in our population, and A and PD were again about 1 ‰ more enriched. This illustrates that observed differences between laboratories can have analytical causes.

In a study by Cawley et al. with a reference population of 167 subjects, the average CIR of Et was 0.3 ‰ more depleted than that of A [6]. With 11-oxoEt as ERC, their average Δδ13C values for Et and A were 2.3 and 2 ‰, whereas our population yielded 0.2 and −0.3 ‰, respectively. The data of Cawley et al. suggest that 11-oxoEt is enriched by about 2 ‰ compared with the TCs. This makes it less suitable as an ERC in combination with the WADA 3 ‰ threshold. In our population, however, 11-oxoEt is suitable as an ERC, as it is on average 0.8 ‰ more depleted than PD.
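The practical consequence can be expressed as the margin between the 3 ‰ threshold and the average Δδ13C obtained with 11-oxoEt as ERC. The calculation uses the averages quoted above (values in ‰):

$$ \begin{aligned} \text{Cawley et al.:}\quad & 3 - 2.3 = 0.7\ (\mathrm{Et}), \qquad 3 - 2 = 1\ (\mathrm{A})\\ \text{our population:}\quad & 3 - 0.2 = 2.8\ (\mathrm{Et}), \qquad 3 - (-0.3) = 3.3\ (\mathrm{A}) \end{aligned} $$

A smaller margin means that less natural variation is needed to bring a negative sample close to the threshold.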

At this stage, it is unclear how the similarities and differences between laboratories and populations can be explained consistently. Their origin could be biological or physiological and related to the population. It could also be analytical, resulting in (small) deviations between laboratories. In this context, analysis of an identical small population (10–20 samples) by different laboratories might provide clarification. In any case, every doping control laboratory clearly needs to verify the reference population limits for its relevant TCs and ERCs with its own reference population. Table 9 gives an overview of the collected data for the negative reference population. These values (average and SD) are very close to those obtained for the negative routine samples.
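The following is a minimal sketch of the comparison between the routine and reference populations summarised in Tables 8 and 9. It assumes that each table reports the average and sample standard deviation for a given compound or ERC/TC pair. The function names and the Δδ13C values are hypothetical.

```python
from statistics import mean, stdev


def summarise(values):
    """Average and sample SD of a set of Δδ13C values (per mil)."""
    return mean(values), stdev(values)


def compare(routine, reference):
    """Differences in average and SD between routine and reference populations."""
    m1, s1 = summarise(routine)
    m2, s2 = summarise(reference)
    return m1 - m2, s1 - s2


# Hypothetical Δδ13C values (per mil) for one ERC/TC pair
routine = [0.4, -0.2, 0.9, 0.1, 0.6, -0.5, 0.3]
reference = [0.2, 0.5, -0.1, 0.7, 0.0, 0.4]
d_mean, d_sd = compare(routine, reference)
print(f"difference in average: {d_mean:+.2f} ‰, difference in SD: {d_sd:+.2f} ‰")
```

A formal verification would also need a decision rule, such as a statistical test or an acceptance limit. This sketch only computes the descriptive statistics that such a rule would use.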

Table 9 Overview of CIR obtained for the negative reference population samples

Conclusions

Our GC-C-IRMS confirmation method was applied to 609 routine doping samples. The data were investigated, presented and interpreted, covering different areas relevant to doping control. Special emphasis was given to calibration, quality control and measurement uncertainty in light of the 2014 WADA technical document. The paper shows that IRMS results remain consistent after switching to the 2014 WADA technical document and presents a straightforward approach to calculating the measurement uncertainty.

The method revealed eight cases of endogenous anabolic steroid abuse. The gathered data show that using a T/E threshold of 6 instead of 4 would increase the efficiency of the IRMS confirmation procedure. However, the introduction of the steroidal module of the ABP will most likely bring about a fundamental change in the types of samples that require IRMS analysis in the future. In addition, we demonstrated the reliability and validity of the GC-C-IRMS confirmation protocol.