Introduction

The European Commission has been involved in the production and distribution of certified reference materials (CRMs) for more than 30 years [1]. The European Commission's Institute for Reference Materials and Measurements (IRMM) has become established as an internationally recognised provider of reference materials for a broad variety of measurements. Since January 2003, all Community reference materials activities (which have become known as the "BCR-Programme") have been located at IRMM.

The current range of CRMs offered covers, among others, food safety including labelling and authenticity, environmental monitoring, occupational hygiene, clinical chemistry, physicochemical properties, industrial raw materials and products as well as nuclear-related areas and pure substance standards. In the course of the last three decades, considerable advances have been made, not only in the analytical science behind the various measurements, but also in terms of the requirements for new reference materials and the technological know-how to produce them.

Parallel to the increased technical feasibility of CRM projects, quality management requirements for CRM producers have grown as well [2]. In this context, the introduction of the "Guide to the Expression of Uncertainty in Measurement" (GUM) [3] also triggered a discussion of the uncertainty evaluation of certified values [4, 5, 6, 7, 8, 9]. This, and the awareness that the production of reference materials must be subject to at least the same quality requirements as the measurements controlled by them, also resulted in a revision of the respective ISO Guides for RM producers.

While in the early times of the Commission's Reference Materials program it was stated that materials like "fresh strawberries" were neither feasible nor meaningful [10], reality today looks very different. Fresh materials are no longer mere science fiction. Materials like "BCR-718—fresh herring" or "IRMM/IFCC-451—cortisol in fresh frozen human serum" illustrate this tendency. One of the consequences is of course the need to monitor carefully the stability of those CRMs that are more sensitive to various changes.

In this paper, we describe some trends in the production of CRMs and their implications for the efforts devoted to ensuring CRM stability. We give a short overview of the impact on processing, packaging and storage, combined with the requirements for the statement of verifiable shelf lives and continuous stability monitoring.

Trends in CRM production

Materials in their natural form

The nature of biological and environmental CRMs has changed considerably since their first introduction 30 years ago. While materials such as Bowen's Kale [11], SRM-1566 (Oyster Tissue), IAEA's SP-M-1 (Sea plant) or BCR-063 (milk powder) were highly processed materials that made a strong compromise between the requirements of homogeneity and stability and the closeness to a real analytical sample (fresh cabbage, real oysters, plants or milk), today's trend is towards materials for which analytes and matrix are processed as little as possible [12]. This tendency is often required by the nature of the analyte, as discussed elsewhere for the case of speciation analysis [13]. Two fish materials can illustrate this change: BCR-422 (Cod), which was produced in the late 1980s, is a dried powder, whereas the recently released BCR-718 is a fresh canned herring. This trend can also be observed in the case of some clinical materials: the formerly produced lyophilised proteins (e.g. BCR-457, thyroglobulin) are increasingly replaced by frozen sera certified for the property of interest, for example IRMM-451, a set of frozen sera certified for their cortisol content. The downside of greater closeness to reality is an increased danger of degradation and, of course, a higher possibility of within-bottle and between-bottle inhomogeneity. This tendency towards little-processed materials will require more detailed instructions to CRM users than in the past.

Analytes at their natural concentration level

The development towards preservation of the matrix in as natural a state as possible is accompanied by the tendency to release materials whose analyte concentration is closer to frequently observed concentrations rather than exceptionally high ones. Thus, more and more "sets of CRMs" are produced, either as one CRM containing different concentration levels (BCR-614, set of PCDD/F-dioxins calibration standards in nonane) or as a series of independent CRMs (e.g. BCR-628, BCR-630 and BCR-631, normal and abnormal plasmas for prothrombin time) covering a property range, as requested by Emons et al. [14]. In the case of some complex analytes for which matrix interferences may disturb the analytical response, this development results in the production of "blank" materials certified for the "absence" (i.e. below a stated limit of detection) of the analyte of interest, for example BCR-695, BCR-697 and BCR-706 (Chlortetracycline in pig liver, muscle and kidneys). As lower analyte levels are usually more difficult to quantify, they also make the assessment of stability more difficult.

New certified properties

Higher sophistication both in analytical instrumentation (MALDI-TOF, real-time PCR, etc.) and in RM production allows the realisation of reference materials for novel analytical challenges such as the detection of prions or GMOs. Similarly, the investigation of total element contents, for example in the context of environmental monitoring, is increasingly complemented by the determination of "assessment relevant" parameters such as the speciation of metals. New materials for extractable trace elements in soil such as BCR-701 or for organometallic species such as BCR-646 (organotins in freshwater sediment) underpin this trend. A similar evolution can be observed for "method-specific" certifications and "method-defined" values, as in the case of IRMM-443 (adsorption parameters in soils). The limited amount of experience with these materials makes a priori assumptions about their stability difficult.

Prevention of degradation

The changes in the CRMs produced have a considerable impact on the preservation efforts of CRMs. Preservation, as it is understood by IRMM, consists of two aspects: firstly, it comprises all efforts to prevent degradation; secondly, it consists of the measures taken to detect degradation for those cases where even the most cautious prevention failed.

More cautious processing

In the past, the choice of processing steps (milling, sieving etc.) was mainly based on the requirement to obtain a material (usually a powder) of a certain consistency (particle size etc.). The main purpose of processes like cryogrinding was to facilitate homogenisation of fatty or moist tissues (meat, vegetables) and not to prevent degradation. Homogenisation was preferably carried out using slurry techniques to prevent clogging of particles. These processes were well suited for the preparation of materials certified for reasonably robust analytes like Pb or vitamin B, but problems may be encountered for some novel materials as exemplified by the first generation of GMO CRMs. The materials were produced by slurry mixing GM and non-GM material with subsequent drying of the mixture. This process provided excellent homogeneity, but led to severe fragmentation of DNA, possibly by activation of enzymes. In this specific case, no harm was done as the materials were intended to harmonise immunochemical methods for the detection of GM material using the specific proteins, but the processing method has proven less suitable for materials to be used for DNA-based detection methods. To overcome this problem, a dry-mixing technique was employed for later generations of GMO CRMs [15]. This technique offers better protection against degradation but increases the possibility of inhomogeneity. It can be expected that this kind of trade-off will be encountered more frequently in the future.

Improved packaging

Choice of suitable containment and packaging also receives greater attention now than in earlier times. Whereas not much effort was made to provide an especially inert environment for CRMs certified for trace elements in fly ash, bottling under inert gas (Ar, N2) and selecting containers that are as tight as possible (preferably ampoules, otherwise vials with septa rather than screw-cap bottles without insert) have become normal practice. This improves stability at the distributor but has implications for the stability at the customers' premises: once deprived of the protection of the unopened containment, a material can degrade at a much higher rate than in the unopened bottle. Also for that reason, CRM producers cannot guarantee the stability of a material outside their premises for longer periods.

Changed storage conditions

The developments outlined above also caused a significant evolution in the storage conditions and requirements of these new CRMs. Figure 1 gives an overview of the development of storage conditions at IRMM over the past two decades. It shows that most of the materials produced before 1990 could be stored at room temperature. Since then, the trend has been towards lower temperatures, with storage even well below −20 °C replacing storage at +4 and −20 °C since 1995. At the moment, about 20 % of the materials at IRMM need to be stored at −20 °C or below.

Fig. 1 Change in storage conditions at IRMM

Change in dispatch conditions

CRMs that need more careful storage also require more care during transport. It is therefore not surprising to see that 10 % of the materials stored at IRMM require a cooled dispatch using cooling elements or even dry ice depending on the dispatch time. Even more stringent dispatch conditions (e.g. liquid nitrogen) might be expected for the future.

As dispatching CRMs becomes more sophisticated, the problems of providing an adequate distribution system increase. Our experience shows that it is not easy to find suitable courier services that accept the shipment of these sensitive, sometimes hazardous materials (in general, courier services refuse the transportation of perishable goods). Ensuring that dispatch does not take too long puts considerable strain on logistics and therefore makes it expensive. For example, one has to avoid dispatch before weekends, as this may lead to longer dead-times, and to find courier services willing to replenish the dry ice used for cooling. Longer delivery times are an unavoidable consequence of all these preparations. Last but not least, the export of these goods across borders is often hampered by the same technical trade barriers that the CRM should help to overcome. As an example, customs authorities recently refused the "import" of a sample of BCR-178 (ammonium nitrate fertiliser), because the use of such a fertiliser was banned in the respective country.

Detection of degradation

According to ISO-Guide 34, effects of "light, moisture, heat and time shall be quantified in order to provide advice on storage location and life-span (and hence a suitable shelf-life/expiry date)" [2] (emphasis by the authors). At IRMM, CRMs are stored in closed vessels in the dark at controlled temperature and humidity, so the main parameter of interest is time. The increased awareness of the importance of stability testing is illustrated by the fact that the current version of ISO-Guide 35 [16] does not even mention stability testing, whereas the new draft has an entire chapter dedicated to this topic [17]. Ensuring stability might be an easy task in the case of, for example, BCR-032 (Moroccan phosphate rock), for which stability for several thousand years is proven by the mere existence of the rock, but is less obvious in cases such as BCR-485 (vitamins in mixed vegetables). Therefore, the need for stability testing and the necessity to state a shelf life increase the more sensitive a CRM becomes.

Statement of shelf lives

Expiry dates and shelf lives of CRMs do not refer to the material itself but to the time for which the certified value and its uncertainty are guaranteed by the CRM producer. Consequently, they should be named "Expiry dates of the Certificate". Practically speaking, this means that potential degradation does not change the certified value and its uncertainty until the expiry date. The possibility of degradation can never be totally excluded, a statement that is equivalent to saying that stability can never be proven without uncertainty.

CRM producers can set the shelf life to a date until which potential degradation is thought to be negligible. Using the GUM as a guideline, this means that potential degradation is less than 1/3 of the largest other contribution to the uncertainty of the CRM property value. In theory, this would allow the setting of a shelf life even for real-life stability studies that have non-zero uncertainties. Experience at IRMM shows that this approach, although seemingly elegant, is practically inapplicable, as the uncertainty of the stability study is usually in the range of the other uncertainty contributions. The problem here is that the stability of a CRM can never be proven without uncertainty, while the uncertainty of a CRM should always be smaller than the uncertainty of an individual measurement to make the material useful. Let us assume a certified value of 23.3 ± 0.3 mg L⁻¹ and a stability test with a result of 21.7 ± 1.8 mg L⁻¹. This test is in total agreement with the certified value, but it also does not exclude the possibility of degradation to a concentration of 19.9 mg L⁻¹, which is far outside the certified range. In principle, the expiry date of the certificate cannot be prolonged on the basis of this test without changing the certified uncertainty.
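The numerical example above can be retraced with a few lines of Python. This is only a minimal sketch: it assumes that the ± values can be read as simple interval half-widths, and the function names are illustrative, not part of any IRMM procedure.

```python
def interval(value, half_width):
    """Return (lower, upper) for a result stated as value ± half_width."""
    return (value - half_width, value + half_width)


def overlaps(a, b):
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_negligible(u_degradation, other_contributions):
    """1/3 criterion from the text: negligible if below one third of the
    largest other uncertainty contribution."""
    return u_degradation < max(other_contributions) / 3.0


# Values taken from the example in the text (mg/L).
certified = interval(23.3, 0.3)
stability_test = interval(21.7, 1.8)

agrees = overlaps(certified, stability_test)
lowest_not_excluded = stability_test[0]
outside_certified_range = lowest_not_excluded < certified[0]

print(f"test agrees with certified value: {agrees}")
print(f"lowest concentration not excluded: {lowest_not_excluded:.1f} mg/L")
print(f"that value lies below the certified range: {outside_certified_range}")
```

The sketch shows both halves of the argument: the intervals overlap, yet the lower end of the test interval lies well below the certified range.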

IRMM uses a different approach to set shelf lives, which is in agreement with the requirements of ISO-Guide 34 and has been extensively described elsewhere [18]. The possibility of degradation is quantified as an uncertainty of stability, and this value is added to the other uncertainty contributions. This generates an allowance not only for unknown degradation, but also for confirmation of stability without changing the uncertainty of the material. The stability after certification is confirmed by performing stability monitoring.
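How such an allowance enters the certified uncertainty can be sketched as follows. The sketch assumes that the contributions are independent standard uncertainties and are combined as a root sum of squares, as the GUM prescribes for uncorrelated inputs. The component names and the numbers are hypothetical and not taken from any certificate.

```python
import math


def combined_uncertainty(u_characterisation, u_homogeneity, u_stability=0.0):
    """Root-sum-of-squares combination of independent standard uncertainties."""
    return math.sqrt(u_characterisation ** 2 + u_homogeneity ** 2 + u_stability ** 2)


# Hypothetical standard uncertainties, all in the unit of the certified value.
u_char = 0.10   # batch characterisation
u_hom = 0.05    # between-unit homogeneity
u_stab = 0.08   # allowance for possible degradation until the expiry date

u_without = combined_uncertainty(u_char, u_hom)
u_with = combined_uncertainty(u_char, u_hom, u_stab)

print(f"without stability contribution: {u_without:.3f}")
print(f"with stability contribution:    {u_with:.3f}")
```

Including the stability term makes the stated uncertainty larger from the start, which is what leaves room for later monitoring results to confirm the value without a revision of the certificate.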

In this context it should be mentioned that the overall uncertainty of a CRM increases with the measurement uncertainty (including that of the stability study). This calls for careful planning of stability testing, with a sufficient number of replicates, already in the preparation phase of the CRM. In cases where materials have been produced without inclusion of an uncertainty contribution from the stability study, this may even lead to a revision of certified values once appropriate and sufficient stability data are available.

Real-temperature versus accelerated degradation studies

As the discussion above already indicated, the statement of a shelf life requires quantification of potential instability. This quantification is usually based on stability studies. Establishing shelf lives using stability data obtained by storing the materials at their real storage temperature requires considerable time, as stability studies usually last 2 years or longer. The use of "accelerated stability studies", in which the material is exposed to higher temperatures for short times and a degradation rate is often extrapolated to the chosen storage temperature using the Arrhenius equation, has therefore been proposed [19]. IRMM decided not to use this approach because it relies, in our view, on doubtful assumptions about reaction mechanisms and causes problems when statistically valid concepts for the estimation of uncertainties are to be applied.
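For illustration, the extrapolation step of an accelerated study can be sketched as below. The rate constants are synthetic, generated from an assumed single Arrhenius mechanism with hypothetical parameters, so the fit recovers them exactly. Real data carry no such guarantee, which is precisely the assumption questioned in this paper.

```python
import math

R = 8.314  # gas constant in J mol^-1 K^-1


def fit_arrhenius(temps_kelvin, rates):
    """Least-squares fit of ln k = ln A - Ea / (R T); returns (ln A, Ea)."""
    xs = [1.0 / t for t in temps_kelvin]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    ln_a = y_mean - slope * x_mean
    return ln_a, -slope * R


def rate_at(ln_a, ea, temp_kelvin):
    """Rate constant predicted by the Arrhenius equation."""
    return math.exp(ln_a - ea / (R * temp_kelvin))


# Hypothetical parameters of the assumed mechanism (not from any real study).
TRUE_LN_A, TRUE_EA = 25.0, 80_000.0

# "Measured" rates at 40, 60 and 80 degrees C, generated from the assumption.
test_temps = [313.15, 333.15, 353.15]
test_rates = [rate_at(TRUE_LN_A, TRUE_EA, t) for t in test_temps]

ln_a, ea = fit_arrhenius(test_temps, test_rates)
k_storage = rate_at(ln_a, ea, 277.15)  # extrapolated to +4 degrees C

print(f"fitted Ea = {ea / 1000:.1f} kJ/mol")
print(f"extrapolated rate at 4 degrees C = {k_storage:.3e}")
```

The storage temperature lies outside the tested range, so the extrapolated rate is only as trustworthy as the assumption that one mechanism governs the whole temperature range.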

The basis of accelerated stability testing is the assumption of a reaction mechanism that is valid for the whole range of temperatures tested and for the temperature of extrapolation. Apart from the fact that virtually the only analytes for which the degradation mechanism is known with any certainty are radioactive isotopes, this also implicitly assumes that the degradation mechanism does not change with temperature and state of aggregation. In the case of environmental and biological materials, for which oxidation, hydrolysis, auto-catalytic reactions, microbiological degradation etc. can occur simultaneously, such a strong assumption needs to be tested thoroughly for each material anew.

The second problem is the extrapolation of degradation rates to lower temperatures. Extrapolation of regression data beyond the tested conditions should generally be taken with utmost care. In analytical chemistry extrapolations of regression curves are not regarded as acceptable even for something as straightforward as linear calibration graphs in spectrophotometry. Hence, extrapolating regression lines for degradation rates seems inappropriate to us, if it has not been proven that the functional relationship is still the same.

The third problem with accelerated stability studies is the evaluation of the uncertainty. The result of an accelerated stability study is always a degradation rate. In the sense of the GUM, this degradation rate is a bias, and not an uncertainty. Although the GUM allows adding a bias to the uncertainty rather than correcting for it in very special cases, inclusion of the bias of course does not eliminate the need to include the uncertainty of the degradation rate as well. As uncertainties of degradation rates are usually rather large, this is hardly ever done, which results in uncertainties that are not GUM-compliant.
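The distinction between the degradation rate and its uncertainty can be made concrete with an ordinary least-squares fit. The sketch below assumes a linear trend over time and uses hypothetical data. It returns the slope (the estimated rate, i.e. the bias) together with the standard uncertainty of that slope, which is the quantity that is often left out.

```python
import math


def degradation_rate(times, values):
    """Ordinary least-squares slope of value against time and its standard
    uncertainty. Needs at least three points (n - 2 degrees of freedom)."""
    n = len(times)
    if n < 3 or n != len(values):
        raise ValueError("need at least three (time, value) pairs")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    stt = sum((t - t_mean) ** 2 for t in times)
    stv = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = stv / stt
    intercept = v_mean - slope * t_mean
    residual_ss = sum(
        (v - (intercept + slope * t)) ** 2 for t, v in zip(times, values)
    )
    s_slope = math.sqrt(residual_ss / (n - 2) / stt)
    return slope, s_slope


# Hypothetical study: time in months, measured value in arbitrary units.
months = [0, 1, 2, 3]
measured = [10.0, 9.9, 10.1, 10.0]

rate, u_rate = degradation_rate(months, measured)
print(f"rate = {rate:+.3f} per month, u(rate) = {u_rate:.3f} per month")
```

In this made-up data set the uncertainty of the slope is larger than the slope itself, the situation described above in which quoting the rate alone would understate the uncertainty.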

Potential pitfalls are highlighted by the case of BCR-601 (extractable trace elements in sediment), for which degradation at elevated temperatures was found and degradation also seems to occur upon freezing. Extrapolation would lead to the conclusion that the material should be stored at as low a temperature as possible. As freezing the material seems to change the extraction behaviour, following the conclusions from the accelerated stability study would have resulted in accelerated degradation of the material.

Isochronous studies as performed at IRMM only seemingly rely on the same approach as criticised here. The difference is that the isochronous study only requires that degradation at reference conditions is less severe than that at the testing condition, which is usually justified, as molecular movement, elementary reactions, diffusion etc. are usually slower at lower temperatures. It makes no assumptions about the degradation mechanism, nor does it extrapolate to an untested condition.

Because of the reasons listed, we believe that accelerated stability studies can only yield technically sound results if the underlying assumptions are clearly investigated and confirmed. As we believe that investigation of the assumptions is a more tedious task than performing longer studies at storage temperature (after all, nothing needs to be done with the material for most of the time), only stability studies at the temperature of relevance are used at IRMM.

Stability monitoring

The aim of stability monitoring is the regular confirmation of stability to extend the shelf life during the lifetime of a CRM. New stability data obtained after the release of a CRM are used to confirm the certified values and, if stability is confirmed, to extend the expiry date of the certificate. Although this concept seems rather easy at first glance, there are a number of problems to be solved, such as:

  • Consideration of stability monitoring in CRM-planning phase (sufficiently large batch size, etc.);

  • Dealing with differences in absolute values;

  • Merging of stability studies into one uncertainty;

  • Stability of non-monitorable parameters.

Our experience shows that stability monitoring needs to be planned right at the beginning of the production to prevent later problems. The first aspect of this planning is to reserve samples from across the whole batch for stability monitoring, which allows one to monitor the stability of the CRM representatively. The second aspect is to set up the monitoring of the CRM as soon as possible. This increases the apparent duration of stability studies, as the time from processing to certification can be used for stability testing. This "look into the future" is only possible if the monitoring was planned beforehand and can never be made up for by later efforts. Another advantage of prospective planning is the prevention of bad surprises: one can see before the production of a CRM whether guaranteeing the stability would require so many resources (money, units of the material, time) that even the production becomes unfeasible. A further aspect to be taken into consideration is the batch size, as seen for example in the case of IRMM/IFCC-466 (glycated haemoglobin HbA1c) and IRMM/IFCC-467 (glycated haemoglobin HbA0), where only 100 units could be produced. This naturally limits the number of units available for stability testing. Ultimately, the certified uncertainty may be limited by the monitoring efforts possible (a small number of units for stability testing usually limits the size of the stability study and increases the uncertainty of stability), or, if the material was produced without foresight, it may not be possible to guarantee stability.

Because of the extended periods that are covered by stability investigations, day-to-day variations between measurements become important. In stability studies carried out before certification, this problem can be dealt with easily by using the isochronous set-up [20]. Several options for post-certification monitoring exist. If the absolute values of the samples analysed are compared, day-to-day fluctuations may hide or feign instability. To solve this problem, samples can be put under even safer storage conditions ("reference samples"; frequently lower temperatures) and analysed together with samples from normal storage. In fact, this approach is an isochronous stability study with only one time-point. The most thorough and elegant option is to organise the post-certification monitoring as a series of isochronous studies with increasingly longer times, as implemented now for many materials at IRMM. For example, a cascade of isochronous studies lasting 4, 8 and 12 years can be planned. Each study will give better information about the stability status of the material than individual measurements alone. The disadvantage that information about stability only becomes available at the end of the study is overcome by intermittent testing with reference samples.

The same problem as with day-to-day fluctuations arises with laboratory-to-laboratory fluctuations. Variations of laboratory bias can feign or hide instability when different laboratories monitor the stability at different times, as is often the case. As it is the same problem, the same solutions arise: reference samples can be used to eliminate the influence of laboratory bias. The application of repeatability conditions furthermore removes a large part of the laboratory uncertainty, so that the uncertainty of the ratio is small compared to the uncertainty of the material.
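The effect of reference samples can be illustrated with a small sketch. It assumes that day-to-day or laboratory bias acts as a common multiplicative factor on all results of one measurement series, and that the uncertainty of the ratio can be approximated from the standard errors of the two means. The data are hypothetical.

```python
import math
import statistics


def ratio_to_reference(normal_storage, reference):
    """Ratio of the mean of normally stored samples to the mean of reference
    samples measured in the same series, with an approximate standard
    uncertainty from the relative standard errors of both means."""
    mean_n = statistics.mean(normal_storage)
    mean_r = statistics.mean(reference)
    ratio = mean_n / mean_r
    rel_u_n = statistics.stdev(normal_storage) / math.sqrt(len(normal_storage)) / mean_n
    rel_u_r = statistics.stdev(reference) / math.sqrt(len(reference)) / mean_r
    return ratio, ratio * math.hypot(rel_u_n, rel_u_r)


# Hypothetical results of one series under repeatability conditions.
normal_a = [9.8, 10.0, 10.2]
reference_a = [9.9, 10.0, 10.1]

# The same samples as seen by a series (or laboratory) with a 5 % bias.
bias = 1.05
normal_b = [bias * x for x in normal_a]
reference_b = [bias * x for x in reference_a]

ratio_a, u_a = ratio_to_reference(normal_a, reference_a)
ratio_b, u_b = ratio_to_reference(normal_b, reference_b)

print(f"series A: ratio = {ratio_a:.4f} ± {u_a:.4f}")
print(f"series B: ratio = {ratio_b:.4f} ± {u_b:.4f}")
```

The absolute values of the two series differ by 5 %, but the ratios agree, so a comparison of ratios over time is not disturbed by this kind of bias.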

Whenever at least two stability studies are performed, the problem of merging the various studies into one uncertainty arises. In principle, each stability study can be affected by an unknown bias. Use of the values of the studies as they are includes an additional uncertainty component and therefore results in an unrealistically pessimistic assessment of the stability of the CRM. To prevent this, some correction of at least one of the studies needs to be made with subsequent effects on the total uncertainty of the stability study. The merging of isochronous studies to obtain an estimation of uncertainty of stability will be described in a forthcoming paper [21].

With the move away from "traditional" certified parameters, completely new problems for stability monitoring arise. For some CRMs, stability cannot be monitored at all. Examples of this problem are the IRMM CRMs for genetically modified (GM) organisms. These materials are certified for the mass fraction of GM product (e.g. maize, soybeans etc.) in non-GM material of the same kind. Certification is possible, as the purity of the GM and non-GM materials is checked and homogeneity is tested. The certificate is based on the masses of the GM and non-GM material. The problem for stability testing is that real-time polymerase chain reaction (PCR), the measurement method of choice, targets only very small parts of the DNA, which makes it unlikely that, in the case of DNA degradation, exactly the small sequence part targeted is affected. Furthermore, the relative nature of this measurement technique makes it impossible to detect DNA degradation if the targeted part of the genetically modified DNA sequence and the targeted endogenous sequence are degraded to the same extent but not completely. Real-time PCR is therefore unable to detect degradation of the material early enough. Instead, stability monitoring of the mass-fraction powder CRMs is carried out using gel electrophoresis to visualise DNA degradation and fluorometry to quantify the total extractable DNA content. This example shows that stability testing using the method the customers will use may give little information about the stability of the materials.
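Why a relative measurement cannot see uniform degradation can be shown with a deliberately crude toy model. It assumes that the measured signal is simply the ratio of intact GM target copies to intact endogenous target copies; the copy numbers and surviving fractions are hypothetical.

```python
def measured_ratio(gm_copies, endogenous_copies, surviving_gm, surviving_endogenous):
    """Ratio of intact GM target copies to intact endogenous target copies."""
    return (gm_copies * surviving_gm) / (endogenous_copies * surviving_endogenous)


# Hypothetical copy numbers in an extract.
gm = 1_000
endogenous = 100_000

fresh = measured_ratio(gm, endogenous, 1.0, 1.0)
# Both targets degraded to the same extent: 40 % of the copies lost.
uniformly_degraded = measured_ratio(gm, endogenous, 0.6, 0.6)
# Targets degraded to different extents.
unevenly_degraded = measured_ratio(gm, endogenous, 0.6, 0.9)

print(f"fresh material:       {fresh:.4f}")
print(f"uniform degradation:  {uniformly_degraded:.4f}")
print(f"uneven degradation:   {unevenly_degraded:.4f}")
```

In this model only uneven degradation changes the ratio, whereas a method that responds to the total amount of intact DNA would register the loss in both cases.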

Even with all these precautions, some pitfalls may be hidden in the very models used. In the current models used by IRMM, homogeneity and stability are treated as independent. This ignores the possibility that stability depends on the homogeneity (e.g. of the antioxidant level) or that inhomogeneity might increase due to degradation. However, we have not yet seen any of these effects, which is not surprising given the good homogeneity and low degradation of the materials found fit for sale. The assumption of independence between homogeneity and stability is therefore justified.

The IRMM stability testing program

As one way of meeting the "stability challenge", IRMM has set up a regular stability-monitoring program for those materials that might be subject to degradation. The first step was the evaluation of the "stability status" of the materials at IRMM. This took into consideration assumptions about stability (the total heavy metal content in soil was regarded as more stable than veterinary drugs in tissue) as well as the quality of the original stability testing (more thorough pre-certification stability testing decreases the efforts needed for monitoring). After this evaluation, it was decided to include 80 % of the non-nuclear and non-isotopic CRMs in the IRMM stability-testing program, whereas the remaining 20 % of CRMs do not require monitoring. Monitoring intervals were set up for each CRM. CRMs are analysed at intervals ranging from every 6 months to every 5 years, depending on the analyte and the matrix. Preferably, testing is carried out in-house, but given the range of analytes and matrices, many analyses are performed by external collaborators. Apart from saving resources by not developing all methods in-house, higher analytical quality can ultimately be ensured if experienced co-operators perform the tests rather than performing an analysis in-house once every several years. In this way, potential stability problems can be detected early and the stability of the CRMs can be warranted. To organise co-operation, suitable laboratories are addressed via "calls for expressions of interest". Applicants are evaluated to identify reliable and qualified laboratories, frequencies of stability tests have to be defined, testing schemes must be designed, dispatch needs to be arranged and, last but not least, the incoming results need to be evaluated and documented.

On average, 130 materials are monitored each year at a considerable expense of resources (time to manage the program, time to perform analyses in-house, money spent to pay for external collaborators). This expense may seem to make stability monitoring highly expensive and therefore prohibitive. This is, however, not the case. Costs for an ongoing certification of aflatoxins in three peanut materials are distributed as follows:

Cost item                                      Percentage of total cost
Processing of the materials                    37
Homogeneity and stability testing              23
Batch characterisation                         20
Stability monitoring (yearly for 10 years)     20

This comparison clearly shows that stability monitoring itself contributes only to a limited extent to the total CRM production costs, thus refuting the main argument against regular stability monitoring.

Conclusions

The change in the nature of CRMs and of the certified parameters, together with more stringent quasi-legal requirements (ISO guidelines), increases the need for precautions to ensure the stability of reference materials. At IRMM, this challenge is met by improving the processing of materials and by adopting other ways of storage and dispatch to prevent degradation, combined with an intensive stability-testing program. This strategy allows statements of realistic shelf lives that can be guaranteed, as long as an uncertainty component for the verification process of the stability was included.