Introduction

In emissions monitoring, accredited measurements ensure the quality and comparability of reported emissions, enabling confidence in environmental compliance. At the very top of the measurement infrastructure are National Measurement Institutes (NMIs), e.g. NPL in the UK, PTB in Germany and NIST in the USA, which realise the International System of Units (SI, Système International d’Unités [1]) and so set internationally comparable measurement scales for local test and calibration laboratories. This system is maintained via the CIPM (International Committee for Weights and Measures [2]) mutual recognition arrangement [3], under which NMIs are awarded calibration and measurement capabilities (CMCs) if three conditions are met: an appropriate and approved quality management system is in place; the claimed capability is successfully demonstrated in scientific comparisons with other NMIs organised by CIPM Consultative Committees; and international peer review (first intra-continental, then inter-continental) accepts the claimed capability [4].

Through the hierarchy of laboratories, NMIs disseminate measurement scales down to local test and calibration laboratories. The level of quality that is required of local laboratories is generally driven by the industry for which services are being provided. In the industrial emissions sector, test laboratories (sometimes referred to as stack testing organisations) are often required by the national regulators of European member states to be accredited in accordance with ISO/IEC 17025:2005 [5].

Accreditation to ISO/IEC 17025:2005 requires that, ‘The laboratory shall have quality control procedures for monitoring the validity of tests and calibrations…’, and that ‘This monitoring shall be planned and reviewed and may include … participation in interlaboratory comparison or proficiency-testing programmes’ [5]. However, in many countries participation in proficiency testing schemes (or at least some form of comparison) is mandatory. Regional accreditation associations (e.g. the European cooperation for Accreditation, EA [6]) that are signatories to the International Laboratory Accreditation Cooperation mutual recognition arrangement (ILAC MRA [7]) commit their members to comply with the requirements of the ILAC MRA. The ILAC MRA requires that National Accreditation Bodies (NABs) are accredited by their peers (i.e. other NABs) to ISO 17011 [8], one requirement of which is that the NAB ensures that the laboratories it accredits take part in ‘proficiency testing or other comparison programmes, where available and appropriate’.

Hence, there is a consistent approach to ensuring and maintaining quality in that the principles of what applies to a NMI to achieve a CMC has parallels to what is required of a local test laboratory to achieve accreditation to ISO/IEC 17025:2005. There must be in place an appropriate and approved quality management system and participation in an appropriate PT scheme or similar comparison if such exist (of course, there are other more detailed requirements to achieve a CMC or ISO/IEC 17025 accreditation, but it is not necessary to discuss these here). It is appropriate that PT schemes are less onerous than CIPM Consultative Committee comparisons as the same levels of accuracy are not required and the associated cost to participants would often be difficult to justify. PT schemes tend to be operated on a pass/fail basis and unlike CIPM comparisons participants are not required to state a specific capability in advance that they are required to meet. Instead, participants are expected to meet a global performance requirement fit for purpose for the industry that the service is supporting. Also, to achieve a CMC, a NMI must submit procedures to Euramet (European Association of National Metrology Institutes) for review by Technical Committee Quality. Lastly, it is worth noting that whilst ISO/IEC 17025:2005 [5], as stated above, details that PT ‘may’ be one of the ways a laboratory demonstrates it is monitoring the validity of its tests and calibrations, an updated version of this key standard was published in 2017 [9]. In ISO/IEC 17025:2017 it stipulates that, ‘This monitoring shall be planned and reviewed and shall include, but not be limited to, either or both of the following: (a) participation in proficiency testing; (b) participation in interlaboratory comparisons other than proficiency testing’. 
Changing from ‘may include’ to ‘shall include’ means that, where an appropriate scheme exists, participation will become mandatory for achieving and maintaining ISO/IEC 17025 accreditation once the period of grace for implementing ISO/IEC 17025:2017 has expired, regardless of any other procedures a laboratory has in place to monitor the validity of its tests and calibrations.

National regulators have the legal responsibility for enforcing emission limits stipulated in the Industrial Emissions Directive (IED) [10] (and other relevant legislation), which covers industrial processes with a thermal input > 50 MW (e.g. power stations, glass manufacture plants, waste incineration, steel works). Under mandate from the European Commission, the Comité Européen de Normalisation (CEN) has produced a series of standard reference methods (SRMs) providing measurement methods covering the regulated pollutants (e.g. NOx [11], total dust [12]). As defined by CEN/TC 264, ‘Air Quality’ a SRM is a, ‘Reference method prescribed by European or national legislation’ [13], hence, SRMs are passed into, or referred to, in member state legislation distinguishing them from voluntary ‘reference methods’ found commonly in other sectors (defined as a, ‘Measurement method taken as a reference by convention, which gives the accepted reference value of the measurand’ [13]). Due to their adoption/referral in legislation, SRMs set mandatory measurement standards with which all test laboratories across Europe carrying out emissions measurements must comply. These documentary standards help ensure the quality of test laboratories performing stack measurements: providing a test laboratory with a traceably certified artefact (e.g. gas cylinder) on its own does not ensure successful dissemination of the measurement scale, a written method is needed (e.g. a SRM) ensuring the valid use of the artefact and also the validity of the stack emissions measurements. This is important as ultimately such data are used in compiling national emission inventories that are reported into the European Pollutant Release and Transfer Register (E-PRTR [14]), a key mechanism by which the EU meets its commitment to monitor and reduce pollution as a signatory to the UNECE Convention on Long-range Transboundary Air Pollution (LRTAP [15]).

The IED and related regulations define emission limit values (ELVs). The SRMs usually define required measurement uncertainties, expressed as a relative percentage of the relevant ELV. Hence, the uncertainty requirement in terms of concentration derives from a combination of directive and SRM documents. In one or two exceptions where the SRM does not stipulate an uncertainty requirement, the default within the directive is used instead (there are uncertainty requirements within directives, but these are more generous and applicable to data reported by the process plant operator and beyond the scope of this paper). A consequence, by intentional design, of this documentary framework is that as emission limits become more stringent so do associated measurement uncertainty requirements. Hence, when a measurement method reaches the point of being unable to meet an uncertainty requirement, its ability to be used to enforce the associated emission limit is brought into question.

CO, NOx, TOC and dust pollutants are common emission species from many processes regulated under the IED (and its predecessor directives) and the SRMs to monitor these species have been in place since the early 2000s: CO—EN 15058:2006 [16], NOx—EN 14792:2005 [11], TOC—EN 12619:1999 [17], dust—EN 13284-1:2001 [12]. The SRMs described in the aforementioned EN documents provide procedures for the quality assurance and quality control of the techniques. The techniques can be based on portable instruments or what is conventionally referred to as a ‘manual method’. The CO, NOx and TOC SRMs are instrumental techniques based on NDIR, chemiluminescence and flame ionisation detection, respectively. The dust SRM is an example of a manual method as it involves physically collecting then ‘manually’ weighing the dust. All four SRMs describe sampling apparatus in terms of probes, heated lines, filters, pumps, materials that will not react with the sample, etc. to successfully extract the sample out of the stack and deliver it to the measurement technique. The current suite of SRMs were produced by CEN under mandate from the European Commission in support of predecessor directives to the IED, namely, the Waste Incineration Directive [18] and Large Combustion Plant Directive [19]. The CEN Technical Committee responsible for the production of standards, such as the aforementioned SRMs, in the emissions sector (CEN/TC 264 ‘Air Quality’) have themselves documented that work is needed on, ‘assessment of current SRM to meet stricter limit values’ [20]. The question is therefore being asked, are SRMs produced in support of the WID and LCPD able to enforce emission limits under the IED and under future regulation currently at the draft stage?

We report the analysis of combined UK and German stack simulator based proficiency testing data for CO, NOx, TOC and dust. Data from scheme participants (anonymised) are used as an indication of the capability of the respective SRMs for enforcing increasingly stringent emission limits. Regulatory context is added by reference to the uncertainty requirements necessary for enforcement of emission limits across three generations of legislation: the Waste Incineration Directive (WID) and Large Combustion Plant Directive (LCPD); the Industrial Emissions Directive (IED); the Waste Incineration BREF (WI BREF) and Large Combustion Plant BREF (LCP BREF), the former of which is at the draft stage whilst the latter has been published.

Methodology

The general design and specifications of the National Physical Laboratory (NPL) and Hessian Agency for Nature Conservation, Environment and Geology (HLNUG) stack simulator facilities based in the UK and Germany, respectively, can be found elsewhere [21,22,23]. Both NPL and HLNUG are accredited to ISO 17043 ‘Conformity Assessment—General Requirements for Proficiency Testing’ [24]. Participants in both schemes are tested for NOx and TOC using reference mixtures generated in the respective stack simulator. CO is only tested at NPL, hence, all CO data presented herein are derived from participants in the UK scheme. Water vapour is injected into the NPL simulator (10–15 vol %), however, this is not possible at HLNUG. Consequently, all UK data are reported under dry conditions to ensure comparability. In contrast, dust can be injected into the HLNUG facility whilst this is not possible at NPL. Therefore, the UK dust proficiency scheme tests only the analytical elements of the dust SRM (EN 13284-1), i.e. the HLNUG scheme tests sampling and analytical proficiency, whilst the NPL scheme tests only the latter. The NPL scheme does this via despatching foil shims and NaCl solutions to simulate collected matter on exposed filters and recovered matter from washings of the sampling apparatus. As an aside, it is worth mentioning that with increasing numbers of stack simulators appearing across Europe CEN/TC 264 have created working group 45 ‘Emissions—Proficiency testing schemes’ who are tasked with augmenting the requirements of ISO 17043 in terms of standardising PT schemes specifically based on stack simulators. It is hoped in the future that this will lead to greater comparability of data between such schemes.

The German scheme started in 2002 and is the longer running of the two, with the UK dust scheme starting in 2008 followed by the gases in 2010. Both schemes grade participant performance using variations of the commonly employed z-score [25]. However, for the work described here, a database was compiled from the schemes comprising the participants’ reported values and the assigned values (reference values assigned by the scheme provider). Data were compiled across the following concentration ranges under dry conditions and standard temperature and pressure: 0–75 \(\text{mg m}_0^{-3}\) CO, 0–200 \(\text{mg m}_0^{-3}\) NOx, 0–15 \(\text{mg m}_0^{-3}\) TOC and 0–15 \(\text{mg m}_0^{-3}\) total dust. The original z-scores were not used, firstly because slightly different z-score equations are used in the UK and German schemes and secondly because both providers have altered the allowable deviations on more than one occasion over the years, i.e. the original z-scores were comparable neither between schemes nor over time within a scheme. Consequently, z-scores were recalculated using the strictest allowable deviation across all years and both schemes for each species, which in all cases was that of the most recent year (Table 1). The z-score for the data reported here is defined as,

$$z = \frac{x - \text{AV}}{\sigma} \tag{1}$$

where x is the participant’s reported value, AV is the scheme operator’s assigned value and σ the allowable deviation (Table 1).
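As a minimal sketch of Eq. (1), the recalculation can be expressed in a few lines of Python. The function name `z_score` and the reported and assigned values below are hypothetical and chosen only for illustration; the allowable deviation of 0.2044 is the total dust value quoted in the results section.

```python
def z_score(reported: float, assigned: float, allowable_deviation: float) -> float:
    """Return z = (x - AV) / sigma, following Eq. (1)."""
    if allowable_deviation <= 0:
        raise ValueError("allowable deviation must be positive")
    return (reported - assigned) / allowable_deviation


# Hypothetical total dust result (mg/m3, dry, standard conditions):
# a participant reports 4.8 against an assigned value of 5.0.
z = z_score(reported=4.8, assigned=5.0, allowable_deviation=0.2044)
print(round(z, 2))  # negative value: the participant under-reads
```

Because the same allowable deviation is applied to every year and both schemes, z-scores computed this way can be compared directly across the whole data set.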

Table 1 Allowable deviation (σ) used for each species to recalculate z-scores to allow year to year comparison of performance

Typically, in proficiency testing allowable deviations are chosen such that statistically, assuming a Gaussian distribution, it is expected that 5 % of participants would score \(\left| z \right| \ge 2\) and 0.3 % of participants would score \(\left| z \right| \ge 3\). However, this was not done here since scoring participants was not the aim of the work. Instead the aim was to compare the performance possible with the standard reference methods in relation to legislative requirements. Hence, the exact value of allowable deviation used was not important, only that all deviations from the assigned values were normalised for a given species to the same value. Recalculated z-scores were visualised using box plots where the box spans the middle two quartiles of the distribution whilst the whiskers extend out to the farthest z-score within 25 % of the box extreme. Any z-scores beyond this limit were indicated by individual markers.
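The quoted 5 % and 0.3 % figures are rounded two-sided tail probabilities of the standard normal distribution. A short sketch, using only the Python standard library and a hypothetical helper name `two_sided_tail`, shows where they come from:

```python
import math


def two_sided_tail(z_limit: float) -> float:
    """Probability that |z| >= z_limit for a standard normal variable."""
    return math.erfc(z_limit / math.sqrt(2.0))


for limit in (2.0, 3.0):
    print(f"P(|z| >= {limit:g}) = {100 * two_sided_tail(limit):.2f} %")
```

The unrounded values are approximately 4.55 % and 0.27 %, which is why scheme providers commonly speak of roughly 1 in 20 and 3 in 1000 participants.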

Results and discussion

The boxplot for CO (Fig. 1) shows negligible change in performance with time. For NOx (Fig. 2), whilst the distributions are narrower in some years than others, there is no overall trend and performance remains unchanged. With regard to TOC (Fig. 3), in the first two years a disproportionate number of scheme participants produced negative z-scores, i.e. under-read. After this, scores are more evenly distributed around zero, which is what is expected unless a very small number of participants are taking part, which is not the case here. It is worth reiterating that for TOC and NOx the UK data start in 2010; however, in both cases the distributions appear similar before and after this year. This is perhaps unsurprising since in both schemes participants are required to follow the same SRM (EN 12619:1999 [17] and EN 14792:2005 [11]) and are all accredited by their respective national accreditation body against the same measurement uncertainty requirements. It should also be noted that in both schemes there are participants from outside the UK and Germany, so the data are not attributable only to these two nations. For total dust (Fig. 4), there is a disproportionate number of negative z-scores, evidencing a systematic bias. However, no such bias is seen for either dust washings (Fig. 5) or dust shims (Fig. 6), implying that the issue may be due to extraction of the sample from the stack itself. As with all the other species, the distributions show no change in performance with time.

Fig. 1 CO z-scores from participants carrying out measurements in accordance with the CO standard reference method (EN 15058) across a 0–75 \(\text{mg m}_0^{-3}\) range of the UK stack simulator based proficiency scheme. Expanded uncertainties as required by the Industrial Emissions Directive (dashed lines) and draft Waste Incineration BREF (dotted lines). Outlier z-scores (asterisk); 17 further outliers are not shown as their values are beyond the range of the y-axis

Fig. 2 NOx z-scores from participants carrying out measurements in accordance with the NOx standard reference method (EN 14792) across a 0–200 \(\text{mg m}_0^{-3}\) range of UK and German stack simulator based proficiency schemes. Expanded uncertainties as required by the Industrial Emissions Directive (dashed lines) and draft Waste Incineration BREF (dotted lines). Outlier z-scores (asterisk); 29 further outliers are not shown as their values are beyond the range of the y-axis

For the total dust results (Fig. 4) a disproportionate number of negative z-scores are seen; however, it is important to view this in the context of the required measurement uncertainty. The relative expanded uncertainty requirement of EN 13284-1:2017 is 20 % of the emission limit value (herein, expanded uncertainty, unless stated otherwise, is defined as the combined uncertainty expressed at a confidence interval of 95 %, where the combined uncertainty is the combination of all the individual standard uncertainty sources). As discussed under methodology, the data are all from concentration ranges applicable to waste incineration plants, waste incinerators being the conventional benchmark against which to judge performance. The Industrial Emissions Directive sets an ELV for total dust on waste incinerators of 10 \(\text{mg m}_0^{-3}\), so a required expanded uncertainty of 20 % corresponds to 2 \(\text{mg m}_0^{-3}\); divided by the allowable deviation of 0.2044 \(\text{mg m}_0^{-3}\) (Table 1), this equates to a z-score of 9.8. Hence, with almost all participant z-scores within the required expanded uncertainty, this negative bias is not necessarily an issue, or at least can be described as insufficient to result in non-compliance. Carrying out the analogous calculation for the other species (Figs. 1, 2, 3) shows that in all other cases the vast majority, and in some cases all, of the distribution lies within the SRMs’ uncertainty requirement linked to IED ELVs. Hence, for the process types considered here, there appear to be no issues in enforcing IED ELVs using the existing measurement infrastructure.
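The conversion from a legislative requirement to a z-score threshold can be sketched as follows. The function name `z_threshold` is hypothetical; the numbers are those quoted above for total dust (ELV 10, required expanded uncertainty 20 % of ELV, allowable deviation 0.2044, all in mg/m3 under dry, standard conditions).

```python
def z_threshold(elv: float, relative_expanded_uncertainty: float,
                allowable_deviation: float) -> float:
    """Express an uncertainty requirement (a fraction of the ELV) in z-score units."""
    required_uncertainty = elv * relative_expanded_uncertainty  # same unit as the ELV
    return required_uncertainty / allowable_deviation


# Total dust under the IED for waste incinerators.
print(round(z_threshold(elv=10.0,
                        relative_expanded_uncertainty=0.20,
                        allowable_deviation=0.2044), 1))  # 9.8
```

The same function applies to any other species once its ELV, uncertainty requirement and allowable deviation (Table 1) are substituted.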

Fig. 3 TOC z-scores from participants carrying out measurements in accordance with the TOC standard reference method (EN 12619) across a 0–15 \(\text{mg m}_0^{-3}\) range of UK and German stack simulator based proficiency schemes. Expanded uncertainties as required by the Industrial Emissions Directive (dashed lines) and draft Waste Incineration BREF (dotted lines). Outlier z-scores (asterisk); 11 further outliers are not shown as their values are beyond the range of the y-axis

The dust shims and dust washings schemes test specific elements of EN 13284-1:2001, namely proficiency in the weight measurement of exposed filters (dust shims) and proficiency in recovering matter from probe washings and carrying out a weight measurement (dust washings). EN 13284-1:2001 stipulates limits for various error sources known to contribute to the combined uncertainty. With respect to weighing, clause 10.6 of EN 13284-1:2001 [12] stipulates a specific weighing standard uncertainty requirement of 5 % of ELV. EN 13284-1 was updated in 2017 (EN 13284-1:2017 [26]) and the weighing uncertainty requirement was removed. However, the data examined here are all from before 2017 and so the requirements of EN 13284-1:2001 are applicable.

Taking the product of the required standard uncertainty and k = 2 gives a required weighing uncertainty of 10 % expressed at a confidence interval of 95 %. Participants in the dust shims and dust washings schemes are not actually sampling any gas; however, for the purposes of discussion the requirements are divided by a volume of 1 m³, as at some processes it is possible to extract this amount of gas during a measurement. Doing this and carrying out the same calculation as for total dust equates to a z-score of 10 for dust shims and 1.1 for dust washings (Table 1). Performance for the former is well within the weighing requirement (Fig. 6), whilst this is not the case for the latter (Fig. 5).

There are various possible explanations for the poor performance. One possibility is that the laboratory temperature and humidity vary during repeat weighing. Dried samples are weighed repeatedly and the weight extrapolated to time zero to remove the effect of water vapour absorption from the laboratory atmosphere during measurement. The error on this extrapolation increases the less stable the temperature and humidity remain during the repeats. The SRM does require laboratory temperature and humidity to be monitored by the user but, crucially, it does not stipulate how much variation is too much or when the measurements should be postponed; this is left to the user. Equally, bias may already have occurred prior to any weighing, as there have been examples in the past of malfunctioning oven thermostats and even of users setting the incorrect temperature. If the temperature is too low, the samples will not be fully dried, and if it is too high, sample can be lost from the receptacles due to ‘spitting’.

Overall, the distributions for CO, NOx, TOC and total dust (Figs. 1, 2, 3, 4) appear consistent with a measurement capability able to comply with the uncertainty requirements of the SRMs linked to IED emission limits. The exception is dust washings (Fig. 5). However, since 2017 the specific weighing uncertainty requirement has been removed from EN 13284-1, and the total dust data (Fig. 4) show distributions within the expanded uncertainty requirement for the entire method, i.e. the data are consistent with a scenario where the other uncertainty sources are sufficiently small that, when combined with the relatively high dust washings uncertainty, the overall method is still compliant.

Fig. 4 Total dust z-scores from participants carrying out measurements in accordance with the total dust standard reference method (EN 13284-1) across a 0–15 \(\text{mg m}_0^{-3}\) range of the German stack simulator based proficiency scheme. Expanded uncertainties as required by the Industrial Emissions Directive (dashed lines) and draft Waste Incineration BREF (dotted lines). Outlier z-scores (asterisk); 7 further outliers are not shown as their values are beyond the range of the y-axis

Fig. 5 Dust washings z-scores from participants carrying out recovery followed by weight measurements in accordance with the total dust standard reference method (EN 13284-1:2001) for the UK scheme based on NaCl solutions to simulate recovery of matter from probe washings. Expanded weighing uncertainties as required by the Industrial Emissions Directive (dashed lines) and draft Waste Incineration BREF (dotted lines). Outlier z-scores (asterisk); 11 further outliers are not shown as their values are beyond the range of the y-axis

The Industrial Emissions Directive was published in November 2010 and was transposed by all member states into national legislation by 2013. Prior to this, the sector was regulated by seven separate directives including the Waste Incineration Directive [18] and the Large Combustion Plant Directive [19]. These originally came into force in 2000 and 2001, respectively, and so the majority of the data discussed here were originally acquired in an era regulated by ELVs from these two directives.

In 2013, the IED brought in stricter limits for large combustion processes whilst leaving those for waste incinerators unchanged (Table 2). However, in contrast to its predecessors, the IED made a significant change under Article 13(1) in terms of adopting Best Available Technique (BAT) Reference documents (BREFs). Under the framework of Article 13(1), member states exchange experience as part of drafting BREFs. Once completed an associated document termed BAT conclusions is produced in accordance with Article 13(5) listing BAT-AELs (associated emission levels). Under Article 14(3), BAT conclusions are referenced by the national regulator for setting permit conditions for plants regulated under the IED. Or more specifically, national regulators are required to stipulate emission limits in site permits in accordance with BAT-AELs.

Table 2 How each generation of legislation has set increasingly stringent emission limits for key pollutants

There are in the region of 30 BREFs at various stages of production covering various sub-sectors, including the Waste Incineration BREF [27] and Large Combustion Plant BREF [28]. The former is at the first draft stage whilst the latter has been published. Although not all are finalised, what is clear is that emission limits are set to continue to become increasingly stringent, and hence so will the associated uncertainty requirements. Taking the most stringent limits from Table 2 and repeating the same analysis as for the IED, i.e. taking the product with the required uncertainties and normalising to the allowable deviations (Table 1), gives limit values and associated z-score thresholds of CO 10 \(\text{mg m}_0^{-3}\) (0.6), NOx 50 \(\text{mg m}_0^{-3}\) (3.3), TOC 3 \(\text{mg m}_0^{-3}\) (2.6), total dust 2 \(\text{mg m}_0^{-3}\) (2.0), dust shims 2 \(\text{mg m}_0^{-3}\) (1.0) and dust washings 2 \(\text{mg m}_0^{-3}\) (0.1). In terms of future requirements, it can be seen that for CO (Fig. 1), total dust (Fig. 4) and dust washings (Fig. 5) the measurement capability is insufficient, with a significant majority of the distribution beyond the respective thresholds. For NOx, TOC and dust shims (Figs. 2, 3, 6) the central interquartile ranges are generally within the threshold, but this leaves around half the distribution outside. It might be expected that the spread of the distributions decreases as concentration decreases. Analysing the data at a series of isolated concentration levels shows a marginal narrowing of the distribution as concentration decreases for CO, NOx and total dust (data not shown). However, these decreases are insufficient to alter any of the observations made above.
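One way to quantify how much of a distribution lies within a given threshold is to count the fraction of z-scores whose magnitude does not exceed it. The sketch below is illustrative only: `fraction_within` is a hypothetical helper and the z-scores are invented, not scheme data; the threshold of 2.0 is the total dust value quoted above for the most stringent limit.

```python
from typing import Sequence


def fraction_within(z_scores: Sequence[float], threshold: float) -> float:
    """Fraction of z-scores with |z| <= threshold."""
    if not z_scores:
        raise ValueError("at least one z-score is required")
    inside = sum(1 for z in z_scores if abs(z) <= threshold)
    return inside / len(z_scores)


# Invented z-scores for illustration only (not taken from either scheme).
hypothetical_z = [-3.1, -1.8, -0.9, -0.2, 0.4, 1.1, 2.4, 3.6]
print(fraction_within(hypothetical_z, threshold=2.0))  # 0.625
```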

Fig. 6 Dust shims z-scores from participants carrying out weight measurements in accordance with the total dust standard reference method (EN 13284-1:2001) for the UK scheme based on weight measurement of foil shims to simulate filter weighing. Expanded uncertainties as required by the Industrial Emissions Directive (dashed lines) and draft Waste Incineration BREF (dotted lines). Outlier z-scores (asterisk); 2 further outliers are not shown as their values are beyond the range of the y-axis

Examining the distributions in relation to these thresholds might be argued as unfair. The techniques for CO, NOx and TOC for which the associated SRMs provide documentary measurement method are all portable instruments (or more correctly termed, portable automated measuring systems (P-AMS [29])). Such portable instruments were designed by manufacturers to comply with the uncertainty requirements of the IED, or even the WID and LCPD that predate it. So whilst it is reasonable to expect performance commensurate with IED requirements, it is not reasonable to expect performance to meet requirements not yet in force under BREF documents.

In contrast, dust is measured by a ‘manual method’ and is not based on portable instruments. Whilst this has disadvantages, such as not providing real-time data, an important advantage is that in principle sensitivity can easily be improved by increasing the run time. In absolute units, both the uncertainty of the balance used to weigh the collected matter and that of the gas volume meter used to measure the extracted volume are expected to remain fixed. Hence, if the run time is doubled, the collected mass and extracted volume double and the relative uncertainties halve. Whilst there are some other uncertainty sources that remain unaffected (e.g. temperature measurement), mass and volume are significant enough contributors to the uncertainty budget that a marked effect should be seen on the combined uncertainty associated with the overall measurement method. However, whilst in principle this is a perfectly valid way to improve the uncertainty of the method, it is also fair to say that the monitoring community is not keen on using excessively time-consuming methods. So at some stage, a tipping point is likely to be reached where the SRM is no longer considered fit for purpose.
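The effect of run time can be illustrated with a simplified uncertainty budget. The sketch below assumes the concentration is the quotient of collected mass and extracted volume with uncorrelated inputs, so relative standard uncertainties combine in quadrature; this model, the function name and all numerical values are assumptions for illustration and are not taken from the SRM.

```python
import math


def combined_relative_uncertainty(mass_mg: float, volume_m3: float,
                                  u_mass_mg: float, u_volume_m3: float,
                                  u_other_rel: float = 0.0) -> float:
    """Combine relative standard uncertainties of mass, volume and other sources."""
    u_mass_rel = u_mass_mg / mass_mg        # fixed absolute balance uncertainty
    u_volume_rel = u_volume_m3 / volume_m3  # fixed absolute gas meter uncertainty
    return math.sqrt(u_mass_rel**2 + u_volume_rel**2 + u_other_rel**2)


# Hypothetical budget: 2 mg collected from 1 m3, doubled run time gives 4 mg from 2 m3.
base = combined_relative_uncertainty(2.0, 1.0, 0.1, 0.02, u_other_rel=0.01)
doubled = combined_relative_uncertainty(4.0, 2.0, 0.1, 0.02, u_other_rel=0.01)
print(f"base run: {100 * base:.1f} %, doubled run: {100 * doubled:.1f} %")
```

With these invented values the combined relative uncertainty falls from about 5.5 % to about 2.9 %: slightly less than a full halving, because the run-time-independent term does not shrink.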

Across all species, the data demonstrate that to enforce future legislative requirements, there is a need for new measurement science to provide tools for test laboratories providing measurement services, process plant operators for national emission inventory reporting and national regulators to discharge their legal responsibility to enforce emission limits bringing the environmental and health benefits that EU directives/BREFs are targeting.

Conclusions

Combined UK and German proficiency testing data between 2002 and 2015 were examined with respect to the standard reference methods for CO (EN 15058), NOx (EN 14792), TOC (EN 12619) and total dust (EN 13284-1) in order to gauge the current capability of the emissions monitoring measurement infrastructure. It was found that these SRMs in general provided measurement capability that enabled enforcement of emission limits under the now withdrawn legislation for which they were originally designed, i.e. the Waste Incineration Directive and Large Combustion Plant Directive; that emission limits from the superseding directive in 2013 (the Industrial Emissions Directive) could in principle be enforced; but that in their current form the SRMs would struggle to enforce some of the emission limits proposed in the draft Waste Incineration BREF and Large Combustion Plant BREF. Hence, across the three generations of increasingly stringent legislation this evidenced that enforcement of future emission limits will not be possible without work to improve the existing measurement infrastructure either through provision of new measurement method described in EN standards published by CEN or the provision of improved/new techniques on which to base measurement method, or a combination of both. Without this test laboratories will find it difficult to provide plant operators with the quality of data needed for national emission inventory reporting, and national regulators will not be able to fully discharge their legal responsibility to enforce emission limits bringing the health and environmental benefits the EU directives/BREFs are targeting.