Introduction

Comparability of measurement/analytical results is the realization of their property “traceability” [1], and a conclusion on the equivalence of the results (“tested once, accepted everywhere” [2]) can be made only when the results are comparable. Distributions of analytical results and the hypotheses necessary for developing their comparability criteria, as well as comparability of results obtained in proficiency testing (PT) based on the metrological approach, were discussed recently in refs. [3–5]. The approach implies the use of samples of a reference material (RM) with traceable property values as test items sent to the laboratories participating in PT. Such an RM is a working measurement standard to which the PT results are traceable. Assuming a normal distribution, comparability (equivalence of the PT results) is assessed by the bias of the mean of PT results C PT from the assigned/certified value C cert of the RM property, taking into account the standard uncertainty σ cert of this value and the standard deviation σ PT of the PT results. However, when traceability of the property value C cert of the RM prepared for PT is questionable, comparability of the PT results cannot be assessed in the meaning “tested once, accepted everywhere”. In such cases, especially when the number N of participants is limited, a local comparability, i.e. among the participants only, is assessed [3].

The main problem in developing an RM for fresh concrete testing is its inhomogeneity and instability. To overcome this problem, every participant in the CCRL Concrete Proficiency Sample Program, overseen by ASTM [6], is provided with a sample consisting of the concrete RM components. The participant mixes these components at its laboratory, i.e. prepares the concrete (sample) independently. This allows the test to start immediately after the concrete is prepared and eliminates material instability as a source of uncertainty of the test results. However, the situation with RM inhomogeneity is more complicated. In spite of homogenizing the sample components at the Cement and Concrete Reference Laboratory (CCRL at NIST, USA) and supplying mixing instructions to the program participants, inhomogeneity of the concrete prepared in different laboratories influences the distribution of the test results. Deviations of the participant results from their average/consensus value C PT/avC cert in comparison to the standard deviation S PT (based on Youden plot analysis [7]) are used for evaluation of the participant proficiency. Since the participants in the program are in general situated far from each other and from CCRL (in different states and even on different continents) and their number N is large (N>100), the program is probably optimal and can indicate broad comparability if the uncertainties and traceability of the test results and of the RM properties are stated.

The purpose of the present publication is to develop a PT scheme for comparability assessment of results of concrete testing in a limited region, such as Israel. Such a scheme is intended for the local laboratory accreditation body (ISRAC in Israel) to control the performance of a small number of accredited laboratories (N<30) that test concretes for the local building industry. Slump and compressive strength are chosen in the scheme as the test parameters of fresh and hardened concrete, respectively, since in practice they are the ones most often required by customers.

Experimental

Test/measurement methods

For slump determination by standard [8], a slump-cone (of tin plate, with upper and bottom diameters of 100 and 200 mm, respectively, and a height of 300 mm) is filled with the concrete to be tested, which is compacted manually with a steel rod up to the upper ring of the cone. The slump-cone is then carefully lifted, also manually, over 5–10 s. The height of the remaining concrete cone is measured with a ruler, and the slump determination result is calculated as the difference between the heights of the slump-cone and the concrete cone, in mm.

The compressive strength is measured by standard [9] as the pressure, in MPa, applied by a special testing machine to a 100 mm hardened concrete test cube in order to destroy it. To prepare the test cubes, the standard requires that the corresponding steel forms be filled with the concrete by hand using a steel rod or by means of a vibrating table. Afterwards, the test cubes should be stored for 7 days under controlled conditions (air temperature of 21±3 °C and humidity of more than 95%) and then for 21 days under standard laboratory conditions for hardening. On the 28th day the test cubes should be destroyed. In many cases customers are also interested in compressive strength determination on the 7th day of the test cube hardening.

Design of experiment

The Research Unit of the Department of Building Units and Materials at ISOTOP Ltd. served as a Reference Laboratory (RL) in the experiment overseen by INPL (the National Metrology Institute).

The composition of the in-house reference material (IHRM) [10, 11], developed as a local working measurement standard for the PT and corresponding to a fresh concrete of type B30 by standard [12], is shown in Table 1. The aggregates were thoroughly washed with water before the experiment, dried to constant weight at 105±5 °C, sieved and homogenized. The sea sand was also dried to constant weight at 105±5 °C, sieved (the fraction smaller than 0.65 mm was used) and homogenized. The components were stored in RL at an air humidity of 45–60%.

Table 1 Composition of the concrete IHRM produced for the PT (calculated for a sample of 35 L)

The concrete for every PT participant (an IHRM sample of 35 L) was produced by RL using the same 55 L pan mixer (company “Controls”, Italy) under the same conditions. Every participant was able to start testing its sample from the moment when the concrete preparation was finished, as in the CCRL Program. However, the material homogeneity in this scheme is probably higher than in the CCRL Program (since all the samples are prepared identically) and can be evaluated. Therefore, the concrete stability was not taken into account, while inhomogeneity parameters were studied during the experiment and included in the uncertainty budget of the assigned values of the IHRM properties.

Twenty-five participants took part in the experiment (N=25). Twenty-nine samples were prepared by RL during two weeks in September 2005, before the rainy season in Israel could influence the air humidity. RL tested the 1st, 12th, 23rd and 29th (last) samples for the material inhomogeneity study and characterization. The other samples were tested by the PT participants according to a schedule prepared and announced in advance.

Table 2 RL results of the IHRM study in slump units

The duplicate slump determinations were performed at RL by representatives of every participant using their own facilities and standard operating procedures (SOP) corresponding to standard [8]. Immediately after the slump determination, twelve 100 mm test cubes for compressive strength determinations were prepared by the representatives of every participant, also using their own facilities and SOP corresponding to standard [9]. On the day after preparation, the hardened cubes were transferred from RL to the laboratory of the participant, where compressive strength determinations were performed both on the 7th day and on the 28th day after the sample preparation (six replicates on each day).

The data obtained were sent by the participants to ISRAC and re-sent from ISRAC to INPL in an anonymous form. For this reason, the participants are named in the paper by numbers j=1, 2, …, N (25).

For brevity, the results of the compressive strength determinations on the 7th day and some additional information (the mass of the cubes, correlation between the mass and the strength results on the 7th and on the 28th days, etc.) are not discussed further.

Results and discussion

Slump determination

Homogeneity estimation, assigned value and its uncertainty

The RL results of slump determination are presented in Table 2, where rows represent samples; X 1 and X 2 are the duplicate values; \(R = |X_1 - X_2|\) is the range; L = 2.77 u mRL is the limit of the range at the level of confidence of 95%; u mRL = 3 mm is the standard measurement/test uncertainty declared by RL; and X avg = (X 1 + X 2)/2 is the average result of the slump determinations for a sample. The table also contains a standard ANOVA (analysis of variance) output, in which the between-sample and intra-sample variances \(S_{\rm bsi}^2\) and \(S_{\rm isi}^2\) of the material inhomogeneity are shown as MS of Rows and of Columns, respectively; F(Rows) is the Fisher's ratio characterizing between-sample inhomogeneity; F(Columns) is the Fisher's ratio characterizing intra-sample inhomogeneity; and F crit is the F critical value at the level of confidence of 95%.

A range greater than the 95% limit is not observed. The F values are less than the critical ones at the level of confidence of 95%. Therefore, the four samples are homogeneous in slump units. Since there are samples taken at the beginning, the middle and the end of the experiment, all the material produced for the experiment is also assessed as homogeneous in these units.
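The two homogeneity checks described above can be sketched in Python. This is only an illustration under stated assumptions: the duplicate values in `example` are hypothetical (Table 2 is not reproduced here), the function names are ours, and the ANOVA is the usual two-way layout without replication, with samples as rows and duplicates as columns. Each F value still has to be compared with the tabulated F crit.

```python
def range_limit(u, factor=2.77):
    """95% limit for the range of duplicates, L = 2.77 * u (factor from the text)."""
    return factor * u


def two_way_anova(table):
    """Two-way ANOVA without replication for a rows x columns table.

    Rows are samples, columns are replicate positions.
    Returns the mean squares and the two Fisher's ratios.
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (r - 1)              # between-sample variance
    ms_cols = ss_cols / (c - 1)              # intra-sample variance
    ms_err = ss_err / ((r - 1) * (c - 1))    # residual variance
    return {
        "MS_rows": ms_rows,
        "MS_cols": ms_cols,
        "MS_err": ms_err,
        "F_rows": ms_rows / ms_err,
        "F_cols": ms_cols / ms_err,
    }


# Hypothetical duplicate slump results (mm) for four samples.
example = [[110, 114], [112, 108], [115, 112], [111, 116]]
limit = range_limit(3.0)  # u mRL = 3 mm as declared by RL
ranges_ok = all(abs(x1 - x2) <= limit for x1, x2 in example)
anova = two_way_anova(example)
```

With u mRL = 3 mm the range limit is about 8.3 mm, so every hypothetical pair above passes the range check.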

The assigned/certified slump value is calculated as the RL average result: \(C_{\rm cert} = \sum\nolimits_{n = 1}^{4} (X_{\rm avg})_n / 4 = 111.9\) mm, where n is the number of the sample tested by RL. The standard uncertainty of the assigned value is \(u = [u_{\rm mRL}^2 + S_{\rm bsi}^2 + S_{\rm isi}^2/2]^{1/2} = 5.5\) mm.

Analysis of the results obtained by the PT participants

The participant results of slump determination are presented in Table 3, where u mLP is the standard measurement/test uncertainty declared by the participant; L = 2.77 u mLP is the limit of the range of the participant duplicates at the level of confidence of 95%; \(B = X_{\rm avg} - C_{\rm cert}\) is the bias of the average participant result from the assigned/certified value; \(u_{\rm comb} = (u_{\rm mLP}^2 + u^2)^{1/2}\) is the combined standard test uncertainty; and \(\zeta = B/u_{\rm comb}\) is the zeta score.
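As an illustration of these definitions, a minimal Python sketch of the zeta-score calculation follows. The duplicate values in the example call are hypothetical (Table 3 is not reproduced here), the function names are ours, and assigning exactly |ζ| = 2 or 3 to the lower class is an assumption, since the text specifies only |ζ| > 2 as questionable and |ζ| > 3 as unsatisfactory.

```python
import math


def zeta_score(x1, x2, c_cert, u_cert, u_lab):
    """Zeta score of a participant from its duplicate results.

    B = X_avg - C_cert, u_comb = sqrt(u_lab**2 + u_cert**2), zeta = B / u_comb.
    """
    x_avg = (x1 + x2) / 2
    bias = x_avg - c_cert
    u_comb = math.sqrt(u_lab ** 2 + u_cert ** 2)
    return bias / u_comb


def classify(zeta):
    """Score classes used in the text; boundary handling is an assumption."""
    if abs(zeta) > 3:
        return "unsatisfactory"
    if abs(zeta) > 2:
        return "questionable"
    return "satisfactory"


# Assigned slump value and its uncertainty from the text: 111.9 mm and 5.5 mm.
# The duplicates (120, 124) and u mLP = 3 mm are hypothetical.
z = zeta_score(120, 124, c_cert=111.9, u_cert=5.5, u_lab=3.0)
label = classify(z)
```

For this hypothetical participant the bias is 10.1 mm and the score is about 1.6, i.e. satisfactory.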

Table 3 Results of slump determination obtained by the PT participants

The ranges of the duplicates of laboratories No. 1, 2, 15 and 16 (shown in bold in Table 3) are greater than their 95% limit. This means that either the slump determinations in these cases were not performed completely according to the corresponding laboratory SOP, or the measurement/test uncertainty u mLP declared in the SOP was not evaluated adequately (it is lower than in reality). The unsatisfactory score values \(|\zeta| > 3\) of laboratories No. 3–5, 11, 13–17, as well as the questionable score value \(|\zeta| > 2\) of laboratory No. 21, are shown in italics in Table 3. The problem is that the measurements of the heights of the slump-cone and the concrete cone, i.e. the length measurements (with a ruler having subdivisions of 1 mm), are not the limiting/dominant stage of the test. Manual filling of the slump-cone with concrete using a steel rod, as well as careful manual lifting of the cone over 5–10 s, are a kind of art. Naturally, technicians do not perform these operations identically (depending on their experience, temperament, etc.), especially in different laboratories. However, Table 3 contains score values that are individual for every participant, while an assessment of the comparability of the results as a group can also be helpful.

Fig. 1 Histogram and fitted distribution of slump determination results. C cert is the IHRM assigned/certified slump value, and C PT/avg is the average slump value obtained by the PT participants. The pointer shows the average slump result obtained by laboratory (participant) No. 3

Comparability assessment

The hypothesis of a normal distribution of the 58 slump determination results X (4×2=8 RL results and 25×2=50 participant results) is not rejected according to the Cramér–von Mises criterion: the empirical value ω² = 0.88 is less than the critical value of 2.50 at the level of confidence of 95%. The X histogram and the fitted normal distribution are presented in Fig. 1. In this case the distribution of the X avg values is also normal by the central limit theorem [4]. The total average result of the PT participants \(C_{\rm PT/avg} = \sum\nolimits_{j = 1}^{N} (X_{\rm avg})_j / 25 = 118.6\) mm is shown in Fig. 1 by a dotted line. The standard deviation of the X avg values from C PT/avg is S PT = 21.5 mm. The assigned/certified value C cert = 111.9 mm is also shown in Fig. 1 by another dotted line.

Table 4 RL results of the IHRM study in units of compressive strength

The values u and S PT are statistical sample estimates of the population values σ cert and σ PT, which allows the ratio γ = σ cert/σ PT = 0.5 to be assumed for the slump determinations. Since \(|C_{\rm cert} - C_{\rm PT/avg}|/S_{\rm PT} = 0.31 < 0.40\), one can say that comparability of the results of the 25 PT participants is satisfactory at the level of confidence of 95% [3]. This means that customers can use the results of the PT participants as comparable ones. Even the outlying average result of laboratory No. 3, shown in Fig. 1 by a pointer, is within the distribution framework. This conclusion contradicts the number of unsatisfactory score values in Table 3. It probably indicates that the uncertainty evaluation needs further improvement for correct interpretation of slump determination results.
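The arithmetic behind this criterion can be reproduced with a short Python sketch. The function names are ours; the summary values and the critical value 0.40 (for γ = 0.5 and N = 25, after ref. [3]) are taken from the text and are not derived here.

```python
def comparability_ratio(c_cert, c_pt_avg, s_pt):
    """Absolute bias of the PT mean from the assigned value, in units of S PT."""
    return abs(c_cert - c_pt_avg) / s_pt


def is_comparable(c_cert, c_pt_avg, s_pt, critical):
    """True when the ratio is below the critical value taken from the text."""
    return comparability_ratio(c_cert, c_pt_avg, s_pt) < critical


# Slump values reported in the text (mm).
ratio = comparability_ratio(111.9, 118.6, 21.5)
ok = is_comparable(111.9, 118.6, 21.5, critical=0.40)
print(f"ratio = {ratio:.2f}, comparable: {ok}")
```

The same two functions apply to any other test parameter once its own critical value is known.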

Cone height/length measurements are traceable to the known measurement standards and the SI unit, while other important test stages (the slump-cone filling and lifting) are not traceable. The situation is similar to analyte extraction in chemical analytical methods, where recovery can dramatically influence the analysis results even if the measurement/detection of the amount of substance at the final stages of the analysis is precise and traceable. Internationally accepted RMs make it possible to overcome this problem in chemistry. However, the concrete IHRM developed for our experiment is by definition unstable and cannot be used elsewhere. Therefore, the comparability discussed here is of local relevance only.

Compressive strength determination

Homogeneity estimation, assigned value and its uncertainty

The RL results of strength determination on the 28th day, obtained using a vibrating table for the test cube preparation, are presented in Table 4. The rows in the table represent samples; X 1, X 2, ..., X 6 are the replicate values; \(R = X_{\max} - X_{\min}\) is the range, where X max and X min are the maximal and minimal of the six replicate values; L = 4.03 u mRL is the range limit at the level of confidence of 95%; u mRL = 1 MPa is the standard measurement/test uncertainty declared by RL; and \(X_{\rm avg} = \sum\nolimits_{i = 1}^{6} X_i / 6\) is the average result of the strength determinations for a sample. A standard ANOVA output is also attached, as in Table 2. The intra-sample inhomogeneity in strength units is statistically insignificant. However, the Fisher's ratio characterizing the between-sample inhomogeneity, F(Rows), is greater than the critical value F crit at the level of confidence of 95%, i.e. the IHRM between-sample inhomogeneity in strength units is statistically significant. To evaluate the inhomogeneity, the value F(Rows) is recalculated as the ratio of the between-sample variance \(S_{\rm bsi}^2\) (shown as MS (Rows) in the output) to the measurement/test uncertainty variance \(u_{\rm mRL}^2\): F = 2.344/1 = 2.344 < F crit. Therefore, the inhomogeneity is not significant in comparison to the RL measurement/test uncertainty, and the IHRM can be used for the experiment.
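The recalculation of the Fisher's ratio can be written out as a small Python sketch. The function name is ours, and the critical value used in the example is an assumption: Table 4 is not reproduced here, so the tabulated 95% value F(3, 15) ≈ 3.29 for four samples and six replicates is used as a stand-in for the authors' F crit.

```python
def recalculated_f(ms_rows, u_mrl):
    """Ratio of the between-sample variance to the measurement uncertainty variance."""
    return ms_rows / u_mrl ** 2


U_MRL = 1.0            # MPa, declared by RL (from the text)
MS_ROWS = 2.344        # between-sample variance, MS (Rows) (from the text)
F_CRIT_ASSUMED = 3.29  # assumption: tabulated 95% F value for 3 and 15 df

range_limit_six = 4.03 * U_MRL        # 95% range limit for six replicates, MPa
f_value = recalculated_f(MS_ROWS, U_MRL)
negligible = f_value < F_CRIT_ASSUMED
print(f"L = {range_limit_six:.2f} MPa, F = {f_value:.3f}, negligible: {negligible}")
```

Under this assumption the between-sample inhomogeneity does not exceed what the declared measurement uncertainty alone could produce.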

Thus, both intra- and between-sample inhomogeneity components are taken into account, as in the slump study. Hence, the assigned/certified strength value is \(C_{\rm cert} = \sum\nolimits_{n = 1}^{4} (X_{\rm avg})_n / 4 = 32.0\) MPa, and its standard uncertainty is \(u = [u_{\rm mRL}^2 + S_{\rm bsi}^2 + S_{\rm isi}^2/6]^{1/2} = 1.9\) MPa.

Table 5 Results of compressive strength determination obtained by the PT participants

Analysis of the results obtained by the PT participants

The participant results of strength determination obtained on the 28th day are presented in Table 5. All the results, both with hand preparation of the test cubes (“hand” in Table 5) and with a vibrating table (“vibr”), are satisfactory with respect to their ranges and ζ-score values. The only questionable score value, \(\zeta = -2.26\) (\(2 < |\zeta| < 3\)), was obtained by laboratory No. 16 (in italics in Table 5). It is probably a random deviation: five laboratories out of 100, i.e. one laboratory out of 20, can have \(|\zeta| > 2\) at the level of confidence of 95%.
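The "one out of 20" argument can be checked numerically with a short Python sketch. It assumes that the zeta scores of the N = 25 participants are independent and standard normal; the function names are ours.

```python
import math


def p_abs_zeta_exceeds(limit):
    """Two-sided tail probability P(|zeta| > limit) for a standard normal score."""
    return math.erfc(limit / math.sqrt(2))


def p_at_least_one(limit, n_labs):
    """Probability that at least one of n_labs independent scores exceeds the limit."""
    return 1 - (1 - p_abs_zeta_exceeds(limit)) ** n_labs


N_LABS = 25
p_single = p_abs_zeta_exceeds(2.0)         # chance for a single laboratory
expected = N_LABS * p_single               # expected number among 25 laboratories
p_any = p_at_least_one(2.0, N_LABS)        # chance that at least one lab is flagged
print(f"P = {p_single:.4f}, expected = {expected:.2f}, at least one = {p_any:.2f}")
```

The single-laboratory probability is slightly below 5% because the exact two-sided 95% limit is 1.96 rather than 2; with 25 participants, about one questionable score is expected by chance alone.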

Fig. 2 Histogram and fitted distribution of compressive strength determination results. C cert is the IHRM assigned/certified strength value, and C PT/avg is the average strength value obtained by the PT participants. The pointer shows the average strength result obtained by laboratory (participant) No. 16

Comparability assessment

The hypothesis of a normal distribution of the 174 strength determination results X (4×6=24 RL results and 25×6=150 participant results) is not rejected according to the Cramér–von Mises criterion: the empirical value ω² = 1.53 is less than the critical value of 2.50 at the level of confidence of 95%. The X histogram and the fitted normal distribution are presented in Fig. 2. The total average result of the participants \(C_{\rm PT/avg} = \sum\nolimits_{j = 1}^{N} (X_{\rm avg})_j / 25 = 30.2\) MPa is shown in Fig. 2 by a dotted line. The standard deviation of the X avg values from C PT/avg is S PT = 1.9 MPa. The assigned/certified value C cert = 32.0 MPa is shown in Fig. 2 by another dotted line, as in Fig. 1. The average result obtained by laboratory No. 16, which has a questionable score value, is shown in Fig. 2 by a pointer.

Calibrated testing machines used in RL and in the laboratories participating in PT allow pressure to be measured with a standard uncertainty of less than 2%. Therefore, the values u and S PT are equal and the assumption γ = 1 is reasonable here. Since \(|C_{\rm cert} - C_{\rm PT/avg}|/S_{\rm PT} = 0.95 < 1.04\), comparability of the results of the 25 participants can be assessed as satisfactory at the level of confidence of 95% [3]. Thus, the individual score assessment and the comparability (group) assessment coincide for strength determinations.

The measurement of the pressure applied to a test cube in order to destroy it is traceable to the corresponding international measurement standards. The stages of test cube preparation and hardening during 28 days can be performed under conditions controlled by traceable measurements. They depend less on the technician's art than the stages of slump determination. Therefore, traceability of the IHRM assigned strength value to international measurement standards can be achieved in theory. However, again, the concrete IHRM is intended for single use only. Thus, the comparability discussed for strength determinations is also of local relevance.

Conclusions

1. A PT scheme is developed for comparability assessment of results of concrete slump and compressive strength determination. The scheme is based on preparing a concrete IHRM test portion/sample at a reference laboratory under the same conditions for every PT participant, and on using the IHRM as a local working measurement standard.

2. The IHRM produced and studied for the PT in Israel is found to be homogeneous in both slump and strength units at the level of the measurement/test uncertainties declared by the reference laboratory.

3. Since traceability of the concrete IHRM assigned values to international measurement standards and SI units cannot be stated, local comparability of the results is assessed. It is shown that comparability of the slump and compressive strength determination results is satisfactory, while the uncertainty evaluation for slump results requires additional efforts.