
The question of whether to reject a measurement arises very often. The question is whether the observed large deviation of a measurement from the mean of the series of measurements to which it belongs is due to expected and acceptable random errors, or is the result of a mistake made during measurement. It must be borne in mind that many researchers are of the opinion that no measurement should ever be rejected, as a matter of principle, because this would alter the results on the basis of subjective criteria. Nevertheless, the question frequently arises, and we will present here the criteria by which a measurement, even if it is not finally rejected, is flagged as the possible result of some unusual and unknown sequence of events.

10.1 The Problem of the Rejection of Measurements

Assume that we have N results \( x_{i} \) (\( i = 1,2, \ldots ,N \)) of measurements of the magnitude \( {\mathbf{x}} \). These results have a mean value \( \bar{x} \) and sample standard deviation \( s_{x} \). If the distribution of the errors in \( {\mathbf{x}} \) is Gaussian, the results will be distributed about their mean with this standard deviation. If we had a large number of measurements, we would expect some values to differ by a large amount from the mean.

The question that arises is: given the number N of measurements, by how much must a result differ from the mean for us to conclude that the difference is unlikely to be due to random errors, but is rather the result of a mistake during the experimental procedure, and that, therefore, this measurement must be rejected as unacceptable?

Figure 10.1 shows the results of 13 measurements of the magnitude \( {\mathbf{x}} \), in the order in which they were obtained. The results are also given in Table 10.1. We find that \( \sum {x_{i} = 655} \) and, therefore, the mean of the measurements is \( \bar{x} = 50.4 \). The mean value is marked in Fig. 10.1 by a horizontal (full) straight line at \( x = 50.4 \).

Fig. 10.1 A series of 13 measurements, among which there is one considered for rejection (A)

Table 10.1 The results of 13 measurements, one of which is a candidate for rejection

From the sum \( \sum {(x_{i} - \bar{x})^{2} } = 51.08 \), the standard deviation of the measurements is calculated to be \( s_{x} = \sqrt {51.08/13} = 2.0 \).

The 9th measurement differs from the mean by \( d_{9} = 55-50.4 = 4.6 \). The difference is equal to \( 4.6/2.0 = 2.3 \) standard deviations.

Assuming that the distribution of the measurements about the mean is normal, we may calculate the probability for a measurement to differ from the mean by at least 2.3 standard deviations, either in the positive or the negative direction. From Table 4.2, we find that this probability is equal to \( 1-2 \times 0.4893 = 0.021 \) or 2.1%. We expect that, due to random errors, one measurement in about 48 will differ from the mean by more than 2.3 standard deviations. In our series we find one such measurement in only 13. The expected number would be \( 0.021 \times 13 = 0.27 \). The observed number, 1, is almost 4 times the expected one. We therefore conclude that, most probably, this measurement differs from the mean by as much as it does not because of random errors but due to some other cause, and we reject it.
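These numbers are easily checked numerically. The following minimal Python sketch (the language and the function name are our own choices, not the book's) evaluates the two-sided tail probability of the normal distribution using the standard-library function math.erfc:

```python
import math

def two_sided_tail(nu):
    """Probability that a normally distributed value deviates from the mean
    by at least nu standard deviations in either direction:
    Pr{ |x - mean| / s >= nu } = erfc(nu / sqrt(2))."""
    return math.erfc(nu / math.sqrt(2))

p = two_sided_tail(2.3)   # about 0.0214, i.e. 2.1 %
print(p, 13 * p)          # expected number among 13 measurements, about 0.27-0.28
```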

Having rejected the 9th measurement, we recalculate the mean and the standard deviation of the remaining 12 values. We find \( \bar{x}^{\prime} = 600/12 = 50 \). This new mean value is also marked in Fig. 10.1 with a horizontal dashed line. The sum \( \sum {(x_{i} - \bar{x}^{\prime})^{2} } = 28 \), gives a standard deviation of the 12 measurements \( s^{\prime}_{x} = \sqrt {28/12} = 1.5 \).

Among the remaining 12 measurements, the value with initial order number 12 is now the candidate for rejection. This measurement differs from the new mean by \( \left| {\,47 - 50\,} \right|/1.5 = 2 \) new standard deviations. From Table 4.2 we find that the probability of finding a measurement which differs from the mean by at least 2 standard deviations, either in the positive or the negative direction, is equal to \( 1-2 \times 0.4773 = 0.045 \) or 4.5%. The expected number of such results in our 12 measurements is 0.54. Obviously this measurement must not be rejected, as the ratio of the observed to the expected number is near unity.

There seems to be a need for a criterion which, although not entirely objective, would at least be commonly agreed upon, thus reducing the subjective factor to some degree. We will examine one such criterion below.

10.2 Chauvenet’s Criterion

Chauvenet proposed the following criterion for the rejection of measurements:

  • A measurement belonging to a group of N measurements is rejected if its difference from the mean of the measurements is such that the probability of observing such a difference, or a greater one, is less than \( 1/(2N) \).

In other words, Chauvenet’s criterion rejects a measurement if the expected number of measurements with a difference from the mean equal to or larger than its own deviation is less than ½. Obviously, the number ½ is arbitrary, and this is one of the objections raised against this criterion, or indeed any other such criterion.

For use with Chauvenet’s criterion, Table 10.2 gives the probability that the absolute difference of a value of \( x \) from the mean \( \bar{x} \) is equal to or greater than ν times the standard deviation \( s_{x} \) of the measurements, as a function of ν. The difference of \( x \) from the mean \( \bar{x} \) is expressed, in units of \( s_{x} \), as \( \left| {\,x - \bar{x}\,} \right|/s_{x} \) and the probability for this difference to be equal to or greater than ν is denoted by \( \Pr \left\{ {\frac{{\left| {\,x - \bar{x}\,} \right|}}{{s_{x} }} \ge \nu } \right\} \).

Table 10.2 Probability for the absolute difference of a value of \( x \) from the mean \( \bar{x} \) being equal to or greater than ν times the standard deviation of the measurements, \( s_{x} \)

According to Eq. (4.65), the probability for a value of \( x \) to differ from the mean \( \bar{x} \) by more than \( \nu \) times the standard deviation \( s_{x} \), is given by the relation

$$ \Pr \{ x \le \bar{x} - \nu s_{x} \;{\text{or}}\;x \ge \bar{x} + \nu s_{x} \} = 1 - {\text{erf }}\left( {\frac{\nu }{\sqrt 2 }} \right) \equiv {\text{erfc }}\left( {\frac{\nu }{\sqrt 2 }} \right) = 1-2\, \varPhi (\nu ). $$
(10.1)

We wish to find the limit for rejection of values, \( \nu_{\text{C}} \equiv \frac{{\left| {x - \bar{x}} \right|}}{{s_{x} }} \), according to Chauvenet’s criterion. From Eq. (10.1) we have \( \Pr \{ x \le \bar{x} - \nu_{\text{C}} s_{x} \) or \( x \ge \bar{x} + \nu_{\text{C}} s_{x} \} = \frac{1}{2N} \) or

$$ {\text{erf }}\left( {\frac{{\nu_{\text{C}} }}{\sqrt 2 }} \right) = 1 - \frac{1}{2 N} \quad {\text{and}} \quad {\Phi} (\nu_{\text{C}} ) = \frac{1}{2} - \frac{1}{4N} $$
(10.2)

from which, for a given N, we may find the corresponding value of \( \nu_{\text{C}} \).

Table 10.3 gives, for various values of the number N of measurements, the limit for rejection, \( \nu_{\text{C}} \equiv \left| {x - \bar{x}} \right|/s_{x} \), according to Chauvenet’s criterion. A value, out of a total of N values, is rejected if the absolute value of its difference from the sample mean, \( \left| {\,x - \bar{x}\,} \right| \), is larger than \( \nu_{\text{C}} s_{x} \), where the value of \( \nu_{\text{C}} \) is found from the table for the corresponding N. For example, one measurement in a series of 10 measurements is rejected if it differs from the sample mean by more than 1.96 \( s_{x} \).

Table 10.3 The rejection limit of a value, \( \nu_{\text{C}} \equiv \left| {x - \bar{x}} \right|/s_{x} \), according to Chauvenet’s criterion, as a function of the number N of measurements
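The entries of Table 10.3 can be reproduced by inverting Eq. (10.2) numerically. A short sketch, assuming SciPy is available (norm.ppf is the inverse of the cumulative distribution function of the standard normal distribution):

```python
from scipy.stats import norm

def chauvenet_limit(N):
    """Rejection limit nu_C for N measurements.
    Eq. (10.2), erf(nu_C/sqrt(2)) = 1 - 1/(2N), is equivalent to requiring
    that the cumulative normal probability up to nu_C equals 1 - 1/(4N)."""
    return norm.ppf(1.0 - 1.0 / (4.0 * N))

for N in (10, 12, 13):
    print(N, round(chauvenet_limit(N), 2))   # 1.96, 2.04 and 2.07, as in Table 10.3
```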

Example 10.1

Apply Chauvenet’s criterion to the measurements of Table 10.1.

The N = 13 measurements of the table have a mean equal to \( \bar{x} = 50.4 \) and a standard deviation \( s_{x} = 2.0 \). The 9th measurement differs from the mean by

$$ \frac{{\left| {\,x_{9} - \bar{x}\,} \right|}}{{s_{x} }} = \frac{{\left| {\,55-50.4\,} \right|}}{2.0} = \frac{4.6}{2.0} = 2.3\; {\text{standard}}\,{\text{deviations}}. $$

From Table 10.2 we find that the probability for a difference from the mean greater than or equal to \( 2.3s_{x} \) is \( \Pr \left\{ {\frac{{\left| {\,x - \bar{x}\,} \right|}}{{s_{x} }} \ge 2.3} \right\} = 0.0214 \). For N = 13, it is \( 1/(2N) \) = 0.0385.

Since it is \( \Pr \left\{ {\frac{{\left| {\,x - \bar{x}\,} \right|}}{{s_{x} }} \ge 2.3} \right\}\,\, < \,\,1/(2N) \), the 9th measurement is rejected.

Alternatively, from Table 10.3 we find that, for 13 measurements, a measurement that differs from the mean by more than 2.07 \( s_{x} \) is rejected. For a difference of 2.3 \( s_{x} \), the rejection is justified.

The remaining 12 measurements now have a mean \( \bar{x}^{\prime} = 50 \) and a standard deviation of \( s^{\prime}_{x} = 1.5 \).

The measurement with number 12 differs from the new mean by

$$ \frac{{\left| {\,x_{12} - \bar{x}^{\prime}\,} \right|}}{{s^{\prime}_{x} }} = \frac{{\left| {\,47 - 50\,} \right|}}{1.5} = \frac{3}{1.5} = 2.0\; {\text{standard}}\,{\text{deviations}}. $$

From Table 10.2 we find that the probability for a difference from the mean greater or equal to \( 2.0s_{x} \) is \( \Pr \left\{ {\frac{{\left| {\,x - \bar{x}\,} \right|}}{{s_{x} }} \ge 2.0} \right\} = 0.0455 \). For \( N = 12 \), it is \( 1/(2N) \) = 0.0417.

Since it is \( \Pr \left\{ {\frac{{\left| {\,x - \bar{x}\,} \right|}}{{s_{x} }} \ge 2.0} \right\}\,\, > \,\,1/(2N) \), the 12th measurement is not rejected.

Alternatively, from Table 10.3 we find that, for \( N = 12 \), a measurement that differs from the mean by up to 2.04 \( s_{x} \) is not rejected. With a difference of \( 2.0s_{x} \), the 12th measurement should not be rejected.
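The whole procedure of Example 10.1 can be collected into a small routine. The sketch below is only illustrative: it follows the convention used in this chapter of dividing by N (not N - 1) when computing the standard deviation of the measurements, and any data passed to it are the user's own.

```python
import math

def chauvenet_candidate(values):
    """Find the value furthest from the mean and test it against Chauvenet's
    criterion.  Returns (index, deviation in standard deviations, rejected)."""
    N = len(values)
    mean = sum(values) / N
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / N)  # divide by N, as in the text
    i = max(range(N), key=lambda k: abs(values[k] - mean))   # most deviant measurement
    nu = abs(values[i] - mean) / s
    prob = math.erfc(nu / math.sqrt(2))                      # Pr{ |x - mean|/s >= nu }
    return i, nu, prob < 1.0 / (2.0 * N)                     # reject if prob < 1/(2N)
```

As in the example above, after a rejection the mean and the standard deviation should be recalculated from the remaining values before any further measurement is examined.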

10.3 Comments Concerning the Rejection of Measurements

Before we take a stand on the subject of the ‘rejection’ of measurements, we must clarify exactly what we mean by this term. We examine the two main possibilities below:

In the case we have examined (Fig. 10.1), we had a series of measurements of the same quantity, which were taken under identical experimental conditions, as far as this was possible. Of course, it is not possible to keep the conditions completely unchanged. It is good practice, in all experimental work, to keep detailed notes of everything that happens and to resort to these in an effort to find what may have changed during the taking of the ‘suspect’ measurement and led to the difference observed. It is, however, rather improbable that the causes have been recorded, given that, if changes in the experimental conditions had been noticed, they would have been corrected before the execution of the measurement. The suggestion that the measurement should simply be repeated is not a solution: in a series of N measurements, the measurement has already been repeated \( N - 1 \) times! If the rejection of results alters the final result significantly, then it might be necessary, if it is still possible, for more measurements to be made. Of course, the danger exists here that we keep making measurements until we obtain a result we like; this would have much more serious consequences than the rejection of a measurement. Summarizing, however, we would say that, in cases such as this, the use of Chauvenet’s criterion is justified. In no case, however, should the criterion be used for the rejection of two or more measurements, even when the number of measurements is large and the rejection of a measurement would not alter the final result significantly. In cases where the criterion suggests the rejection of two or more measurements, the possibility should be seriously examined that the distribution of the parent population is not normal, the deviations being more important at its tails. Naturally, when a measurement has been rejected, this should be stated.

Anyone who systematically uses Chauvenet’s criterion in their work is bound, sooner or later, to reject measurements that should not have been rejected. Large deviations are improbable but not impossible!

The second case is that in which we are dealing with measurements which are not expected to give the same result, and one of them differs significantly from its expected value. For example, if we measure the values of a variable \( y \) as a function of another, \( x \), and by the use of some method (such as the method of least squares, to be developed in the next chapter) we find the best mathematical relation between the two magnitudes (Fig. 10.2), then some point may deviate so much from the expected value that it is probable that this value is the result of a mistake (point A in Fig. 10.2). If the method used in fitting a curve to the results also gives the expected error in \( y \) for every value of \( x \), it is possible to apply criteria for the rejection of some value.

Fig. 10.2 A series of 13 measurements of \( y \) as a function of \( x \), among which there is one candidate for rejection (A)

In this case, greater caution is needed before rejecting a measurement. The reason we might wish to reject ‘erroneous’ results is so that we may then apply the curve-fitting method again to the remaining results and find more accurate values for the parameters of the function relating \( x \) and \( y \). In contrast to the previous case, however, we do not have other measurements taken under the same experimental conditions with which to compare the result under investigation. The right way to face the problem is to return to the laboratory (assuming that, quite wrongly, we have left it before detecting the problem!) and perform more measurements in the region of the suspect point. In the example of Fig. 10.2 this would mean the region between \( x = 8 \) and \( x = 10 \). Only then will we be in a position to decide whether the measurement should be rejected, or whether the relation \( y(x) \) does not behave as assumed in this region and, perhaps, the deviation is due to a hitherto unknown phenomenon.

In cases where the problem cannot be resolved in the way described above, the criterion will indicate whether something unusual is happening in that region of values, by giving an estimate of how probable it is for the deviation to be due to random errors. We must not forget, however, that for \( \left| {x - \bar{x}} \right|/s_{x} > 2.5 \) our confidence that the normal distribution accurately describes the deviations is low.

We will conclude our discussion by mentioning one of the many cases in the history of science in which not rejecting a value that appeared wrong led to an important discovery. Rayleigh and Ramsay, in 1894, noticed that nitrogen produced in the laboratory was lighter than atmospheric nitrogen by 0.5%. The fact that they did not interpret the difference as being the result of random errors, but considered the deviation to be real, led them to conclude that an unknown gas was present in the sample of what was thought to be pure atmospheric nitrogen. Thus argon was discovered. Of course, in support of the statistical analysis of experimental results, it must be said that it was the knowledge of the possible errors in the measurements that led the two scientists to suspect that the deviation was statistically significant.

10.4 Comparison of the Means of Two Series of Measurements of the Same Quantity

The need frequently arises for the comparison of the results of two series of measurements of the same quantity. The two series of measurements were, possibly, performed by the same experimenter at different times or were performed by different researchers. It is also possible that the same quantity was measured using two different experimental methods. A classical example is the measurement, in an educational laboratory, of a universal constant or the property of a material and the comparison of the results with the generally accepted values for these magnitudes, found in tables.

We suppose that we have two means, \( \bar{x}_{1} \) and \( \bar{x}_{2} \), and their corresponding standard deviations, \( \sigma_{{\bar{x}_{1} }} \) and \( \sigma_{{\bar{x}_{2} }} \). The two series must be considered to have been drawn from the parent population of the infinite possible values that may result in the measurement of the magnitude \( {\mathbf{x}} \). Being finite samples from the same parent population, they are not expected to agree completely. The question of whether the two samples originated from the same parent population is answered in Statistics by Student’s t-test. For experimental results in the laboratory, simpler criteria are commonly applied, of which we will describe only two:

  1. The two results are considered to be in agreement with each other (or, better, there are no serious indications of the presence of systematic errors) if the absolute value of the difference of the two means is smaller than or equal to the sum of the standard deviations of the two values:

$$ \left| {\,\bar{x}_{2} - \bar{x}_{1} \,} \right| \le \sigma_{{\bar{x}_{1} }} + \sigma_{{\bar{x}_{2} }} . $$
(10.3)
  2. The two results are considered to be in agreement with each other if the absolute value of the difference of the two means is smaller than or equal to the standard deviation of the difference of the two values:

$$ \left| {\,\bar{x}_{2} - \bar{x}_{1} \,} \right| \le \sqrt {\sigma_{{\bar{x}_{1} }}^{2} + \sigma_{{\bar{x}_{2} }}^{2} } . $$
(10.4)

We consider the second criterion to be somewhat more correct and we will use it in the example that follows.

Example 10.2

In an educational laboratory, two students determined experimentally, by two different methods, the absolute value e of the charge of the electron and found the following values:

$$ e_{1} = (1.62 \pm 0.02) \times 10^{ - 19} {\text{ C}} \quad {\text{and}} \quad e_{2} = (1.59 \pm 0.03) \times 10^{ - 19} {\text{ C}} $$

Check whether the two results are consistent with each other and if they agree with the value of e generally accepted today.

The absolute value of the difference of the two results is \( \left| {\,e_{2} - e_{1} \,} \right| = 0.03 \times 10^{ - 19} {\text{ C}} \).

The standard deviation of the difference of the two mean values is

$$ \sigma_{{e_{2} - e_{1} }} = \sqrt {(0.02)^{2} + (0.03)^{2} } \times 10^{ - 19} = 0.036 \times 10^{ - 19} = 0.04 \times 10^{ - 19} {\text{ C}} . $$

Since it is \( \left| {\,e_{2} - e_{1} \,} \right| < \sigma_{{e_{2} - e_{1} }} \), the two results are considered to agree with each other.

The absolute value of the electronic charge is given in tables as \( e = 1.602\;176\;565\,(35) \times 10^{ - 19} {\text{ C}} \).

The standard error in this value is negligible compared to the errors in the values obtained by the students. Therefore, the standard deviation of the difference between e and \( e_{1} \) may be taken to be the standard deviation of \( e_{1} \), i.e. \( 0.02 \times 10^{ - 19} {\text{ C}} \). The difference between the values of e and \( e_{1} \) is somewhat smaller than the standard deviation of their difference. We therefore conclude that the value \( e_{1} \) is in agreement with the generally accepted value of e, within the limits of the errors of the measurements. The absolute difference of \( e_{2} \) from e is \( 0.01 \times 10^{ - 19} {\text{ C}} \), which is considerably smaller than the standard deviation of the difference of \( e_{2} \) and e, equal to \( 0.03 \times 10^{ - 19} {\text{ C}} \). Thus, \( e_{2} \) is also considered to be in agreement with the generally accepted value of e, within the limits of the errors of the measurements.
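The checks based on criteria (10.3) and (10.4), as used in this example, amount to one-line comparisons. A minimal Python sketch (the function names are our own), shown here with the figures of Example 10.2 in units of \( 10^{ - 19} {\text{ C}} \):

```python
import math

def agree_sum(x1, s1, x2, s2):
    """Criterion (10.3): |x2 - x1| <= s1 + s2."""
    return abs(x2 - x1) <= s1 + s2

def agree_quadrature(x1, s1, x2, s2):
    """Criterion (10.4): |x2 - x1| <= sqrt(s1**2 + s2**2),
    the standard deviation of the difference of the two means."""
    return abs(x2 - x1) <= math.sqrt(s1 ** 2 + s2 ** 2)

# Example 10.2, in units of 1e-19 C: e1 = 1.62 +/- 0.02, e2 = 1.59 +/- 0.03
print(agree_quadrature(1.62, 0.02, 1.59, 0.03))   # True: 0.03 <= 0.036
```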

Problems

  10.1 The results of 13 measurements of the quantity x are:

$$ 9\quad 6\quad 5\quad 9\quad 7\quad 9\quad 6\quad 10\quad 4\quad 7\quad 8\quad 5\quad 13. $$

(a) Find the mean \( \bar{x} \) and the standard deviation \( s_{x} \) of the measurements.

(b) Should the 13th measurement, \( x = 13 \), be rejected according to Chauvenet’s criterion?

  10.2 The results of 10 measurements are:

    $$ 126\quad 72\quad 162\quad 144\quad 252\quad 162\quad 135\quad 135\quad 153\quad 117. $$

    Is there a result that should be rejected according to Chauvenet’s criterion?

  10.3 A series of 37 measurements resulted in the values \( x_{r} \) with the frequencies \( n_{r} \) given below:

$$ \begin{array}{l|cccccccc} x_{r} & 31.9 & 32.0 & 32.2 & 32.3 & 32.4 & 32.5 & 32.6 & 33.0 \\ n_{r} & 1 & 3 & 7 & 12 & 6 & 6 & 1 & 1 \\ \end{array} $$

Use Chauvenet’s criterion in order to decide whether the last measurement should be rejected.

  10.4 Two series of measurements of the same quantity gave the results \( x_{1} = 1.518 \pm 0.012 \) and \( x_{2} = 1.535 \pm 0.015 \). Are the two results mutually compatible?

  10.5 Two series of measurements of the same quantity gave the results \( x_{1} = 163 \pm 6 \) and \( x_{2} = 180 \pm 4 \). Are the two results mutually compatible?