
We will now examine the general principles of the Theory of Errors. Apart from the a posteriori justification of some assumptions we have already made and used, this chapter will give us new theoretical tools which we will use in the development of new techniques in the analysis of experimental results.

9.1 The Normal or Gaussian Law of Errors

All believe in the exponential law of errors; the experimentalists because they think it can be proved with mathematics and the mathematicians because they believe it has been established experimentally.

E. Whittaker and G. Robinson [1]

The law known as Gauss's law of errors was first formulated by Laplace in 1783. Laplace based his derivation on the assumption that the deviation of a measurement belonging to a group of measurements from the group's mean is due to a large number of small deviations, caused by sources which are independent of each other. He assumed that deviations of the same magnitude are equally probable to be positive or negative. Gauss later derived the law based on the assumption that the arithmetic mean is the most probable value of a number of equivalent measurements.

The mathematical form of the law defines the normal or Gaussian probability density function, which gives the distribution of the results x of the measurements of a physical magnitude x, as

$$ f(x) = \frac{1}{{\sqrt {2\uppi} \;\sigma }}{\text{e}}^{{ - (x - \mu )^{2} /2\sigma^{2} }} . $$
(9.1)

The probability density depends on two parameters, μ and σ. It is proved that μ is the mean of a large number of measurements, i.e. the value towards which the mean \( \overline{x} \) of a series of measurements tends as the number of measurements tends to infinity, and that σ is the standard deviation of the measurements from μ. The function (9.1) describes the distribution of a very large number (\( N \to \infty \)) of measurement results, i.e. what we call the parent population (or universe) of all the possible results x of the measurements. A series of measurements of x constitutes a sample taken at random from this population. From this finite number of experimental results, we may deduce the best estimates for μ and σ, according to what we have shown in Chap. 4.

Having in mind what we have shown for the binomial and the Gaussian distributions in Chap. 7, the derivation of the expression (9.1) for the probability density is simple. Let us suppose that the magnitude being measured has a real value equal to \( x_{0} \) and that the final deviation of a measurement x from \( x_{0} \) is due to Ν independent sources of error. To simplify the arguments, we assume that the random errors from these sources all have the same magnitude \( \varepsilon \). This limitation is not necessary and the law may also be derived for a general distribution of these elementary errors [2]. The probabilities for the errors to have values \( \pm \varepsilon \) are both equal to \( p = \frac{1}{2} \). For Ν errors of magnitude \( \varepsilon \), the final results will lie between the values \( x = x_{0} - N\varepsilon \) and \( x = x_{0} + N\varepsilon \).

If, in a measurement, n errors are positive and, therefore, \( N - n \) errors are negative, the result of the measurement will be

$$ x = x_{0} + n\varepsilon - (N - n)\varepsilon = x_{0} + (2n - N)\varepsilon . $$
(9.2)

The probability of this occurring is, according to the binomial distribution,

$$ P_{N} (n) = \frac{N !}{n!(N - n)!}p^{n} (1 - p)^{N - n} = \frac{N !}{n!(N - n)!}\left( {\frac{1}{2}} \right)^{N} , $$
(9.3)

and this is the probability of a measurement having as its result the value \( x = x_{0} + (2n - N)\varepsilon \).

In the limit, in which the binomial distribution approaches the Gaussian distribution, this probability becomes (see Sect. 7.4)

$$ P_{N} (n) = \frac{1}{{\sqrt {2\pi } \, \sigma }}{\text{e}}^{{ - (n - \mu )^{2} /2\sigma^{2} }} $$
(9.4)

with

$$ \mu = Np = \frac{N}{2},\quad \sigma = \sqrt {Npq} = \frac{\sqrt N }{2}. $$
(9.5)

When we take as variable the continuous variable x [see Eqs. (7.54)–(7.56)], we have the probability distribution

$$ f(x) = \frac{1}{{\sqrt {2\pi } \;\sigma }}{\text{e}}^{{ - (x - \mu )^{2} /2\sigma^{2} }} , $$
(9.6)

with

$$ \mu = x_{0} ,\quad \sigma = 2\sqrt {Npq} \, \varepsilon = \sqrt N \,\varepsilon . $$
(9.7)

The Gaussian curve is shown in Fig. 9.1. This distribution was, as we have already mentioned, originally derived by de Moivre for the results of coin tossing.

Fig. 9.1

The Gaussian function

The relation between the binomial distribution and the error of a measurement, according to the assumptions we have made, is seen in Fig. 9.2. A large enough number of small balls fall, colliding with regularly arranged cylindrical obstacles which force the balls to deviate to the left or to the right by a given constant step \( \varepsilon /2 \). The probabilities for a deviation to the right and for a deviation to the left are equal, \( p = \frac{1}{2} \). The balls do not interact with each other during their descent. Shown in the figure is the distribution of the final positions of the balls after \( N = 24 \) such deviations. The distribution is binomial and, in the limit, Gaussian. The symmetry of the distribution is seen, as well as the fact that small deviations are more probable than large ones, since they are attained in a larger number of possible ways (paths). The arrangement is due to Francis Galton and is known by its Latin name quincunx, which describes the way fruit-bearing trees are planted. Since \( N = 24 \) in the figure, the deviations are even multiples of \( \varepsilon /2 \); that is, they have values which are integral multiples of \( \varepsilon \) between \( {-} 12\varepsilon \) and \( {+} 12\varepsilon \). Each ball may be considered to be the result of a measurement which is subjected to 24 small deviations of magnitude \( \varepsilon /2 \), with equally probable positive or negative sign. A simple simulation of the arrangement is sketched below, after the caption of Fig. 9.2.

Fig. 9.2

The distribution of the deviations suffered by falling small balls which are forced 24 times to deviate to the left or to the right, by equal steps and with equal probabilities. Their number in each region of deviation is given by the binomial distribution

Laplace and Gauss assumed that the normal distribution has universal validity, based on their own studies of experimental results. Today, this is known not to be true. Apart from the cases in which the deviations from the normal distribution are too striking to neglect, the distribution is used because, as put by Jeffreys [3]: ‘The normal law of errors cannot be proved theoretically. The justification for its use is that the law represents measurements of many kinds without being too wrong and that it is much easier to use than other laws, which would be more accurate in certain cases’.

Despite all this, even when the parent population does not have a normal distribution, the distribution of the means of series of a finite number of measurements is nearer to the normal distribution than the parent population is. This follows from a very important theorem, the central limit theorem, which we will discuss in detail below.

The first example of measurements appearing to be normally distributed was given by Bessel, who grouped the results of 300 measurements of the right ascension of stars. The errors were given in seconds of arc and lay between –1 and +1 s. The histogram of the errors (Fig. 9.3) is symmetrical with respect to zero, since Bessel grouped together the positive and the negative errors. As shown in the figure, a Gaussian curve with mean \( \overline{x} = 0 \) (as expected) and standard deviation from the mean \( \sigma_{x} = 0.20 \) s is fitted to the histogram. The fit of the Gaussian to the data is very good. In fact, it is so good that there were suggestions that a selection of values had been made, in order to get a better agreement with the normal distribution of errors.

Fig. 9.3

The histogram of Bessel for the errors of 300 measurements of the right ascension of stars. The Gaussian fitted to the histogram has a mean \( \overline{x} = 0 \) and a standard deviation \( \sigma_{x} = 0.20\;{\text{s}} \)

Birge [4] performed, with a spectrometer, a series of 500 adjustments in which he placed the vertical wire of the cross wire in the optical field of the instrument as near as he could to the center of a wide but symmetrical spectral line in the solar spectrum. This is the procedure followed in the measurement of the wavelength of the spectral line. He recorded the readings on the instrument’s scale, in μm. The frequencies of the measurements’ residuals from their mean value, \( \upsilon_{i} = x_{i} - \overline{x} \), are presented in the histogram of Fig. 9.4. The agreement of the distribution of the errors with a normal distribution is very good. We may check whether the deviations of the values of Birge’s histogram are near the expected ones. If \( \Delta N_{\text{G}} \) are the values of the Gaussian curve fitted to the histogram of Fig. 9.4 (thick curve), then the expected deviations will be, according to the Poisson distribution, of the order of \( \sqrt {\Delta N_{\text{G}} } \). The curves \( \Delta N_{\text{G}} \) and \( \Delta N_{\text{G}} \pm \sqrt {\Delta N_{\text{G}} } \) are also drawn in the figure. We note that the deviations of the histogram’s columns from the Gaussian curve are within or near the expected limits.

Fig. 9.4

Histogram of the residuals \( \upsilon_{i} = x_{i} - \overline{x} \) of the 500 measurements performed by Birge with a spectrometer in order to test the normal law of errors. The Gaussian fitted to the histogram has a standard deviation of \( \sigma_{\upsilon } = 3.6 \) μm. Apart from the Gaussian curve \( \Delta {N}_{\text{G}} \) (thick line) the curves \( \Delta {N}_{\text{G}} \pm \sqrt {\Delta{N}_{\text{G}} } \) are also drawn. The expected standard deviation of \( \Delta {N}_{\text{G}} \) is, according to the Poisson distribution, equal to \( \sqrt {\Delta {N}_{\text{G}} } \)

In other cases, the errors in the measurements of other experiments are not described so satisfactorily by a Gaussian curve. In some cases, the data required the use of a sum of two normal curves with different standard deviations. Naturally, it is obvious that the use of two curves, with four parameters to be determined instead of the two of a single curve, will always give better agreement than a single curve. On the other hand, the need for the use of two curves for a better fit, might be an indication of either that the errors are due to two widely different sources or that the measurements were not performed under exactly the same experimental conditions (i.e. they are derived from two different parent populations).

9.2 The Lyapunov Central Limit Theorem

A theorem of very great importance in the Theory of Probability and Statistics is the Lyapunov central limit theorem, which we will now discuss without proving it [5]. An elementary formulation of the theorem, which is adequate for the purposes of this book, is the following:

If \( x_{1} ,\;x_{2} , \ldots \;x_{N} \) are the Ν values of a random sample taken from a parent population of the random variable x , which has mean μ and standard deviation σ, then, as the number Ν tends to infinity, the distribution of the means \( \overline{x} \) of the \( x_{i} \) approaches a normal distribution with mean and standard deviation

$$ \overline{{(\overline{x})}} = \mu_{{\overline{x}}} = \mu \quad {\text{and}} \quad \sigma_{{\overline{x}}} = \frac{\sigma }{\sqrt N }, $$
(9.8)

respectively. In other words, the probability density of the means \( \overline{x} = \frac{1}{N}(x_{1} + x_{2} + \ldots + x_{N} ) \) tends to

$$ f(\overline{x}) = \frac{1}{{\sqrt {2\uppi} \;\sigma_{{\overline{x}}} }}{\text{e}}^{{ - (\overline{x} - \mu_{{\overline{x}}} )^{2} /2\sigma_{{\overline{x}}}^{2} }} = \frac{\sqrt N }{{\sqrt {2\uppi} \;\sigma }}{\text{e}}^{{ - N(\overline{x} - \mu )^{2} /2\sigma^{2} }} . $$
(9.9)

It must be noted that the distribution of the parent population does not need to be normal.

We will explain the meaning of the theorem with the aid of Fig. 9.5. Figure 9.5a shows the probability density of the parent distribution of all the possible results of the measurements of x. To make the description of the sampling method easier, the area under the curve has been divided into a finite number of identical rectangular cells. In the limit, this number will be considered to tend to infinity. To each cell there corresponds a small region of x values, given by the projection of the cell on the x-axis. The process of sampling is simply the random picking of a number Ν of these cells. The cells have the same probability of being picked during the sampling. At points of large \( f(x) \), the vertical column consists of a larger number of cells and the values of x corresponding to this column are more likely to be the result of a measurement. The 10 black cells in the figure could be the ones selected in a sampling with \( N = 10 \) values (measurements). Their projection on the x-axis results in the histogram of the 10 measurements (Fig. 9.5b). The mean \( \overline{x} \) of these Ν values is evaluated. The central limit theorem states that, independently of the shape of the distribution of the parent population, these means, \( \overline{x} \), which result from different series of Ν measurements each, have a distribution which, for large Ν, tends to a normal distribution with mean equal to the mean of the parent population, \( \overline{{(\overline{x})}} = \mu_{{\overline{x}}} = \mu \), and standard deviation \( \sigma_{{\overline{x}}} = \frac{\sigma }{\sqrt N } \), where \( \sigma \) is the standard deviation of the parent population.

Fig. 9.5

The central limit theorem. a The parent population, with mean μ and standard deviation σ. b One series of N measurements with mean \( \overline{x}_{i} \). c The distribution of the means, \( \overline{x}_{i} \). It has a mean \( \overline{{(\overline{x})}} = \mu_{{\overline{x}}} \), which tends to \( \mu \) for large N and a standard deviation \( \sigma_{{\overline{x}}} \), which tends to \( {\sigma \mathord{\left/ {\vphantom {\sigma {\sqrt N}}} \right. \kern-0pt} {\sqrt N}} \)

We will demonstrate the central limit theorem with a few examples, in which the sampling is done from parent populations of known distributions.

Example 9.1

Use the Monte Carlo method to check the validity of the central limit theorem for measurement numbers Ν = 1, 2, 4, 8 and 16, when the parent distribution of the measurements has probability density: \( f(x) = 0 \) everywhere, except in the interval \( 0 \le x \le 1 \), in which it is \( f(x) = 1 \).

The probability density \( f(x) \) has been drawn in the figure that follows.

It has a mean value \( \mu = 0.5 \) and a standard deviation given by the relation

$$ \sigma = \sqrt {\int\nolimits_{0}^{1} {(x - 0.5)^{2} f(x)\,dx} } = \sqrt {\left[ {\frac{1}{3}(x - 0.5)^{3} } \right]_{ 0}^{ 1} } = \sqrt {\frac{1}{12}} = 0.289. $$

The Monte Carlo method will be used in a simulation of the experimental process in order to 'find' the results \( x_{i} \) of the measurements. The first 50 000 decimal digits of π were used as the random numbers required for the application of the method. They were divided into groups of 5 digits, each group being divided by \( 10^{5} \), thus giving 10 000 numbers between 0 and 1, with 5 significant figures each (from 0.00000 to 0.99999).

Since the results \( x_{i} \) are uniformly distributed between 0 and 1 (constant probability density) and the same must be true for random numbers, if they are indeed random, the random numbers found as described above are taken directly to be the values of \( x_{i} \).

Thus, the first 625 numbers gave 625 results \( x_{i} \), which are recorded in the first histogram of the figure given below (\( N = 1 \)). The variable is denoted by \( \overline{x} \) for uniformity with the rest of the histograms, but these 'mean' values consist of one measurement each (\( N = 1 \)). As a consequence, this histogram must reproduce, approximately, the probability density \( f(x) \). The Gaussian which is fitted to the histogram (admittedly with a dose of exaggeration, since the parent distribution is uniform) has a mean of 0.509 and \( \sigma_{{\overline{x}}} = 0.29 \).

Next, the first \( 2 \times 625 = 1250 \) random numbers gave 625 pairs of values, the means of which, \( \overline{x} \), were found and are given in the second histogram (Ν = 2). Even with N being just two, the fitting of a Gaussian curve to the distribution of the means is very satisfactory.

The procedure is continued for Ν = 4, 8 and 16. For Ν = 16, the \( 16 \times 625 = 10\, 000 \) random numbers give 625 series of 16 results of measurement \( x_{i} \) each. The distribution of the means of these groups of 16 values is given by the last histogram. The Gaussian approximation to this histogram is seen to be very good.

Shown with all the histograms, as a dashed line, is the mean number of values corresponding to each class, taking into account the width of the classes and the total number of means \( \overline{x} \), which is 625 in all cases.

The table below gives, for comparison, the theoretically expected values of the means \( \overline{{(\overline{x})}} \) of the \( \overline{x} \) and the standard deviations \( \sigma_{{\overline{x}}} = \sigma /\sqrt N \), as well as the values determined with the simulation of the experiment which was performed.

| Ν | \( \overline{{(\overline{x})}} \), theoretically expected | \( \overline{{(\overline{x})}} \), 'experimental' result | \( \sigma_{{\overline{x}}} \), theoretically expected | \( \sigma_{{\overline{x}}} \), 'experimental' result |
|---|---|---|---|---|
| 1 | 0.5 | 0.509 | 0.289 | 0.29 |
| 2 | 0.5 | 0.500 | 0.204 | 0.21 |
| 4 | 0.5 | 0.505 | 0.144 | 0.15 |
| 8 | 0.5 | 0.499 | 0.102 | 0.10 |
| 16 | 0.5 | 0.499 | 0.072 | 0.07 |

In conclusion, we note that the mean value of the means, \( \overline{{(\overline{x})}} \), tends very quickly towards the value expected according to the central limit theorem (as should be expected for the relatively large number of 625 means!). The distribution of \( \overline{x} \) also tends towards a normal distribution very quickly, with the standard deviation of the fitted Gaussian decreasing as \( \sigma_{{\overline{x}}} \propto 1/\sqrt N \).

In the next example, a distribution of x will be used which is very far from being normal, a kind of ‘anti-Gaussian’ distribution. The example will also give us the opportunity of a better understanding of the Monte Carlo method.

Example 9.2

Use the Monte Carlo method in order to test the validity of the central limit theorem for numbers of measurements Ν = 1, 2, 4, 8 and 16, when the parent distribution of the measurements is given by the probability density:

$$ f(x) = 0\; {\text{everywhere}}\;{\text{except}}\;{\text{in}}\;{\text{the}}\;{\text{region}}\;0 \le x \le 1,\,{\text{where}}\;{\text{it}}\;{\text{is}}\;f(x) = 3(1-2x)^{2} . $$

The function of the probability density is normalized. It has a parabolic shape (see figure below) with a minimum equal to 0 at \( x = 0.5 \). Due to the symmetry of the distribution, the mean value of x is \( \mu = 0.5 \). This distribution was chosen as an example of an anti-Gaussian distribution since results near the mean have a very small probability of being observed, while the opposite happens for values near the edges of the distribution.

The standard deviation of this distribution is

$$ \sigma = \sqrt {\int\nolimits_{0}^{1} {(x - 0.5)^{2} f(x) dx} } = \sqrt {12\int\nolimits_{0}^{1} {(x - 0.5)^{4} \,dx} } = \sqrt {\left[ {\frac{12}{5}(x - 0.5)^{5} } \right]_{0}^{1} } = \sqrt {\frac{3}{20}} = 0.387. $$

The Monte Carlo method will be used again in the simulation of the measurement procedure. 10 000 random numbers, \( n_{i} \), between 0 and 1, with 5 significant figures each, were found exactly as in the last example, using the first 50 000 decimal digits of π. The correspondence of these random numbers \( n_{i} \) to values of \( x_{i} \) is a little more difficult than in the last example, because now the values of \( x_{i} \) are not uniformly distributed between 0 and 1. As the way in which this will be achieved is of general importance for the Monte Carlo method, we will describe it in some detail.

In the present example we need to attribute values \( x_{i} \) of \( x \) to \( 10^{4} \) random numbers. It should be noted that the \( 10^{4} \) random numbers may take any one of \( 10^{4} \) possible values, between 0.0000 and 0.9999. We divide the interval \( [0,\, 1] \) into \( 10^{4} \) strips (see figure), each with different width \( \Delta x \), choosing these \( \Delta x \) in such a way that the area under the curve between \( x \) and \( x + \Delta x \) has the value of \( f(x)\,\Delta x = 10^{ - 4} \). The area corresponds to the probability that a value lies between \( x \) and \( x + \Delta x \). We now have \( 10^{4} \) surface elements of equal areas and \( 10^{4} \) possible values of the random numbers. The \( 10^{4} \) random numbers each have the same probability to appear and the result of a measurement is equally probable to lie in any one of the \( 10^{4} \) surface elements.

We will put the \( 10^{4} \) surface elements in correspondence with the \( 10^{4} \) random numbers in increasing order, so that the random number 0.0000 corresponds to the first \( \Delta x \) interval, the number 0.0001 to the second and so on, up to the number 0.9999 which corresponds to the 10 000th interval. We may then say that the appearance of a random number is equivalent to the result of the corresponding measurement lying in the interval between \( x \) and \( x + \Delta x \) that is covered by the corresponding surface element. Let the increasing order numbers of the strips be \( N_{i} \), between 0 and 9999. The appearance of a random number \( n_{i} \), which according to the convention we adopted belongs to the strip with order number \( N_{i} \equiv n_{i} \times 10^{4} \), is interpreted to mean that one 'measurement' gave a result lying in this strip. The area of the surface under the curve and to the left of the \( N_{i} \)-th strip is equal to \( S_{i} = N_{i} \times 10^{ - 4} = n_{i} \). We conclude that the corresponding value \( x_{i} \) of \( x \) is such that the area under the curve from \( x = 0 \) to \( x_{i} \) is equal to \( S_{i} = n_{i} \). Thus, the random number \( n_{i} \) uniquely defines a probability between 0 and 1, which corresponds to an area \( S_{i} \) which, in its turn, corresponds to a value \( x_{i} \) such that

$$ S_{i} = \int\nolimits_{ 0}^{{x_{i} }} {f(x)\,{\text{d}}x} = n_{i} . $$

This equation may be solved for \( x_{i} (n_{i} ) \).

For the probability density \( f(x) = 3(1-2x)^{2} \) (\( 0 \le x \le 1 \)), we find that it is \( S_{i} = \frac{1}{2}\left[ {1 + (2x_{i} - 1)^{3} } \right] \) and, therefore, \( x_{i} = \frac{1}{2} + \frac{1}{2}(2S_{i} - 1)^{1/3} \), from which relation we have the correspondence of \( x_{i} \) to the random number \( n_{i} \):

$$ x_{i} = \frac{1}{2} + \frac{1}{2}(2n_{i} - 1)^{1/3} . $$

In order to apply the Monte Carlo method to the present example, the random numbers of the previous example were used. Thus, the first 625 numbers gave, via the last relation, 625 results \( x_{i} \), which are recorded in the first histogram of the figure that follows (for Ν = 1). Again the variable is denoted by \( \overline{x} \) for the purposes of uniformity with the rest of the histograms, although these ‘mean values’ are actually single values. This histogram must reproduce the probability density \( f(x) \), something which is seen to happen to a satisfactory degree. The process is simply sampling and the result shows the degree to which a sample of 625 measurements may determine the parent population.

The next histogram (for Ν = 2) is based on the first \( 2 \times 625 = 1250 \) random numbers, which gave 625 pairs of values, the means of which, \( \overline{x} \), were found. As expected, a maximum is observed in the region of \( x = 0.5 \), which has its origin in the means of pairs of values, one of which originates from the region near one edge of the distribution (\( x \approx 0 \)) and the other from the region near the other edge (\( x \approx 1 \)). Striking maxima and minima due to similar combinations are still visible in the histogram for Ν = 4, but the Gaussian shape of the distribution is already clearly visible. The mean for Ν = 4 is 0.511 (instead of the expected 0.5) and the standard deviation is 0.22 (instead of the expected 0.19).

The normal shape of the histograms is more evident for means evaluated from Ν = 8 and Ν = 16 values of \( x_{i} \) (last two histograms).

The table that follows gives the theoretically expected values of the mean values \( \overline{{(\overline{x})}} \) of the means \( \overline{x} \) and of the standard deviations \( \sigma_{{\overline{x}}} = \sigma /\sqrt N \), as well as the values which were determined through the simulation of the experiment we have performed.

| Ν | \( \overline{{(\overline{x})}} \), theoretically expected | \( \overline{{(\overline{x})}} \), 'experimental' result | \( \sigma_{{\overline{x}}} \), theoretically expected | \( \sigma_{{\overline{x}}} \), 'experimental' result |
|---|---|---|---|---|
| 1 | 0.5 | – | 0.387 | – |
| 2 | 0.5 | – | 0.274 | – |
| 4 | 0.5 | 0.511 | 0.194 | 0.220 |
| 8 | 0.5 | 0.498 | 0.137 | 0.132 |
| 16 | 0.5 | 0.496 | 0.097 | 0.095 |

We note that, although the distribution of the measurements of the parent population is far from being normal, the distribution of the means \( \overline{x} \) tends, relatively quickly, to the normal form, in agreement with the central limit theorem. The standard deviations are again observed to be in satisfactory agreement with the theoretical relation \( \sigma_{{\overline{x}}} = \sigma /\sqrt N \).

One more (purely theoretical) example demonstrating the validity of the central limit theorem will be given in Sect. 9.6 (Example 9.5), in the study of convolution and the calculation of the means and the standard deviations of sums of numbers picked from a certain distribution.

A note regarding random numbers. The subject of random numbers is of enormous importance in applications of the Monte Carlo method and in simulation in general. There are large tables of random numbers [6], as well as algorithms for their production [7] (pseudorandom numbers). In this book, we usually choose to use the decimal digits of π as a source of random numbers. The absolute randomness of a series of digits is not easy to establish and is actually impossible to prove beyond any doubt. The only thing one can say is that the series of these particular digits has successfully passed certain basic tests, such as, for example, that the variations of the frequencies of appearance of the 10 digits (0, 1, …, 9) are within the statistically expected limits; the same holds for the 100 two-digit combinations (00, 01, 02, …, 99) etc. The digits of π have successfully passed these tests [8]. Of course, a large number of people, ranging from professional mathematicians to amateurs interested in the theory of numbers, continuously search for and find coincidences which would be expected to occur much more rarely than they are observed to in practice. Yasumasa Kanada, for example, having calculated the first 206.1 billion decimal digits of π, found that the sequence 01234567891 appears 5 times, instead of the expected two times. Paying no attention to such 'strange phenomena', we consider the decimal digits of π to be adequately random for the purposes of this book. A sketch of the frequency test just mentioned is given below.

9.3 The Best Estimate that May Be Made for the Real Value of a Magnitude, Based on the Results of Ν Measurements of It

Assume that we have Ν values (measurements), \( x_{i} \) \( (i = 1, 2, \ldots , \,N) \), of a random variable, which we will denote by \( {\mathbf{x}} \). We take these values to be normally distributed about the real value \( x_{0} \) of \( {\mathbf{x}} \), with standard deviation \( \sigma \). Then, referring to Fig. 9.6, we may say that

Fig. 9.6
figure 6

The probability density function for a result x, and the results \( x_{i} \) of N measurements

  • the probability for the first value of \( {\mathbf{x}} \) to lie between \( x_{1} \) and \( x_{1} + {\text{d}}x_{1} \) is \( \frac{1}{{\sqrt {2\uppi} \,\sigma }}{\text{e}}^{{ - (x_{1} - x_{0} )^{2} /2\sigma^{2} }} {\text{d}}x_{1} \),

  • the probability for the second value of \( {\mathbf{x}} \) to lie between \( x_{2} \) and \( x_{2} + {\text{d}}x_{2} \) is \( \frac{1}{{\sqrt {2\uppi} \,\sigma }}{\text{e}}^{{ - (x_{2} - x_{0} )^{2} /2\sigma^{2} }} {\text{d}}x_{2} \) etc., and

  • the probability for the N-th value of \( {\mathbf{x}} \) to lie between \( x_{N} \) and \( x_{N} + {\text{d}}x_{N} \) is \( \frac{1}{{\sqrt {2\uppi} \,\sigma }}{\text{e}}^{{ - (x_{N} - x_{0} )^{2} /2\sigma^{2} }} {\text{d}}x_{N}. \)

The compound probability for all the measurements to lie within the limits mentioned is:

$$ \begin{aligned} {\text{d}}^{N} P& = \frac{1}{{\left( {\sqrt {2\uppi} \,\sigma } \right)^{N} }}\exp \left\{ { - \frac{1}{{2\sigma^{2} }}\left[ {(x_{1} - x_{0} )^{2} + (x_{2} - x_{0} )^{2} + \ldots + (x_{N} - x_{0} )^{2} } \right]} \right\}\\&\quad{\text{d}}x_{1} {\text{d}}x_{2} \ldots {\text{d}}x_{N} , \end{aligned} $$
(9.10)

or

$$ {\text{d}}^{N} P = \frac{1}{{\left( {\sqrt {2\uppi} \,\sigma } \right)^{N} }}{\text{e}}^{{ - \chi^{2} /2}} \;{\text{d}}^{N} \upsilon , $$
(9.11)

where

$$ \chi^{2} \equiv \frac{1}{{\sigma^{2} }}\left[ {(x_{1} - x_{0} )^{2} + (x_{2} - x_{0} )^{2} + \ldots + (x_{N} - x_{0} )^{2} } \right] $$
(9.12)

and

$$ {\text{d}}^{N} \upsilon = {\text{d}}x_{1} {\text{d}}x_{2} \ldots {\text{d}}x_{N} $$
(9.13)

may be considered to be the element of the Ν-dimensional volume about the point \( ( x_{1} ,\, x_{2} \ldots x_{N} ) \).

The real value \( x_{0} \) is not known. We consider that the best estimate we can make for it is the value \( \hat{x}_{0} \) of \( x_{0} \) which maximizes the probability of occurrence of the results of the measurements of \( {\mathbf{x}} \) that we have. For given limits \( {\text{d}}x_{1} ,\;{\text{d}}x_{2} ,\; \ldots ,\,{\text{d}}x_{N} \), this happens when the quantity \( \chi^{2} \) is minimum, i.e. for

$$ \frac{{\partial \chi^{2} }}{{\partial x_{0} }} = - \frac{2}{{\sigma^{2} }}\left[ {(x_{1} - x_{0} ) + (x_{2} - x_{0} ) + \ldots + (x_{N} - x_{0} )} \right]_{{x_{0} = \hat{x}_{0} }} = 0 $$
(9.14)

or

$$ N \hat{x}_{0} - (x_{1} + x_{2} + \ldots + x_{N} ) = 0. $$
(9.15)

The best estimate for \( x_{0} \) is, therefore,

$$ \hat{x}_{0} = \frac{1}{N}(x_{1} + x_{2} + \ldots + x_{N} ), $$
(9.16)

i.e. the mean of the Ν values of x.

The result may be considered to be a proof of the principle of least squares, first formulated by Legendre. The principle states that:

The most probable value of a magnitude being measured is that which minimizes the sum of the squares of the deviations of the results of the measurements from this value.

Both Gauss and Laplace studied this principle. Gauss, assuming the mean of the measurements to be the most probable value of the magnitude being measured, derived the normal law of errors. Conversely, the normal law of errors may be used in order to prove the principle of the most probable value, as we have done. This is easily checked numerically, as in the sketch below.

9.4 The Weighting of Values

If the values we have at our disposal come from different parent populations with different standard deviations (i.e. distributions relative to the real value) due to the different accuracies in the determination of each value, then we will have

$$ {\text{d}}^{N} P = \frac{1}{{\left( {\sqrt {2\pi } \,} \right)^{N} \sigma_{1} \sigma_{2} \ldots \sigma_{N} }}\,\exp \,\left\{ { - \left[ {\frac{{(x_{1} - x_{0} )^{2} }}{{2\sigma_{1}^{2} }} + \frac{{(x_{2} - x_{0} )^{2} }}{{2\sigma_{2}^{2} }} + \ldots + \frac{{(x_{N} - x_{0} )^{2} }}{{2\sigma_{N}^{2} }}} \right]} \right\}\;{\text{d}}x_{1} \,{\text{d}}x_{2} \ldots {\text{d}}x_{N} , $$
(9.17)

and the minimization of

$$ \chi^{2} = \frac{{(x_{1} - x_{0} )^{2} }}{{\sigma_{1}^{2} }} + \frac{{(x_{2} - x_{0} )^{2} }}{{\sigma_{2}^{2} }} + \ldots + \frac{{(x_{N} - x_{0} )^{2} }}{{\sigma_{N}^{2} }} $$
(9.18)

gives the relation

$$ \hat{x}_{0} = \frac{{(x_{1} /\sigma_{1}^{2} ) + (x_{2} /\sigma_{2}^{2} ) + \ldots + (x_{N} /\sigma_{N}^{2} )}}{{1/\sigma_{1}^{2} + 1/\sigma_{2}^{2} + \ldots + 1/\sigma_{N}^{2} }} $$
(9.19)

as the best estimate for the real value \( x_{0} \).

Equation (9.19) may also be written as

$$ \hat{x}_{0} = \frac{{\sum\limits_{i = 1}^{N} {w_{i} x_{i} } }}{{\sum\limits_{i = 1}^{N} {w_{i} } }} = \overline{x} $$
(9.20)

which is the weighted mean of x, with statistical weight for value \( x_{i} \) equal to

$$ w_{i} = \frac{1}{{\sigma_{i}^{2} }}. $$
(9.21)

The importance of a value in the determination of the mean is, therefore, inversely proportional to the square of its standard deviation. The bigger the standard deviation of a measurement, the smaller the weight it is given in the determination of the mean, something that appears qualitatively reasonable. When all the measurements have the same weight, it is \( w_{i} /\sum\limits_{i} {w_{i} } = 1/N \) and the equations reduce to the known ones.

More generally, if the weights \( w_{1} ,\;w_{2} , \ldots ,\;w_{N} \) are attributed, for whatever reason, to the values of measurements \( x_{1} ,\;x_{2} , \ldots ,\;x_{N} \), respectively, the weighted mean of these values is given by

$$ \overline{x} = \frac{{\sum\limits_{i = 1}^{N} {w_{i} x_{i} } }}{{\sum\limits_{i = 1}^{N} {w_{i} } }}. $$
(9.22)

Equations (9.18) and (9.19) show that the magnitude \( \sum {w_{i} (x_{i} - x_{0} )^{2} } \) has a minimum when \( x_{0} = \overline{x} \). In order to evaluate the standard deviation of the weighted values \( x_{1} ,\;x_{2} , \ldots ,\;x_{N} \), we normalize the statistical weights \( w_{i} \) so that their sum is equal to unity, by dividing each one by \( \sum {w_{i} } \). Defining the normalized statistical weights

$$ \beta_{i} \equiv \frac{{w_{i} }}{{\sum\limits_{i = 1}^{N} {w_{i} } }}, $$
(9.23)

for which it is true that

$$ \sum\limits_{i} {\beta_{i} } = 1, $$
(9.24)

we have the weighted mean of x,

$$ \overline{x} = \sum\limits_{i = 1}^{N} {\beta_{i} x_{i} } . $$
(9.25)

When all the measurements have the same weight, it is \( \beta_{i} = \frac{1}{N} \).

The weighted standard deviation of the values \( x_{i} \) is defined as

$$ s_{x} \equiv \sqrt {\,\sum\limits_{i = 1}^{N} {\beta_{i} (x_{i} - \overline{x})^{2} } } $$
(9.26)

and the weighted standard deviation of the mean \( \overline{x} \) is given by

$$ \sigma_{{\overline{x}}} \equiv \sqrt {\,\frac{1}{(N - 1)}\sum\limits_{i = 1}^{N} {\beta_{i} (x_{i} - \overline{x})^{2} } } , $$
(9.27)

where N here is the number of x values with non-zero weight. It should be noted that in the evaluation of the standard deviation, \( \beta_{i} \) is used as statistical weight and not \( \beta_{i}^{2} \).

Example 9.3

The results of 5 measurements, \( x_{i} \), with their statistical weights \( w_{i} \) are given in columns 2 and 3 of the table below. Find the weighted mean of the results and its standard deviation.

| i | \( x_{i} \) | \( w_{i} \) | \( \beta_{i} \) | \( \beta_{i} x_{i} \) | \( x_{i} - \overline{x} \) | \( (x_{i} - \overline{x})^{2} \) | \( \beta_{i} (x_{i} - \overline{x})^{2} \) |
|---|---|---|---|---|---|---|---|
| 1 | 5.05 | 2 | 0.182 | 0.919 | −0.074 | 0.00548 | 0.000997 |
| 2 | 5.25 | 1 | 0.091 | 0.478 | 0.126 | 0.01588 | 0.001445 |
| 3 | 5.16 | 3 | 0.273 | 1.409 | 0.036 | 0.00130 | 0.000355 |
| 4 | 5.09 | 4 | 0.363 | 1.848 | −0.034 | 0.00116 | 0.000420 |
| 5 | 5.17 | 1 | 0.091 | 0.470 | 0.046 | 0.00212 | 0.000193 |
| Sums | | 11 | 1 | 5.124 | | | 0.003410 |

The weighted mean is \( \overline{x} = \sum\limits_{i = 1}^{N} {\beta_{i} x_{i} } = 5.124. \)

The weighted standard deviation of the measurements is \( s_{x} = \sqrt {\,\sum\limits_{i = 1}^{N} {\beta_{i} (x_{i} - \overline{x})^{2} } } = \sqrt {0.003410} = 0.0584 \)

and the weighted standard deviation of the mean is

$$ \sigma_{{\overline{x}}} = \sqrt {\,\frac{1}{(N - 1)}\, \sum\limits_{i = 1}^{N} {\beta_{i} (x_{i} - \overline{x})^{2} } } = \frac{{s_{x} }}{{\sqrt {N - 1} }} = \frac{0.0584}{\sqrt 4 } = 0.0292. $$

The final result is:

$$ x = 5.124 \pm 0.029. $$

Without weighting, these quantities are \( \overline{x} = 5.144 \), \( s_{x} = 0.069 \) and \( \sigma_{{\overline{x}}} = 0.035 \).

Example 9.4 [E]

Solve Example 9.3 using Excel®.

We will first evaluate the weighted mean. We enter the values of \( x_{i} \) and \( w_{i} \) in columns A and B, respectively. Highlight an empty cell, say E1. Left click on cell E1 and type:

$$ {=}{\mathbf{SUMPRODUCT}}\left( {{\mathbf{A1}}{:}{\mathbf{A5}};{\mathbf{B1}}{:}{\mathbf{B5}}} \right)/{\mathbf{SUM}}\left( {{\mathbf{B1}}{:}{\mathbf{B5}}} \right) $$

Pressing ENTER will return the number 5.123636 in cell E1. This is the required mean, \( \overline{x} = 5{.}1236 \).

We will give this number the name M. To do this, we right click on cell E1. In the dialog box that opens, we select Define Name… and in the cell for Name we write M. Press ENTER.

We will now evaluate the weighted standard deviation. We first evaluate the terms \( (x_{i} - \overline{x})^{2} \). We highlight cell C1 and type: =(A1-M)^2. Pressing ENTER returns the number 0.005422 in cell C1. To fill cells C1 to C5 with the values of \( (x_{i} - \overline{x})^{2} \), we highlight cells C1-C5 and press Fill > Down.

To evaluate the standard deviation, we highlight an empty cell, say E2 and type

$$ {=} {\mathbf{SQRT}}\left( {{\mathbf{SUMPRODUCT}}\left( {{\mathbf{B1}}{:}{\mathbf{B5}};{\mathbf{C1}}{:}{\mathbf{C5}}} \right)/{\mathbf{SUM}}\left( {{\mathbf{B1}}{:}{\mathbf{B5}}} \right)} \right) $$

Pressing ENTER returns the number 0.058352.

The weighted standard deviation of the measurements is \( s_{x} = \sqrt {\,\frac{1}{{\sum\limits_{i = 1}^{N} {w_{i} } }}\sum\limits_{i = 1}^{N} {w_{i} (x_{i} - \overline{x})^{2} } } = 0.058352 \) and the weighted standard deviation of the mean is

$$ \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} = \frac{0.058352}{\sqrt 4 } = 0.029176. $$

The final result is:

$$ x = 5.124 \pm 0.029. $$

Example 9.5 [O]

Solve Example 9.3 using Origin®.

We enter \( x_{i} \) and \( w_{i} \) in columns A(X) and B(Y). Highlight column A by left-clicking on its label. Then

$$ {\mathbf{Statistics}} > {\mathbf{Descriptive}}\;{\mathbf{Statistics}} > {\mathbf{Statistics}}\;{\mathbf{on}}\;{\mathbf{Columns}} > {\mathbf{Open}}\;{\mathbf{Dialog}} \ldots $$

In the window that opens, in Input Data, Range 1, Data Range, column A is already selected. In Weighting Range, we select column B(Y).

In Quantities, we click Mean and Standard Deviation.

We open the window Computation Control. We select Weight Method, Direct Weight and Variance Divisor of Moment, WS. We press OK. The results are:

Mean = \( \overline{x} = 5.12364 \), Standard Deviation = \( s_{x} = 0.05835 \).

We calculate \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} = \frac{0.05835}{{\sqrt {5 - 1} }} = 0.02918 \)

Summarizing, \( \overline{x} = 5.124 \), \( s_{x} = 0.058 \) and \( \sigma_{{\overline{x}}} = 0.029 \), in agreement with the results of Example 9.3.

Example 9.6 [P]

Solve Example 9.3 using Python.

from __future__ import division
import numpy as np
import math

# Enter the values given as the components of the vector x:

x = np.array([5.05, 5.25, 5.16, 5.09, 5.17])

# Enter the corresponding weights w of the x values:

w = np.array([2, 1, 3, 4, 1])

# Evaluation

N = len(x)
wmean = np.average(x, weights = w)
variance = np.average((x-wmean)**2, weights = w)
stdev = math.sqrt(variance)

# Presentation of the results:

print("Number of values, N =", N)
print("Weighted mean =", wmean)
print("Weighted standard deviation of the sample =", stdev)
print("Weighted standard deviation of the mean =", stdev/math.sqrt(N-1))

# Results:

Number of values, N = 5
Weighted mean = 5.12363636364
Weighted standard deviation of the sample = 0.058352023766840865
Weighted standard deviation of the mean = 0.029176011883420432

Example 9.7 [R]

Solve Example 9.3 using R.

The vectors x and w have as their components the values of x and w, respectively:

> x <- c(5.05, 5.25, 5.16, 5.09, 5.17)
> w <- c(2, 1, 3, 4, 1)

The weighted mean is found as

> wmean = weighted.mean(x,w)
> wmean
[1] 5.123636

The variance of the sample, \( s_{x}^{2} \), is the weighted mean of the quantity \( (x_{i} - \overline{x})^{2} \). This is found to be:

> variance = weighted.mean((x-wmean)^2, w)
> variance
[1] 0.003404959

The standard deviation of the sample is \( s_{x} = \sqrt {s_{x}^{2} } \)

> sqrt(variance)
[1] 0.05835202

and the standard deviation of the mean is \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} = \frac{0.0584}{\sqrt 4 } = 0.0292 \).

Summarizing, \( \overline{x} = 5.124 \), \( s_{x} = 0.058 \) and \( \sigma_{{\overline{x}}} = 0.029 \), in agreement with the results of Example 9.3.

Let us suppose that we have N results of measurements, \( x_{i} \), each with its weight \( w_{i} \) and that they can be grouped in a number of Κ classes each of which consists of measurements with the same values of x and w:

$$ \underbrace {{\overbrace {{w_{1} x_{1} + w_{1} x_{1} + \ldots + w_{1} x_{1} }}^{{n_{1} \;{\text{terms}}}}}}_{k = 1} + \underbrace {{\overbrace {{w_{2} x_{2} }}^{{n_{2} \;{\text{terms}}}}}}_{k = 2} + \underbrace {{\overbrace {{w_{3} x_{3} + \ldots + w_{3} x_{3} }}^{{n_{3} \;{\text{terms}}}}}}_{k = 3} + \ldots + \underbrace {{\overbrace {{w_{K} x_{K} + w_{K} x_{K} + \ldots + w_{K} x_{K} }}^{{n_{K} \;{\text{terms}}}}}}_{k = K}. $$
(9.28)

It should be noted that two different groups may share the same value of x or the same value of w, but not both; in such a case the two terms would be placed in the same group.

In this case, the numerator of Eq. (9.22) may be written as

$$ \sum\limits_{i = 1}^{N} {w_{i} x_{i} } = n_{1} w_{1} x_{1} + n_{2} w_{2} x_{2} + \ldots + n_{K} w_{K} x_{K} = \sum\limits_{k = 1}^{K} {n_{k} w_{k} x_{k} } . $$
(9.29)

Similarly,

$$ \sum\limits_{i = 1}^{N} {w_{i} } = n_{1} w_{1} + n_{2} w_{2} + \ldots + n_{K} w_{K} = \sum\limits_{k = 1}^{K} {n_{k} w_{k} } . $$
(9.30)

Therefore, the weighted mean is

$$ \overline{x} = \frac{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} x_{k} } }}{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} } }} $$
(9.31)

It is seen that the product \( n_{k} w_{k} \) replaces the weight \( w_{i} \) in estimating the weighted mean. In this sense it may be considered to be an active weight.

In a similar way, the weighted sample standard deviation is

$$ s_{x} = \sqrt {\;\frac{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} (x_{k} - \overline{x})^{2} } }}{{\;\sum\limits_{k = 1}^{K} {n_{k} w_{k} } }}} . $$
(9.32)

The weighted standard deviation of the mean is

$$ \sigma_{{\overline{x}}} = \sqrt {\;\frac{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} (x_{k} - \overline{x})^{2} } }}{{(N - 1)\;\sum\limits_{k = 1}^{K} {n_{k} w_{k} } }}} , $$
(9.33)

where N is the number of x values with non-zero weight.

In a way similar to that of Eq. (9.23) we may define the normalized statistical weights

$$ \beta_{k} \equiv \frac{{n_{k} w_{k} }}{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} } }}, $$
(9.34)

for which it is true that

$$ \sum\limits_{k} {\beta_{k} } = 1, $$
(9.35)

With these definitions, Eqs. (9.31)–(9.33) reduce to Eqs. (9.25)–(9.27), respectively.

Example 9.8

The results of \( N = 30 \) measurements \( x_{i} \), with their statistical weights \( w_{i} \), are grouped in \( K = 9 \) classes as shown in the table below.

| \( k \) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| \( n_{k} \) | 2 | 1 | 4 | 7 | 6 | 3 | 4 | 1 | 2 |
| \( x_{k} \) | 4.60 | 4.70 | 4.80 | 4.90 | 5.00 | 5.10 | 5.20 | 5.30 | 5.40 |
| \( w_{k} \) | 3 | 2 | 4 | 4 | 3 | 4 | 1 | 2 | 2 |

Find the weighted mean of the results and its standard deviation.

We construct a table with the quantities \( k \), \( n_{k} \), \( x_{k} \), \( w_{k} \), \( n_{k} w_{k} \), \( n_{k} w_{k} x_{k} \), \( (x_{k} - \overline{x})^{2} \) and \( n_{k} w_{k} (x_{k} - \overline{x})^{2} \).

| \( k \) | \( n_{k} \) | \( x_{k} \) | \( w_{k} \) | \( n_{k} w_{k} \) | \( n_{k} w_{k} x_{k} \) | \( (x_{k} - \overline{x})^{2} \) | \( n_{k} w_{k} (x_{k} - \overline{x})^{2} \) |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 4.6 | 3 | 6 | 27.6 | 0.1267 | 0.7602 |
| 2 | 1 | 4.7 | 2 | 2 | 9.40 | 0.0655 | 0.1310 |
| 3 | 4 | 4.8 | 4 | 16 | 76.8 | 0.0243 | 0.3888 |
| 4 | 7 | 4.9 | 4 | 28 | 137.2 | 0.0031 | 0.0868 |
| 5 | 6 | 5.0 | 3 | 18 | 90.0 | 0.0019 | 0.0342 |
| 6 | 3 | 5.1 | 4 | 12 | 61.2 | 0.0208 | 0.2496 |
| 7 | 4 | 5.2 | 1 | 4 | 20.8 | 0.0596 | 0.2384 |
| 8 | 1 | 5.3 | 2 | 2 | 10.6 | 0.1184 | 0.2368 |
| 9 | 2 | 5.4 | 2 | 4 | 21.6 | 0.1972 | 0.7888 |
| Sums | 30 | | | 92 | 455.2 | | 2.9146 |

The weighted mean is \( \overline{x} = \frac{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} x_{k} } }}{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} } }} = \frac{455.2}{92} = 4.948. \)

The weighted standard deviation of the sample is \( s_{x} = \sqrt {\;\frac{{\sum\limits_{k = 1}^{K} {n_{k} w_{k} (x_{k} - \overline{x})^{2} } }}{{\;\sum\limits_{k = 1}^{K} {n_{k} w_{k} } }}} = \sqrt {\frac{2.9146}{92}} = 0.1778 . \)

The weighted standard deviation of the mean is \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} \), where \( N = \;\sum\limits_{k = 1}^{9} {n_{k} } = 30 \). Therefore, \( \sigma_{{\overline{x}}} = \frac{0.1778}{{\sqrt {29} }} = 0.0330 \). The final result is: \( x = 4.948 \pm 0.033 \).

Example 9.9 [E]

Solve Example 9.8 using Excel®.

Comparing Eqs. (9.31) and (9.32) with Eqs. (9.22) and (9.26), it is obvious that this example is the same as Example 9.4 [E] if we replace \( w_{i} \) with \( n_{i} w_{i} \). We enter the values of \( n_{i} \), \( x_{i} \) and \( w_{i} \) in columns A, B and C, respectively. We calculate the values of \( n_{i} w_{i} \): In cell D1 we type = A1*C1 and press ENTER. We fill down to cell D9. Column D now contains the values of \( n_{i} w_{i} \).

We will first evaluate the weighted mean. Highlight an empty cell, say E1. Left click on cell E1 and write:

$$ {= } {\mathbf{SUMPRODUCT}}\left( {{\mathbf{B1}}{:}{\mathbf{B9}};{\mathbf{D1}}{:}{\mathbf{D9}}} \right)/{\mathbf{SUM}}\left( {{\mathbf{D1}}{:}{\mathbf{D9}}} \right) $$

Pressing ENTER will return the number 4.9478 in cell E1. This is the required mean, \( \overline{x} = 4.9478 \).

We will give this number the name M. To do this, we right click on cell E1. In the dialog box that opens, we select Define Name… and in the cell for Name we write M.

We will now evaluate the weighted standard deviation. We first evaluate the terms \( (x_{k} - \overline{x})^{2} \). We highlight cell F1 and type: =(B1-M)^2. Pressing ENTER returns the number 0.120983 in cell F1. To fill cells F1 to F9 with the values of \( (x_{k} - \overline{x})^{2} \), we highlight cells F1-F9 and press Fill > Down.

To evaluate the standard deviation, we highlight an empty cell, say G1 and type

$$ {=} {\mathbf{SQRT}}\left( {{\mathbf{SUMPRODUCT}}\left( {{\mathbf{D1}}{:}{\mathbf{D9}};{\mathbf{F1}}{:}{\mathbf{F9}}} \right)/{\mathbf{SUM}}\left( {{\mathbf{D1}}{:}{\mathbf{D9}}} \right)} \right) $$

Pressing ENTER returns the number 0.177836.

The weighted standard deviation of the measurements is \( s_{x} = \sqrt {\,\frac{1}{{\sum\limits_{i = 1}^{N} {w_{i} } }}\sum\limits_{i = 1}^{N} {w_{i} (x_{i} - \overline{x})^{2} } } = 0.177836 \) and the weighted standard deviation of the mean is \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} \), where \( N = \;\sum\limits_{k = 1}^{9} {n_{k} } = 30. \) Therefore, \( \sigma_{{\overline{x}}} = \frac{0.1778}{{\sqrt {29} }} = 0.0330 \). The final result is: \( x = 4.948 \pm 0.033 \).

Example 9.10 [O]

Solve Example 9.8 using Origin®.

We enter \( n_{i} \), \( x_{i} \) and \( w_{i} \) in columns A(X), B(Y) and C(Y). Highlight column B by left-clicking on its label. Then

$$ {\mathbf{Statistics}} > {\mathbf{Descriptive}}\;{\mathbf{Statistics}} > {\mathbf{Statistics}}\;{\mathbf{on}}\;{\mathbf{Columns}} > {\mathbf{Open}}\;{\mathbf{Dialog}} \ldots $$

In the window that opens, in Input Data, Range 1, Data Range, column B is already selected. In Weighting Range, we select column C(Y).

In Quantities, we click Mean and Standard Deviation.

We open the window Computation Control. We select Weight Method, Direct Weight and Variance Divisor of Moment, WS. We press OK. The results are:

Weighted Mean \( \overline{x} = 4.94783 \), Weighted Standard Deviation of the Sample \( s_{x} = 0.17784 \).

We calculate the weighted standard deviation of the mean using the equation \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} \), where \( N = \;\sum\limits_{k = 1}^{9} {n_{k} } = 30 \). Therefore, \( \sigma_{{\overline{x}}} = \frac{0.1778}{{\sqrt {29} }} = 0.0330 \). The final result is: \( x = 4.948 \pm 0.033 \).

Example 9.11 [P]

Solve Example 9.8 using Python.

from __future__ import division
import numpy as np
import math

# Enter values of members of the groups:

n = np.array([2, 1, 4, 7, 6, 3, 4, 1, 2])

# Enter the values given as the components of the vector x:

x = np.array([4.6, 4.7, 4.8, 4.9, 5, 5.1, 5.2, 5.3, 5.4])

# Enter the corresponding weights w of the x values:

wt = np.array([3, 2, 4, 4, 3, 4, 1, 2, 2])

# "Active" weights:

w = n*wt

# Evaluation

G = len(x)
N = sum(n)
wmean = np.average(x, weights = w)
variance = np.average((x-wmean)**2, weights = w)
stdev = math.sqrt(variance)

# Presentation of the results

print("Number of groups, G =", G)
print("Number of measurements, N =", N)
print("Weighted mean =", wmean)
print("Weighted standard deviation of the sample =", stdev)
print("Weighted standard deviation of the mean =", stdev/math.sqrt(N-1))

# Results:

Number of groups, G = 9
Number of measurements, N = 30
Weighted mean = 4.94782608696
Weighted standard deviation of the sample = 0.17783618553232663
Weighted standard deviation of the mean = 0.033023350612542336

Example 9.12 [R]

Solve Example 9.8 using R.

Comparing Eqs. (9.31) and (9.32) with Eqs. (9.22) and (9.26), it is obvious that this example is the same as Example 9.4 [E] if we use as weights the values \( W_{k} = n_{k} w_{k} \).

| \( k \) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| \( n_{k} \) | 2 | 1 | 4 | 7 | 6 | 3 | 4 | 1 | 2 |
| \( x_{k} \) | 4.60 | 4.70 | 4.80 | 4.90 | 5.00 | 5.10 | 5.20 | 5.30 | 5.40 |
| \( w_{k} \) | 3 | 2 | 4 | 4 | 3 | 4 | 1 | 2 | 2 |
| \( W_{k} = n_{k} w_{k} \) | 6 | 2 | 16 | 28 | 18 | 12 | 4 | 2 | 4 |

We define the vectors

> x <- c(4.60, 4.70, 4.80, 4.90, 5, 5.10, 5.20, 5.30, 5.40)
> W <- c(6, 2, 16, 28, 18, 12, 4, 2, 4)

and find the weighted mean

> Wmean = weighted.mean(x,W)
> Wmean
[1] 4.947826

The variance of the sample, \( s_{x}^{2} \), is the weighted mean of the quantity \( (x_{i} - \overline{x})^{2} \). This is found to be:

> variance = weighted.mean((x-Wmean)^2, W)
> variance
[1] 0.03162571

The standard deviation of the sample is \( s_{x} = \sqrt {s_{x}^{2} } \)

> sqrt(variance)
[1] 0.1778362

We calculate the weighted standard deviation of the mean using the equation \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} \), where \( N = \;\sum\limits_{k = 1}^{9} {n_{k} } = 30 \). Therefore, \( \sigma_{{\overline{x}}} = \frac{0.1778}{{\sqrt {29} }} = 0.0330 \).

Summarizing the results: \( \overline{x} = 4.948 \), \( s_{x} = 0.1778 \) and \( \sigma_{{\overline{x}}} = 0.033 \).

When the statistical weight of measurement \( x_{i} \) is, according to Eq. (9.21), equal to \( w_{i} = \frac{1}{{\sigma_{i}^{2} }} \), then from Eq. (9.22) we have for the weighted mean

$$ \overline{x} = \frac{{\sum\limits_{i = 1}^{N} {x_{i} /\sigma_{i}^{2} } }}{{\sum\limits_{i = 1}^{N} {1/\sigma_{i}^{2} } }}, $$
(9.36)

for the weighted standard deviation of the measurements

$$ s_{x} = \sqrt {\,\, \frac{{\sum\limits_{i = 1}^{N} {(x_{i} - \overline{x})^{2} /\sigma_{i}^{2} } }}{{\,\,\sum\limits_{i = 1}^{N} {1/\sigma_{i}^{2} } }}} $$
(9.37)

and for the weighted standard deviation of the mean

$$ \sigma_{{\overline{x}}} = \sqrt {\,\, \frac{{\sum\limits_{i = 1}^{N} {(x_{i} - \overline{x})^{2} /\sigma_{i}^{2} } }}{{(N - 1)\,\,\sum\limits_{i = 1}^{N} {1/\sigma_{i}^{2} } }}} $$
(9.38)

The same is true when we have the mean values \( \overline{x}_{1} \), \( \overline{x}_{2} \), …, \( \overline{x}_{r} \), …, \( \overline{x}_{M} \), of Μ different series of measurements, which have standard deviations \( \sigma_{{\overline{x}_{1} }} ,\, \sigma_{{\overline{x}_{2} }} , \ldots ,\, \sigma_{{\overline{x}_{r} }} , \ldots ,\, \sigma_{{\overline{x}_{M} }} \), respectively. In this case, the statistical weight of each mean is the inverse of the square of its standard deviation. The means have a (general) mean

$$ \overline{{(\overline{x})}} = \frac{{\sum\limits_{r = 1}^{M} {\overline{x}_{r} /\sigma_{{\overline{x}_{r} }}^{2} } }}{{\sum\limits_{r = 1}^{M} {1/\sigma_{{\overline{x}_{r} }}^{2} } }} $$
(9.39)

while the standard deviation of this general mean is

$$ \sigma_{{\overline{{(\bar{x})}} }} = \sigma \left( {\overline{{(\bar{x})}} } \right) = \sqrt {\frac{{\sum\limits_{r = 1}^{M} {\left[ {\bar{x}_{r} - \overline{{(\bar{x})}} } \right]^{2} /\sigma_{{\bar{x}_{r} }}^{2} } }}{{\sum\limits_{r = 1}^{M} {1/\sigma_{{\bar{x}_{r} }}^{2} } }}} . $$
(9.40)

Example 9.13

Three experiments for the determination of the speed of light in vacuum gave the following results, in m/s:

$$ c_{1} = 299\,792\,459.3 \pm 1.6,\quad c_{2} = 299\,792\,457.82 \pm 0.86,\quad c_{3} = 299\,792\,458.4 \pm 1.1. $$

Find the weighted mean of these results and its standard deviation, taking as weights the inverses of the square of the standard deviation in each case.

Since the numbers are given with many digits, to avoid loss of accuracy we subtract from all of them the number \( c_{0} = 299\,792\,450\;{\text{m}}/{\text{s}} \) and work with the smaller numbers that remain, \( x_{i} = c_{i} - c_{0} \).

| i | \( c_{i} \) (m/s) | \( x_{i} \) (m/s) | \( \sigma_{i} \) (m/s) | \( 1/\sigma_{i}^{2} \) \( ({\text{m/s}})^{-2} \) | \( \beta_{i} \) | \( \beta_{i} x_{i} \) (m/s) | \( c_{i} - \overline{c} \) (m/s) | \( (c_{i} - \overline{c})^{2} \) \( ({\text{m/s}})^{2} \) | \( \beta_{i} (c_{i} - \overline{c})^{2} \) \( ({\text{m/s}})^{2} \) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 299 792 459.3 | 9.3 | 1.60 | 0.3906 | 0.1520 | 1.414 | 1.068 | 1.141 | 0.1734 |
| 2 | 299 792 457.82 | 7.82 | 0.86 | 1.3521 | 0.5263 | 4.116 | −0.412 | 0.170 | 0.0895 |
| 3 | 299 792 458.4 | 8.4 | 1.10 | 0.8264 | 0.3217 | 2.702 | −0.168 | 0.0282 | 0.0091 |
| Sums | | | | 2.5691 | 1 | 8.232 | | | 0.272 |

The weighted mean of the results is \( \overline{c} = c_{0} + \overline{x} = c_{0} + \sum\limits_{i} {\beta_{i} x_{i} } = 299\,792\,450 + 8.232 = 299\,792\,458.232{\text{ m/s}} \).

From the sum \( \sum\limits_{i} {\beta_{i} (c_{i} - \overline{c})^{2} } = 0.272 \) (m/s)2, we find that \( s_{c} = \sqrt {\sum\limits_{i} {\beta_{i} (c_{i} - \overline{c})^{2} } } = \sqrt {0.272} = 0.52 \) m/s.

By Eq. 9.27, the standard deviation of the mean is \( \sigma_{{\overline{c}}} = \frac{{s_{c} }}{{\sqrt {N - 1} }} = \sqrt {\frac{0.272}{2}} = 0.37 \) m/s.

Therefore, \( c = 299\,792\,458.23 \pm 0.37\;{\text{m}}/{\text{s}}. \)

Example 9.14 [E]

Solve Example 9.13 using Excel®.

| i | \( c_{i} \) (m/s) | \( x_{i} \) (m/s) | \( \sigma_{i} \) (m/s) |
|---|---|---|---|
| 1 | 299 792 459.3 | 9.3 | 1.60 |
| 2 | 299 792 457.82 | 7.82 | 0.86 |
| 3 | 299 792 458.4 | 8.4 | 1.10 |

Acting as above, we subtract from all the values the quantity \( c_{0} = \) 299 792 450 m/s and work with the smaller numbers that remain, x. We enter \( x_{i} \) and \( \sigma_{i} \) in cells A2-A4 and B2-B4, respectively. We will evaluate the weights to be used, \( w_{i} = 1/\sigma_{i}^{2} \). We highlight cell C2 and type in it =1/(B2)^2. We Fill Down to cell C4. Column C now contains the values of \( w_{i} \).

We will first evaluate the weighted mean. Highlight an empty cell, say E2. Left click on cell E2 and write:

$$ {=} {\mathbf{SUMPRODUCT}}\left( {{\mathbf{A2}}{:}{\mathbf{A4}};{\mathbf{C2}}{:}{\mathbf{C4}}} \right)/{\mathbf{SUM}}\left( {{\mathbf{C2}}{:}{\mathbf{C4}}} \right) $$

Pressing ENTER will return the number 8.2316 in cell E2. We will give this number the name M. To do this, we right click on cell E2. In the dialog box that opens, we select Define Name… and in the cell for Name we write M.

The weighted mean of the results is \( \overline{c} = c_{0} + \overline{x} = 299\,792\,450 + 8.2316 = 299\,792\,458.232{\text{ m/s}} \).

We will now evaluate the weighted standard deviation. We first evaluate the terms \( (x_{i} - \overline{x})^{2} \). We highlight cell F2 and type: =(A2-M)^2. Pressing ENTER returns the number 0.141478 in cell F2. To fill cells F2 to F4 with the values of \( (x_{i} - \overline{x})^{2} \), we highlight cells F2-F4 and press Fill > Down.

To evaluate the standard deviation, we highlight an empty cell, say G2 and type

$$ {=} {\mathbf{SQRT}}\left( {{\mathbf{SUMPRODUCT}}\left( {{\mathbf{C2}}{:}{\mathbf{C4}};{\mathbf{F2}}{:}{\mathbf{F4}}} \right)/{\mathbf{SUM}}\left( {{\mathbf{C2}}{:}{\mathbf{C4}}} \right)} \right) $$

Pressing ENTER returns the number 0.5214. The weighted standard deviation of the measurements is \( s_{c} = \sqrt {\,\frac{1}{{\sum\limits_{i = 1}^{N} {w_{i} } }}\sum\limits_{i = 1}^{N} {w_{i} (x_{i} - \overline{x})^{2} } } = 0.5214 \). The standard deviation of the mean is \( \sigma_{{\overline{c}}} = \frac{{s_{c} }}{{\sqrt {N - 1} }} = \frac{0.5214}{\sqrt 2 } = 0.3687 \) m/s. Therefore, c = 299 792 458.23 \( \pm \) 0.37 m/s.

Example 9.15 [O]

Solve Example 9.13 using Origin®.

i | \( c_{i} \) (m/s) | \( x_{i} \) (m/s) | \( \sigma_{i} \) (m/s)
1 | 299 792 459.3 | 9.3 | 1.60
2 | 299 792 457.82 | 7.82 | 0.86
3 | 299 792 458.4 | 8.4 | 1.10

Acting as above, we subtract from all the measurements the quantity \( c_{0} = \) 299 792 450 m/s and work with the smaller numbers that remain, x. We enter \( x_{i} \) and \( \sigma_{i} \) in columns A and B, respectively.

  • Highlight column A and then: Column > Set As > Y

  • Highlight column B and then: Column > Set As > Y Error

Highlight columns A and B and then,

$$ {\mathbf{Statistics}} > {\mathbf{Descriptive}}\,{\mathbf{Statistics}} > {\mathbf{Statistics}}\,{\mathbf{on}}\,{\mathbf{Columns}} > {\mathbf{Open}}\,{\mathbf{Dialog}} \ldots $$

In the window that opens,

$$ {\mathbf{Input}} > {\mathbf{Input}}\,{\mathbf{Data}} > {\mathbf{Range}}\,{\mathbf{1}} > {\mathbf{Weighting}}\,{\mathbf{Range}} > {\mathbf{B}}({\mathbf{E}}) $$

Open the Quantities window and tick: Mean, Standard Deviation

Open the Computation Control window and Weight Method > Instrumental

The last choice sets the weight of each measurement \( x_{i} \) equal to \( w_{i} = 1/\sigma_{i}^{2} \), where \( \sigma_{i} \) is the error in \( x_{i} \). Then,

$$ {\mathbf{Variance}}\,{\mathbf{Divisor}}\,{\mathbf{of}}\,{\mathbf{Moment}} > {\mathbf{WS}} $$

The last choice sets the denominator of Eq. (9.29) equal to \( w = \sum\limits_{i} {1/\sigma_{i}^{2} } \).

Pressing OK we obtain the results (for column A):

[Mean] \( = \overline{x} = 8.2316{\text{ m/s}} \), [Standard Deviation] \( = s_{c} = s_{x} = 0.52138{\text{ m/s}} \)

By Eq. (9.27), the standard deviation of the mean is \( \sigma_{{\overline{c}}} = \frac{{s_{c} }}{{\sqrt {N - 1} }} = \frac{0.52138}{\sqrt 2 } = 0.36867 \) m/s.

The final result is \( c = \) 299 792 458.23 \( \pm \) 0.37 m/s, in agreement with the result of Example 9.13.

Example 9.16 [P]

Three experiments for the determination of the speed of light in vacuum gave the following results, in m/s:

$$ c_{1} = 299\, 792\, 459.3 \pm 1.6,\quad c_{2} = 299\, 792\, 457.82 \pm 0.86,\quad c_{3} = 299\;792\;458.4 \pm 1.1. $$

Find the weighted mean of these results and its standard deviation, taking as weights the inverses of the square of the standard deviation in each case.

from __future__ import division
import numpy as np
import math

# Enter the values given as the components of the vector x:

x = np.array([299792459.3, 299792457.82, 299792458.4])

# Enter the values of the errors corresponding to the values of x:

s = np.array([1.6, 0.86, 1.1])

# Evaluation:

# Evaluate the corresponding weights w of the x values:

w = 1/(s*s)
N = len(x)
wmean = np.average(x, weights=w)
variance = np.average((x - wmean)**2, weights=w)
stdev = math.sqrt(variance)

# Presentation of the results:

print("Number of values, N =", N)
print("Weighted mean =", wmean)
print("Weighted standard deviation of the mean =", stdev/math.sqrt(N-1))

# Results:

Number of values, N = 3
Weighted mean = 299792458.232
Weighted standard deviation of the mean = 0.36867082350704317

The final result is \( c = \) 299 792 458.23 \( \pm \) 0.37 m/s.

Example 9.17 [R]

Three experiments for the determination of the speed of light in vacuum gave the following results, in m/s:

$$ c_{1} = 299\, 792\, 459.3 \pm 1.6,\quad c_{2} = 299\, 792\, 457.82 \pm 0.86,\quad c_{3} = 299\, 792\, 458.4 \pm 1.1. $$

Find the weighted mean of these results and its standard deviation, taking as weights the inverses of the square of the standard deviation in each case.

i | \( c_{i} \) (m/s) | \( x_{i} \) (m/s) | \( \sigma_{i} \) (m/s)
1 | 299 792 459.3 | 9.3 | 1.60
2 | 299 792 457.82 | 7.82 | 0.86
3 | 299 792 458.4 | 8.4 | 1.10

We form the vector x with the values of \( x_{i} = c_{i} - c_{0} \) as components and s with the probable errors in \( c_{i} \) (or \( x_{i} \)).

> x <- c(9.3, 7.82, 8.4)
> s <- c(1.60, 0.86, 1.10)
> w <- c(1/s^2)
> w
[1] 0.3906250 1.3520822 0.8264463

The weighted mean of x is:

> weighted.mean(x, w)
[1] 8.2316
> wmean <- weighted.mean(x, w)

The variance of the sample, \( s_{x}^{2} \), is the weighted mean of the quantity \( (x_{i} - \overline{x})^{2} \), and the standard deviation of the sample is \( s_{x} = \sqrt {s_{x}^{2} } \):

> variance <- weighted.mean((x - wmean)^2, w)
> variance
[1] 0.2718363
> stdev <- sqrt(variance)
> stdev
[1] 0.5213793

We calculate the weighted standard deviation of the mean using the equation \( \sigma_{{\overline{x}}} = \frac{{s_{x} }}{{\sqrt {N - 1} }} \). Therefore, \( \sigma_{{\overline{x}}} = \frac{0.5213793}{\sqrt 2 } = 0.368671 \). The final result is \( c = \) 299 792 458.23 \( \pm \) 0.37 m/s.

9.5 The Joint Probability Density for Two Random Variables

We will now examine the probability density of a random variable which is a function of two other random variables. To avoid confusion, we will adopt the following notation [9]:

  • A random variable is denoted by a bold letter, \( {\mathbf{x}} \), and the values it takes by italic \( x \).

  • The probability density of the random variable \( {\mathbf{x}} \) is denoted by \( f_{{\mathbf{x}}} (x) \).

  • The probability for the random variable \( {\mathbf{x}} \) to take a value which is smaller than or equal to \( x \) is denoted by \( P\{ {\mathbf{x}} \le x\} \).

  • The distribution function of the random variable \( {\mathbf{x}} \) is denoted by \( F_{{\mathbf{x}}} (x) \) and is equal to the probability for the random variable \( {\mathbf{x}} \) to take a value which is equal to or smaller than \( x \). Obviously,

$$ F_{{\mathbf{x}}} (x) = P\{ {\mathbf{x}} \le x\} . $$
(9.33)

It is

$$ f_{{\mathbf{x}}} (x) = \frac{{{\text{d}}F_{{\mathbf{x}}} (x)}}{{{\text{d}}x}}. $$
(9.34)

The probability for the random variable \( {\mathbf{x}} \) to have a value larger than \( x_{1} \) and smaller than or equal to \( x_{2} \), where \( x_{1} < x_{2} \), is denoted by \( P\{ x_{1} < {\mathbf{x}} \le x_{2} \} \). Obviously, it is

$$ P\{ x_{1} < {\mathbf{x}} \le x_{2} \} = P\{ {\mathbf{x}} \le x_{2} \} - P\{ {\mathbf{x}} \le x_{1} \} . $$
(9.35)

The joint or common probability density function of the variables \( {\mathbf{x}} \) and \( {\mathbf{y}} \), denoted by \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) \), is such that the probability for the random variable \( {\mathbf{x}} \) to have a value between \( x \) and \( x + {\text{d}}x \) and the random variable \( {\mathbf{y}} \) to have a value between \( y \) and \( y + {\text{d}}y \) is equal to \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y)\,{\text{d}}x\,{\text{d}}y \).

The function \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) \) is said to be normalized if it is

$$ \int\nolimits_{-\infty }^{\infty } {\int\nolimits_{ - \infty }^{\infty } {f_{{{\varvec{x}},{\varvec{y}}}} (x,y)\,{\text{d}}x\,{\text{d}}y = 1} } . $$
(9.36)

The joint or common distribution function of the variables \( {\mathbf{x}} \) and \( {\mathbf{y}} \) is denoted by \( F_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) \) and is equal to the probability for the random variable \( {\mathbf{x}} \) to have a value smaller than or equal to \( x \) and the random variable \( {\mathbf{y}} \) to have a value smaller than or equal to \( y \). It is

$$ F_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) \equiv P\{ {\mathbf{x}} \le x,\, {\mathbf{y}} \le y\} . $$
(9.37)

The following relations are considered to be obvious [10]:

$$ f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) = \frac{{\partial^{2} F_{{{\mathbf{x}},{\mathbf{y}}}} (x,y)}}{\partial x\, \partial y} $$
(9.38)
$$ f_{{\bf{x}}} (x) = \int\nolimits_{ - \infty }^{\infty } {f_{{{\bf{x}},{\bf{y}}}} (x,y)\, {\text{d}}y} \quad f_{{\bf{y}}} (y) = \int\nolimits_{ - \infty }^{\infty } {f_{{{\bf{x}},{\bf{y}}}} (x,y)\, {\text{d}}x} $$
(9.39)
$$ F_{{\bf{x}}} (x) = F_{{{\bf{x}},{\bf{y}}}} (x,\,\infty ) = \int\nolimits_{ - \infty }^{\infty } {{\text{d}}y\,\int\nolimits_{ - \infty }^{x} {f_{{{\bf{x}},{\bf{y}}}} (\chi ,y)\,{\text{d}}\chi \,} } $$
(9.40)
$$ F_{{\bf{y}}} (y) = F_{{{\bf{x}},{\bf{y}}}} (\infty ,\,y) = \int\nolimits_{ - \infty }^{\infty } {\,{\text{d}}x\,\int\nolimits_{ - \infty }^{y} {f_{{{\bf{x}},{\bf{y}}}} (x,\psi )\,{\text{d}}\psi } }. $$
(9.41)

Let the random variable \( {\mathbf{x}} \) have probability density \( f_{{\mathbf{x}}} (x) \) and the variable \( {\mathbf{y}} \) have probability density \( f_{{\mathbf{y}}} (y) \). These are known as marginal probability densities . If the variables \( {\mathbf{x}} \) and \( {\mathbf{y}} \) are independent of each other, then the probability for the random variable \( {\mathbf{x}} \) to have a value between \( x \) and \( x + {\text{d}}x \) and the random variable \( {\mathbf{y}} \) to have value between \( y \) and \( y + {\text{d}}y \) is equal to the product of the two separate probabilities, i.e.

$$ f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y)\,{\text{d}}x\,{\text{d}}y = f_{{\mathbf{x}}} (x)f_{{\mathbf{y}}} (y)\,{\text{d}}x\,{\text{d}}y. $$
(9.42)

If \( {\mathbf{x}} \) and \( {\mathbf{y}} \) are normally distributed, with means and standard deviations \( \mu_{x} ,\,\,\sigma_{x} \) and \( \mu_{y} ,\,\,\sigma_{y} \), respectively, then

$$ f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) = \frac{1}{{2\uppi \sigma_{x} \sigma_{y} }}\, {\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }} - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} . $$
(9.43)

This function of the two variables, \( x \) and \( y \), has been drawn, in contour form, in Fig. 9.7.

Fig. 9.7

The joint probability density function \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) \) for the normally distributed random variables \( {\mathbf{x}} \) and \( {\mathbf{y}} \), which have means and standard deviations \( \mu_{x},\,\sigma_{x} \) and \( \mu_{y},\,\sigma_{y} \), respectively. Three ellipses of constant \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) = c \) are shown in the figure. Also shown are the marginal probability densities \( f_{{\mathbf{x}}} (x) \) and \( f_{{\mathbf{y}}} (y) \)

Drawn in the figure are:

  1.

    Curves of constant values \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) = c \), which are ellipses with center at the point (\( \mu_{x} ,\,\,\mu_{y} \)). Putting \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) = c \) in Eq. (9.43), we find the equations of these ellipses to be

    $$ \frac{{(x - \mu_{x} )^{2} }}{{\sigma_{x}^{2} }} + \frac{{(y - \mu_{y} )^{2} }}{{\sigma_{y}^{2} }} = - 2\, \ln (2\uppi \sigma_{x} \sigma_{y} c). $$
    (9.44)
  2.

    The marginal probability densities

    $$ f_{{\mathbf{x}}} (x) = \frac{1}{{\sqrt {2\uppi } \sigma_{x} }}\, {\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }}}} \quad {\text{and}}\quad f_{{\mathbf{y}}} (y) = \frac{1}{{\sqrt {2\uppi } \sigma_{y} }}\, {\text{e}}^{{ - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} , $$
    (9.45)

    situated at the upper part of the figure and the right-hand side, respectively.

  3.

    The surface element \( \,{\text{d}}x\,{\text{d}}y \) used in the evaluation of the probability \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y)\,{\text{d}}x\,{\text{d}}y \) for the random variable \( {\mathbf{x}} \) to have a value between \( x \) and \( x + {\text{d}}x \) and the random variable \( {\mathbf{y}} \) to have a value between \( y \) and \( y + {\text{d}}y \).

  4.

    The parallelogram lying between the values of \( x_{1} \) and \( x_{2} \), and \( y_{1} \) and \( y_{2} \), over which the function \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) \) must be integrated for the evaluation of the probability for the random variable \( {\mathbf{x}} \) to have a value greater than \( x_{1} \) and smaller than or equal to \( x_{2} \) and the random variable \( {\mathbf{y}} \) to have a value greater than \( y_{1} \) and smaller than or equal to \( y_{2} \). This probability is equal to

    $$ P\{ x_{1} < {\mathbf{x}} \le x_{2} ,\;y_{1} < {\mathbf{y}} \le y_{2} \} = \frac{1}{{2\uppi \sigma_{x} \sigma_{y} }}\, \int\nolimits_{{x_{1} }}^{{x_{2} }} {{\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }}}} {\text{d}}x} \int\nolimits_{{y_{1} }}^{{y_{2} }} {{\text{e}}^{{ - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} {\text{d}}y} . $$
    (9.46)
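Because \( {\mathbf{x}} \) and \( {\mathbf{y}} \) are independent, the double integral of Eq. (9.46) factorizes into the product of two one-dimensional probabilities, each expressible through the error function. A minimal Python sketch of this evaluation (illustrative values; only the standard library is assumed):

import math

def norm_prob(a, b, mu, sigma):
    # P{a < x <= b} for a normally distributed variable, via the error function
    t = lambda u: (u - mu) / (math.sqrt(2) * sigma)
    return 0.5 * (math.erf(t(b)) - math.erf(t(a)))

def joint_prob(x1, x2, y1, y2, mu_x, s_x, mu_y, s_y):
    # Eq. (9.46) for independent normal variables x and y
    return norm_prob(x1, x2, mu_x, s_x) * norm_prob(y1, y2, mu_y, s_y)

print(joint_prob(-1, 1, -1, 1, 0, 1, 0, 1))   # (0.6827)^2 = 0.4661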

Example 9.18

For the distribution of Eq. (9.43) and Fig. 9.7, find the probability that a point (x, y) lies within the ellipse with center the point (\( \mu_{x} ,\,\,\mu_{y} \)) and semi-axes equal to \( \sigma_{x} \) and \( \sigma_{y} \), along the respective axes.

The joint probability density function is \( f_{{{\mathbf{x}},{\mathbf{y}}}} (x,y) = \frac{1}{{2\uppi \sigma_{x} \sigma_{y} }}\, {\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }} - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} \). The probability that a point lies in the surface element \( \text{d}x\,\text{d}y \) about the point \( \text{(}x,\,y) \) is

$$ \text{d}^{2} P = \frac{1}{{2\uppi \sigma_{x} \sigma_{y} }}\,{\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }} - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} \text{d}x\,\text{d}y. $$

The probability that a point (x, y) lies within the ellipse with center at the point (\( \mu_{x} ,\,\,\mu_{y} \)) and semi-axes equal to \( \sigma_{x} \) and \( \sigma_{y} \), along the respective axes, is found by integrating this over the surface of the ellipse, as shown in figure (a):

$$ P_{{\sigma_{x} ,\sigma_{y} }} = \frac{1}{{2\uppi \sigma_{x} \sigma_{y} }}\;\iint\limits_{{\text{ellipse}}} {{\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }} - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} \text{d}x\,\text{d}y}. $$

We change the variables to \( \chi = \frac{{x - \mu_{x} }}{{\sigma_{x} }} \) and \( \psi = \frac{{y - \mu_{y} }}{{\sigma_{y} }} \). Then

$$ \text{d}^{2} P = \frac{1}{{2\uppi }}\, {\text{e}}^{{ - \chi^{2} /2 - \psi^{2} /2}} \text{d}\chi \,\text{d}\psi $$

and the ellipse transforms into the circle \( \chi^{2} + \psi^{2} = 1 \). The surface integral is now, as shown in figure (b),

$$ P_{{\sigma_{x} ,\,\,\sigma_{y} }} = \frac{1}{{2\uppi }}\, \iint\limits_{{\text{circle}}} {{\text{e}}^{{ - \chi^{2} /2 - \psi^{2} /2}} \text{d}\chi \,\text{d}\psi } = 4 \times \frac{1}{{2\uppi }}\int\nolimits_{0}^{1} {{\text{e}}^{{ - \psi^{2} /2}} \,\text{d}\psi } \;\int\nolimits_{0}^{{\sqrt {1 - \psi^{2} } }} {{\text{e}}^{{ - \chi^{2} /2}} \text{d}\chi } . $$

Since

$$ \int\nolimits_{0}^{{\sqrt {1 - \psi^{2} } }} {{\text{e}}^{{ - \chi^{2} /2}} \text{d}\chi } = \sqrt {\frac{\pi }{2}} \;\text{erf}\left( {\frac{{\sqrt {1 - \psi^{2} } }}{\sqrt 2 }} \right), $$

we have

$$ P_{{\sigma_{x} ,\,\,\sigma_{y} }} = \sqrt {\frac{2}{\pi }} \;\int\nolimits_{0}^{1} {\text{e}^{{ - \psi^{2} /2}} \;\text{erf}\left( {\frac{{\sqrt {1 - \psi^{2} } }}{\sqrt 2 }} \right)} \;{\text{d}}\psi . $$

We could not find this integral in the tables, so we resorted to numerical integration. This gave:

$$ P_{{\sigma_{x} ,\,\,\sigma_{y} }} = 0.394. $$

The probability that a point (x, y) lies within the ellipse with center at the point (\( \mu_{x} ,\,\,\mu_{y} \)) and semi-axes equal to \( \sigma_{x} \) and \( \sigma_{y} \), along the respective axes, is, therefore, 39.4%.

  • For the \( 2\sigma_{x} \), \( 2\sigma_{y} \) ellipse it is \( P_{{2\sigma_{x} ,\,\,2\sigma_{y} }} = 0.865 \).

  • For the \( 3\sigma_{x} \), \( 3\sigma_{y} \) ellipse it is \( P_{{3\sigma_{x} ,\,\,3\sigma_{y} }} = 0.989 \).

This last result states that 99% of the points lie within the ellipse with center the point (\( \mu_{x} ,\,\,\mu_{y} \)) and semi-axes equal to \( 3\sigma_{x} \) and \( 3\sigma_{y} \), along the respective axes. The percentages may be remembered as 40-90-99. These results are illustrated in the figure below.
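In fact, changing to polar coordinates reduces the double integral for the \( k\sigma_{x} \), \( k\sigma_{y} \) ellipse to the closed form \( 1 - {\text{e}}^{{ - k^{2} /2}} \), which reproduces all three values. The short Python sketch below (standard library only) checks the erf integral derived above against this closed form, using a simple midpoint-rule integration:

import math

def p_ellipse(k, n=100000):
    # sqrt(2/pi) * integral from 0 to k of exp(-y^2/2) erf(sqrt(k^2-y^2)/sqrt(2)) dy
    h = k / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += math.exp(-y*y/2) * math.erf(math.sqrt(k*k - y*y) / math.sqrt(2))
    return math.sqrt(2/math.pi) * total * h

for k in (1, 2, 3):
    print(k, round(p_ellipse(k), 4), round(1 - math.exp(-k*k/2), 4))
# 1 0.3935 0.3935    2 0.8647 0.8647    3 0.9889 0.9889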

9.6 The Probability Density of the Sum of Two Random Variables

Let the random variable \( {\mathbf{z}} = {\mathbf{x}} + {\mathbf{y}} \) take values \( z = x + y \), where \( {\mathbf{x}} \) and \( {\mathbf{y}} \) are mutually independent random variables with known probability densities. The probability density \( f_{{\mathbf{z}}} (z) \) of their sum is required.

Figure 9.8 shows the probability densities \( f_{{\mathbf{x}}} (x) \) and \( f_{{\mathbf{y}}} (y) \) of \( {\mathbf{x}} \) and \( {\mathbf{y}} \) [(a) and (b) respectively]. In Fig. 9.8a, the area of the strip under the curve \( f_{{\mathbf{x}}} (x) \) between \( x \) and \( x + {\text{d}}x \) gives the probability for \( {\mathbf{x}} \) to have a value between \( x \) and \( x + {\text{d}}x \). In Fig. 9.8b, the area of the region under the curve \( f_{{\mathbf{y}}} (y) \) and in the region of values \( - \infty < {\mathbf{y}} \le y \), gives the probability for \( {\mathbf{y}} \) to have a value in the region \( - \infty < {\mathbf{y}} \le y \).

Fig. 9.8

The probability densities \( f_{{\mathbf{x}}} (x) \) and \( f_{{\mathbf{y}}} (y) \) of the random variables \( {\mathbf{x}} \) and \( {\mathbf{y}} \)

Due to the independence of the random variables \( {\mathbf{x}} \) and \( {\mathbf{y}} \) from each other, the probability for \( {\mathbf{x}} \) to have a value between \( x \) and \( x + {\text{d}}x \) and \( {\mathbf{y}} \) to have a value smaller than or equal to \( y \) is

$$ \begin{aligned} P\{ x < {\textbf{x}} \le x + {\text{d}}x,\;{\textbf{y}} \le y\} & = P\{ x < {\textbf{x}} \le x + {\text{d}}x\} \,\,P\{ {\textbf{y}} \le y\} \\ & = f_{{\textbf{x}}} (x)\, {\text{d}}x\, F_{{\varvec{y}}} (y) = f_{{\textbf{x}}} (x)\, {\text{d}}x\, \int\nolimits_{ - \infty }^{y} {f_{{\textbf{y}}} (y)\, {\text{d}}y} . \\ \end{aligned} $$
(9.47)

If it is \( z = x + y \), this is the probability for the random variable \( {\mathbf{x}} \) to have a value between \( x \) and \( x + dx \) and the sum \( {\mathbf{z}} = {\mathbf{x}} + {\mathbf{y}} \) to have a value smaller than or equal to \( z \):

$$ P\{ x < {\textbf{x}} \le x + {\text{d}}x,\;{\textbf{z}} \le z\} = f_{{\textbf{x}}} (x)\, dx\, \int\nolimits_{ - \infty }^{z - x} {f_{{\textbf{y}}} (y)\, {\text{d}}y} . $$
(9.48)

The probability for the random variable \( {\mathbf{x}} \) to have any value and the sum \( {\mathbf{z}} = {\mathbf{x}} + {\mathbf{y}} \) to have a value which is smaller than or equal to \( z \), i.e. the probability for \( {\mathbf{z}} \) to have a value which is smaller than or equal to \( z \), is

$$ F_{{\textbf{z}}} (z) = \int\nolimits_{ - \infty }^{\infty } {f_{{\textbf{x}}} (x)\, {\text{d}}x} \, \int\nolimits_{ - \infty }^{z - x} {f_{{\textbf{y}}} (y)\, {\text{d}}y} . $$
(9.49)

The geometrical interpretation of this relation is seen in Fig. 9.9. The double integral of Eq. (9.49) gives the probability which corresponds to the integration of the function \( f_{{\mathbf{x}}} (x)f_{{\mathbf{y}}} (y)\, \) over the shaded region and under the straight line \( x + y = z \). In the shaded region, it is \( x + y \le z \). The magnitude \( {\text{d}}x\, \int_{ - \infty }^{z - x} {f_{{\mathbf{y}}} (y)\, {\text{d}}y} \) is evaluated in the strip of Fig. 9.9 between \( x \) and \( x + {\text{d}}x \). Integrating then for all the values of \( x \), we cover all the shaded region of the figure (\( x + y \le z \)) and we find \( F_{{\mathbf{z}}} (z) \).

Fig. 9.9

The region of integration of the function \( f_{{\mathbf{x}}} (x)f_{{\mathbf{y}}} (y)\, \) (shaded region) for the evaluation of the distribution function \( F_{{\mathbf{z}}} (z) \) of the sum \( {\mathbf{z}} = {\mathbf{x}} + {\mathbf{y}} \)

The probability density of \( {\mathbf{z}} \) is found by differentiating \( F_{{\mathbf{z}}} (z) \) with respect to \( z \):

$$ f_{{\textbf{z}}} (z) = \frac{{{\text{d}}F_{{\textbf{z}}} (z)}}{{{\text{d}}z}} = \int\nolimits_{ - \infty }^{\infty } {f_{{\textbf{x}}} (x)\, {\text{d}}x} \;\left\{ {\frac{\partial }{\partial z}\int\nolimits_{ - \infty }^{z - x} {f_{{\textbf{y}}} (y){\text{d}}y} } \right\} = \int\nolimits_{ - \infty }^{\infty } {f_{{\textbf{x}}} (x)\, {\text{d}}x} \;\left\{ {f_{{\textbf{y}}} (z - x)} \right\}. $$
(9.50)

Therefore,

$$ f_{{\textbf{z}}} (z) = \int\nolimits_{ - \infty }^{\infty } {f_{{\textbf{x}}} (x)\, f_{{\textbf{y}}} (z - x)\, {\text{d}}x} $$
(9.51)

and, due to symmetry,

$$ f_{{\textbf{z}}} (z) = \int\nolimits_{ - \infty }^{\infty } {f_{{\textbf{x}}} (z - y)\, f_{{\textbf{y}}} (y)\, {\text{d}}y} . $$
(9.52)

These integrals express the convolution of the functions \( f_{{\mathbf{x}}} (x) \) and \( f_{{\mathbf{y}}} (y) \), which is denoted by \( f_{{\mathbf{x}}} (x)\, {*} \,f_{{\mathbf{y}}} (y) \) or, equivalently, by \( f_{{\mathbf{y}}} (y)\, {*} \,f_{{\mathbf{x}}} (x) \).

If \( {\mathbf{x}} \) and \( {\mathbf{y}} \) take only positive values, then Eqs. (9.51) and (9.52) simplify to

$$ f_{{\textbf{z}}} (z) = \int\nolimits_{0}^{z} {f_{{\textbf{x}}} (x)\, f_{{\textbf{y}}} (z - x)\, {\text{d}}x} ,\quad (z > 0) $$
(9.53)

and

$$ f_{{\textbf{z}}} (z) = \int\nolimits_{0}^{z} {f_{{\textbf{x}}} (z - y)\, f_{{\textbf{y}}} (y)\, {\text{d}}y} ,\quad (z > 0). $$
(9.54)

The proof is simple: writing Eq. (9.51) as

$$ f_{{\textbf{z}}} (z) = \int\nolimits_{ - \infty }^{0} {f_{{\textbf{x}}} (x)\, f_{{\textbf{y}}} (z - x)\,{\text{d}}x} + \int\nolimits_{0}^{z} {f_{{\textbf{x}}} (x)\, f_{{\textbf{y}}} (z - x)\, {\text{d}}x} + \int\nolimits_{z}^{\infty } {f_{{\textbf{x}}} (x)\, f_{{\textbf{y}}} (z - x)\, {\text{d}}x} , $$
(9.55)

we see that the first integral is equal to zero because in the region of integration (\( - \infty ,\, 0 \)) the function \( f_{{\mathbf{x}}} (x) \) is equal to zero, while the third integral is also equal to zero because for \( x > z \) the function \( f_{{\mathbf{y}}} (z - x)\, \) is equal to zero. Only the second integral remains, which gives Eq. (9.53). Equation (9.54) is proved in the same way.

9.6.1 The Probability Density of the Sum of Two Normally Distributed Random Variables

Let \( {\mathbf{x}} \) and \( {\mathbf{y}} \) be two mutually independent random variables, normally distributed with means and standard deviations \( \mu_{x} ,\,\,\sigma_{x} \) and \( \mu_{y} ,\,\,\sigma_{y} \), respectively. Then,

$$ f_{{\mathbf{x}}} (x) = \frac{1}{{\sqrt {2\uppi } \sigma_{x} }}\,{\text{e}}^{{ - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }}}} \quad {\text{and}}\quad f_{{\mathbf{y}}} (y) = \frac{1}{{\sqrt {2\uppi } \sigma_{y} }}\,{\text{e}}^{{ - \frac{{(y - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}}} $$
(9.56)

and the probability density of their sum, \( {\mathbf{z}} = {\mathbf{x}} + {\mathbf{y}} \), will be, according to Eq. (9.51),

$$ f_{{\bf{z}}} (z) = \frac{1}{{2\uppi \sigma_{x} \sigma_{y} }}\int\nolimits_{ - \infty }^{\infty } {\exp \left\{ { - \frac{{(x - \mu_{x} )^{2} }}{{2\sigma_{x}^{2} }} - \frac{{(z - x - \mu_{y} )^{2} }}{{2\sigma_{y}^{2} }}} \right\}} \, \,{\text{d}}x $$
(9.57)

After some algebraic manipulation, the exponent may be written in the form:

$$ \left\{ {} \right\} = - \frac{{(z - \mu_{x} - \mu_{y} )^{2} }}{{2 (\sigma_{x}^{2} + \sigma_{y}^{2} )}} - \frac{{\left( {x - \frac{{\sigma_{y}^{2} \mu_{x} + \sigma_{x}^{2} (z - \mu_{y} )}}{{\sigma_{x}^{2} + \sigma_{y}^{2} }}} \right)^{2} }}{{2\frac{{\sigma_{x}^{2} \sigma_{y}^{2} }}{{\sigma_{x}^{2} + \sigma_{y}^{2} }}}}. $$
(9.58)

Therefore,

$$ f_{{\mathbf{z}}} (z) = \frac{1}{{2\uppi \,\sigma_{x} \sigma_{y} }}\exp \left[ { - \frac{{(z - \mu_{x} - \mu_{y} )^{2} }}{{2 (\sigma_{x}^{2} + \sigma_{y}^{2} )}}} \right]\;\int\nolimits_{ - \infty }^{\infty } {\exp \, \left[ { - \frac{{\left( {x - \frac{{\sigma_{y}^{2} \mu_{x} + \sigma_{x}^{2} (z - \mu_{y} )}}{{\sigma_{x}^{2} + \sigma_{y}^{2} }}} \right)^{2} }}{{2\frac{{\sigma_{x}^{2} \sigma_{y}^{2} }}{{\sigma_{x}^{2} + \sigma_{y}^{2} }}}}} \right]} \; {\text{d}}x. $$
(9.59)

The value of the integral is simply \( \frac{{\sqrt {2\uppi } \sigma_{x} \sigma_{y} }}{{\sqrt {\sigma_{x}^{2} + \sigma_{y}^{2} } }} \). Thus, (9.59) becomes

$$ f_{{\mathbf{z}}} (z) = \frac{1}{{\sqrt {2\uppi } \, \sqrt {\sigma_{x}^{2} + \sigma_{y}^{2} } }}\exp \left[ { - \frac{{(z - \mu_{x} - \mu_{y} )^{2} }}{{2 (\sigma_{x}^{2} + \sigma_{y}^{2} )}}} \right]\;, $$
(9.60)

or

$$ f_{{\mathbf{z}}} (z) = \frac{1}{{\sqrt {2\uppi } \, \sigma_{z} }}{\text{e}}^{{ - \frac{{(z - \mu_{z} )^{2} }}{{2 \sigma_{z}^{2} }}}} \;, $$
(9.61)

which is a normal (Gaussian) distribution with mean and standard deviation

$$ \mu_{z} = \mu_{x} + \mu_{y} \quad {\text{and}}\quad \sigma_{z} = \sqrt {\sigma_{x}^{2} + \sigma_{y}^{2} } , $$
(9.62)

respectively.

This is the same result as that found in Sect. 6.2.2. The result may be generalized to sums of more than two terms and, obviously, \( {\mathbf{x}} \) and \( {\mathbf{y}} \) may also take negative values. It is valid, therefore, for the algebraic sum of any number of normally distributed variables.
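As a numerical illustration of Eqs. (9.51) and (9.62), the following Python sketch (with illustrative parameters) convolves two Gaussian densities on a grid and compares the result with the predicted Gaussian of mean \( \mu_{x} + \mu_{y} \) and standard deviation \( \sqrt {\sigma_{x}^{2} + \sigma_{y}^{2} } \):

import numpy as np

def gauss(u, mu, s):
    return np.exp(-(u - mu)**2 / (2*s**2)) / (np.sqrt(2*np.pi) * s)

dx = 0.01
x = np.arange(-20.0, 20.0, dx)
fx = gauss(x, 1.0, 0.8)                   # f_x with mu_x = 1.0, sigma_x = 0.8
fy = gauss(x, 2.0, 1.5)                   # f_y with mu_y = 2.0, sigma_y = 1.5

fz = np.convolve(fx, fy) * dx             # discrete form of Eq. (9.51)
z = 2*x[0] + dx*np.arange(fz.size)        # grid of the sum z = x + y

pred = gauss(z, 3.0, np.hypot(0.8, 1.5))  # Eq. (9.60): mu_z = 3.0, sigma_z = 1.7
print(np.abs(fz - pred).max())            # essentially zero: the sum is Gaussian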

Example 9.19

The magnitude x has a real value \( x_{0} \) and a series of measurements of it have a mean value \( \overline{x} \) and standard deviation of the mean \( \sigma_{{\overline{x}}} \). The magnitude y, of the same nature as x, has a real value \( y_{0} \) and a series of measurements of it have a mean value \( \overline{y} \) and standard deviation of the mean \( \sigma_{{\overline{y}}} \). What is the probability for \( y_{0} \) to be greater than \( x_{0} \)?

The probability density of the mean \( {\overline{\mathbf{z}}} \) of the difference \( {\mathbf{z}} = {\mathbf{y}} - {\mathbf{x}} \) was found to be equal to

$$ f_{{{\overline{\mathbf{z}}}}} (\overline{z}) = \frac{1}{{\sqrt {2\uppi } \, \sigma_{{\overline{z}}} }}\,\,{\text{e}}^{{ - \frac{{(\overline{z} - \mu_{{\overline{z}}} )^{2} }}{{2 \sigma_{{\overline{z}}}^{2} }}}} ,\quad {\text{where}}\quad \mu_{{\overline{z}}} = \overline{y} - \overline{x}\quad {\text{and}}\quad \sigma_{{\overline{z}}} = \sqrt {\sigma_{{\overline{x}}}^{2} + \sigma_{{\overline{y}}}^{2} } . $$

The probability for the random variable \( {\overline{\mathbf{z}}} \) to have a value greater than \( \overline{z} \) is (Sect. 4.4.2)

$$ { \Pr } \{ {\overline{\mathbf{z}}} > \overline{z}\} = \frac{1}{2} - \frac{1}{2}{\text{erf}}\,\left( {\frac{{\overline{z} - \mu_{{{\overline{\text{z}}}}} }}{{\sqrt 2\, \sigma_{{\overline{z}}} }}} \right) = \frac{1}{2} -\Phi \,\left( {\frac{{\overline{z} - \mu_{{{\overline{\text{z}}}}} }}{{\sigma_{{\overline{z}}} }}} \right). $$

The values \( x_{0} \) and \( y_{0} \) are the real values of the magnitudes x and y. We wish to find the probability for \( y_{0} \) to be greater than \( x_{0} \). The best estimates we have for \( x_{0} \) and \( y_{0} \) are \( \overline{x} \) and \( \overline{y} \), respectively. Therefore, the best estimate we can have for the probability for \( y_{0} \) to be greater than \( x_{0} \), is equal to the probability for the value of the magnitude \( {\overline{\mathbf{z}}} \) to be greater than 0,

$$ { \Pr } \{ y_{0} > x_{0} \} = \frac{1}{2} - \frac{1}{2}{\text{erf}}\,\left( {\frac{{ - \mu_{{{\overline{\text{z}}}}} }}{{\sqrt 2\, {\upsigma}_{{\overline{z}}} }}} \right) = \frac{1}{2} -\Phi \,\left( {\frac{{ - \mu_{{{\overline{\text{z}}}}} }}{{\sigma_{{\overline{z}}} }}} \right) $$

and finally, since it is \( {\text{erf }}( - z) = - {\text{erf }}(z) \) and \( \Phi{ }( - z) = -\Phi { }(z) \),

$$ { \Pr } \{ y_{0} > x_{0} \} = \frac{1}{2} + \frac{1}{2}{\text{erf}}\,\left( {\frac{{\overline{y} - \overline{x}}}{{\sqrt 2\, \sqrt {\sigma_{{\overline{x}}}^{2} + \sigma_{{\overline{y}}}^{2} } }}} \right) = \frac{1}{2} +\Phi \,\left( {\frac{{\overline{y} - \overline{x}}}{{\sqrt {\sigma_{{\overline{x}}}^{2} + \sigma_{{\overline{y}}}^{2} } }}} \right). $$
  • If it is \( \overline{y} - \overline{x} = 3\sigma_{{\overline{z}}} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} +\Phi\,(3) = 0.999 \).

  • If it is \( \overline{y} - \overline{x} = 2\sigma_{{\overline{z}}} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} +\Phi\,(2) = 0.977 \).

  • If it is \( \overline{y} - \overline{x} = \sigma_{{\overline{z}}} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} +\Phi\,(1) = 0.84 \).

  • If it is \( \overline{y} = \overline{x} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} +\Phi\,(0) = 0.5 \).

    Also,

  • If it is \( \overline{y} - \overline{x} = - \sigma_{{\overline{z}}} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} -\Phi \,(1) = 0.16 \).

  • If it is \( \overline{y} - \overline{x} = - 2\sigma_{{\overline{z}}} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} -\Phi \,(2) = 0.023 \).

  • If it is \( \overline{y} - \overline{x} = - 3\sigma_{{\overline{z}}} \), then \( { \Pr } \{ y_{0} > x_{0} \} = \tfrac{1}{2} -\Phi \,(3) = 0.001 \).
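The final formula is easy to evaluate numerically. A minimal Python sketch (with hypothetical numbers) giving the best estimate of \( \Pr \{ y_{0} > x_{0} \} \) from the two means and the standard deviations of the means:

import math

def prob_greater(xbar, err_x, ybar, err_y):
    # Pr{y0 > x0} = 1/2 + (1/2) erf( (ybar - xbar) / (sqrt(2) sigma_z) )
    sigma_z = math.hypot(err_x, err_y)
    return 0.5 + 0.5 * math.erf((ybar - xbar) / (math.sqrt(2) * sigma_z))

print(prob_greater(10.0, 0.3, 10.5, 0.4))   # difference equals 1 sigma_z: 0.84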

The example that follows is rather extensive and may be omitted without any consequence for the understanding of what follows. It is, however, useful, since it deals with many topics we have already discussed and makes use of the theoretical results just derived.

Example 9.20

Find the probability densities for the sums of \( n \) random values of \( x \), in the case of the distribution of the measurements having a probability density \( f_{{\mathbf{x}}} (x) = 0 \) for \( x < 0 \) and \( f_{{\mathbf{x}}} (x) = \alpha \, {\text{e}}^{ - \alpha x} \) (\( 0 < \alpha ,\;\,0 \le x \)) and check the validity of the central limit theorem .

The characteristics of the distribution

The probability density is normalized because \( \alpha \int_{0}^{\infty } {{\text{e}}^{ - \alpha x} } {\text{d}}x\, = 1 \).

We will first find the mean and the standard deviation of x.

The mean is

$$ \overline{x} = \alpha \int\nolimits_{0}^{\infty } {x {\text{e}}^{ - \alpha x} {\text{d}}x} \, = \frac{1}{\alpha }. $$

The standard deviation is

$$ \sigma_{x} = \sqrt {\int_{0}^{\infty } {\left( {x - \frac{1}{\alpha }} \right)^{ 2} \alpha {\text{e}}^{ - \alpha x} } {\text{d}}x} $$

or

$$ \sigma_{x} = \sqrt {\alpha \int_{0}^{\infty } {x^{2} {\text{e}}^{ - \alpha x} {\text{d}}x} - 2\int_{0}^{\infty } {x\, {\text{e}}^{ - \alpha x} {\text{d}}x} + \frac{1}{\alpha }\int_{0}^{\infty } {{\text{e}}^{ - \alpha x} {\text{d}}x} } = \sqrt {\frac{2}{{\alpha^{2} }} - \frac{2}{{\alpha^{2} }} + \frac{1}{{\alpha^{2} }}} = \frac{1}{\alpha }. $$

The probability densities of the sum of n values of \( {\mathbf{x}} \)

We now denote as \( {\mathbf{z}}_{n} = {\mathbf{x}}_{1} + {\mathbf{x}}_{2} + \ldots + {\mathbf{x}}_{i} + \ldots + {\mathbf{x}}_{n} \) the sum of n values of the variable \( {\mathbf{x}} \) (e.g. measurements of the magnitude \( {\mathbf{x}} \)).

The probability density of single values \( {\mathbf{z}}_{1} = {\mathbf{x}} \) is the given function

$$ f_{{\mathbf{x}}} (x) = \alpha \, {\text{e}}^{ - \alpha x} \equiv f_{{{{z}}_{{{1}}} }} (z_{1} )\quad (0 \le x). $$

According to Eq. (9.53) the probability density for the sum of two values of \( {\mathbf{x}} \), is given by the convolution of \( f_{{{\mathbf{z}}_{{\mathbf{1}}} }} (z_{1} ) \) with itself:

$$ f_{{{\mathbf{z}}_{{{2}}} }} (z_{2} ) = \int\nolimits_{0}^{{z_{2} }} {f_{{{\mathbf{z}}_{1} }} (z_{1} )\, f_{{{{\bf z}}_{1} }} (z_{2} - z_{1} )\, {\text{d}}z_{1} } . $$

Substituting, we find

$$ f_{{{\mathbf{z}}_{{{2}}} }} (z_{2} ) = \int\nolimits_{0}^{{z_{2} }} {(\alpha {\text{e}}^{{ - \alpha z_{1} }} )(\alpha {\text{e}}^{{ - \alpha z_{2} + \alpha z_{1} }} )\, {\text{d}}z_{1} } = \alpha^{2} {\text{e}}^{{ - \alpha z_{2} }} \int\nolimits_{0}^{{z_{2} }} {{\text{d}}z_{1} } = \alpha^{2} z_{2} {\text{e}}^{{ - \alpha z_{2} }} . $$

Knowing the probability density for the sum of two values of \( {\mathbf{x}} \), we may find the probability density for the sum of three values of \( {\mathbf{x}} \). This will be equal to the convolution of the probability density for the sum of two values with the probability density for the result of a measurement:

$$ \begin{aligned} f_{{{{\bf z}}_{3} }} (z_{3} ) & = \int\nolimits_{0}^{{z_{3} }} {f_{{{{\bf z}}_{2} }} (z_{2} )\, f_{{{{\bf z}}_{1} }} (z_{3} - z_{2} )\, {\text{d}}z_{2} } \\ f_{{{{\bf z}}_{3} }} (z_{3} ) & = \int\nolimits_{0}^{{z_{3} }} {(\alpha^{2} z_{2} {\text{e}}^{{ - \alpha z_{2} }} )(\alpha {\text{e}}^{{ - \alpha z_{3} + \alpha z_{2} }} )\, {\text{d}}z_{2} } \\ &= \alpha^{3} {\text{e}}^{{ - \alpha z_{3} }} \int\nolimits_{0}^{{z_{3} }} {z_{2} \, {\text{d}}z_{2} } = \alpha^{3} \frac{{z_{3}^{2} }}{2 !}{\text{e}}^{{ - \alpha z_{3} }} . \\ \end{aligned} $$

The general relation for the probability density \( f_{{{\mathbf{z}}_{{{{n}} + {{1}}}} }} (z_{n + 1} ) \) is given by the convolution of the probability density for \( n \) measurements, \( f_{{{\mathbf{z}}_{{{n}}} }} (z_{n} ) \), with the probability density for the result of a measurement, \( f_{{{\mathbf{z}}_{{{1}}} }} (z_{1} ) \),

$$ f_{{{{\bf z}}_{n + 1} }} (z_{n + 1} ) = \int\nolimits_{0}^{{z_{n + 1} }} {f_{{{{\bf z}}_{n} }} (z_{n} )\, f_{{{{\bf z}}_{1} }} (z_{n + 1} - z_{n} )\, {\text{d}}z_{n} } . $$

We test the assumption that it is \( f_{\textbf{z}_{n}} (z_{n} ) = \alpha^{n} \frac{{z_{n}^{n - 1} }}{(n - 1) !}\, {\text{e}}^{{ - \alpha z_{n} }} \). Substituting in the last relation, we find

$$ \begin{aligned} f_{{{{\bf z}}_{n + 1} }} (z_{n + 1} ) & = \int\nolimits_{0}^{{z_{n + 1} }} {\left( {\alpha^{n} \frac{{z_{n}^{n - 1} }}{(n - 1) !}\,{\text{e}}^{{ - \alpha z_{n} }} } \right)\, \left( {\alpha {\text{e}}^{{ - \alpha z_{n + 1} + \alpha z_{n} }} } \right)\, {\text{d}}z_{n} } \\ f_{{{{\bf z}}_{n + 1} }} (z_{n + 1} ) & = \alpha^{n + 1} \frac{{{\text{e}}^{{ - \alpha z_{n + 1} }} }}{(n - 1) !}\int\nolimits_{0}^{{z_{n + 1} }} {z_{n}^{n - 1} {\text{d}}z_{n} } = \alpha^{n + 1} \frac{{z_{n + 1}^{n} }}{n !}{\text{e}}^{{ - \alpha z_{n + 1} }} , \\ \end{aligned} $$

which is in agreement with the assumption we made for the form of \( f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} ) \). Since the formula \( f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} ) = \alpha^{n} \frac{{z_{n}^{n - 1} }}{(n - 1) !}\,{\text{e}}^{{ - \alpha z_{n} }} \) gives the correct results for \( n = \) 1, 2 and 3 (which we already know) and, since, as we have just proved, if it is valid for \( f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} ) \) then it will be valid for \( f_{{{\mathbf{z}}_{n + 1} }} (z_{n + 1} ) \) also, we reach the conclusion that it is valid for all values of \( n \).

The probability densities for the sums \( {\mathbf{z}}_{n} = {\mathbf{x}}_{1} + {\mathbf{x}}_{2} + \ldots + {\mathbf{x}}_{i} + \ldots + {\mathbf{x}}_{n} \) are, therefore,

$$ f_{{{\mathbf{z}}_{{{1}}} }} (z_{1} ) \equiv f_{{\mathbf{x}}} (x) = \alpha \, {\text{e}}^{ - \alpha x} ,\quad f_{{{\mathbf{z}}_{{\mathbf{2}}} }} (z_{2} ) = \alpha^{2} z_{2} {\text{e}}^{{ - \alpha z_{2} }} ,\quad f_{{{\mathbf{z}}_{3} }} (z_{3} ) = \alpha^{3} \frac{{z_{3}^{2} }}{2 !}{\text{e}}^{{ - \alpha z_{3} }} ,\quad \ldots $$

and are given by the general formula

$$ f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} ) = \alpha^{n} \frac{{z_{n}^{n - 1} }}{(n - 1) !}\, {\text{e}}^{{ - \alpha z_{n} }} . $$

In the figure that follows, the curves \( f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} ) \) were drawn for \( \alpha = 1 \) and the values of \( n = \) 1, 2, 4, 8 and 16.


The probability densities for the sum of a number of \( n = \) 2, 4, 8 or 16 values, taken at random from a parent population with probability density \( f_{{\mathbf{z}}} (z) = \,{\text{e}}^{ - z} \) (\( z \ge 0 \))
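The general formula may also be checked by simulation. A short Python sketch (illustrative parameters) draws a large number of sums of \( n \) exponentially distributed values and compares their histogram with \( f_{{{\mathbf{z}}_{n} }} (z_{n} ) \):

import numpy as np
from math import factorial

rng = np.random.default_rng(1)
alpha, n, N = 1.0, 8, 200000

# Monte Carlo: N sums, each of n values drawn from f(x) = alpha exp(-alpha x)
z = rng.exponential(1/alpha, size=(N, n)).sum(axis=1)

hist, edges = np.histogram(z, bins=60, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
f = alpha**n * mid**(n-1) * np.exp(-alpha*mid) / factorial(n-1)
print(np.abs(hist - f).max())   # small: the histogram follows the formula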

The probability densities of the means \( \overline{x} \) of \( n \) measurements of \( {\mathbf{x}} \)

Knowing the probability densities \( f_{{{\mathbf{z}}_{n} }} (z_{n} ) \) of the sums of \( n \) measurements of \( {\mathbf{x}} \) and if \( \overline{x} = z_{n} /n \) is the mean of \( n \) measurements of \( {\mathbf{x}} \), we wish to find the probability density \( f_{n} (\overline{x}) \) of the values \( \overline{x} \).

The relation between the functions \( f_{{{\mathbf{z}}_{n} }} (z_{n} ) \) and \( f_{n} (\overline{x}) \) is found as follows: Because it is

$$ \begin{aligned} & ({\text{Probability}}\;{\text{for}}\;{\text{the}}\;{\text{mean}}\;{\overline{\mathbf{x}}}\;{\text{of}}\;{\text{the}}\;n\;{\text{measurements}}\;{\text{of}}\;{\mathbf{x}}\;{\text{to}}\;{\text{lie}}\;{\text{between}}\;\overline{x}\;{\text{and}}\;\overline{x} + {\text{d}}\overline{x}) \\ & \quad = ({\text{Probability}}\;{\text{for}}\;{\text{the}}\;{\text{sum}}\;{\mathbf{z}}_{{{\it n}}} \;{\text{of the}}\;n\;{\text{measurements}}\;{\text{of}}\;{\mathbf{x}}\;{\text{to}}\;{\text{lie}}\;{\text{between}}\;z_{n} \;{\text{and}}\;z_{n} + {\text{d}}z_{n} ) \\ \end{aligned} $$

we have

$$ f_{n} (\overline{x})\, \text{d}\overline{x} = f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} )\,{\text{d}}z_{n} $$

and

$$ f_{n} (\overline{x})\, = f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} )\,\left| { \frac{{{\text{d}}z_{n} }}{{{\text{d}}\overline{x}}} } \right|, $$

where the absolute value is taken as, by definition, the probability densities are positive.

Taking into account the fact that \( \overline{x} = z_{n} /n \), we have, \( \frac{{{\text{d}}z_{n} }}{{{\text{d}}\overline{x}}} = n \), \( f_{n} (\overline{x})\, = n\,f_{{{\mathbf{z}}_{{{\it n}}} }} (z_{n} ) \) and so the probability densities of the means \( \overline{x} \) of \( n \) measurements of \( {\mathbf{x}} \) is:

$$ f_{n} (\overline{x}) = n\alpha^{n} \frac{{(n \overline{x})^{n - 1} }}{(n - 1) !}\,{\text{e}}^{{ - n\alpha \overline{x}}} \quad {\text{or}}\quad f_{n} (\overline{x}) = \frac{{(n \alpha )^{n} }}{(n - 1) !}\overline{x}^{ n - 1} \,{\text{e}}^{{ - n\alpha \overline{x}}} . $$

These functions have been drawn in the figure that follows, for \( \alpha = \) 1 and \( n = \) 1, 2, 4, 8 and 16.


The probability densities of the means of a number \( n = \) 2, 4, 8 or 16 values, which are taken at random from a parent population with probability density \( f_{{\mathbf{z}}} (z) = e^{ - z} \) (\( z \ge 0 \))

Since

$$ \int\nolimits_{0}^{\infty } {\frac{{(n \alpha )^{n} }}{(n - 1) !}\overline{x}^{ n - 1} \, {\text{e}}^{{ - n\alpha \overline{x}}} } {\text{d}}\overline{x} = \frac{1}{(n - 1) ! }\int\nolimits_{0}^{\infty } {(n \alpha \overline{x})^{ n - 1} \,{\text{e}}^{{ - n\alpha \overline{x}}} } {\text{d}}(n \alpha \overline{x}) = \frac{1}{(n - 1) ! }\int\nolimits_{0}^{\infty } {t^{ n - 1} \, {\text{e}}^{ - t} } {\text{d}}t = 1, $$

the probability densities \( f_{n} (\overline{x}) \) are normalized, as expected.

The asymptotic approach to the Gaussian curve

The maximum of \( f_{n} (\overline{x}) \) appears at the value of \( \overline{x} \) for which it is \( {\text{d}}f_{n} (\overline{x})/{\text{d}}\overline{x} = 0 \), i.e. for \( \overline{x} = \frac{n - 1}{\alpha n} \). Substituting in \( f_{n} (\overline{x}) \), we find its maximum value: \( \hat{f}_{n} = \frac{{(n - 1)^{n - 1} }}{(n - 1) !}\, \alpha n \,{\text{e}}^{ - ( n - 1)} \).

In terms of \( \hat{f}_{n} \), we have \( f_{n} (\overline{x}) = \hat{f}_{n}\left( {{\text{e}}\, \frac{{n \alpha \overline{x}}}{n - 1}} \right)^{ n - 1} \, {\text{e}}^{{ - n\alpha \overline{x}}}\) .

Let

$$ \alpha \overline{x} \equiv \frac{n - 1}{n} + \delta , $$

where \( \delta \) expresses, in units of \( 1/\alpha \), the distance along the \( \overline{x} \)-axis from the point \( \overline{x} = \frac{n - 1}{\alpha n} \) which corresponds to the curve’s maximum. Then,

$$ f_{n} (\overline{x}) = \hat{f}_{n} \;\;\left( {1 + \frac{n\delta }{n - 1}} \right)^{ n - 1} \, {\text{e}}^{ - n\delta } $$

Taking logarithms,

$$ \ln \left( {f_{n} (\overline{x})\, /\hat{f}_{n} } \right) = \;(n - 1)\;\ln \left( {1 + \frac{n\delta }{n - 1}} \right) - n\delta . $$

For small values of \( \delta \),

$$ \begin{aligned} \ln \left( {f_{n} (\overline{x})\, /\hat{f}_{n} } \right) & = (n - 1)\,\left[ {\frac{n}{n - 1}\delta - \frac{1}{2}\left( {\frac{n}{n - 1}} \right)^{2} \delta^{2} + \ldots } \right] - n\delta \\ \ln \left( { f_{n} (\overline{x})\, /\hat{f}_{n} } \right) & = - \frac{1}{2}\, \frac{{n^{2} }}{n - 1}\, \delta^{2} \\ \end{aligned} $$

and, therefore,

$$ f_{n} (\overline{x}) = \hat{f}_{n} \;{\text{e}}^{{ - \, \frac{{n^{2} \delta^{2} }}{2(n - 1)}}} . $$

Returning to \( \overline{x} \) via the relation \( \alpha \overline{x} = \frac{n - 1}{n} + \delta \), we have \( f_{n} (\overline{x}) = \hat{f}_{n} \;\exp \left\{ { - \frac{{\left( {\overline{x} - \frac{n - 1}{\alpha n}} \right)^{ 2} }}{{2\, \left( {\frac{{\sqrt {n - 1} }}{\alpha n}} \right)^{ 2} }}} \right\} \) .

For large \( n \), Stirling’s formula gives \( \hat{f}_{n} \approx \frac{1}{{\sqrt {2\pi } }}\, \frac{\alpha n}{{\sqrt {n - 1} }} \) and, therefore, it is

$$ f_{n} (\overline{x}) = \frac{1}{{\sqrt {2\pi } }}\, \frac{\alpha n}{{\sqrt {n - 1} }}\,\;\exp \left\{ { - \frac{{\left( {\overline{x} - \frac{n - 1}{\alpha n}} \right)^{ 2} }}{{2\, \left( {\frac{{\sqrt {n - 1} }}{\alpha n}} \right)^{ 2} }}} \right\}, $$

which is a Gaussian with mean \( \overline{{(\overline{x})}} = \frac{n - 1}{\alpha n} \) and standard deviation \( \sigma_{{\overline{x}}} = \frac{{\sqrt {n - 1} }}{\alpha n} \).

We notice that \( \sigma_{{\overline{x}}} = \frac{{\sqrt {n - 1} }}{\alpha n} = \frac{1/\alpha }{\sqrt n }\sqrt {\frac{n - 1}{n}} = \frac{{\sigma_{x} }}{\sqrt n }\sqrt {\frac{n - 1}{n}} \) and \( \sigma_{{\overline{x}}} \to \frac{{\sigma_{x} }}{\sqrt n } \) as \( n \to \infty \).

We see that the central limit theorem applies. In the figure it is seen that the curve for \( n = 16 \) is already very similar to a Gaussian.
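The rate of this convergence can be examined numerically. A brief Python sketch (illustrative, for \( \alpha = 1 \) and \( n = 16 \)) compares \( f_{n} (\overline{x}) \) with the limiting Gaussian found above:

import numpy as np
from math import factorial, sqrt, pi

alpha, n = 1.0, 16
xbar = np.linspace(0.01, 3.0, 600)
f_n = (n*alpha)**n * xbar**(n-1) * np.exp(-n*alpha*xbar) / factorial(n-1)

mu = (n - 1) / (alpha*n)                  # mean of the limiting Gaussian
s = sqrt(n - 1) / (alpha*n)               # its standard deviation
g = np.exp(-(xbar - mu)**2 / (2*s**2)) / (sqrt(2*pi) * s)

print(np.abs(f_n - g).max() / f_n.max())  # maximum relative deviation; decreases as n grows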

Programs

Excel

Ch. 09. Excel—Weighted Mean and Standard Deviations

Origin

Ch. 09. Origin—Weighted Mean and Standard Deviations

Python

Ch. 09. Python—Weighted Mean and Standard Deviations

R

Ch. 09. R—Weighted Mean and Standard Deviations

Problems

  9.1

    [E.O.P.R.] The results of 8 measurements of a magnitude are

    $$ \begin{array}{*{20}l} {x_{i} :} \hfill & {5.24} \hfill & {5.42} \hfill & {5.20} \hfill & {5.00} \hfill & {5.15} \hfill & {5.32} \hfill & {5.24} \hfill & {5.37} \hfill \\ \end{array} . $$

    Find the mean \( \overline{x} \) of the measurements and its error, \( \sigma_{{\overline{x}}} \),

    (a)

      if equal weights are attributed to the results,

    (b)

      if the weights given to the results are, respectively,

      $$ \begin{array}{*{20}l} {w_{i} } \hfill & 2 \hfill & 1 \hfill & 1 \hfill & 3 \hfill & 3 \hfill & 2 \hfill & 1 \hfill & 2 \hfill \\ \end{array} . $$
  9.2

    [E.O.P.R.] A series of 10 measurements of the quantity \( x \) gave the result \( x_{1} = 8.65 \pm 0.12 \), while another series of 20 measurements of the same quantity gave \( x_{2} = 8.45 \pm 0.08 \). Find the value of \( x \) and its error for the total of the 30 measurements, if to the two values are attributed the weights:

    (a)

      equal to the number of measurements in each result and

    (b)

      inversely proportional to the square of the error of each result.

  9.3

    The probability of observation of the discrete values \( x_{1} ,\, \,x_{2} ,\, \ldots ,\, \,x_{N} \) is proportional to

    $$ P = \alpha^{N} \, {\text{e}}^{{ - \alpha^{2} \, \sum\limits_{i} {(x_{i} - \overline{x})^{2} } }} $$

where \( \overline{x} \) is the mean of the \( x \)'s. Find the value of \( \alpha \) which maximizes \( P \).