
3.1 Sampling Random Numbers

Let X be a random variable (RV) obeying a cumulative distribution function (cdf)

$$ P\left( {X \le x} \right) = F_{X} (x);\quad F_{X} ( - \infty ) = 0;\quad F_{X} (\infty ) = 1 $$
(3.1)

In the following, if the rv X obeys a cdf we shall write \( X\sim F_{X} (x) .\) From the definition, it follows that \( F_{X} (x) \) is a non-decreasing function and we further assume that it is continuous and differentiable at will. The corresponding probability density function (pdf) is then

$$ f_{X} (x) = \frac{{{\text{d}}F_{X} (x)}}{{{\text{d}}x}};\quad f_{X} (x) \ge 0;\quad \int\limits_{ - \infty }^{\infty } {f_{X} (x)} {\text{d}}x = 1 $$
(3.2)

We now aim at sampling numbers from the cdf \( F_{X} (x) .\) A sequence of N ≫ 1 values \( \left\{ X \right\} \equiv \left\{ {x_{1} ,x_{2} , \ldots ,x_{N} } \right\} \) sampled from \( F_{X} (x) \) must be such that the number n of sampled points falling within an interval \( \Updelta x \ll X_{\max } - X_{\min } \) (where \( X_{\min } \) and \( X_{\max } \) are the minimum and maximum values in {X}) is

$$ \frac{n}{N} \simeq \int\limits_{\Updelta x} {f_{X} (x){\text{d}}x} $$
(3.3)

In other words, we require that the histogram of the sampled data approximates \(f_{X} (x) .\) Also, the \( x_{i} \) values should be uncorrelated and, if the sequence {X} is periodic, the period after which the numbers start repeating should be as large as possible.

Among all the distributions, the uniform distribution in the interval [0,1), denoted as U[0,1) or, more simply, U(0,1), plays a role of fundamental importance, since sampling from this distribution allows one to obtain rvs obeying any other distribution [1].

3.2 The Uniform Random Number Generator: Sampling from the Uniform Distribution

The cdf and pdf of the distribution U [0,1) are

$$ \begin{array}{*{20}l} \begin{aligned} U_{R} (r) & = r; \\ & = 0 \\ & = 1 \\ \end{aligned} & \begin{aligned} u_{R} (r) & = 1 \\ & = 0 \\ & = 0 \\ \end{aligned} & \begin{gathered} \text{for} \hfill \\ \text{for} \hfill \\ \text{for} \hfill \\ \end{gathered} & \begin{aligned} 0 & \le r \le 1 \\ r & < 0 \\ r & > 1 \\ \end{aligned} \\ \end{array} $$
(3.4)

The generation of random numbers R uniformly distributed in [0,1) has been, and still is, an important problem and a subject of active research. In the beginning, the outcomes of intrinsically random phenomena were used (e.g., tossing a coin or a die, spinning a roulette wheel, counting the emissions of radioactive sources of constant intensity, etc.), but it was soon realized that, apart from the non-uniformity due to imperfections in the mechanisms of generation or detection, the rate at which such data could be produced was too low and the sequences could not be reproduced, so that it was difficult to find and fix errors in the MCS codes in which the generated random numbers were used.

To overcome these difficulties, the next idea was to fill tables of random numbers to be stored in the computers (in 1955 the RAND Corporation published a table of \( 10^{6} \) random numbers), but accessing the computer memory slowed down the calculations and, above all, the stored sequences were always too short with respect to the growing needs.

Finally, in 1956, von Neumann proposed to have the computer directly generate the ‘random’ numbers by means of an appropriate function \( g( \cdot ) \) which allows one to find the next number \( R_{k + 1} \) from the preceding one \( R_{k} ,\) i.e.,

$$ R_{k\; + \;1} = g(R_{k} ) $$
(3.5)

The sequence thus generated is inevitably periodic: in the course of the sequence, when a number appears that had already been obtained before, the subsequence between these two occurrences repeats itself cyclically, i.e., the sequence enters a loop. Furthermore, the sequence can be reproduced exactly, so that it is obviously not ‘random’ but deterministic. However, if the function \( g(r) \) is chosen properly, the sequence can be said to have a pseudorandom character, in the sense that it satisfies a number of randomness tests. In particular, von Neumann proposed to obtain \( R_{k + 1} \) by taking the central digits of the square of \( R_{k} .\) For example, for a computer with a four-digit word, if \( R_{k} = 4,567 ,\) then \( R_{k}^{2} = 20,857,489 \) and \( R_{k + 1} = 8,574 ,\) \( R_{k + 2} = 5,134 ,\) and so on. This function turns out to be slow to compute and to give rise to rather short periods; furthermore, if one obtains \( R_{k} = 0000 ,\) then all the following numbers are also zero.

Presently, the most commonly used methods for generating sequences {R} of numbers from a uniform distribution are inspired by the Monte Carlo roulette game. In a real roulette game the ball, thrown with high initial speed, performs a large number of revolutions around the wheel and finally comes to rest within one of the numbered compartments. In an ideal machine nobody would doubt that the final compartment, or its associated number, is actually uniformly sampled among all the possible compartments or numbers.

In the domain of the real numbers within the interval [0,1), the game could be modeled by throwing a point on the positive x-axis very far from the origin with a method having an intrinsic dispersion much larger than unity: then, the difference between the value so obtained and the largest integer smaller than this value may be reasonably assumed as sampled from U[0,1). In a computer, the above procedure is performed by means of a mixed congruential relationship of the kind

$$ R_{k + 1} = (aR_{k} + c)\bmod m $$
(3.6)

In words, the new number \( R_{k + 1} \) is the remainder, modulo a positive integer m, of an affine transform of the old \( R_{k} \) with non-negative integer coefficients a and c. The above expression, in some way, resembles the uniform sampling in the roulette game, \( aR_{k} + c \) playing the role of the distance travelled by the ball and m that of the wheel circumference. The sequence so obtained is made up of numbers \( R_{k} < m \) and it is periodic with period \( p \le m .\) For example, if we choose \( R_{0} = a = c = 5 \) and m = 7, the sequence is {5,2,1,3,6,0,5,…}, with a period p = 6. The sequences generated with the above-described method are actually deterministic, so that the sampled numbers are more appropriately called pseudorandom numbers. However, the constants a, c, m may be selected so that:

  • The generated sequence satisfies essentially all randomness tests;

  • The period p is very large.

Since the numbers generated by the above procedure are always smaller than m, when divided by m they lie in the interval [0,1).
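As a minimal illustration, the following Matlab sketch implements Eq. (3.6) with the small constants of the example above (\( R_{0} = a = c = 5 ,\) m = 7); these values are chosen only to reproduce the example, and practical generators use much larger, carefully selected constants.

  • a=5; c=5; m=7; R=5; % constants of the example above (illustration only)

  • N=10; seq=zeros(1,N);

  • for k=1:N, R=mod(a*R+c,m); seq(k)=R; end % mixed congruential step, Eq. (3.6)

  • disp(seq) % 2 1 3 6 0 5 2 1 3 6, i.e., period p = 6

  • u=seq/m; % division by m maps the numbers into [0,1)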

Research to develop algorithms for generating pseudo-random numbers is still ongoing. Good statistical properties, high speed of generation, and reproducibility are central requirements for these algorithms to be suitable for MC simulation.

Other Pseudo-Random Number Generation (PRNG) algorithms include the Niederreiter [2], Sobol [3], and Mersenne Twister [4] algorithms. For example, the latter allows generating numbers with an almost uniform distribution in the range \( [0,\,2^{k} - 1] ,\) where k is the computer word length (nowadays, k = 32 or 64). Further details on other methods are given in [5–16], with wide bibliographies which we suggest to the interested reader.

Before leaving this issue, it is important to note that for the generation of pseudo-random numbers U[0,1) many computer codes do not make use of machine subroutines, but use congruential subroutines which are part of the program itself. Thus, for example, it is possible that an excellent program executed on a machine with a word of length different from the one it was written for gives absurd results. In this case it should not be concluded that the program is ‘garbage’, but it would be sufficient to appropriately modify the subroutine that generates the random numbers.

3.3 Sampling Random Numbers from Generic Distributions

3.3.1 Sampling by the Inverse Transform Method: Continuous Distributions

Let \( X \in ( - \infty , + \infty ) \) be a rv with cdf \( F_{X} (x) \) and pdf \( f_{X} (x) ,\) viz.,

$$ F_{X} (x) = \int\limits_{ - \infty }^{x} {f_{X} (x'){\text{d}}x'} = \text{P} \left( {X \le x} \right) $$
(3.7)

Since \( F_{X} (x) \) is a non-decreasing function, for any \( y \in [0,1) \) its inverse may be defined as

$$ F_{X}^{ - 1} (y) = \inf \left\{ {x:F_{X} (x) \ge y} \right\} $$
(3.8)

With this definition, we take into account the possibility that in some interval \( [x_{s} ,x_{d} ] \) \( F_{X} (x) \) is constant (and \( f_{X} (x) \) zero), that is

$$ F_{X} (x) = \gamma \quad {\text{for}} \quad x_{s} < x \le x_{d} $$
(3.9)

In this case, from definition (3.8) it follows that, corresponding to the value γ, the minimum value \( x_{s} \) is assigned to the inverse function \( F_{X}^{ - 1} (\gamma ). \) This is actually as if \( F_{X} (x) \) were not defined in \( (x_{s} ,x_{d} ]; \) however, this does not represent a disadvantage, since values in this interval can never be sampled because the pdf \( f_{X} (x) \) is zero there. Thus, in the following, we will suppose that the intervals \( (x_{s} ,x_{d} ] \) (open to the left and closed to the right), in which \( F_{X} (x) \) is constant, are excluded from the definition domain of the rv X. By so doing, \( F_{X} (x) \) will always be increasing (instead of non-decreasing). We now show that it is always possible to obtain values \( X\sim F_{X} (x) \) starting from values R sampled from the uniform distribution \( U_{R} [0,1). \) In fact, if R is uniformly distributed in [0,1), we have

$$ \text{P} \left( {R \le r} \right) = U_{R} (r) = r $$
(3.10)

Corresponding to a number R extracted from \( U_{R} (r), \) we calculate the number \( X = F_{X}^{ - 1} (R) \) and ask which distribution it obeys. As can be seen in Fig. 3.1, for the variable X we have

$$ \text{P} \left( {X \le x} \right) = \text{P} \left( {F_{X}^{ - 1} (R) \le x} \right) $$
(3.11)

Because \( F_{X} \) is an increasing function, by applying it to the arguments at the right-hand side of Eq. (3.11), the inequality is conserved and from Eq. (3.10) we have

$$ \text{P} \left( {X \le x} \right) = \text{P} \left( {R \le F_{X} (x)} \right) = F_{X} (x) $$
(3.12)

It follows that \( X = F_{X}^{ - 1} (R) \) is extracted from \( F_{X} (x). \) Furthermore, because \( F_{X} (x) = r \)

$$ \text{P} \left( {X \le x} \right) = \text{P} \left( {R \le r} \right) $$
(3.13)

In terms of cdf

$$ U_{R} (R) = F_{X} (x) \quad {\text{and}} \quad R = \int\limits_{ - \infty }^{x} {f_{X} (x'){\text{d}}x'} $$
(3.14)

This is the fundamental relationship of the inverse transform method which, for any R value sampled from the uniform distribution \( U_{R} [0,1), \) gives the corresponding X value sampled from the \( F_{X} (x) \) distribution (Fig. 3.1). However, it often occurs that the cdf \( F_{X} (x) \) cannot be inverted analytically, so that from Eq. (3.8) it is not possible to find \( X\sim F_{X} (x) \) as a function of \( R\sim U[0,1). \) An approximate procedure that is often employed in these cases consists in interpolating \( F_{X} (x) \) with a polygonal function and in performing the inversion of Eq. (3.8) by using the polygonal. Clearly, the precision of this procedure increases with the number of points of \( F_{X} (x) \) through which the polygonal passes. The calculation of the polygonal is performed as follows:

  • If the interval of variation of x is infinite, it is approximated by the finite interval \( (x_{a} ,x_{b} ) \) in which the values of the pdf \( \,f_{X} (x) \) are appreciably different from zero: for example, in case of the univariate normal distribution \( N(\mu ,\sigma^{2} ) \) with mean value \( \mu \) and variance \( \sigma^{2} , \) this interval may be chosen as \( (\mu - 5\sigma ,\mu + 5\sigma ); \)

  • The interval (0,1) in which both \( F_{X} (x) \) and \( U_{R} (r) \) take values is divided into n equal subintervals of length 1/n and the points \( x_{0} = x_{a} ,x_{1} ,x_{2} , \ldots ,x_{n} = x_{b} \) such that \( F_{X} (x_{i} ) = i/n, \) (i = 0,1,…,n) are found, e.g., by a numerical procedure.

At this point the MC sampling may start: for each R sampled from the distribution \( U_{R} [0,1), \) we compute the integer \( i^{*} = Int(R \cdot n) \) and then obtain the corresponding X value by interpolating between the points \( (x_{{i^{*} }} ,\,i^{*} /n) \) and \( (x_{{i^{*} \, + \,1}} ,\,(i^{*} + 1)/n). \) For example, in case of a linear interpolation we have

$$ X = x_{{i^{*} }} + (x_{{i^{*} + 1}} - x_{{i^{*} }} )(R \cdot n - i^{*} ) $$
(3.15)

For a fixed number n of points \( x_{i} \) upon which the interpolation is applied, the described procedure can be improved by interpolating with arcs of parabolas in place of line segments. The arcs can be obtained by imposing continuity conditions on the function and its derivatives at the points \( x_{i} \) (cubic splines). The expression of X as a function of R is in this case more precise, but also more burdensome to calculate. Currently, given the ease with which the RAM memory of computers can be increased, to increase the precision it is generally preferable to increase the number n of points and to use the polygonal interpolation: as a rule of thumb, a good choice is often n = 500.

$$ R:U_{R} (r) = r\;{\text{in}}\;[0,1) \Rightarrow X\sim F_{X} (x) $$
(3.16)
Fig. 3.1 Inverse transform method: continuous distributions
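As a minimal illustration of the polygonal procedure, the following Matlab sketch samples from the standard normal distribution mentioned above (\( \mu = 0, \) \( \sigma = 1, \) interval \( ( - 5\sigma ,5\sigma ), \) n = 500); the size of the fine grid used to invert \( F_{X} (x) \) numerically and the number of samples are illustrative choices.

  • n=500; xa=-5; xb=5; % number of subintervals and truncation interval

  • xg=linspace(xa,xb,100001)'; % fine grid on which F_X is tabulated

  • Fg=0.5*(1+erf(xg/sqrt(2))); % standard normal cdf

  • xi=interp1(Fg,xg,(0:n)'/n); % points x_i such that F_X(x_i)=i/n

  • xi(1)=xa; xi(end)=xb; % endpoints, where i/n falls outside the tabulated range

  • N=1e5; R=rand(N,1);

  • is=floor(R*n); % integer i* = Int(R*n)

  • X=xi(is+1)+(xi(is+2)-xi(is+1)).*(R*n-is); % linear interpolation, Eq. (3.15)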

3.3.2 Sampling by the Inverse Transform Method: Discrete Distributions

Let X be a rv which can only have the discrete values \( x_{k} , \) k = 0,1,…, with probabilities

$$ f_{k} = \text{P} \left( {X = x_{k} } \right) \ge 0,\,k = 0, 1, \ldots $$
(3.17)

Ordering the {x} sequence so that \( x_{k - 1} < x_{k} , \) the cdf is

$$ F_{k} = \text{P} \left( {X \le x_{k} } \right) = \sum\limits_{i = 0}^{k} {f_{i} } = F_{k - 1} + f_{k} \quad k = 0, 1, \ldots $$
(3.18)

where, by definition, \( F_{ - 1} = 0. \) The normalization condition of the cdf (Eq. 3.18) now reads

$$ \mathop {\lim }\limits_{k \to \infty } F_{k} = 1 $$
(3.19)

Following the scheme of the inverse transform method, given a value R sampled from the uniform distribution, the probability that R falls within the interval \( (F_{k - 1} ,F_{k} ] \) is in the discrete case

$$ \text{P} \left( {F_{k\; - \;1} < R \le F_{k} } \right) = \int\limits_{{F_{k - 1} }}^{{F_{k} }} {{\text{d}}r} = F_{k} - F_{k\; - \;1} = f_{k} = \text{P} (X = x_{k} ) $$
(3.20)

In words, for any R ∼ U[0,1), we get the realization \( X = x_{k} \) where k is the index for which \( F_{k - 1} < R \le F_{k} \) (Fig. 3.2).

Fig. 3.2 Inverse transform method: discrete distributions, \( k = 2 \Rightarrow X = x_{2} \)

In practice, a realization of X is sampled from the cdf \( F_{k} \) through the following steps:

  1. Sample an R ∼ U[0,1);

  2. Set k = 0; \( F = f_{0} ; \)

  3. If \( R \le F, \) proceed to 5);

  4. Vice versa, i.e., if \( R > F, \) set \( k \leftarrow k + 1 \) and then \( F \leftarrow F + f_{k} \) and proceed to 3);

  5. The required realization is \( X = x_{k} . \)

If the \( F_{k} \) values can be pre-computed, e.g., if their number is finite, the cycle (3–4) may be simplified by comparing R and \( F_{k} \) at step 3 and increasing only k at step 4.
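As a minimal illustration, the following Matlab sketch implements steps 1–5 for an assumed example pmf \( f_{k} \) over three values; note that Matlab indices start at 1, so k = 1 here plays the role of k = 0 above.

  • f=[0.25 0.5 0.25]; x=[1 2 3]; % assumed example pmf and values

  • N=1e5; X=zeros(N,1);

  • for j=1:N

  • R=rand; % step 1

  • k=1; F=f(1); % step 2

  • while R>F, k=k+1; F=F+f(k); end % steps 3-4

  • X(j)=x(k); % step 5

  • end

  • % with pre-computed F_k, the cycle reduces to k=find(R<=cumsum(f),1)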

3.3.2.1 Examples of Application of the Inverse Transform Sampling Method

3.3.2.1.1 Uniform Distribution in the interval (a,b)

A rv X is uniformly distributed in the interval (a,b) if

$$ \begin{array}{*{20}l} \begin{aligned} F_{X} (x) & = \frac{{x - a}}{{b - a}}; \\ & = 0 \\ & = 1 \\ \end{aligned} & \begin{aligned} f_{X} (x) & = \frac{1}{{b - a}} \\ & = 0 \\ & = 0 \\ \end{aligned} & \begin{aligned} {\text{for}} & \\ {\text{for}} & \\ {\text{for}} & \\ \end{aligned} & \begin{aligned} a & \le x \le b \\ x & < a \\ x & > b \\ \end{aligned} \\ \end{array} $$
(3.21)

Substituting the cdf of Eq. (3.21) in Eq. (3.14) and solving with respect to X yields

$$ X = a + (b - a)R $$
(3.22)

As a first application, we show how it is possible to simulate Buffon’s experiment, mentioned in the Introduction, with the aim of finding the probability P in Eq. (1.1). When the needle is thrown at random, the axis of the needle can have all possible orientations, with equal probability. Let φ be the angle between the needle’s axis and the normal to the lines drawn on the floor. By symmetry, it is possible to consider the interval \( \left( {0,\pi /2} \right) \) and from Eq. (3.21), with a = 0 and \( b = \pi /2, \) we have

$$ F_{\Upphi } (\phi ) = \frac{\phi }{{\frac{\pi }{2}}};\quad f_{\Upphi } (\phi ) = \frac{2}{\pi } $$
(3.23)

Corresponding to a random value Φ, the needle projection on the normal to the lines is \( L\cos \Upphi \) and thus the probability that the needle intercepts one of the lines is given by the ratio \( L\cos \Upphi /D. \) Multiplying by \( f_{\Upphi } (\phi ) \) and integrating, we obtain the value calculated by Buffon

$$ P = \int\limits_{0}^{{\frac{\pi }{2}}} {\frac{{L\cos \phi }}{D}\frac{2}{\pi }d\phi } = \frac{{L/D}}{{\pi /2}} $$
(3.24)

Operatively, for a given number of needle throws N ≫ 1, e.g., N = 103, the procedure is as follows:

  • Sample an \( R_{1} \sim U[0,1); \)

  • Calculate from Eq. (3.22) \( \Upphi = R_{1} \pi /2 ;\) then, the probability that the needle intercepts a line is \( h = \frac{L\cos \Upphi }{D}; \)

  • Sample an \( R_{2} \sim U[0,1) ;\) if \( R_{2} < h, \) the needle has intercepted a line and thus we set \( N_{s} = N_{s} + 1, \) where \( N_{s} \) is the counter of the number of times the needle has intercepted a line.

At the end of this procedure, the estimate of the probability P is

$$ P \cong \frac{{N_{s} }}{N} $$
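As a minimal illustration, the following Matlab sketch implements the above procedure; the ratio L/D = 0.5 and the number of throws are assumptions made only for the example.

  • LD=0.5; N=1e3; Ns=0; % assumed L/D ratio and number of throws

  • for i=1:N

  • Phi=rand*pi/2; % Eq. (3.22) with a = 0, b = pi/2

  • h=LD*cos(Phi); % probability that the needle intercepts a line

  • if rand<h, Ns=Ns+1; end

  • end

  • P_est=Ns/N % estimate of P, to be compared with (L/D)/(pi/2), Eq. (3.24)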

A problem of, perhaps, more practical interest is that of sampling a direction \( \underline{\Upomega } \) from an isotropic angular distribution in space. This is, for example, a case of interest for the choice of an initial direction of flight for a neutron emitted by fission. In polar coordinates, the direction is identified by the angle \( \vartheta \in (0,\pi ) \) between \( \underline{\Upomega } \) and the z axis and by the angle φ∈(−π, π) between the projection of \( \underline{\Upomega } \) on the xy plane and the x axis. Correspondingly,

$$ {\text{ d}}\underline{\Upomega } = \sin \vartheta \, {\text{ d}}\vartheta {\text{ d}}\phi = - {\text{d}}\mu {\text{ d}}\phi $$
(3.25)

where, as usual, \( \mu = \cos \vartheta . \) The pdf of the isotropic distribution is then

$$ f_{{\underline{\Upomega } }} (\underline{\Upomega } )d\underline{\Upomega } \equiv f_{\mu ,\Upphi } (\mu ,\phi ){\text{d}}\mu {\text{d}}\phi = \frac{{\left| {d\underline{\Upomega } } \right|}}{4\pi } = f_{1} (\mu )\text{d}\mu \,f_{2} (\phi )\text{d}\phi $$
(3.26)

where

$$ f_{1} (\mu ) = \frac{1}{2};\quad f_{2} (\phi ) = \frac{1}{2\pi } $$
(3.27)

The required pdf is given by the product of two uniform pdfs, namely \( f_{1} (\mu ) \) and \( f_{2} (\phi ) .\) If \( R_{\mu } ,\) \( R_{\Upphi } \) are two rvs ∼ U[0,1), we have

$$ R_{\mu } = \int\limits_{ - 1}^{\mu } {f_{1} (\mu '){\text{ d}}\mu '} = \frac{\mu + 1}{2};\quad R_{\Upphi } = \int\limits_{ - \pi }^{\Upphi } {f_{2} (\phi ){\text{ d}}\phi } = \frac{\Upphi + \pi }{2\pi } $$
(3.28)

and finally

$$ \mu = - 1 + 2R_{\mu } ;\quad \Upphi = - \pi + 2\pi R_{\Upphi } $$
(3.29)

In practice, the direction cosines u, v, w of \( \underline{\Upomega } \) are obtained through the following steps:

  • Sampling of two rvs \( R_{\mu } ,\) \( R_{\Upphi } \)U[0,1);

  • Computation of \( \mu = - 1 + 2R_{\mu } , \) \( \Upphi = - \pi + 2\pi R_{\Upphi } ; \)

  • Finally,

$$ \begin{array}{*{20}l} {u = \underline{\Upomega } \cdot \underline{i} = \sin \vartheta \cos \Upphi = \sqrt {1 - \mu^{2} } \cos \Upphi } \\ {v = \underline{\Upomega } \cdot \underline{j} = \sin \vartheta \sin \Upphi = \sqrt {1 - \mu^{2} } \sin \Upphi = \sqrt {1 - \mu^{2} - u^{2} } } \\ {w = \underline{\Upomega } \cdot \underline{k} = \mu } \\ \end{array} $$
(3.30)

Note that a given value of μ pertains to two quadrants, so that care must be taken in selecting the proper one.
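The steps above translate directly into the following minimal Matlab sketch (the number of sampled directions is an illustrative choice).

  • N=1e5;

  • mu=-1+2*rand(N,1); % cosine of the polar angle, Eq. (3.29)

  • Phi=-pi+2*pi*rand(N,1); % azimuthal angle, Eq. (3.29)

  • s=sqrt(1-mu.^2);

  • u=s.*cos(Phi); v=s.*sin(Phi); w=mu; % direction cosines, Eq. (3.30); the sign of sin(Phi) selects the quadrant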

3.3.2.1.2 Exponential Distribution

Let us consider a two-state system whose transition probabilities from one state to the other only depend on the present state and not on the way in which this state was reached. Examples of such systems are:

  • A radioactive nucleus: the two states are the nucleus at a given time, which we will call initial time, and the nucleus at the moment of disintegration; the rv in question, which is the argument of the transition pdf, is the time length between the initial time, at which we know that the nucleus is intact and the time at which the nucleus disintegrates.

  • The path of a neutron in a medium: the two states are the neutron in a given position, which we will call initial, and the neutron in the position at which the collision occurs; the rv in consideration, which is the argument of the transition pdf, is the length of the flight path between the initial position and the position of the collision.

  • A component of an industrial plant: the two states of the component are its nominal state and its failure state. The rv in consideration, which is the argument of the transition pdf, is the difference between the time at which we know that the component is in one of its two states, and the time at which the component moves to the other state.

Such systems, characterized by ‘lack-of-memory’, are said to be ‘markovian’, and they are said to be ‘homogeneous’ or ‘inhomogeneous’ according to whether the transitions occur with constant or variable-dependent (space- or time-dependent) rates, respectively. In the latter case, if the argument of the rate of leaving a given state is the sojourn time of the system in that state, the process is called ‘semi-markovian’. Thus, a semi-markovian system is markovian only at the times of transition.

A rv \( X \in [0,\infty ) \) is said to be exponentially distributed if its cdf \( F_{X} (x) \) and pdf \( f_{X} (x) \) are

$$ \begin{array}{*{20}c} \begin{aligned} F_{X} (x) & = 1 - e^{{ - \int_{0}^{x} {\lambda (u)du} }} ; \\ & = 0 \\ \end{aligned} & \begin{aligned} f_{X} (x) & = \lambda (x)e^{{ - \int_{0}^{x} {\lambda (u)du} }} \\ & = 0 \\ \end{aligned} & \begin{aligned} \, & {\text{for}}\quad 0 \le x \le \infty \\ & {\text{otherwise}} \\ \end{aligned} \\ \end{array} $$
(3.31)

where \( \lambda ( \cdot ) \) is the transition rate, also called hazard function within the context of the last example mentioned above. In the following, we shall refer to an exponential distribution of a time variable, T. Corresponding to a realization of a rv R ∼ U[0,1), the realization t of the exponentially distributed rv T can be obtained by solving the equation

$$ \int\limits_{0}^{t} {\lambda (u){\text{d}}u} = - \log (1 - R) $$
(3.32)

Let us first consider time-homogeneous systems, i.e., the case of constant λ. Correspondingly, Eq. (3.31) becomes

$$ F_{T} (t) = 1 - e^{ - \lambda t} ;\quad f_{T} (t) = \lambda e^{ - \lambda t} $$
(3.33)

The moments of the distribution with respect to the origin are

$$ \mu '_{k} = \frac{k!}{{\lambda^{k} }}\,\left( {k = 1,2, \ldots } \right) $$
(3.34)

Realizations of the associated exponentially distributed rv T are easily obtained from the inverse transform method. The sampling of a given number N ≫ 1 of realizations is performed by repeating the following procedure:

  • Sample a realization of R ∼ U[0,1);

  • Compute \( t = - \frac{1}{\lambda }\log (1 - R). \)
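As a minimal illustration, the following Matlab sketch draws N samples from the exponential distribution of Eq. (3.33); the rate λ and the sample size are assumed values.

  • lambda=2; N=1e5; % assumed rate and sample size

  • t=-log(1-rand(N,1))/lambda; % inverse transform, Eq. (3.32) with constant lambda

  • mean(t) % approaches 1/lambda, Eq. (3.34) with k = 1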

An example of a time-homogeneous markovian process is the failure of a component, provided it is assumed that it does not age: such a component, still good (state 1) at time t, has a probability \( \lambda {\text{d}}t \) of failing (entering state 2) between t and \( t + {\text{d}}t; \) note that this probability depends neither on the time t nor on the age of the component at time t. The probability density per unit time that the component, still good at time \( t_{0} , \) will fail at time \( t \ge t_{0} \) is

$$ f_{T} (t) = e^{{ - \lambda (t - t_{0} )}} \cdot \lambda $$
(3.35)

The collisions of a neutron with the nuclei of a homogeneous medium represent an example of a space-homogeneous markovian process: a neutron with energy E, traveling along a specified direction, say the x axis, at the point x has a probability \( \Upsigma_{\text{total}} (E){\text{d}}x \) of undergoing a collision between x and \( x + {\text{d}}x, \) where \( \Upsigma_{\text{total}} (E) \) is the macroscopic total cross-section, which plays the role of λ in Eq. (3.31) for the exponential distribution; note that this probability depends neither on the point x where the neutron is, nor on the distance traveled by that neutron before arriving at x. The probability density per unit length that a neutron at point \( x_{0} \) will make its first collision at point \( x \ge x_{0} \) is

$$ f(x,E) = e^{{ - \Upsigma_{\text{total}} (E)(x - x_{0} )}} \cdot \Upsigma_{\text{total}} (E) $$
(3.36)

Returning to processes in the time domain, a generalization of the exponential distribution consists in assuming that the probability density of occurrence of an event, namely λ, is time dependent. As mentioned above, in this case we deal with non-homogeneous markovian processes. For example, although in the reliability and risk analyses of an industrial plant or system one often assumes that the failures of the components occur following homogeneous Markov processes, i.e., exponential distributions with constant rates, it is more realistic to consider that the age of the system influences the failure probability, so that the transition probability density is a function of time. A case commonly considered in practice is that in which the transition rate is of the kind

$$ \lambda (t) = \beta \, \alpha \, t^{\alpha - 1} $$
(3.37)

with \( \beta > 0 \, , \) \( \alpha > 0. \) The corresponding distribution, which constitutes a generalization of the exponential distribution, is called the Weibull distribution and was proposed in the 1950s by W. Weibull in the course of his studies on the strength of materials. The cdf and the pdf of the Weibull distribution are

$$ F_{T} (t) = 1 - e^{{ - \beta \, t^{\alpha } }} \quad f_{T} (t) = \alpha \beta \, t^{\alpha - 1} e^{{ - \beta \, t^{\alpha } }} $$
(3.38)

The moments with respect to the origin are

$$ \mu '_{k} = \beta^{{ - \frac{k}{\alpha }}} \Upgamma \left( {\frac{k}{\alpha } + 1} \right) \quad k = 1, 2, \ldots $$
(3.39)

In the particular case of \( \alpha = 1, \) the Weibull distribution reduces to the exponential distribution with constant transition rate \( \lambda = \beta . \) The importance of the Weibull distribution stems from the fact that the hazard functions of the components of most industrial plants closely follow this distribution in time, with different parameter values describing different phases of their life. In practice, a realization t of the rv T is sampled from the Weibull distribution through the following steps:

  • Sampling of a realization of the rv R ∼ U[0,1);

  • Computation of \( t = \left( { - \frac{1}{\beta }\ln (1 - R)} \right)^{{\frac{1}{\alpha }}} . \)
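As a minimal Matlab sketch, with assumed parameter values α = 1.5 and β = 0.8:

  • alpha=1.5; beta=0.8; N=1e5; % assumed shape and scale parameters

  • R=rand(N,1);

  • t=(-log(1-R)/beta).^(1/alpha); % inversion of the Weibull cdf, Eq. (3.38)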

3.3.2.1.3 Multivariate Normal Distribution

Let us consider a multivariate normal distribution of order k of the vector of rvs \( \underline{Z} \equiv (k,1). \) The pdf is

$$f_{{\underline{Z} }} (\underline{z} ;\underline{a} ,\Sigma ) = \frac{1}{{(2\pi )^{{\frac{k}{2}}} \left| \Sigma \right|^{{\frac{1}{2}}} }}e^{{ - \frac{1}{2}(\underline{z} - \underline{a} )^{\prime}\Sigma ^{{ - 1}} (\underline{z} - \underline{a} )}} $$
(3.40)

where the prime indicates the transpose, \( \underline{a} \equiv (k,1)\) is the vector of the mean values, and \( \Upsigma \equiv (k,k) \) is the symmetric, positive definite covariance matrix, with determinant \( \left| \Upsigma \right| ,\) given by

$$ \Upsigma = E\left[ {(\underline{z} - \underline{a} )(\underline{z} - \underline{a} )^{\prime}} \right] = \left( {\begin{array}{*{20}l} {\sigma_{1}^{2} } & {\sigma_{12}^{2} } & \cdots & {\sigma_{1k}^{2} } \\ {\sigma_{21}^{2} } & {\sigma_{2}^{2} } & \cdots & {\sigma_{2k}^{2} } \\ \vdots & \vdots & \ddots & \vdots \\ {\sigma_{k1}^{2} } & {\sigma_{k2}^{2} } & \cdots & {\sigma_{k}^{2} } \\ \end{array} } \right) $$
(3.41)

The generic term of Σ is

$$ \sigma_{ij}^{2} = E\left[ {(z_{i} - a_{i} )(z_{j} - a_{j} )} \right] $$
(3.42)

and the elements \( \sigma_{i}^{2} , \) i = 1,2,…,k are the variances of the k normal variates. The pdf f in Eq. (3.40) is generally indicated as \( N(\underline{a} ,\Upsigma ) \) and correspondingly a rv Z distributed according to f is indicated as \( \underline{Z} \sim N(\underline{a} ,\Upsigma ). \)

The sampling from f of a random vector z, realization of Z can be done in the following way [17]:

  1. \( i = - 1; \)

  2. \( i \leftarrow i + 2; \)

  3. Sample two values \( u_{i} , \) \( u_{i + 1} \) from the distribution U[−1,1);

  4. If \( u_{i}^{2} + u_{i + 1}^{2} > 1 \) both values are rejected and we go back to 3. Otherwise, they are both accepted. Note that if the values are accepted, the point \( P \equiv (u_{i} ,u_{i + 1} ) \) is uniformly distributed within the circle with center at the origin and radius 1;

  5. Calculate the values

    $$ y_{i} = u_{i} \sqrt { - 2\frac{{\log (u_{i}^{2} + u_{i + 1}^{2} )}}{{u_{i}^{2} + u_{i + 1}^{2} }}} $$
    (3.43)
    $$ y_{i + 1} = u_{i + 1} \sqrt { - 2\frac{{\log (u_{i}^{2} + u_{i + 1}^{2} )}}{{u_{i}^{2} + u_{i + 1}^{2} }}} $$
    (3.44)

  6. It can be shown that the variables \( y_{i} \) and \( y_{i + 1} \) are independent and identically distributed (iid) standard normal variables ∼N(0,1);

  7. If k is even and \( i + 1 < k, \) we return to 2);

  8. If k is odd and \( i < k, \) we return to 2); in this last case, \( y_{k + 1} \) is calculated but not used;

  9. At this point, we have the random vector \( \underline{y} \equiv (k,1) \) having iid components distributed as N(0,1). By Cholesky’s factorization of the matrix Σ into the product of a lower triangular matrix U and its transpose U′, i.e., \( \Upsigma = U \cdot U^{\prime}, \) the random vector z is given by the expression

$$ \underline{z} = \underline{a} + U\underline{y} $$
(3.45)

Because \( E[\underline{y} ] = 0 \) and \( Var[\underline{y} ] = I\) we have

$$ E[\underline{z} ] = \underline{a} $$
(3.46)
$$Var[\underline{z} ] = E[(\underline{z} - \underline{a} )(\underline{z} - \underline{a} )^{\prime}] = E[U\underline{y} \,\underline{y}^{\prime} U^{\prime}] = U \, Var[\underline{y} ]U^{\prime} = UU^{\prime} = \Upsigma $$
(3.47)
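The nine steps above are illustrated in the following minimal Matlab sketch for an assumed two-dimensional case (the numerical values of \( \underline{a} \) and Σ are arbitrary); Matlab’s chol function is used for the Cholesky factorization of step 9.

  • a=[1; 2]; Sigma=[4 1; 1 3]; % assumed mean vector and covariance matrix

  • k=length(a); U=chol(Sigma,'lower'); % Sigma = U*U', step 9

  • y=zeros(k,1); i=1;

  • while i<=k

  • u=-1+2*rand(2,1); % step 3: two values from U[-1,1)

  • s=u(1)^2+u(2)^2;

  • if s>1 || s==0, continue; end % step 4: rejection (s==0 guards against log(0))

  • fac=sqrt(-2*log(s)/s);

  • y(i)=u(1)*fac; % Eq. (3.43)

  • if i+1<=k, y(i+1)=u(2)*fac; end % Eq. (3.44); discarded when k is odd

  • i=i+2;

  • end

  • z=a+U*y % Eq. (3.45): realization of Z ~ N(a,Sigma)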
3.3.2.1.4 Determination of a conditioned pdf

In Eq. (3.40) let us partition z into two sub-vectors \( \underline{z}_{1}\) and \( \underline{z}_{2} \) relative to the first p and the remaining \( q = k - p \) components, respectively. We then have

$$ \underline{z} = \left[ {\begin{array}{*{20}l} {\underline{z}_{1} } \\ {\underline{z}_{2} } \\ \end{array} } \right];\quad \underline{a} = \left[ {\begin{array}{*{20}l} {\underline{a}_{1} } \\ {\underline{a}_{2} } \\ \end{array} } \right] $$
(3.48)

Correspondingly, we partition Σ in sub matrices

$$ \Upsigma = \left[ {\begin{array}{*{20}l} {\Upsigma_{11} } & {\Upsigma_{12} } \\ {\Upsigma^{\prime}_{12} } & {\Upsigma_{22} } \\ \end{array} } \right] $$
(3.49)

We now write the pdf f in terms of the two groups of variables. We have

$$ \Upsigma^{ - 1} = \left[ {\begin{array}{*{20}l} {\Upsigma_{p}^{ - 1} } & { - \Upsigma_{p}^{ - 1} \Upsigma_{12} \Upsigma_{22}^{ - 1} } \\ { - \Upsigma_{22}^{ - 1} \Upsigma^{\prime}_{12} \Upsigma_{p}^{ - 1} } & {\Upsigma_{22}^{ - 1} + \Upsigma_{22}^{ - 1} \Upsigma^{\prime}_{12} \Upsigma_{p}^{ - 1} \Upsigma_{12} \Upsigma_{22}^{ - 1} } \\ \end{array} } \right] $$
(3.50)

where

$$ \Upsigma_{p} = \Upsigma_{11} - \Upsigma_{12} \Upsigma_{22}^{ - 1} \Upsigma^{\prime}_{12} $$
(3.51)

Furthermore, we have

$$ \left| \Upsigma \right| = \left| {\Upsigma_{22} } \right|\left| {\Upsigma_{p} } \right| $$
(3.52)

The exponent of \( f_{{\underline{Z} }} (\underline{z} ;\underline{a} ,\Upsigma ) \) can be expressed in terms of the partitioned quantities

$$\begin{aligned} & (\underline{z} - \underline{a} )^{\prime}\Sigma ^{{ - 1}} (\underline{z} - \underline{a} ) = \left( {(\underline{z} _{1} - \underline{a} _{1} )^{\prime}{\text{ }}(\underline{z} _{2} - \underline{a} _{2} )^{\prime}} \right) \\ & \cdot \left[ {\begin{array}{*{20}c} {\Sigma _{p}^{{ - 1}} \left[ {(\underline{z} _{1} - \underline{a} _{1} ) - \Sigma _{{12}} \Sigma _{{22}}^{{ - 1}} (\underline{z} _{2} - \underline{a} _{2} )} \right]} \\ {\Sigma _{{22}}^{{ - 1}} \left[ { - \Sigma ^{\prime}_{{12}} \Sigma _{p}^{{ - 1}} (\underline{z} _{1} - \underline{a} _{1} ) + (I + \Sigma ^{\prime}_{{12}} \Sigma _{p}^{{ - 1}} \Sigma _{{12}} \Sigma _{{22}}^{{ - 1}} )(\underline{z} _{2} - \underline{a} _{2} )} \right]} \\ \end{array} } \right] \\ =& (\underline{z} _{1} - \underline{a} _{1} )^{\prime}\Sigma _{p}^{{ - 1}} (\underline{z} _{1} - \underline{a} _{1} ) + (\underline{z} _{2} - \underline{a} _{2} )^{\prime}\Sigma _{{22}}^{{ - 1}} \left[ {(I + \Sigma ^{\prime}_{{12}} \Sigma _{p}^{{ - 1}} \Sigma _{{12}} \Sigma _{{22}}^{{ - 1}} )(\underline{z} _{2} - \underline{a} _{2} )} \right] \\ & - (\underline{z} _{1} - \underline{a} _{1} )^{\prime}\Sigma _{p}^{{ - 1}} \Sigma _{{12}} \Sigma _{{22}}^{{ - 1}} (\underline{z} _{2} - \underline{a} _{2} ) - (\underline{z} _{2} - \underline{a} _{2} )^{\prime}\Sigma _{{22}}^{{ - 1}} \Sigma ^{\prime}_{{12}} \Sigma _{p}^{{ - 1}} (\underline{z} _{1} - \underline{a} _{1} ) \\ = & (\underline{z} _{1} - \underline{a} _{1} )^{\prime}\Sigma _{p}^{{ - 1}} \left[ {(\underline{z} _{1} - \underline{a} _{1} ) - \Sigma _{{12}} \Sigma _{{22}}^{{ - 1}} (\underline{z} _{2} - \underline{a} _{2} )} \right] + (\underline{z} _{2} - \underline{a} _{2} )^{\prime}\Sigma _{{22}}^{{ - 1}} (\underline{z} _{2} - \underline{a} _{2} ) \\ & - (\underline{z} _{2} - \underline{a} _{2} )^{\prime}\Sigma _{{22}}^{{ - 1}} \Sigma ^{\prime}_{{12}} \Sigma _{p}^{{ - 1}} \left[ {(\underline{z} _{1} - \underline{a} _{1} ) - \Sigma _{{12}} \Sigma _{{22}}^{{ - 1}} (\underline{z} _{2} - \underline{a} _{2} )} \right] \\ \end{aligned} $$
(3.53)

By putting

$$ \underline{a}_{p} = \underline{a}_{1} + \Upsigma_{12} \Upsigma_{22}^{ - 1} (\underline{z}_{2} - \underline{a}_{2} ) $$
(3.54)

we have

$$ (\underline{z} - \underline{a} )^{\prime}\Upsigma^{ - 1} (\underline{z} - \underline{a} ) = (\underline{z}_{2} - \underline{a}_{2} )^{\prime}\Upsigma_{22}^{ - 1} (\underline{z}_{2} - \underline{a}_{2} ) + (\underline{z}_{1} - \underline{a}_{p} )^{\prime}\Upsigma_{p}^{ - 1} (\underline{z}_{1} - \underline{a}_{p} ) $$
(3.55)

Correspondingly, \( f_{{\underline{Z} }} (\underline{z} ;\underline{a} ,\Sigma ) \) can be written as follows

$$ f_{{\underline{Z} }} (\underline{z} _{1} ,\underline{z} _{2} ) = \left[ {g(\underline{z} _{1} |\underline{z} _{2} )} \right]\left[ {h(\underline{z} _{2} )} \right] $$
(3.56)

where

$$ g(\underline{z}_{1} |\underline{z}_{2} ) = \frac{{e^{{ - \frac{1}{2}(\underline{z}_{1} - \underline{a}_{p} )^{\prime}\Upsigma_{p}^{ - 1} (\underline{z}_{1} - \underline{a}_{p} )}} }}{{(2\pi )^{\frac{p}{2}} \left| {\Upsigma_{p} } \right|^{\frac{1}{2}} }} $$
(3.57)
$$ h(\underline{z}_{2} ) = \frac{{e^{{ - \frac{1}{2}(\underline{z}_{2} - \underline{a}_{2} )^{\prime}\Upsigma_{22}^{ - 1} (\underline{z}_{2} - \underline{a}_{2} )}} }}{{(2\pi )^{\frac{q}{2}} \left| {\Upsigma_{22} } \right|^{\frac{1}{2}} }} $$
(3.58)

It follows that \( f(\underline{z} ;\underline{a} ,\Upsigma )\) can be factored into the product of a q-variate multinormal \( h(\underline{z}_{2} ;\underline{a}_{2} ,\Upsigma_{22} ), \) having mean value \( \underline{a}_{2} \) and covariance matrix \( \Upsigma_{22} , \) and a conditioned p-variate multinormal \( g(\underline{z}_{1} ;\underline{a}_{p} ,\Upsigma_{p} |\underline{z}_{2} ), \) with mean value \( \underline{a}_{p} \) depending on \( \underline{z}_{2} \) and covariance matrix \( \Sigma _{p} .\) Operatively, to sample a vector realization \( \underline{\tilde{z}} \equiv (\underline{\tilde{z}}_{1} ,\underline{\tilde{z}}_{2} ) \) from \( f(\underline{z}_{1} ,\underline{z}_{2} ), \) we first sample a vector \( \underline{\tilde{z}}_{2} \) from \( h(\underline{z}_{2} ;\underline{a}_{2} ,\Upsigma_{22} )\) and, then, a vector \( \underline{\tilde{z}}_{1}\) from \( g(\underline{z}_{1} ;\underline{a}_{p} (\underline{\tilde{z}}_{2} ),\Upsigma_{p} |\underline{\tilde{z}}_{2} ).\)
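As a minimal Matlab sketch of this two-stage sampling, consider an assumed case with k = 3, p = 2 and q = 1 (all numerical values below are illustrative); for brevity, the standard normal draws use Matlab’s randn function instead of the polar method of the previous subsection.

  • a1=[0; 1]; a2=2; % assumed partitioned mean vector

  • S11=[2 0.5; 0.5 1]; S12=[0.3; 0.2]; S22=1.5; % assumed partitioned covariance matrix

  • Sp=S11-S12*(S22\S12'); % Eq. (3.51)

  • z2=a2+chol(S22,'lower')*randn(1,1); % z2 sampled from h, Eq. (3.58)

  • ap=a1+S12*(S22\(z2-a2)); % Eq. (3.54)

  • z1=ap+chol(Sp,'lower')*randn(2,1); % z1 sampled from g(.|z2), Eq. (3.57)

  • z=[z1; z2]; % realization of the full vector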

3.3.2.1.5 Multinomial Distribution

Let us consider a random process which can only have n possible outcomes, the probability of the kth one being \( f_{k} . \) Examples are: the throwing of a die; the kind of interaction that a neutron can have with a nucleus (scattering, absorption, fission, etc.), once it is known that an interaction has occurred; the kind of transition (degradation, failure, repair, etc.) that a multi-state component may undergo from its current state to one of the other reachable ones, given that a transition is known to have occurred. The process is simulated by first dividing the interval [0,1) into n successive subintervals of amplitudes \( f_{1} ,\,f_{2} ,\, \ldots ,\,f_{n} \) and then performing a large number of trials in each of which a rv R ∼ U[0,1) is thrown on the interval [0,1) (Fig. 3.3). Every time a point falls within the kth subinterval, we say that out of the n possible ones the kth event has occurred: the probability of this event obeys the Bernoulli distribution, in which \( f_{k} \) is the probability of success and \( \sum\nolimits_{j = 1,j \ne k}^{n} {f_{j} } = 1 - f_{k} \) is the complementary probability of the point falling elsewhere. The probability that in N trials, the point falls \( n_{k} \) times within the subinterval \( f_{k} \) is given by the binomial distribution

$$ \left( {\begin{array}{*{20}l} N \\ {n_{k} } \\ \end{array} } \right)f_{k}^{{n_{k} }} (1 - f_{k} )^{{N - n_{k} }} $$
(3.59)

The generalization of this distribution leads to the multinomial distribution which gives the probability that in N trials, the point falls \( n_{1} \) times in the subinterval \( f_{1} ,\) \( n_{2} \) times in \( f_{2} ,\)…, \( n_{n} \) times in \( f_{n} .\) Formally, the multinomial distribution is given by

$$ \frac{N!}{{n_{1} !n_{2} ! \ldots n_{n} !}}f_{1}^{{n_{1} }} f_{2}^{{n_{2} }} \ldots f_{n}^{{n_{n} }} $$
(3.60)

where, obviously, \( n_{1} + n_{2} + \cdots + n_{n} = N. \)
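As a minimal illustration, the following Matlab sketch performs N trials of this scheme for an assumed set of probabilities \( f_{k} ;\) each trial throws an R ∼ U[0,1) onto the subintervals of amplitudes \( f_{k} .\)

  • f=[0.125 0.25 0.5 0.125]; N=1e5; % assumed probabilities of the n = 4 outcomes

  • c=cumsum(f); % right endpoints of the subintervals

  • counts=zeros(size(f));

  • for j=1:N

  • k=find(rand<=c,1); % index of the subinterval hit by R

  • counts(k)=counts(k)+1;

  • end

  • counts/N % approaches f for large N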

Fig. 3.3 Sampling the occurrence of an event from a multinomial distribution

3.3.3 Sampling by the Composition Method

This method can be applied for sampling random numbers from a pdf that can be expressed as a mixture of pdfs.

3.3.3.1 Continuous Case

Let X be a rv having a pdf of the kind

$$ f_{X} (x) = \int {q(y)p(x,y){\text{d}}y} $$
(3.61)

where \( q(y) \ge 0 ,\) \( p(x,y) \ge 0 ,\) \( \forall x,y \) and where the integral is extended over a given domain of y. By definition of pdf, we have

$$ \int {f_{X} (x){\text{d}}x} = \int {\int {{\text{d}}x{\text{d}}yq(y)p(x,y)} } = 1 $$
(3.62)

The pdf \( f_{X} (x) \) is actually a mixture of pdfs. Indeed, the integrand function can be written as follows:

$$ q(y)p(x,y) = q(y)\int {p(x^{\prime},y){\text{d}}x^{\prime}} \frac{p(x,y)}{{\int {p(x^{\prime},y){\text{d}}x^{\prime}} }} = h_{Y} (y)g(x|y) $$
(3.63)

where

$$ h_{Y} (y) = q(y)\int {p(x^{\prime},y){\text{d}}x^{\prime}} ;\quad g(x|y) = \frac{p(x,y)}{{\int {p(x^{\prime},y){\text{d}}x^{\prime}} }} $$
(3.64)

Let us show that \( h_{Y} (y) \) is a pdf in y, and that \( g(x|y) \) is a pdf in x. Because \( p(x,y) \ge 0 ,\) \( h_{Y} (y) \ge 0 \) and \( g(x|y) \ge 0. \) The normalization of \( h_{Y} (y) \) can be derived immediately from that of \( f_{X} (x) \) and the normalization of \( g(x|y) \) is evident. Finally, the pdf \( f_{X} (x) \) can be written as

$$ f_{X} (x) = \int {h_{Y} (y)g_{X} (x|y){\text{d}}y} $$
(3.65)

where we have added the subscript X to the pdf \( g(x|y) \) to indicate that it is a pdf of the rv X. Note that y plays the role of a parameter that is actually a random realization of the rv Y having a pdf \( h_{Y} (y). \)

To sample a realization of X from \( f_{X} (x) \) we proceed as follows:

  • Sample a realization of Y from the univariate \( h_{Y} (y) ;\)

  • Sample a realization of X from the univariate \( g_{X} (x|Y) \) (note that at this point Y has a known numerical value).

For example, let

$$ f_{X} (x) = n\int\limits_{1}^{\infty } {y^{ - n} e^{ - xy} {\text{d}}y} \,\left( {n > 1,\,0 \le x < \infty } \right) $$
(3.66)
$$ = ne^{ - x} \sum\limits_{k = 1}^{n - 1} {\frac{{( - x)^{k - 1} }}{(n - 1)(n - 2) \cdots (n - k)} + n\frac{{( - x)^{n - 1} }}{(n - 1)!}Ei(x)} $$
(3.67)

where

$$ Ei(x) = \int_{x}^{\infty } {\frac{{e^{ - y} }}{y}dy} $$
(3.68)

is the exponential integral function. Sampling from the explicit expression of the integral is too complicated, so that we prefer to resort to the composition method. Let us choose

$$ q(y) = ny^{ - n} ,\,p(x,y) = e^{ - xy} $$
(3.69)

so that

$$ \int\limits_{0}^{\infty } {p(x,y){\text{d}}x} = \frac{1}{y} $$
(3.70)

and thus

$$ h_{Y} (y) = ny^{ - (n + 1)} ,\,g_{X} (x|y) = ye^{ - xy} $$
(3.71)

The operative sequence for sampling a realization of X from \( f_{X} (x) \) is thus

  • We sample \( R_{1} ,R_{2} \sim U[0,1) ;\)

  • By using \( R_{1} ,\) we sample a value of Y from \( h_{Y} (y) \) with the inverse transform method

    $$ R_{1} = \int\limits_{1}^{Y} {h_{Y} (y){\text{d}}y} = 1 - Y^{ - n} $$
    (3.72)

    We have

    $$ Y = (1 - R_{1} )^{{ - \frac{1}{n}}} $$
    (3.73)
  • By substituting the value of Y in \( g_{X} (x|y) \) we have

$$ g_{X} (x|Y) = Ye^{ - Yx} $$
(3.74)

Hence, \( g_{X} (x|Y) \) is an exponential distribution with parameter Y. By using \( R_{2} \) we finally sample the desired realization X from \( g_{X} (x|Y) \)

$$ X = - \frac{1}{Y}\ln (1 - R_{2} ) = - (1 - R_{1} )^{\frac{1}{n}} \ln (1 - R_{2} ) $$
(3.75)

For example, for n = 3 the rigorous expression for \( f_{X} (x) \) is

$$ f_{X} (x) = \frac{3}{2}[(1 - x)e^{ - x} + x^{2} Ei(x)] $$
(3.76)

In Fig. 3.4, we show the analytical form of \( f_{X} (x) \) (full line) and the result of the MCS (indicated with the circles) with \( 10^{5} \) random values sampled by the previously illustrated procedures. The values are calculated with the following Matlab® program:

  • dx=.001;

  • x=dx:dx:10;

  • y=1.5*((1-x).*exp(-x)+x.^2.*expint(x)); % analytical f(x), Eq. (3.76)

  • l=1e5;

  • ymc=-rand(l,1).^0.33333.*log(rand(l,1)); % sampled values, Eq. (3.75) with n=3

  • dey=(max(ymc)-min(ymc))/50;

  • [h,xx]=hist(ymc,50);

  • hn=h/(l*dey); % normalized histogram

  • plot(x,y);

  • hold;

  • plot(xx,hn,'o')

  • axis([0 5 0 1.6])

Fig. 3.4 Example of sampling from a continuous distribution by the composition method. Analytical = solid line; MCS = circles

3.3.3.2 Discrete Case

Let X be a rv having pdf of the kind

$$ f_{X} (x) = \sum\limits_{k = 0}^{\infty } {q_{k} p(x,y_{k} )} $$
(3.77)

where \( q_{k} \ge 0,\,p(x,y_{k} ) \ge 0\quad \forall k,x,y_{k} \) and where

$$ \int {f_{X} (x){\text{d}}x} = \sum\limits_{k = 0}^{\infty } {q_{k} \int {p(x,y_{k} ){\text{d}}x} } = 1 $$
(3.78)

The pdf \( f_{X} (x) \) is really a mixture of pdfs. Indeed, each term of the sum can be written as follows

$$ q_{k} p(x,y_{k} ) = q_{k} \int {p(x^{\prime},y_{k} ){\text{d}}x^{\prime}} \frac{{p(x,y_{k} )}}{{\int {p(x^{\prime},y_{k} ){\text{d}}x^{\prime}} }} = h_{k} g(x|y_{k} ) $$
(3.79)

where

$$ h_{k} = q_{k} \int {p(x^{\prime},y_{k} ){\text{d}}x^{\prime}}; \quad g(x|y_{k} ) = \frac{{p(x,y_{k} )}}{{\int {p(x^{\prime},y_{k} ){\text{d}}x^{\prime}} }} $$
(3.80)

We show that in fact \( h_{k} \) is a probability, and that \( g(x|y_{k} ) \) is a pdf in x. Because \( p(x,y_{k} ) \ge 0 \) and \( q_{k} \ge 0, \) it follows that \( h_{k} \ge 0 \) and \( g(x|y_{k} ) \ge 0. \) The normalization of \( h_{k} \) follows immediately from that of \( f_{X} (x) \)

$$ \sum\limits_{k} {h_{k} } = \sum\limits_{k} {q_{k} \int {p(x^{\prime},y_{k} ){\text{d}}x^{\prime}} } = \int {f_{X} (x^{\prime}){\text{d}}x^{\prime}} = 1 $$
(3.81)

The normalization of \( g(x|y_{k} ) \) is evident. Finally, the pdf \( f_{X} (x) \) can be written as

$$ f_{X} (x) = \sum\limits_{k = 0}^{\infty } {h_{k} g_{X} (x|y_{k} )} $$
(3.82)

where \( g_{X} (x|y_{k} ) \) is a pdf depending on the parameter \( y_{k} ,\) which is a discrete rv having probability \( h_{k} . \)

To sample a value of X from \( f_{X} (x) \) we proceed as follows:

  • Sample a value \( Y_{k} \) from the distribution \( h_{k} \) (k = 0,1,…);

  • Sample a value of X from \( g_{X} (x|Y_{k} ). \)

For example let

$$ f_{X} (x) = \frac{5}{12}[1 + (x - 1)^{4} ]\quad 0 \le x \le 2 $$
(3.83)

i.e.,

$$ q_{1} = q_{2} = \frac{5}{12};\quad p(x,y_{1} ) = 1;\quad p(x,y_{2} ) = (x - 1)^{4} $$
(3.84)

We have

$$ h_{1} = q_{1} \int\limits_{0}^{2} {p(x,y_{1} ){\text{d}}x} = \frac{5}{12} \times 2 = \frac{5}{6} $$
(3.85)
$$ h_{2} = q_{2} \int\limits_{0}^{2} {p(x,y_{2} ){\text{d}}x} = \frac{5}{12}\frac{2}{5} = \frac{1}{6} $$
(3.86)
$$ g_{X} (x|y_{1} ) = \frac{1}{2} $$
(3.87)
$$ g_{X} (x|y_{2} ) = \frac{{(x - 1)^{4} }}{\frac{2}{5}} = \frac{5}{2}(x - 1)^{4} $$
(3.88)

Operatively, to sample a value of X from \( f_{X} (x) \)

  • Sample \( R_{1} ,R_{2} \sim U[0,1) ;\)

  • If \( R_{1} \le h_{1} = \frac{5}{6} ,\) sample a value of X from \( g_{X} (x|y_{1} ) ,\) i.e.,

    $$ R_{2} = \int\limits_{0}^{X} {g_{X} (x|y_{1} ){\text{d}}x} = \frac{1}{2}X $$
    (3.89)

    and thus

    $$ X = 2R_{2} $$
    (3.90)
  • If \( R_{1} \ge h_{1} ,\) we extract a value of X from \( g_{X} (x|y_{2} ) ,\) i.e.,

$$ R_{2} = \frac{5}{2}\int\limits_{0}^{X} {(x - 1)^{4} {\text{d}}x} = \frac{1}{2}[(X - 1)^{5} + 1] $$
(3.91)

and thus

$$ X = 1 + (2R_{2} - 1)^{{{1 \mathord{\left/ {\vphantom {1 5}} \right. \kern-\nulldelimiterspace} 5}}} $$
(3.92)

In Fig. 3.5, we show the analytical form of \( f_{X} (x) \) (full line) and the result of the MCS (indicated with the circles) with \( 10^{5} \) samplings. The values were calculated with the following Matlab® program:

  • dx=0.001;

  • x=0:dx:2;

  • y=0.41667*(1+(x-1).^4); % analytical f(x), Eq. (3.83)

  • n=1e5;

  • c1=5/6; % h_1, Eq. (3.85)

  • c2=1/5;

  • r1=rand(n,1);

  • r2=rand(n,1);

  • X=zeros(n,1);

  • ip=find(r1<c1);

  • ig=find(r1>=c1);

  • X(ip)=2*r2(ip); % Eq. (3.90)

  • val=2*r2(ig)-1;

  • X(ig)=1+sign(val).*abs(val).^c2; % real fifth root, Eq. (3.92)

  • deX=(max(X)-min(X))/50;

  • [h,xx]=hist(X,50);

  • hn=h/(n*deX);

  • plot(x,y);

  • hold;

  • plot(xx,hn,'o')

Fig. 3.5 Example of sampling from a discrete distribution, by the composition method. Analytical = solid line; MCS = circles

3.3.4 Sampling by the Rejection Method

Let \( f_{X} \left( x \right) \) be an analytically assigned pdf, in general quite complicated. The sampling of a realization of a rv \( X \) from its pdf with the rejection method consists in the tentative sampling of a realization of a rv \( X^{\prime} \) from a simpler density function, and then subjecting this value to a test that depends on the sampling of another rv. Then, \( X = X^{\prime} \) only if the test is passed; else, the value of \( X^{\prime} \) is rejected and the procedure is restarted. The main disadvantage of this method can be the low acceptance efficiency, if many realizations of \( X^{\prime} \) are rejected before one is accepted as X. In the following, when the sampling of a realization of \( Z \) from a pdf \( g_{Z} \left( z \right) ,\) \( z \in \left( {z_{1} ,z_{2} } \right) ,\) can be easily done, for example by using one of the methods given in the previous paragraphs, we will simply say that we sample a \( Z \sim G\left( {z_{1} ,z_{2} } \right). \)

3.3.4.1 The von Neumann Algorithm

In its simplest version, the method of sampling by rejection can be summarized as follows: given a pdf \( f_{X} \left( x \right) \) bounded in \( \left( {a,b} \right) ,\) let

$$ h\left( x \right) = \frac{{f_{X} \left( x \right)}}{{\mathop {\max }\limits_{x} f_{X} \left( x \right)}} $$
(3.93)

so that \( 0 \le h\left( x \right) \le 1, \) \( \forall x \in \left( {a,b} \right). \)

The operative procedure to sample a realization of \( X \) from \( f_{X} \left( x \right) \) is the following:

  1. Sample \( X^{\prime}\sim U\left( {a,b} \right) ,\) the tentative value for \( X ,\) and calculate \( h\left( {X^{\prime}} \right) ;\)

  2. Sample \( R\sim U\left[ {0,\left. 1 \right)} \right. .\) If \( R \le h\left( {X^{\prime}} \right) \) the value \( X^{\prime} \) is accepted; else start again from 1.
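As a minimal Matlab illustration of these two steps, consider the assumed pdf \( f_{X} (x) = \tfrac{3}{8}(1 + x^{2} ) \) in (a,b) = (−1,1), whose maximum is 3/4:

  • a=-1; b=1; N=1e5; % assumed interval and number of accepted samples

  • h=@(x) (3/8)*(1+x.^2)/(3/4); % Eq. (3.93): f_X(x) divided by its maximum

  • X=zeros(N,1); i=0;

  • while i<N

  • Xp=a+(b-a)*rand; % step 1: tentative value X' ~ U(a,b)

  • if rand<=h(Xp), i=i+1; X(i)=Xp; end % step 2: acceptance test

  • end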

More generally, a given complicated \( f_{X} \left( x \right) \) can be factored into the product of a density \( g_{{X^{\prime}}} \left( x \right) ,\) from which it is simple to sample a realization of \( X^{\prime} ,\) and a residual function \( H\left( x \right) ,\) i.e.,

$$ f_{X} \left( x \right) = g_{{X^{\prime}}} \left( x \right)H\left( x \right) $$
(3.94)

Note that \( H\left( x \right) \) is not negative, being the ratio of two densities. We set

$$ B_{H} = \mathop {\max }\limits_{x} H\left( x \right) = \mathop {\max }\limits_{x} \frac{{f_{X} \left( x \right)}}{{g_{{X^{\prime}}} \left( x \right)}} $$
(3.95)

and have

$$ f_{X} \left( x \right) = \frac{{g_{{X^{\prime}}} \left( x \right)H\left( x \right)}}{{B_{H} }}B_{H} = g_{{X^{\prime}}} \left( x \right)h\left( x \right)B_{H} $$
(3.96)

where

$$ h\left( x \right) = \frac{H\left( x \right)}{{B_{H} }}\quad {\text{so}}\,{\text{that}} \quad 0 \le h\left( x \right) \le 1 $$
(3.97)

Dividing by the integral of \( f_{X} \left( x \right) \) over the entire domain, by hypothesis equal to one, we have

$$ f_{X} \left( x \right) = \frac{{g_{{X^{\prime}}} \left( x \right)h\left( x \right)}}{{\int_{ - \infty }^{\infty } {g_{{X^{\prime}}} \left( z \right)h\left( z \right){\text{d}}z} }} $$
(3.98)

Integrating Eq. (3.96) over the domain of \( x \) we have

$$ \int\limits_{ - \infty }^{\infty } {g_{{X^{\prime}}} \left( z \right)h\left( z \right)} {\text{d}}z = \tfrac{1}{{B_{H} }} $$
(3.99)

From Eqs. (3.97) and (3.98), we also have

$$ \int\limits_{ - \infty }^{\infty } {g_{{X^{\prime}}} \left( z \right)h\left( z \right){\text{d}}z \le } \int\limits_{ - \infty }^{\infty } {g_{{X^{\prime}}} } \left( z \right){\text{d}}z = 1 $$
(3.100)

so that \( B_{H} \ge 1 .\) The sampling of a random realization of \( X \) from \( f_{X} \left( x \right) \) can be done in two steps:

  1. Sample a realization of \( X^{\prime} \) from the pdf \( g_{{X^{\prime}}} \left( x \right) ,\) which is simple by construction

    $$ \text{P} \left( {X^{\prime} \le x} \right) = G_{{X^{\prime}}} \left( x \right) = \int\limits_{ - \infty }^{x} {g_{{X^{\prime}}} \left( z \right)} {\text{d}}z $$
    (3.101)

    and then compute the number \( h\left( {X^{\prime}} \right) ;\)

  2. Sample \( R\sim U\left[ {0,\left. 1 \right)} \right. .\) If \( R \le h\left( {X^{\prime}} \right) ,\) the sampled realization of \( X^{\prime} \) is accepted, i.e., \( X = X^{\prime} ;\) else the value of \( X^{\prime} \) is rejected and we start again from 1. The acceptance probability of the sampled value \( X^{\prime} \) is thus

$$ \text{P} \left( {R \le h\left( {X^{\prime}} \right)} \right) = h\left( {X^{\prime}} \right) $$
(3.102)

We show that the accepted value is actually a realization of \( X \) sampled from \( f_{X} \left( x \right) .\) The probability of sampling a random value \( X^{\prime} \) between \( z \) and \( z + dz \) and accepting it, is given by the product of the probabilities

$$ \text{P} \left( {z \le X^{\prime} < z + {\text{ d}}z} \right)\text{P} \left( {R \le h\left( z \right)} \right) = g_{{X^{\prime}}} \left( z \right){\text{d}}zh\left( z \right) $$
(3.103)

The corresponding probability of sampling a random value \( X^{\prime} \le x \) and accepting it is

$$ \text{P} \left( {X^{\prime} \le x{\text{ AND }}R \le h\left( {X^{\prime}} \right)} \right) = \int\limits_{ - \infty }^{x} {g_{{X^{\prime}}} \left( z \right)} h\left( z \right){\text{d}}z $$
(3.104)

The probability that a sampled \( X^{\prime} \) is accepted, i.e., the probability of success is given by the above expression for \( x \to \infty \)

$$ \text{P} ({\text{success}}) = \text{P} \left( {X^{\prime} \le x{\text{ AND }}R \le h\left( {X^{\prime}} \right)} \right) = \int\limits_{ - \infty }^{\infty } {g_{{X^{\prime}}} } \left( z \right)h\left( z \right){\text{d}}z $$
(3.105)

The distribution of the accepted values (the others are rejected) is then

$$ \begin{array}{*{20}l} {\text{P} \left( {X^{\prime} \le x\left| {\text{success}} \right.} \right) = \frac{{\text{P} \left( {X^{\prime} \le x{\text{ AND }}R \le h\left( {X^{\prime}} \right)} \right)}}{{\text{P} ({\text{success}})}}} \\ { = \frac{{\int_{ - \infty }^{x} {g_{{X^{\prime}}} \left( z \right)h\left( z \right){\text{d}}z} }}{{\int_{ - \infty }^{\infty } {g_{{X^{\prime}}} \left( z \right)h\left( z \right){\text{d}}z} }}} \\ \end{array} $$
(3.106)

and the corresponding pdf is Eq. (3.98), i.e., the given \( f_{X} \left( x \right) .\)

The simple version of the rejection method by von Neumann is the case

$$ g_{{X^{\prime}}} \left( x \right) = \frac{1}{b - a} $$
(3.107)

The efficiency \( \varepsilon \) of the method is given by the probability of success, i.e., from Eq. (3.99)

$$ \varepsilon = \text{P} ({\text{success}}) = \int\limits_{ - \infty }^{\infty } {g_{{X^{\prime}}} } (z)h(z){\text{d}}z = \frac{1}{{B_{H} }} $$
(3.108)

Let us now calculate the average number of trials that we must make before obtaining one success. The probability of having the first success at the kth trial is given by the geometric distribution

$$ P_{k} = \left( {1 - \varepsilon } \right)^{k - 1} \varepsilon ,\,k = 1,2, \ldots $$
(3.109)

and the average number of trials to have the first success is

$$ \begin{array}{lllll} E(k) & = \sum\limits_{{k = 1}}^{\infty } {kP_{k} } = \varepsilon \sum\limits_{{k = 1}}^{\infty } {k\left( {1 - \varepsilon } \right)^{{k - 1}} } = - \varepsilon \frac{{\rm{d}}}{{{\rm{d}}\varepsilon }}\sum\limits_{{k = 1}}^{\infty } {\left( {1 - \varepsilon } \right)^{k} } \\ & = - \varepsilon \frac{{\rm{d}}}{{{\rm{d}}\varepsilon }}\frac{1}{{1 - \left( {1 - \varepsilon } \right)}} = - \varepsilon \frac{{\rm{d}}}{{{\rm{d}}\varepsilon }}\frac{1}{\varepsilon } = \frac{1}{\varepsilon } = B_{H} \\ \end{array} $$
(3.110)

For example, let the pdf

$$ f_{X} \left( x \right) = \frac{2}{{\pi \left( {1 + x} \right)\sqrt x }},\,0 \le x \le 1 $$
(3.111)

For \( x = 0 ,\) \( f_{X} \left( x \right) \) diverges and thus we cannot use the simple rejection technique. Note that the factor causing the divergence of \( f_{X} \left( x \right) ,\) i.e.,\( 1/\sqrt x ,\) is proportional to the pdf of the rv \( R^{2} ,\) with \( R\sim U\left[ {0,\left. 1 \right)} \right. .\) By the change of variables

$$ X^{\prime} = R^{2} $$
(3.112)

the cdf of the rv \( X^{\prime} \) is

$$ G_{{X^{\prime}}} \left( x \right) = \text{P} \left( {X^{\prime} \le x} \right) = \text{P} \left( {R^{2} \le x} \right) = \text{P} \left( {R \le \sqrt x } \right) = \sqrt x $$
(3.113)

and the corresponding pdf is

$$ g_{{X^{\prime}}} (x) = \frac{{{\text{d}}G_{{X^{\prime}}} \left( x \right)}}{{{\text{d}}x}} = \frac{1}{2\sqrt x } $$
(3.114)

Hence, \( f_{X} \left( x \right) \) can be written as

$$ f_{X} \left( x \right) = \frac{1}{2\sqrt x }\frac{4}{\pi }\frac{1}{1 + x} = g_{{X^{\prime}}} \left( x \right)H\left( x \right) $$
(3.115)

where

$$ H(x) = \frac{4}{\pi }\frac{1}{1 + x} $$
(3.116)

We have

$$ B_{H} = \mathop {\max }\limits_{x} H(x) = \frac{4}{\pi } $$
(3.117)

and thus

$$ h\left( x \right) = \frac{H(x)}{{B_{H} }} = \frac{1}{1 + x},\,0 \le x \le 1 $$
(3.118)

The operative procedure to sample a realization of the rv \( X \) from \( f_{X} \left( x \right) \) is then:

  1. Sample \( R_{1} \sim U\left[ {0,1} \right) \) and then calculate:

    $$ X^{\prime} = R_{1}^{2} \,{\text{and}}\,h\left( {X^{\prime}} \right) = \frac{1}{{1 + R_{1}^{2} }} $$
    (3.119)

  2. Sample \( R_{2} \sim U\left[ {0,1} \right) .\) If \( R_{2} \le h\left( {X^{\prime}} \right) \) accept the value of \( X^{\prime} ,\) i.e., \( X = X^{\prime} ;\) else start again from 1.

The efficiency of the method, i.e., the probability that an extracted value of \( X^{\prime} \) is accepted is

$$ \varepsilon = \frac{1}{{B_{H} }} = \frac{\pi }{4} = 78.5\;\% $$
(3.120)

In Fig. 3.6, we show the analytical \( f_{X} \left( x \right) \) (full line) and the result of the MCS (indicated by circles) with \( 10^{5} \) trials.

Fig. 3.6 Example of sampling by the rejection method. Analytical = solid line; MCS = circles

The values were calculated with the following Matlab® program:

  • clear;dx=0.001;x=dx:dx:1;lx=length(x);u=ones(1,lx);

  • y=(2/pi)*u./((1+x).*sqrt(x)); % analytical f(x), Eq. (3.111)

  • n=1e5;r1=rand(1,n);r2=rand(1,n);v=ones(1,n);

  • h=v./(1+r1.^2);ip=find(r2<h);X=r1(ip).^2; % acceptance test, Eq. (3.119)

  • nn=length(X);deX=(max(X)-min(X))/50;

  • [h,xx]=hist(X,50);hn=h/(nn*deX);

  • disp(['Efficiency = ' num2str(nn/n)]),pause(10);

  • plot(x,y);hold;plot(xx,hn,'o');

  • xlabel('x values');title('f(x): - analytical; oo Monte Carlo'); hold

In this case, the acceptance efficiency indeed turned out to be 78.5 %, in agreement with Eq. (3.120).

3.4 The Solution of Definite Integrals by Monte Carlo Simulation

3.4.1 Analog Simulation

Let us consider the problem of obtaining an estimate of the n-dimensional definite integral

$$ G = \int\limits_{D} {g(\underline{x} )f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } $$
(3.121)

where \( \underline{x} \) is an n-dimensional variable and the integration extends over the domain \( D \subseteq {\mathbb{R}}^{n} .\) We can always make the hypothesis that \( f_{{\underline{X} }} (\underline{x} ) \) has the characteristics of a pdf, i.e.,

$$ \begin{array}{*{20}c} {f_{{\underline{X} }} (\underline{x} ) > 0} & {\forall \underline{x} \in D,} & {\int\limits_{D} {f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } = 1} \\ \end{array} $$
(3.122)

If a factor \( f_{{\underline{X} }} (\underline{x} ) \) having the above characteristics cannot be identified in the function to be integrated, it is always possible to set \( f_{{\underline{X} }} (\underline{x} ) \) equal to a constant value to be determined from the normalization condition. From a statistical perspective, it is therefore possible to consider x as a random realization of a rv having pdf \( f_{{\underline{X} }} (\underline{x} ) .\) It then follows that \( g(\underline{x} ) \) is also a rv and G can be interpreted as the expected value of \( g(\underline{x} ) ,\) i.e.,

$$ E[g(\underline{x} )] = \int\limits_{D} {g(\underline{x} )f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } = G $$
(3.123)

The variance of \( g(\underline{x} )\) is then

$$ Var[g(\underline{x} )] = \int\limits_{D} {\left[ {g(\underline{x} ) - G} \right]^{2} f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } = E\left[ {g^{2} (\underline{x} )} \right] - G^{2} $$
(3.124)

The MCS estimation of G can be approached with a method known as mean value estimation or the ‘dart game’.

Let us consider a dart game in \( {\mathbb{R}}^{n} \) in which the probability of hitting a point \( \underline{x} \in d\underline{x} \) is \( f_{{\underline{X} }} (\underline{x} )d\underline{x} ;\) we make the hypothesis that the dart throws are independent of each other and also that \( f_{{\underline{X} }} (\underline{x} )\) does not change as we proceed with the game. When a player hits point x, he is given a prize \( g(\underline{x} ) .\) In a series of N throws in which the points \( \underline{x}_{1} ,\underline{x}_{2} , \ldots ,\underline{x}_{N}\) are hit, the assigned prizes are \( g(\underline{x}_{1} ),g(\underline{x}_{2} ), \ldots ,g(\underline{x}_{N} ) .\) The average prize per throw is, then

$$ G_{N} = \frac{1}{N}\sum\limits_{i = 1}^{N} {g(\underline{x}_{i} )}$$
(3.125)

Because the \( g(\underline{x}_{i} ) \)‘s are rvs, \( G_{N} \) is also a rv, having expected value and variance equal to

$$ \begin{array}{*{20}l} {E\left[ {G_{N} } \right] = \frac{1}{N}\sum\limits_{i = 1}^{N} {E\left[ {g(\underline{x}_{i} )} \right]} } & {Var\left[ {G_{N} } \right] = \frac{1}{{N^{2} }}\sum\limits_{i = 1}^{N} {Var\left[ {g(\underline{x}_{i} )} \right]} } \\ \end{array} $$
(3.126)

In Eq. (3.126), \( E\left[ {g(\underline{x}_{i} )} \right] \) and \( Var\left[ {g(\underline{x}_{i} )} \right]\) are the expected value and the variance of \( g(\underline{x} ) \) computed at the point hit by the player on the ith throw. Each of these expected values is taken over an ensemble of \( M \to \infty \) players who hit points \( \underline{x}_{{i_{1} }} ,\underline{x}_{{i_{2} }} , \ldots ,\underline{x}_{{i_{M} }} \) at their ith throw. Because the probability distribution of these values does not depend on considering the particular ith throw, i.e., \( f_{{\underline{X} }} (\underline{x} ) \) is independent of i, the process is stationary and

$$ E\left[ {g(\underline{x}_{i} )} \right] = \mathop {\lim }\limits_{M \to \infty } \frac{1}{M}\sum\limits_{j = 1}^{M} {g(\underline{x}_{{i_{j} }} )} = E\left[ {g(\underline{x} )} \right] = G$$
(3.127)

Similarly

$$ Var\left[ {g(\underline{x}_{i} )} \right] = Var\left[ {g(\underline{x} )} \right] = E\left[ {g^{2} (\underline{x} )} \right] - G^{2} $$
(3.128)

We thus obtain

$$ E\left[ {G_{N} } \right] = E\left[ {g(\underline{x} )} \right] = G $$
(3.129)
$$ Var\left[ {G_{N} } \right] = \frac{1}{N}Var\left[ {g(\underline{x} )} \right] = \frac{1}{N}\left[ {E\left[ {g^{2} (\underline{x} )} \right] - G^{2} } \right] $$
(3.130)

In practical cases, \( E\left[ {g^{2} (\underline{x} )} \right] \) and G are unknown (G is indeed the target of the present evaluation) and in their place we can use the estimates with N ≫ 1. That is, we suppose that the process, in addition to being stationary, is also ergodic, and thus

$$ \begin{array}{*{20}l} {E\left[ {g(\underline{x} )} \right] \approx \frac{1}{N}\sum\limits_{i = 1}^{N} {g(\underline{x}_{i} )} = \overline{g} } & {E\left[ {g^{2} (\underline{x} )} \right] \approx \frac{1}{N}\sum\limits_{i = 1}^{N} {g^{2} (\underline{x}_{i} )} = \overline{{g^{2} }} } \\ \end{array} $$
(3.131)

Thus for N ≫ 1, it follows that \( G \approx G_{N} \) and

$$ \begin{array}{*{20}l} {E\left[ {G_{N} } \right] \approx G_{N} = \overline{g} } & {Var\left[ {G_{N} } \right] \approx s_{{G_{N} }}^{2} = \frac{1}{N}\left( {\overline{{g^{2} }} - \overline{g}^{2} } \right)} \\ \end{array} $$
(3.132)

In the last formula it is common to substitute N − 1 in place of N in the denominator, to account for the degree of freedom that is lost in the calculation of \( \overline{g} ;\) generally, because N ≫ 1, this correction is negligible.
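
As a simple illustration of the above estimators (the integrand, the sample size and the variable names are chosen here only for the purpose of the example), the following minimal Matlab® sketch estimates \( G = \int_{0}^{1} {e^{ - x} {\text{d}}x} \) with \( f_{X} (x) = 1 \) on [0,1), whose exact value is \( 1 - e^{ - 1} \approx 0.6321 :\)

  • % Minimal sketch: analog ('dart game') estimate of G and of its variance, Eq. (3.132)

  • N=1e5; x=rand(N,1); g=exp(-x);   % N throws from f_X=U(0,1) and the prizes g(x_i)

  • GN=mean(g);                      % sample mean

  • s2GN=var(g)/N;                   % var() already applies the N-1 correction

  • disp([GN sqrt(s2GN)])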

3.4.2 Forced (Biased) Simulation

The evaluation of G by the analog method just illustrated yields poor results whenever \( g(\underline{x} ) \) and \( f_{{\underline{X} }} (\underline{x} ) \) are such that where one is large the other is small: indeed, in this case most of the sampled \( \underline{x}_{i} \) values result in small \( g(\underline{x}_{i} ) \) values, which contribute little to \( G_{N} ,\) and a few large \( g(\underline{x}_{i} ) \) values, which ‘destabilize’ the sample average. This situation may be circumvented in the following manner, within a sampling scheme known under the name ‘Importance Sampling’ (see also Sect. 6.2). Eq. (3.121) can be written as

$$ G = \int\limits_{D} {\frac{{g(\underline{x} )f_{{\underline{X} }} (\underline{x} )}}{{f_{1} (\underline{x} )}}f_{1} (\underline{x} ){\text{d}}\underline{x} } = \int\limits_{D} {g_{1} (\underline{x} )f_{1} (\underline{x} ){\text{d}}\underline{x} } $$
(3.133)

where \( f_{1} (\underline{x} ) \ne f_{{\underline{X} }} (\underline{x} ) \) is an appropriate function (typically called ‘Importance Sampling Distribution’, see Sect. 6.2) having the characteristics of a pdf as given in Eq. (3.122) and the prize of the dart game becomes

$$ g_{1} (\underline{x} ) = \frac{{g(\underline{x} )f_{{\underline{X} }} (\underline{x} )}}{{f_{1} (\underline{x} )}}$$
(3.134)

As we shall formally demonstrate later, the optimal choice of the pdf \( f_{1} (\underline{x} ) \) is \( f_{1} (\underline{x} ) = k\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} ) .\) Then, for every value \( \underline{x}_{i} \) sampled from \( f_{1} (\underline{x} ) \) one would always obtain the same prize \( g_{1} (\underline{x}_{i} ) = 1/k \) and the variance of the resulting estimator of G would actually be zero: this means that just one sample would suffice to obtain the exact value of G. However, we shall show that the determination of the constant k poses exactly the same difficulty as computing G.

In view of (3.133), from a statistical point of view x can be interpreted as a rv distributed according to the pdf \( f_{1} (\underline{x} ) .\) As before, it follows that the prize \( g_{1} (\underline{x} ) \) is also a rv and G can be interpreted as the expected value of \( g_{1} (\underline{x} ) .\) If \( E_{1} \) and \( Var_{1} \) denote the expected value and variance with respect to the pdf \( f_{1} (\underline{x} ) ,\) we get

$$ E_{1} \left[ {g_{1} (\underline{x} )} \right] = \int\limits_{D} {g_{1} (\underline{x} )f_{1} (\underline{x} ){\text{d}}\underline{x} } = G$$
(3.135)
$$ Var_{1} \left[ {g_{1} (\underline{x} )} \right] = \int\limits_{D} {\left[ {g_{1} (\underline{x} ) - G} \right]^{2} f_{1} (\underline{x} ){\text{d}}\underline{x} } = E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] - G^{2} $$
(3.136)

As before, G can be estimated with the dart game method by sampling N values \( \underline{x}_{1} ,\underline{x}_{2} , \ldots ,\underline{x}_{N} \) from the pdf \( f_{1} (\underline{x} ) ,\) calculating the corresponding values of the prize \( g_{1} (\underline{x}_{i} ) ,\) and computing the sample mean by arithmetic averaging. The rv is thus

$$ G_{1N} = \frac{1}{N}\sum\limits_{i = 1}^{N} {g_{1} (\underline{x}_{i} )} $$
(3.137)

and

$$ \begin{array}{*{20}l} {E_{1} \left[ {G_{1N} } \right] = \frac{1}{N}\sum\limits_{i = 1}^{N} {E_{1} \left[ {g_{1} (\underline{x}_{i} )} \right]} } \\ {Var_{1} \left[ {G_{1N} } \right] = \frac{1}{{N^{2} }}\sum\limits_{i = 1}^{N} {Var_{1} \left[ {g_{1} (\underline{x}_{i} )} \right]} } \\ \end{array} $$
(3.138)

Similar to the analog case, we obtain

$$ \begin{array}{*{20}l} {E_{1} \left[ {g_{1} (\underline{x}_{i} )} \right] = E_{1} \left[ {g_{1} (\underline{x} )} \right] = G} \\ {Var_{1} \left[ {g_{1} (\underline{x}_{i} )} \right] = Var_{1} \left[ {g_{1} (\underline{x} )} \right] = E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] - G^{2} } \\ \end{array}$$
(3.139)

and

$$ E_{1} \left[ {G_{1N} } \right] = E_{1} \left[ {g_{1} (\underline{x} )} \right] = G $$
(3.140)
$$ Var_{1} \left[ {G_{1N} } \right] = \frac{1}{N}Var_{1} \left[ {g_{1} (\underline{x} )} \right] = \frac{1}{N}\left[ {E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] - G^{2} } \right]$$
(3.141)

The estimates of the expected values from the corresponding averages are

$$ \begin{array}{*{20}l} {E_{1} \left[ {g_{1} (\underline{x} )} \right] \simeq \frac{1}{N}\sum\limits_{i = 1}^{N} {g_{1} (\underline{x}_{i} ) = \overline{{g_{1} }} } } \\ {E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] \simeq \frac{1}{N}\sum\limits_{i = 1}^{N} {g_{1}^{2} (\underline{x}_{i} ) = \overline{{g_{1}^{2} }} } } \\ \end{array} $$
(3.142)

and finally, for N ≫ 1

$$ \begin{array}{*{20}l} {G_{1N} = \overline{{g_{1} }} \simeq G} \\ {Var_{1} \left[ {G_{1N} } \right] = \frac{1}{N}Var_{1} \left[ {g_{1} (\underline{x} )} \right] = \frac{1}{N}\left[ {E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] - G^{2} } \right] \simeq \frac{1}{N}\left( {\overline{{g_{1}^{2} }} - \overline{{g_{1} }}^{2} } \right)} \\ \end{array} $$
(3.143)

The variance \( Var_{1} \left[ {G_{1N} } \right] \) of the estimated value \( G_{1N} \) depends on the choice of \( f_{1} (\underline{x} ) .\) Minimizing it amounts to finding the pdf \( f_{1} (\underline{x} ) \) which minimizes \( E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] ,\) subject to the normalization condition required for \( f_{1} (\underline{x} ) \) to be a pdf. By using the method of Lagrange multipliers, the optimal \( f_{1} (\underline{x} ) \) is found by rendering stationary the functional

$$ \begin{aligned} \ell \left\{ {f_{1} } \right\} & = \int\limits_{D} {g_{1}^{2} (\underline{x} )f_{1} (\underline{x} ){\text{d}}\underline{x} } + \frac{1}{{\lambda^{2} }}\left[ {\int\limits_{D} {f_{1} (\underline{x} ){\text{d}}\underline{x} } - 1} \right] \\ &= {\int\limits_{D} {\left[ {\frac{{g^{2} (\underline{x} )f_{{\underline{X} }}^{2} (\underline{x} )}}{{f_{1} (\underline{x} )}} + \frac{1}{{\lambda^{2} }}f_{1} (\underline{x} )} \right]{\text{d}}\underline{x} - \frac{1}{{\lambda^{2} }}} } \\ \end{aligned} $$
(3.144)

The pdf \( f_{1} (\underline{x} ) \) is then the solution to \( \frac{\partial \ell }{{\partial f_{1} }} = 0 .\) The condition is satisfied if

$$ - \frac{{g^{2} (\underline{x} )f_{{\underline{X} }}^{2} (\underline{x} )}}{{f_{1}^{2} (\underline{x} )}} + \frac{1}{{\lambda^{2} }} = 0 $$
(3.145)

from which we obtain

$$ f_{1} (\underline{x} ) = \left| \lambda \right|\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} ) $$
(3.146)

The constant \( \left| \lambda \right| \) can be determined from the normalization condition on \( f_{1} (\underline{x} ) ,\) so that finally the best \( f_{1} (\underline{x} ) \) is

$$ f_{1} (\underline{x} ) = \frac{{\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} )}}{{\int\limits_{D} {\left| {g(\underline{x}^{\prime})} \right|f_{{\underline{X} }} (\underline{x}^{\prime}){\text{d}}\underline{x}^{\prime}} }} $$
(3.147)

In correspondence to this optimal pdf, we have the value

$$ \begin{aligned} \mathop {\min }\limits_{{f_{1} }} E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] & = \int\limits_{D} {\frac{{g^{2} (\underline{x} )f_{{\underline{X} }}^{2} (\underline{x} )}}{{f_{1} (\underline{x} )}}{\text{d}}\underline{x} } \\ & = \int\limits_{D} {\frac{{g^{2} (\underline{x} )f_{{\underline{X} }}^{2} (\underline{x} )}}{{\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} )}}{\text{d}}\underline{x} } \int\limits_{D} {\left| {g(\underline{x}^{\prime})} \right|f_{{\underline{X} }} (\underline{x}^{\prime}){\text{d}}\underline{x}^{\prime}} \\ \end{aligned} $$
(3.148)

Since, independently of the sign of \( g(\underline{x} ) ,\) \( g^{2} (\underline{x} )/\left| {g(\underline{x} )} \right| = \left| {g(\underline{x} )} \right| ,\) we obtain

$$ \mathop {\min }\limits_{{f_{1} }} E_{1} \left[ {g_{1}^{2} (\underline{x} )} \right] = \left[ {\int\limits_{D} {\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } } \right]^{2} $$
(3.149)

and correspondingly, from Eq. (3.141)

$$ \mathop {\min }\limits_{{f_{1} }} Var_{1} \left[ {G_{1N} } \right] = \frac{1}{N}\left\{ {\left[ {\int\limits_{D} {\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } } \right]^{2} - G^{2} } \right\} $$
(3.150)

In particular, if \( g(\underline{x} ) \ge 0 ,\) then \( \int_{D} {\left| {g(\underline{x} )} \right|f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } = G \) and the variance of \( G_{1N} \) is equal to zero [18].

Figure 3.7 shows an example in which it is advantageous to use forced simulation: in fact, compared to what happens when using the natural pdf \( f_{{\underline{X} }} (\underline{x} ) ,\) the maximum of the optimal pdf \( f_{1} (\underline{x} ) = f_{{\underline{X} }}^{*} (\underline{x} ) \) is shifted toward the maximum of \( g(\underline{x} ) ,\) and the values sampled from that optimal pdf more frequently correspond to high values of the prize \( g(\underline{x} ) .\)

Fig. 3.7
figure 7

An example in which it would be appropriate to resort to forced simulation

The described procedure for performing the optimal choice of \( f_{1} (\underline{x} ) ,\) which would lead us to Eq. (3.147), is not operative because to calculate \( f_{1} (\underline{x} ) \) one must know how to calculate the denominator of Eq. (3.147), and the difficulty of this operation is equivalent to the difficulty of calculating G.

This apparently surprising result could have been foreseen by examining Eq. (3.147) for \( f_{1} (\underline{x} ) .\) Following the dart game technique, to calculate G one must sample a sequence of values \( \left\{ {\underline{x}_{1i} } \right\} \) from \( f_{1} (\underline{x} ) \) and then calculate the corresponding sequence of prizes \( \left\{ {g_{1} (\underline{x}_{1i} )} \right\} \) with Eq. (3.134). Because by hypothesis \( g(\underline{x} ) \ge 0 ,\) for each \( \underline{x}_{1i} \) we have

$$ g_{1} (\underline{x}_{1i} ) = \frac{{g(\underline{x}_{1i} )f_{{\underline{X} }} (\underline{x}_{1i} )}}{{\frac{{g(\underline{x}_{1i} )f_{{\underline{X} }} (\underline{x}_{1i} )}}{{\int\limits_{D} {g(\underline{x} )f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } }}}} = \int\limits_{D} {g(\underline{x} )f_{{\underline{X} }} (\underline{x} ){\text{d}}\underline{x} } = G$$
(3.151)

Then, it turns out that all the prizes \( g_{1} (\underline{x}_{1i} ) \) are equal to each other and to G, so that the variance of the sequence \( \left\{ {g_{1} (\underline{x}_{1i} )} \right\} \) is zero.

Operatively, one does not know how to calculate the denominator of Eq. (3.147), which is the value G of the integral (3.121) whose solution we are seeking. In practice, one then chooses an \( f_{1}^{*} (\underline{x} ) \) ‘close’ to the \( f_{1} (\underline{x} ) \) given by Eq. (3.147), and this allows estimating G with a considerably smaller variance than that which would be obtained by using the natural \( f_{{\underline{X} }} (\underline{x} ) \) directly: by sampling the values \( \left\{ {\underline{x}_{1i} } \right\} \) from a pdf \( f_{1}^{*} (\underline{x} ) \) that approximates \( f_{1} (\underline{x} ) ,\) the values of the sequence \( \left\{ {g_{1} (\underline{x}_{1i} )} \right\} \) are almost equal to G and their variance is small.

The forced pdf \( f_{1}^{*} (\underline{x} ) \) is usually assigned as a function of a vector \( \underline{\alpha } \) of parameters, which are then determined so as to minimize the variance of the estimate. We clarify this with an example.

Let us estimate

$$ G = \int\limits_{0}^{1} {\cos \frac{\pi x}{2}{\text{d}}x} $$
(3.152)

The integral can be calculated analytically and we have G = 2/π = 0.6366198. Assuming that we are unable to perform the integration, we write the integral in the form of Eq. (3.121) by setting

$$ \begin{array}{*{20}l} {g(x) = \cos \frac{\pi x}{2}} & {f_{X} (x) = 1} \\ \end{array} $$
(3.153)

Then

$$ \begin{array}{*{20}l} {E\left[ {g^{2} (x)} \right] = \int\limits_{0}^{1} {\cos^{2} \left( {\frac{\pi x}{2}} \right)} {\text{d}}x = \frac{1}{2}} \\ {Var\left[ {g(x)} \right] = \frac{1}{2} - \left( {\frac{2}{\pi }} \right)^{2} = 9.47152 \cdot 10^{ - 2} } \\ \end{array} $$
(3.154)

Let us consider the two cases of estimating G by analog and optimally forced MCS.

The analog estimate given by Eq. (3.132) can be found with the following Matlab® program (\( N = 10^{4} \) histories)

  • N=1e4; r=rand(N,1);g=cos(pi*r/2);   % N samples from U(0,1) and the corresponding prizes

  • GN=mean(g); s2GN=var(g)/N            % sample mean and variance of the estimator, Eq. (3.132)

The following values are obtained

$$ \begin{array}{*{20}l} {G_{N} = 0.6342} & {s_{{G_{N} }}^{2} = 9.6 \cdot 10^{ - 6} } & {{\text{or}}} & {G_{N} = (634.2 \pm 3.1) \cdot 10^{ - 3} } \\ \end{array} $$
(3.155)

The value \( G_{N} \) so obtained is consistent with the true value of G, from which it differs by 0.8 standard deviations.

In the case of the forced estimate of G, according to the optimal procedure we should calculate \( f_{1} (x) \) with Eq. (3.147). Because \( g(x) \ge 0 \) for \( 0 \le x \le 1 ,\) we know that in this case we would obtain \( Var_{1} \left[ {G_{1N} } \right] = 0 .\) We have \( f_{1} (x) = \frac{g(x)}{k} ,\) where \( k \) is the constant denominator of Eq. (3.147). Let us suppose that we are unable to calculate \( k :\) to find \( f_{1}^{*} (x) \) close to the optimal \( f_{1} (x) \) we approximate \( g(x) \) with the first two terms of the Taylor expansion of the cosine function

$$ f_{1}^{*} (x) \simeq \frac{{\left[ {1 - \frac{1}{2}\left( {\frac{\pi x}{2}} \right)^{2} } \right]}}{k} $$
(3.156)

The pdf \( f_{1}^{*} (x) \) is thus of the kind

$$ f_{1}^{*} (x) = a - bx^{2} $$
(3.157)

From the normalization condition, we have \( a - \frac{b}{3} = 1 \) and thus

$$ f_{1}^{*} (x) = a - 3(a - 1)x^{2} $$
(3.158)

From the nonnegativity condition it follows that

$$ {\text{for}}\;0 \le x < 1/\sqrt 3 \,{\text{it}}\,{\text{must}}\,{\text{be}}\,{\text{that}}\,a \ge - \frac{{3x^{2} }}{{1 - 3x^{2} }}\,{\text{and}}\,{\text{thus}}\,a \ge 0; $$
$$ {\text{for}}\;1/\sqrt 3 < x \le 1\,{\text{it}}\,{\text{must}}\,{\text{be}}\,{\text{that}}\,a \le \frac{{3x^{2} }}{{3x^{2} - 1}}\,{\text{and}}\,{\text{thus}}\,a \le 3/2. $$

It follows that \( f_{1}^{*} (x) \) is determined up to the parameter a, whose optimal value must be sought inside the interval \( \left[ {0,\frac{3}{2}} \right] .\)

From Eq. (3.134) we then have

$$ g_{1} \left( x \right) = \frac{{g\left( x \right)f_{X} \left( x \right)}}{{f_{1}^{*} \left( x \right)}} = \frac{{\cos \left( {\frac{\pi x}{2}} \right)}}{{a - 3\left( {a - 1} \right)x^{2} }} $$
(3.159)

In the ideal case, i.e., supposing we are able to evaluate the integrals, we would have

$$ E_{1} \left[ {g_{1}^{2} (x)} \right] = \int\limits_{0}^{1} {\frac{{g^{2} (x)}}{{f_{1}^{*} (x)}}{\text{d}}x} = \int\limits_{0}^{1} {\frac{{\cos^{2} \left( {\frac{\pi x}{2}} \right)}}{{a - 3(a - 1)x^{2} }}{\text{d}}x} $$
(3.160)

The minimum value for this expression is found for a = 3/2, and is equal to 0.406275. By substituting into Eq. (3.141) we have

$$ Var_{1} \left[ {G_{1N} } \right] = \frac{1}{N}\left[ {0.406275 - \left( {\frac{2}{\pi }} \right)^{2} } \right] = \frac{1}{N}9.9026 \cdot 10^{ - 4} $$
(3.161)

By choosing for N the value \( 10^{4} ,\) as in the analog case, we obtain a variance that is smaller by two orders of magnitude. In a real case, when we might be unable to evaluate the integrals, the value of the parameter a is determined by trial and error. For each trial value of a, the \( f_{1}^{*} (x) \) given by Eq. (3.158) is completely determined. From this \( f_{1}^{*} (x) \) we sample N values \( x_{i} \) (i = 1,2,…,N), we calculate the corresponding values \( g_{1} (x_{i} ) \) with Eq. (3.159) and then \( \overline{{g_{1} }} \) and \( \overline{{g_{1}^{2} }} \) with Eq. (3.142); finally, we calculate \( Var_{1} \left[ {G_{1N} } \right] \) with Eq. (3.143). Among all the trial values of a, the best choice is the one for which \( Var_{1} \left[ {G_{1N} } \right] \) is minimum. For this example, the determination of a was done by using the following Matlab® program (\( N = 10^{4} \) histories), scanning the interval [0, 1.5] in steps of 0.05

  • clear; N=1e4; g=zeros(N,1); s2G1N=[];

  • a=0:0.05:1.5; la=length(a);

  • for k=1:la

    • for n=1:N

      • rr=zeros(1,3);r=zeros(1,3);

      • c=[a(k)-1 0 -a(k) rand]; rr=roots(c);lrr=length(rr);   % cdf inversion: (a-1)x^3-a*x+r=0

      • j=0;

      • for kk=1:lrr

        • r(kk)=-1;

        • if imag(rr(kk))==0

        • j=j+1;

        • r(j)=rr(kk);

        • end

      • end

      • i=find(r>0 & r<1); x=r(i);   % keep the real root in (0,1)

      • g(n)=cos(pi*x/2)/(a(k)-3*(a(k)-1)*x^2);   % prize g1(x), Eq. (3.159)

    • end

    • s2G1N=[s2G1N var(g)];

  • end

  • plot(a, s2G1N/N)

Figure 3.8 reports the value \( s_{{G_{1N} }}^{2} \) as a function of a (left) and the two pdfs \( f_{1} (x) \) and \( f_{1}^{*} (x) \) (right). The latter was calculated as a function of x for the optimal value a = 1.5. From the figure, it can be seen that \( Var_{1} \left[ {G_{1N} } \right] \) decreases monotonically as a increases and reaches its minimum at a = 3/2, in agreement with the theory. The corresponding estimate of G was obtained by a forced MCS with the following Matlab® program

Fig. 3.8
figure 8

Estimated variance as a function of the parameter a (left); forced pdf \( f_{1} (x) \) and \( f_{1}^{*} (x) \) (right)

  • clear;

  • N=1e4; g1=zeros(N,1); s2G1N=[]; a=1.5;

  • for n=1:N

    • rr=zeros(1,3);r=zeros(1,3);

    • c=[a-1 0 -a rand]; rr=roots(c);lrr=length(rr); j=0;   % cdf inversion for a=1.5

    • for kk=1:lrr

    • r(kk)=-1;

      • if imag(rr(kk))==0

      • j=j+1; r(j)=rr(kk);

      • end

    • end

    • i=find(r>0 & r<1); x=r(i); g1(n)=cos(pi*x/2)/(a-3*(a-1)*x^2);

  • end

  • G1N=mean(g1); s2G1N=var(g1)/N;

The following values are obtained

$$ \begin{array}{*{20}l} {G_{1N} = 0.6366} & {s_{{G_{1N} }}^{2} = 9.95 \cdot 10^{ - 8} } & {{\text{or}}} & {G_{1N} = (636.6 \pm 0.32) \cdot 10^{ - 3} } \\ \end{array} $$

This result shows that choosing a forced pdf \( f_{1}^{*} (x) \) close to the optimal one allows us to estimate G with a variance that is two orders of magnitude smaller than that of the analog computation.

3.4.2.1 Extension to the Multivariate Case

Let us consider the definite integral G of Eq. (3.121). The sampling of a rv vector \( \underline{x} \) from \( f_{{\underline{X} }} (\underline{x} ) \) can be done by starting from the following identity

$$ f_{{\underline{X} }} (x_{1} ,x_{2} , \ldots ,x_{n} ) = \left[ {\prod\limits_{j = 1}^{n - 1} {f_{j + 1} (x_{j} |x_{j - 1} ,x_{j - 2} , \ldots ,x_{1} )} } \right]f(x_{n} |x_{n - 1} , \ldots ,x_{1} ) $$
(3.162)

where

$$ \begin{aligned} f_{j\; + \;1} (x_{j} |x_{j\; - \;1} ,x_{j\; - \;2} , \ldots ,x_{1} ) & = \frac{{\int {{\text{d}}x_{j\; + \;1} {\text{d}}x_{j + 2} \ldots {\text{d}}x_{n} f_{{\underline{X} }} (\underline{x} )} }}{{\int {{\text{d}}x_{j} {\text{d}}x_{j\; + \;1} {\text{d}}x_{j + 2} \ldots {\text{d}}x_{n} f_{{\underline{X} }} (\underline{x} )} }} \\ & = \frac{{f_{\text{marg}} (x_{1} ,x_{2} , \ldots ,x_{j} )}}{{f_{\text{marg}} (x_{1} ,x_{2} , \ldots ,x_{j\; - \;1} )}} \\ \end{aligned} $$
(3.163)
$$ f(x_{n} |x_{n\; - \;1} ,x_{n\; - \;2} , \ldots,x_{1} ) = \frac{{f_{{\underline{X} }} (\underline{x} )}}{{\int {{\text{d}}x_{n} f(\underline{x} )} }} $$
(3.164)

where \( f_{marg} (x_{1} ,x_{2} , \ldots ,x_{j} ) \) is the marginal pdf of the variables \( x_{1} ,x_{2} , \ldots ,x_{j} ,\) obtained by integrating \( f_{{\underline{X} }} (\underline{x} ) \) over all the remaining variables. From Eq. (3.162), it can be seen that we can sample \( \underline{x} \) by successively sampling the components \( x_{j} \) from conditional univariate distributions, i.e.,

$$ \begin{gathered} \left[ {x_{1} \,{\text{from}}\,f_{2} (x_{1} )} \right],\,\left[ {x_{2} \,{\text{from}}\,f_{3} (x_{2} |x_{1} )} \right], \ldots , \hfill \\ \left[ {x_{n} \,{\text{from}}\,f(x_{n} |x_{n - 1} , \ldots ,x_{1} )} \right] \hfill \\ \end{gathered} $$
(3.165)

In words, we sample \( x_{1} \) from its marginal distribution, obtained by integrating f over all the variables except \( x_{1} ;\) we then sample \( x_{2} \) from the distribution conditional on the obtained value of \( x_{1} \) and marginal with respect to all the remaining variables except \( x_{2} ,\) and so on.
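
As a minimal sketch of this procedure (the bivariate pdf below is a hypothetical example chosen only for illustration), consider \( f_{{\underline{X} }} (x_{1} ,x_{2} ) = x_{1} + x_{2} \) on the unit square: the marginal pdf of \( x_{1} \) is \( x_{1} + 1/2 ,\) the conditional pdf of \( x_{2} \) given \( x_{1} \) is \( (x_{1} + x_{2} )/(x_{1} + 1/2) ,\) and both cdfs can be inverted analytically:

  • % Minimal sketch: sampling f(x1,x2)=x1+x2 on [0,1]^2 via Eq. (3.165)

  • N=1e5; r1=rand(N,1); r2=rand(N,1);

  • x1=(-1+sqrt(1+8*r1))/2;            % inverse of the marginal cdf F1(x1)=(x1^2+x1)/2

  • x2=-x1+sqrt(x1.^2+r2.*(2*x1+1));   % inverse of the conditional cdf F(x2|x1)

  • disp([mean(x1) mean(x2)])          % both sample means should be close to 7/12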

3.5 Sensitivity Analysis by Monte Carlo Simulation

The definite integral G defined by (3.121) depends on the values of the parameters that appear in the function \( g(\underline{x} ) \) and in the pdf \( f_{{\underline{X} }} (\underline{x} ) .\) Let us suppose, for simplicity, that those functions have in common a scalar parameter p: we want to make a MC estimate of the sensitivity of G with respect to a variation of the parameter p, namely \( {\text{d}}G/{\text{d}}p .\) Thus, by including the parameter explicitly as an argument, we can write Eq. (3.121) as

$$ G(p) = \int\limits_{D} {g(\underline{x} ;p)f_{{\underline{X} }} (\underline{x} ;p){\text{d}}\underline{x} } $$
(3.166)

Of course, a special case of this formulation is the one in which only g or only f depends on p.

We present two procedures for estimating the sensitivity, both similar to the one described for forced simulation [19–23].

3.5.1 Correlated Sampling

Let us set, for brevity

$$ \begin{array}{*{20}l} {} & \begin{aligned} g^{*} \equiv\,& g(\underline{x} ;p + \Updelta p) \\ g \equiv\,& g(\underline{x} ;p) \\ \end{aligned} & {} & \begin{aligned} f_{{\underline{X} }}^{*} \equiv\,& f_{{\underline{X} }} (\underline{x} ;p + \Updelta p) \\ f_{{\underline{X} }} \equiv\,& f_{{\underline{X} }} (\underline{x} ;p) \\ \end{aligned} \\ \end{array} $$
(3.167)

Further, let us indicate with \( E\left[ \cdot \right] \) and \( E^{*} \left[ \cdot \right] \) the expected values of the argument calculated with the pdfs \( f_{{\underline{X} }} \) and \( f_{{\underline{X} }}^{*} ,\) respectively. Corresponding to the value p + Δp of the parameter, the definite integral defined by Eq. (3.121) becomes

$$ G^{*} \equiv G(p + \Updelta p) = \int\limits_{D} {g(\underline{x} ;p + \Updelta p)f_{{\underline{X} }} (\underline{x} ;p + \Updelta p){\text{d}}\underline{x} } = E^{*} \left[ {g^{*} } \right] $$
(3.168)

Also,

$$ G^{*} \equiv G(p + \Updelta p) = \int\limits_{D} {g(\underline{x} ;p + \Updelta p)\frac{{f_{{\underline{X} }} (\underline{x} ;p + \Updelta p)}}{{f_{{\underline{X} }} (\underline{x} ;p)}}f_{{\underline{X} }} (\underline{x} ;p){\text{d}}\underline{x} } = E\left[ {g^{*} \frac{{f_{{\underline{X} }}^{*} }}{{f_{{\underline{X} }} }}} \right] \equiv E\left[ h \right] $$
(3.169)

where we set

$$ h(\underline{x} ;p,\Updelta p) = g(\underline{x} ;p + \Updelta p)\frac{{f_{{\underline{X} }} (\underline{x} ;p + \Updelta p)}}{{f_{{\underline{X} }} (\underline{x} ;p)}} \equiv g^{*} \frac{{f_{{\underline{X} }}^{*} }}{{f_{{\underline{X} }} }} $$
(3.170)

For a given \( \Updelta p \) (in general we choose \( \Updelta p/p \ll 1 \)), the MCS estimate of \( G^{*} \) can be performed simultaneously with that of G by the dart game method described in Sect. 3.4.1: for each of the N values \( \underline{x}_{i} \) sampled from \( f_{{\underline{X} }} (\underline{x} ;p) ,\) we accumulate the value \( g(\underline{x}_{i} ;p) \) with the aim of calculating \( G_{N} \) as an estimate of G, and we also accumulate the value \( h(\underline{x}_{i} ;p,\Updelta p) \) with the aim of calculating \( G_{N}^{*} \) as an estimate of \( G^{*} .\) The values \( G_{N} \) and \( G_{N}^{*} ,\) calculated by using the same sequence {\( \underline{x}_{i} \)}, are correlated. We have

$$ G_{N}^{*} = \frac{1}{N}\sum\limits_{i = 1}^{N} {h(\underline{x}_{i} ;p,\Updelta p)} $$
(3.171)

Since

$$ \begin{array}{*{20}l} {E\left[ h \right] = G^{*} } \\ {Var\left[ h \right] = E\left[ {h^{2} } \right] - \left( {G^{*} } \right)^{2} } \\ \end{array} $$
(3.172)

Thus

$$ \begin{array}{*{20}l} {E\left[ {G_{N}^{*} } \right] = G^{*} } \\ {Var\left[ {G_{N}^{*} } \right] = \frac{1}{N}Var\left[ h \right] = \frac{1}{N}\left\{ {E\left[ {h^{2} } \right] - \left( {G^{*} } \right)^{2} } \right\}} \\ \end{array} $$
(3.173)

To compute the sensitivity of G with respect to the variation of the parameter from p to p + Δp, let us define

$$ \Updelta G_{N} = G_{N}^{*} - G_{N} = \frac{1}{N}\sum\limits_{i = 1}^{N} {(h_{i} - g_{i} )} $$
(3.174)

where, for brevity, we set

$$ h_{i} \equiv h(\underline{x}_{i} ;p,\Updelta p)\quad {\text{and}}\quad g_{i} \equiv g(\underline{x}_{i} ;p) $$
(3.175)

We have

$$ E\left[ {h_{i} - g_{i} } \right] = E\left[ {h - g} \right] = E\left[ h \right] - E\left[ g \right] = G^{*} - G $$
(3.176)
$$ \begin{aligned} Var\left[ {h_{i} - g_{i} } \right] & = Var\left[ {h - g} \right] = E\left[ {\left\{ {(h - g) - \left( {G^{*} - G} \right)} \right\}^{2} } \right] \\ & = E\left[ {\left( {h - G^{*} } \right)^{2} } \right] + E\left[ {\left( {g - G} \right)^{2} } \right] - 2E\left[ {\left( {h - G^{*} } \right)\left( {g - G} \right)} \right] \\ & = Var\left[ h \right] + Var\left[ g \right] - 2\left\{ {E\left[ {hg} \right] - G^{*} G} \right\} \\ \end{aligned} $$
(3.177)

The sensitivity \( {\text{d}}G/{\text{d}}p \) and its variance are estimated as

$$ E\left[ {\frac{{\Updelta G_{N} }}{\Updelta p}} \right] = \frac{1}{\Updelta p}E\left[ {h_{i} - g_{i} } \right] = \frac{1}{\Updelta p}\left( {G^{*} - G} \right) \simeq \frac{1}{\Updelta p}\left( {\overline{h} - \overline{g} } \right) $$
(3.178)
$$ \begin{aligned} Var\left[ {\frac{{\Updelta G_{N} }}{{\Updelta p}}} \right] & = \frac{1}{N}\left\{ {\frac{{Var\left[ h \right] + Var\left[ g \right] - 2E\left[ {hg} \right] + 2G^{*} G}}{{(\Updelta p)^{2} }}} \right\} \\ & \simeq \frac{1}{N}\left\{ {\frac{{\left( {\overline{{h^{2} }} - \overline{h}^{2} } \right) + \left( {\overline{{g^{2} }} - \overline{g}^{2} } \right) - 2\left( {\overline{hg} - \overline{h} \,\overline{g} } \right)}}{{(\Updelta p)^{2} }}} \right\} \\ \end{aligned} $$
(3.179)

The value of G, with its variance, and the sensitivity \( {\text{d}}G/{\text{d}}p ,\) with its variance, can be estimated by calculating, for each value \( \underline{x}_{i} \) of the sequence {\( \underline{x}_{i} \)} sampled from \( f_{{\underline{X} }} (\underline{x} ;p) ,\) the four values

$$ g_{i} \equiv g(\underline{x}_{i} ;p),\quad g_{i}^{*} \equiv g(\underline{x}_{i} ;p + \Updelta p),\quad f_{i} \equiv f_{{\underline{X} }} (\underline{x}_{i} ;p)\quad {\text{and}}\quad f_{i}^{*} \equiv f_{{\underline{X} }} (\underline{x}_{i} ;p + \Updelta p) $$
(3.180)

and by accumulating the five quantities

$$ g_{i} ;\quad g_{i}^{2} ;\quad h_{i} = g_{i}^{*} \frac{{f_{i}^{*} }}{{f_{i} }};\quad h_{i}^{2} \equiv \left( {g_{i}^{*} \frac{{f_{i}^{*} }}{{f_{i} }}} \right)^{2} ;\quad h_{i} g_{i} = g_{i}^{*} \frac{{f_{i}^{*} }}{{f_{i} }}g_{i} $$
(3.181)

After the N accumulations, we calculate the arithmetic averages \( \overline{g} ,\overline{{g^{2} }} ,\,\overline{h} ,\,\overline{{h^{2} }} ,\,\overline{hg} \) which, substituted in Eqs. (3.178) and (3.179), give the desired estimates.
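
As a minimal sketch of the correlated sampling scheme (the integrand \( g(x;p) = e^{ - px} ,\) the parameter value p = 1 and the increment \( \Updelta p = 10^{ - 3} \) are chosen only for illustration; here only g depends on p, so that \( f_{{\underline{X} }}^{*} /f_{{\underline{X} }} = 1 \)), consider \( G(p) = \int_{0}^{1} {e^{ - px} {\text{d}}x} \) with \( f_{X} = U[0,1) :\)

  • % Minimal sketch: correlated sampling estimate of dG/dp for G(p)=int_0^1 exp(-p*x)dx

  • N=1e5; p=1; dp=1e-3; x=rand(N,1);   % the same sample {x_i} is used for both estimates

  • g=exp(-p*x); h=exp(-(p+dp)*x);      % here f*/f=1, so h_i=g(x_i;p+dp)

  • dGdp=mean(h-g)/dp;                  % Eq. (3.178)

  • s2dGdp=(var(h)+var(g)-2*(mean(h.*g)-mean(h)*mean(g)))/(N*dp^2);   % Eq. (3.179)

  • disp([dGdp sqrt(s2dGdp)])           % the exact sensitivity at p=1 is 2/e-1 = -0.2642...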

3.5.2 Differential Sampling

By differentiating Eq. (3.166) with respect to the parameter of interest p, we obtain the expression for the first-order sensitivity of G with respect to a variation of p

$$ \begin{aligned} {}\frac{\partial G}{\partial p} & = \int {\left[ {\frac{{\partial g(\underline{x} ;p)}}{\partial p}f_{{\underline{X} }} (\underline{x} ;p) + g(\underline{x} ;p)\frac{{\partial f_{{\underline{X} }} (\underline{x} ;p)}}{\partial p}} \right]} {\text{d}}\underline{x} \\& = {\int {\left[ {\frac{\partial }{\partial p}\ln g(\underline{x} ;p) + \frac{\partial }{\partial p}\ln f_{{\underline{X} }} (\underline{x} ;p)} \right]} g(\underline{x} ;p)f_{{\underline{X} }} (\underline{x} ;p){\text{d}}\underline{x} } \\ \end{aligned} $$
(3.182)

The MC estimate of the first-order sensitivity can be obtained by sampling N values {\( \underline{x}_{i} \)} from \( f_{{\underline{X} }} (\underline{x} ;p) ,\) and calculating the arithmetic average

$$ \left( {\frac{\partial G}{\partial p}} \right)_{N} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left[ {\frac{\partial }{\partial p}\ln g(\underline{x}_{i} ;p) + \frac{\partial }{\partial p}\ln f_{{\underline{X} }} (\underline{x}_{i} ;p)} \right]g(\underline{x}_{i} ;p)} $$
(3.183)

The extension of this simple procedure to the calculation of the pure or mixed sensitivity of a generic nth order is straightforward.
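
As a minimal sketch of differential sampling (same illustrative example as in the sketch of Sect. 3.5.1: \( G(p) = \int_{0}^{1} {e^{ - px} {\text{d}}x} \) with \( f_{X} = U[0,1) \) independent of p, so that \( \partial \ln g/\partial p = - x \) and \( \partial \ln f_{X} /\partial p = 0 \)), Eq. (3.183) reduces to the sample mean of \( - x\,e^{ - px} :\)

  • % Minimal sketch: differential sampling estimate of dG/dp, Eq. (3.183)

  • N=1e5; p=1; x=rand(N,1);

  • dGdp=mean(-x.*exp(-p*x));   % (d/dp ln g + d/dp ln f_X)*g = -x*exp(-p*x)

  • disp(dGdp)                  % the exact sensitivity at p=1 is 2/e-1 = -0.2642...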

3.6 Monte Carlo Simulation Error and Quadrature Error

Let us finally compare the statistical error made in estimating G by the MCS method with N trials with the numerical error of a quadrature formula in which the integrand function is evaluated at N points [24, 25]. In either case, analog or forced (biased), the MC error [see Eqs. (3.132) and (3.143)] varies as \( N^{{ - \tfrac{1}{2}}} ,\) i.e.,

$$ \varepsilon_{MC} \sim N^{{ - \tfrac{1}{2}}} $$
(3.184)

In the case of a fairly regular integrand, the error of any quadrature formula varies as \( \Updelta^{k} ,\) with Δ equal to the integration step and k a small integer which depends on the numerical method employed in the quadrature formula. In general, k increases with the complexity of the rule, but is typically at most 2 or 3.

In the case of a hypercube with n dimensions and side length 1, the number of points on one edge is \( \Updelta^{ - 1} \) so that the total number of points is \( N = \Updelta^{ - n} \) and the numerical quadrature error is

$$ \varepsilon_{q} \sim \Updelta^{k} \sim N^{{ - \tfrac{k}{n}}} $$
(3.185)

The MCS estimate is convenient, i.e., \( \varepsilon_{MC} \le \varepsilon_{q} ,\) if \( n \ge 2k ,\) i.e., taking k = 3, if the integral must be evaluated over a domain that is at least six-dimensional.
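
For instance, under the scaling relations above and with the purely illustrative values k = 3, n = 10 and \( N = 10^{6} ,\) one has

$$ \varepsilon_{q} \sim N^{{ - \tfrac{k}{n}}} = 10^{ - 1.8} \approx 1.6 \cdot 10^{ - 2} ,\quad {\text{whereas}}\quad \varepsilon_{MC} \sim N^{{ - \tfrac{1}{2}}} = 10^{ - 3} $$

so that, in this high-dimensional setting, the MCS estimate is roughly an order of magnitude more accurate for the same number of function evaluations.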