Definitions

Signal processing begins with signals. The simplest signal is a sine wave with a single spectral component, i.e., with a single frequency, as shown in Fig. 14.1. It is sometimes called a pure tone. A sine wave function of time t with amplitude C, angular frequency ω, and starting phase φ, is given by

x ( t ) = C sin ( ω t + φ ) .
(14.1)

The amplitude has the same units as the waveform x, the angular frequency has units of radians per second, and the phase has units of radians.

Fig. 14.1
figure 1

A sine wave with amplitude C and period T. A little more than three and a half cycles are shown. The starting phase is φ = 0

Because there are 2π radians in one cycle

ω = 2 π f ,
(14.2)

and (14.1) can be written as

x ( t ) = C sin ( 2 π f t + φ )
(14.3)

or as

x(t) = C \sin\left( \frac{2\pi t}{T} + \varphi \right) ,
(14.4)

where f is the frequency in cycles per second (or Hertz) and T is the period in units of seconds per cycle, T = 1/f.

A complex wave is the sum of two or more sine waves, each with its own amplitude, frequency, and phase. For example,

x ( t ) = C 1 sin ( ω 1 t + φ 1 ) + C 2 sin ( ω 2 t + φ 2 )
(14.5)

is a complex wave with two spectral components having frequencies f 1 and f 2. The period of a complex wave is the reciprocal of the greatest common divisor of f 1 and f 2. For instance, if f 1 =400  Hz and f 2 =600  Hz, then the period is 1/(200 Hz) or 5 ms. The fundamental frequency is the reciprocal of the period.
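As a small illustration, the period and fundamental frequency of a complex wave can be computed directly from the component frequencies. The following Python sketch (variable names are ours, and it assumes the component frequencies are integers expressed in Hz) reproduces the 400 Hz plus 600 Hz example.

from functools import reduce
from math import gcd

# Component frequencies of a complex wave, assumed to be integers in Hz.
freqs_hz = [400, 600]

# The fundamental frequency is the greatest common divisor of the component
# frequencies; the period is its reciprocal.
f0 = reduce(gcd, freqs_hz)   # 200 Hz
period = 1.0 / f0            # 0.005 s = 5 ms
print(f0, period)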

A general waveform can be written as a sum of N components,

x(t) = \sum_{n=1}^{N} C_n \sin(\omega_n t + \varphi_n) ,
(14.6)

and the fundamental frequency is the greatest common divisor of the set of frequencies {f n }.

An alternative description of the general waveform can be derived by using the trigonometric identity

sin ( θ 1 + θ 2 ) = sin θ 1 cos θ 2 + sin θ 2 cos θ 1
(14.7)

so that

x(t) = \sum_{n=1}^{N} \left[ A_n \cos(\omega_n t) + B_n \sin(\omega_n t) \right] ,
(14.8)

where A n  = C n  sin φ n , and B n  = C n  cos φ n , are the cosine and sine partial amplitudes respectively. Thus the two parameters C n and φ n are replaced by two other parameters A n and B n .

Because of the trigonometric identity

\sin^2\theta + \cos^2\theta = 1 ,
(14.9)

the amplitude C n can be written in terms of the partial amplitudes,

C_n^2 = A_n^2 + B_n^2 ,
(14.10)

as can the component phase

φ n = Arg ( A n , B n ) .
(14.11)

The Arg function is essentially an inverse tangent, but because the principal value of the arctangent function only runs from −π/2 to π/2, an adjustment needs to be made when B n is negative. In the end,

\mathrm{Arg}(A_n, B_n) = \arctan\left( \frac{A_n}{B_n} \right) \quad (\text{for } B_n \geq 0)
(14.12)

and

\mathrm{Arg}(A_n, B_n) = \arctan\left( \frac{A_n}{B_n} \right) + \pi \quad (\text{for } B_n < 0) .
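A brief Python sketch (function and variable names are ours): the quadrant adjustment of (14.12) is exactly what the two-argument arctangent performs, so C_n and φ_n can be recovered from A_n and B_n as follows. Note that atan2 returns a value in (−π, π], which agrees with (14.12) up to an added multiple of 2π, i.e., the same phase.

import math

def amplitude_and_phase(A_n, B_n):
    # C_n from (14.10) and phi_n = Arg(A_n, B_n) from (14.11)-(14.12).
    C_n = math.hypot(A_n, B_n)
    # atan2(A_n, B_n) is arctan(A_n / B_n) with the quadrant adjustment applied.
    phi_n = math.atan2(A_n, B_n)
    return C_n, phi_n

# Example: A_n = C sin(phi), B_n = C cos(phi) with C = 2 and phi = 2.5 rad.
print(amplitude_and_phase(2 * math.sin(2.5), 2 * math.cos(2.5)))  # ~ (2.0, 2.5)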

The remaining sections of this chapter provide a brief treatment of real signals x(t) – first as continuous functions of time and then as sampled data. Readers who are less familiar with the continuous approach may wish to refer to the more extensive treatment in [14.1].

Fourier Series

The Fourier series applies to a function x(t) that is periodic. Periodicity means that we can add any integral multiple m of T to the running time variable t and the function will have the same value as at time t, i.e.

x ( t + mT ) = x ( t ) , for all integral  m .
(14.13)

Because m can be either positive or negative and as large as we like, it is clear that x is periodic into the infinite future and past. Then Fourierʼs theorem says that x can be represented as a Fourier series like

x(t) = A_0 + \sum_{n=1}^{\infty} \left[ A_n \cos(\omega_n t) + B_n \sin(\omega_n t) \right] .
(14.14)

All the cosines and sines have angular frequencies ω n that are harmonics, i.e., they are integral multiples of a fundamental angular frequency ω 0,

\omega_n = n\,\omega_0 = \frac{2\pi n}{T} ,
(14.15)

where n is an integer.

The fundamental frequency f 0 is given by f 0 = ω 0/(2π). The fundamental frequency is the lowest frequency that a sine or cosine wave can have and still fit exactly into one period of the function x(t) because f 0 = 1/T. In order to make a function x(t) with period T, the only sines and cosines that are allowed to enter the sum are those that fit exactly into the same period T. These are those sines and cosines with frequencies that are integral multiples of the fundamental.

The factors A n and B n in (14.14) are the Fourier coefficients. They can be calculated by projecting the function x(t) onto sine and cosine functions of the harmonic frequencies ω n . Projecting means to integrate the product of x(t) and a sine or cosine function over a duration of time equal to a period of x(t). Sines and cosines with different harmonic frequencies are orthogonal over a period. Consequently, projecting x(t) onto, for example cos (3ω 0 t), gives exactly the Fourier coefficient A 3.

It does not matter which time interval is used for integration, as long as it is exactly one period in duration. It is common to use the interval − T/2 to T/2.

The orthogonality and normality of the sine and cosine functions are described by the following equations:

\frac{2}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, \sin(\omega_n t) \cos(\omega_m t) = 0 ,
(14.16)

for all m and n;

\frac{2}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, \cos(\omega_n t) \cos(\omega_m t) = \delta_{n,m}
(14.17)

and

\frac{2}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, \sin(\omega_n t) \sin(\omega_m t) = \delta_{n,m} ,
(14.18)

where δ n,m is the Kronecker delta, equal to one if m = n and equal to zero otherwise.

It follows that the equations for A n and B n are

A_n = \frac{2}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, x(t) \cos(\omega_n t) \quad \text{for } n > 0 ,
(14.19)
B_n = \frac{2}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, x(t) \sin(\omega_n t) \quad \text{for } n > 0 .
(14.20)

The coefficient A 0 is simply a constant that shifts the function x(t) up or down. The constant A 0 is the only term in the Fourier series (14.14) that could possibly have a nonzero value when averaged over a period. All the other terms are sines and cosines; they are negative as much as they are positive and average to zero. Therefore, A 0 is the average value of x(t). It is the direct-current (DC) component of x. To find A 0 we project the function x(t) onto a cosine of zero frequency, i.e. onto the number 1, which leads to the average value of x,

A_0 = \frac{1}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, x(t) .
(14.21)

The Spectrum

The Fourier series is a function of time, where A n and B n are coefficients that weight the cosine and sine contributions to the series. The coefficients A n and B n are real numbers that may be positive or negative.

An alternative approach to the function x(t) deemphasizes the time dependence and considers mainly the coefficients themselves. This is the spectral approach. The spectrum simply consists of the values of A n and B n , plotted against frequency, or equivalently, plotted against the harmonic number n. For example, if we have a signal given by

x(t) = 5\sin(2\pi\, 150\, t) + 3\cos(2\pi\, 300\, t) - 2\cos(2\pi\, 450\, t) + 4\sin(2\pi\, 450\, t)
(14.22)

then the spectrum consists of only a few terms. The period of the signal is 1/150 s, the fundamental frequency is 150 Hz, and there are two additional harmonics: a second harmonic at 300 Hz and a third at 450 Hz. The spectrum is shown in Fig. 14.2.

Fig. 14.2
figure 2

(a) , (b) The amplitudes A and B for the signal in (14.22); (c) , (d) the corresponding magnitude and phases
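The projection recipe of (14.19)-(14.21) is easy to check numerically. The following Python sketch (function and variable names are ours) projects the signal of (14.22) onto the first three harmonics of 150 Hz and recovers B1 = 5, A2 = 3, A3 = −2, and B3 = 4 to within discretization error.

import numpy as np

def x(t):
    # the signal of (14.22)
    return (5 * np.sin(2 * np.pi * 150 * t) + 3 * np.cos(2 * np.pi * 300 * t)
            - 2 * np.cos(2 * np.pi * 450 * t) + 4 * np.sin(2 * np.pi * 450 * t))

T = 1.0 / 150.0                                   # period of the signal
t = np.linspace(-T / 2, T / 2, 4096, endpoint=False)
dt = T / len(t)
w0 = 2 * np.pi / T

for n in (1, 2, 3):
    A_n = (2 / T) * np.sum(x(t) * np.cos(n * w0 * t)) * dt   # (14.19)
    B_n = (2 / T) * np.sum(x(t) * np.sin(n * w0 * t)) * dt   # (14.20)
    print(n, round(A_n, 3), round(B_n, 3))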

Symmetry

Many important periodic functions have symmetries that simplify the Fourier series. If the function x(t) is an even function [x(− t) = x(t)] then the Fourier series for x contains only cosine terms. All coefficients of the sine terms B n are zero. If x(t) is odd [x(− t) =− x(t)], then the Fourier series contains only sine terms, and all the coefficients A n are zero. Sometimes it is possible to shift the origin of time to obtain a symmetrical function. Such a time shift is allowed if the physical situation at hand does not require that x(t) be synchronized with some other function of time or with some other time-referenced process. For example, the sawtooth function in Fig. 14.3 is an odd function. Therefore, only sine terms are present in the series.

Fig. 14.3
figure 3

The Fourier series of an odd function like this sawtooth consists of sine terms only. The Fourier coefficients can be computed by an integral over a single period from − T/2 to T/2

The Fourier coefficients can be calculated by doing the integral over the interval shown by the heavy line. The integral is easy to do analytically because x(t) is just a straight line. The answer is

B_n = \frac{2}{\pi}\, \frac{(-1)^{(n+1)}}{n} .
(14.23)

Consequently, the sawtooth function itself is given by

x(t) = \frac{2}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^{(n+1)}}{n} \sin\left( \frac{2\pi n t}{T} \right) .
(14.24)
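A brief Python sketch (names are ours) synthesizes the sawtooth from the truncated series (14.24); increasing the number of harmonics sharpens the ramp, with the familiar Gibbs ripple remaining near the discontinuities.

import numpy as np

T = 1.0                                   # period
n_max = 50                                # number of harmonics kept
t = np.linspace(-T, T, 2000)

x = np.zeros_like(t)
for n in range(1, n_max + 1):
    B_n = (2 / np.pi) * (-1) ** (n + 1) / n          # (14.23)
    x += B_n * np.sin(2 * np.pi * n * t / T)         # (14.24), truncated
# x now approximates a sawtooth with odd symmetry about t = 0.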

A bridge between the Fourier series and the Fourier transform is the complex form for the spectrum,

X n = A n + i B n .
(14.25)

Because of Eulerʼs formula, namely

\mathrm{e}^{\mathrm{i}\theta} = \cos\theta + \mathrm{i}\sin\theta ,
(14.26)

it follows that

X_n = \frac{2}{T} \int_{-T/2}^{T/2} \mathrm{d}t\, x(t)\, \mathrm{e}^{\mathrm{i}\omega_n t} .
(14.27)

Fourier Transform

The Fourier transform of a time-dependent signal is a frequency-dependent representation of the signal, whether or not the time dependence is periodic. Compared to the frequency representation in the Fourier series, the Fourier transform differs in several ways. In general the Fourier transform is a complex function with real and imaginary parts. Whereas the Fourier series representation consists of discrete frequencies, the Fourier transform is a continuous function of frequency. The Fourier transform also requires the concept of negative frequencies. The transformation tends to be symmetrical with respect to the appearance of positive and negative frequencies and so negative frequencies are just as important as positive frequencies. The treatment of the Fourier integral transform that follows mainly states results. For proof and further applications the reader may wish to consult [14.1, mostly Chap. 8].

The Fourier transform of signal x(t) is given by the integral

X(\omega) = \mathcal{F}[x(t)] = \int \mathrm{d}t\, \mathrm{e}^{-\mathrm{i}\omega t}\, x(t) .
(14.28)

Here, and elsewhere unless otherwise noted, integrals range over all negative and positive values, i.e., from −∞ to +∞.

The inverse Fourier transform expresses the signal as a function of time in terms of the Fourier transform,

x(t) = \frac{1}{2\pi} \int \mathrm{d}\omega\, \mathrm{e}^{\mathrm{i}\omega t}\, X(\omega) .
(14.29)

These expressions for the transform and inverse transform can be shown to be self-consistent. A key fact in the proof is that the Dirac delta function can be written as an integral over all time,

\delta(\omega) = \frac{1}{2\pi} \int \mathrm{d}t\, \mathrm{e}^{\pm\mathrm{i}\omega t} ,
(14.30)

and similarly

\delta(t) = \frac{1}{2\pi} \int \mathrm{d}\omega\, \mathrm{e}^{\pm\mathrm{i}\omega t} .
(14.31)

Because a delta function is an even function of its argument, it does not matter if the + or − sign is used in these equations.

Reality and Symmetry

The Fourier transform X(ω) is generally complex. However, signals like x(t) are real functions of time. In that connection (14.29) would seem to pose a problem, because it expresses the real function x as an integral involving the complex exponential multiplied by the complex Fourier transform. The requirement that x be real leads to a simple requirement on its Fourier transform X. The requirement is that X(− ω) must be the complex conjugate of X(ω), i.e., X(− ω) = X *(ω). That means that

Re X ( - ω ) = Re X ( ω )
(14.32)

and

Im X ( - ω ) = - Im X ( ω ) .

Similar reasoning leads to special results for signals x(t) that are even or odd functions of time t. If x is even [x(− t) = x(t)] then the Fourier transform X is not complex but is entirely real. If x is odd [x(− t) =− x(t)] then the Fourier transform X is not complex but is entirely imaginary.

The polar form of the Fourier transform is normally a more useful representation than the real and imaginary parts. It is the product of a magnitude, or absolute value, and an exponential phase factor,

X ( ω ) = | X ( ω ) | exp [ i φ ( ω ) ] .
(14.33)

The magnitude is a positive real number. Negative or complex values of X arise from the phase factor. For instance, if X is entirely real then φ(ω) can only be zero or 180°.

Examples

A few example Fourier transforms are insightful.

The Gaussian

The Fourier transform of a Gaussian is a Gaussian. The Gaussian function of time is

g(t) = \frac{1}{\sigma\sqrt{2\pi}}\, \mathrm{e}^{-t^2/(2\sigma^2)} .
(14.34)

The function is normalized to unit area, in the sense that the integral of g(t) over all time is 1.0. The Fourier transform is

G(\omega) = \mathrm{e}^{-\omega^2\sigma^2/2} .
(14.35)

The Unit Rectangle Pulse

The unit rectangle pulse, r(t), is a function of time that is zero except on the interval − T 0/2 to T 0/2. During that interval the function has the value 1/T 0, so that the function has unit area. The Fourier transform of this pulse is

R(\omega) = \frac{\sin(\omega T_0/2)}{\omega T_0/2} ,
(14.36)

or, in terms of frequency

R(f) = \frac{\sin(\pi f T_0)}{\pi f T_0} ,

as shown in Fig. 14.4.

Fig. 14.4
figure 4

The Fourier transform of a single pulse with duration T 0 as a function of frequency f expressed in dimensionless form fT 0

The function of the form ( sin x)/x is sometimes called the sinc function. However, ( sin πx)/(πx) is also called the sinc function. Therefore, whenever the sinc function is used by name it must be defined.
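The naming ambiguity matters in software as well. For example, NumPy's sinc is the normalized form sin(πx)/(πx), so the frequency form of the unit-rectangle transform, R(f) = sin(πfT0)/(πfT0), can be evaluated directly. A short sketch, with variable names of our choosing:

import numpy as np

# Dimensionless frequency axis f*T0, as in Fig. 14.4.
fT0 = np.linspace(-4.0, 4.0, 801)

# numpy.sinc(x) = sin(pi x)/(pi x), i.e. the normalized sinc,
# which is exactly R of (14.36) written in terms of f*T0.
R = np.sinc(fT0)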

Both the Gaussian and the unit rectangle illustrate a reciprocal effect sometimes called the uncertainty principle. The Gaussian function of time g(t) is narrow if σ is small because σ appears in the denominator of the exponential in g(t). Then the Fourier transform G(ω) is wide because σ appears in the numerator of the exponential in G(ω). Similarly, the unit rectangle is narrow if T 0 is small. Then the Fourier transform R(ω) is broad because R(ω) depends on the product ωT 0. The general statement of the principle is that, if a function of one variable (e.g. time) is compact, then the transform representation, that is the function of the conjugate variable (e.g. frequency), is broad, and vice versa. The extreme expression of the uncertainty principle appears in the Fourier transform of a function that is constant for all time. According to (14.30), that transform is a delta function of frequency. Conversely, the Fourier transform of a delta function is a constant for all frequency. That means that the spectrum of an ideal impulse contains all frequencies equally.

A contrast between the Fourier transforms of Gaussian and rectangle pulses is also revealing. Because the Gaussian is a smooth function of time, the transform has a single peak. Because the rectangle has sharp edges, there are oscillations in the transform. If the rectangle is given sloping or rounded edges, the amplitude of the oscillations is reduced.

Time-Shifted Function

If y(t) is a time-shifted version of x(t), i.e.

y ( t ) = x ( t - t 0 ) ,
(14.37)

then the Fourier transform of y is related to the Fourier transform of x by the equation

Y ( ω ) = exp ( - i ω t 0 ) X ( ω ) .
(14.38)

The transform Y is the same as X except for a phase shift that increases linearly with frequency. There are two important implications of this equation. First, because the magnitude of the exponential with imaginary argument is 1.0, the magnitude of Y is the same as the magnitude of X for all values of ω. Second, reversing the logic of the equation shows that, if the phase of a signal is changed in such a way that the phase shift is a linear function of frequency, then the change corresponds only to a shift along the time axis for the function of time, and not to a distortion of the shape of the wave. A general phase-shift function of frequency can be separated into the best-fitting straight line and a residual. Only the residual distorts the shape of the signal as a function of time.

Derivatives and Integrals

If v(t) is the derivative of x(t), i.e., v(t) = dx/dt, then the Fourier transform of v is related to the transform of x by the equation

V ( ω ) = i ω X ( ω ) .
(14.39)

Thus, differentiating a signal is equivalent to ideal high-pass filtering with a boost of 6 dB per octave, i.e., doubling the frequency doubles the ratio of the output to the input, as processed by the differentiator. Differentiating also leads to a simple phase shift of 90° (π/2 rad) in the sense that the new factor of i equals exp (iπ/2). The differentiation equation can be iterated. The Fourier transform of the n-th derivative of x(t) is (iω)^n X(ω).

Integration is the inverse of differentiation, and that fact becomes apparent in the Fourier transforms. If w(t) is the integral of x(t), i.e.,

w(t) = \int_{-\infty}^{t} \mathrm{d}t'\, x(t') ,
(14.40)

then the Fourier transform of w is related to the Fourier transform of x by the equation

W(\omega) = \frac{X(\omega)}{\mathrm{i}\omega} + X(0)\,\delta(\omega) .
(14.41)

The first term above could have been anticipated based on the transform of the derivative. The second term corresponds to the additive constant of integration that always appears in the context of an integral. The number X(0) is the average (DC) value of the signal x(t), and if this average value is zero then the second term can be neglected.

Products and Convolution

If the signal x is the product of two functions y and w, i.e. x(t) = y(t)w(t) then, according to the convolution theorem, the Fourier transform of x is the convolution of the Fourier transforms of y and w, i.e.

X(\omega) = \frac{1}{2\pi}\, Y(\omega) * W(\omega) .
(14.42)

The convolution, indicated by the symbol *, is defined by the integral

X(\omega) = \frac{1}{2\pi} \int \mathrm{d}\omega'\, Y(\omega')\, W(\omega - \omega') .
(14.43)

The convolution theorem works in reverse as well. If X is the product of Y and W, i.e.

X ( ω ) = Y ( ω ) W ( ω )
(14.44)

then the functions of time x, y, and w are related by a convolution,

x(t) = \int_{-\infty}^{\infty} \mathrm{d}t'\, y(t')\, w(t - t')
(14.45)

or

x ( t ) = y ( t ) * w ( t ) .

The symmetry of the convolution equations for multiplication of functions of frequency and multiplication of functions of time is misleading. Multiplication of frequency functions, e.g. X(ω) = Y(ω)W(ω), normally corresponds to a linear operation on signals generally known as filtering. Multiplication of signal functions of time, e.g. y(t)w(t), is a nonlinear operation such as modulation.

Power, Energy, and Power Spectrum

The instantaneous power in a signal is defined as P(t) = x 2(t). This definition corresponds to the power that would be transferred by a signal to a unit load that is purely resistive, or dissipative. Such a load is not at all reactive, wherein energy is stored for some fraction of a cycle.

The energy in a signal is the accumulation of power over time,

E = \int \mathrm{d}t\, P(t) = \int \mathrm{d}t\, x^2(t) .
(14.46)

At this point, a distinction must be made between finite-duration signals and infinite-duration signals. For a finite-duration signal, the above integral exists. By substituting the Fourier transform for x(t), one finds that

E = \frac{1}{2\pi} \int \mathrm{d}\omega\, X(\omega)\, X(-\omega) \quad \text{or} \quad \int \mathrm{d}\omega\, E(\omega) .
(14.47)

Thus the energy in the signal is written as the accumulation of the energy spectral density,

E(\omega) = \frac{1}{2\pi}\, X(\omega)\, X(-\omega) = \frac{1}{2\pi}\, |X(\omega)|^2 .
(14.48)

The symmetry between (14.46) and (14.47) is known as Parsevalʼs theorem. It says that one can compute the energy in a signal by either a time or a frequency integral.

The power spectral density is obtained by dividing the energy spectral density by the duration of the signal, T D,

P(\omega) = \frac{E(\omega)}{T_\mathrm{D}} .
(14.49)

For white noise, the power density is constant on average, P(ω) = P 0. From (14.47) it is evident that a signal cannot be white over the entire range of frequencies out to infinite frequency without having infinite energy. One is therefore limited to noise that is white over a finite frequency band.

For pink noise the power density is inversely proportional to frequency, P(ω) = c/ω, where c is a constant. The energy integral in (14.47) for pink noise also diverges. Therefore, pink noise must be limited to a finite frequency band.

Turning now to infinite-duration signals, for an infinite-duration signal the energy is not well defined. It is likely that one would never even think about an infinite-duration signal if it were not for the useful concept of a periodic signal. Although the energy is undefined, the power P is well defined, and so is the power spectrum, or power spectral density P(ω). As expected, the power is the integral of the power spectral density,

P = \int \mathrm{d}\omega\, P(\omega) ,
(14.50)

where P(ω) is given in terms of X from (14.27),

P(\omega) = \frac{\pi}{2} \sum_{n=0}^{\infty} |X_n|^2 \left[ \delta(\omega - \omega_n) + \delta(\omega + \omega_n) \right] .
(14.51)

It is not hard to convert densities to different units. For instance, the power spectral density can be written in terms of frequency f instead of ω (ω = 2πf). By the definition of a density we must have that

P = \int \mathrm{d}f\, P(f) .
(14.52)

This definition is consistent with the fact that a delta function has dimensions that are the inverse of its argument dimensions. Therefore, δ(ω) = δ(2πf) = δ(f)/(2π), and

P(f) = \frac{1}{4} \sum_{n=0}^{\infty} |X_n|^2 \left[ \delta(f - f_n) + \delta(f + f_n) \right] .
(14.53)

Autocorrelation

The autocorrelation function a f of a signal x(t) provides a measure of the similarity between the signal at time t and the same signal at a different time, t + τ. The variable τ is called the lag, and the autocorrelation function is given by

a_\mathrm{f}(\tau) = \int_{-\infty}^{\infty} \mathrm{d}t\, x(t)\, x(t + \tau) .
(14.54)

When τ is zero the integrand is just the square of x(t), and the integral gives the largest possible value of the autocorrelation, namely the energy E. For a signal of finite duration, the autocorrelation at any other lag must always be strictly less than its value at τ = 0. Consequently, the normalized autocorrelation function a(τ)/a(0) is always less than 1.0 (τ ≠ 0).

By substituting (14.29) for x(t) one finds a frequency integral for the autocorrelation function,

a_\mathrm{f}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathrm{d}\omega\, \mathrm{e}^{\mathrm{i}\omega\tau}\, |X(\omega)|^2 ,
(14.55)

or, from (14.47),

a_\mathrm{f}(\tau) = \int_{-\infty}^{\infty} \mathrm{d}\omega\, \mathrm{e}^{\mathrm{i}\omega\tau}\, E(\omega) .
(14.56)

Equation (14.56) says that the autocorrelation function is the Fourier transform of the energy spectral density. This relationship is known as the Wiener–Khintchine relation. Because E(− ω) = E(ω), one can write a f in a way that proves that it is a real function with no imaginary part,

a_\mathrm{f}(\tau) = 2 \int_{0}^{\infty} \mathrm{d}\omega\, \cos(\omega\tau)\, E(\omega) .
(14.57)

Furthermore, because the cosine is an even function of its argument [a f(− τ) = a f(τ)], the autocorrelation function only needs to be given for positive values of the lag.

A signal does not have finite duration if it is periodic. Then the autocorrelation function is defined as

a(\tau) = \lim_{T_\mathrm{D} \to \infty} \frac{1}{2 T_\mathrm{D}} \int_{-T_\mathrm{D}}^{T_\mathrm{D}} \mathrm{d}t\, x(t)\, x(t + \tau) .
(14.58)

If the period is T then a(τ) = a(τ + nT) for all integer n, and the maximum value occurs at a(0) or a(nT). Because of the factor of time in the denominator of (14.58), the function a(τ) is the Fourier transform of the power spectral density and not of the energy spectral density.

A critical point for both a f(τ) and a(τ) is that autocorrelation functions are independent of the phases of spectral components. This point seems counterintuitive because waveforms depend on phases and it seems only natural that the correlation of a waveform with itself at some later time should reflect this phase dependence. However, the fact that autocorrelation is the Fourier transform of the energy or power spectrum proves that the autocorrelation function must be independent of phases because the spectra are independent of phases.

For example, if x(t) is a periodic function with zero average value, it is defined by (14.6). Then it is not hard to show that the autocorrelation function is given by

a(\tau) = \frac{1}{2} \sum_{n=1}^{N} C_n^2 \cos(\omega_n \tau) .
(14.59)

The autocorrelation function is only a sum of cosines with none of the phase information. Only the harmonic frequencies and amplitudes play a role.
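This phase independence is easy to verify numerically. The Python sketch below (all names ours) builds two periodic signals with the same harmonic amplitudes but different phases and compares their circular autocorrelations computed through the power spectrum, in the spirit of the Wiener–Khintchine relation; the two agree to machine precision.

import numpy as np

fs = 8000.0                               # sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)             # exactly 100 periods of a 100 Hz fundamental
f0 = 100.0
C = [1.0, 0.5, 0.25]                      # harmonic amplitudes C_n

def periodic(phases):
    return sum(Cn * np.sin(2 * np.pi * (n + 1) * f0 * t + ph)
               for n, (Cn, ph) in enumerate(zip(C, phases)))

def autocorr(x):
    # circular autocorrelation obtained from the power spectrum
    X = np.fft.rfft(x)
    return np.fft.irfft(np.abs(X) ** 2) / len(x)

a1 = autocorr(periodic([0.0, 1.0, 2.0]))
a2 = autocorr(periodic([0.7, -2.1, 0.3]))
print(np.allclose(a1, a2))                # True: the phases do not matter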

Cross-Correlation

Parallel to the autocorrelation function, the cross-correlation function is a measure of the similarity of the signal x(t) to the signal y(t) at a different time, i.e. the similarity to y(t + τ). The cross-correlation function is

\rho_0(\tau) = \int \mathrm{d}t\, x(t)\, y(t + \tau) .
(14.60)

In practice, the cross-correlation is usually normalized,

\rho(\tau) = \frac{\int \mathrm{d}t\, x(t)\, y(t + \tau)}{\left[ \int \mathrm{d}t_1\, x^2(t_1) \int \mathrm{d}t_2\, y^2(t_2) \right]^{1/2}} ,
(14.61)

so that the maximum value of ρ(τ) is equal to 1.0. Unlike the autocorrelation function, the maximum of ρ(τ) does not necessarily occur at τ = 0. For example, if signal y(t) is the same as signal x(t) except that y(t) has been delayed by T del then ρ(τ) has its maximum value 1.0 when τ = T del.

Statistics

Measured signals are always finite in length. Definitions of statistical terms for measured signals, together with their continuum limits are given in this section.

The number of samples in a measurement is N. The duration of the measured signal is T D, and T D = Nδt, where δt is the inverse of the sample rate.

The sampled signal has values x i , (1 ≤ i ≤ N), and the continuum analog is the signal x(t), (0 ≤ t ≤ T D).

The average value, or mean, is

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \quad \text{or} \quad \frac{1}{T_\mathrm{D}} \int_{0}^{T_\mathrm{D}} \mathrm{d}t\, x(t) .
(14.62)

The variance is

\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

or

\frac{1}{T_\mathrm{D}} \int_{0}^{T_\mathrm{D}} \mathrm{d}t\, [x(t) - \bar{x}]^2 .
(14.63)

The standard deviation is the square root of the variance, σ = √(σ²).

The energy is

E = \delta t \sum_{i=1}^{N} x_i^2 \quad \text{or} \quad \int_{0}^{T_\mathrm{D}} \mathrm{d}t\, x^2(t) .
(14.64)

The average power is

\bar{P} = \frac{1}{N} \sum_{i=1}^{N} x_i^2 \quad \text{or} \quad \frac{E}{T_\mathrm{D}} .
(14.65)

The root-mean-square (RMS) value is the square root of the average power, x RMS = √P̄.
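The sampled-data definitions above translate directly into code. A minimal Python sketch (names ours), assuming a sample rate of 1 kHz and an arbitrary test signal:

import numpy as np

fs = 1000.0                        # sample rate in Hz (assumed)
dt = 1.0 / fs
x = np.random.default_rng(0).normal(0.0, 2.0, size=5000)   # example signal

mean      = np.mean(x)                    # (14.62)
variance  = np.var(x, ddof=1)             # (14.63), with the 1/(N-1) normalization
std_dev   = np.sqrt(variance)
energy    = dt * np.sum(x ** 2)           # (14.64)
avg_power = np.mean(x ** 2)               # (14.65)
rms       = np.sqrt(avg_power)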

Signals and Processes

Signals are the observed results of processes. A process is stationary if its stochastic properties, such as mean and standard deviation, do not change during the time for which a signal is observed. Signals provide incomplete glimpses into processes.

The best estimate of the mean of the underlying process is equal to the mean of an observed signal. The expected error in the estimate of the mean of the underlying process, the so-called standard deviation of the mean, is

s = \frac{\sigma}{\sqrt{N}} ,
(14.66)

where N is the number of data points contributing to the mean of the observed signal.

Distributions

Digitized signals are often regarded as sampled data {x}. If the data are integers or are put into bins j then the probability that the signal has value x j is the probability mass function PMF(j) = N j /N, the ratio of the number of samples in bin j to the total number of samples. If data are continuous floating-point numbers, the analogous distribution is the probability density function PDF(x). In terms of these distributions, the mean is given by

\bar{x} = \sum_j x_j\, \mathrm{PMF}(j) \quad \text{or} \quad \int_{-\infty}^{\infty} \mathrm{d}x\; x\, \mathrm{PDF}(x) .
(14.67)

The most important PDF is the normal (Gaussian) density G(x),

G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x - \bar{x})^2}{2\sigma^2} \right] .
(14.68)

Like all PDFs, G(x) is normalized to unit area, i.e.

\int_{-\infty}^{\infty} \mathrm{d}x\, G(x) = 1 .
(14.69)

The probability that x lies between some value x 1 and x 1 + dx is PDF(x 1) dx, and normalization reflects the simple fact that x must have some value.

The probability that variable x is less than some value x 1 is the cumulative distribution function (CDF),

\mathrm{CDF}(x_1) = \int_{-\infty}^{x_1} \mathrm{d}x\, \mathrm{PDF}(x) .
(14.70)

If the density is normal, the integral is the cumulative normal distribution (CND),

\mathrm{CND}(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \mathrm{d}x'\, \exp\left( -\frac{x'^2}{2\sigma^2} \right) .
(14.71)

It is convenient to think of the CND as a function of x compared to the standard deviation, i.e., as a function of y = ( x - x ¯ ) / σ , as shown in Fig. 14.5.

C(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \mathrm{d}y'\, \exp\left( -\frac{y'^2}{2} \right) .
(14.72)
Fig. 14.5
figure 5

The area under the normal density is the cumulative normal. Here the area is the function C(y)

Because of the symmetry of the normal density,

C ( - y ) = 1 - C ( y ) .
(14.73)

Therefore, it is enough to know C(y) for y > 0. A few important values follow.

Table 14.1 can be used to find probabilities. For example, the probability that a normally distributed variable lies between its mean and its mean plus a standard deviation, i.e., between x ¯ and x ¯ + σ , is C(1) − 0.5 = 0.3413. The probability that it lies within plus or minus two standard deviations (± 2σ) of the mean is 2[C(2) − 0.5] =0.9546.

Table 14.1 Selected values of the cumulative normal distribution
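In code the cumulative normal is usually expressed through the error function. A small Python sketch (ours) reproduces the kind of probabilities quoted above:

from math import erf, sqrt

def C(y):
    # cumulative normal of (14.72), written with the error function
    return 0.5 * (1.0 + erf(y / sqrt(2.0)))

print(C(1) - 0.5)          # probability between the mean and mean + sigma, ~0.3413
print(2 * (C(2) - 0.5))    # probability within +/- 2 sigma of the mean, ~0.954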

The importance of the normal density lies in the central limit theorem, which says that the distribution for a sum of random variables approaches a normal distribution as the number of variables becomes large. In other words, if the variable x is a sum

x = x_1 + x_2 + x_3 + \cdots + x_N = \sum_{i=1}^{N} x_i ,
(14.74)

then no matter how the individual x i are distributed, x will be normally distributed in the limit of large N.

Multivariate Distributions

A multivariate distribution is described by a joint probability density PDF(x, y), where the probability that variable x has a value between x 1 and x 1 + dx and simultaneously variable y has a value between y 1 and y 1 + dy is

P ( x 1 , y 1 ) = PDF ( x 1 , y 1 ) d x d y .
(14.75)

The normalization requirement is

\int \mathrm{d}x \int \mathrm{d}y\, \mathrm{PDF}(x, y) = 1 .
(14.76)

The marginal probability density for x, PDF(x), is the probability density for x itself, regardless of the value of y. Hence,

\mathrm{PDF}(x) = \int \mathrm{d}y\, \mathrm{PDF}(x, y) .
(14.77)

The y dependence has been integrated out.

The conditional probability density PDF(x|y) describes the probability of a value x, given a specific value of y, for instance, if y = y 1, then

\mathrm{PDF}(x|y_1) = \mathrm{PDF}(x, y_1) \Big/ \int \mathrm{d}x\, \mathrm{PDF}(x, y_1) ,
(14.78)

or

PDF ( x | y 1 ) = PDF ( x , y 1 ) / PDF ( y 1 ) .
(14.79)

The probability that x = x 1 and y = y 1 is equal to the probability that y = y 1 multiplied by the conditional probability that if y = y 1 then x = x 1, i.e.,

P ( x 1 , y 1 ) = P ( x 1 | y 1 ) P ( y 1 ) .
(14.80)

Similarly, the probability that x = x 1 and y = y 1 is equal to the probability that x = x 1 multiplied by the conditional probability that if x = x 1 then y = y 1, i.e.

P ( x 1 , y 1 ) = P ( y 1 | x 1 ) P ( x 1 ) .
(14.81)

The two expressions for P(x 1, y 1) must be the same, and that leads to Bayesʼ Theorem,

P(x_1|y_1) = \frac{P(y_1|x_1)\, P(x_1)}{P(y_1)} .
(14.82)

Moments

The m-th moment of a signal is defined as

\overline{x^m} = \frac{1}{N} \sum_{i=1}^{N} x_i^m \quad \text{or} \quad \frac{1}{T_\mathrm{D}} \int_{0}^{T_\mathrm{D}} \mathrm{d}t\, x^m(t) .
(14.83)

Hence the first moment is the mean (14.62) and the second moment is the average power (14.65).

The m-th central moment is

\mu_m = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^m \quad \text{or} \quad \frac{1}{T_\mathrm{D}} \int_{0}^{T_\mathrm{D}} \mathrm{d}t\, [x(t) - \bar{x}]^m .
(14.84)

The first central moment is zero by definition. The second central moment is the alternating-current (AC) power, which is equal to the average power (14.65) minus the time-independent (or DC) component of the power.

The third central moment is zero if the signal probability density function is symmetrical about the mean. Otherwise, the third moment is a simple way to measure how the PDF is skewed. The skewness is the normalized third moment,

\text{skewness} = \frac{\mu_3}{\mu_2^{3/2}} .
(14.85)

The fourth central moment leads to an impression about how much strength there is in the wings of a probability density compared to the standard deviation. The normalized fourth moment is the kurtosis,

\text{kurtosis} = \frac{\mu_4}{\mu_2^2} .
(14.86)

For instance, the kurtosis of a normal density, which has significant wings, is 3. But the kurtosis of a rectangular density, which is sharply cut off, is only 9/5.

Hilbert Transform and the Envelope

The Hilbert transform of a signal x(t) is ℋ[x(t)] or function x I(t), where

x_\mathrm{I}(t) = \mathcal{H}[x(t)] = \frac{1}{\pi} \int_{-\infty}^{\infty} \mathrm{d}t'\, \frac{x(t')}{t - t'} .
(14.87)

Some facts about the Hilbert transform are stated here without proof. Proofs and further applications may be found in appendices to [14.1].

First, the Hilbert transform is its own inverse, except for a minus sign,

x(t) = -\mathcal{H}[x_\mathrm{I}(t)] = -\frac{1}{\pi} \int_{-\infty}^{\infty} \mathrm{d}t'\, \frac{x_\mathrm{I}(t')}{t - t'} .
(14.88)

Second, a signal and its Hilbert transform are orthogonal in the sense that

\int \mathrm{d}t\, x(t)\, x_\mathrm{I}(t) = 0 .
(14.89)

Third, the Hilbert transform of sin (ωt + φ) is −  cos (ωt + φ), and the Hilbert transform of cos (ωt + φ) is sin (ωt + φ).

Further the Hilbert transform is linear. Consequently, for any function for which a Fourier transform exists,

\mathcal{H}\left[ \sum_n A_n \cos(\omega_n t) + B_n \sin(\omega_n t) \right] = \sum_n A_n \sin(\omega_n t) - B_n \cos(\omega_n t)
(14.90)

or

\mathcal{H}\left[ \sum_n C_n \sin(\omega_n t + \varphi_n) \right] = -\sum_n C_n \cos(\omega_n t + \varphi_n) = \sum_n C_n \sin\left( \omega_n t + \varphi_n - \frac{\pi}{2} \right) .
(14.91)

Comparing the two sine functions above makes it clear why a Hilbert transform is sometimes called a 90° rotation of the signal.

Figure 14.6 shows a Gaussian pulse x(t) and its Hilbert transform, x I(t). The Gaussian pulse was made by adding up 100 cosine harmonics with amplitudes given by a Gaussian spectrum per (14.35). The Hilbert transform was computed by using the same amplitude spectrum and replacing all the cosine functions by sine functions.

Fig. 14.6
figure 6

A Gaussian pulse x(t) and its Hilbert transform x I(t) are the real and imaginary parts of the analytic signal corresponding to the Gaussian pulse

Figure 14.6 illustrates the difficulty often encountered in computing the Hilbert transform using the time integrals that define the transform and its inverse. If we had to calculate x(t) by transforming x I(t) using (14.88) we would be troubled by the fact that x I(t) goes to zero so slowly. An accurate calculation of x(t) would require a longer time span than that shown in the figure.

The Analytic Signal

The analytic signal x ~ ( t ) for x(t) is given by the complex sum of the original signal and an imaginary part equal to the Hilbert transform of x(t),

x ~ ( t ) = x ( t ) + i x I ( t ) .
(14.92)

The analytic signal, in turn, can be used to calculate the envelope of signal x(t). The envelope e(t) is the absolute value – or magnitude – of the analytic signal

e ( t ) = | x ~ ( t ) | .
(14.93)

For instance, if x(t) = A cos (ωt + φ), then x I(t) = A sin (ωt + φ) and

x ~ ( t ) = A [ cos ( ω t + φ ) + i sin ( ω t + φ ) ] .
(14.94)

By Eulerʼs theorem

x ~ ( t ) = A exp [ i ( ω t + φ ) ] ,
(14.95)

and the absolute value is

e(t) = \left\{ A\, \mathrm{e}^{\mathrm{i}(\omega t + \varphi)}\, A\, \mathrm{e}^{-\mathrm{i}(\omega t + \varphi)} \right\}^{1/2} = A .
(14.96)
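In sampled data the analytic signal is conveniently built in the frequency domain by zeroing the negative-frequency half of the spectrum and doubling the positive half. The Python sketch below (function and variable names are ours) computes the envelope of an amplitude-modulated tone this way.

import numpy as np

def envelope(x):
    # Build the analytic signal x(t) + i x_I(t) by suppressing negative
    # frequencies and doubling positive ones, then take its magnitude (14.93).
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

t = np.linspace(0.0, 1.0, 4000, endpoint=False)
x = (1 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 200 * t)
e = envelope(x)               # closely follows 1 + 0.5 cos(2 pi 5 t)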

Filters

Filtering is an operation on a signal that is typically defined in frequency space. If x(t) is the input to a filter and y(t) is the output then the Fourier transforms of x and y are related by

Y ( ω ) = H ( ω ) X ( ω ) ,
(14.97)

where H(ω) is the transfer function of the filter. The transfer function has a magnitude and a phase

H ( ω ) = | H ( ω ) | exp [ i Φ ( ω ) ] .
(14.98)

The frequency-dependent magnitude is the amplitude response, and it characterizes the filter type – low pass, high pass, bandpass, band-reject, etc. The phase Φ(ω) is the phase shift for a spectral component with frequency ω. The amplitude and phase responses of a filter are explicitly separated by taking the natural logarithm of the transfer function

ln H ( ω ) = ln [ | H ( ω ) | ] + i Φ ( ω ) .
(14.99)

Because ln |H| =  ln (10) log |H|,

ln H ( ω ) = 0 . 1151 G ( ω ) + i Φ ( ω ) ,
(14.100)

where G is the filter gain in decibels, and Φ is the phase shift in radians.

One-Pole Low-Pass Filter

The one-pole low-pass filter serves as an example to illustrate filter concepts. This filter can be made from a single resistor (R) and a single capacitor (C) with a time constant τ = RC. The transfer function of this filter is

H(\omega) = \frac{1}{1 + \mathrm{i}\omega\tau} = \frac{1 - \mathrm{i}\omega\tau}{1 + \omega^2\tau^2} .
(14.101)

The filter is called one-pole because there is a single value of ω for which the denominator of the transfer function is zero, namely ω = 1/(iτ) =− i/τ.

The magnitude (or amplitude) response is

|H(\omega)| = \frac{1}{\sqrt{1 + \omega^2\tau^2}} .
(14.102)

The filter cut-off frequency is the half-power point (or 3-dB-down point), where the magnitude of the transfer function is 1/√2 compared to its maximum value. For the one-pole low-pass filter, the half-power point occurs when ω = 1/τ.

Filters are often described by their asymptotic frequency response. For a low-pass filter, asymptotic behavior occurs at high frequency, where, for the one-pole filter |H(ω)| ∝ 1/ω. The 1/ω dependence is equivalent to a high-frequency slope of −6 dB/octave, i.e., for octave frequencies,

L_2 - L_1 = 20 \log\left( \frac{\omega_1}{2\omega_1} \right) = 20 \log\frac{1}{2} = -6 .
(14.103)

A filter with an asymptotic dependence of 1/ω² has a slope of −12 dB/octave, etc.

The phase shift of the low-pass filter is the arctangent of the ratio of the imaginary and real parts of the transfer function,

\Phi(\omega) = \tan^{-1}\left( \frac{\mathrm{Im}[H(\omega)]}{\mathrm{Re}[H(\omega)]} \right) ,
(14.104)

which, for the one-pole filter, is Φ(ω) =  tan −1(− ωτ). The phase shift is zero at zero frequency, and approaches 90° at high frequency. This phase behavior is typical of simple filters in that important phase shifts occur in frequency regions where the magnitude shows large attenuation.

Phase Delay and Group Delay

The phase shifts introduced by filters can be interpreted as delays, whereby the output is delayed in time compared to the input. In general, the delay is different for different frequencies, and therefore, a complex signal composed of several frequencies is bent out of shape by the filtering process. Systems in which the delay is different for different frequencies are said to be dispersive.

Two kinds of delay are of interest. The phase delay simply reinterprets the phase shift as a delay. The phase delay T φ is given by T φ  =− Φ(ω)/ω. The group delay T g is given by the derivative T g =− dΦ(ω)/dω. Phase and group delays for a one-pole low-pass filter are shown in Fig. 14.7 together with the phase shift.

Fig. 14.7
figure 7

The phase shift Φ for a one-pole low-pass filter can be read on the left ordinate. The phase and group delays can be read on the right ordinate
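For the one-pole low-pass filter these quantities are simple to evaluate numerically. A Python sketch (ours, with an assumed time constant of 1 ms); the group delay here is obtained by a numerical derivative of the phase:

import numpy as np

tau = 1e-3                                 # time constant RC, in seconds (assumed)
w = np.logspace(1, 5, 400)                 # angular frequency axis, rad/s

H = 1.0 / (1.0 + 1j * w * tau)             # (14.101)
mag_db = 20 * np.log10(np.abs(H))          # passes through -3 dB at w = 1/tau
phase = np.angle(H)                        # tan^-1(-w tau)

phase_delay = -phase / w                   # T_phi = -Phi(w)/w
group_delay = -np.gradient(phase, w)       # T_g = -dPhi/dw, numerically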

Resonant Filters

Resonant filters, or tuned systems, have an amplitude response that has a peak at some frequency where ω = ω 0. Such filters are characterized by the resonant frequency, ω 0, and by the bandwidth, 2Δω. The bandwidth is specified by half-power points such that |H(ω 0 +Δ ω)|2 ≈| H(ω 0)|2/2 and |H(ω 0 −Δ ω)|2 ≈| H(ω 0)|2/2. The sharpness of a tuned system is often quoted as a Q value, where Q is a dimensionless number given by

Q = \frac{\omega_0}{2\Delta\omega} .
(14.105)

As an example, a two-pole low-pass filter with a resonant peak near the angular frequency ω 0 is described by the transfer function

H(\omega) = \frac{\omega_0^2}{\omega_0^2 - \omega^2 + \mathrm{i}\omega\omega_0/Q} .
(14.106)

Impulse Response

Because filtering is described as a product of Fourier transforms, i.e., in frequency space, the temporal representation of filtering is a convolution

y(t) = \int \mathrm{d}t'\, h(t - t')\, x(t') = \int \mathrm{d}t'\, h(t')\, x(t - t') .
(14.107)

The two integrals on the right are equivalent.

Equation (14.107) is a special form of linear processor. A more general linear processor is described by the equation

y(t) = \int \mathrm{d}t'\, h(t, t')\, x(t') ,
(14.108)

where h(t, t′) permits a perfectly general dependence on t and t′. The special system in which only the difference in time values is important, i.e. h(t, t′) = h(t − t′), is a linear time-invariant system. Filters are time invariant.

A system that operates in real time obeys a further filter condition, namely causality. A system is causal if the output y(t) depends on the input x(t′) only for t′ < t. In words, this says that the present output cannot depend on the future input. Causality requires that h(t) = 0 for t < 0. For the one-pole, low-pass filter of (14.101) the impulse response is

h(t) = \frac{1}{\tau}\, \mathrm{e}^{-t/\tau} \quad \text{for } t > 0 , \qquad h(t) = 0 \quad \text{for } t < 0 , \qquad h(t) = \frac{1}{2\tau} \quad \text{for } t = 0 .
(14.109)

For the two-pole low-pass resonant filter of (14.106), the impulse response is

h(t) = \frac{\omega_0}{\sqrt{1 - [1/(2Q)]^2}}\; \mathrm{e}^{-\frac{\omega_0}{2Q} t}\, \sin\left\{ \omega_0 t \sqrt{1 - [1/(2Q)]^2} \right\} , \quad t \geq 0 , \qquad h(t) = 0 , \quad t < 0 .
(14.110)

Dispersion Relations

The causality requirement on the impulse response, h(t) = 0 for t < 0, has implications for the transfer function. Causality means that the real and imaginary parts of the transfer function are Hilbert transforms of one another. Specifically, if the real and imaginary parts of H are defined as H(ω) = H R(ω) + iH I(ω) then

H_\mathrm{R}(\omega) = \frac{1}{\pi}\, \wp \int_{-\infty}^{\infty} \mathrm{d}\omega'\, \frac{H_\mathrm{I}(\omega')}{\omega' - \omega} ,
(14.111)

and

H_\mathrm{I}(\omega) = -\frac{1}{\pi}\, \wp \int_{-\infty}^{\infty} \mathrm{d}\omega'\, \frac{H_\mathrm{R}(\omega')}{\omega' - \omega} .

The symbol ℘ signifies that the principal value of a divergent integral should be taken. In many cases, this requires no special steps, and definite integrals from integral tables give the correct answers.

These equations are known as dispersion relations. They arise from doing an integral in frequency space to calculate the impulse response for t < 0. The fact that this calculation must return zero means that H(ω) must have no singularities in the complex frequency plane for frequencies with a negative imaginary part. Similar dispersion relations apply to the natural log of the transfer function, relating the filter gain to the phase shift as in (14.100)

G(\omega) = G(0) - \frac{\omega^2}{0.1151\,\pi}\, \wp \int_{-\infty}^{\infty} \mathrm{d}\omega'\, \frac{\Phi(\omega')}{\omega'(\omega'^2 - \omega^2)}
(14.112)

and

\Phi(\omega) = \frac{0.1151\,\omega}{\pi}\, \wp \int_{-\infty}^{\infty} \mathrm{d}\omega'\, \frac{G(\omega')}{\omega'^2 - \omega^2} .

Because G(ω) is even and Φ(ω) is odd, both integrands are even in ω′, and these integrals can be replaced by twice the integral from zero to infinity. The second equation above is particularly powerful. It says that, if we want to find the phase shift of a system, we only have to measure the gain of the system in decibels, multiply by 0.1151, and do the integral. Of course, it is in the nature of the integral that in order to find the phase shift at any given frequency we need to know the gain over a wide frequency range.

The dispersion relations for gain and phase shift also arise from a contour integral over frequencies with a negative imaginary part, but now the conditions on H(ω) are more stringent. Not only must H(ω) have no poles for Im(ω) < 0, but ln H(ω) must also have no poles. Consequently H(ω) must have no zeros for Im(ω) < 0. A system that has neither poles nor zeros for Im(ω) < 0 is said to be minimum phase. The dispersion relations in (14.112) only apply to a system that is minimum phase.

Matched Filtering

One of the most important problems in acoustics, especially in sonar, nondestructive evaluation, and medical imaging [14.2], involves the transmission of a known pulse or signal into the medium under investigation followed by its reception. The noisy, distorted measurement is processed to extract the pulse that has been reflected or scattered back from an object (target or scatterer). For instance, sonar, much like radar, uses this information to obtain the range of the object from the transmitter by simply monitoring the round-trip travel time. In both nondestructive evaluation and medical imaging, an ultrasonic pulse is transmitted into the material or tissue medium under investigation and the received signal is processed to remove the effects of the transmitted pulse (deconvolution) to produce an image of the medium for analysis [14.3].

The basic problem to be solved is that of maximizing the output signal-to-noise ratio (SNROUT) at the receiver. The underlying system model is given by

y ( t ) = h ( r , t ) * x ( t ) + n ( t ) ,
(14.113)

where x is the transmitted pulse, y is the received signal or measurement, h is the impulse response of the medium in both space r and time t, n is the contaminating zero-mean, uncorrelated (white) noise and * is the convolution operation.

The problem is:

Problem

GIVEN a known signal x(t) in additive uncorrelated (white) noise n(t) of (14.113), FIND the optimum filter response, h m (t) that maximizes the output signal-to-noise ratio.

Here the output SNR is defined by

\max_\mathrm{mf} \mathrm{SNR}_\mathrm{OUT} = \frac{\xi_\mathrm{OUT}}{E\{ n^2(t) \}} = \frac{| h_\mathrm{m}(t) * x(t) |^2}{\sigma_n^2} = \frac{\left| \int_0^T \mathrm{d}\tau\, h_\mathrm{m}(\tau)\, x(t - \tau) \right|^2}{\sigma_n^2}
(14.114)

for h m the optimum processor or matched-filter, ξ OUT, the output signal energy and σ n 2 the noise variance.

The solution to this problem is classical and reduces to applying the Schwartz inequality [14.4,5] to the numerator of the previous expression, that is,

| h_\mathrm{m}(t) * x(t) |^2 \leq \xi_\mathrm{mf} \times \xi_x
(14.115)

for ξ the respective energies. When h m (t) is related to x(t) by a constant, say unity, then this relation is satisfied with equality at some time T such that its solution is

h m ( t ) = x ( T - t )
(14.116)

the time-reversed, shifted (by T) signal or replicant. The matched filtering operation applied to the received measurement is therefore

ρ xy ( T - t ) = E { h m ( t ) * y ( t ) } = E { x ( T - t ) * y ( t ) } ,
(14.117)

where ρ xy is the cross-correlation function of the known signal x(t) and the measurement y(t).

As an example, consider the problem of detecting and locating a known acoustic pulse of unity amplitude (inset) transmitted and received on a noisy sensor (0 dB SNR) as shown in Fig. 14.8. The matched-filter is the time-reversed, shifted replicant of the pulse which is used to process the raw data. After processing (cross-correlation) of the data and replicant the result distinctly shows the location of the pulse in the measurement data.

Fig. 14.8
figure 8

Matched-filtering of noisy acoustic pulse: Pulse (inset), measurement/noise (dashed) at 0 dB SNR matched-filter output (filled line)
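A compact Python sketch (all names and numbers are ours, not the data of Fig. 14.8) illustrates the same idea: a known pulse buried in noise at roughly 0 dB SNR is located by cross-correlating the measurement with the replica, which is equivalent to filtering with h_m(t) = x(T − t).

import numpy as np

rng = np.random.default_rng(1)

pulse = np.hanning(64)                       # the known transmitted pulse x(t)
noise_rms = np.sqrt(np.mean(pulse ** 2))     # roughly 0 dB SNR during the pulse
record = rng.normal(0.0, noise_rms, 2048)    # noisy measurement y(t)
true_delay = 900
record[true_delay:true_delay + len(pulse)] += pulse

# Matched filtering = cross-correlation of the measurement with the replica;
# the peak of the output marks the pulse location.
matched_output = np.correlate(record, pulse, mode="valid")
estimated_delay = int(np.argmax(matched_output))
print(estimated_delay)                       # close to 900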

Time-Reversal Processor

One of the more intriguing techniques in acoustic signal processing is based on the concept of time-reversal [14.2]. We have already noted in Sect. 14.7.6 that the optimum matched-filter is the time-reversed, shifted replicant of the transmitted pulse, h m (t) = x(T − t). In digital signal processing (e.g. MATLAB [14.6]), two-pass filter design to remove phase lag is a standard operation that actually time reverses the original filter (f(t)) output to achieve a zero-phase design, that is,

\mathcal{F}[ f(t) * f(-t) ] = [ F(\omega) \times F(-\omega) ] = [ F(\omega) \times F^*(\omega) ] = | F(\omega) |^2 ,
(14.118)

where the superscript * denotes complex conjugation.

In acoustics, time-reversal (T/R) is an intricate part of nondestructive evaluation (flaw detection) and medical operations (lithotripsy) [14.7]. T/R can be applied in two basic scenarios. The first is a monostatic operation in which a transceiver (transmitter/receiver) device transmits the signal or wave (transceiver array) into the medium and receives the reflected or scattered signal. The second is a bistatic operation that occurs when a signal or wave is launched into the medium by a transmitter(s) and captured by a separate receiver(s) [14.8]. The fundamental theory can be found in [14.2] and [14.8] for each case, respectively.

The key to the time-reversal processor evolves from the matched-filter concept in which the known signal is simply replaced by the known impulse response or Greenʼs function of the medium in both space and time/frequency, g(r, r 0;t) or G(r, r 0;ω). For time-reversal, we have that the matched-filter solution is again found by maximizing the output SNR leading to the modified numerator of (14.115)

| h_\mathrm{m}(t) * g(r, r_0; t) |^2 \leq \xi_\mathrm{mf} \times \xi_g ,
(14.119)

that is satisfied with equality at some time T, if

h m ( t ) = g ( r , r 0 ; T - t ) .
(14.120)

Thus, for T/R, the optimal matched-filter solution is the time-reversed, shifted Greenʼs function. Note that in order for the time-reversal property to hold, then spatial reciprocity from source-to-receiver (r 0 → r) must be valid for receiver-to-source (r → r 0) or, formally, g(r, r 0;T − t) ↔ g(r 0, r;T − t).

If an array is employed then these results include the ability to focus the beam on reversal back to the source yielding the optimal space/time matched-filter solution [14.5]. Optimality occurs in temporal (autocorrelation peak) and spatial (array focus) gains. For this case, T/R processing is essentially a technique to focus on a reflective object through a homogeneous or inhomogeneous medium that is excited by a broadband source. It converts a divergent wave generated from a source into a convergent wave focused on the source while compensating for all geometric distortions and reducing the associated noise [14.7].

We illustrate the basic T/R operation (bistatic) in Fig. 14.9 where we observe (1) a point source transmitting through the medium creating a divergent wavefront sampled spatially by the receiver array, much like dropping a pebble into a puddle, that is,

y(r_\ell, t) = g(r_\ell, r; \tau) * \delta(r - r_0; \tau - t) = g(r_\ell, r_0; t) .
(14.121)

The received data is (2) time-reversed

y(r_\ell, -t) = g(r_0, r_\ell; -t)
(14.122)

and re-transmitted propagating back through the medium focusing on the source

\hat{x}(r_0, t) = g(r_\ell, r_0; t) * y(r_\ell, -t) = g(r_\ell, r_0; t) * g(r_0, r_\ell; -t) .
(14.123)

Applying reciprocity, we have (3)

\hat{x}(r_0, t) = \left[ g(r_\ell, r_0; t) * g(r_\ell, r_0; -t) \right] = \mathcal{A}_{gg}(r_\ell, t) ,
(14.124)

where 𝒜 gg (r , t) is the autocorrelation function at the -th sensor. Expanding these relations across an entire L-element receiver array gives

\hat{x}(r_0, t) = \sum_{\ell=1}^{L} \mathcal{A}_{gg}(r_\ell, t) = L \times \bar{\mathcal{A}}_{gg}(r_\ell, t) .
(14.125)
Fig. 14.9
figure 9

Time-reversal processing: (a) Divergent wavefront point source transmission. (b) Reversal convergent wavefront focusing. (c) Time-reversal operation: convolve and sum

It can be observed from this example that the key to T/R processing is to time-reverse the Greenʼs function (replicant) in order to mitigate the effects of the medium and perform the matched filtering of Sect. 14.7.6. In practice, the Greenʼs function in the bistatic case is obtained by first transmitting an impulse-like pulse from the source to the array and then performing the T/R operations above [14.7]. The monostatic case is more complex but achieves similar results [14.2].

As an example, consider a communications problem where a coded information sequence is to be recovered from a signal transmitted through the channel medium to a client [14.8]. The client station first transmits a pilot signal that is a narrow pulse (impulse-like) to the base station receiving array, establishing a link or path (g(r ℓ , r 0 ; t)). Once the link is established, the base station transmits a coded message (information) sequence i(t) consisting of ones and zeros to the client along with the reversed Greenʼs function learned from the received pilot

y(r_\ell, -t) = g(r_0, r_\ell; -t) * i(t)
(14.126)

as shown in Fig. 14.10a where we see the reversed transmission with the coded message highlighted. After propagation through the medium the data are received at the client station as

z_\mathrm{client}(r_0, t) = g(r_\ell, r_0; t) * y(r_\ell, -t) = \left[ g(r_\ell, r_0; t) * g(r_0, r_\ell; -t) \right] * i(t)
(14.127)

Again applying reciprocity and recognizing the autocorrelation (with array gain), we have

z_\mathrm{client}(r_0, t) = L \times \bar{\mathcal{A}}(r_\ell; t) * i(t)
(14.128)

as shown in Fig. 14.10b where we see the original reversed transmission and the desired code along with the recovered symbols. Clearly, placing a threshold at 0 and applying (> 0 → 1) would recover the information perfectly. Thus, T/R establishes a unique link (path) between base and client stations using the learned Greenʼs function, enabling an increase in SNR, both temporal and spatial, as well as establishing a secure channel.

Fig. 14.10
figure 10

Time-reversal communications: (a) Reversed, transmitted information sequence. (b) Zoomed T/R processed data extracting the transmitted code

Spectral Estimation

Unfortunately, measured signals rarely present the desired information in a readily available form; they are contaminated with random noise (Sect. 14.9). Deterministic processing or analysis techniques like the Fourier transform can still be computed, even though theoretically the transform does not exist for this case. However, the Wiener–Khintchine relation (Sect. 14.5) relates the autocorrelation (deterministic) to the power spectrum as a Fourier transform pair. Thus, the basis of random signal processing resides in applying this relation in some form, with the expectation operation incorporated to mitigate the randomness; that is, for the sampled random signal x k we have

P_{xx}(z) = \mathcal{Z}[\rho_{xx}(\ell)] = E\{ X(z)\, X^*(z) \} \quad \text{and} \quad \rho_{xx}(\ell) = \frac{1}{2\pi\mathrm{i}} \oint P_{xx}(z)\, z^{\ell - 1}\, \mathrm{d}z .
(14.129)

where 𝒵[·] indicates the z-transform (Sect. 14.12). Therefore, the autocorrelation and power spectrum provide a way to characterize a random signal statistically.

Classical Spectral Methods

There are a variety of methods to estimate the power spectrum from noisy measurements. Classical methods evolved with the discovery of the fast Fourier transform (FFT) techniques that led to the development of the correlation method of spectral estimation which is an application of the Wiener–Khintchine relation and is given by

\hat{P}_{xx}(z) = \mathcal{Z}[ W(k) \times \hat{\rho}_{xx}(\ell) ] ,
(14.130)

where ρ ^ xx is an estimate of the autocorrelation function using lag sums or the FFT

\hat{\rho}_{xx}(\ell) = \frac{1}{N_x} \sum_{k=0}^{N_x} x(k)\, x(k + \ell) , \quad \ell = 0, 1, 2, \ldots
(14.131)

for lag ℓ, data length N x , and spectral window function W that must satisfy certain positivity constraints (e.g., maximum at the origin) [14.9,10].

Another well-known classical approach, especially useful when long data records are available, is the averaged periodogram method [14.9]. Here windowed data are sectioned, transformed (FFT) to create the periodograms that are then averaged ( P ¯ xx ( z ) ) to produce the estimate,

\hat{P}_{xx}(z) = \bar{P}_{xx}(z) = \frac{1}{N_x} \sum_{n=1}^{N_x} | X(z; n) |^2 .
(14.132)

Both methods involve a bias/variance tradeoff. For the correlation method the choice of spectral window specifies this tradeoff, while the periodogram approach allows a choice of section (window) length and of the number of averages performed.
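A minimal Python sketch of the averaged-periodogram estimator (the function name, window, and density normalization are our choices, not prescribed by the text): the record is cut into sections, each section is windowed and transformed, and the periodograms are averaged.

import numpy as np

def averaged_periodogram(x, fs, section_len=256):
    # Section the data, window each section, and average the periodograms (14.132).
    win = np.hanning(section_len)
    n_sections = len(x) // section_len
    periodograms = []
    for n in range(n_sections):
        seg = x[n * section_len:(n + 1) * section_len] * win
        X = np.fft.rfft(seg)
        # one common density normalization; other conventions differ by constants
        periodograms.append(np.abs(X) ** 2 / (fs * np.sum(win ** 2)))
    freqs = np.fft.rfftfreq(section_len, d=1.0 / fs)
    return freqs, np.mean(periodograms, axis=0)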

Parametric Spectral Methods

Many of the current spectral estimation techniques incorporate parametric models of the signal into the processor. We start with the discrete transfer function model of Sect. 14.12, but with a noisy input for the random case. The signal measurement Y(z) is given by

H(z) = \frac{Y(z)}{E(z)} = \frac{B(z)}{A(z)} ,
(14.133)

where E(z) is white noise with variance σ², B(z) = b_0 + b_1 z^{-1} + ⋯ + b_{N_b} z^{-N_b} and A(z) = 1 + a_1 z^{-1} + ⋯ + a_{N_a} z^{-N_a} are the numerator and denominator polynomials, which can be factored as in Sect. 14.12 into the so-called zeros and poles of H(z). Thus, with this representation various models can be characterized as special cases, that is,

H_\text{pole-zero}(z) = \sigma\, \frac{B(z)}{A(z)} , \qquad H_\text{all-pole}(z) = \frac{\sigma}{A(z)} , \qquad H_\text{all-zero}(z) = \sigma\, B(z) .
(14.134)

Taking inverse 𝒵-transforms of these relations and applying the delay property z^{-q} Y(z) → y_{k−q} gives the equivalent difference equation models as

y_k + a_1 y_{k-1} + \cdots + a_{N_a} y_{k-N_a} = \sigma ( b_0 e_k + b_1 e_{k-1} + \cdots + b_{N_b} e_{k-N_b} ) \quad \text{(pole-zero)} ,
y_k + a_1 y_{k-1} + \cdots + a_{N_a} y_{k-N_a} = \sigma b_0 e_k \quad \text{(all-pole)} ,
y_k = \sigma ( b_0 e_k + b_1 e_{k-1} + \cdots + b_{N_b} e_{k-N_b} ) \quad \text{(all-zero)} ,
(14.135)

where the all-pole form is called an autoregressive (AR) model, the all-zero form a moving-average (MA) model, and the pole-zero form an autoregressive moving-average (ARMA) model. These models prove quite useful in parametric signal processing applications as well as in spectral estimators, because the power spectrum can be represented as

P_\text{pole-zero}(z) = H_\text{pole-zero}(z) \times H_\text{pole-zero}(z^{-1}) = | H_\text{pole-zero}(z) |^2 = \sigma^2\, \frac{|B(z)|^2}{|A(z)|^2} ,
P_\text{all-pole}(z) = H_\text{all-pole}(z) \times H_\text{all-pole}(z^{-1}) = | H_\text{all-pole}(z) |^2 = \frac{\sigma^2}{|A(z)|^2} ,
P_\text{all-zero}(z) = H_\text{all-zero}(z) \times H_\text{all-zero}(z^{-1}) = | H_\text{all-zero}(z) |^2 = \sigma^2\, |B(z)|^2 .
(14.136)

Parametric spectral estimation techniques require two steps: (1) estimate the parameters of the parametric model { a ^ n } , { b ^ n } ; and (2) calculate the corresponding power spectrum as shown above varying z or using the FFT.

A wide variety of methods exist to estimate the power spectrum parametrically [14.11]. Perhaps one of the most popular techniques evolved from the processing of seismic and speech signals [14.9]. It is called linear prediction or equivalently the maximum entropy method (MEM) in which the algorithm is applied to estimate an all-pole model from noisy measurements and then used to estimate the spectrum as in (14.136). The linear predictor solves the following set of equations recursively for the unknown parameters using the well-known Levinson recursion [14.11]

a ^ = R xx - 1 × r xx ,
(14.137)

where R xx is an N a  × N a Toeplitz (covariance) matrix and r xx is an N a  × 1 covariance vector. Once the parameters are estimated using the recursion, the power spectrum is estimated as discussed before.
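A Python sketch of the all-pole (MEM) route (all names ours; the autocorrelation estimate, model order, and frequency grid are assumptions): the Levinson recursion solves the Toeplitz normal equations for the AR coefficients, and the spectrum then follows from the all-pole form in (14.136).

import numpy as np

def levinson(r, order):
    # Levinson-Durbin recursion for the AR (all-pole) coefficients a[1..order]
    # and the prediction-error power, given autocorrelation lags r[0..order].
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def mem_spectrum(x, order=8, n_freq=512):
    x = np.asarray(x, dtype=float)
    N = len(x)
    # biased autocorrelation estimate, lags 0..order, in the spirit of (14.131)
    r = np.array([np.dot(x[:N - l], x[l:]) / N for l in range(order + 1)])
    a, err = levinson(r, order)
    w = np.linspace(0.0, np.pi, n_freq)      # normalized angular frequency
    k = np.arange(order + 1)
    A = np.array([np.sum(a * np.exp(-1j * wk * k)) for wk in w])
    return w, err / np.abs(A) ** 2           # all-pole spectrum, sigma^2 / |A|^2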

Another technique that has evolved is the minimum variance distortionless response (MVDR) processor that is considered a data adaptive method because it designs a set of optimal narrowband filters at each spectral frequency bin while minimizing the measurement noise variance. It is similar to the classical correlation/periodogram techniques with the exception that the narrowband filters adapt to the process. Theoretically, the MVDR processor attempts to minimize the output (noise) power at each spectral bin subject to the constraint that the narrowband filters pass center frequencies (ω m ) at each bin with unity gain, |F(ω m )| = 1. Mathematically, the formal problem is to

\min_{\mathbf{f}}\; \mathbf{f}^\mathrm{T} \mathbf{R}_{yy} \mathbf{f} \quad \text{subject to} \quad |F(\omega_m)| = 1 ,
(14.138)

where f is an N y  × 1 vector of filter weights, R yy is the N y  × N y covariance matrix (Toeplitz). The constraint in the Fourier domain is

| F ( ω m ) | = 1 = d T ( ω m ) × f
(14.139)

for d^T(ω_m) = [ 1   e^{-iω_m}   ⋯   e^{-i N_y ω_m} ].

The solution to this problem is obtained by applying Lagrange multipliers (see [14.11] for more details), giving

P_{\text{MVDR}}(\omega_m) = \frac{1}{\mathbf{d}^{\dagger}(\omega_m)\,\mathbf{R}_{yy}^{-1}\,\mathbf{d}(\omega_m)}\,, \quad m = 1, 2, \ldots\,,
(14.140)

where † stands for the Hermitian (conjugate) transpose operator.
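A minimal sketch of the MVDR estimator of (14.140), assuming NumPy and an estimated N_y × N_y covariance matrix R_yy (e.g., averaged over data segments); the helper name and arguments are illustrative:

```python
import numpy as np

def mvdr_spectrum(R_yy, freqs, fs):
    # P_MVDR(w_m) = 1 / (d^H(w_m) R_yy^{-1} d(w_m)), as in (14.140)
    Ny = R_yy.shape[0]
    R_inv = np.linalg.inv(R_yy)
    n = np.arange(Ny)
    P = np.empty(len(freqs))
    for m, f in enumerate(freqs):
        d = np.exp(-1j * 2 * np.pi * f / fs * n)   # constraint (steering) vector
        P[m] = 1.0 / np.real(d.conj() @ R_inv @ d)
    return P
```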

In practice, the MVDR method exhibits more resolution than the classical correlation/periodogram estimators, but less than the all-pole (AR) technique. It is interesting to note that there exists a relationship between the MEM and MVDR spectral estimators called Burgʼs formula [14.11] given by

\frac{1}{P_{\text{MVDR}}(\omega)} = \frac{1}{N_a}\sum_{n=1}^{N_a}\frac{1}{P_{\text{MEM}}(\omega; n)}\,,
(14.141)

which provides an alternative method of calculating P_MVDR by averaging over all of the lower-order AR models.

Subspace Spectral Methods

Another class of spectral estimators that has evolved is that of the eigenvector or subspace methods. These follow from the covariance matrix R_yy and its eigen-decomposition. An eigenvalue λ of a matrix R is defined as a root of det(R − λI), with corresponding eigenvector e satisfying the relation (R − λI) × e = 0. The idea is based on estimating sinusoidal signals in uncorrelated noise [14.11] and was extended to the case of multiple sinusoids. The well-known multiple signal classification (MUSIC) method of spectral estimation is a special case of the eigenvector methods with the eigenvalues set to unity [14.12].

A suite of algorithms evolved with the basic idea of first finding the rank of R yy or equivalently the dimension (N e) of the signal subspace by eigen-decomposition. If e n is the n-th eigenvector of R yy , then the corresponding frequency (spectral) line estimator is given by

P_{\text{EIG}}(\omega_m) = \frac{1}{\displaystyle\sum_{n=N_e+1}^{N_y}\left|\frac{1}{\lambda_n}\,\mathbf{d}^{\dagger}(\omega_m)\,\mathbf{e}_n\right|^{2}}\,, \quad m = 1, 2, \ldots\,,
(14.142)

where N_e is the number of independent signal eigenvectors, or equivalently the rank of R_yy.
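A minimal MUSIC sketch of the subspace idea, assuming NumPy: the covariance matrix is eigen-decomposed, the N_e signal eigenvectors are discarded, and the pseudospectrum is formed from the noise subspace with the eigenvalue weights of (14.142) set to unity, as the text describes for MUSIC. Names are illustrative:

```python
import numpy as np

def music_spectrum(R_yy, Ne, freqs, fs):
    # MUSIC pseudospectrum: eigenvector estimator with noise eigenvalues set to unity.
    Ny = R_yy.shape[0]
    lam, E = np.linalg.eigh(R_yy)          # eigenvalues in ascending order
    noise_vecs = E[:, :Ny - Ne]            # eigenvectors spanning the noise subspace
    n = np.arange(Ny)
    P = np.empty(len(freqs))
    for m, f in enumerate(freqs):
        d = np.exp(-1j * 2 * np.pi * f / fs * n)
        P[m] = 1.0 / np.sum(np.abs(d.conj() @ noise_vecs) ** 2)
    return P
```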

It should also be noted that the temporal spectral and frequency-line estimators can be extended to the spatial domain, that is, to spatial spectral (power) estimation. These techniques lead to the popular direction-of-arrival (DOA) or localization methods, mimicking those of the time domain with a space-time signal replacing the time series, x(r, t) → x(t), the spatial wavenumber replacing the temporal frequency, κ_m → ω_m, and therefore the spatial power spectrum replacing the temporal spectrum, P(κ_m) → P(ω_m) [14.12], [14.13].

Consider an application of these methods to synthesized data consisting of sinusoids (35 Hz, 50 Hz, 56 Hz, 90 Hz) in broadband (30–70 Hz) noise at 0 dB SNR in which we:

  • Constructed an ensemble of 100 realizations of the process;

  • Selected a spectral estimation method: FFT, correlation, periodogram, MVDR, MEM, MUSIC;

  • Performed the spectral estimation for each ensemble member (green in Fig. 14.11);

  • Estimated the average power spectrum over the ensemble (red in Fig. 14.11); and

  • Estimated the average spectral peaks (blue list in Fig. 14.11).

The results of the application are shown in Fig. 14.11. Noting the performance of each of the estimators, we see some of their interesting characteristics. In Fig. 14.11a the FFT method produced a set of very sharp spectral peaks in the correct frequency locations but unfortunately also included many extraneous peaks due to the randomness of the bandpass noise. The correlation method (Fig. 14.11b) performed similarly but used the windowed covariance function to reduce the noise, while the periodogram method (Fig. 14.11c) smoothed the spectrum, eliminating some of the desired peaks. The MVDR method (Fig. 14.11d) produced an enhanced spectral estimate with noise reduction capability as well. Both the MEM and MUSIC frequency-line estimators (Fig. 14.11e,f) performed quite well and extracted the sinusoidal spectral lines reliably with smaller variances. This completes the section on spectral estimation using classical, parametric and eigen-decomposition techniques.

Fig. 14.11
figure 11

Power spectral estimation of sinusoids (35 Hz, 50 Hz, 56 Hz, 90 Hz) in bandpass (30–70 Hz) noise at 0 dB SNR: (a) FFT method, (b) correlation method, (c) average periodogram method, (d) MVDR method, (e) MEM method, (f) MUSIC method

Thus, classical spectral estimators fall into the all-zero (MA) category, the parametric estimators fall into the pole-zero (ARMA) representations, and the eigen-decomposition processors fall into the subspace category.

Spectrogram

An extension of spectral estimation techniques to time-varying spectra enables the development of the spectrogram which is a member of the time-frequency class of spectra (frequency versus time versus power). Evolving from speech and sonar (bearing-time) system developments, the spectrogram is a powerful tool [14.3,9]. It can be calculated using both classical and parametric methods. The classical spectrogram estimator uses the short-time (or windowed) Fourier transform (STFT) in which a small (in length) data window is slid through the data record using the FFT spectral approach, while the parametric approach relies on the sequential estimation of model (AR, MA, ARMA) parameters providing an instantaneous spectral estimate at each step (if desired) [14.3].

The spectrogram or more properly instantaneous power spectrum is defined by P(z;k), where z is the transform (frequency) and k is the time sample (index). The classical FFT spectrogram estimator is given by

P(z;k) = \left|\,\mathcal{F}\{W_k\, y_k\}\,\right|^{2}
(14.143)

for W the short-time window function, while the parametric approach for the pole-zero (ARMA) spectral estimator is

\hat{P}_{\text{ARMA}}(z;k) = \left|\frac{\sigma\,\hat{B}(z;k)}{\hat{A}(z;k)}\right|^{2}\,,
(14.144)

where the polynomials B(z;k) and A(z;k) are identical to those of (14.133) but with time-varying coefficients, {b_n → b_n(k); a_n → a_n(k)}.
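A minimal sketch of the classical (STFT) spectrogram of (14.143), assuming NumPy; the window length, hop size, and Hann window are illustrative choices:

```python
import numpy as np

def stft_spectrogram(y, fs, nwin=256, hop=64):
    # Slide a short window through the record, FFT each windowed segment,
    # and take the squared magnitude (14.143).
    w = np.hanning(nwin)
    starts = range(0, len(y) - nwin + 1, hop)
    P = np.array([np.abs(np.fft.rfft(w * y[k:k + nwin])) ** 2 for k in starts])
    freqs = np.fft.rfftfreq(nwin, d=1.0 / fs)
    times = np.array([k / fs for k in starts])
    return times, freqs, P   # P[i, j]: power at time times[i], frequency freqs[j]
```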

The parameter estimation technique is sequential in that each time sample is processed individually and the ARMA parameters updated recursively (in time), that is,

\hat{\Theta}_k = \hat{\Theta}_{k-1} + G_k\,(y_k - \hat{y}_{k|k-1})\,,
(14.145)

where Θ is the joint parameter (a and b coefficients) vector, G is the weight vector and y ^ is the one-step prediction of y based on k − 1 samples [14.3]. Once these parameters are estimated, the spectrogram is estimated instantaneously at k (if desired) as P ^ ARMA ( z ; k ) .

Reconsider the bandpass sinusoids example of the previous sections and process it with the ARMA spectrogram estimator. The results are shown in Fig. 14.12. Here we see that the spectra at the various sinusoidal frequencies appear as horizontal lines of constant frequency (red) across the spectrogram after the initial transients settle down. If at any time there were changes in the spectrum, the lines would indicate these changes. This approach is especially useful in condition monitoring for vibrations in machinery [14.14]. This completes the section; next we extend many of these ideas to incorporate more direct knowledge of the underlying acoustics into the processors.

Fig. 14.12
figure 12

Spectrogram of sinusoids in bandpass noise at 0 dB SNR using an ARMA parametric processor

Model-Based Signal Processing

Inherently, it seems that the more a-priori knowledge about the measurement and its underlying acoustics that can be incorporated into the processor, the better we can expect it to perform – as long as the information that is included is correct. This strategy is the essence of model-based signal processing, and many believe that all signal processing schemes can be cast into this generic framework. Simply stated, the model-based approach is

incorporating mathematical models of both physical phenomenology and the measurement process (including noise/uncertainty) into the processor to extract the desired information.

This approach provides a mechanism to include knowledge of the underlying acoustics in the form of mathematical propagation models, measurement system models, and accompanying uncertainties such as instrumentation noise or ambient noise, as well as model uncertainties, directly into the resulting processor [14.3]. In this way the model-based processor (MBP) enables the interpretation of results directly in terms of the problem acoustics. The model-based processor is really an acoustic modelerʼs tool, enabling the incorporation of any a-priori information about the particular application problem to extract the desired information. The fidelity of the model determines the complexity of the processor. These models can range from simple, implied, non-physical representations of the measurement data, such as the Fourier or wavelet transforms, to parametric models used for data prediction, to lumped mathematical-physical representations characterized by ordinary differential equations, to full physical partial differential equation models capturing the critical details of the acoustic wave propagation in a complex medium. The dominating factor in deciding which model is the most appropriate is usually how severely the measurements are contaminated with noise and the underlying uncertainties, encompassing the philosophy of letting the problem dictate the approach. If the signal-to-noise ratio of the measurements is high, then simple non-physical techniques can be used to extract the desired information. As the noise and uncertainty increase, however, the complexity of the model set must usually increase to achieve the desired results.

A simple spectral estimation example of estimating sinusoids in noise, as shown in Fig. 14.13, can be used to illustrate this approach. Suppose we have a noisy acoustical measurement (Fig. 14.13a) of an oscillation in random noise (SNR = 0 dB) and we would like to extract the desired information (the oscillation frequency). Our first, simple approach to analyzing the measurement data would be to take its Fourier transform and investigate the various frequency bands for resonant peaks. The result is shown in Fig. 14.13b, where we basically observe a noisy spectrum and a set of potential resonances – but nothing really conclusive. Next we apply a broadband power spectral estimator, with the resulting spectrum shown in Fig. 14.13c. Here we note that the resonances have clearly been enhanced and appear in well-defined bands while the noise is attenuated by the processor, but there still remains a significant amount of uncertainty in the spectrum due to all of the resulting spectral peaks. Upon seeing these resonances in the power spectrum, we might next proceed to a parametric model to enhance the resonances even further, using our a-priori knowledge that there is essentially one dominant resonance we seek. The results of applying this processor are shown in Fig. 14.13d.

Fig. 14.13
figure 13

Simple sinusoid in noise oscillation example. (a) Noisy oscillation (10.54 Hz) in noise (0 dB). (b) Fourier spectrum. (c) Correlation method spectrum. (d) MEM method spectrum (AR model). (e) Model-based method spectrum (ODE)

Finally, we use this extracted model to construct an explicit model-based processor (MBP) by writing a set of harmonic equations for a sinusoid in noise and building the MBP from these relations.

\begin{aligned} y_k &= \alpha\sin(\omega_0 k) + n_k\,, && \text{(measurement/noise model)}\\ \hat{y}_k &= \hat{\alpha}\sin(\hat{\omega}_0 k)\,, && \text{(MBP)}\\ \hat{P}_{\hat{y}\hat{y}}(z) &= \mathcal{Z}\big[\hat{\rho}_{\hat{y}\hat{y}}(\ell)\big]\,, && \text{(MBP-PSD)} \end{aligned}
(14.146)

where the MBP uses knowledge of the sinusoidal model and the noise statistics to estimate the parameters (α̂, ω̂_0) and produce the enhanced spectral estimate. The results for the MBP are shown in Fig. 14.13e. So we see that once we have defined the acoustical problem and assessed the a-priori information, including the underlying phenomenology, we can proceed to incorporate more sophisticated acoustic models into the paradigm.

The Wiener Filter

Model-based signal processing requires three primary ingredients to develop a processor:

  1. The model

  2. The criterion

  3. The algorithm.

Typically, once 1. and 2. are specified, then 3. follows.

One of the most robust input/output models available in acoustic signal processing is the finite impulse response (FIR) or equivalently moving-average (MA) model (h k ) expressed by the convolution operation of (14.45). One of the most popular applications of the FIR model is to identify an unknown system or black box by exciting it with a known input and measuring its output in order to provide data to estimate its impulse response. This is the original Wiener filter problem and solutions have been obtained in both time and frequency domains.

Suppose an unknown system is excited by an input x_k and its output y_k is measured. We would like to obtain a linear estimate of the embedded signal s_k based on past excitation data, that is,

s_k = \sum_{n=0}^{N_h} h_n\, x_{k-n}\,.
(14.147)

If we represent the measurement y k as

y k = s k + n k ,
(14.148)

where n is zero-mean, white noise of variance R nn , then the optimal estimator is the solution to

\min_{h}\;\mathcal{J}_k = E\{e_k^2\}\,,
(14.149)

where e k : = y k  − s k .

Following the standard minimization approach, we differentiate (14.149) with respect to h j , set the result to zero and obtain the orthogonality condition

\frac{\partial\mathcal{J}}{\partial h_j} = 2\,E\left\{e_k\,\frac{\partial e_k}{\partial h_j}\right\} = 0\,.
(14.150)

The error gradient is

\frac{\partial e_k}{\partial h_j} = \frac{\partial}{\partial h_j}\,(y_k - s_k) = \frac{\partial}{\partial h_j}\left(y_k - \sum_{n=0}^{N_h} h_n\, x_{k-n}\right) = -\,x_{k-j}\,,
(14.151)

and therefore substituting (14.151) into (14.150), we obtain the orthogonality conditions

E\{e_k\, x_{k-j}\} = E\left\{\left(y_k - \sum_{n=0}^{N_h} h_n\, x_{k-n}\right) x_{k-j}\right\} = 0\,, \quad j = 0, \ldots, N_h\,,
(14.152)

which yields the normal equations or the discrete Wiener–Hopf equations for the estimation problem as

\sum_{n=0}^{N_h} h_n\, R_{xx}(n - j) = r_{yx}(j)\,, \quad j = 0, \ldots, N_h\,.
(14.153)

Expanding these equations over the indices enables us to obtain

\mathbf{R}_{xx}\,\mathbf{h} = \mathbf{r}_{yx}\,,
(14.154)

where R_xx ∈ ℝ^{(N_h+1)×(N_h+1)} is a Toeplitz matrix. The optimal estimate, ĥ, is obtained by inverting R_xx to give the Wiener filter solution,

\hat{\mathbf{h}} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{yx}\,.
(14.155)

We summarize the batch FIR solution as:

  • Criterion: J = E{e_k²};

  • Models:

    • Measurement: y k  = s k  + n k ,

    • Signal: s_k = Σ_{n=0}^{N_h} h_n x_{k−n},

    • Noise: R nn ,

  • Algorithm: ĥ = R_xx^{-1} r_yx.
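The batch solution summarized above can be sketched directly, assuming NumPy; the correlation estimates and the helper name are illustrative:

```python
import numpy as np

def wiener_fir(x, y, Nh):
    # Batch Wiener (FIR) solution of (14.155): h = R_xx^{-1} r_yx, where R_xx is
    # the (Nh+1) x (Nh+1) Toeplitz autocorrelation matrix of the excitation x.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(x)
    rxx = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(Nh + 1)])
    ryx = np.array([np.dot(y[m:], x[:N - m]) / N for m in range(Nh + 1)])  # E{y_k x_{k-j}}
    lag = np.abs(np.subtract.outer(np.arange(Nh + 1), np.arange(Nh + 1)))
    R = rxx[lag]                                  # Toeplitz matrix built from rxx
    return np.linalg.solve(R, ryx)
```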

From this solution a variety of other problems can be solved such as the optimal deconvolution problem, in which the output and impulse response sequences {y k } and {h k } are given and the input sequence { x ^ k } is to be found. That is, the Wiener solution to this problem is

\hat{\mathbf{x}} = \mathbf{R}_{hh}^{-1}\,\mathbf{r}_{yh}\,,
(14.156)

because convolution is a commutative operation (y = h∗x = x∗h).

A related application is that of time delay estimation where we assume a homogeneous medium (attenuation and delay)

h_k = \sum_{m=1}^{N_d}\alpha_m\,\delta(k - \tau_m)\,,
(14.157)

where α_m ≤ 1 are the weights (attenuation), τ_m are the delays, and h_k is essentially an unequally spaced sequence of impulses of decreasing amplitude, that is, a delayed, attenuated impulse train. The objective in this delay estimation problem is to extract (estimate) the impulse train and locate the peaks corresponding to the delay times.

We apply this model-based approach to synthesized 5 MHz ultrasonic data sampled every 20 ns. The objective is the NDE of a material for flaw damage (pits) caused by ablation. The synthesized data are shown in Fig. 14.14a along with the resulting estimated response, or fit, using the optimal estimator of (14.156), ŷ_k = ĥ_k ∗ x_k. Here the order of the FIR processor was selected to be N_h = 2500 weights for the 2750-sample simulated data set. Clearly, the estimated response overlays the data quite well, providing a sanity check on the processor. In Fig. 14.14b, the estimated impulse response is shown along with its corresponding envelope. Here we note that in the physical application the ideal delayed/attenuated impulse train corresponding to the reflections from the flaws is distorted by the more realistic, highly scattering medium. However, the peaks corresponding to the delays indicating the flaw locations are clearly discernible. A practical approach is to use the squared impulse response, select a reasonable threshold for peak detection, and locate the peaks. The results of this operation along with the corresponding envelope are shown in Fig. 14.14c. The estimated delays correspond reasonably well to the known flaw locations estimated from the delays and the known sound speed of the material.

Fig. 14.14
figure 14

Optimal time delay estimation results for NDE application: (a) Overlay of synthesized and estimated NDE data. (b) Estimated impulse response (time delays) and envelope. (c) Squared, thresholded, peak detected impulse response for time-delay estimates

We summarize the all-zero time-delay estimation approach as follows:

  • Criterion: J = E{ϵ 2 k };

  • Models:

    • Measurement: y k  = s k  + n k ,

    • Signal: s k  = h(r;k)*p k ,

    • Noise: R nn ,

    • Algorithm: h ^ = R pp - 1 r yp ,

    • Peak detect: τ̂_k = max_k ĥ_k ∀ k.

The Kalman Filter

The Kalman filter is the optimal, linear, recursive, state (signal) estimator in Gaussian noise [14.3,15]. It incorporates all of the information available. The filter processes the measurement data to provide an estimate of the parameters (states, signals, variables, etc.) by incorporating:

  1. Process (acoustics) and measurement system knowledge (dynamics)

  2. Statistical knowledge of process and measurement noise as well as model uncertainties

  3. Any information about the initial conditions or required parameters.

It combines all of this information along with the data to provide the best estimate possible.

The Kalman filter can be thought of as a signal reconstructor, a measurement filter and a whitening filter all wrapped up into one processor. The mathematics become quite involved and various derivations evolve from the purely Bayesian perspective, that is, the Kalman filter can be thought of as an optimal processor recursively calculating the required conditional means and covariances of the posterior Gaussian distribution of the signals (states). It can also be thought of as an orthogonal projector via the Gram–Schmidt procedure over a random vector space or equivalently the innovations approach to orthogonal decomposition. It can also be thought of as the minimum mean-squared error processor evolving from a weighted least-squares perspective. All of these approaches lead to the same solution, the Kalman filter, only because there is one unique optimum for this estimation problem.

Perhaps the simplest way to approach the development of this model-based processor is by observing its recursive form. All recursive (or sequential) processors have the form

S ^ NEW = S ^ OLD + Ke ,
(14.158)

where the NEW estimate is simply the OLD estimate ( S ^ ) plus a K-weighted error (e) term. This is precisely the recursive form of the Kalman filter

\underbrace{\hat{s}_{k|k}}_{\text{NEW}} = \underbrace{\hat{s}_{k|k-1}}_{\text{OLD}} + \underbrace{K_k}_{\text{GAIN}}\,\underbrace{(y_k - \hat{y}_{k|k-1})}_{\text{ERROR}}\,,
(14.159)

where s ^ k | k is the signal or state estimate at time k given all of the data up to time k, K k is the weight or gain and e k is the innovation or residual error given by the difference between the measurement data y k and its corresponding estimate y ^ k | k - 1 . In a nutshell, (14.159) is the primary Kalman filter recursion which follows the simple recursive form of (14.158). The rest of the calculations are based on the model and conditional covariances that are also calculated recursively.

The underlying representation for this processor is the Gauss–Markov model in state-space form [14.3] given by

s_k = A\,s_{k-1} + B\,u_{k-1} + w_{k-1}\,, \qquad y_k = C\,s_k + v_k\,,
(14.160)

for s, u, y the N_s-signal (state), N_u-input (deterministic) and N_y-measurement vectors, w, v the zero-mean, random Gaussian noise vectors with covariance matrices R_ww and R_vv, and the appropriately dimensioned system and measurement matrices A, B, C. Note that this model is a linear transformation of Gaussian processes, causing all of the distributions to be Gaussian, and it has the Markov property that the signal at k depends only on its past value at k − 1.

The simplest example of a Kalman filter is the recursive form for the sample mean estimator. If the sample mean is defined by

\hat{S}(K) = \frac{1}{K}\sum_{n=1}^{K} s_n\,,

then removing the K-th term from the sum, and recognizing the expression for S ^ ( K - 1 ) leads to

\hat{S}(K) = \frac{1}{K}\,s_K + \frac{K-1}{K}\,\hat{S}(K-1)\,,

or rearranging, we obtain the desired recursive form

\underbrace{\hat{S}(K)}_{\text{new}} = \underbrace{\hat{S}(K-1)}_{\text{old}} + \underbrace{\tfrac{1}{K}}_{\text{gain}}\,\underbrace{\big[\,s_K - \hat{S}(K-1)\,\big]}_{\text{error}}\,.

To obtain some insight into the filter operation, we can think of it in terms of a predictor-corrector paradigm in which the underlying phenomenological/measurement/noise models are used to predict the signal in-between the arrival of a new measurement. Once the new measurement data is available, the processor corrects the predicted signal estimate (OLD) as in (14.159). The prediction-step embedding the dynamic model is given by

\hat{s}_{k|k-1} = A\,\hat{s}_{k-1|k-1} + B\,u_{k-1} \quad (\text{signal prediction})\,,
(14.161)

which is accompanied by its corresponding prediction error covariance, P ~ k | k - 1 : = cov ( s k - s ^ k | k - 1 ) for

\tilde{P}_{k|k-1} = A\,\tilde{P}_{k-1|k-1}\,A^{\mathrm{T}} + R_{ww} \quad (\text{covariance prediction})\,,
(14.162)

incorporating A the process (system) matrix, P ~ the previous error covariance and R ww , the process noise or uncertainty covariance matrix. So we see that the dynamical model (A,B) and statistics (R ww , P ~ ) are incorporated into this step quite naturally.

The correction-step (or update) incorporates the measurement data into the processor through the recursive form of (14.159) discussed earlier, that is,

\underbrace{\hat{s}_{k|k}}_{\text{corrected}} = \underbrace{\hat{s}_{k|k-1}}_{\text{predicted}} + \underbrace{K_k\, e_k}_{\text{DATA}} \quad (\text{signal correction})\,,
(14.163)

along with the corresponding innovation (e k ) and corrected error covariance, P ~ k | k : = cov ( s k - s ^ k | k ) given by

e_k = y_k - \hat{y}_{k|k-1} = y_k - C\,\hat{s}_{k|k-1} \quad (\text{error innovation})\,,
(14.164)

and

\tilde{P}_{k|k} = \tilde{P}_{k|k-1} - K_k\, C\,\tilde{P}_{k|k-1} \quad (\text{covariance correction})\,,
(14.165)

incorporating the gain, K k and measurement system model, C along with the predicted error covariance of the previous step. The corrected error covariance enables us to observe the quality (error) of the signal estimate. The innovation (measurement error) plays a key role in monitoring filter performance (on-line), since the optimal solution (in theory) requires that {e k } be a zero-mean and uncorrelated (white) sequence, if the processor is operating properly. Performing whiteness tests is a great advantage compared to other statistical processors. The accompanying innovations covariance is also computed as part of the scheme

R_{ee} = C\,\tilde{P}_{k|k-1}\,C^{\mathrm{T}} + R_{vv} \quad (\text{innovations covariance})\,,
(14.166)

which incorporates both measurement system model (C) and measurement statistics (R vv ).

Finally, to complete the processor, the weight or gain (Kalman gain) is given by

K_k = \tilde{P}_{k|k-1}\,C^{\mathrm{T}}\,R_{ee}^{-1} \quad (\text{gain})\,.
(14.167)
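Collecting (14.161)–(14.167), one predict/correct cycle can be sketched as follows, assuming NumPy and appropriately dimensioned arrays; the function name is illustrative:

```python
import numpy as np

def kalman_step(s, P, y, A, B, C, u, Rww, Rvv):
    # One predict/correct cycle of the Kalman filter, (14.161)-(14.167).
    # Prediction
    s_pred = A @ s + B @ u                       # signal prediction
    P_pred = A @ P @ A.T + Rww                   # covariance prediction
    # Correction
    e = y - C @ s_pred                           # innovation
    Ree = C @ P_pred @ C.T + Rvv                 # innovations covariance
    K = P_pred @ C.T @ np.linalg.inv(Ree)        # Kalman gain
    s_corr = s_pred + K @ e                      # signal correction
    P_corr = P_pred - K @ C @ P_pred             # covariance correction
    return s_corr, P_corr, e
```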

We summarize the Kalman filter or state-space MBP steps in Fig. 14.15.

Fig. 14.15
figure 15

Kalman Filter MBP: Predictor-Corrector Operations

Next let us consider a structural vibration signal enhancement problem. The structure (one story) is governed by the dynamic equation [14.14]

m\,\ddot{x}(t) + c\,\dot{x}(t) + k\,x(t) = f(t)\,,
(14.168)

where x is the displacement, f is the forcing function and m, c, k are the respective mass, damping and spring constants with the displacement measurement of gain G given by

y ( t ) = Gx ( t ) .
(14.169)

The objective is to develop the Kalman filter for this problem. First, we define the state vector as s(t) := [x(t)  ẋ(t)]^T and the input as u(t) = f(t). Solving for the highest derivative in (14.168), we obtain the state-space relations

\dot{s}(t) = A_c\, s(t) + B_c\, u(t)\,, \qquad y(t) = C_c\, s(t)
(14.170)

with continuous-time (subscript c) representations

A_c = \begin{bmatrix} 0 & 1\\ -\dfrac{k}{m} & -\dfrac{c}{m} \end{bmatrix}, \quad B_c = \begin{bmatrix} 0\\ \dfrac{1}{m} \end{bmatrix}, \quad C_c = \begin{bmatrix} G & 0 \end{bmatrix}.
(14.171)

The data are digitized, so it is necessary to use a sampled-data representation that we develop using first differences

\dot{s}(t) \approx \frac{s_k - s_{k-1}}{\Delta t_k}

by substituting this approximation into (14.170) and extending it to a Gauss–Markov representation to obtain

s_k = (I + \Delta t_k A_c)\, s_{k-1} + \Delta t_k B_c\, u_{k-1} + \Delta t_k\, w_{k-1}\,, \qquad y_k = C_c\, s_k + v_k\,,
(14.172)

or defining discrete notation, we have

A := I + \Delta t_k A_c\,, \quad B := \Delta t_k B_c\,, \quad C := C_c\,, \quad R_{ww} := \Delta t_k R_{ww}\,, \quad R_{vv} := R_{vv}\,,
(14.173)

and therefore, the sampled-data system of (14.172) can be expressed simply as

s_k = A\,s_{k-1} + B\,u_{k-1} + w_{k-1}\,, \qquad y_k = C\,s_k + v_k\,.

Now the Kalman filter equations follow directly

\hat{s}_{k|k-1} = A\,\hat{s}_{k-1|k-1} + B\,u_{k-1} \quad (\text{prediction})\,,
e_k = y_k - \hat{y}_{k|k-1} = y_k - C\,\hat{s}_{k|k-1} \quad (\text{innovation})\,,
\hat{s}_{k|k} = \hat{s}_{k|k-1} + K_k\, e_k \quad (\text{correction})\,,

and the other required matrices follow. This completes the formulation.
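As a sketch, the discrete Gauss–Markov matrices of (14.173) for this structure can be assembled as follows (assuming NumPy; the helper name is illustrative) and then passed to a Kalman recursion such as the kalman_step sketch above:

```python
import numpy as np

def structure_model(m, c, k, G, dt):
    # First-difference discretization of (14.170)-(14.173) for the
    # one-story mass-spring-damper structure; returns discrete (A, B, C).
    Ac = np.array([[0.0, 1.0],
                   [-k / m, -c / m]])
    Bc = np.array([[0.0], [1.0 / m]])
    Cc = np.array([[G, 0.0]])
    A = np.eye(2) + dt * Ac
    B = dt * Bc
    return A, B, Cc
```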

Concluding this section, we consider the problem of the passive localization of a planar (nonlinear in parameters) source or target. This problem occurs in a variety of applications such as the seismic localization of an earthquake using an array of seismometers, the passive localization of a target in ocean acoustics or the localization of a flaw in NDE. For our problem in ocean acoustics the model-based approach incorporates the underlying physics represented by an acoustic propagation (process) model depicting how the sound propagates from a source to the sensor (measurement) array of acoustic hydrophones. The statistics of the noise from the background or ambient noise, shipping noise, or uncertainty in the model parameters provides input to both the process and measurement system models. Besides the model parameters and initial conditions, the raw measurement data is input to the processor with the output being the filtered signal and unknown parameters.

Assume that a 50 Hz (ω 0) nonlinear plane wave source (target) at a bearing angle of 45° (θ 0) is impinging on a 2-element array at a 10 dB SNR. The plane wave signal is characterized mathematically by

s_{\ell}(t) = \alpha\, e^{\,i\kappa_0(\ell-1)\Delta\sin\theta_0 \,-\, i\omega_0 t}\,,
(14.174)

where s_ℓ(t) is the space-time signal measured by the ℓ-th sensor, α is the plane wave amplitude factor, and κ_0, Δ, θ_0, ω_0 are the respective wavenumber, sensor spacing, bearing angle, and temporal frequency parameters. We would like to solve the basic ocean acoustic localization problem by estimating the target bearing angle θ_0 and the temporal frequency ω_0. The basic problem geometry and synthesized measurements (pressure-field) are shown in Fig. 14.16.

Fig. 14.16
figure 16

Plane wave propagation: (a) Problem geometry. (b) Synthesized 50 Hz, 45°, plane wave impinging on a 2-element sensor array at 10 dB SNR

For the plane wave, we have the following models:

  • Signal model:

    s_ℓ(t) = α e^{iκ_0(ℓ−1)Δ sin θ_0 − iω_0 t},
  • Measurement model:

    p_ℓ(t) = s_ℓ(t) + n_ℓ(t),
  • Noise model:

    n_ℓ ∼ 𝒩(0, σ_n²),

where n_ℓ(t) is zero-mean, random (uncorrelated) Gaussian noise with variance σ_n². We use the notation “𝒩(m, v)” to denote a Gaussian or normal probability distribution with mean m and variance v.

In essence, this is a problem of estimating a set of parameters, {θ_0, ω_0}, from noisy array pressure-field measurements, {p_ℓ(t)}. More formally, the target bearing angle and frequency estimation problem is stated as:

Problem

GIVEN a set of noisy array measurements {p_ℓ(t)}, FIND the best estimates of the target bearing angle (θ_0) and temporal frequency (ω_0) parameters, θ̂_0 and ω̂_0.

The classical approach to this problem is to first take one of the sensor channels and perform power spectral analysis of the filtered time series to estimate the temporal frequency ω 0. The bearing angle can be estimated independently by performing classical spatial spectral estimation (beamforming) [14.13] on the array data. The spatial spectral estimator is scanned over bearing angle indicating the true source location at the spectral peak of maximum power. The results of applying this approach to our problem are shown in Fig. 14.17a depicting the outputs of both spectral estimators peaking at the correct frequency and angle parameters.

Fig. 14.17
figure 17

Plane wave impinging on a 2-element sensor array – frequency and bearing estimation problem. (a) Classical spectral (temporal and spatial) estimation approach. (b) Model-based approach using parametric adaptive processor to estimate bearing angle, temporal frequency, and the corresponding residual or innovations (error) sequence

The MBP is implemented by incorporating the plane wave propagation, hydrophone array, and statistical noise models; however, the temporal frequency and bearing angle parameters are now unknown and must be estimated jointly along with the simultaneous enhancement of the pressure-field. The solution to this problem is performed by solving a joint (parameter/enhancement) estimation problem [14.3,16]. This is the parameter adaptive form [14.17] of the MBP used in many applications [14.8]. The filter becomes nonlinear (in the measurement model) as follows

\begin{aligned} \hat{\theta}_{k|k} &= \hat{\theta}_{k|k-1} + K_{\theta}\, e_k^{\theta}\,,\\ \hat{\omega}_{k|k} &= \hat{\omega}_{k|k-1} + K_{\omega}\, e_k^{\omega}\,,\\ \hat{p}_{k|k-1} &= c\big[\hat{\theta}_{k|k-1},\, \hat{\omega}_{k|k-1}\big]\,. \end{aligned}
(14.175)

The results are appealing, as shown in Fig. 14.17b. We see the bearing angle and temporal frequency estimates as functions of time, eventually converging to the true values (ω_0 = 50 Hz, θ_0 = 45°). The MBP also produces a residual error (innovations) sequence (shown in Fig. 14.17), which is used to determine its performance.

We summarize the classical and model-based solutions to the temporal frequency and bearing angle estimation problem. The classical approach simply performs spectral analysis temporally and spatially to extract the parameters from noisy data, while the model-based approach embeds the unknown parameters into its propagation, measurement, and noise models, enabling a solution to the joint estimation problem. The MBP also monitors its performance by analyzing the statistics of its residual (or innovations) error sequence. This completes the section; next we consider a relevant extension of the MBP approach.

Wiener/Kalman Filter Equivalence

In this section, we show that the Wiener filter is a special case of the Kalman filter. It is well known that the introduction of the recursive Kalman processor created considerable conceptual stress, primarily because of the limitations of the Wiener filter, which was constrained to statistically stationary (no time-varying statistics) signals. Of course, the Kalman paradigm is not constrained to stationary processes and easily handles time-varying models and statistics as well as multivariable (vector–matrix) problems, making it an extremely powerful methodology in its versatility and application. Therefore, in order to demonstrate the equivalence we must limit the discussion to stationary processes and resort to the power spectral domain where the Wiener solution evolved.

In order to show the relationship between the Wiener filter and its state-space counterpart, the Kalman filter, we state the Wiener solution and then show that the steady-state (stationary) Kalman filter provides a unique solution with all the necessary properties. We use frequency-domain techniques to show the equivalence using 𝒵-transforms. We choose the frequency domain for historical reasons, since the classical Wiener solution has more intuitive appeal there.

The Wiener filter solution in the frequency domain can be solved by spectral factorization, since

H(z) = P_{sy}(z)\,P_{yy}^{-1}(z)\,,
(14.176)

where H(z) has all its poles and zeros within the unit circle (stable). The classical approach to Wiener filtering can be accomplished in the frequency domain by factoring the power spectral density (PSD) of the measurement sequence; that is,

P_{yy}(z) = H(z)\,H^{\mathrm{T}}(z^{-1})\,.
(14.177)

The factorization is unique, stable, and minimum-phase [14.15,18].

Now we must show that the steady-state Kalman filter (ignoring the deterministic input) given by

\hat{s}_{k+1} = A\,\hat{s}_k + K\,e_k\,, \qquad y_k = C\,\hat{s}_k + e_k = \hat{y}_k + e_k\,,
(14.178)

where e is the zero-mean, white innovations sequence with covariance R_ee, is stable and minimum-phase, and is therefore, in fact, the Wiener solution. The transfer function of the innovations model is obtained by taking 𝒵-transforms of (14.178) as

T(z) = \frac{Y(z)}{E(z)} = C\,(zI - A)^{-1} K\,.
(14.179)

Let us calculate the measurement covariance of (14.178),

R_{yy}(\ell) = \mathrm{cov}\,[\,y_{k+\ell},\, y_k\,] = R_{\hat{y}\hat{y}}(\ell) + R_{\hat{y}e}(\ell) + R_{e\hat{y}}(\ell) + R_{ee}(\ell)\,,
(14.180)

where y ^ k : = C s ^ k . Taking 𝒵-transforms, we obtain the measurement PSD as

P_{yy}(z) = P_{\hat{y}\hat{y}}(z) + P_{\hat{y}e}(z) + P_{e\hat{y}}(z) + P_{ee}(z)\,.
(14.181)

Using linear system theoretical relations [14.3], we see that

\begin{aligned} P_{\hat{y}\hat{y}}(z) &= C\,P_{\hat{s}\hat{s}}(z)\,C^{\mathrm{T}} = T(z)\,P_{ee}(z)\,T^{\mathrm{T}}(z^{-1})\,, & P_{ee}(z) &= R_{ee}\,,\\ P_{\hat{y}e}(z) &= C\,P_{\hat{s}e}(z) = T(z)\,P_{ee}(z)\,, & P_{e\hat{y}}(z) &= P_{ee}(z)\,T^{\mathrm{T}}(z^{-1})\,. \end{aligned}
(14.182)

Thus, the measurement PSD is given by

P_{yy}(z) = T(z)\,P_{ee}(z)\,T^{\mathrm{T}}(z^{-1}) + T(z)\,P_{ee}(z) + P_{ee}(z)\,T^{\mathrm{T}}(z^{-1}) + P_{ee}(z)\,.
(14.183)

Since P ee (z) = R ee and R ee  ≥ 0, the following factorization always exists as

R_{ee} = R_{ee}^{1/2}\,\big(R_{ee}^{\mathrm{T}}\big)^{1/2}\,.
(14.184)

Thus from (14.184), P yy (z) of (14.183) can be written as

P_{yy}(z) = \big[\,T(z)\,R_{ee}^{1/2} + R_{ee}^{1/2}\,\big]\,\big[\,\big(R_{ee}^{\mathrm{T}}\big)^{1/2}\,T^{\mathrm{T}}(z^{-1}) + \big(R_{ee}^{\mathrm{T}}\big)^{1/2}\,\big] := T_{\mathrm{e}}(z)\,T_{\mathrm{e}}^{\mathrm{T}}(z^{-1})\,,
(14.185)

which shows that the innovations model indeed admits a spectral factorization of the type desired. To show that T_e(z) is the unique, stable, minimum-phase spectral factor, it is necessary to show that T_e(z) has all of its poles within the unit circle (stable) and all of its zeros within the unit circle (minimum phase). It has been shown [14.15,18] that T_e(z) does satisfy these constraints. Therefore

T_{\mathrm{e}}(z) \equiv H(z)

is the Wiener solution. This completes the discussion on the equivalence of the steady-state Kalman filter and the Wiener filter.

Matched-Field Processing

One of the most popular methods of solving a large suite of problems in a wide variety of acoustic applications is the model-based matched-field processor (MFP) [14.19,20,21]. The MFP can be considered a combination of spatial power spectral estimation, matched-filtering and model-based signal processing.

The matched-field processor uses a propagation model with an assumed location of a flaw for nondestructive evaluation (NDE), a target for sonar, or a tumor for medical imaging, by transmitting a pulse that interrogates the medium (material, ocean, tissue). Matched-field processing propagates the pulse acoustically through the medium to the known sensor position(s), generating a replicant field (m) synthesized at each assumed target (source) location to match the measured field (y) at the sensor(s). The total power from the source to the sensor(s) is estimated as

\hat{P}(\mathbf{r}) = \big|\,\mathbf{m}^{\dagger}(\mathbf{r})\,\mathbf{y}\,\big|^{2} = \mathbf{m}^{\dagger}(\mathbf{r})\,\mathbf{R}_{yy}\,\mathbf{m}(\mathbf{r})\,,
(14.186)

where m(r) is the propagated field at r for each predicted source location and y is the measured field at each sensor. A matrix or grid (or image) of power estimates at each source coordinate is created during the search with the maximum peaks above a threshold selected as the estimates of flaw, target, or tumor location

\max_{\mathbf{r}}\;\hat{P}(\mathbf{r})\,.
(14.187)
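A minimal sketch of the MFP search, assuming NumPy, an estimated sensor covariance matrix R_yy, and a user-supplied replica(r) function that propagates the model field to the sensors for a candidate location r (all names are illustrative):

```python
import numpy as np

def mfp_image(R_yy, replica, grid):
    # Matched-field power of (14.186), P(r) = m^H(r) R_yy m(r), evaluated over a
    # grid of candidate source locations; peaks above a threshold localize the flaws.
    P = np.empty(len(grid))
    for i, r in enumerate(grid):
        m = replica(r)                       # modeled (replicant) field at the sensors
        m = m / np.linalg.norm(m)            # normalize the replica vector
        P[i] = np.real(m.conj() @ R_yy @ m)
    return P
```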

Consider the following NDE example to illustrate this approach. There exists a large critical part of a homogeneous material that must be inspected for any potential flaws. We would like to: (1) detect the presence of any flaws; and (2) determine their location. We assume a simple homogeneous (geometric spreading and time delay) material model in two dimensions.

The transmitted pulse is p(t) and the receiving array consists of 64 sensors. The flaws are located at xy-positions (200 mm, 300 mm), (500 mm, 900 mm), and (900 mm, 500 mm) from the array center. The space-time signals arriving at the array are governed by spherical wave propagation in the homogeneous medium (material) and satisfy

s(\mathbf{r}_{jk};t) = \frac{1}{|\Delta\mathbf{r}_{\ell jk}|}\,p(t - \tau_{\ell jk}) \quad \text{for} \quad \Delta\mathbf{r}_{\ell jk} = |\mathbf{r}_{\ell} - \mathbf{r}_{jk}|\,; \quad \tau_{\ell jk} = \frac{|\Delta\mathbf{r}_{\ell jk}|}{\nu}\,,

for τ the propagation delay, Δr the path length (source-to-sensor), ℓ the sensor element index, j the x-position index, k the y-position index, and ν the sound speed in the material. The signals are contaminated by white Gaussian noise at an SNR of 40 dB.

The MFP is implemented in Cartesian coordinates with the unknown (source) position given by

\Delta r_{\ell jk} = |\mathbf{r}_{\ell} - \mathbf{r}_{jk}| = \sqrt{(x_{\ell} - x_{jk})^2 + (y_{\ell} - y_{jk})^2}

and corresponding matching function

m(x_{jk}, y_{jk}; t) = \frac{1}{|\Delta r_{\ell jk}|}\,p_{\text{model}}(t - \tau_{\ell jk})\,,

with the power at each pixel given by

P(x_{jk}, y_{jk}) = |\,m(\mathbf{r};t)\,p(t)\,|^{2} = \sum_{\ell}\big|\,m(x_{jk}, y_{jk};t)\,p_{\ell}(t)\,\big|^{2}\,.

Using the MFP approach, flaw locations are estimated by:

  1. Varying the assumed flaw positions (location parameter vector).

  2. Calculating the matching vector (propagation model).

  3. Calculating the corresponding power at the specified pixel location; a power image can be generated over the desired range of pixels, j = 1,⋯, N_x; k = 1,⋯, N_y.

  4. Thresholding the image and selecting the dominant peaks.

In this problem, the resulting power field is shown in Fig. 14.18a with the thresholded image shown in Fig. 14.18b. The estimated flaw positions are (207 mm, 297 mm), (498 mm, 898 mm) and (903 mm, 502 mm), which are quite close to the true locations; this accuracy can be attributed to the high SNR.

Fig. 14.18
figure 18

MFP of component part with three flaws at (200 mm,300  mm), (500 mm,900  mm), (900 mm,500  mm): (a) MFP power image. (b) Flaw detection and localization: (207 mm,297  mm), (498 mm,898  mm) and (903 mm,502  mm)

The Cepstrum

The cepstrum (pronounced kepstrum) is the inverse Fourier transform of the natural logarithm of the spectrum. Because it is the inverse transform of a function of frequency, the cepstrum is a function of a time-like variable. But just as the word cepstrum is an anagram of the word spectrum, the time-like coordinate is called the quefrency, an anagram of frequency. The field of cepstrology is full of word fun like this.

The complex cepstrum of the complex spectrum Y(ω) is

q(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\mathrm{d}\omega\;e^{i\omega\tau}\,\ln[Y(\omega)]\,,
(14.188)

where τ is the quefrency. Because Y(ω) = |Y(ω)| e^{iφ(ω)},

q(\tau) = \frac{1}{2\pi}\int_{0}^{\infty}\mathrm{d}\omega\;e^{i\omega\tau}\big[\ln|Y(\omega)| + i\varphi(\omega)\big] + \frac{1}{2\pi}\int_{0}^{\infty}\mathrm{d}\omega\;e^{-i\omega\tau}\big[\ln|Y(-\omega)| + i\varphi(-\omega)\big]\,.
(14.189)

For a real signal y(t), the magnitude |Y(ω)| is an even function of ω, and φ(ω) is odd. Therefore,

q(\tau) = \frac{1}{\pi}\int_{0}^{\infty}\mathrm{d}\omega\;\big[\ln|Y(\omega)|\big]\cos(\omega\tau) + \frac{i}{\pi}\int_{0}^{\infty}\mathrm{d}\omega\;\varphi(\omega)\sin(\omega\tau)\,.
(14.190)

The real part of q comes from the magnitude, the imaginary part from the phase. The phase must be unwrapped; it cannot be artificially restricted to a 2π range.

It is common to deal only with the real part of the cepstrum q R. It is evident that the calculation will fail if |Y(ω)| is zero. The cepstrum is not applied to theoretical objects such as periodic functions of time that have delta function spectra – hence zeros. The cepstrum is applied to measured data, where it can lead to insight into features of the underlying processes.

The cepstrum is used in the acoustical and vibrational monitoring of machinery. Bearings and other rotating parts tend to produce sounds with interleaved periodic spectra. These periodicities lead to peaks at the corresponding quefrencies, revealing features that may not be apparent in the spectrum.

The cepstrum is particularly suited to the separation of source and filter functions. If Y is a filtered version of X, where the transfer function is H, then

| Y ( ω ) | = | H ( ω ) | | X ( ω ) | .
(14.191)

The logarithm operation turns the product on the righthand side into a sum, so that

q_{\mathrm{R}}(\tau) = \frac{1}{\pi}\int_{0}^{\infty}\mathrm{d}\omega\;\big[\ln|H(\omega)|\big]\cos(\omega\tau) + \frac{1}{\pi}\int_{0}^{\infty}\mathrm{d}\omega\;\big[\ln|X(\omega)|\big]\cos(\omega\tau)\,.
(14.192)

For instance, if |Y| is the spectrum of a spoken vowel, then the term involving the formant filter |H| leads to a low-quefrency structure, and the term involving source spectrum |X| leads to a high-quefrency peak characteristic of the glottal pulse period.

The cepstrum can reveal reflections. As a simple example, we consider a direct sound X plus its reflection with relative amplitude a and delay T D. The sum then has a spectrum Y,

|Y(\omega)| = \big[1 + a\cos(\omega T_{\mathrm{D}})\big]\,|X(\omega)| \quad (a < 1)\,.
(14.193)

The logarithm of the factor in square brackets is periodic in ω with period 2π/T D. The corresponding term in the cepstrum leads to a peak at quefrency τ = T D, as shown in Fig. 14.19. The addition of more reflections with other delays will lead to additional peaks. Maintaining the anagram game, the separation of peaks along the quefrency axis is sometimes called liftering.
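A minimal sketch of this reflection example, assuming NumPy: the real cepstrum is computed as the inverse FFT of the log magnitude spectrum, and a 2 ms delayed copy of a broadband signal produces a cepstral peak near a quefrency of 2 ms. The signal, sample rate, and reflection amplitude are illustrative, not from the text:

```python
import numpy as np

def real_cepstrum(y, fs):
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    Y = np.fft.rfft(y)
    qr = np.fft.irfft(np.log(np.abs(Y) + 1e-12))   # small floor avoids log(0)
    quefrency = np.arange(len(qr)) / fs
    return quefrency, qr

# Example: a signal plus a delayed copy produces a cepstral peak at the delay.
fs = 8000
x = np.random.randn(fs)                # stand-in for the original broadband signal
delay = int(0.002 * fs)                # 2 ms delay, as in Fig. 14.19
y = x.copy()
y[delay:] += 0.9 * x[:-delay]          # add the delayed, attenuated reflection
q, qr = real_cepstrum(y, fs)
# qr should show a peak near q = 0.002 s
```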

Fig. 14.19
figure 19

The cepstrum of an original signal to which is added a delayed version of the same signal, with a delay of 2 ms (a = 1). The original signal is the sum of two female talkers

Noise

Noise has many definitions in acoustics. Commonly, noise is any unwanted signal. In the context of communications, it is an excitation that competes with the information that one wishes to transmit. In signal processing, noise is a random signal that can be described only in statistical terms and has no long-term predictability.

Thermal Noise

Thermal noise, or Johnson noise, is generated in a resistor. An electrical circuit that describes this source of noise is a resistor R in series with a voltage source that depends on R, such that the RMS voltage is given by the equation

V = \sqrt{4\,R\,k_{\mathrm{B}}\,T\,\Delta f}\,,
(14.194)

where R is the resistance in ohms, k B is Boltzmannʼs constant, T is the absolute temperature, and Δf is the bandwidth over which the noise is measured.

The corresponding noise power can be defined by measuring the maximum power that is transferred to a load resistor connected across the series circuit above. Maximum power occurs when the load resistor also has a resistance R and has zero temperature so that the load resistor produces no Johnson noise of its own. Then the thermal noise power is given by

P = k B T Δ f .
(14.195)

Because k B T has dimensions of Joules and Δf has dimensions of inverse seconds, the quantity P has dimensions of watts, as expected. Boltzmannʼs constant is 1.38×10−23 J/K, and room temperature is 293 K. Therefore, the noise power density is 4×10−21 W/Hz. Because the power is proportional to the first power of the bandwidth, the noise is white. Johnson noise is also Gaussian.

Gaussian Noise

A noise is Gaussian if its instantaneous values form a Gaussian (normal) distribution. A noise distribution is illustrated in an experiment wherein an observer makes hundreds of instantaneous measurements of a noise voltage and plots these instantaneous values as a histogram. Unless there is some form of bias, the measured values are equally often positive and negative, and so the mean of the distribution is zero. The noise is Gaussian if the histogram derived in this way is a Gaussian function. The more intense the noise, the larger is the standard deviation of the Gaussian function. Because of the central limit theorem, there is a tendency for noise to be Gaussian. However, non-Gaussian noises are easily generated. Random telegraph noise, where instantaneous values can only be + 1 or − 1, is an example.

Band-Limited Noise

Band-limited noise can be written in terms of Fourier components,

x(t) = \sum_{n=1}^{N} A_n\cos(\omega_n t) + B_n\sin(\omega_n t)\,.
(14.196)

The amplitudes A n and B n are defined only statistically. According to a famous paper by Einstein and Hopf [14.22], these amplitudes are normally distributed with zero mean, and the distributions of A n and B n have the same variance σ n 2. The distributions themselves can be thought of as representative of an ensemble of noises, all of which are intended by the creator to be the same: same duration and power, same frequency range and bandwidth.

Because the average power in a sine or cosine is 0.5, the average power in band-limited noise is

P = \sum_{n=1}^{N}\sigma_n^{2}\,.
(14.197)

An alternative description of band-limited noise is the amplitude and phase form

x(t) = \sum_{n=1}^{N} C_n\cos(\omega_n t + \varphi_n)\,,
(14.198)

where φ_n are random variables with a rectangular distribution from 0 to 2π, and C_n = \sqrt{A_n^2 + B_n^2}.

Given that A_n and B_n follow a Gaussian distribution with variance σ_n², the amplitude C_n follows a Rayleigh distribution f_Rayl,

f_{\mathrm{Rayl}}(C_n) = \frac{C_n}{\sigma_n^{2}}\,e^{-C_n^{2}/(2\sigma_n^{2})} \quad (C_n > 0)\,.
(14.199)

The peak of the Rayleigh distribution occurs at C_n = σ_n. The zeroth moment is 1.0 because the distribution is normalized. The first moment, or C̄_n, is σ_n√(π/2). The second moment is 2σ_n², and the fourth moment is 8σ_n⁴.

The cumulative Rayleigh distribution can be calculated in closed form,

F_{\mathrm{Rayl}}(C_n) = \int_{0}^{C_n}\mathrm{d}C_n'\;f_{\mathrm{Rayl}}(C_n') = 1 - e^{-C_n^{2}/(2\sigma_n^{2})}\,.
(14.200)

Generating Noise

To generate the amplitudes A_n and B_n with approximately normal distributions using a computer random-number generator, one can add up twelve uniform random numbers (each between 0 and 1) and subtract 6. Because of the central limit theorem, the resulting amplitudes will be approximately normally distributed, with a mean of zero and a variance of 1.0.

To generate the amplitudes C n with a Rayleigh distribution, one can transform the random numbers r n that come from a computer random-number generator, according to the formula

C_n = \sigma\sqrt{-2\ln(1 - r_n)}\,.
(14.201)
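Both recipes can be sketched in a few lines, assuming NumPy; the number of components and the value of σ are illustrative:

```python
import numpy as np

rng = np.random.default_rng()
N = 100                       # number of spectral components
sigma = 1.0                   # per-component scale parameter

# Approximately Gaussian partial amplitudes via the add-12-uniforms recipe
A = rng.random((N, 12)).sum(axis=1) - 6.0
B = rng.random((N, 12)).sum(axis=1) - 6.0

# Rayleigh-distributed amplitudes from uniform random numbers, as in (14.201)
r = rng.random(N)
C = sigma * np.sqrt(-2.0 * np.log(1.0 - r))
```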

Equal-Amplitude Random-Phase Noise

Equal-amplitude random-phase (EARP) noise is of the form

x(t) = C\sum_{n=1}^{N}\cos(\omega_n t + \varphi_n)\,,
(14.202)

where φ n is again a random variable over the range 0 to 2π.

The advantage of EARP noise is that every noise sample has the same power spectrum. A possible disadvantage is that the amplitudes A n and B n are no longer normally distributed. Instead, they are distributed like the probability density functions for the sine or cosine functions, with square-root singularities at A n  =± C and B n  =± C. However, the actual values of noise are normally distributed as long as the number of noise components is more than about five.

Noise Color

White noise has a constant spectral density, which means that the power in white noise is proportional to the bandwidth. On the average, every band with given bandwidth Δf has the same amount of power. Pink noise has a spectral density that decreases inversely with the frequency. Consequently, pink noise decreases at a rate of 3 dB per octave. On the average, every octave band has the same amount of power.

Sampled Data

Converting an analog signal, such as a time-dependent voltage, into a digital representation dices the signal in two dimensions: the dimension of the signal voltage and the dimension of time. Dicing the signal voltage is known as quantization; dicing with respect to time is known as sampling.

Quantization and Quantization Noise

It is common for an analog-to-digital converter (ADC) to represent the values of input voltages as integers. The range of the integers is determined by the number of bits per sample in the conversion process. A conversion into an M-bit sample (or word) allows the voltage value to be represented by one of 2^M possible values. For instance, a 10-bit ADC that is restricted to converting positive voltages would represent 0 V by the number 0 and +10 V by 2^10 − 1, or 1023.

A 16-bit ADC would allow 2^16, or 65536, different values. A 16-bit ADC that converts voltages between −10 and +10 V would represent −10 V by −32768 and +10 V by +32767. Conversion is linear. Thus 0.3052 V would be converted to the sample value 1000 and 0.3055 V to the value 1001. A voltage of 0.3053 V would also be converted to a value of 1000, no different from 0.3052 V. The discrepancy is an error known as the quantization error or quantization noise.

Quantization noise referenced to the signal is a signal-to-noise ratio. Standard practice makes this ratio as large as possible by assuming a signal with the maximum possible power. For the positive and negative ADC described above, maximum power occurs for a square wave between a sampled waveform value of −2^{M−1} and +2^{M−1}. The power is the square of the waveform, or (1/4) × 2^{2M}.

For its part, the noise is a random variable that represents the difference between an accurately converted voltage and the actual converted value as limited by the number of bits in the sample word. This error is never more than 0.5 and never less than −0.5. The power in noise that fluctuates randomly over the range −0.5 to +0.5 is 1/12. Consequently the signal-to-noise (S/N) ratio is 3 × 2^{2M}. Expressed in decibels, this value is 10 log(3 × 2^{2M}), or 20M log(2) + 4.8 dB, or 6M + 4.8 dB. For a 16-bit word, this would be 96 + 4.8, or about 101 dB. An alternative calculation would assume that the maximum power is the power of the largest sine wave that can be reproduced by such a system. This sine has half the power of the square wave, and the S/N ratio is then about 6M dB.
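The arithmetic can be checked with a few lines (assuming NumPy; the 16-bit word length is the example from the text):

```python
import numpy as np

M = 16                                       # bits per sample
signal_power = 0.25 * 2 ** (2 * M)           # full-scale square wave, (1/4) * 2^(2M)
noise_power = 1.0 / 12.0                     # quantization error, uniform on [-0.5, 0.5]
snr_db = 10 * np.log10(signal_power / noise_power)
print(round(snr_db, 1))                      # about 101.1 dB, i.e. 6M + 4.8 dB
```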

Binary Representation

Digitized data, like a sampled waveform, are represented in binary form by numbers (or words) consisting of the digits 0 and 1. For example, an eight-bit word consisting of two four-bit bytes and representing the decimal number 7 would be written as

0 0 0 0 0 1 1 1 .

This number has 1 in the ones column, 1 in the twos column, 1 in the fours column, and nothing in any other column. One plus two plus four is equal to 7, which is what was desired.

An eight-bit word (M = 8) could represent decimal integers from 0 to 255. It cannot represent 2^M, which is decimal 256. If one starts with the decimal number 255 and adds 1, the binary representation becomes all zeros, i.e. 255 + 1 = 0. It is like the 100000-mile odometer on an automobile. If the odometer reads 99999 and the car goes one more mile, the odometer reads 00000.

Signals are generally negative as often as they are positive, and that leads to a need for a binary representation of negative numbers. The usual standard is a representation known as twos-complement. In twos-complement representation, any number that begins with a 1 is negative. Thus, the leading digit serves as a sign bit.

In order to represent the number −x in an M-bit system one computes 2^M − x. That way, if one adds x and −x one ends up with 2^M, which is zero.

A convenient algorithm for calculating the twos-complement of a binary number is to reverse each bit, 0 for 1 and 1 for 0, and then add 1. Thus, in an eight-bit system the number − 7 is given by

1 1 1 1 1 0 0 1 .
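A minimal sketch of this rule in plain Python (the word length defaults to the eight-bit example above):

```python
def twos_complement(x, M=8):
    # Two's-complement representation of -x in an M-bit word: 2^M - x,
    # equivalent to inverting every bit of x and adding 1.
    return format((2 ** M - x) % 2 ** M, '0{}b'.format(M))

print(twos_complement(7))   # '11111001', the bit pattern shown above for -7
```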

Sampling Operation

The sampling process replaces an analog signal, which is a continuous function of time, by a sequence of points. The operation is equivalent to the process shown in Fig. 14.20, where the analog signal x(t) is multiplied by a train of evenly spaced delta functions to create a sequence of sampled values y(t).

Fig. 14.20
figure 20

An analog signal (a) x(t) is multiplied by a train of delta functions s(t) (b) to produce a sampled signal y(t) (c)

Intuitively, it seems evident that this operation is a sensible thing to do if the delta functions come along rapidly enough – rapid compared to the speed of the temporal changes in the waveform. That concept is most clearly seen by studying the Fourier transforms of functions x, s and y.

The Fourier transform of the analog signal is X(ω), with a spectrum that is limited to some highest frequency ω max. By contrast, the Fourier transform of the train of delta functions is, itself, a train of delta functions, S(ω) that extends over the entire frequency axis. Because the delta functions in time have period T s, the delta functions in S(ω) are separated by ω s, equal to 2π/T s. Because y(t) is the product of the time-dependent analog signal and the train of delta functions, the Fourier transform Y(ω) is the convolution of X(ω) and S(ω), as shown in part (b) of Fig. 14.21. Because of the convolution operation, Y(ω) includes multiple images of the original spectrum.

Fig. 14.21
figure 21

(a) The spectrum of the analog signal X(ω) is bounded in frequency. (b) The spectrum of the sampled signal, Y(ω), is the convolution of X(ω) and the Fourier transform of the sampling train of delta functions. Consequently, multiple images of X(ω) appear. Frequencies that are allowed by the sampling theorem are included in the dashed box. A particular frequency (circled) is followed through the multiple imaging

It is evident from Fig. 14.21b that, if the ω max is less than half of ω s, the multiple images will not overlap. That observation has the status of a theorem known as the sampling theorem, which says that the sampled signal is an adequate representation of an analog signal if the sample rate is more than twice the highest frequency in the analog signal, i.e., ω s > 2ω max.

As an example of a failure to apply the sampling theorem, suppose that a 600 Hz sine tone is sampled at a rate of 1000 Hz. The spectrum of the sampled signal will contain 600 Hz as expected, and it will also contain a component at 1000 − 600 =400  Hz. The 400 Hz component was not present in the original spectrum; it is an alias, an unwanted image of the 600 Hz tone.

Digital-to-Analog Conversion

In converting a signal from digital to analog form, one can begin with the train of delta functions that is signal y(t) as shown in Fig. 14.20c. An electronic device to do that is a digital-to-analog converter (DAC). However, as shown in Fig. 14.21b, this signal includes many high frequencies that are unwanted byproducts of the sampling process. Consequently, one needs to low-pass filter the signal so as to pass only the frequencies less than half the sample rate, i.e., the frequencies in the dashed box. Such a low-pass filter is called a reconstruction filter.

Practical DACs do not produce delta-function voltage spikes. Instead, they produce rectangular functions with durations pT s, where p is a fraction of a sample period 0 < p ≤ 1. If p = 1, the output of the DAC resembles a staircase function. Mathematically, replacing the delta function train of Fig. 14.20c by the train of rectangles is equivalent to convolving the function y(t) with a rectangular function. The consequence of this convolution is that the output is filtered, and the transfer function of the filter is the Fourier transform of the rectangle. The magnitude of the transfer function is

|H(\omega)| = \frac{\sin(\omega p T_{\mathrm{s}}/2)}{\omega p T_{\mathrm{s}}/2}\,.
(14.203)

The phase shift of the filter is a pure delay and consequently unimportant. The effective filtering that results from the rectangles, known as sin(x)-over-x filtering, can be corrected by the reconstruction filter.

The Sampled Signal

This brief section will introduce a notation that will be useful in later discussions of sampled signals. It is supposed at the outset that one begins with a total of N samples, equally spaced in time by the sampling period T s. By convention, the first sample occurs at time t = 0 and the last sample occurs at time t = (N − 1)T s. Consequently, the signal duration is T D = (N − 1)T s.

In dealing with sampled signals, it is common to replace the time variable with a discrete index k. Thus,

x ( t ) = x ( kT s ) = x k ,
(14.204)

where the equation on the left indicates that the original data exist only at discrete time values.

Interpolation

The discrete-time values of a sampled waveform can be used to compute an approximate Fourier transform of the original signal. This Fourier transform is valid up to a frequency as high as half the sample rate, i.e., as high as ω s/2, or π/T s. The Fourier transform can then be used to estimate the values of the original signal x(t) at times other than the sample times. In this way, the Fourier transform computed from the samples serves to interpolate between the samples. Such an interpolation scheme proceeds as follows.

First, the Fourier transform is

X(\omega) = T_{\mathrm{s}}\sum_{k} x_k\,\exp(-i\omega T_{\mathrm{s}} k)\,,
(14.205)

where, as noted above, x k is the signal x(t) at the times t = T s k, and the leading factor of T s gets the dimensions right.

Then the inverse Fourier transform is

x(t) = \frac{T_{\mathrm{s}}}{2\pi}\int_{-\omega_{\mathrm{s}}/2}^{\omega_{\mathrm{s}}/2}\mathrm{d}\omega\;e^{i\omega t}\sum_{k} x_k\,e^{-i\omega T_{\mathrm{s}} k}\,.
(14.206)

Reversing the order of sum and integral and using the fact that T s ω s/2 = π, we find that

x(t) = \sum_{k} x_k\,\frac{\sin[\pi(t/T_{\mathrm{s}} - k)]}{\pi(t/T_{\mathrm{s}} - k)}\,.
(14.207)

The sinc function is 1.0 whenever t = T s k, and is zero whenever t is some other integer multiple of T s. Therefore, the sum on the right only interpolates; it does not change the values of x(t) when t is equal to a sample time.
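A minimal sketch of the interpolation formula (14.207), assuming NumPy (np.sinc(x) computes sin(πx)/(πx), so it matches the sum directly); the helper name is illustrative:

```python
import numpy as np

def sinc_interpolate(xk, Ts, t):
    # Band-limited interpolation of (14.207): x(t) = sum_k x_k sinc(t/Ts - k).
    k = np.arange(len(xk))
    return np.array([np.dot(xk, np.sinc(ti / Ts - k)) for ti in np.atleast_1d(t)])
```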

Discrete Fourier Transform

The Fourier transform of a signal with finite duration is well defined in principle. The finite signal itself can be regarded as some base function that is multiplied by a rectangular window to limit the duration. Then the Fourier transform proceeds by convolving with the transform of the window. For example, a truncated exponentially decaying sine function can be regarded as a decaying sine, with the usual infinite duration, multiplied by a rectangular window. Then the Fourier transform of the truncated function is the Fourier transform of the decaying sine convolved with a sinc function – the Fourier transform of the rectangular window. Such a Fourier transform is a function of a continuous frequency, and it shows the broad spectrum associated with the abrupt truncation.

In digital signal processing the frequency axis is not continuous. Instead, the Fourier transform of a signal is defined at discrete frequencies, just as the signal itself is defined at discrete time points. This kind of Fourier transform is known as the discrete Fourier transform (DFT).

To compute the DFT of a function, one begins by periodically repeating the function over the entire time axis. For example, the truncated decaying sine in Fig. 14.22a is repeated in Fig. 14.22b, where it should be imagined that the repetition continues indefinitely to the left and right.

Fig. 14.22
figure 22

A decaying function in part (a) is periodically repeated in part (b) to create a periodic signal with period T D

Then the Fourier transform of the periodically repeated signal becomes a Fourier series. The fundamental frequency of the Fourier series is the reciprocal of the duration, f 0 = 1/T D, and the spectrum becomes a set of discrete frequencies, which are the harmonics of f 0. For instance, if the signal is one second in duration, the spectrum consists of the harmonics of 1 Hz, and if the duration is two seconds then the spectrum has all the harmonics of 0.5 Hz. As expected, the highest harmonic is limited to half the sample rate. That Fourier series is the DFT. Using x k ¯ to define the periodic repetition of the original discrete function, x k , the DFT X ¯ ( ω ) is defined for ω = 2πn/T D, where n indicates the n-th harmonic. In terms of the fundamental angular frequency ω 0 = 2π/T D, the DFT is

\bar{X}(n\omega_0) = T_s \sum_{k=0}^{N-1} x_k e^{-i n \omega_0 k T_s} ,
(14.208)

where the prefactor T s keeps the dimensions right. The product ω 0 T s is equal to ω 0 T D/(N − 1) or 2π/(N − 1), and so

\bar{X}(n\omega_0) = T_s \sum_{k=0}^{N-1} x_k e^{-i 2\pi n k/(N-1)} .
(14.209)

Both positive and negative frequencies occur in the Fourier transform. Because the maximum frequency is equal to [1/(2T s)]/(1/T D) times the fundamental frequency, the number of discrete positive frequencies is (N − 1)/2, and the number of discrete negative frequencies is the same. Consequently the inverse DFT can be written

\bar{x}_k = \frac{1}{T_D} \sum_{n=-(N-1)/2}^{(N-1)/2} X(n\omega_0) \, e^{i 2\pi n k/(N-1)} ,
(14.210)

or

\bar{x}(t) = \frac{1}{T_D} \sum_{n=-(N-1)/2}^{(N-1)/2} X(n\omega_0) \, e^{i n \omega_0 t} .
(14.211)

A virtue of the DFT is that the information in the DFT is exactly what is needed to create the original truncated function x(t) – no more and no less. The fact that the DFT spectrum actually creates the periodically repeated function x ¯ k and not the original x k is not a problem if we agree in advance to ignore x ¯ k for k outside the range of the original time-limited signal. However, it should be noted that certain operations, such as time translations, products, and convolution, that have familiar characteristics in the context of the Fourier transform, retain those characteristics only for the periodically extended signal x ¯ k and its Fourier transform X ¯ ( n ω 0 ) and not for the finite-duration signal.
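The DFT of (14.209) can be written directly as a short matrix computation. The sketch below follows the chapter's convention, in which the exponent contains N − 1 rather than the N used by most FFT routines, so its output is not expected to match a library FFT exactly; the decaying-sine test signal and sample rate are arbitrary example values.

```python
import numpy as np

# Direct implementation of the DFT of (14.209), with the T_s prefactor and
# the chapter's T_D = (N - 1) T_s convention in the exponent.
def dft(xk, Ts):
    xk = np.asarray(xk, dtype=float)
    N = len(xk)
    n = np.arange(N)                         # harmonic number
    k = np.arange(N)                         # sample index
    W = np.exp(-1j * 2 * np.pi * np.outer(n, k) / (N - 1))
    return Ts * (W @ xk)                     # X(n * omega_0)

# usage: a decaying sine sampled at 1 kHz (example values)
Ts = 1e-3
k = np.arange(64)
x = np.exp(-k * Ts / 0.02) * np.sin(2 * np.pi * 100.0 * k * Ts)
X = dft(x, Ts)
```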

Interpolation for the Spectrum

It is possible to estimate the Fourier transform at values of frequency between the harmonics of ω 0. The procedure begins with the definition of the Fourier transform of a finite function,

X(\omega) = \int_0^{T_D} \mathrm{d}t \, x(t) \, e^{-i\omega t} .
(14.212)

Next, the function x(t) is replaced by the inverse DFT from (14.211), and the variable of integration t is replaced by t′, which has symmetrical upper and lower limits,

X(\omega) = \frac{1}{T_D} \int_{-T_D/2}^{T_D/2} \mathrm{d}t' \, e^{-i\omega t'} \sum_{n=-(N-1)/2}^{(N-1)/2} X(n\omega_0) \, e^{i n \omega_0 t'} \, e^{-i\omega T_D/2} \, e^{i n \omega_0 T_D/2} ,
(14.213)

which reduces to

X(\omega) = \sum_{n=-(N-1)/2}^{(N-1)/2} X(n\omega_0) \, \frac{\sin[(\omega - n\omega_0) T_D/2]}{(\omega - n\omega_0) T_D/2} \, e^{-i\omega T_D/2} \, e^{i\pi n} .
(14.214)
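A sketch of this interpolation is given below: the DFT values on the harmonics of ω 0 are computed as in (14.209), and (14.214) is then evaluated at an arbitrary frequency between two harmonics. An odd number of samples is assumed so that (N − 1)/2 is an integer; the sample rate and test signal are example values.

```python
import numpy as np

# Sketch of spectrum interpolation, eq. (14.214), using the chapter's
# convention T_D = (N - 1) Ts.
Ts = 1e-3
N = 65                                       # odd, so (N - 1)/2 is an integer
TD = (N - 1) * Ts
w0 = 2 * np.pi / TD
k = np.arange(N)
x = np.exp(-k * Ts / 0.02) * np.sin(2 * np.pi * 125.0 * k * Ts)

n = np.arange(-(N - 1) // 2, (N - 1) // 2 + 1)            # harmonic numbers
Xn = Ts * np.exp(-1j * np.outer(n, k) * w0 * Ts) @ x       # X(n*omega_0), eq. (14.209)

def X_interp(omega):
    """Estimate X(omega) between the harmonics, following (14.214)."""
    sinc = np.sinc((omega - n * w0) * TD / 2 / np.pi)      # sin(u)/u with u = (omega - n*w0)*TD/2
    return np.exp(-1j * omega * TD / 2) * np.sum(Xn * sinc * np.exp(1j * np.pi * n))

print(X_interp(3 * w0), Xn[n == 3][0])   # on a harmonic, (14.214) returns the DFT value
print(X_interp(3.5 * w0))                # an estimate halfway between two harmonics
```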

The z-Transform

Like the discrete Fourier transform, the z-transform is well suited to describing sampled signals. We consider x(t) to be a sampled signal so that it is defined at discrete time points t = t k  = kT s, where T s is the sampling period. Then the time dependence of x can be described by an index, x k  = x(t k ). The z-transform of x is

X(z) = \sum_{k=-\infty}^{\infty} x_k z^{-k} .
(14.215)

The quantity z is complex, with amplitude A and phase φ,

z = A e i φ = A e i ω T s ,
(14.216)

where φ is the phase advance in radians per sample.

In the special case where A = 1, all values of z lie on a circle of radius 1 (the unit circle) in the complex z plane. In that case the z-transform is equivalent to the discrete Fourier transform. An often-overlooked alternative view is that the z-transform is an extension of the Fourier transform wherein the angular frequency ω becomes complex,

ω = ω R + i ω I ,
(14.217)

so that

z = e^{-\omega_I T_s} \, e^{i \omega_R T_s} .
(14.218)

The extended Fourier transform will not be pursued further in this chapter.

A well-defined z-transform naturally includes a function of the variable z, but the function itself is not enough. For the inverse transform to be unique, the definition must also specify the region of the complex plane in which the transform converges. To illustrate that point, one can consider two different functions x k that have the same z-transform function but different regions of convergence (Table 14.2).

Table 14.2 z-Transform pairs

Consider first the function

x_k = 2^k \quad \text{for } k \ge 0 , \qquad x_k = 0 \quad \text{for } k < 0 .
(14.219)

This two-line function can be written as a single line by using the discrete Heaviside function u k . The function u k is defined as zero when k is a negative integer and as + 1 when k is any other integer, including zero. Then x k above becomes

x_k = 2^k u_k .
(14.220)

The z-transform of x k is

X(z) = \sum_{k=0}^{\infty} (2/z)^k .
(14.221)

The sum is a geometric series, which converges to

X(z) = \frac{1}{1 - 2/z} = z/(z - 2)
(14.222)

if |z| > 2. The region of convergence is therefore the entire complex plane except for the portion inside and on a circle of radius 2.

Next consider the function

x_k = -2^k u_{-k-1} .
(14.223)

The z-transform of x k is

X(z) = -\sum_{k=-\infty}^{-1} (2/z)^k \quad\text{or}\quad -\sum_{k=1}^{\infty} (z/2)^k .
(14.224)

The sum converges to

X(z) = \frac{-(z/2)}{1 - z/2} = \frac{z}{z - 2}
(14.225)

if |z| < 2. The function is identical to the function in (14.222), but the region of convergence is now the portion of the complex plane entirely inside the circle of radius 2.
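The two regions of convergence can be checked numerically by comparing truncated sums of (14.221) and (14.224) with the closed form z/(z − 2); the test points below are arbitrary.

```python
import numpy as np

# Numerical check of (14.222) and (14.225): both sequences give z/(z - 2),
# but the sums converge in complementary regions of the z plane.
def X_causal(z, K=300):            # truncated sum of sum_{k>=0} (2/z)^k
    return np.sum((2.0 / z) ** np.arange(K))

def X_anticausal(z, K=300):        # truncated sum of -sum_{k>=1} (z/2)^k
    return -np.sum((z / 2.0) ** np.arange(1, K))

closed_form = lambda z: z / (z - 2.0)

z_out = 3.0 * np.exp(0.7j)         # |z| = 3 > 2
z_in = 1.2 * np.exp(0.7j)          # |z| = 1.2 < 2

print(X_causal(z_out), closed_form(z_out))      # agree, since |z| > 2
print(X_anticausal(z_in), closed_form(z_in))    # agree, since |z| < 2
print(abs(X_causal(z_in)))                      # grows without bound: outside its region
```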

The inverse z-transform is given by a counterclockwise contour integral circling the origin

x_k = \frac{1}{2\pi i} \oint_C \mathrm{d}z \, X(z) \, z^{k-1} .
(14.226)

The contour C must lie entirely within the region of convergence of X(z) and must encircle the origin.

When two functions x and y are combined in some way, the region of convergence of the result is at least the intersection of the regions of convergence of x and y separately. Scaling and time reversal lead to regions of convergence that are scaled and inverted, respectively. For instance, if X(z) converges in the region between radii r 1 and r 2, then X(1/z) converges in the region between 1/r 2 and 1/r 1.

Transfer Function

The output of a process at time point k, namely y k , may depend on the inputs x at earlier times and also on the outputs at earlier times. In equation form,

y_k = \sum_{q=0}^{N_q} \beta_q x_{k-q} - \sum_{p=1}^{N_p} \alpha_p y_{k-p} .
(14.227)

This equation can be z-transformed using the time-shift property in Table 14.3,

\sum_{q=0}^{N_q} \beta_q z^{-q} X(z) = \sum_{p=0}^{N_p} \alpha_p z^{-p} Y(z) ,
(14.228)

where α 0 = 1. The transfer function is the ratio of the transformed output over the transformed input,

H ( z ) = Y ( z ) / X ( z ) ,
(14.229)

which is

H(z) = \frac{\sum_{q=0}^{N_q} \beta_q z^{-q}}{\sum_{p=0}^{N_p} \alpha_p z^{-p}} .
(14.230)

From the fundamental theorem of algebra, the numerator of the fraction above has Nq roots and the denominator has Np roots, so that H(z) can be written as

H(z) = \frac{(1 - q_1 z^{-1})(1 - q_2 z^{-1}) \cdots (1 - q_{N_q} z^{-1})}{(1 - p_1 z^{-1})(1 - p_2 z^{-1}) \cdots (1 - p_{N_p} z^{-1})} .
(14.231)
Table 14.3 Properties of the z-transform

This equation and its development are of central importance to digital filters, also known as linear time-invariant systems. If the system is recursive, outputs from previous points in time are fed back to the input. Therefore, some of the coefficients α p with p ≥ 1 are nonzero, and so are some of the poles, such as p 2. Such filters are called infinite impulse response (IIR) filters because the response of the system to an impulse put in at time zero may never entirely die out: some of the output is always fed back into the input. A similar conclusion is reached by recognizing that the expansion of 1/(1 − pz −1) in powers of z −1 goes on forever. Because the system has poles, there are concerns about stability.

If the system is nonrecursive, no values of the output are ever sent back to the input. Therefore, the denominator of H(z) is simply the number 1. Such filters are called finite impulse response (FIR) filters because their response to a delta-function input always dies out eventually, as long as Nq is finite. The system is said to be an all-zero system. The order of the filter is established by Nq or Np, the number of time points back to the earliest input or output that contributes to the current output value.
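A direct, unoptimized implementation of the difference equation (14.227) is sketched below; the one-pole coefficients are arbitrary example values, with the pole inside the unit circle so that the impulse response decays.

```python
import numpy as np

# Sketch of the difference equation (14.227). beta holds the feed-forward
# coefficients beta_q and alpha the feedback coefficients alpha_p, with alpha_0 = 1.
def filter_signal(beta, alpha, x):
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = 0.0
        for q, b in enumerate(beta):                 # feed-forward terms
            if k - q >= 0:
                acc += b * x[k - q]
        for p, a in enumerate(alpha[1:], start=1):   # feedback terms
            if k - p >= 0:
                acc -= a * y[k - p]
        y[k] = acc
    return y

# usage: a one-pole recursive (IIR) filter and its impulse response
beta = [0.1]
alpha = [1.0, -0.9]                  # pole at z = 0.9, inside the unit circle
impulse = np.zeros(20)
impulse[0] = 1.0
h = filter_signal(beta, alpha, impulse)      # h_k = 0.1 * 0.9**k, decaying but never exactly zero
```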

The formal z-transform,

H(z) = \sum_{k=-\infty}^{\infty} h_k z^{-k}
(14.232)

leads to conclusions about causality and stability.

A filter is causal if the current value of the output does not depend on future inputs. For a causal filter h k is zero for k < 0. Then this sum has no terms with positive powers of z, and the region of convergence of H(z) includes |z| =∞.

A filter is stable if

S = \sum_{k=-\infty}^{\infty} |h_k|
(14.233)

is finite. It follows that H(z) is finite for |z| = 1, i.e., for z on the unit circle. Thus, if the region of convergence includes the unit circle, the filter is stable.
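As a small numerical illustration of (14.233), consider a one-pole impulse response h k  = a^k (an assumed example): the sum of |h k| stays bounded when the pole lies inside the unit circle and grows without limit when it lies outside.

```python
import numpy as np

# Sketch of the stability test (14.233) for h_k = a**k.
for a in (0.9, 1.1):
    h = a ** np.arange(200)            # truncated impulse response
    print(a, np.sum(np.abs(h)))
# about 10 for a = 0.9 (stable); the partial sums keep growing for a = 1.1 (unstable)
```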

Maximum Length Sequences

A maximum length sequence (MLS) is a train of ones and zeros that makes a useful signal for measuring the impulse response of a linear system. An MLS of order N can be generated by an N-stage bit-shift register, which resembles a bucket brigade. Each stage can hold either a one or a zero. The register is imagined to have a clock that synchronizes the transfer of bits from each stage to the next. On every clock tick the content of each stage of the register is transferred to the next stage down the line. The content of the last stage is regarded as the output of the register, and it is fed back into the first stage. In addition, the output can be fed back into one or more of the other stages; when that occurs, the receiving stage forms the exclusive OR (XOR) of the fed-back output and the content of the previous stage. The XOR operation obeys the truth table shown in Table 14.4. In words, the XOR of inputs A and B is zero if A and B are the same and is 1 if A and B are different.

Table 14.4 Truth table for the exclusive or (XOR) operation

A shift register with three stages is shown in Fig. 14.23. With three stages and feedback taps to stages 1 and 2, it is defined as [3: 1,2].

Fig. 14.23
figure 23

A three-stage shift register [3: 1, 2] in which the output is fed back into the first and second stages

At the instant shown in the figure, the register holds the value 1,1,1. The subsequent development of the register values is given in Table 14.5. The sequence repeats after seven steps. The table shows that every possible pattern of ones and zeros occurs once, and only once, before the pattern repeats. There are 2^N − 1 = 2^3 − 1 = 7 such patterns. There is one exception, namely the pattern 0,0,0. If this pattern should ever appear in the register then the process gets stuck forever. Therefore, this pattern is not allowed. The output sequence is the contents of the stage on the right, here, 1,1,0,0,1,0,1. Because all seven register patterns appear before repetition, this output is a maximum length sequence. There is nothing special about the starting register value, 1,1,1. Therefore, any cyclic permutation of the MLS is also an MLS. For instance, the sequence, 1,0,0,1,0,1,1 is the same sequence.

Table 14.5 Successive values in the shift register of Fig. 14.23
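A minimal sketch of such a register in code is given below. The tap convention follows the description above: the output is always fed back into stage 1, and each additional tapped stage receives the XOR of the fed-back output and the content of the previous stage. With the taps [3: 1,2] it reproduces the output sequence of Table 14.5.

```python
# Sketch of an N-stage shift register with XOR feedback taps.
def shift_register_sequence(n_stages, taps, n_ticks):
    reg = [1] * n_stages                   # start with all ones, as in Fig. 14.23
    out = []
    for _ in range(n_ticks):
        o = reg[-1]                        # the output is the content of the last stage
        out.append(o)
        new = [0] * n_stages
        new[0] = o                         # feedback into stage 1
        for s in range(1, n_stages):       # stages 2..N
            new[s] = reg[s - 1] ^ o if (s + 1) in taps else reg[s - 1]
        reg = new
    return out

print(shift_register_sequence(3, {1, 2}, 7))      # [1, 1, 0, 0, 1, 0, 1]  (an MLS)
print(shift_register_sequence(3, {1, 2, 3}, 8))   # [1, 0, 0, 1, 1, 0, 0, 1]  (not an MLS)
```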

An example of a three-bit shift register that does not produce an MLS is [3: 1,2,3], shown in Fig. 14.24. The pattern for this shift register is shown in Table 14.6. The pattern of register values begins to repeat after only four steps. Therefore, the sequence of output values, namely 1,0,0,1,1,0,0,1,1,0,0,1, is not an MLS.

Fig. 14.24
figure 24

A three-stage shift register with feedback into all the stages does not produce an MLS

Table 14.6 Successive values in the shift register of Fig. 14.24

The MLS as a Signal

To make a signal from an MLS requires only one step: every 0 in the sequence is replaced by − 1. Therefore, the MLS for the shift register in Fig. 14.23 becomes: 1, 1, − 1, − 1, 1, − 1, 1. For this three-stage register (N = 3) the MLS has a length of seven; there are four + 1 values and three − 1 values. These results can be generalized to an N-stage register, which has 2^N − 1 values; 2^(N−1) of them are + 1 values and 2^(N−1) − 1 are − 1 values. The average value is therefore 1/(2^N − 1).

Application of the MLS

The key fact about an MLS is that its autocorrelation function is very nearly a delta function. To express that idea, one can write the autocorrelation function in the form appropriate for discrete samples,

c_k = \frac{1}{2^N - 1} \sum_{k_1} x_{k_1} x_{k_1 + k} .
(14.234)

This sum, and all sums to follow, are over the 2^N − 1 values of the MLS sequence x. Because the sequence is cyclical, it does not matter where one starts the sum.

An MLS has the property that

c_k = \left(1 + \frac{1}{2^N - 1}\right) \delta_{k,0} - \frac{1}{2^N - 1} .
(14.235)

Therefore, c k is approximately a Kronecker delta function

c_k \approx \delta_{k,0} .
(14.236)
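This property is easy to verify numerically for the N = 3 sequence given earlier; the sketch below computes the cyclic autocorrelation of the seven-point ± 1 signal.

```python
import numpy as np

# Check of (14.235) for the N = 3 MLS signal 1, 1, -1, -1, 1, -1, 1.
x = np.array([1, 1, -1, -1, 1, -1, 1], dtype=float)
L = len(x)                                    # 2**3 - 1 = 7
c = np.array([np.sum(x * np.roll(x, -k)) for k in range(L)]) / L
print(c)        # 1 at zero lag and -1/7 at every other lag, as (14.235) predicts
```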

If we would like to know the impulse response h of a linear system, we can excite the system with the MLS x, and record the output y. As for filters, the linear response y is the convolution of x and h, i.e.,

y_k = x * h = \sum_{k_1} x_{k_1 + k} h_{k_1} .
(14.237)

To find the impulse response, one can form the quantity d by convolving the recording y with the original MLS x, i.e.,

d_k = \frac{1}{2^N - 1} \sum_{k_2} x_{k_2 + k} y_{k_2}
(14.238)

or from (14.237)

d_k = \frac{1}{2^N - 1} \sum_{k_1, k_2} x_{k_2 + k} x_{k_1 + k_2} h_{k_1} .
(14.239)

Only x*x involves the index k2, and doing the sum over k2 leads to

d_k = \sum_{k_1} \delta_{k, k_1} h_{k_1}
(14.240)

so that d k  = h k . In this way, we have found the desired impulse response.
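The whole measurement can be simulated in a few lines. In the sketch below the "system" is a short, arbitrarily chosen cyclic impulse response; cross-correlating its MLS response with the MLS recovers that response. For this very short (N = 3) sequence the residual bias terms of order 1/(2^N − 1) are still visible; they shrink rapidly for the longer sequences of Table 14.7.

```python
import numpy as np

# Sketch of (14.237)-(14.240): recover an impulse response with an MLS.
x = np.array([1, 1, -1, -1, 1, -1, 1], dtype=float)      # the N = 3 MLS signal
L = len(x)
h_true = np.array([0.5, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0])   # assumed impulse response

# system response y_k = sum_{k1} x_{k1+k} h_{k1}, cyclic indexing (14.237)
y = np.array([np.sum(np.roll(x, -k) * h_true) for k in range(L)])

# d_k = (1/(2^N - 1)) sum_{k2} x_{k2+k} y_{k2}  (14.238)
d = np.array([np.sum(np.roll(x, -k) * y) for k in range(L)]) / L

print(np.round(d, 3))   # approximates h_true, apart from bias terms of order 1/7
```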

As applied in architectural acoustics, the MLS is an alternative to recording the response to a popping balloon or gun shot. Because the MLS is continuous, it avoids the dynamic-range problem associated with an impulsive test signal, and by repeating the sequence one can achieve remarkable noise immunity.

Similarly, the MLS is an alternative to recording the response to white noise (the MLS is white). However, digital white noise, such as random telegraph noise, has an autocorrelation function that is zero only for a long-term or ensemble average. In practice, the white-noise response of a linear system is much noisier than the MLS response.

Long Sequences

Table 14.7 gives the taps for some MLSs generated by shift registers with 2–20 stages, i.e., orders 2–20. The longest sequence has a length of more than one million bits.

Table 14.7 Taps for maximum length sequences

For orders 2, 3, and 4, there is only one possible set of taps. These sets have two taps, including the feedback to stage 1. For orders 7, 15, and 17 there is more than one set with two taps, and all of them are shown in the table.

Beginning with order 5 there are four-tap sets as well as two-tap sets, except that for some orders, such as 8, there are no two-tap sets. For every order the table gives a set with the smallest possible number of taps.

Beginning with order 7 there are six-tap sets. As the order increases the number of sets also increases. For order 19, there are 79 four-tap sets.

Information Theory

Information theory provides a way to quantify information by computing information content. The information content of a message depends on the context, and the context determines the initial uncertainty about the message. Suppose, for example, that we receive one character, but we know in advance that the context is one in which the character must be a digit between 0 and 9. Our uncertainty before receiving that actual character is described by the number of possible outcomes, which is Ω = 10 in this case. Suppose instead, that the context is one in which the character must be a letter of the alphabet. Then our initial uncertainty is greater because the number of possible outcomes is now Ω = 26. The first step of information theory is to recognize that, when we actually receive and identify a character, the information content of that character is greater in the second context than in the first because in the second context the character has eliminated a greater number of a priori possibilities.

The second step in information theory is to consider a message with two characters. If the context of the message is decimal digits then the number of possibilities is the product of 10 for the first digit and 10 for the second, namely Ω = 100 possibilities. Compared to a message with one character, the number of possibilities has been multiplied by 10. However, it is only logical to expect that two characters will give twice as much information as one, not 10 times as much. The logical problem can be solved by quantifying information in terms of entropy, which is the logarithm of the number of possibilities

H = log Ω .
(14.241)

Because log 100 is just twice log 10, the logical problem is solved. The information measured in bits is obtained by using a base 2 logarithm.

A few simple features follow immediately. If the number of possible messages is Ω = 1 then the message provides no information, which agrees with log 1 = 0. If the context is binary, where a character can be only 1 or 0 (Ω = 2), then receiving a character provides 1 bit of information, which agrees with log₂ 2 = 1.

If the context is an alphabet with M possible symbols, and all of the symbols are equally probable, then a message with N characters has Ω = M^N possible outcomes and the information entropy is

H = \log M^N = N \log M ,
(14.242)

illustrating the additivity of information over the characters of the message.

Shannon Entropy

Information theory becomes interesting when the probabilities of different symbols are different. Shannon [14.23,24] showed that the information content per character is given by

H_c = -\sum_{i=1}^{M} p_i \log p_i ,
(14.243)

where p i is the probability of symbol i in the given context.

The rest of this section proves Shannonʼs formula. The proof begins with the plausible assumption that, if the probability of symbol i is p i , then in a very long message of N characters the number of occurrences of character i, namely m i , will be exactly m i  = Np i .

The number of possibilities for a message of N characters in which the set of {m i } is fixed by the corresponding {p i } is

\Omega = \frac{N!}{m_1! \, m_2! \cdots m_M!} .
(14.244)

Therefore,

H = \log N! - \log m_1! - \log m_2! - \cdots - \log m_M! .
(14.245)

One can write log N! as a sum

\log N! = \sum_{k=1}^{N} \log k
(14.246)

and similarly for log m i !.

For a long message one can replace the sum by an integral,

\log N! = \int_1^N \mathrm{d}x \, \log x = N \log N - N + 1
(14.247)

and similarly for log m i ! .

Therefore,

H = N \log N - N + 1 - \sum_{i=1}^{M} m_i \log m_i + \sum_{i=1}^{M} m_i - \sum_{i=1}^{M} 1 .
(14.248)

Because ∑ i=1 M m i  = N, this reduces to

H = N \log N + 1 - \sum_{i=1}^{M} m_i \log m_i - M .
(14.249)

The information per character is obtained by dividing the message entropy by the number of characters in the message,

H_c = \log N - \sum_{i=1}^{M} p_i \log m_i + (1 - M)/N ,
(14.250)

where we have used the fact that m i /N = p i .

In a long message, the last term can be ignored as small. Then because the sum of probabilities p i is equal to 1,

H_c = -\sum_{i=1}^{M} p_i (\log m_i - \log N) ,
(14.251)

or

H_c = -\sum_{i=1}^{M} p_i \log p_i ,
(14.252)

which is (14.243) as advertised.

If the context of written English consists of 27 symbols (26 letters and a space), and if all symbols are equally probable, then the information content of a single character is

H_c = -1.443 \sum_{i=1}^{27} \frac{1}{27} \ln \frac{1}{27} = 4.75 \text{ bits} ,
(14.253)

where the factor 1/ ln (2) = 1.443 converts the natural log to a base 2 log. However, in written English all symbols are not equally probable. For example, the most common letter, E, is more than 100 times more likely to occur than the letter J. Because equal probability of symbols always leads to the highest entropy, the unequal probability in written English is bound to reduce the information content – to about 4 bits per character. An even greater reduction comes from the redundancy in larger units, such as words, so that the information content of written English is no more than 1.6 bits per character.
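These numbers are easy to reproduce. The sketch below evaluates (14.243) for the 27 equally probable symbols of (14.253) and for an arbitrary, non-uniform toy distribution (not actual English letter statistics) to show that unequal probabilities lower the entropy.

```python
import numpy as np

# Shannon entropy per character, eq. (14.243), in bits.
def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # symbols with zero probability contribute nothing
    return -np.sum(p * np.log2(p))

uniform = np.full(27, 1.0 / 27)
print(entropy_bits(uniform))        # 4.75 bits, as in (14.253)

skewed = np.array([0.2] * 2 + [0.6 / 25] * 25)   # arbitrary non-uniform example
print(entropy_bits(skewed))         # less than 4.75 bits
```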

The concept of information entropy can be extended to continuous distributions defined by a probability density function

h = -\int_{-\infty}^{\infty} \mathrm{d}x \, \mathrm{PDF}(x) \log[\mathrm{PDF}(x)] .
(14.254)

Mutual Information

The mutual information between sets of variables {i} and {j} is a measure of the amount of uncertainty about one of these variables that is eliminated by knowing the other variable. Mutual information H m is given in terms of the joint probability mass function p(i, j)

H_m = \sum_{i=1}^{M} \sum_{j=1}^{M} p(i,j) \log \frac{p(i,j)}{p(i)\,p(j)} .
(14.255)

Using written English as an example again, p(i) might describe the probability for the first letter of a word and p(j) might describe the probability for the second. It is convenient to let the indices i and j be integers, e.g., p(i = 1) is the probability that the first letter is an ‘A’, and p(j = 2) is the probability that the second letter is a ‘B’. Then p(1, 2) is the probability that the word starts with the two letters ‘AB’. It is evident that in a context where the first two letters are completely independent of one another so that p(i, j) = p(i)p(j) then the amount of mutual information is zero because log (1) = 0. In the opposite limit the context is one in which the second letter is completely determined by the first. For instance, if the second letter is always the letter of the alphabet that immediately follows the first letter then p(i, j) = p(j) = p(i)δ(j, i + 1), and

H_m = \sum_{i=1}^{M} p(i) \log \frac{p(i)}{p(i)\,p(i)} ,
(14.256)

which simply reduces to (14.243) for H c, the information content of the first letter of the word.

In the general case, the mutual information is a difference in information content. It is equal to the information provided by the second letter of the word given no prior knowledge at all, minus the information provided by the second letter of the word given knowledge of the first letter. Mathematically, p(i, j) = p(i)p(j|i), where p(j|i) is the probability that the second letter is j given that the first letter is i. Then

H_m = \sum_{j=1}^{M} p(j) \log \frac{1}{p(j)} - \sum_{i=1}^{M} \sum_{j=1}^{M} p(i,j) \log \frac{1}{p(j|i)} .
(14.257)

The information transfer ratio T is the degree to which the information in the first letter predicts the information in the second. Equivalently, it describes the transfer of information from an input to an output

T = \frac{-H_m}{\sum_{i=1}^{M} p(i) \log p(i)} .
(14.258)

This ratio ranges between 0 and 1, where 1 indicates that the second letter, or output, can be predicted from the first letter, or input, with perfect reliability. The mutual information is the basis for the calculation of the information capacity of a noisy communications channel.
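A small numerical sketch of (14.255) is given below, using an arbitrary three-symbol alphabet: an independent joint distribution gives zero mutual information, while a second letter completely determined by the first returns the entropy of the first letter, as in (14.256).

```python
import numpy as np

# Mutual information in bits from a joint probability table p(i, j), eq. (14.255).
def mutual_information_bits(pij):
    pij = np.asarray(pij, dtype=float)
    pi = pij.sum(axis=1)                 # marginal for the first symbol
    pj = pij.sum(axis=0)                 # marginal for the second symbol
    mask = pij > 0
    return np.sum(pij[mask] * np.log2(pij[mask] / np.outer(pi, pj)[mask]))

p = np.array([0.5, 0.3, 0.2])            # arbitrary first-letter probabilities

independent = np.outer(p, p)             # p(i, j) = p(i) p(j)
print(mutual_information_bits(independent))     # 0 bits

determined = np.zeros((3, 3))
for i in range(3):
    determined[i, (i + 1) % 3] = p[i]    # the second symbol always determined by the first
print(mutual_information_bits(determined))      # equals -sum p log2 p of the first letter
```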