The process of combining signals acquired by a microphone array in order to ‘focus’ on a signal in a specific direction is known as beamforming or spatial filtering. We present in this chapter a number of such beamforming methods that are specifically controlled by weights dependent only on the direction of arrival (DOA) of the desired source. They are otherwise signal-independent such that they do not depend on the statistics of the desired or noise signals. We derive maximum directivity and maximum white noise gain beamformers that establish performance bounds for spherical harmonic domain (SHD) beamformers. Because the weights of these beamformers are given by simple expressions, they present the advantages of being straightforward to implement and of having low computational complexity.

6.1 Signal Model

The sound pressure P captured at a position \(\mathbf {r} = (r,\varOmega ) = (r,\theta ,\phi )\) (in spherical coordinates, where \(\theta \) denotes the inclination and \(\phi \) denotes the azimuth) on a spherical microphone array of radius r is commonly expressed as the sum of a desired signal X and a noise signal V [12, 15]. In the spatial domain, the signal model is expressed as

$$\begin{aligned} P(k,\mathbf {r}) = X(k,\mathbf {r}) + V(k,\mathbf {r}), \end{aligned}$$
(6.1)

where k denotes the wavenumber. The desired signal X is assumed to be spatially coherent, while the noise signal V models background noise or sensor noise, for example, and may be spatially incoherent, coherent or partially coherent.

When using spherical microphone arrays, it is convenient to work in the SHD [1, 17]. In this chapter, we assume error-free spatial sampling by Q microphones at positions \(\mathbf {r}_q = (r,\varOmega _q), q \in \left\{ 1, \ldots , Q \right\} \), and refer the reader to Chap. 3 for information on spatial sampling and aliasing. By applying the complex spherical harmonic transform (SHT) to the signal model in (6.1), we obtain the SHD signal model

$$\begin{aligned} P_{lm}(k) = X_{lm}(k) + V_{lm}(k), \end{aligned}$$
(6.2)

where \(P_{lm}(k)\), \(X_{lm}(k)\) and \(V_{lm}(k)\) are respectively the spherical harmonic transforms of the spatial domain signals \(P(k,\mathbf {r}_q)\), \(X(k,\mathbf {r}_q)\) and \(V(k,\mathbf {r}_q)\), as defined in (3.6), and are referred to as eigenbeams to reflect the fact that the spherical harmonics are eigensolutions of the wave equation in spherical coordinates [26]. The order and degree of the spherical harmonics are respectively denoted as l and m.

By combining the eigenbeams \(P_{lm}(k)\) in a particular way, the noise V can be suppressed and the desired signal X can be extracted from the noisy mixture P. This is accomplished using a spatio-temporal filter or beamformer. In the spatial domain, the output of a beamformer is obtained as the weighted sum of the pressure signals at each of the microphones [3, 4]; in the SHD, the beamformer output is given by a weighted sum of the eigenbeams \(P_{lm}(k)\) [14, 21]. The output of an Lth-order SHD beamformer can thus be expressed as [21, Eq. 12]

$$\begin{aligned} Z(k) = \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) P_{lm}(k), \end{aligned}$$
(6.3)

where \(W_{lm}(k)\) denotes the beamformer weights and \((\cdot )^{*}\) denotes the complex conjugate.

Beamformers can be either signal-independent (fixed) or signal-dependent; their weights are chosen in order to achieve specific performance objectives. Signal-independent beamformers apply a constraint to a specific steering direction and optimize the beamformer weights with respect to array performance measures such as the white noise gain (WNG) and directivity. They can also, more generally, attempt to achieve a specific spatial response in all directions by minimizing the difference between the beamformer’s spatial response and the desired spatial response, according to some distance measure (see [6, Sects. 8.3 and 8.4] for examples). Signal-dependent beamformers optimize the weights taking into account characteristics of the desired signal and noise. In this chapter, we discuss signal-independent beamformers; signal-dependent beamformers are addressed in Chap. 7.

A block diagram of a signal-independent beamformer is shown in Fig. 6.1. We begin by capturing the sound pressure signals \(P(k,\mathbf {r}_q)\) at microphones \(q \in \{ 1, \ldots , Q \}\), and applying the SHT to obtain the SHD sound pressure signals, the eigenbeams \(P_{lm}(k)\), gathered together to form a vector \(\mathbf {p}(k)\). The output Z(k) of the beamformer is obtained by taking the weighted sum of these eigenbeams, where the weights \(W_{lm}(k,\varOmega _{\text {u}})\) depend only on the steering direction \(\varOmega _{\text {u}}\) and do not otherwise depend on the sound pressure signals P.

Fig. 6.1 Block diagram of a signal-independent beamformer

The signal-independent beamformers presented in this chapter are designed assuming anechoic conditions with a single active sound source, though these assumptions are unlikely to be valid in practical use scenarios. Depending on the distance between this source and the array, the desired signal is assumed to consist of either a plane wave or a spherical wave. Under farfield conditions, the eigenbeams of a unit amplitude plane wave incident from a direction \(\varOmega _{\text {s}}\) are given by (3.22a). The SHD sound pressure \(X_{lm}(k,\varOmega _{\text {s}})\) related to a plane wave with power \(P_{\text {pw}}(k)\) can then be written as [18, 20, 26]

$$\begin{aligned} X_{lm}(k,\varOmega _{\text {s}}) = {\sqrt{P_{\text {pw}}(k)}} b_l(k) Y_{lm}^*(\varOmega _{\text {s}}), \end{aligned}$$
(6.4)

where \(Y_{lm}(\varOmega _{\text {s}})\) denotes the complex spherical harmonic of order l and degree m evaluated at an angle \(\varOmega _{\text {s}}\), as defined in (2.14), and the mode strength \(b_l(k)\) captures the eigenbeams’ dependence on the array properties, such as microphone type or array configuration, and is discussed in more detail in Sect. 3.4.2.

All the beamformers designed in this chapter seek to suppress the noise while maintaining a distortionless constraint on the signal originating from the steering direction \(\varOmega _{\text {u}}\). This constraint is expressed as

$$\begin{aligned} \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) b_l(k) Y_{lm}^*(\varOmega _{\text {u}}) = 1. \end{aligned}$$
(6.5)

It is important to note that this distortionless constraint depends only on the steering direction \(\varOmega _{\text {u}}\). It is different from the distortionless constraint imposed in Chap. 7, which takes into account the complex multipath propagation effects of a reverberant environment. Using the constraint in (6.5) can be appealing, as it does not require the estimation of the acoustic transfer functions (ATFs) or relative transfer functions. However, this comes at the expense of sensitivity to errors in the steering direction and reduced robustness to reverberation.

For convenience, the SHD signal model in (6.2) can also be expressed in vector form as

$$\begin{aligned} \mathbf {p}(k) = \mathbf {x}(k) + \mathbf {v}(k) \end{aligned}$$
(6.6)

where the SHD signal vector \(\mathbf {p}(k)\) of length \((L+1)^2\) is defined as

$$\begin{aligned} \mathbf {p}(k)&= \left[ P_{00}(k)\,\, P_{1(-1)}(k)\,\, P_{10}(k)\,\, P_{11}(k)\,\, P_{2(-2)}(k) \,\cdots \, P_{LL}(k)\right] ^{\text {T}}, \end{aligned}$$

and \(\mathbf {x}(k)\) and \(\mathbf {v}(k)\) are defined similarly to \(\mathbf {p}(k)\). The beamformer output signal Z(k) can be expressed as

$$\begin{aligned} Z(k) = \mathbf {w}^{\text {H}}(k) \mathbf {p}(k), \end{aligned}$$
(6.7)

where the filter weights vector is defined as

$$\begin{aligned} \mathbf {w}(k)&= \left[ W_{00}(k)\,\, W_{1(-1)}(k)\,\, W_{10}(k)\,\, W_{11}(k)\,\, W_{2(-2)}(k) \,\cdots \, W_{LL}(k)\right] ^{\text {T}}. \end{aligned}$$
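As a minimal sketch of (6.3) and (6.7), the following Python snippet stacks the eigenbeams in the order used for \(\mathbf {p}(k)\), so that the element with order l and degree m sits at position \(l^2 + l + m\), and evaluates the beamformer output as a conjugate-weighted sum. The function names `shd_index` and `beamformer_output` and the numerical values of the eigenbeams and weights are made up for illustration only.

```python
def shd_index(l, m):
    """Position of the (l, m) element in the stacked SHD vector: l^2 + l + m."""
    if abs(m) > l:
        raise ValueError("degree m must satisfy -l <= m <= l")
    return l * l + l + m


def beamformer_output(w, p):
    """Z = w^H p, i.e. the sum over (l, m) of conj(W_lm) * P_lm."""
    if len(w) != len(p):
        raise ValueError("w and p must both have length (L + 1)^2")
    return sum(w_i.conjugate() * p_i for w_i, p_i in zip(w, p))


L = 2
n_sh = (L + 1) ** 2  # number of eigenbeams up to order L
order = [(l, m) for l in range(L + 1) for m in range(-l, l + 1)]

# Made-up eigenbeams and weights at a single wavenumber (illustration only).
p = [complex(i + 1, -i) for i in range(n_sh)]
w = [complex(0.5, 0.1 * i) for i in range(n_sh)]

Z = beamformer_output(w, p)
print("stacking order:", order)
print("Z =", Z)
```

The double sum over l and m in (6.3) and the inner product in (6.7) are the same operation; the index map only fixes where each eigenbeam sits in the vector.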

In matrix form the desired signal is written as

$$\begin{aligned} \mathbf {x}(k,\varOmega _{\text {s}}) = {\sqrt{P_{\text {pw}}(k)}} \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {s}}), \end{aligned}$$
(6.8)

where the vector of spherical harmonics \(\mathbf {y}(\varOmega _{\text {s}})\) of length \((L+1)^2\) is defined as

$$\begin{aligned} \mathbf {y}(\varOmega _{\text {s}}) = \left[ Y_{00}(\varOmega _{\text {s}})\,\, Y_{1(-1)}(\varOmega _{\text {s}})\,\, Y_{10}(\varOmega _{\text {s}})\,\, Y_{11}(\varOmega _{\text {s}})\,\, \cdots \, Y_{LL}(\varOmega _{\text {s}})\right] ^{\text {T}}, \end{aligned}$$
(6.9)

and the \((L+1)^2 \times (L+1)^2\) matrix of mode strengths \(\mathbf {B}(k)\) is defined as

$$\begin{aligned} \mathbf {B}(k)&= \text {diag}\left\{ b_{0}(k), b_{1}(k), b_{1}(k), b_{1}(k), b_{2}(k), \ldots , b_{L}(k)\right\} , \end{aligned}$$
(6.10)

therefore \(\mathbf {B}(k)\) consists of \(2l+1\) repetitions of \(b_l(k)\) for \(l \in \left\{ 0, \ldots , L \right\} \) along its diagonal. Finally, the distortionless constraint is given by

$$\begin{aligned} \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) = 1. \end{aligned}$$
(6.11)

6.2 Design Criteria

In this section, we introduce a number of measures that can be used to design optimal beamformers as in Sect. 6.3. It should be noted that these measures are defined with respect to the signals with physical significance, namely the spatial domain signals, and not with respect to the eigenbeams. Nevertheless, these measures will still depend on the eigenbeams as they form a part of the spherical harmonic expansion (SHE) of the spatial domain signals.

6.2.1 Directivity

Directivity is a measure of a beamformer’s spatial selectivity and quantifies its ability to suppress sound waves that do not originate from a specifically chosen steering direction. It is defined as the ratio of the power of the beamformer output due to a plane wave arriving from the steering direction \(\varOmega _{\text {u}}\) to the power of the beamformer output averaged over all directions [28]. The directivity \(\mathcal {D}(k)\) is therefore written as

$$\begin{aligned} \mathcal {D}(k)&= \frac{\left| Z(k,\varOmega _{\text {u}})\right| ^2}{\frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}^2} \left| Z(k,\varOmega ) \right| ^2 \text {d}\varOmega } \end{aligned}$$
(6.12)
$$\begin{aligned}&= \frac{\left| \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) X_{lm}(k,\varOmega _{\text {u}})\right| ^2}{\frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}^2} \left| \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) X_{lm}(k,\varOmega ) \right| ^2 \text {d}\varOmega }, \end{aligned}$$
(6.13)

where the notation \(\int _{\varOmega \in \mathcal {S}^2} \text {d}\varOmega \) is used to denote compactly the solid angle \(\int _{\phi = 0}^{2\pi } \int _{\theta = 0}^{\pi } \sin \theta \text {d}\theta \text {d}\phi \). Applying the distortionless constraint (6.5), and by substituting the expression for a plane wave (6.4) into (6.12), we find

$$\begin{aligned} \mathcal {D}(k)&= \frac{4 \pi {P_{\text {pw}}(k)}}{\int _{\varOmega \in \mathcal {S}^2} \left| \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) {\sqrt{P_{\text {pw}}(k)}} b_l(k) Y_{lm}^*(\varOmega ) \right| ^2 \text {d}\varOmega } \nonumber \\&= {\frac{4 \pi }{\int _{\varOmega \in \mathcal {S}^2} \left| \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) b_l(k) Y_{lm}^*(\varOmega ) \right| ^2 \text {d}\varOmega }}. \end{aligned}$$
(6.14)

Using the orthonormality of the spherical harmonics (2.18), this can be simplified to

$$\begin{aligned} \mathcal {D}(k)&= 4 \pi \left( \sum _{l=0}^{L} \sum _{m=-l}^{l} \left| W^*_{lm}(k) b_l(k)\right| ^2\right) ^{-1}, \end{aligned}$$
(6.15)

or in vector form

$$\begin{aligned} \mathcal {D}(k)&= 4 \pi \left| \left| \mathbf {B}(k) \mathbf {w}^{*}(k) \right| \right| ^{-2}, \end{aligned}$$
(6.16)

where \(\left| \left| \cdot \right| \right| \) denotes the 2-norm. The directivity is therefore a function of the array properties, such as radius or microphone type, and the beamformer weights \(W_{lm}(k)\).

The directivity is frequently expressed in dB and is then referred to as the directivity index (DI),

$$\begin{aligned} \text {DI}(k) = 10 \log _{10} \mathcal {D}(k). \end{aligned}$$
(6.17)
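The expressions (6.15) and (6.17) can be evaluated directly from a weight vector. The sketch below assumes weights that already satisfy the distortionless constraint (6.5) and are stacked in the same order as \(\mathbf {w}(k)\); the names `directivity` and `directivity_index` are illustrative, and the mode strengths are passed as a list `b[0..L]`. As a sanity check, a zeroth-order (omnidirectional) beamformer with an assumed mode strength \(b_0 = 1\) has \(W_{00} = \sqrt{4 \pi }\), which gives a directivity of 1, i.e., a DI of 0 dB.

```python
import math


def directivity(w, b):
    """Directivity from (6.15): 4*pi / sum over (l, m) of |W_lm * b_l|^2.

    Assumes w satisfies the distortionless constraint (6.5).
    w: stacked weights of length (L + 1)^2; b: mode strengths b_0 .. b_L.
    """
    L = len(b) - 1
    if len(w) != (L + 1) ** 2:
        raise ValueError("w must have length (L + 1)^2")
    total = 0.0
    for l in range(L + 1):
        for m in range(-l, l + 1):
            total += abs(w[l * l + l + m] * b[l]) ** 2
    return 4.0 * math.pi / total


def directivity_index(D):
    """Directivity index in dB, as in (6.17)."""
    return 10.0 * math.log10(D)


# Omnidirectional case (L = 0): Y_00 = 1/sqrt(4*pi), so with b_0 = 1 the
# constraint (6.5) requires W_00 = sqrt(4*pi).
w_omni = [math.sqrt(4.0 * math.pi)]
D_omni = directivity(w_omni, [1.0])
print(D_omni, directivity_index(D_omni))
```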

6.2.2 Front-to-Back Ratio

The front-to-back ratio is an alternative measure of a beamformer’s spatial selectivity and quantifies its ability to differentiate between sound waves that originate from the front and the back. It is defined as the ratio of the average power of the beamformer output due to plane waves arriving from the front to the average power of the beamformer output due to plane waves arriving from the back. The front-to-back ratio \(\mathcal {F}(k)\) is therefore written as [7]

$$\begin{aligned} \mathcal {F}(k)&= \frac{\frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}_{\text {F}}^2} \left| \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) X_{lm}(k,\varOmega ) \right| ^2 \text {d}\varOmega }{\frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}_{\text {B}}^2} \left| \sum _{l=0}^{L} \sum _{m=-l}^{l} W^*_{lm}(k) X_{lm}(k,\varOmega ) \right| ^2 \text {d}\varOmega }, \end{aligned}$$
(6.18)

where for a beamformer steered to \((\pi /2,\pi /2)\) we have

$$\begin{aligned} \int _{\varOmega \in \mathcal {S}_{\text {F}}^2} \text {d}\varOmega = \int _{\phi = 0}^{\pi } \int _{\theta = 0}^{\pi } \sin \theta \text {d}\theta \text {d}\phi \end{aligned}$$
(6.19)

and

$$\begin{aligned} \int _{\varOmega \in \mathcal {S}_{\text {B}}^2} \text {d}\varOmega = \int _{\phi = \pi }^{2\pi } \int _{\theta = 0}^{\pi } \sin \theta \text {d}\theta \text {d}\phi . \end{aligned}$$
(6.20)

6.2.3 White Noise Gain

White noise gain (WNG) is a measure of a beamformer’s robustness against sensor noise and errors in microphone placement and steering direction [10], and is defined as the array gain in the presence of spatially incoherent noise [28], i.e., the ratio of the signal-to-noise ratio (SNR) at the beamformer output (\(\text {oSNR}\)) to the SNR at the beamformer input (\(\text {iSNR}\)).

We now derive the WNG for a spherical microphone array employing a set of microphones uniformly distributed on the sphere. The desired signal power is different at each microphone, particularly for a rigid sphere where the scattering effects depend on the angle of incidence [16]. When calculating the \(\text {iSNR}\), the desired signal power is therefore averaged over the sphere.

Let us assume that the noise at each microphone has equal power \(\sigma _{v}^2(k)\). The input SNR is then given by

$$\begin{aligned} \text {iSNR}_{\text {w}}(k)&= \frac{\frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}^2} \left| X(k,\mathbf {r}) \right| ^2 \text {d}\varOmega }{\sigma _{v}^2(k)} \end{aligned}$$
(6.21a)
$$\begin{aligned}&= \frac{\frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}^2} \left| \sum _{l=0}^{\infty } \sum _{m=-l}^l X_{lm}(k) Y_{lm}(\varOmega ) \right| ^2 \text {d}\varOmega }{\sigma _{v}^2(k)}, \end{aligned}$$
(6.21b)

where (6.21b) is obtained using the spherical harmonic decomposition of \(X(k,\mathbf {r})\). Assuming plane-wave incidence from a direction \(\varOmega _{\text {s}}\), by substituting (6.4) into (6.21), we find

$$\begin{aligned} \text {iSNR}_{\text {w}}(k)&= \frac{\int _{\varOmega \in \mathcal {S}^2} \left| \sum _{l=0}^{\infty } \sum _{m=-l}^l {\sqrt{P_{\text {pw}}(k)}} b_l(k) Y_{lm}^*(\varOmega _{\text {s}}) Y_{lm}(\varOmega ) \right| ^2 \text {d}\varOmega }{4 \pi \sigma _{v}^2(k)}. \end{aligned}$$
(6.22)

Using Unsöld’s theorem [29], a special case of the spherical harmonic addition theorem (2.23), and the orthonormality of the spherical harmonics, we simplify (6.22) to

$$\begin{aligned} \text {iSNR}_{\text {w}}(k)&= \frac{ \sum _{l=0}^{\infty } \sum _{m=-l}^l \left| {\sqrt{P_{\text {pw}}(k)}} b_l(k) Y_{lm}^*(\varOmega _{\text {s}}) \right| ^2}{4 \pi \sigma _{v}^2(k)} \end{aligned}$$
(6.23a)
$$\begin{aligned}&= \frac{{P_{\text {pw}}(k)} \sum _{l=0}^{\infty } \left| b_l(k)\right| ^2 (2l+1)}{(4 \pi )^2 \sigma _{v}^2(k)}. \end{aligned}$$
(6.23b)

The input SNR is therefore a function of the plane wave power \(P_{\text {pw}}(k)\), the array properties, via the mode strength \(b_l(k)\), and the noise power \(\sigma _{v}^2(k)\).

The output SNR is given by

$$\begin{aligned} \text {oSNR}_{\text {w}}(k)&= \frac{\left| \sum _{l=0}^{L} \sum _{m=-l}^l W_{lm}^{*}(k) X_{lm}(k) \right| ^2}{\text {E} \left\{ \left| \sum _{l=0}^{L} \sum _{m=-l}^l W_{lm}^{*}(k) V_{lm}(k) \right| ^2 \right\} }. \end{aligned}$$
(6.24)

Applying the distortionless constraint (6.5), this reduces to

$$\begin{aligned} \text {oSNR}_{\text {w}}(k)&= \frac{{P_{\text {pw}}(k)}}{\text {E} \left\{ \left| \sum _{l=0}^{L} \sum _{m=-l}^l W_{lm}^{*}(k) V_{lm}(k) \right| ^2 \right\} }. \end{aligned}$$
(6.25)

With Q microphones uniformly distributed on the sphere, the cross power spectral density of the noise is given by [31, Eq. 7.31]

$$\begin{aligned} \text {E} \left\{ V_{lm}(k) V^{*}_{l'm'}(k) \right\} = \sigma _{v}^2(k) \frac{4 \pi }{Q} \delta _{l,l'} \delta _{m,m'}, \end{aligned}$$
(6.26)

where \(\delta \) denotes the Kronecker delta, and \(\text {oSNR}\) simplifies to

$$\begin{aligned} \text {oSNR}_{\text {w}}(k)&= {P_{\text {pw}}(k)} \left( \frac{4 \pi }{Q} \sigma _{v}^2(k) \sum _{l=0}^{L} \sum _{m=-l}^l \left| W_{lm}^{*}(k)\right| ^2\right) ^{-1}. \end{aligned}$$
(6.27)

The output SNR is a function of the beamformer weights \(W_{lm}(k)\), the plane wave power \(P_{\text {pw}}(k)\), the noise power \(\sigma _{v}^2(k)\), and the beamformer order L. The beamformer order can be increased by adding microphones, as discussed in Sect. 3.4.

Finally, the WNG can be expressed as

$$\begin{aligned} \text {WNG}(k)&= \frac{\text {oSNR}_{\text {w}}(k)}{\text {iSNR}_{\text {w}}(k)} \end{aligned}$$
(6.28a)
$$\begin{aligned}&= \frac{ 4 \pi Q}{\left| \left| \mathbf {w}(k) \right| \right| ^2 \sum _{l=0}^{\infty } \left| b_l(k)\right| ^2 (2l+1)}. \end{aligned}$$
(6.28b)

The WNG is a function of the beamformer weights \(W_{lm}(k)\), array order L and the array properties. As expected, it is also an increasing function of the number of microphones Q. In the case of an open sphere, \(b_l(k) = i^l j_l(kr)\), and since \(\sum _{l=0}^{\infty } \left| j_l(kr)\right| ^2 (2l+1) = 1\) [2, 13], the WNG is given by the simple expression

$$\begin{aligned} \text {WNG}(k)&= \frac{ 4 \pi Q}{\left| \left| \mathbf {w}(k) \right| \right| ^2}. \end{aligned}$$
(6.29)
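The open-sphere identity \(\sum _{l=0}^{\infty } \left| j_l(kr)\right| ^2 (2l+1) = 1\) that turns (6.28b) into (6.29) can be checked numerically. The following sketch uses only the Python standard library: `spherical_jn` evaluates \(j_l\) from its ascending power series (adequate for the moderate kr values used here, but not intended as a general-purpose implementation), and the infinite sum is truncated at an assumed order `l_max`. The function names are illustrative.

```python
import math


def spherical_jn(l, x, n_terms=60):
    """Spherical Bessel function j_l(x) from its ascending power series."""
    term = x ** l / math.prod(range(2 * l + 1, 0, -2))  # x^l / (2l+1)!!
    total = term
    for k in range(n_terms):
        term *= -x * x / (2.0 * (k + 1) * (2 * l + 2 * k + 3))
        total += term
    return total


def open_sphere_mode_energy(kr, l_max):
    """Truncated sum over l = 0..l_max of (2l+1) |j_l(kr)|^2."""
    return sum((2 * l + 1) * spherical_jn(l, kr) ** 2 for l in range(l_max + 1))


def wng(w, Q, mode_energy=1.0):
    """WNG from (6.28b); mode_energy = 1 gives the open-sphere form (6.29)."""
    norm_sq = sum(abs(w_i) ** 2 for w_i in w)
    return 4.0 * math.pi * Q / (norm_sq * mode_energy)


for kr in (0.5, 1.0, 3.0):
    print(kr, open_sphere_mode_energy(kr, 30))
```

For a rigid sphere the sum is generally not equal to 1, so the `mode_energy` argument of `wng` would have to be computed from the corresponding mode strength rather than left at its default.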

Fig. 6.2 Illustrative example of the magnitude of a spatial response \(\mathcal {B}(k,\varTheta )\) as a function of the angle \(\varTheta \) between the steering direction and DOA

6.2.4 Spatial Response

The output of the beamformer in the presence of a single unit amplitude plane wave originating from a DOA \(\varOmega \) is given by

$$\begin{aligned} \mathcal {B}(k,\varOmega ) = \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega ), \end{aligned}$$
(6.30)

and is known as the spatial response of the beamformer. The square magnitude of the spatial response \(\mathcal {B}(k,\varOmega )\) is referred to as the beam pattern [4]. The beam pattern describes the beamformer’s ability to select signals originating from a direction of interest, while suppressing signals that do not. Beam patterns typically exhibit multiple peaks or lobes; the largest lobe, in the direction of interest, is referred to as the main lobe, while the other lobes are referred to as sidelobes. Due to the effects of spatial aliasing, some sidelobes may have an amplitude equal to that of the main lobe, and they are then referred to as grating lobes [27].

Due to the spherical symmetry of the SHD, the beam pattern can also be expressed as a function of the angle between the DOA \(\varOmega \) and the beamformer’s steering direction \(\varOmega _{\text {u}}\), denoted as \(\varTheta \). Ideally, the response in the steering direction, \(\mathcal {B}(k,\varTheta = 0)\), should be as large as possible compared to the response in other directions, i.e., the sidelobe levels should be minimized. We refer to the width of the region that has a higher response than the maximum sidelobe level as the main lobe width, as illustrated in Fig. 6.2.

6.3 Signal-Independent Beamformers

Having established our signal model in Sect. 6.1, we now develop a number of signal-independent beamformers based on the design criteria introduced in Sect. 6.2. The beam patterns of all the beamformers presented in this section are rotationally symmetric about the steering direction.

6.3.1 Farfield Beamformers

In this section, we derive three beamformers suitable for use in farfield conditions: a maximum directivity beamformer, a maximum WNG beamformer, and a multiply constrained beamformer.

6.3.1.1 Maximum Directivity Beamformer

The beamformer that maximizes the directivity while imposing a distortionless constraint in the steering direction satisfies

$$\begin{aligned} \max _{ \mathbf {w}(k) } \,\mathcal {D}(k) \quad&\text {subject to} \quad \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) = 1, \end{aligned}$$

or equivalently,

$$\begin{aligned} \min _{ \mathbf {w}(k) } \,\left| \left| \mathbf {B}(k) \mathbf {w}^{*}(k) \right| \right| ^2 \quad&\text {subject to} \quad \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) = 1, \end{aligned}$$

where \(\mathbf {y}(\varOmega _{\text {u}})\) is the vector of spherical harmonics defined in (6.9).

Following the approach proposed by Brandwood [5], if we use a Lagrange multiplier to adjoin the constraint to the cost function, the weights of the maximum directivity beamformer are then given by

$$\begin{aligned} \mathbf {w}_{\mathrm {maxDI}}(k)&= \underset{\mathbf {w}(k)}{\arg \min } \, \mathcal {L}(\mathbf {w}(k), \lambda ), \end{aligned}$$
(6.31)

where \(\mathcal {L}\) is the complex Lagrangian given by

$$\begin{aligned} \mathcal {L}(\mathbf {w}(k), \lambda )&= \left[ \mathbf {B}(k) \mathbf {w}^{*}(k) \right] ^{\text {H}} \left[ \mathbf {B}(k) \mathbf {w}^{*}(k) \right] \nonumber \\&\quad + \lambda \left( \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) - 1 \right) + \lambda ^{*} \left( \mathbf {y}^{\text {T}}(\varOmega _{\text {u}}) \mathbf {B}^{\text {*}}(k) \mathbf {w}(k) - 1 \right) \quad \end{aligned}$$
(6.32)

and \(\lambda \) is the Lagrange multiplier. Setting the gradient of \(\mathcal {L}(\mathbf {w}_{\mathrm {maxDI}}(k), \lambda )\) with respect to \(\mathbf {w}^{*}_{\mathrm {maxDI}}\) to zero yields

$$\begin{aligned} \nabla _{\mathbf {w}^{*}_{\mathrm {maxDI}}} \mathcal {L}(\mathbf {w}_{\mathrm {maxDI}}(k), \lambda )&= \mathbf {0}_N\nonumber \\ \mathbf {B}(k) \mathbf {B}^{*}(k) \mathbf {w}_{\mathrm {maxDI}}(k) + \lambda \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})&= \mathbf {0}_N, \end{aligned}$$
(6.33)

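where \(\mathbf {0}_N\) is a column vector of N zeros. Solving (6.33) for the weights gives \(\mathbf {w}_{\mathrm {maxDI}}(k) = -\lambda \left[ \mathbf {B}^{*}(k)\right] ^{-1} \mathbf {y}^{*}(\varOmega _{\text {u}})\); substituting this into the distortionless constraint (6.11) to eliminate \(\lambda \), we then find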

$$\begin{aligned} \mathbf {w}_{\mathrm {maxDI}}(k) = \frac{\left[ \mathbf {B}^{*}(k)\right] ^{-1} \mathbf {y}^{*}(\varOmega _{\text {u}})}{\left| \left| \mathbf {y}(\varOmega _{\text {u}}) \right| \right| ^2}. \end{aligned}$$
(6.34)

Using Unsöld’s theorem [29], this simplifies to

$$\begin{aligned} \mathbf {w}_{\mathrm {maxDI}}(k) = \frac{4 \pi }{(L+1)^2} \left[ \mathbf {B}^{*}(k)\right] ^{-1} \mathbf {y}^{*}(\varOmega _{\text {u}}), \end{aligned}$$
(6.35)

or in scalar form

$$\begin{aligned} W_{lm}^{\mathrm {maxDI}}(k) = \frac{4 \pi }{(L+1)^2} \frac{Y_{lm}^{*}(\varOmega _{\text {u}})}{b_l^{*}(k)}. \end{aligned}$$
(6.36)

A well-known farfield SHD beamformer is the plane-wave decomposition (PWD) beamformer, also sometimes known as a regular beamformer [24], whose weights are given by [22]

$$\begin{aligned} \mathbf {w}_{\mathrm {PWD}}(k) = \left[ \mathbf {B}^{*}(k)\right] ^{-1} \mathbf {y}^{*}(\varOmega _{\text {u}}). \end{aligned}$$
(6.37)

As the (frequency-independent) scaling factor does not affect the directivity, the PWD beamformer is also a maximum directivity beamformer. The reason for the name PWD will become clear in the next paragraph.

Assuming a single unit amplitude plane wave is incident upon the array from a direction \(\varOmega _{\text {s}}\), the output Z(k) of the PWD beamformer is given by

$$\begin{aligned} Z(k)&= \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {s}}) \end{aligned}$$
(6.38a)
$$\begin{aligned}&= \mathbf {y}^{\text {T}}(\varOmega _{\text {u}}) \mathbf {B}^{-1}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {s}}) \end{aligned}$$
(6.38b)
$$\begin{aligned}&= \sum _{l=0}^{L} \frac{2l+1}{4 \pi } \mathcal {P}_{l}(\cos \varTheta ) \end{aligned}$$
(6.38c)
$$\begin{aligned}&= \frac{L+1}{4 \pi (\cos \varTheta - 1)} \left[ \mathcal {P}_{L+1}(\cos \varTheta ) - \mathcal {P}_{L}(\cos \varTheta ) \right] , \end{aligned}$$
(6.38d)

where \(\varTheta \) is the angle between \(\varOmega _{\text {s}}\) and \(\varOmega _{\text {u}}\) and \(\mathcal {P}_{L}\) is the Legendre polynomial of order L. The spherical harmonic addition theorem (2.23) gives (6.38c), and the Christoffel summation formula [11, Sect. 8.915] is used to obtain (6.38d) [20]. The beamformer output Z(k) reaches its maximum when \(\varTheta = 0\), such that the steering direction \(\varOmega _{\text {u}}\) is equal to the arrival direction \(\varOmega _{\text {s}}\), as desired. We normalize the beamformer output with respect to its value for \(\varTheta = 0\), and plot it as a function of \(\varTheta \) in Fig. 6.3. We see that as L increases, the distribution of Z(k) narrows around \(\varTheta = 0\), tending towards a delta function for \(L \rightarrow \infty \) [31, Eq. 6.47].
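The dependence of the PWD beamformer output on \(\varTheta \) and L can be reproduced with a few lines of Python. By the spherical harmonic addition theorem, the output for a unit amplitude plane wave equals the finite Legendre sum \(\sum _{l=0}^{L} \frac{2l+1}{4 \pi } \mathcal {P}_{l}(\cos \varTheta )\), which the Christoffel summation formula collapses into a closed form involving only the Legendre polynomials of orders L and L+1. The sketch below evaluates both and normalizes by the value at \(\varTheta = 0\), which is \((L+1)^2 / (4 \pi )\); the function names are illustrative.

```python
import math


def legendre(n, x):
    """Legendre polynomial of order n at x, via the three-term recurrence."""
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr


def pwd_response_sum(L, Theta):
    """Finite Legendre sum: sum over l of (2l+1)/(4*pi) * P_l(cos Theta)."""
    x = math.cos(Theta)
    return sum((2 * l + 1) * legendre(l, x) for l in range(L + 1)) / (4.0 * math.pi)


def pwd_response_closed(L, Theta):
    """Closed form obtained with the Christoffel summation formula."""
    x = math.cos(Theta)
    if abs(x - 1.0) < 1e-12:
        return (L + 1) ** 2 / (4.0 * math.pi)  # limit for Theta -> 0
    return (L + 1) * (legendre(L + 1, x) - legendre(L, x)) / (4.0 * math.pi * (x - 1.0))


def normalized_response(L, Theta):
    """Response normalized by its value at Theta = 0."""
    return pwd_response_sum(L, Theta) * 4.0 * math.pi / (L + 1) ** 2


for L in (1, 2, 4, 8):
    print(L, normalized_response(L, 0.5))
```

Close to \(\varTheta = 0\) the closed form suffers from cancellation in floating-point arithmetic, which is why the sketch switches to the limiting value there.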

Fig. 6.3 Normalized beamformer output as a function of the beamformer order L and \(\varTheta \), the angle between the beamformer steering direction and the DOA

The directivity of the maximum directivity beamformer is given by substituting (6.35) into (6.16)

$$\begin{aligned} \mathcal {D}(k)&= 4 \pi \left| \left| \frac{4 \pi }{(L+1)^2} \mathbf {B}(k) \mathbf {B}^{-1}(k) \mathbf {y}(\varOmega _{\text {u}}) \right| \right| ^{-2} \end{aligned}$$
(6.39a)
$$\begin{aligned}&= \frac{(L+1)^4}{4 \pi } \left| \left| \mathbf {y}(\varOmega _{\text {u}}) \right| \right| ^{-2} \end{aligned}$$
(6.39b)
$$\begin{aligned}&= (L+1)^2. \end{aligned}$$
(6.39c)

The directivity of the maximum directivity beamformer is therefore frequency-independent and only depends on the beamformer order L.
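The result \(\mathcal {D}(k) = (L+1)^2\) can be verified numerically. The sketch below builds the weights of (6.36) for an arbitrary steering direction, checks the distortionless constraint (6.5), and evaluates the directivity with (6.15). It uses only the standard library: `sph_harm` is a hand-written complex spherical harmonic (Condon-Shortley phase, inclination \(\theta \) and azimuth \(\phi \)), assumed here to match the convention of (2.14), and the mode strengths in `b` are made-up nonzero values, since the directivity of this beamformer does not depend on them.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, Condon-Shortley phase."""
    pmm = 1.0
    if m > 0:
        s = math.sqrt(max(0.0, 1.0 - x * x))
        for i in range(1, m + 1):
            pmm *= -(2 * i - 1) * s
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pmm, pm1 = pm1, ((2 * ll - 1) * x * pm1 - (ll + m - 1) * pmm) / (ll - m)
    return pm1


def sph_harm(l, m, theta, phi):
    """Complex spherical harmonic Y_lm at inclination theta and azimuth phi."""
    if m < 0:
        return (-1) ** (-m) * sph_harm(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4.0 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def max_di_weights(L, b, theta_u, phi_u):
    """Weights of (6.36): 4*pi/(L+1)^2 * conj(Y_lm(Omega_u)) / conj(b_l)."""
    scale = 4.0 * math.pi / (L + 1) ** 2
    return [scale * sph_harm(l, m, theta_u, phi_u).conjugate() / complex(b[l]).conjugate()
            for l in range(L + 1) for m in range(-l, l + 1)]


L = 3
b = [1.0, 0.5j, -0.2, -0.05j]   # made-up nonzero mode strengths b_0 .. b_3
theta_u, phi_u = 1.0, 0.3       # arbitrary steering direction (radians)
w = max_di_weights(L, b, theta_u, phi_u)
pairs = [(l, m) for l in range(L + 1) for m in range(-l, l + 1)]

# Left-hand side of the distortionless constraint (6.5); should equal 1.
constraint = sum(w[i].conjugate() * b[l] * sph_harm(l, m, theta_u, phi_u).conjugate()
                 for i, (l, m) in enumerate(pairs))
# Directivity from (6.15); should equal (L + 1)^2.
D = 4.0 * math.pi / sum(abs(w[i] * b[l]) ** 2 for i, (l, m) in enumerate(pairs))
print(constraint, D)
```

Changing `b` or the steering direction leaves `D` unchanged, which illustrates that the directivity depends only on L.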

Since at least \((L+1)^2\) microphones are required to sample a sound field up to order L without spatial aliasing, the directivity is upper bounded by the number of microphones Q. This is also the maximum directivity of a spatial domain beamformer based on a standard linear array [28, Eq. 2.160].

Fig. 6.4 WNG of the maximum directivity and maximum WNG beamformers of order \(L = 4\) as a function of kr, for open and rigid arrays

The WNG of the maximum directivity beamformer is given by substituting (6.35) into (6.28)

$$\begin{aligned} \text {WNG}(k)&= \frac{Q (L+1)^4}{4 \pi \sum _{l=0}^{L} \sum _{m=-l}^l \left| \frac{Y_{lm}(\varOmega _{\text {u}})}{b_l(k)}\right| ^2 \sum _{l=0}^{\infty } \left| b_l(k)\right| ^2 (2l+1)} \end{aligned}$$
(6.40a)
$$\begin{aligned}&= \frac{Q (L+1)^4}{\sum _{l=0}^{L} \left| b_l(k)\right| ^{-2} (2l+1) \sum _{l=0}^{\infty } \left| b_l(k)\right| ^2 (2l+1)}. \end{aligned}$$
(6.40b)

In the open sphere case, this simplifies to

$$\begin{aligned} \text {WNG}(k)&= \frac{Q (L+1)^4}{\sum _{l=0}^{L} \left| b_l(k)\right| ^{-2} (2l+1)}, \end{aligned}$$
(6.41)

or in matrix form

$$\begin{aligned} \text {WNG}(k)&= Q (L+1)^4 \left| \left| \mathbf {B}^{-1}(k) \right| \right| ^{-2}. \end{aligned}$$
(6.42)

In Fig. 6.4, we plot the WNG of the maximum directivity beamformer of order \(L = 4\) as a function of kr, the product of the wavenumber k and the array radius r, for an array of \(Q = 32\) microphones. Since \(k = 2 \pi f / c\), a kr value of 1 corresponds to a frequency of \(f = c / (2 \pi r) \approx 546\) Hz for an array radius of \(r = 10\) cm, assuming a speed of sound of \(c = 343\) \(\text {m}\cdot \text {s}^{-1}\). It can be seen that the beamformer’s WNG is low except at high frequencies or large array radii. When an open sphere is used, the maximum directivity beamformer has particularly poor robustness at certain values of kr; this is due to the presence of zeros in the open sphere mode strength (see Sect. 3.4.2). The rigid sphere does not present this issue, and in addition provides an increase in WNG of approximately 3.7 dB over the open sphere at low values of kr.
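The open-sphere curve for the maximum directivity beamformer can be approximated with a short script based on (6.41), using \(\left| b_l(k)\right| ^2 = \left| j_l(kr)\right| ^2\). This is a minimal sketch using only the standard library: `spherical_jn` is a power-series evaluation of \(j_l\) that is adequate for the moderate kr values shown, and the rigid-sphere case, which requires a different mode strength, is not covered. The function names and the chosen kr values are illustrative.

```python
import math


def spherical_jn(l, x, n_terms=60):
    """Spherical Bessel function j_l(x) from its ascending power series."""
    term = x ** l / math.prod(range(2 * l + 1, 0, -2))  # x^l / (2l+1)!!
    total = term
    for k in range(n_terms):
        term *= -x * x / (2.0 * (k + 1) * (2 * l + 2 * k + 3))
        total += term
    return total


def wng_max_di_open(kr, L, Q):
    """WNG of the maximum directivity beamformer, open sphere, from (6.41)."""
    denom = 0.0
    for l in range(L + 1):
        b_sq = spherical_jn(l, kr) ** 2  # |b_l|^2 = |i^l j_l(kr)|^2
        if b_sq == 0.0:
            return 0.0  # exact zero of the mode strength: no robustness
        denom += (2 * l + 1) / b_sq
    return Q * (L + 1) ** 4 / denom


def to_db(x):
    """Power ratio in dB."""
    return 10.0 * math.log10(x)


L, Q = 4, 32
for kr in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"kr = {kr:4.1f}  WNG = {to_db(wng_max_di_open(kr, L, Q)):7.2f} dB")
```

At low kr the highest-order term dominates the sum in the denominator because \(j_L(kr)\) is very small, which is the origin of the low WNG; near a zero of any \(j_l(kr)\) with \(l \le L\) the WNG collapses in the same way.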

6.3.1.2 Maximum White Noise Gain Beamformer

The beamformer that maximizes the WNG while imposing a distortionless constraint in the steering direction satisfies

$$\begin{aligned} \max _{ \mathbf {w}(k) } \,\text {WNG}(k) \quad&\text {subject to} \quad \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) = 1, \end{aligned}$$

or equivalently,

$$\begin{aligned} \min _{ \mathbf {w}(k) } \,\left| \left| \mathbf {w}(k) \right| \right| ^2 \quad&\text {subject to} \quad \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) = 1. \end{aligned}$$

Proceeding in a similar way as for the analysis of the maximum directivity beamformer, if we use a Lagrange multiplier to adjoin the constraint to the cost function, the weights of the maximum WNG beamformer are then given by

$$\begin{aligned} \mathbf {w}_{\mathrm {maxWNG}}(k)&= \underset{\mathbf {w}(k)}{\arg \min } \, \mathcal {L}(\mathbf {w}(k), \lambda ), \end{aligned}$$
(6.43)

where \(\mathcal {L}\) is the complex Lagrangian given by

$$\begin{aligned} \mathcal {L}(\mathbf {w}(k), \lambda )&= \left[ \mathbf {w}(k) \right] ^{\text {H}} \left[ \mathbf {w}(k) \right] \nonumber \\&\quad + \lambda \left( \mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) - 1 \right) + \lambda ^{*} \left( \mathbf {y}^{\text {T}}(\varOmega _{\text {u}}) \mathbf {B}^{\text {*}}(k) \mathbf {w}(k) - 1 \right) \quad \end{aligned}$$
(6.44)

and \(\lambda \) is the Lagrange multiplier. Setting the gradient of \(\mathcal {L}(\mathbf {w}_{\mathrm {maxWNG}}(k), \lambda )\) with respect to \(\mathbf {w}^{*}_{\mathrm {maxWNG}}\) to zero yields

$$\begin{aligned} \nabla _{\mathbf {w}^{*}_{\mathrm {maxWNG}}} \mathcal {L}(\mathbf {w}_{\mathrm {maxWNG}}(k), \lambda )&= \mathbf {0}_N\nonumber \\ \mathbf {w}_{\mathrm {maxWNG}}(k) + \lambda \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})&= \mathbf {0}_N, \end{aligned}$$
(6.45)

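where \(\mathbf {0}_N\) is a column vector of N zeros. From (6.45), \(\mathbf {w}_{\mathrm {maxWNG}}(k) = -\lambda \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})\); substituting this into the distortionless constraint (6.11) to eliminate \(\lambda \), we then find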

$$\begin{aligned} \mathbf {w}_{\mathrm {maxWNG}}(k) = \frac{\mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})}{\mathbf {y}^{\text {T}}(\varOmega _{\text {u}}) \mathbf {B}^{*}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})}. \end{aligned}$$
(6.46)
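The structure of (6.46) is that of a minimum-norm solution: writing \(\mathbf {d} = \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})\), the weights are \(\mathbf {d} / (\mathbf {d}^{\text {H}} \mathbf {d})\). The following sketch illustrates this with a made-up vector `d` standing in for \(\mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})\); it checks that the constraint \(\mathbf {w}^{\text {H}} \mathbf {d} = 1\) holds and that another distortionless weight vector, obtained by adding a component orthogonal to \(\mathbf {d}\), has a larger norm and therefore a lower WNG. All names and numerical values are illustrative.

```python
def hdot(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def min_norm_weights(d):
    """Minimum-norm w subject to w^H d = 1, i.e. w = d / (d^H d)."""
    energy = hdot(d, d).real
    if energy == 0.0:
        raise ValueError("d must be nonzero")
    return [x / energy for x in d]


# Made-up vector standing in for B(k) y*(Omega_u) (illustration only).
d = [1.0 + 0.0j, 0.3 - 0.2j, 0.1j, -0.25 + 0.05j]
w = min_norm_weights(d)

# Build a second distortionless weight vector by adding a perturbation that
# is orthogonal to d, so that the constraint is unaffected.
v = [0.2 + 0.1j, -0.4j, 0.3 + 0.0j, 0.1 - 0.1j]
scale = hdot(d, v) / hdot(d, d)
v_perp = [v_i - scale * d_i for v_i, d_i in zip(v, d)]
w_other = [w_i + p_i for w_i, p_i in zip(w, v_perp)]

print("constraint (min-norm):", hdot(w, d))
print("constraint (other):   ", hdot(w_other, d))
print("squared norms:", hdot(w, w).real, hdot(w_other, w_other).real)
```

Because the WNG in (6.28b) is inversely proportional to \(\left| \left| \mathbf {w}(k) \right| \right| ^2\), the smaller norm of the first vector translates directly into a higher WNG.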

Using Unsöld’s theorem [29], this simplifies to

$$\begin{aligned} \mathbf {w}_{\mathrm {maxWNG}}(k) = 4 \pi \frac{\mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}})}{\left| \left| \mathbf {B}(k) \right| \right| ^2}, \end{aligned}$$
(6.47)

or in scalar form

$$\begin{aligned} W_{lm}^{\mathrm {maxWNG}}(k) = 4 \pi \frac{Y_{lm}^{*}(\varOmega _{\text {u}}) b_l(k)}{\sum _{l=0}^{L} |b_l(k)|^2 (2l+1)}. \end{aligned}$$
(6.48)

A well-known farfield SHD beamformer is the delay-and-sum beamformer, whose weights are given by [22]

$$\begin{aligned} \mathbf {w}_{\mathrm {DSB}}(k) = \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}). \end{aligned}$$
(6.49)

In the case of an open sphere, \(b_l(k) = i^l j_l(kr)\), and since \(\sum _{l=0}^{\infty } \left| j_l(kr)\right| ^2 (2l+1) = 1\) [2], the following relationship between the maximum WNG and delay-and-sum beamformers is obtained:

$$\begin{aligned} \lim _{L \rightarrow \infty } \mathbf {w}_{\mathrm {maxWNG}}(k) = 4 \pi \, \mathbf {w}_{\mathrm {DSB}}(k). \end{aligned}$$
(6.50)

When an open sphere is used, the delay-and-sum beamformer therefore approaches a maximum WNG beamformer as \(L \rightarrow \infty \) (ignoring the \(4 \pi \) scaling factor, which does not affect the WNG). For a finite L and/or if another microphone type or array configuration is used (such as a rigid sphere), the delay-and-sum beamformer is slightly suboptimal.

The delay-and-sum beamformer owes its name to the fact that for an open sphere as \(L \rightarrow \infty \), its output converges to the output of the widely known spatial domain delay-and-sum beamformer [22].

The directivity of the maximum WNG beamformer is given by substituting (6.47) into (6.16)

$$\begin{aligned} \mathcal {D}(k)&= 4 \pi \left| \left| \frac{4 \pi }{\left| \left| \mathbf {B}(k) \right| \right| ^2} \mathbf {B}(k) \mathbf {B}^{*}(k) \mathbf {y}(\varOmega _{\text {u}}) \right| \right| ^{-2} \end{aligned}$$
(6.51a)
$$\begin{aligned}&= \frac{4 \pi }{(4 \pi )^2} \left| \left| \mathbf {B}(k) \right| \right| ^4 \left| \left| \mathbf {B}(k) \mathbf {B}^{*}(k) \mathbf {y}(\varOmega _{\text {u}}) \right| \right| ^{-2} \end{aligned}$$
(6.51b)
$$\begin{aligned}&= \left| \left| \mathbf {B}(k) \right| \right| ^4 \left| \left| \mathbf {B}(k) \mathbf {B}^{*}(k) \right| \right| ^{-2}. \end{aligned}$$
(6.51c)

The WNG of the maximum WNG beamformer is given by substituting (6.47) into (6.28)

$$\begin{aligned} \text {WNG}(k)&= \frac{4 \pi Q \left| \left| \mathbf {B}(k) \right| \right| ^4}{(4 \pi )^2 \left| \left| \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) \right| \right| ^2 \sum _{l=0}^{\infty } \left| b_l(k)\right| ^2 (2l+1)} \end{aligned}$$
(6.52a)

Using Unsöld’s theorem [29], this simplifies to

$$\begin{aligned} \text {WNG}(k)&= \frac{Q \left| \left| \mathbf {B}(k) \right| \right| ^2}{\sum _{l=0}^{\infty } \left| b_l(k)\right| ^2 (2l+1)} \end{aligned}$$
(6.53)

In the open sphere case, the WNG approaches Q as \(L \rightarrow \infty \) (as in [22]), so it can be seen that the maximum WNG beamformer achieves a constant WNG of Q that is independent of frequency. This is also the highest achievable WNG for a distortionless beamformer in the spatial domain [28].
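The convergence towards Q can be checked numerically. The following Python sketch evaluates (6.53) for an open sphere, where the infinite sum in the denominator equals 1, so that the WNG reduces to \(Q \sum _{l=0}^{L} (2l+1) \left| j_l(kr)\right| ^2\). The power-series routine for \(j_l\), the function names and the values \(kr = 2\) and \(Q = 32\) are assumptions made for this illustration. The series is only intended for moderate arguments.

```python
def spherical_jn(l, x, n_terms=40):
    """Spherical Bessel function j_l(x) from its power series (moderate x only)."""
    term = x ** l
    for n in range(1, l + 1):
        term /= 2 * n + 1  # builds x^l / (2l+1)!!
    total = 0.0
    for k in range(n_terms):
        total += term
        term *= -x * x / (2 * (k + 1) * (2 * l + 2 * k + 3))
    return total

def open_sphere_wng(kr, order, n_mics):
    """WNG from (6.53) for an open sphere: Q * sum_{l<=L} (2l+1) j_l(kr)^2."""
    return n_mics * sum(
        (2 * l + 1) * spherical_jn(l, kr) ** 2 for l in range(order + 1)
    )

for order in (0, 1, 2, 4, 8):
    print(order, open_sphere_wng(2.0, order, 32))
```

Each additional order contributes a non-negative term, so the printed values increase monotonically and approach Q from below.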

Fig. 6.5 Directivity of the maximum directivity and maximum WNG beamformers of order \(L = 4\) as a function of kr

In Fig. 6.5, we plot the DI of the maximum directivity and maximum WNG beamformers of order \(L = 4\) as a function of kr for an array of \(Q = 32\) microphones. As expected, the maximum directivity beamformer provides the highest directivity, while the maximum WNG beamformer has poor directivity at low values of kr (i.e., low frequencies or small array radii). Due to the effects of scattering introduced by the rigid sphere (see Sect. 3.4.1), the maximum WNG beamformer has better directivity with a rigid array than with an open array. The directivity of the maximum directivity beamformer is independent of kr, while for the maximum WNG beamformer the directivity decays as kr decreases, tending towards 0 dB (i.e., no directivity).

The WNG of the maximum WNG beamformer of order \(L = 4\) is shown in Fig. 6.4; as expected, it provides the highest WNG. Using Figs. 6.4 and 6.5, it can be observed that there is a tradeoff between WNG and directivity. The maximum directivity and WNG beamformers provide performance bounds for SHD beamformers in terms of directivity and WNG, and are attractive due to their low computational complexity. However, in practice a compromise solution is desirable, such as the multiply constrained beamformer presented in Sect. 6.3.1.3, or the signal-dependent beamformers in Chap. 7, which adaptively control the tradeoff between these two objectives depending on the nature of the noise to be suppressed.

6.3.1.3 Multiply Constrained Beamformer

Another approach to the design of a signal-independent beamformer is to minimize its sidelobe levels for a given main lobe width, to ensure that interfering signals that do not originate from the steering direction are effectively suppressed. However, in order to obtain a beamformer that is robust to errors in sensor position and steering direction, and to sensor noise, it is desirable to introduce a constraint on the beamformer’s WNG.

In [25], the authors propose a robust minimum sidelobe beamformer, which minimizes the maximum sidelobe level, subject to a distortionless constraint in the steering direction and a minimum WNG constraint. The objective can therefore be expressed in the form of a minimax criterion as

$$\begin{aligned} \min _{ \mathbf {w}(k) }&\quad \max _{\varTheta > \varDelta /2} { \left| \mathcal {B}(k,\varTheta )\right| } \quad \text {subject to}\nonumber \\&\mathbf {w}^{\text {H}}(k) \mathbf {B}(k) \mathbf {y}^{*}(\varOmega _{\text {u}}) = 1, \quad \text {WNG}(k) \ge \zeta (k), \end{aligned}$$
(6.54)

where \(\mathcal {B}(k,\varTheta )\) is the spatial response of the beamformer, \(\varTheta \) denotes the angle between the steering direction and the DOA, \(\varDelta \) denotes the main lobe width (as defined in Sect. 6.2.4), and \(\zeta \) is the minimum WNG. The sidelobe region is defined as \(\varTheta _{\text {SL}} = \left\{ \varTheta | \varTheta > \varDelta /2 \right\} \).

As shown in [25], the problem in (6.54) can be reformulated as a convex optimization problem, solvable using second-order cone programming. The sidelobe region is approximated using a finite grid \(\varTheta _{n_\text {g}} \in \varTheta _{\text {SL}}, n_\text {g} \in \{ 1, \ldots , N_\text {g}\}\); the approximation then improves as \(N_\text {g}\) increases.

Finding a solution to (6.54) can be computationally intensive. However, a significant advantage of SHD beamforming is that if the desired beam pattern is rotationally symmetric about the steering direction \(\varOmega _{\text {u}}\), the computation of the beamformer weights and the steering of the beamformer can be decoupled. In this case, the beamformer weights are expressed as \(W_{lm}(k) = C_{l}(k) Y_{lm}^{*}(\varOmega _{\text {u}})\), and the weights \(C_l(k)\) then become the quantities to be optimized. If the desired beam pattern is not rotationally symmetric about the steering direction, the beam pattern can be rotated by multiplying the SHD beamformer weights by Wigner-D functions that depend on the rotation angles, as proposed in [23].
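To illustrate the grid approximation of the sidelobe region, the Python sketch below evaluates a rotationally symmetric beam pattern on a finite grid and returns its largest sidelobe magnitude, i.e., the quantity that (6.54) seeks to minimize. It does not solve the second-order cone program itself. We assume here that the response to a unit-amplitude plane wave can be written as \(\sum _{l=0}^{L} d_l \frac{2l+1}{4 \pi } P_l(\cos \varTheta )\) with \(d_l = C_l^{*}(k) b_l(k)\), which follows from the spherical harmonic addition theorem under the conventions of (6.54). The function names, the per-order gains \(d_l\), the main lobe width of \(90^{\circ }\) and the grid size are hypothetical choices for this example.

```python
from math import cos, pi, radians

def legendre(l_max, x):
    """Return [P_0(x), ..., P_l_max(x)] using Bonnet's recurrence."""
    p = [1.0, x]
    for l in range(1, l_max):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: l_max + 1]

def beam_pattern(d, theta):
    """Axisymmetric response sum_l d_l (2l+1)/(4*pi) P_l(cos theta)."""
    p = legendre(len(d) - 1, cos(theta))
    return sum(
        d_l * (2 * l + 1) / (4 * pi) * p_l
        for l, (d_l, p_l) in enumerate(zip(d, p))
    )

def max_sidelobe(d, main_lobe_width, n_grid=180):
    """Largest |response| on a uniform grid over the region Theta > Delta/2."""
    start = main_lobe_width / 2
    grid = [start + (pi - start) * n / n_grid for n in range(1, n_grid + 1)]
    return max(abs(beam_pattern(d, theta)) for theta in grid)

order = 4
# Equal per-order gains, scaled so that the response in the look direction is 1.
d = [4 * pi / (order + 1) ** 2] * (order + 1)
print(beam_pattern(d, 0.0), max_sidelobe(d, radians(90)))
```

In an optimization setting, the gains \(d_l\) (or equivalently \(C_l(k)\)) would be the free variables, and a finer grid would tighten the approximation of the continuous sidelobe region at the cost of more constraints.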

6.3.2 Nearfield Beamformers

In this chapter, we have until now assumed that the desired signal was due to a single plane wave, i.e., farfield conditions. However, under nearfield conditions, the plane wave assumptions cannot be considered valid. The SHD sound pressure due to a spherical wave originating from a source at a position \(\mathbf {r}_{\text {s}} = (r_{\text {s}},\varOmega _{\text {s}})\) is given by

$$\begin{aligned} X_{lm}(k,\mathbf {r}_{\text {s}})&= X_{\text {sw}}(k) b_l^{\text {nf}}(k,r_{\text {s}}) Y_{lm}^*(\varOmega _{\text {s}}), \end{aligned}$$
(6.55)

where \(X_{\text {sw}}(k)\) denotes the spherical wave amplitude and the nearfield mode strength \(b_l^{\text {nf}}(k,r_{\text {s}})\) is given by

$$\begin{aligned} b_l^{\text {nf}}(k,r_{\text {s}}) = -i k i^{-l} h_l^{(2)}(kr_{\text {s}}) b_l(k), \end{aligned}$$
(6.56)

and \(h_l^{(2)}\) is the spherical Hankel function of the second kind and of order l.
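To see how (6.56) relates to the farfield case, the Python sketch below evaluates the factor \(-i k i^{-l} h_l^{(2)}(kr_{\text {s}})\) that multiplies \(b_l(k)\), using the finite closed-form expression for the spherical Hankel function. It prints the ratio \(r_{\text {s}} |b_l^{\text {nf}}(k,r_{\text {s}})| / |b_l(k)|\) for orders \(l = 0, \ldots , 4\). Given the large-argument behaviour of \(h_l^{(2)}\), this ratio should approach 1 when \(kr_{\text {s}}\) is much larger than l, so that the nearfield mode strength reduces to the farfield one scaled by the spherical spreading term \(e^{-ikr_{\text {s}}}/r_{\text {s}}\). The function names and the wavenumber are assumptions made for illustration.

```python
import cmath
from math import factorial

def spherical_hankel2(l, x):
    """Spherical Hankel function of the second kind h_l^(2)(x), real x > 0 (closed form)."""
    series = sum(
        factorial(l + m) / (factorial(m) * factorial(l - m)) * (-1j / (2 * x)) ** m
        for m in range(l + 1)
    )
    return 1j ** (l + 1) * cmath.exp(-1j * x) / x * series

def nearfield_factor(l, k, r_s):
    """Factor that multiplies b_l(k) in (6.56): -i k i^(-l) h_l^(2)(k r_s)."""
    return -1j * k * 1j ** (-l) * spherical_hankel2(l, k * r_s)

k = 10.0  # wavenumber in rad/m (illustrative value)
for kr_s in (1.0, 4.0, 40.0):
    r_s = kr_s / k
    ratios = [r_s * abs(nearfield_factor(l, k, r_s)) for l in range(5)]
    print(f"k r_s = {kr_s:5.1f}: " + ", ".join(f"{x:8.3f}" for x in ratios))
```

For small \(kr_{\text {s}}\) the higher orders are strongly amplified relative to the farfield case, which is the property that nearfield beamformers exploit for radial discrimination.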

Beamformers suitable for nearfield conditions [8, 9, 19] can be designed by replacing the farfield mode strength expression \(b_l(k)\) with the nearfield mode strength \(b_l^{\text {nf}}(k,r_{\text {s}})\) in the beamformer weights. For example, the weights of a nearfield plane-wave decomposition beamformer are given by

$$\begin{aligned} W_{lm}^{\text {PWD,nf}}(k) = \frac{Y_{lm}^{*}(\varOmega _{\text {u}})}{\left[ b_l^{\text {nf}}(k,r_{\text {s}})\right] ^{*}}, \end{aligned}$$
(6.57)

instead of (6.37). While this process is straightforward, it does require knowledge of the source-array distance \(r_{\text {s}}\). If the source-array distance is not known, \(r_{\text {s}}\) becomes a controllable parameter, which is effectively a look distance and enables radial discrimination [9].

An appropriate boundary between the farfield and nearfield regions can be determined by comparing the magnitudes of the farfield mode strength \(b_l(k)\) and the nearfield mode strength \(b_l^{\text {nf}}(k,r_{\text {s}})\), as proposed in [8]. Using this criterion, the cut-off distance \(r_{\text {nf}}\) is determined as

$$\begin{aligned} r_{\text {nf}}(k) = \frac{L}{k}. \end{aligned}$$
(6.58)

The extent of the nearfield region therefore decreases with frequency. An array with good radial discrimination, i.e., a large nearfield region, can be realized either at low frequencies (small k), or by oversampling the array (large L) [9].

Example: At a frequency of 100 Hz, assuming a speed of sound of 343 \(\text {m}\cdot \text {s}^{-1}\) and an array order \(L = 4\), the cut-off distance is \(r_{\text {nf}}(k) = 2.2\) m, while at a frequency of 4 kHz it is 5.5 cm.
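The figures in this example can be reproduced with a few lines of Python. The sketch below evaluates (6.58) with \(k = 2 \pi f / c\). The function name and its default speed of sound are choices made for this illustration.

```python
from math import pi

def nearfield_cutoff_distance(frequency_hz, order, speed_of_sound=343.0):
    """Cut-off distance r_nf = L / k from (6.58), with k = 2*pi*f / c."""
    wavenumber = 2 * pi * frequency_hz / speed_of_sound
    return order / wavenumber

for frequency_hz in (100.0, 4000.0):
    r_nf = nearfield_cutoff_distance(frequency_hz, order=4)
    print(f"{frequency_hz:6.0f} Hz: r_nf = {r_nf:.3f} m")
```

Since \(r_{\text {nf}}\) is inversely proportional to frequency, a 40-fold increase in frequency shrinks the cut-off distance by the same factor.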

6.4 Chapter Summary

An overview of beamforming in the SHD using signal-independent beamformers has been presented. We introduced a number of performance measures, which were then used to derive beamformer weights that are optimal with respect to these measures. We also showed the relationship between these optimal beamformers and two well-known SHD beamformers: the PWD and delay-and-sum beamformers. Finally, where similarities existed, the performance bounds for SHD beamformers were related to previously derived bounds for spatial domain beamformers.