
The general principle of parametric array processing is to employ an efficient parametric representation of the sound field including typically one or a few reference signals, and a small number of associated parameters. The advantage of such an approach is that the number of parameters is significantly lower than in classical array processing (see Chap. 7). A block diagram of a parametric array processing approach is shown in Fig. 8.1.

Examples of parametric representations of the sound field include Directional Audio Coding (DirAC) [11], High Angular Resolution Planewave Expansion (HARPEX) [1] and computational auditory scene analysis (CASA) [4]. These representations can be used for spatial audio recording, coding and reproduction; source separation, noise reduction and dereverberation; and acoustic scene analysis and source localization. In this chapter, we will focus on parametric approaches to signal enhancement using the DirAC representation.

The DirAC representation is based on two features that are relevant to the perception of spatial sound: the direction of arrival (DOA) and the diffuseness. Provided that these features are accurately reproduced, this representation ensures that the interaural time differences (ITDs), interaural level differences (ILDs), and the interaural coherence are correctly perceived [16]. The advantage of integrating DirAC with a signal enhancement process is that any interference sources can be reproduced at their original position [9] relative to the desired source, in addition to being attenuated, thereby maintaining the naturalness of the listening experience while increasing speech quality and intelligibility.

In this chapter, we first introduce a parametric model of the sound field. We then review the parameters that describe this sound field and how they can be estimated, and present filters that can be used to separate the two components of the sound field. Finally, we explore two applications of parametric array processing, namely, directional filtering and dereverberation.

Fig. 8.1 Block diagram of a parametric array processing approach. In the analysis stage, a reference signal is computed and a number of parameters are estimated. The reference signal and estimated parameters are transmitted or stored. In the enhancement stage, a single-channel filter or time-frequency mask is applied to the reference signal, optionally based on the estimated parameters, to yield a processed output signal.

8.1 Signal Model

In the short-time Fourier transform (STFT) domain, the sound pressure S at a position \(\mathbf {r}\) can be decomposed into a direct sound component \(S_{\text {dir}}\) and a diffuse sound component \(S_{\text {diff}}\), such that

$$\begin{aligned} S(\ell ,\nu ,\mathbf {r})&= S_{\text {dir}}(\ell ,\nu ,\mathbf {r}) + S_{\text {diff}}(\ell ,\nu ,\mathbf {r}), \end{aligned}$$
(8.1)

where \(\ell \) denotes the discrete time index and \(\nu \) denotes the discrete frequency index. The sound pressure signal X measured by Q microphones at positions \(\mathbf {r}_q, q\in \left\{ 1, \dots , Q\right\} \) is then given by

$$\begin{aligned} X(\ell ,\nu ,\mathbf {r}_q)&= S(\ell ,\nu ,\mathbf {r}_q) + V(\ell ,\nu ,\mathbf {r}_q)\end{aligned}$$
(8.2a)
$$\begin{aligned}&= S_{\text {dir}}(\ell ,\nu ,\mathbf {r}_q) + S_{\text {diff}}(\ell ,\nu ,\mathbf {r}_q) + V(\ell ,\nu ,\mathbf {r}_q), \end{aligned}$$
(8.2b)

where V denotes a sensor noise signal.

We assume that the directional signal \(S_{\text {dir}}\) is sparse in the time-frequency domain [12], such that in each time-frequency bin the directional signal is due to a single plane wave. The diffuse signal is due to a theoretically infinite number of independent plane waves with random phases, equal amplitudes and uniformly distributed DOAs [10]. We also assume that all three signals are mutually uncorrelated, that is,

$$\begin{aligned} \text {E} \left\{ S_{\text {dir}}(\ell ,\nu ,\mathbf {r}_q) S_{\text {diff}}^{*}(\ell ,\nu ,\mathbf {r}_q) \right\}&= 0\end{aligned}$$
(8.3)
$$\begin{aligned} \text {E} \left\{ S_{\text {dir}}(\ell ,\nu ,\mathbf {r}_q) V^{*}(\ell ,\nu ,\mathbf {r}_q) \right\}&= 0, \end{aligned}$$
(8.4)

where \(\mathrm {E} \left\{ \cdot \right\} \) denotes mathematical expectation, which can be computed using temporal averaging.
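To make the role of temporal averaging concrete, the following sketch (with assumed variable names and an assumed forgetting factor) approximates the expectation operator by recursive averaging over STFT frames; it is an illustrative example rather than a prescribed implementation.

```python
import numpy as np

def recursive_average(frames, alpha=0.9):
    """Approximate E{.} by recursive (exponential) averaging over STFT frames.

    frames: complex array of shape (num_frames, num_bins), e.g. the per-frame
            products S_dir * conj(S_diff) whose expectation is required
    alpha:  forgetting factor, 0 < alpha < 1 (illustrative choice)
    """
    avg = np.zeros(frames.shape[1], dtype=complex)
    for frame in frames:
        avg = alpha * avg + (1.0 - alpha) * frame
    return avg
```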

In order to obtain the reference signal as indicated in Fig. 8.1, we must transform the spatial domain signals to the spherical harmonic domain (SHD). In this chapter, we assume error-free spatial sampling, and refer the reader to Chap. 3 for information on spatial sampling and aliasing. By applying the complex spherical harmonic transform (SHT) to the signal model in (8.2), we obtain the SHD signal model

$$\begin{aligned} X_{lm}(\ell ,\nu )&= S_{lm}(\ell ,\nu ) + V_{lm}(\ell ,\nu )\end{aligned}$$
(8.5a)
$$\begin{aligned}&= S^{\text {dir}}_{lm}(\ell ,\nu ) + S^{\text {diff}}_{lm}(\ell ,\nu ) + V_{lm}(\ell ,\nu ), \end{aligned}$$
(8.5b)

where \(X_{lm}(\ell ,\nu )\), \(S_{lm}(\ell ,\nu )\), \(S^{\text {dir}}_{lm}(\ell ,\nu )\), \(S^{\text {diff}}_{lm}(\ell ,\nu )\) and \(V_{lm}(\ell ,\nu )\) are respectively the spherical harmonic transforms of the signals \(X(\ell ,\nu ,\mathbf {r}_q)\), \(S(\ell ,\nu ,\mathbf {r}_q)\), \(S_{\text {dir}}(\ell ,\nu ,\mathbf {r}_q)\), \(S_{\text {diff}}(\ell ,\nu ,\mathbf {r}_q)\) and \(V(\ell ,\nu ,\mathbf {r}_q)\), as defined in (3.6), and are referred to as eigenbeams to reflect the fact that the spherical harmonics are eigensolutions of the wave equation in spherical coordinates [14]. The order and degree of the spherical harmonics are respectively denoted as l and m.

We choose as a reference the signal that would be measured by an omnidirectional microphone \(\mathcal {M}_{\text {ref}}\) placed at the centre of the spherical array, if the array were not present. As shown in the Appendix of Chap. 5, the sound pressure \(\widetilde{X}(\ell ,\nu )\) at this microphone can be obtained from the zero-order eigenbeam \(X_{00}(\ell ,\nu )\) as

$$\begin{aligned} \widetilde{X}(\ell ,\nu )&= \frac{X_{00}(\ell ,\nu )}{\sqrt{4 \pi } B_0(\nu )}\end{aligned}$$
(8.6a)
$$\begin{aligned}&= \widetilde{S}(\ell ,\nu ) + \widetilde{V}(\ell ,\nu )\end{aligned}$$
(8.6b)
$$\begin{aligned}&= \widetilde{S}_{\text {dir}}(\ell ,\nu ) + \widetilde{S}_{\text {diff}}(\ell ,\nu ) + \widetilde{V}(\ell ,\nu ), \end{aligned}$$
(8.6c)

where the frequency-dependent mode strength \(B_l(\nu )\) for spherical harmonic order l, given by evaluating the wavenumber-dependent mode strength \(b_l(k)\) at discrete values of the wavenumber k, captures the dependence of the \(l^\text {th}\) order eigenbeams on the array properties, and is discussed in Sect. 3.4.2. By dividing the eigenbeam \(X_{00}(\ell ,\nu )\) by the mode strength, we remove this dependence, such that the reference signal is independent of the array properties. As noted in Sect. 7.2.2, assuming the array’s Q microphones are uniformly distributed on the sphere, the power of the sensor noise V is \(Q \left| B_0(\nu ) \right| ^2\) times smaller at \(\mathcal {M}_{\text {ref}}\) than at the individual microphones on the surface of the sphere.
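As a minimal sketch of (8.6a), assuming the zero-order eigenbeam and the mode strength per frequency bin are already available (the array shapes and the regularisation bound are assumptions, not part of the text):

```python
import numpy as np

def reference_signal(X_00, B_0, b_min=1e-3):
    """Reference pressure at the array centre, as in (8.6a).

    X_00:  zero-order eigenbeam, shape (num_frames, num_bins)
    B_0:   zero-order mode strength per frequency bin, shape (num_bins,)
    b_min: lower bound on |B_0| to limit noise amplification at frequencies
           where the mode strength is small (an assumption, not from the text)
    """
    B_0_reg = np.where(np.abs(B_0) < b_min, b_min, B_0)
    return X_00 / (np.sqrt(4.0 * np.pi) * B_0_reg)
```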

The directional signal \(S_{lm}^{\text {dir}}\) due to a plane wave incident from a direction \(\varOmega _{\text {dir}}\) is given by

$$\begin{aligned} S_{lm}^{\text {dir}}(\ell ,\nu )&= \sqrt{P_{\text {dir}}(\ell ,\nu )} \varphi _{\text {dir}}(\ell ,\nu ) 4 \pi B_l(\nu ) Y_{lm}^{*}\left[ \varOmega _{\text {dir}}(\ell ,\nu )\right] , \end{aligned}$$
(8.7)

where \(\varphi _{\text {dir}}(\ell ,\nu )\) is the phase factor of the plane wave, \(P_{\text {dir}}(\ell ,\nu )\) is the power of the plane wave, and \(Y_{lm}\) is the complex spherical harmonic, as defined in (2.14). The diffuse signal \(S_{lm}^{\text {diff}}\) can be expressed as

$$\begin{aligned} S_{lm}^{\text {diff}}(\ell ,\nu ) = \sqrt{\frac{P_{\text {diff}}(\ell ,\nu )}{4 \pi }} \int _{\varOmega \in \mathcal {S}^2} \varphi _{\text {diff}}(\ell ,\nu ,\varOmega ) 4 \pi B_l(\nu ) Y_{lm}^{*}(\varOmega ) \text {d}\varOmega , \end{aligned}$$
(8.8)

where \(\varphi _{\text {diff}}(\ell ,\nu ,\varOmega )\) denotes the phase factor of the plane wave incident from direction \(\varOmega \), and \(\int _{\varOmega \in \mathcal {S}^2} \text {d}\varOmega \) is used as compact notation for the integral over all solid angles, \(\int _{\phi = 0}^{2\pi } \int _{\theta = 0}^{\pi } \sin \theta \, \text {d}\theta \, \text {d}\phi \).

As in Sect. 5.2.1, using the relationship (5.74) between the zero-order eigenbeam \(X_{00}(\ell ,\nu )\) and the reference signal \(\widetilde{X}(\ell ,\nu )\), as well as the expressions for the directional and diffuse signals in (8.7) and (8.8), it can be verified that the powers of these signals at \(\mathcal {M}_{\text {ref}}\) are respectively given by \(P_{\text {dir}}\) and \(P_{\text {diff}}\).

8.2 Parameter Estimation

In the parametric model, the sound field is described by two parameters for each time-frequency bin: the DOA \(\varOmega _{\text {dir}}(\ell ,\nu )\) of the plane wave that generates the directional signal, and the diffuseness \(\varPsi (\ell ,\nu )\), which determines the strength of the directional signal with respect to the diffuse signal.

The diffuseness is defined as [5]

$$\begin{aligned} \varPsi (\ell ,\nu ) = \frac{1}{1 + \varGamma (\ell ,\nu )}, \end{aligned}$$
(8.9)

where \(\varGamma (\ell ,\nu )\) denotes the signal-to-diffuse ratio (SDR) at \(\mathcal {M}_{\text {ref}}\), given by

$$\begin{aligned} \varGamma (\ell ,\nu )&= \frac{|\widetilde{S}_{\text {dir}}(\ell ,\nu )|^2}{\text {E}\left\{ |\widetilde{S}_{\text {diff}}(\ell ,\nu )|^2\right\} }\end{aligned}$$
(8.10a)
$$\begin{aligned}&= \frac{|S_{00}^{\text {dir}}(\ell ,\nu )|^2}{\text {E}\left\{ |S_{00}^{\text {diff}}(\ell ,\nu )|^2\right\} }\end{aligned}$$
(8.10b)
$$\begin{aligned}&= \frac{P_{\text {dir}}(\ell ,\nu )}{P_{\text {diff}}(\ell ,\nu )}. \end{aligned}$$
(8.10c)

The diffuseness takes values between 0 and 1. In a purely directional field, a diffuseness of 0 is obtained; in a purely diffuse field, a diffuseness of 1 is obtained; and when the directional and diffuse signals have equal power, a diffuseness of 0.5 is obtained.

Time- and frequency-dependent DOA and SDR/diffuseness estimates can be obtained using the methods presented in Chap. 5. In order for the reproduction of the sound field to be accurate, and to avoid distortion of the signals when enhancement is performed, it is crucial that the parameter estimates have sufficiently high temporal and spectral resolution, as well as sufficiently low variance.
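To illustrate (8.9) and (8.10), the following sketch converts direct and diffuse power estimates (which would be obtained with the methods of Chap. 5) into an SDR and a diffuseness; the variable names and the small regularisation constant are assumptions.

```python
import numpy as np

def diffuseness_from_powers(P_dir, P_diff, eps=1e-12):
    """Diffuseness Psi = 1 / (1 + Gamma), with Gamma = P_dir / P_diff, cf. (8.9)-(8.10).

    P_dir, P_diff: non-negative power estimates per time-frequency bin
    eps:           small constant to avoid division by zero (implementation choice)
    """
    gamma = P_dir / np.maximum(P_diff, eps)   # signal-to-diffuse ratio (8.10c)
    return 1.0 / (1.0 + gamma)                # diffuseness (8.9)

# Purely directional field -> ~0, equal powers -> 0.5, purely diffuse -> 1.
print(diffuseness_from_powers(np.array([1.0, 1.0, 0.0]),
                              np.array([0.0, 1.0, 1.0])))
```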

8.3 Sound Pressure Estimation

In order to perform signal enhancement, we would like to estimate the directional and diffuse components \(\widetilde{S}_{\text {dir}}(\ell ,\nu )\) and \(\widetilde{S}_{\text {diff}}(\ell ,\nu )\) of the reference signal \(\widetilde{X}(\ell ,\nu )\). This can be done by applying a square-root Wiener filter to \(\widetilde{X}(\ell ,\nu )\), such that

$$\begin{aligned} \hat{S}_{\text {dir}}(\ell ,\nu )&= W_{\text {dir}}(\ell ,\nu ) \widetilde{X}(\ell ,\nu )\end{aligned}$$
(8.11)
$$\begin{aligned} \hat{S}_{\text {diff}}(\ell ,\nu )&= W_{\text {diff}}(\ell ,\nu ) \widetilde{X}(\ell ,\nu ), \end{aligned}$$
(8.12)

where the directional filter weights are given by

$$\begin{aligned} W_{\text {dir}}(\ell ,\nu )&= \sqrt{\frac{P_{\text {dir}}(\ell ,\nu )}{P_{\text {dir}}(\ell ,\nu ) + P_{\text {diff}}(\ell ,\nu ) + \mathrm {E}\left\{ |\widetilde{V}(\ell ,\nu )|^2 \right\} }}\end{aligned}$$
(8.13a)
$$\begin{aligned}&= \sqrt{\frac{\varGamma (\ell ,\nu )}{\varGamma (\ell ,\nu ) + 1 + P_{\text {diff}}^{-1}(\ell ,\nu ) \, \mathrm {E}\left\{ |\widetilde{V}(\ell ,\nu )|^2 \right\} }} \end{aligned}$$
(8.13b)

and the diffuse filter weights are given by

$$\begin{aligned} W_{\text {diff}}(\ell ,\nu )&= \sqrt{\frac{P_{\text {diff}}(\ell ,\nu )}{P_{\text {dir}}(\ell ,\nu ) + P_{\text {diff}}(\ell ,\nu ) + \mathrm {E}\left\{ |\widetilde{V}(\ell ,\nu )|^2 \right\} }}\end{aligned}$$
(8.14a)
$$\begin{aligned}&= \sqrt{\frac{1}{\varGamma (\ell ,\nu ) + 1 + P_{\text {diff}}^{-1}(\ell ,\nu ) \, \mathrm {E}\left\{ |\widetilde{V}(\ell ,\nu )|^2 \right\} }}. \end{aligned}$$
(8.14b)

Because the power of the spatially incoherent sensor noise is reduced when combining the Q microphone signals, we can assume that the power of the sensor noise \(\widetilde{V}(\ell ,\nu )\) is negligible, and therefore \(\mathrm {E}\left\{ |\widetilde{V}(\ell ,\nu )|^2 \right\} = 0\). In this case, the filter weights can be simplified to

$$\begin{aligned} W_{\text {dir}}(\ell ,\nu )&= \sqrt{\frac{\varGamma (\ell ,\nu )}{\varGamma (\ell ,\nu ) + 1}}\end{aligned}$$
(8.15a)
$$\begin{aligned}&= \sqrt{1-\varPsi (\ell ,\nu )} \end{aligned}$$
(8.15b)

and

$$\begin{aligned} W_{\text {diff}}(\ell ,\nu )&= \sqrt{\frac{1}{\varGamma (\ell ,\nu ) + 1}}\end{aligned}$$
(8.16a)
$$\begin{aligned}&= \sqrt{\varPsi (\ell ,\nu )}\end{aligned}$$
(8.16b)
$$\begin{aligned}&= \sqrt{1-W_{\text {dir}}^2(\ell ,\nu )}. \end{aligned}$$
(8.16c)

If the sensor noise power is not sufficiently low to be disregarded, the filter weights can be computed using an estimate of the diffuse-to-noise ratio, obtained using the method in [15], for example.

The advantage of using a square-root Wiener filter in this context is that the power of the directional and diffuse signals is preserved, that is, \(\text {E}\left\{ |\hat{S}_{\text {dir}}(\ell ,\nu )|^2 \right\} = P_{\text {dir}}(\ell ,\nu )\) and \(\text {E}\left\{ |\hat{S}_{\text {diff}}(\ell ,\nu )|^2 \right\} = P_{\text {diff}}(\ell ,\nu )\). In practice, a lower bound is sometimes applied to \(W_{\text {dir}}\) in order to avoid introducing audible artefacts such as musical noise [2, 18]. In addition, if the diffuse filter weights are computed using (8.16c), \(\text {E}\left\{ |\hat{S}_{\text {dir}}(\ell ,\nu )|^2 \right\} + \text {E}\left\{ |\hat{S}_{\text {diff}}(\ell ,\nu )|^2 \right\} = \text {E}\left\{ |\widetilde{X}(\ell ,\nu )|^2 \right\} \), even if a lower bound is applied to \(W_{\text {dir}}\).
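A minimal sketch of the simplified square-root Wiener filters (8.15) and (8.16), including the optional lower bound on \(W_{\text {dir}}\), is given below; the bound value is an illustrative assumption.

```python
import numpy as np

def decompose_reference(X_ref, psi, w_dir_min=0.1):
    """Split the reference signal into direct and diffuse estimates, (8.11)-(8.16).

    X_ref:     reference signal per time-frequency bin (complex)
    psi:       diffuseness estimate in [0, 1], same shape as X_ref
    w_dir_min: lower bound on W_dir to limit musical noise (illustrative value)
    """
    W_dir = np.maximum(np.sqrt(1.0 - psi), w_dir_min)   # (8.15b) with lower bound
    W_diff = np.sqrt(1.0 - W_dir**2)                    # (8.16c), preserves total power
    return W_dir * X_ref, W_diff * X_ref                # (8.11), (8.12)
```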

8.4 Applications

In this section, we consider two applications of parametric array processing to signal enhancement: directional filtering (Sect. 8.4.1) and dereverberation (Sect. 8.4.2). The general principle in both of these applications is to apply a single-channel filter or time-frequency mask to the reference signal \(\widetilde{X}(\ell ,\nu )\) or the estimated pressure signals \(\hat{S}_{\text {dir}}(\ell ,\nu )\) and \(\hat{S}_{\text {diff}}(\ell ,\nu )\). As well as enhancing the signal, this can unfortunately also introduce speech distortion or musical noise, especially with filters that vary quickly across time and frequency. However, this problem can be mitigated by establishing a lower bound on the filter weights (as in Sect. 8.3), or by smoothing the weights across time and frequency [3, 7].
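As one possible smoothing scheme (an illustrative choice, not the specific method of [3, 7]), the weights can be recursively averaged over time and lightly averaged over frequency:

```python
import numpy as np

def smooth_weights(W, alpha_t=0.7, freq_win=3):
    """Smooth a time-frequency weight matrix to reduce musical noise.

    W:        real weights, shape (num_frames, num_bins)
    alpha_t:  recursive smoothing constant over time (illustrative)
    freq_win: moving-average length over frequency (illustrative, odd)
    """
    W_s = np.copy(W)
    for n in range(1, W_s.shape[0]):                  # temporal recursion
        W_s[n] = alpha_t * W_s[n - 1] + (1.0 - alpha_t) * W_s[n]
    kernel = np.ones(freq_win) / freq_win             # spectral moving average
    for n in range(W_s.shape[0]):
        W_s[n] = np.convolve(W_s[n], kernel, mode="same")
    return W_s
```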

8.4.1 Directional Filtering

As proposed by Kallinger et al. in [8], a directional filter can be implemented by modifying the reference signal \(\widetilde{X}(\ell ,\nu )\), the diffuseness \(\varPsi (\ell ,\nu )\) and the DOA \(\varOmega _{\text {dir}}(\ell ,\nu )\). In this section, we apply two filters \(W_{\text {dir}}^{\text {filt}}\) and \(W_{\text {diff}}^{\text {filt}}\) directly to the estimated direct and diffuse sound pressures, such that

$$\begin{aligned} Z_{\text {dir}}(\ell ,\nu )&= W_{\text {dir}}^{\text {filt}} \left[ \varOmega (\ell ,\nu ) \right] \hat{S}_{\text {dir}}(\ell ,\nu )\end{aligned}$$
(8.17)
$$\begin{aligned} Z_{\text {diff}}(\ell ,\nu )&= W_{\text {diff}}^{\text {filt}} \, \hat{S}_{\text {diff}}(\ell ,\nu ). \end{aligned}$$
(8.18)

The filtered reference signal is then given by summing the filtered directional and diffuse signals:

$$\begin{aligned} Z(\ell ,\nu )&= Z_{\text {dir}}(\ell ,\nu ) + Z_{\text {diff}}(\ell ,\nu ). \end{aligned}$$
(8.19)

We would like the filtered reference signal to correspond to the signal captured by a directional microphone with a directional response \(D\left[ \varOmega \right] \), and we additionally require a directional response of unity in the microphone’s steering direction \(\varOmega _{\text {u}}\). Ideally, the directional response would be a Dirac delta function centred on the steering direction; in practice this is not possible, because the DOA estimates are not error-free and the directional sources are not point sources [8]. Nevertheless, a beam width in the region of \(60^{\circ }\) can be achieved without introducing significant audible artefacts [8].

We can choose, for example, a first-order microphone steered in a direction \(\varOmega _{\text {u}} = (\theta _{\text {u}},\phi _{\text {u}})\), whose directional response is given by [6]

$$\begin{aligned} D\left[ \varOmega (\ell ,\nu )\right]&= \alpha + (1-\alpha ) \left\{ \sin \left[ \theta (\ell ,\nu )\right] \sin \theta _{\text {u}} \cos \left[ \phi (\ell ,\nu ) - \phi _{\text {u}}\right] \right. \nonumber \\&\qquad \qquad \qquad \left. + \cos \left[ \theta (\ell ,\nu )\right] \cos \theta _{\text {u}} \right\} , \end{aligned}$$
(8.20)

where the term in curly brackets is the cosine of the angle between the DOA \(\varOmega = \left( \theta ,\phi \right) \) and steering direction \(\varOmega _{\text {u}}\), and \(\alpha \) is a shape parameter for the first-order microphone. In Table 8.1, we list a number of commonly used directivity patterns and the corresponding shape parameters.

Table 8.1 Commonly used first-order directivity patterns and corresponding shape parameter values

The power of an ideal diffuse signal (with unit power at \(\mathcal {M}_{\text {ref}}\)) at the output of such a microphone is given by [6, 17]

$$\begin{aligned} P_{D_{\text {diff}}}&= \frac{1}{4 \pi } \int _{\varOmega \in \mathcal {S}^2} D^2\left[ \varOmega (\ell ,\nu )\right] \text {d}\varOmega \end{aligned}$$
(8.21a)
$$\begin{aligned}&= \frac{4}{3} \alpha ^2 - \frac{2}{3} \alpha + \frac{1}{3}. \end{aligned}$$
(8.21b)
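The closed form (8.21b) can be checked numerically by evaluating the integral (8.21a) on a spherical grid; the grid resolution and the cardioid example below are arbitrary choices made for illustration.

```python
import numpy as np

def diffuse_power_numeric(alpha, theta_u=0.0, phi_u=0.0, n=400):
    """Numerically evaluate (8.21a) for the first-order response (8.20)."""
    theta = np.linspace(0.0, np.pi, n)
    phi = np.linspace(0.0, 2.0 * np.pi, 2 * n)
    T, P = np.meshgrid(theta, phi, indexing="ij")
    cos_angle = (np.sin(T) * np.sin(theta_u) * np.cos(P - phi_u)
                 + np.cos(T) * np.cos(theta_u))
    D = alpha + (1.0 - alpha) * cos_angle
    integrand = D**2 * np.sin(T)
    return np.trapz(np.trapz(integrand, phi, axis=1), theta) / (4.0 * np.pi)

alpha = 0.5                                                   # cardioid
print(diffuse_power_numeric(alpha))                           # numeric (8.21a)
print(4.0 / 3.0 * alpha**2 - 2.0 / 3.0 * alpha + 1.0 / 3.0)   # closed form (8.21b) = 1/3
```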

The directional and diffuse filter weights are then given by

$$\begin{aligned} W_{\text {dir}}^{\text {filt}} \left[ \varOmega (\ell ,\nu ) \right]&= D\left[ \varOmega (\ell ,\nu )\right] \end{aligned}$$
(8.22)
$$\begin{aligned} W_{\text {diff}}^{\text {filt}}&= \sqrt{P_{D_{\text {diff}}}}. \end{aligned}$$
(8.23)
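Putting (8.17)–(8.23) together, a minimal sketch of the directional filtering stage might look as follows, assuming the decomposed pressures and per-bin DOA estimates are available from the earlier stages; all variable names are illustrative.

```python
import numpy as np

def directional_filter(S_dir_hat, S_diff_hat, theta, phi,
                       theta_u, phi_u, alpha=0.5):
    """Apply the directional filter (8.17)-(8.19) with a first-order response.

    S_dir_hat, S_diff_hat: estimated direct/diffuse pressures per TF bin
    theta, phi:            DOA estimates per TF bin (radians)
    theta_u, phi_u:        steering direction (radians)
    alpha:                 first-order shape parameter (0.5 = cardioid)
    """
    cos_angle = (np.sin(theta) * np.sin(theta_u) * np.cos(phi - phi_u)
                 + np.cos(theta) * np.cos(theta_u))
    D = alpha + (1.0 - alpha) * cos_angle                            # (8.20)
    P_D_diff = 4.0 / 3.0 * alpha**2 - 2.0 / 3.0 * alpha + 1.0 / 3.0  # (8.21b)
    Z_dir = D * S_dir_hat                                            # (8.17), (8.22)
    Z_diff = np.sqrt(P_D_diff) * S_diff_hat                          # (8.18), (8.23)
    return Z_dir + Z_diff                                            # (8.19)
```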

This directional filtering technique can be likened to beamforming, and indeed the objective is the same. However, this technique involves a single-channel filter, while in beamforming we apply a filter to the pressure signals recorded at multiple microphones, or to multiple eigenbeams.

8.4.2 Dereverberation

In [9], Kallinger et al. also proposed a method for dereverberation using a parametric approach. The desired signal that contains less reverberation than the reference signal \(\widetilde{X}(\ell ,\nu )\) is given by

$$\begin{aligned} \widetilde{X}_{\text {dereverb}}(\ell ,\nu ) = \widetilde{S}_{\text {dir}}(\ell ,\nu ) + \beta (\ell ,\nu ) \widetilde{S}_{\text {diff}}(\ell ,\nu ), \end{aligned}$$
(8.24)

where \(0 \le \beta (\ell ,\nu ) < 1\) is a reverberation reduction factor.

A single-channel filter \(W(\ell ,\nu )\) can be applied to the reference signal \(\widetilde{X}(\ell ,\nu )\) to estimate the desired signal \(\widetilde{X}_{\text {dereverb}}(\ell ,\nu )\):

$$\begin{aligned} Z(\ell ,\nu ) = W(\ell ,\nu ) \widetilde{X}(\ell ,\nu ). \end{aligned}$$
(8.25)

The filter weights \(W_{\mathrm {MMSE}}(\ell ,\nu )\) that minimize the mean square error between the filter output signal \(Z(\ell ,\nu )\) and the desired signal \(\widetilde{X}_{\text {dereverb}}(\ell ,\nu )\) are given by

$$\begin{aligned} W_{\mathrm {MMSE}}(\ell ,\nu )&= \underset{W(\ell ,\nu )}{\arg \min } \, \mathrm {E} \left\{ \left| \widetilde{X}_{\text {dereverb}}(\ell ,\nu ) - W(\ell ,\nu ) \widetilde{X}(\ell ,\nu ) \right| ^2 \right\} \end{aligned}$$
(8.26a)
$$\begin{aligned}&= 1 - (1-\beta ) \varPsi (\ell ,\nu )\end{aligned}$$
(8.26b)
$$\begin{aligned}&= \frac{\varGamma (\ell ,\nu ) + \beta (\ell ,\nu )}{\varGamma (\ell ,\nu ) + 1}. \end{aligned}$$
(8.26c)

This filter is attractive due to its simplicity, since the filter weights only depend on the diffuseness and the desired reverberation reduction factor and do not depend on the DOA. As previously mentioned, the filter weights must normally be smoothed over time and frequency to avoid audible artefacts; the amount of smoothing that is necessary will depend on how much smoothing has been applied to the diffuseness estimates.
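A minimal sketch of this dereverberation mask, assuming diffuseness estimates are available and using an illustrative reduction factor, is given below.

```python
import numpy as np

def dereverberation_weights(psi, beta=0.3):
    """Dereverberation filter weights (8.26b): W = 1 - (1 - beta) * Psi.

    psi:  diffuseness estimate in [0, 1] per time-frequency bin
    beta: reverberation reduction factor, 0 <= beta < 1 (illustrative value)
    """
    return 1.0 - (1.0 - beta) * psi

# Applied to the reference signal as in (8.25): Z = W * X_ref.
# In practice the weights would additionally be smoothed over time and
# frequency, as discussed above.
```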

It should be noted that the filter described in this section can be used to suppress any diffuse sound, whether it be reverberation, or isotropic noise such as car noise or babble noise.

8.5 Chapter Summary

Parametric array processing relies on a simple yet powerful parametric model of the sound field, which in this chapter was described using a single reference pressure signal along with two parameters, the DOA and the diffuseness. These parameters must be estimated accurately, and with high time and frequency resolution. We presented two illustrative applications of this array processing approach: directional filtering and dereverberation. These applications highlight a significant advantage of parametric array processing techniques: they typically have low computational complexity, especially if low-complexity parameter estimation methods are chosen (see Chap. 5).

Ongoing research challenges include formulating more sophisticated parametric models to improve performance, and finding new ways to avoid audible artefacts despite using filters whose weights vary quickly with time and frequency. Other potential applications of parametric array processing include acoustic zoom [13, 19] and source extraction using multiple microphone arrays.