Abstract
This chapter takes a different approach to signal enhancement using spherical microphone arrays: a physically motivated parametric representation of the sound field is introduced. It is shown that the sound field can be manipulated to achieve noise reduction or dereverberation by applying a time- and frequency-dependent gain to a reference signal. The gain is a simple function of the sound field parameters, which can be estimated using the methods presented in Chap. 5.
The general principle of parametric array processing is to employ an efficient parametric representation of the sound field, typically comprising one or a few reference signals and a small number of associated parameters. The advantage of such an approach is that the number of parameters is significantly lower than in classical array processing (see Chap. 7). A block diagram of a parametric array processing approach is shown in Fig. 8.1.
Examples of parametric representations of the sound field include Directional Audio Coding (DirAC) [11], High Angular Resolution Planewave Expansion (HARPEX) [1] and computational auditory scene analysis (CASA) [4]. These representations can be used for spatial audio recording, coding and reproduction; source separation, noise reduction and dereverberation; and acoustic scene analysis and source localization. In this chapter, we will focus on parametric approaches to signal enhancement using the DirAC representation.
The DirAC representation is based on two features that are relevant to the perception of spatial sound: the direction of arrival (DOA) and the diffuseness. Provided these features are accurately reproduced, this representation ensures that the interaural time differences (ITDs), interaural level differences (ILDs), and the interaural coherence are correctly perceived [16]. The advantage of integrating DirAC with a signal enhancement process is that any interference sources can be reproduced at their original position [9] relative to the desired source, in addition to being attenuated, thereby maintaining the naturalness of the listening experience while increasing speech quality and intelligibility.
In this chapter, we first introduce a parametric model of the sound field. We then review the parameters that describe this sound field and how they can be estimated, and present filters that can be used to separate the two components of the sound field. Finally, we explore two applications of parametric array processing, namely, directional filtering and dereverberation.
8.1 Signal Model
In the short-time Fourier transform (STFT) domain, the sound pressure S at a position \(\mathbf {r}\) can be decomposed into a direct sound component \(S_{\text {dir}}\) and a diffuse sound component \(S_{\text {diff}}\), such that
where \(\ell \) denotes the discrete time index and \(\nu \) denotes the discrete frequency index. The sound pressure signal X measured by Q microphones at positions \(\mathbf {r}_q, q\in \left\{ 1, \dots , Q\right\} \) is then given by
where V denotes a sensor noise signal.
We assume that the directional signal \(S_{\text {dir}}\) is sparse in the time-frequency domain [12], such that in each time-frequency bin the directional signal is due to a single plane wave. The diffuse signal is due to a theoretically infinite number of independent plane waves with random phases, equal amplitudes and uniformly distributed DOAs [10]. We also assume that all three signals are mutually uncorrelated, that is,
where \(\mathrm {E} \left\{ \cdot \right\} \) denotes mathematical expectation, which can be computed using temporal averaging.
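For reference, the signal model described above can be written out as follows. This is a hedged reconstruction: the chapter's displayed equations are not reproduced in this excerpt, so the notation of (8.1), (8.2) and the uncorrelatedness conditions is inferred from the surrounding definitions.

```latex
% Decomposition of the sound pressure at position r (cf. (8.1)):
S(\ell,\nu,\mathbf{r}) = S_{\text{dir}}(\ell,\nu,\mathbf{r})
                        + S_{\text{diff}}(\ell,\nu,\mathbf{r})
% Microphone signal with sensor noise V (cf. (8.2)):
X(\ell,\nu,\mathbf{r}_q) = S(\ell,\nu,\mathbf{r}_q) + V(\ell,\nu,\mathbf{r}_q)
% Mutual uncorrelatedness of the three signals:
\mathrm{E}\bigl\{S_{\text{dir}} S_{\text{diff}}^{*}\bigr\}
  = \mathrm{E}\bigl\{S_{\text{dir}} V^{*}\bigr\}
  = \mathrm{E}\bigl\{S_{\text{diff}} V^{*}\bigr\} = 0
```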
In order to obtain the reference signal as indicated in Fig. 8.1, we must transform the spatial domain signals to the spherical harmonic domain (SHD). In this chapter, we assume error-free spatial sampling, and refer the reader to Chap. 3 for information on spatial sampling and aliasing. By applying the complex spherical harmonic transform (SHT) to the signal model in (8.2), we obtain the SHD signal model
where \(X_{lm}(\ell ,\nu )\), \(S_{lm}(\ell ,\nu )\), \(S^{\text {dir}}_{lm}(\ell ,\nu )\), \(S^{\text {diff}}_{lm}(\ell ,\nu )\) and \(V_{lm}(\ell ,\nu )\) are respectively the spherical harmonic transforms of the signals \(X(\ell ,\nu ,\mathbf {r}_q)\), \(S(\ell ,\nu ,\mathbf {r}_q)\), \(S_{\text {dir}}(\ell ,\nu ,\mathbf {r}_q)\), \(S_{\text {diff}}(\ell ,\nu ,\mathbf {r}_q)\) and \(V(\ell ,\nu ,\mathbf {r}_q)\), as defined in (3.6), and are referred to as eigenbeams to reflect the fact that the spherical harmonics are eigensolutions of the wave equation in spherical coordinates [14]. The order and degree of the spherical harmonics are respectively denoted as l and m.
We choose as a reference the signal that would be measured by an omnidirectional microphone \(\mathcal {M}_{\text {ref}}\) placed at the centre of the spherical array, if the array were not present. As shown in the Appendix of Chap. 5, the sound pressure \(\widetilde{X}(\ell ,\nu )\) at this microphone can be obtained from the zero-order eigenbeam \(X_{00}(\ell ,\nu )\) as
where the frequency-dependent mode strength \(B_l(\nu )\) for spherical harmonic order l, given by evaluating the wavenumber-dependent mode strength \(b_l(k)\) at discrete values of the wavenumber k, captures the dependence of the \(l^\text {th}\) order eigenbeams on the array properties, and is discussed in Sect. 3.4.2. By dividing the eigenbeam \(X_{00}(\ell ,\nu )\) by the mode strength, we remove this dependence, such that the reference signal is independent of the array properties. As noted in Sect. 7.2.2, assuming the array’s Q microphones are uniformly distributed on the sphere, the power of the sensor noise V is \(Q \left| B_0(\nu ) \right| ^2\) times smaller at \(\mathcal {M}_{\text {ref}}\) than at the individual microphones on the surface of the sphere.
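As a minimal sketch, the reference signal can be computed per time-frequency bin by compensating the zero-order eigenbeam for the mode strength. The function name is hypothetical, and the \(\sqrt{4\pi }\) factor assumes the SHT convention in which \(X_{00} = B_0 Y_{00}^* \widetilde{X}\) with \(Y_{00} = 1/\sqrt{4\pi }\); the exact normalization constant depends on the convention used in (5.74).

```python
import numpy as np

def reference_signal(X00, B0):
    """Estimate the pressure at the virtual centre microphone M_ref.

    Assumes X00 = B0 * Y00^* * X_ref with Y00 = 1/sqrt(4*pi), so the
    reference signal is recovered as sqrt(4*pi) * X00 / B0.  The
    sqrt(4*pi) constant is convention-dependent.
    """
    return np.sqrt(4.0 * np.pi) * X00 / B0
```

Dividing by \(B_0(\nu )\) removes the array dependence, as described in the text; any regularization of small \(|B_0|\) values (relevant at low frequencies) is omitted here for brevity.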
The directional signal \(S_{lm}^{\text {dir}}\) due to a plane wave incident from a direction \(\varOmega _{\text {dir}}\) is given by
where \(\varphi _{\text {dir}}(\ell ,\nu )\) is the phase factor of the plane wave, \(P_{\text {dir}}(\ell ,\nu )\) is the power of the plane wave, and \(Y_{lm}\) is the complex spherical harmonic (see Note 1), as defined in (2.14). The diffuse signal \(S_{lm}^{\text {diff}}\) can be expressed as
where \(\varphi _{\text {diff}}(\ell ,\nu ,\varOmega )\) denotes the phase factor of the plane wave incident from direction \(\varOmega \) and the notation \(\int _{\varOmega \in \mathcal {S}^2} \text {d}\varOmega \) is used to denote compactly the solid angle \(\int _{\phi = 0}^{2\pi } \int _{\theta = 0}^{\pi } \sin \theta \text {d}\theta \text {d}\phi \).
As in Sect. 5.2.1, using the relationship (5.74) between the zero-order eigenbeam \(X_{00}(\ell ,\nu )\) and the reference signal \(\widetilde{X}(\ell ,\nu )\), as well as the expressions for the directional and diffuse signals in (8.7) and (8.8), it can be verified that the powers of these signals at \(\mathcal {M}_{\text {ref}}\) are respectively given by \(P_{\text {dir}}\) and \(P_{\text {diff}}\).
8.2 Parameter Estimation
In the parametric model, the sound field is described by two parameters for each time-frequency bin: the DOA \(\varOmega _{\text {dir}}(\ell ,\nu )\) of the plane wave that generates the directional signal, and the diffuseness \(\varPsi (\ell ,\nu )\), which determines the strength of the directional signal with respect to the diffuse signal.
The diffuseness is defined as [5]
where \(\varGamma (\ell ,\nu )\) denotes the signal-to-diffuse ratio (SDR) at \(\mathcal {M}_{\text {ref}}\), given by
The diffuseness takes values between 0 and 1: in a purely directional field, a diffuseness of 0 is obtained; in a purely diffuse field, a diffuseness of 1 is obtained; and when the directional and diffuse signals have equal power, a diffuseness of 0.5 is obtained.
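The limiting cases above pin down the mapping from SDR to diffuseness; a minimal sketch, assuming the mapping \(\varPsi = 1/(1+\varGamma )\) implied by those cases (the function name is illustrative):

```python
def diffuseness(sdr):
    """Diffuseness from the signal-to-diffuse ratio (SDR).

    Assumes Psi = 1 / (1 + SDR), which reproduces the limiting
    cases in the text: SDR -> inf gives Psi -> 0 (purely
    directional), SDR = 0 gives Psi = 1 (purely diffuse), and
    SDR = 1 gives Psi = 0.5 (equal powers).
    """
    return 1.0 / (1.0 + sdr)
```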
Time- and frequency-dependent DOA and SDR/diffuseness estimates can be obtained using the methods presented in Chap. 5. In order for the reproduction of the sound field to be accurate, and to avoid distortion of the signals when enhancement is performed, it is crucial that the parameter estimates have sufficiently high temporal and spectral resolution, as well as sufficiently low variance.
8.3 Sound Pressure Estimation
In order to perform signal enhancement, we would like to estimate the directional and diffuse components \(\widetilde{S}_{\text {dir}}(\ell ,\nu )\) and \(\widetilde{S}_{\text {diff}}(\ell ,\nu )\) of the reference signal \(\widetilde{X}(\ell ,\nu )\). This can be done by applying a square-root Wiener filter to \(\widetilde{X}(\ell ,\nu )\), such that
where the directional filter weights are given by
and the diffuse filter weights are given by
Because the power of the spatially incoherent sensor noise is reduced when combining the Q microphone signals, we can assume that the power of the sensor noise \(\widetilde{V}(\ell ,\nu )\) is negligible, and therefore \(\mathrm {E}\left\{ |\widetilde{V}(\ell ,\nu )|^2 \right\} = 0\). In this case, the filter weights can be simplified to
and
If the sensor noise power is not sufficiently low to be disregarded, the filter weights can be computed using an estimate of the diffuse-to-noise ratio, obtained using the method in [15], for example.
The advantage of using a square-root Wiener filter in this context is that the power of the directional and diffuse signals is preserved, that is, \(\text {E}\left\{ |\hat{S}_{\text {dir}}(\ell ,\nu )|^2 \right\} = P_{\text {dir}}(\ell ,\nu )\) and \(\text {E}\left\{ |\hat{S}_{\text {diff}}(\ell ,\nu )|^2 \right\} = P_{\text {diff}}(\ell ,\nu )\). In practice, a lower bound is sometimes applied to \(W_{\text {dir}}\) in order to avoid introducing audible artefacts such as musical noise [2, 18]. In addition, if the diffuse filter weights are computed using (8.16c), \(\text {E}\left\{ |\hat{S}_{\text {dir}}(\ell ,\nu )|^2 \right\} + \text {E}\left\{ |\hat{S}_{\text {diff}}(\ell ,\nu )|^2 \right\} = \text {E}\left\{ |\widetilde{X}(\ell ,\nu )|^2 \right\} \), even if a lower bound is applied to \(W_{\text {dir}}\).
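The filter weights above can be sketched as follows, assuming negligible sensor noise so that \(W_{\text {dir}} = \sqrt{1-\varPsi }\) and \(W_{\text {diff}} = \sqrt{\varPsi }\); the treatment of the lower bound is a guess at the behaviour of (8.16c), in which the diffuse weight is rechosen so the component powers still sum to the reference-signal power.

```python
import numpy as np

def sqrt_wiener_weights(psi, w_dir_floor=0.0):
    """Square-root Wiener weights for the direct/diffuse split.

    Assumes negligible sensor noise, so W_dir = sqrt(1 - Psi) and
    W_diff = sqrt(Psi).  When a lower bound w_dir_floor is applied
    to W_dir (to limit musical noise), W_diff is set to
    sqrt(1 - W_dir**2) so that W_dir**2 + W_diff**2 = 1 and the
    estimated component powers still sum to E{|X|^2}.
    """
    w_dir = np.maximum(np.sqrt(1.0 - psi), w_dir_floor)
    w_diff = np.sqrt(1.0 - w_dir**2)
    return w_dir, w_diff
```

Applying these weights to \(\widetilde{X}\) preserves the expected component powers, since \(|W_{\text {dir}}|^2 (P_{\text {dir}} + P_{\text {diff}}) = P_{\text {dir}}\) and likewise for the diffuse component.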
8.4 Applications
In this section, we consider two applications of parametric array processing to signal enhancement: directional filtering (Sect. 8.4.1) and dereverberation (Sect. 8.4.2). The general principle in both of these applications is to apply a single-channel filter or time-frequency mask to the reference signal \(\widetilde{X}(\ell ,\nu )\) or the estimated pressure signals \(\hat{S}_{\text {dir}}(\ell ,\nu )\) and \(\hat{S}_{\text {diff}}(\ell ,\nu )\). As well as enhancing the signal, this can unfortunately also introduce speech distortion or musical noise, especially with filters that vary quickly across time and frequency. However, this problem can be mitigated by establishing a lower bound on the filter weights (as in Sect. 8.3), or by smoothing the weights across time and frequency [3, 7].
8.4.1 Directional Filtering
As proposed by Kallinger et al. in [8], a directional filter can be implemented by modifying the reference signal \(\widetilde{X}(\ell ,\nu )\), the diffuseness \(\varPsi (\ell ,\nu )\) and the DOA \(\varOmega _{\text {dir}}(\ell ,\nu )\). In this section, we apply two filters \(W_{\text {dir}}^{\text {filt}}\) and \(W_{\text {diff}}^{\text {filt}}\) directly to the estimated direct and diffuse sound pressures, such that
The filtered reference signal is then given by summing the filtered directional and diffuse signals:
We would like the filtered reference signal to correspond to the signal captured by a directional microphone with a directional response \(D\left[ \varOmega \right] \). We additionally want a directional response of unity in the microphone’s steering direction \(\varOmega _{\text {u}}\). Ideally, we would use a Dirac delta function in the steering direction; in practice this is not possible, because the DOA estimates are not error-free and the directional sources are not point sources [8]. In practice, a beam width in the region of \(60^{\circ }\) can be achieved without introducing significant audible artefacts [8].
We can choose, for example, a first-order microphone steered in a direction \(\varOmega _{\text {u}} = (\theta _{\text {u}},\phi _{\text {u}})\), whose directional response is given by [6]
where the term in curly brackets is the cosine of the angle between the DOA \(\varOmega = \left( \theta ,\phi \right) \) and steering direction \(\varOmega _{\text {u}}\), and \(\alpha \) is a shape parameter for the first-order microphone. In Table 8.1, we list a number of commonly used directivity patterns and the corresponding shape parameters.
The power of an ideal diffuse signal (with unit power at \(\mathcal {M}_{\text {ref}}\)) at the output of such a microphone is given by [6, 17]
The directional and diffuse filter weights are then given by
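A minimal sketch of these quantities, under stated assumptions: the first-order response is taken as \(D = \alpha + (1-\alpha )\cos \gamma \), where \(\gamma \) is the angle between the DOA and the steering direction; the diffuse-field power gain \(\alpha ^2 + (1-\alpha )^2/3\) is the standard first-order result [6]; and the weight choices (directivity at the estimated DOA for the direct sound, square root of the diffuse power gain for the diffuse sound) are our reading of the chapter's filter weights, not a verbatim reproduction. Function names are illustrative.

```python
import numpy as np

def first_order_response(alpha, omega, omega_u):
    """First-order directional response D = alpha + (1-alpha)*cos(gamma).

    omega and omega_u are (theta, phi) tuples in radians; cos(gamma)
    is the cosine of the angle between the two directions on the sphere.
    alpha is the shape parameter (e.g. 1 for omnidirectional,
    0.5 for cardioid).
    """
    t, p = omega
    tu, pu = omega_u
    cos_gamma = (np.sin(t) * np.sin(tu) * np.cos(p - pu)
                 + np.cos(t) * np.cos(tu))
    return alpha + (1.0 - alpha) * cos_gamma

def directional_filter_weights(alpha, omega_dir, omega_u):
    """Sketch of the directional/diffuse filter weights.

    The direct sound is scaled by the directivity evaluated at its
    estimated DOA; the diffuse sound is scaled by the square root of
    the diffuse-field power gain alpha**2 + (1-alpha)**2 / 3, i.e.
    the power a unit-power ideal diffuse field would have at the
    output of the first-order microphone.
    """
    w_dir = first_order_response(alpha, omega_dir, omega_u)
    w_diff = np.sqrt(alpha**2 + (1.0 - alpha)**2 / 3.0)
    return w_dir, w_diff
```

For example, a cardioid (\(\alpha = 0.5\)) has unit response in the steering direction and a null in the opposite direction, while an omnidirectional response (\(\alpha = 1\)) leaves both components unchanged.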
This directional filtering technique can be likened to beamforming, and indeed the objective is the same. However, this technique involves a single-channel filter, while in beamforming we apply a filter to the pressure signals recorded at multiple microphones, or to multiple eigenbeams.
8.4.2 Dereverberation
In [9], Kallinger et al. also proposed a method for dereverberation using a parametric approach. The desired signal that contains less reverberation than the reference signal \(\widetilde{X}(\ell ,\nu )\) is given by
where \(0 \le \beta (\ell ,\nu ) < 1\) is a reverberation reduction factor.
A single-channel filter \(W(\ell ,\nu )\) can be applied to the reference signal \(\widetilde{X}(\ell ,\nu )\) to estimate the desired signal \(\widetilde{X}_{\text {dereverb}}(\ell ,\nu )\):
The filter weights \(W_{\mathrm {MMSE}}(\ell ,\nu )\) that minimize the mean square error between the filter output signal \(Z(\ell ,\nu )\) and the desired signal \(\widetilde{X}_{\text {dereverb}}(\ell ,\nu )\) are given by
This filter is attractive due to its simplicity, since the filter weights only depend on the diffuseness and the desired reverberation reduction factor and do not depend on the DOA. As previously mentioned, the filter weights must normally be smoothed over time and frequency to avoid audible artefacts; the amount of smoothing that is necessary will depend on how much smoothing has been applied to the diffuseness estimates.
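A minimal sketch of such a weight, under the assumptions above: with negligible sensor noise and mutually uncorrelated components, the MMSE estimate of \(S_{\text {dir}} + \beta S_{\text {diff}}\) from \(\widetilde{X}\) reduces to \(W = (1-\varPsi ) + \beta \varPsi \), which is consistent with the text's statement that the weights depend only on the diffuseness and \(\beta \). This is our derivation, not a verbatim reproduction of the chapter's equation.

```python
def dereverb_weight(psi, beta):
    """MMSE weight for estimating S_dir + beta * S_diff from X.

    Assumes negligible sensor noise and uncorrelated components, so
    W = E{X_dereverb X^*} / E{|X|^2}
      = (P_dir + beta * P_diff) / (P_dir + P_diff)
      = (1 - Psi) + beta * Psi.
    beta = 0 suppresses all diffuse sound; beta = 1 leaves the
    reference signal unchanged.
    """
    return (1.0 - psi) + beta * psi
```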
It should be noted that the filter described in this section can be used to suppress any diffuse sound, whether it be reverberation, or isotropic noise such as car noise or babble noise.
8.5 Chapter Summary
Parametric array processing relies on a simple yet powerful parametric model of the sound field, which in this chapter was described using a single reference pressure signal along with two parameters, the DOA and the diffuseness. These parameters must be estimated accurately, and with high time and frequency resolution. We presented two illustrative applications of this array processing approach: directional filtering and dereverberation. These applications highlight a significant advantage of parametric array processing techniques: they typically have low computational complexity, especially if low-complexity parameter estimation methods are chosen (see Chap. 5).
Ongoing research challenges include formulating more sophisticated parametric models to improve performance, and finding new ways to avoid audible artefacts despite using filters whose weights vary quickly with time and frequency. Other potential applications of parametric array processing include acoustic zoom [13, 19] and source extraction using multiple microphone arrays.
Notes
- 1.
If the real SHT is applied instead of the complex SHT, the complex spherical harmonics \(Y_{lm}\) used throughout this chapter should be replaced with the real spherical harmonics \(R_{lm}\), as defined in Sect. 3.3.
References
Berge, S., Barrett, N.: High angular resolution planewave expansion. In: Proceedings of the 2nd International Symposium on Ambisonics and Spherical Acoustics (2010)
Berouti, M., Schwartz, R., Makhoul, J.: Enhancement of speech corrupted by acoustic noise. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. 208–211 (1979)
Breithaupt, C., Gerkmann, T., Martin, R.: Cepstral smoothing of spectral filter gains for speech enhancement without musical noise. IEEE Signal Process. Lett. 14(12), 1036–1039 (2007)
Brown, G.J., Cooke, M.: Computational auditory scene analysis. Comput. Speech Lang. 8, 297–336 (1994)
Del Galdo, G., Taseska, M., Thiergart, O., Ahonen, J., Pulkki, V.: The diffuse sound field in energetic analysis. J. Acoust. Soc. Am. 131(3), 2141–2151 (2012)
Elko, G.W.: Spatial coherence functions for differential microphones in isotropic noise fields. In: Brandstein, M., Ward, D. (eds.) Microphone Arrays: Signal Processing Techniques and Applications, chap. 4, pp. 61–85. Springer, Heidelberg (2001)
Gustafsson, S., Nordholm, S., Claesson, I.: Spectral subtraction using reduced delay convolution and adaptive averaging. IEEE Trans. Speech Audio Process. 9(8), 799–807 (2001)
Kallinger, M., Ochsenfeld, H., Del Galdo, G., Kuech, F., Mahne, D., Schultz-Amling, R., Thiergart, O.: A spatial filtering approach for directional audio coding. In: Proceedings of the Audio Engineering Society Convention. Munich, Germany (2009)
Kallinger, M., Del Galdo, G., Kuech, F., Thiergart, O.: Dereverberation in the spatial audio coding domain. In: Proceedings of the Audio Engineering Society Convention. London, UK (2011)
Kuttruff, H.: Room Acoustics, 4th edn. Taylor & Francis, London (2000)
Pulkki, V.: Spatial sound reproduction with directional audio coding. J. Audio Eng. Soc. 55(6), 503–516 (2007)
Rickard, S., Yilmaz, Z.: On the approximate W-disjoint orthogonality of speech. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 529–532 (2002)
Schultz-Amling, R., Kuech, F., Thiergart, O., Kallinger, M.: Acoustical zooming based on a parametric sound field representation. In: Proceedings of the Audio Engineering Society Convention (2010)
Teutsch, H.: Wavefield decomposition using microphone arrays and its application to acoustic scene analysis. Ph.D. thesis, Friedrich-Alexander Universität Erlangen-Nürnberg (2005)
Thiergart, O., Habets, E.A.P.: An informed LCMV filter based on multiple instantaneous direction-of-arrival estimates. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 659–663 (2013)
Thiergart, O., Kallinger, M., Del Galdo, G., Kuech, F.: Parametric spatial sound processing using linear microphone arrays. In: Heuberger, A., Elst, G., Hanke, R. (eds.) Microelectronic Systems, pp. 313–321. Springer, Heidelberg (2011)
Thiergart, O., Del Galdo, G., Habets, E.A.P.: On the spatial coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation. J. Acoust. Soc. Am. 132(4), 2337–2346 (2012)
Thiergart, O., Del Galdo, G., Taseska, M., Habets, E.: Geometry-based spatial sound acquisition using distributed microphone arrays. IEEE Trans. Audio, Speech, Lang. Process. 21(12), 2583–2594 (2013)
Thiergart, O., Kowalczyk, K., Habets, E.: An acoustical zoom based on informed spatial filtering. In: Proceedings of the International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 109–113. IEEE, Juan-les-Pins, France (2014). doi:10.1109/IWAENC.2014.6953348
© 2017 Springer International Publishing Switzerland
Jarrett, D.P., Habets, E.A.P., Naylor, P.A. (2017). Parametric Array Processing. In: Theory and Applications of Spherical Microphone Array Processing. Springer Topics in Signal Processing, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-42211-4_8