1.1 Background and Context

The motivation behind this book lies in the rapidly growing interest in spherical microphone arrays over the last decade. Important applications for these arrays include human-human and human-machine speech communication systems and spatial sound recording. While human-human speech communication systems have a long history, speech also plays an ever-growing part in human-machine communication. Indeed, although speech-based interfaces were once confined to the realms of science fiction, they are now becoming an increasingly popular way of interacting with devices such as smartphones, desktop and tablet computers, robots or televisions. This trend has been fueled by advances in speech recognition technology, as well as the explosion in available computing power, particularly on mobile devices. With the widespread availability of 3D sound cinema systems and virtual reality gear with 3D binaural sound reproduction, the need to capture spatial sound is rapidly growing. Spherical microphone arrays are particularly suitable for capturing all three dimensions of the sound field, including both ambient sounds and sounds from particular directions.

The field of acoustic signal processing seeks to solve a number of problems relating to these systems, which can broadly be divided into three categories: acoustic parameter estimation, acoustic signal enhancement, and spatial audio recording. Acoustic parameter estimation, addressed in Chap. 5, involves the estimation of parameters such as the location or direction of arrival (DOA) of one or more acoustic sources [20, 27, 30, 34, 51–53], the signal-to-diffuse energy ratio or diffuseness of the sound field at a particular position [31, 32, 43, 54, 55], the number of sources present in a sound field [53, 56, 57], or the reverberation time of an acoustic environment [14, 39, 40, 46, 49, 58].

In the aforementioned applications, the signal to be acquired originates from a source located at some significant distance from the microphone(s). While in some applications, such as teleconferencing systems, a microphone located close to the source may be available, this is not always a practical option. As a result, the acquired signal is corrupted by the surrounding environment. One major cause of this degradation is the presence of noise, where by noise we mean any undesired acoustic signal, such as an interfering speech signal or background noise [6, 9]. The other is the presence of reflectors and obstacles to the propagation of sound waves, in particular the room boundaries (walls, floor and ceiling), which cause reverberation [35, 42]. As the distance between the source and the microphone(s) increases, the degrading effects of noise and reverberation become increasingly significant.

In the case of speech signals, these effects not only degrade the quality of the acquired signal, but in some cases also its intelligibility, making communication difficult or even impossible [3]. The cognitive effort required to understand highly noisy and reverberant speech can also contribute to listener fatigue. Acoustic signal enhancement or speech enhancement techniques (considered in Chaps. 6–9) seek to mitigate these effects, and extract the desired signal. The main problems of interest within this field are noise reduction [4, 6, 22, 26, 28], echo cancellation [7, 33, 47] and dereverberation [11, 18, 21, 23, 24, 29, 37, 38, 42]. Although the release of the first speakerphone dates back to 1954 [15], these remain open problems and areas of active research.

1.2 Microphone Array Signal Processing

Acoustic signal processing problems are commonly approached with microphone arrays [5, 10, 17]: arrangements of microphones in specific configurations that exploit the spatial properties of the sound field (or spatial diversity) in order to improve performance. Owing to the similarity of the problems involved, many microphone array processing techniques are based on narrowband antenna array processing techniques [12]; however, microphone array processing faces its own unique challenges [5]. These include the broadband nature of speech (which covers several octaves), the non-stationarity of speech, and the fact that the desired and noise signals often have very similar spectral characteristics [5]. In addition, the placement and number of microphones are restricted, primarily by cost, aesthetics and available space. Considerations of space limit both the inter-microphone spacing and the total array size, and are of particular importance for portable devices such as hearing aids [13].

A typical application scenario in microphone array signal processing is illustrated in Fig. 1.1. A microphone array captures a mixture of signals with different spatial characteristics, some of which may be desired, and others undesired. Acoustic parameter estimation algorithms seek to accurately estimate the parameters of interest even in the presence of undesired signals that may adversely affect the estimation process. Acoustic signal enhancement algorithms aim to extract only the desired signals from the received mixture.

The spatial characteristics of the various captured signals are typically modeled based on their spatial coherence. The microphone signals are corrupted by sensor noise, which is spatially incoherent (or spatially white), that is, the sensor noise signals at each microphone are mutually uncorrelated. The desired signals, originating from one or more desired sources, as well as any directional noise signals, originating from interfering speakers or air-conditioning units, for example, are spatially coherent. Finally, partially coherent signals can be observed in spherically or cylindrically isotropic (or diffuse) sound fields, which can be used to model babble noise or reverberation. The desired signal is normally chosen as either the anechoic signal arriving from the desired source via the direct path, or the reverberant signal arriving via the direct path and a number of reflected paths.
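To make these coherence models concrete, the following minimal Python sketch (an illustration added here, not part of the book's framework) evaluates the classical spatial coherence of a spherically isotropic diffuse field between two omnidirectional microphones, sin(kd)/(kd), where k = 2πf/c is the wavenumber, d the microphone spacing, and c the speed of sound (assumed 343 m/s); fully coherent and fully incoherent signals correspond to coherence magnitudes of 1 and 0, respectively.

```python
import numpy as np

def diffuse_coherence(f, d, c=343.0):
    """Spatial coherence of a spherically isotropic (diffuse) sound field
    between two omnidirectional microphones spaced d metres apart,
    given by sin(kd)/(kd) with wavenumber k = 2*pi*f/c."""
    k = 2.0 * np.pi * np.asarray(f) / c
    return np.sinc(k * d / np.pi)  # np.sinc(x) = sin(pi*x)/(pi*x)

# Coherence decays with frequency for a fixed 5 cm spacing
f = np.array([125.0, 500.0, 2000.0, 8000.0])
print(diffuse_coherence(f, d=0.05))
```

Note that for closely spaced microphones (kd much smaller than 1) the coherence approaches one, which is one reason diffuse noise is difficult to separate from coherent desired signals with compact arrays.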

Fig. 1.1

Schematic illustration of a typical application scenario in microphone array signal processing. A microphone array captures a mixture of signals with different spatial characteristics in a reverberant environment

In theory, any microphone array configuration is possible; in practice, most microphone arrays are linear or planar, with the microphones lying respectively on a straight line or on a flat, two-dimensional surface. Real sound fields are three-dimensional, however, and can only be fully analyzed with a three-dimensional array. The spherical configuration is convenient due to its symmetry, which gives equal performance in all directions. In addition, the captured sound field can be efficiently described in the spherical harmonic domain [41, 44], based on a formulation of the wave equation in spherical coordinates (presented in Chap. 2). Spherical microphone arrays [1, 16, 19, 36, 45, 48, 50] are usually either open or rigid, that is, the microphones are either suspended in free space or mounted on a rigid baffle (as discussed in Chap. 3). They have recently become commercially available, in the form of products such as the acoustic camera by GFaI, the Eigenmike by mh acoustics (Fig. 1.2), Brüel and Kjær's spherical array (Fig. 1.3), and the RealSpace Panoramic Audio Camera by VisiSonics; yet, to date, few signal processing algorithms have been designed specifically for these arrays. This motivates the work presented in this book.
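As a brief illustration of the spherical harmonic description mentioned above, the sketch below (Python with SciPy; an added illustration, not the book's code, with an arbitrarily chosen grid resolution) evaluates spherical harmonics numerically and checks their orthonormality over the sphere, the property that underlies the expansion discussed in Chap. 2.

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, n, azimuth, colatitude)

# Simple quadrature grid over the sphere
n_az, n_col = 64, 64
az = np.linspace(0.0, 2.0 * np.pi, n_az, endpoint=False)
col = np.linspace(0.0, np.pi, n_col)
AZ, COL = np.meshgrid(az, col)
# Surface element sin(theta) * dtheta * dphi
w = np.sin(COL) * (2.0 * np.pi / n_az) * (np.pi / (n_col - 1))

def inner(n1, m1, n2, m2):
    """Numerical inner product of Y_n1^m1 and Y_n2^m2 over the unit sphere."""
    Y1 = sph_harm(m1, n1, AZ, COL)
    Y2 = sph_harm(m2, n2, AZ, COL)
    return np.sum(w * Y1 * np.conj(Y2))

print(inner(2, 1, 2, 1))  # approx. 1: orthonormal
print(inner(2, 1, 3, 0))  # approx. 0: orthogonal
```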

Fig. 1.2

The em32 Eigenmike spherical microphone array. This rigid array of radius 4.2 cm comprises 32 omnidirectional microphones. Copyright © Emanuël Habets. Used with permission

Fig. 1.3

The Brüel and Kjær spherical microphone array. This rigid array comprises 36 or 50 microphones and 12 video cameras. Copyright © Brüel & Kjær. Used with permission

1.3 Organization of the Book

The content of this book is structured as follows:

  • In Chap. 2, the fundamentals of acoustics are reviewed. We introduce the spherical harmonics, which form a complete set of orthonormal functions. Their importance rests in the fact that any arbitrary function on a sphere can be expanded in terms of these functions and a set of expansion coefficients.

  • Chapter 3 examines issues relating to spatial signal acquisition and transformation. We present the short-time Fourier transform and the spherical harmonic framework, which allow us to efficiently process the signals captured by a spherical microphone array. Common spatial sampling schemes are presented, which determine the placement of microphones on the sphere such that spatial aliasing is minimized. In addition, we discuss the advantages and disadvantages of two common array types: the open and rigid arrays with omnidirectional microphones.

  • In order to comprehensively evaluate spherical array processing algorithms under many different acoustic conditions, it is indispensable to use simulated acoustic impulse responses (AIRs). The image method proposed by Allen and Berkley [2] is a well-established way of doing this for point-to-point AIRs with sensors in free space; however, it does not account for the scattering introduced by a rigid sphere. In Chap. 4, we present a method for simulating the AIRs between a sound source and microphones positioned on a rigid spherical array. In addition, three examples are presented based on this method: an analysis of a diffuse reverberant sound field, a study of binaural cues in the presence of reverberation, and an illustration of the algorithm’s use as a mouth simulator.

  • Chapter 5 introduces methods for the estimation of two important acoustic parameters: the DOA of a sound source, and the signal-to-diffuse energy ratio at a particular position in a sound field. Later in the book, it will be seen that these quantities can be used for signal enhancement purposes.

  • The process of combining signals acquired by a microphone array in order to isolate a signal of interest is known as beamforming or spatial filtering. Chapter 6 considers the simplest type of beamformer: the signal-independent (fixed) beamformer, whose weights depend only on the DOA of the source to be extracted, and not otherwise on the desired signal (a minimal sketch of such a beamformer is given after this list).

  • In Chap. 7, we derive signal-dependent beamformers, whose weights depend on the second-order statistics of the desired signal and/or of the noise to be suppressed. These beamformers adaptively seek to achieve optimal performance in terms of noise reduction and speech distortion.

  • Chapter 8 takes a different approach to signal enhancement: a physically motivated parametric representation of the sound field is introduced. It is shown that the sound field can be manipulated to achieve noise reduction or dereverberation by applying a time- and frequency-dependent gain to a reference signal. The gain is a simple function of the sound field parameters, which can be estimated using the methods presented in Chap. 5.

  • The concept of informed array processing is introduced in Chap. 9. It involves incorporating relevant spatial information about the specific problem into the design of spatial filters, and into the estimation of the second-order statistics required to implement the beamformers in Chap. 7. Informed array processing techniques are developed for two signal enhancement problems: noise reduction and dereverberation.
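As a foretaste of the fixed beamforming of Chap. 6, the following minimal Python sketch implements a classical delay-and-sum beamformer (one well-known example of a signal-independent design, not necessarily the one emphasized in the book); the geometry, frequency and speed of sound are arbitrary illustrative choices. The weights are computed from the look direction alone.

```python
import numpy as np

def delay_and_sum_weights(mic_pos, doa, f, c=343.0):
    """Narrowband delay-and-sum weights for microphones at mic_pos
    (M x 3, in metres), steered towards the unit vector doa.
    The weights depend only on the look direction, not on the signals."""
    k = 2.0 * np.pi * f / c               # wavenumber at frequency f
    d = np.exp(1j * k * (mic_pos @ doa))  # plane-wave steering vector
    return d / mic_pos.shape[0]           # weights, applied as y = w^H x

# Example: four microphones at the corners of a 10 cm square, look along +x
mic_pos = 0.05 * np.array([[1.0, 1.0, 0.0], [1.0, -1.0, 0.0],
                           [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]])
doa = np.array([1.0, 0.0, 0.0])
w = delay_and_sum_weights(mic_pos, doa, f=1000.0)

# A plane wave arriving from the look direction is summed coherently
d = np.exp(1j * 2.0 * np.pi * 1000.0 / 343.0 * (mic_pos @ doa))
print(np.abs(np.conj(w) @ d))  # -> 1.0
```

Because the weights are a function of the DOA only, they can be precomputed for each frequency; the signal-dependent beamformers of Chap. 7 instead adapt the weights to the second-order statistics of the observed signals.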

Fig. 1.4

Structure of the book. The chapter/section relating to each topic is indicated in parentheses

The structure of the book, and the relationship between each of the topics it addresses, is illustrated in Fig. 1.4.