1 Introduction

It is well known that conventional microphone arrays (MA) for speech processing have poor directivity in the low-frequency band (below 1000 Hz) [14]. At the same time, this band is important for speech processing systems, since a large part of both the speech signal and the noise is concentrated in it. There are two basic ways to solve this problem: (1) applying superdirectivity algorithms [24] and (2) increasing the size of the MA by using harmonically nested microphone subarrays with sub-band processing [3, 5].

However, superdirectivity algorithms perform poorly in non-stationary noise environments and are very sensitive to estimation errors in the noise (or signal + noise) covariance matrix [6], to microphone gain mismatch [7], etc., which limits their use in speech signal processing.

On the other hand, non-adaptive nested microphone subarrays with sub-band processing show good results, as noted in [3, 5]. A typical microphone layout for an MA with 9 microphones and 3 subarrays is described in detail in [5]. Here we note that in such schemes all subarrays have the same number of microphones but different distances between microphones, and directivity in the low-frequency (LF) band is improved by increasing the length of the MA while keeping the total number of microphones constant. In the present paper we propose a different approach.

2 The Proposed Method

2.1 The Basic Idea

Typically, the distance \( d \) between microphones in a discrete MA is chosen so as to avoid side lobes with a large amplitude [1]:

$$ d < \frac{c}{2f_{\max}} , $$
(1)

where \( c \) is the speed of sound in air (343.1 m/s at 20 °C) and \( f_{\max} \) is the maximum frequency of the operating range of the MA in Hz. Let us denote the total number of microphones in the MA as \( N \) and the aperture length of the MA as \( L \), so that \( L = d(N - 1) \).
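Condition (1) and the aperture definition can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and not taken from the text, and the 3400 Hz value is an arbitrary example.

```python
C = 343.1  # speed of sound in air at 20 °C, m/s (value from the text)

def max_spacing(f_max_hz, c=C):
    """Upper bound on the spacing d from condition (1): d < c / (2 f_max)."""
    return c / (2.0 * f_max_hz)

def max_frequency(d_m, c=C):
    """Highest operating frequency for which a spacing d still meets (1)."""
    return c / (2.0 * d_m)

def aperture(d_m, n_mics):
    """Aperture length L = d (N - 1)."""
    return d_m * (n_mics - 1)

print(max_spacing(3400.0))    # spacing bound for an example f_max of 3400 Hz
print(max_frequency(0.05))    # frequency bound for d = 0.05 m
print(aperture(0.05, 9))      # aperture of a 9-microphone array with d = 0.05 m
```

For a spacing of 0.05 m the bound on \( f_{\max} \) is about 3431 Hz, and nine such microphones span an aperture of 0.4 m.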

It is known that if (1) is satisfied, further increasing \( N \) with a fixed \( L \) (i.e. decreasing \( d \)) does not improve directivity. On the other hand, (1) shows that for the low-frequency range it is not necessary to use all the MA microphones. We can take, for example, only the edge microphones (the 0-th and the \( (N - 1) \)-th). It turns out that for the low-frequency range this gives better directivity than using all the MA microphones.

Let us consider a standard equidistant linear MA with \( N = 9 \) microphones and a distance between microphones of \( d = 0.05 \) m. Consequently, the aperture length of the MA is \( L = d(N - 1) = 0.4 \) m. The normalized horizontal amplitude directivity pattern \( D(f,\varphi_{d},\varphi) \) of such an MA (under the far-field assumption) can be obtained as [1]:

$$ D(f,\varphi_{d} ,\varphi ) = \left| {\frac{1}{N}\sum\nolimits_{n = 0}^{N - 1} {\exp \left( {j\frac{2\pi f}{c}nd\left( {\sin \left( \varphi \right) - \sin \left( {\varphi_{d} } \right)} \right)} \right)} } \right| , $$
(2)

where \( f \) is the signal frequency in Hz, \( \varphi \) is the direction of arrival, \( \varphi_{d} \) is the desired look direction, and \( \left| \, \cdot \, \right| \) denotes the magnitude of a complex value.

The directivity pattern (DP), calculated using (2) for an MA with \( N = 9 \), \( d = 0.05 \) m, \( \varphi_{d} = 0 \) and \( f = 428.87 \) Hz, is shown in Fig. 1 (solid line).

Fig. 1. Directivity patterns, calculated using all 9 (solid line) and 2 edge (dashed line) microphones of the MA for \( f = 428.87 \) Hz.

It is clear that the MA has poor directivity for the given \( f \). At the same time, if we take only the edge microphones (the 0-th and the 8-th, \( d_{0,8} = L = 0.4 \) m) and calculate the DP of this 2-microphone sub-array for the same \( f \), we get the dashed curve shown in Fig. 1, i.e. a better DP.
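The comparison can be reproduced numerically. The sketch below evaluates (2) with the product \( nd \) replaced by explicit microphone positions, so that the same function covers both the full array and the edge pair. The function name `pattern` and the choice of sample angles are assumptions made for illustration.

```python
import numpy as np

C = 343.1  # speed of sound, m/s (value from the text)

def pattern(f_hz, phi_deg, positions_m, phi_d_deg=0.0, c=C):
    """Normalized amplitude pattern of Eq. (2) for microphones at positions_m."""
    phi = np.radians(np.atleast_1d(phi_deg))
    x = np.asarray(positions_m, dtype=float)
    u = np.sin(phi) - np.sin(np.radians(phi_d_deg))
    phase = 2.0 * np.pi * f_hz / c * np.outer(u, x)   # shape (angles, mics)
    return np.abs(np.exp(1j * phase).mean(axis=1))

full_array = 0.05 * np.arange(9)   # N = 9, d = 0.05 m
edge_pair = full_array[[0, -1]]    # microphones 0 and 8, 0.4 m apart

angles = np.array([0.0, 30.0, 60.0, 90.0])
d_full = pattern(428.87, angles, full_array)
d_edge = pattern(428.87, angles, edge_pair)
for a, p9, p2 in zip(angles, d_full, d_edge):
    print(f"phi = {a:5.1f} deg   N = 9: {p9:.3f}   edge pair: {p2:.3f}")
```

At \( \varphi = 90^{\circ} \) the edge pair is essentially at a null, while the full array still passes about 0.56 of the amplitude.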

This may seem strange, but when the signal frequency is below a certain threshold, it is better to use only the edge microphones than all the microphones in the MA.

2.2 Detailed Study of the Method

In fact, there is nothing strange about this result. It is known that for a dual-microphone array with a 0.4 m distance between microphones, the condition for the first zeroes of the DP to fall at \( \varphi = \pm 90^{\circ} \) is \( \lambda /2 = 0.4 \) m (where \( \lambda \) is the wavelength), which implies that \( f = c/0.8 = 343.1/0.8 = 428.875 \) Hz. At the same time, for an MA with a continuous aperture of the same length, the condition for the first zeroes at \( \varphi = \pm 90^{\circ} \) is \( \lambda = 0.4 \) m [1], i.e. the frequency in this case is two times higher. The corresponding frequency for a discrete MA with \( N = 9 \) lies between these two values.
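These three frequencies can be computed directly. The sketch below relies on the standard result that the pattern (2) of a uniform array steered to \( \varphi_{d} = 0 \) has its first zero at \( \varphi = \pm 90^{\circ} \) when \( Nd = \lambda \). This formula is not stated in the text but follows from summing the geometric series in (2). The function names are illustrative.

```python
C = 343.1  # speed of sound, m/s (value from the text)

def first_null_frequency(n_mics, d_m, c=C):
    """Frequency at which the first null of an N-element uniform array
    (look direction 0 deg) reaches +/-90 deg: N * d = lambda."""
    return c / (n_mics * d_m)

def continuous_null_frequency(length_m, c=C):
    """Same condition for a continuous aperture of length L: L = lambda."""
    return c / length_m

f_pair = first_null_frequency(2, 0.4)     # edge pair, 0.4 m apart
f_nine = first_null_frequency(9, 0.05)    # full 9-microphone array
f_cont = continuous_null_frequency(0.4)   # continuous 0.4 m aperture
print(f_pair, f_nine, f_cont)
```

Under this assumption the edge pair gives 428.875 Hz, the continuous aperture 857.75 Hz, and the 9-microphone array about 762 Hz, which is indeed between the two.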

Now consider the directivity index (DI) \( G(f,\varphi_{d}) \) of a linear MA. \( G(f,\varphi_{d}) \) characterizes the directivity of the MA: the greater \( G(f,\varphi_{d}) \), the better the directivity. Without loss of generality we can calculate it in Cartesian coordinates, using a one-dimensional horizontal DP (as in Fig. 1) with \( -90 \le \varphi \le 90 \) in degrees. In this case \( G(f,\varphi_{d}) \) can be obtained as:

$$ G(f,\varphi_{d} ) = \frac{180}{{\int_{ - 90}^{90} {\left| {D(f,\varphi_{d} ,\varphi )} \right|^{2} \cos (\varphi )d\varphi } }} $$
(3)

Figure 2 shows the directivity index \( G(f,\varphi_{d}) \) for \( \varphi_{d} = 0 \) and three arrays: the initial MA (\( N = 9 \), \( d = 0.05 \) m); the MA formed by the two edge microphones (\( N = 2 \), \( d = 0.4 \) m); and the MA formed by three microphones, the central one and the two edge ones (\( N = 3 \), \( d = 0.2 \) m).

Fig. 2. Directivity index as a function of signal frequency for three MAs with the same length and different numbers of microphones.

It can be seen that up to 600 Hz the MA with \( N = 2 \) has the best directivity. The MA with \( N = 3 \) has the best directivity in the interval 600–1300 Hz. The MA with \( N = 9 \) works best only when \( f > 1300 \) Hz. It should also be noted that the difference in \( G(f,\varphi_{d}) \) between \( N = 3 \) and \( N = 9 \) is not large, so we will further focus only on the MA with \( N = 2 \).
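A numerical version of this comparison can be sketched as follows. The code evaluates (3) with the trapezoidal rule on a 0.1-degree grid for the three arrays. The function name, the grid resolution and the `cos_weight` flag are assumptions of this sketch: the flag simply allows the \( \cos (\varphi ) \) factor to be dropped, in which case a single omnidirectional microphone gives exactly \( G = 1 \).

```python
import numpy as np

C = 343.1  # speed of sound, m/s (value from the text)

def directivity_index(f_hz, positions_m, phi_d_deg=0.0, cos_weight=True, c=C):
    """Numerical evaluation of Eq. (3) on a 0.1-degree grid.

    cos_weight=False drops the cos(phi) factor (an option added in this
    sketch, not part of the text).
    """
    phi_deg = np.linspace(-90.0, 90.0, 1801)
    phi = np.radians(phi_deg)
    x = np.asarray(positions_m, dtype=float)
    u = np.sin(phi) - np.sin(np.radians(phi_d_deg))
    d = np.abs(np.exp(1j * 2.0 * np.pi * f_hz / c * np.outer(u, x)).mean(axis=1))
    integrand = d ** 2 * (np.cos(phi) if cos_weight else 1.0)
    # Trapezoidal rule with phi measured in degrees, as in Eq. (3).
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(phi_deg))
    return 180.0 / integral

arrays = {
    "N=9, d=0.05": 0.05 * np.arange(9),
    "N=3, d=0.2": 0.2 * np.arange(3),
    "N=2, d=0.4": 0.4 * np.arange(2),
}
for f in (400.0, 1000.0, 2000.0):
    row = {name: round(directivity_index(f, pos), 2) for name, pos in arrays.items()}
    print(f, row)
```

With either setting of the flag, the edge pair has the largest index at 400 Hz and the full array has the largest index at 2000 Hz, in line with the trend described above.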

Maximizing (3) for \( N = 2 \), \( d = 0.4 \) and \( \varphi_{d} = 0 \) we find that \( G(f,\varphi_{d} ) \) has a maximum when \( f = 523 \) Hz. The corresponding DP as well as the DP of the initial MA with \( N = 9 \) are shown in Fig. 3.

Fig. 3. Directivity patterns, calculated using all 9 (solid line) and 2 edge (dashed line) microphones of the MA for \( \varphi_{d} = 0 \), \( f = 523 \) Hz.
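The location of the DI maximum for the edge pair can be estimated with a simple frequency scan. The sketch below is hedged: the 1 Hz scan step, the 0.25-degree angular grid and the `cos_weight` switch are assumptions, and the scan is limited to 100–1000 Hz.

```python
import numpy as np

C = 343.1  # speed of sound, m/s (value from the text)

def di_curve(freqs_hz, positions_m, cos_weight, c=C):
    """Eq. (3) for phi_d = 0, evaluated for every frequency in freqs_hz."""
    phi_deg = np.linspace(-90.0, 90.0, 721)
    phi = np.radians(phi_deg)
    x = np.asarray(positions_m, dtype=float)
    k = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / c
    # phase has shape (frequencies, angles, microphones)
    phase = k[:, None, None] * np.sin(phi)[None, :, None] * x[None, None, :]
    d2 = np.abs(np.exp(1j * phase).mean(axis=2)) ** 2
    if cos_weight:
        d2 = d2 * np.cos(phi)[None, :]
    integral = np.sum(0.5 * (d2[:, 1:] + d2[:, :-1]) * np.diff(phi_deg), axis=1)
    return 180.0 / integral

freqs = np.arange(100.0, 1000.0, 1.0)
edge_pair = [0.0, 0.4]
peak_plain = freqs[np.argmax(di_curve(freqs, edge_pair, cos_weight=False))]
peak_weighted = freqs[np.argmax(di_curve(freqs, edge_pair, cos_weight=True))]
print(peak_plain, peak_weighted)
```

Under these assumptions the integral without the \( \cos (\varphi ) \) factor peaks near 523 Hz, the value quoted above, whereas the form with the factor peaks near 613 Hz. This suggests, although the text does not say so, that the quoted frequencies correspond to the unweighted in-plane integral.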

Figure 3 demonstrates the advantage of using the two-microphone scheme instead of all the microphones in the LF band. A further increase in signal frequency leads to the point where \( G(f,\varphi_{d}) \) for \( N = 2 \) and for \( N = 9 \) intersect: \( f = 654.54 \) Hz. The corresponding DPs are shown in Fig. 4.

Fig. 4. Directivity patterns, calculated using all 9 (solid line) and 2 edge (dashed line) microphones of the MA for \( \varphi_{d} = 0 \), \( f = 654.54 \) Hz.

In our opinion, Fig. 4 shows performance deterioration for \( N = 2 \), as the side-lobe level of the DP increases rapidly. Figures 3 and 4 suggest that the boundary frequency between using 2 and 9 microphones should be chosen in the interval [523, 654] Hz.
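The upper end of this interval, the frequency at which the two DI curves cross, can be located by bisection. The sketch below assumes the in-plane form of the index (the integral in (3) without the \( \cos (\varphi ) \) factor) and a single sign change of the DI difference inside the search interval. Both are assumptions of this sketch, as are the function names.

```python
import numpy as np

C = 343.1  # speed of sound, m/s (value from the text)

def in_plane_di(f_hz, positions_m, c=C):
    """180 / integral of |D|^2 over -90..90 degrees, look direction 0 deg."""
    phi_deg = np.linspace(-90.0, 90.0, 1801)
    x = np.asarray(positions_m, dtype=float)
    phase = 2.0 * np.pi * f_hz / c * np.outer(np.sin(np.radians(phi_deg)), x)
    d2 = np.abs(np.exp(1j * phase).mean(axis=1)) ** 2
    integral = np.sum(0.5 * (d2[1:] + d2[:-1]) * np.diff(phi_deg))
    return 180.0 / integral

def crossing_frequency(pos_a, pos_b, f_lo, f_hi, tol_hz=0.01):
    """Bisection on DI_a(f) - DI_b(f); assumes one sign change in [f_lo, f_hi]."""
    def gap(f):
        return in_plane_di(f, pos_a) - in_plane_di(f, pos_b)
    if gap(f_lo) * gap(f_hi) > 0:
        raise ValueError("no sign change in the given interval")
    while f_hi - f_lo > tol_hz:
        mid = 0.5 * (f_lo + f_hi)
        if gap(f_lo) * gap(mid) <= 0:
            f_hi = mid
        else:
            f_lo = mid
    return 0.5 * (f_lo + f_hi)

edge_pair = np.array([0.0, 0.4])
full_array = 0.05 * np.arange(9)
f_cross = crossing_frequency(edge_pair, full_array, 523.0, 800.0)
print(round(f_cross, 1))
```

With these assumptions the crossing lands close to the 654.54 Hz quoted above.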

2.3 Dependence on the Look Direction

The above suggestions were obtained for a look direction perpendicular to the line of the microphones, i.e. for \( \varphi_{d} = 0 \). In practice we often have \( \varphi_{d} \ne 0 \), and it is clear that the directivity patterns, the directivity indexes and, consequently, the boundary frequencies depend on \( \varphi_{d} \). These dependences for the MA with \( L = 0.4 \) m are shown in Fig. 5.

Fig. 5. Three basic frequencies as functions of the look direction \( \varphi_{d} \).

It can be seen that all frequencies decrease as \( \varphi_{d} \) increases, so the boundary frequency of the filters should be adjusted according to \( \varphi_{d} \). We suggest choosing either the Max DI frequency (solid curve in Fig. 5) or the arithmetic mean of the Max DI frequency and the frequency at which the DI curves for 2 and 9 microphones cross.
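The dependence of the Max DI frequency on the look direction, and the two suggested choices of boundary frequency, can be sketched as follows. Several things here are assumptions rather than statements from the text: the in-plane form of the index (no \( \cos (\varphi ) \) factor), taking the first local maximum of the DI along the frequency axis as the "Max DI frequency", the 1 Hz scan step, and all function names.

```python
import numpy as np

C = 343.1  # speed of sound, m/s (value from the text)

def in_plane_di(freqs_hz, positions_m, phi_d_deg, c=C):
    """In-plane DI (integral of |D|^2 over -90..90 deg) for each frequency."""
    phi_deg = np.linspace(-90.0, 90.0, 721)
    x = np.asarray(positions_m, dtype=float)
    u = np.sin(np.radians(phi_deg)) - np.sin(np.radians(phi_d_deg))
    k = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / c
    phase = k[:, None, None] * u[None, :, None] * x[None, None, :]
    d2 = np.abs(np.exp(1j * phase).mean(axis=2)) ** 2
    integral = np.sum(0.5 * (d2[:, 1:] + d2[:, :-1]) * np.diff(phi_deg), axis=1)
    return 180.0 / integral

def first_di_peak(positions_m, phi_d_deg, freqs_hz):
    """Frequency of the first local maximum of the DI along freqs_hz."""
    g = in_plane_di(freqs_hz, positions_m, phi_d_deg)
    for i in range(1, len(g) - 1):
        if g[i] >= g[i - 1] and g[i] > g[i + 1]:
            return float(freqs_hz[i])
    raise ValueError("no local maximum in the scanned range")

def boundary_frequency(f_max_di, f_cross=None):
    """Max DI frequency, or its arithmetic mean with the crossing frequency."""
    return f_max_di if f_cross is None else 0.5 * (f_max_di + f_cross)

freqs = np.arange(50.0, 1000.0, 1.0)
edge_pair = [0.0, 0.4]
peaks = {a: first_di_peak(edge_pair, a, freqs) for a in (0.0, 15.0, 30.0)}
print(peaks)
print(boundary_frequency(523.0, 654.54))  # mean of the two values for phi_d = 0
```

In this sketch the first DI peak of the edge pair moves to lower frequencies as \( \varphi_{d} \) grows from 0 to 30 degrees, which agrees with the trend described above.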

The structure of the MA with the proposed method is shown in Fig. 6.

Fig. 6. Flow-chart of the MA with the proposed method.

The first two blocks of the chart are well known: the short-time Fourier transform (Block 1), which transforms the input signals \( x_{i}(t) \), \( i = 0, \ldots, N - 1 \), into the frequency domain, and the frequency-domain delay (Block 2), where all signals are delayed through multiplication by the complex steering vector \( D_{m}(f,\varphi_{d}) \) [9] to steer the MA beam to the look direction. After the delay, all signals \( X_{i}(f,\varphi_{d}) \), \( i = 0, \ldots, N - 1 \), are summed and normalized by \( 1/N \) in Block 3. At the same time, the two signals \( X_{0}(f,\varphi_{d}) \) and \( X_{N - 1}(f,\varphi_{d}) \) are summed and normalized by \( 1/2 \) in Block 4. Both sums are then filtered in Block 5, where the cut-off frequency is calculated according to the look direction \( \varphi_{d} \) (Fig. 5). Finally, the two filtered sums are added together, and the inverse Fourier transform is applied to the resulting signal.

As a result, the high-frequency components of the input signal are passed to the MA output through Block 3 and the high-pass filter (all MA microphones work), while the LF components of the input signal are passed to the MA output through Block 4 and the low-pass filter, i.e. only \( X_{0}(f,\varphi_{d}) \) and \( X_{N - 1}(f,\varphi_{d}) \) are used, which gives better MA directivity in the LF band.
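The processing chain of Fig. 6 can be sketched in a few lines of NumPy. This is a simplified illustration and not the authors' implementation: a single FFT over the whole signal stands in for the short-time Fourier transform, the filters of Block 5 are modelled as complementary brick-wall masks (the text does not specify the filter shapes), the steering sign follows the convention of (2), and the function name `crossover_beamformer` is hypothetical.

```python
import numpy as np

C = 343.1  # speed of sound, m/s (value from the text)

def crossover_beamformer(x, fs, d, phi_d_deg, f_boundary, c=C):
    """x: array of shape (N, T) with the microphone signals; returns T samples."""
    n_mics, n_samples = x.shape
    freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)
    X = np.fft.rfft(x, axis=1)                            # Block 1: to frequency domain
    tau = d * np.arange(n_mics) * np.sin(np.radians(phi_d_deg)) / c
    steer = np.exp(-1j * 2.0 * np.pi * freqs[None, :] * tau[:, None])
    Xs = X * steer                                        # Block 2: steering delays
    full_sum = Xs.mean(axis=0)                            # Block 3: all N mics, 1/N
    edge_sum = 0.5 * (Xs[0] + Xs[-1])                     # Block 4: edge mics, 1/2
    low_pass = (freqs < f_boundary).astype(float)         # Block 5: brick-wall masks
    high_pass = 1.0 - low_pass
    Y = low_pass * edge_sum + high_pass * full_sum        # add the filtered sums
    return np.fft.irfft(Y, n=n_samples)                   # back to the time domain

# Usage: two tones arriving from the look direction pass through unchanged.
fs, n = 16000, 1600
t = np.arange(n) / fs

def source(tt):
    return np.sin(2 * np.pi * 300 * tt) + 0.5 * np.sin(2 * np.pi * 2000 * tt)

look = 25.0
delays = 0.05 * np.arange(9) * np.sin(np.radians(look)) / C
mics = np.stack([source(t + dn) for dn in delays])
out = crossover_beamformer(mics, fs, 0.05, look, 588.0)
print(np.max(np.abs(out - source(t))))
```

Because both sums have unit gain in the look direction, the crossover does not alter a signal arriving from \( \varphi_{d} \); only the off-axis response differs between the two bands.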

3 Conclusion

In this paper we describe a new method for improving MA directivity in the low-frequency band. This method can be useful for speech processing in MAs in different areas, for example in speaker verification [8], in multimodal systems [9], etc.

Note that it is easy to implement a 3-microphone scheme (the two edge microphones and the central one) in the flow-chart in Fig. 6. To do this, it is necessary to add a third summation block and a third (band-pass) filter to Block 5. However, the improvement will not be significant (see Fig. 2).

It is clear that the proposed method can be used for LF directivity improvement not only in equidistant MAs but in any MA that includes a set of microphones, even in the nested microphone subarrays described in [5].

The method can also be used in planar MAs. For circular planar MAs, for example, the microphone ring with the largest diameter is the analog of the two edge microphones of a linear MA. The proposed scheme may be improved by using adaptive superdirectivity algorithms, for example, those presented in [10]. This will further strengthen the MA directivity in the LF band, but will cause a shift in the boundary frequency. This modification of the method remains to be studied in future work.