In an enclosed environment, multi-source direction-of-arrival estimation (DOAE) is difficult because of environmental noise, room reverberation, and overlapping source spectra. DOAE of acoustic sources is essential for several applications, including beamforming, automatic camera steering, robotics and surveillance. In real environments, background noise and reverberation must be taken into account. Furthermore, DOAE of multiple concurrently active sources is considered a challenging problem [1,2,3,4]. For these applications, conventional methods often use an omnidirectional microphone array and estimate the DOA from the phase-delay information between the microphones. However, such conventional arrays require a large aperture.

4.1 Simulation of LPA Beamformer

Assume a ULA of ten omnidirectional sensors with half-wavelength inter-element spacing, and uncorrelated sources with SNR = 0 dB at a single sensor. The three signals \( s_{1}(t) \), \( s_{2}(t) \) and \( s_{3}(t) \) are zero-mean white Gaussian with \( \sigma = 1 \) and arrive from the angles \( \theta_{i}(t),\ i = 1,2,3 \). A rectangular window of \( N = 60 \) snapshots and a sampling period of \( T = 1\,\text{s} \) are assumed. The observation model is then [5,6,7]:

$$ r(t + kT) = A(t + kT)\,s(t + kT) + e(t + kT) $$
(4.1)
$$ A(t + kT) = \left[ a(\theta_{1}(t + kT)),\, a(\theta_{2}(t + kT)),\, \ldots,\, a(\theta_{q}(t + kT)) \right] $$
(4.2)
$$ s(t) = [s_{1}(t), s_{2}(t), \ldots, s_{q}(t)]^{T} $$
(4.3)

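where \( a(\theta) \) is the array steering vector, \( e(t) \) is the additive noise vector and \( q \) is the number of sources (here \( q = 3 \)). The aim is to estimate the time-varying directions \( \theta_{i}(t) \). In the local polynomial approximation (LPA), each direction is modeled over the window centered on the time instant \( t \) as a linear function of time: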

$$ \theta_{i} (t + kT) = \theta_{i} (t) + \theta_{i}^{(1)} (t)kT $$
(4.4)

Thus, \( \theta_{i}(t + kT) \) is linear in \( kT \), parameterized by the direction \( \theta_{i}(t) \) and its first derivative (angular velocity) \( \theta_{i}^{(1)}(t) \) at the time instant \( t \). The performance of the LPA beamformer is compared with that of the conventional beamformer in the following scenarios with multiple signals from moving sources.
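To make the comparison concrete, the following Python sketch simulates one window of the model (4.1) to (4.4) for a single moving source and evaluates both beamformers. It assumes that the LPA criterion is the conventional (Bartlett) output power accumulated along the linear track (4.4), \( P_{LPA}(c_{0}, c_{1}) = \frac{1}{nN}\sum_{k} |a^{H}(c_{0} + c_{1}kT)\, r(t + kT)|^{2} \), with \( c_{0} \) and \( c_{1} \) standing for \( \theta(t) \) and \( \theta^{(1)}(t) \); the estimator in [5,6,7] may include additional window weighting. \( P_{conv} \) is the same criterion with \( c_{1} = 0 \). The steering convention \( a_{m}(\theta) = e^{j\pi m \sin\theta} \), the centered window, the search grids and all function names are illustrative assumptions.

```python
import numpy as np


def steering(theta_deg, n):
    """Half-wavelength ULA steering vectors; axis 0 indexes sensors, other axes follow theta."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    m = np.arange(n).reshape((n,) + (1,) * theta.ndim)
    return np.exp(1j * np.pi * m * np.sin(theta))


def lpa_spectrum(r, k, T, c0_grid, c1_grid):
    """Assumed LPA criterion: Bartlett power along the track c0 + c1*k*T, eq. (4.4)."""
    n, N = r.shape
    P = np.empty((len(c0_grid), len(c1_grid)))
    for j, c1 in enumerate(c1_grid):
        tracks = c0_grid[:, None] + c1 * k[None, :] * T   # (G0, N) candidate trajectories
        A = steering(tracks, n)                           # (n, G0, N)
        y = np.einsum("mgk,mk->gk", A.conj(), r)          # a^H(track) r(t + kT)
        P[:, j] = np.sum(np.abs(y) ** 2, axis=1) / (n * N)
    return P


def conv_spectrum(r, theta_grid):
    """Conventional beamformer: the same power with a fixed direction (c1 = 0)."""
    n, N = r.shape
    A = steering(theta_grid, n)                           # (n, G)
    return np.sum(np.abs(A.conj().T @ r) ** 2, axis=1) / (n * N)


# One window: theta(t) = 0 deg, theta'(t) = 1 deg/sample, n = 10, N = 60, SNR = 0 dB.
rng = np.random.default_rng(1)
n, N, T = 10, 60, 1.0
k = np.arange(N) - (N - 1) / 2                            # window centered on t (assumption)
s = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)   # unit-power source
e = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
r = steering(0.0 + 1.0 * k * T, n) * s + e                # eq. (4.1) with q = 1

c0_grid = np.arange(-30.0, 30.5, 0.5)
c1_grid = np.round(np.arange(-3.0, 3.05, 0.1), 2)
P_lpa = lpa_spectrum(r, k, T, c0_grid, c1_grid)
P_conv = conv_spectrum(r, c0_grid)
i, j = np.unravel_index(np.argmax(P_lpa), P_lpa.shape)
print("LPA estimate:", c0_grid[i], "deg,", c1_grid[j], "deg/sample")
print("conventional peak at", c0_grid[np.argmax(P_conv)], "deg")
```

Because the true track lies on the search grid, the LPA peak is expected near (0°, 1 deg/sample), whereas the conventional spectrum spreads the source power over all directions visited during the window.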

4.1.1 Case 1 (One Source Case)

Consider a single source moving with the uniform angular velocities \( \theta^{(1)}(t) = 0, 1, 2 \) deg/sample (and \( -1 \) deg/sample in Fig. 4.4), with \( \theta(t) = 0^{\circ} \). Figures 4.1 through 4.4 display the LPA function as 2D contour plots and 3D surface plots focused on the region of interest. Figure 4.1 illustrates the LPA beamformer output for a single stationary source.

Fig. 4.1

Beamformer outputs for a single stationary source at \( \theta = 0 \)

Figure 4.1 shows accurate source localization at \( \theta = 0 \) and \( \theta^{(1)}(t) = 0 \). Figure 4.2 compares the LPA beamformer output with the conventional beamformer \( P_{conv} \) for a single source at \( \theta = 0 \) and \( \theta^{(1)}(t) = 1\,\text{deg/sample} \).

Fig. 4.2

Beamformer outputs for a single source at \( \theta = 0 \) and \( \theta^{(1)}(t) = 1\,\text{deg/sample} \)

Figure 4.2 shows that the LPA beamformer accurately indicates the source location at \( \theta = 0 \), and \( P_{LPA} \) also recovers \( \theta^{(1)}(t) = 1 \). In contrast, \( P_{conv} \) cannot indicate the source location because its peak is shifted. Figure 4.3 compares the LPA and conventional beamformers for a single source at \( \theta = 0 \) and \( \theta^{(1)}(t) = 2\,\text{deg/sample} \).

Fig. 4.3

Beamformer outputs for a single source at \( \theta = 0 \) and \( \theta^{(1)}(t) = 2\,\text{deg/sample} \)

Figure 4.3 shows that the LPA beamformer accurately localizes the source at \( \theta = 0 \) with \( \theta^{(1)}(t) = 2 \), while \( P_{conv} \) cannot indicate the source location because its output is degraded. Figure 4.4 shows the beamformer outputs for a single source at \( \theta = 0 \) and \( \theta^{(1)}(t) = -1\,\text{deg/sample} \).

Fig. 4.4

Beamformer outputs for a single source at \( \theta = 0 \) and \( \theta^{(1)}(t) = -1\,\text{deg/sample} \)

Figure 4.4 shows that the LPA beamformer accurately localizes the single source at \( \theta = 0 \) with \( \theta^{(1)}(t) = -1 \), even though the source moves with a negative angular velocity, while \( P_{conv} \) cannot indicate the source location. In all cases, the peaks of the LPA curves occur at the true values of \( \theta(t) \) and \( \theta^{(1)}(t) \), while the shape of the curves depends slightly on the angular velocity. Here, \( P_{LPA} \) is expressed in terms of \( c_{0} \), i.e., \( \theta \), as is the conventional beamformer \( P_{conv} \). Increasing the source velocity shifts the \( P_{conv} \) peak away from the true angle, and for larger values of \( \theta^{(1)}(t) \) the single peak of \( P_{conv} \) degrades, as in Fig. 4.3. Comparing the \( P_{conv} \) outputs in Figs. 4.1 and 4.4, the peak shifts to the right when the angular velocity is positive and to the left when it is negative.

Moreover, \( P_{LPA} \) has no side lobes over a wide range around the peak that indicates the source location, and its main lobe is narrower than that of the conventional beamformer. This is an advantage of the LPA beamformer over the conventional beamformer even for stationary sources.

4.1.2 Case 2 (Well-Separated Multiple Sources)

Consider the case of three sources. The angular resolution of the array depends on the direction \( \theta \) and is approximately given by:

$$ \Delta \theta \cong \frac{1}{\left| \Gamma \cos \theta \right|}, \quad \Gamma = \frac{(n - 1)\,d}{\lambda} $$
(4.5)

where \( n \) is the number of sensors, \( d \) is the inter-element spacing and \( \lambda \) is the wavelength. Assume well-separated sources at \( \theta_{1}(t) = -16^{\circ},\ \theta_{2}(t) = 0,\ \theta_{3}(t) = 16^{\circ} \), all with the same angular velocity \( \theta^{(1)}(t) = 1^{\circ}/\text{sample} \), as illustrated in Fig. 4.5.
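As a worked example of (4.5), assuming \( \Delta\theta \) is in radians, the ten-element half-wavelength ULA of Section 4.1 has \( \Gamma = 9 \times 0.5 = 4.5 \), so \( \Delta\theta \approx 1/4.5 \approx 0.22 \) rad \( \approx 12.7^{\circ} \) at broadside, growing as \( 1/\cos\theta \) away from it. The 16° spacing of these sources therefore exceeds the resolution limit. A small Python check (the function name is illustrative):

```python
import numpy as np


def resolution_deg(theta_deg, n_sensors=10, d_over_lambda=0.5):
    """Eq. (4.5): Delta-theta ~ 1 / |Gamma cos(theta)|, Gamma = (n - 1) d / lambda, returned in degrees."""
    gamma = (n_sensors - 1) * d_over_lambda
    return np.rad2deg(1.0 / np.abs(gamma * np.cos(np.deg2rad(theta_deg))))


# Resolution at broadside, midway between neighbouring sources, and at the outer sources.
for theta in (0.0, 8.0, 16.0):
    print(f"theta = {theta:4.1f} deg -> resolution ~ {resolution_deg(theta):.2f} deg")
```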

Fig. 4.5

LPA beamformer output for well-separated sources at \( \theta_{1}(t) = -16^{\circ},\ \theta_{2}(t) = 0,\ \theta_{3}(t) = 16^{\circ} \) with the same angular velocity \( \theta^{(1)}(t) = 1^{\circ}/\text{sample} \)

Figure 4.5 shows that the peaks of the LPA beamformer remain at the actual source locations and yield correct estimates of the source velocities, whereas the conventional beamformer degrades and cannot localize the sources correctly when they move (i.e., when the sources are non-stationary).

4.2 Simulation of Frost Beamformers for a Microphone Array

Sensor arrays are commonly used to separate signals from noise based on DOA information. Frost's beamformer plays a significant role in speech processing for speaker localization. In Frost's beamformer, each sensor is followed by a transversal (FIR) filter with adjustable weights, as illustrated in Fig. 4.6 [8]. The beamformer output is the sum of the filter outputs, and the weights are updated by Frost's constrained least mean square (CLMS) algorithm, which minimizes the mean output power while keeping a fixed response in the look direction.

Fig. 4.6

Frost’s beamformer configuration
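Frost's CLMS update can be sketched compactly. For a presteered array of \( M \) sensors with \( J \) taps each, Frost's constraint requires the weights of each tap, summed over the sensors, to equal a desired look-direction FIR response \( f \); the update is \( W(k+1) = P[W(k) - \mu\, y(k) X(k)] + F \), where \( P \) projects onto the null space of the constraints and \( F \) is the quiescent weight vector that satisfies them. With this constraint matrix, \( P \) reduces to subtracting the per-tap mean over the sensors. The Python below is an illustrative sketch, not the implementation used in [8] or [9]; the name `frost_clms`, the default \( f \) (a unit impulse at the center tap), the step size and the synthetic interferer are assumptions.

```python
import numpy as np


def frost_clms(x, n_taps, mu, f=None):
    """Frost's constrained LMS beamformer for a presteered array.

    x      : (M, K) presteered sensor signals (look-direction signal in phase on all sensors)
    n_taps : J, length of the transversal filter behind each sensor
    mu     : LMS step size
    f      : (J,) look-direction FIR response; default is a unit impulse at the center tap
    Returns the beamformer output y (K,) and the final weights W (M, J).
    """
    M, K = x.shape
    if f is None:
        f = np.zeros(n_taps)
        f[n_taps // 2] = 1.0
    F = np.tile(np.asarray(f, dtype=float) / M, (M, 1))  # quiescent weights, column sums = f
    W = F.copy()
    X = np.zeros((M, n_taps))                              # tap-delay lines: X[:, j] = x(k - j)
    y = np.zeros(K)
    for k in range(K):
        X = np.roll(X, 1, axis=1)
        X[:, 0] = x[:, k]
        y[k] = np.sum(W * X)                               # sum of all FIR filter outputs
        W = W - mu * y[k] * X                              # gradient step on output power
        W = W - W.mean(axis=0) + F                         # projection P[.] + F restores C^T W = f
    return y, W


# Demo: look-direction signal on all sensors plus one interferer with hypothetical gains g.
rng = np.random.default_rng(0)
M, J, K = 4, 8, 20000
g = np.linspace(0.5, 1.5, M)                               # hypothetical interferer signature
s, i = rng.standard_normal(K), rng.standard_normal(K)
x = s + np.outer(g, i) + 1e-2 * rng.standard_normal((M, K))
y, W = frost_clms(x, J, mu=0.002)
print("residual interferer response per tap:", np.round(g @ W, 3))
```

The projection step is what distinguishes this from plain LMS: after every update the look-direction response is reset exactly to \( f \), so the adaptation can only reduce power arriving from other directions.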

For a signal arriving from the look direction, the whole Frost beamformer is equivalent to a single transversal FIR filter. The MATLAB function in [9] is used to implement time-domain beamformers that recover speech signals from noisy microphone array measurements, and to simulate a received interference-dominant signal. The Frost beamformer is applied because it suppresses interference better than the time-delay approach, and its robustness is improved using diagonal loading. Several scenarios are considered to show the effect of the omnidirectional microphone configuration, the element spacing and the number of microphones.
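Frost's constrained optimum also has a closed form, \( W_{opt} = R^{-1} C (C^{T} R^{-1} C)^{-1} f \), where \( R \) is the covariance of the stacked tap vector and \( C \) the constraint matrix. Diagonal loading replaces \( R \) by \( R + \delta I \); as \( \delta \) grows, the weights move toward the data-independent quiescent solution, which is the usual explanation of the added robustness to covariance estimation errors. The Python sketch below assumes tap-major stacking of the \( MJ \) taps and uses random data for illustration; the function names and loading value are hypothetical, and this is not necessarily how the MATLAB function in [9] applies the loading.

```python
import numpy as np


def frost_constraint_matrix(M, J):
    """C (MJ x J) for tap-major stacking: column j sums tap j over the M sensors."""
    return np.kron(np.eye(J), np.ones((M, 1)))


def frost_optimal_weights(R, C, f, loading=0.0):
    """W = (R + dI)^-1 C (C^T (R + dI)^-1 C)^-1 f, i.e. Frost's optimum with diagonal loading d."""
    R_loaded = R + loading * np.eye(R.shape[0])
    RiC = np.linalg.solve(R_loaded, C)
    return RiC @ np.linalg.solve(C.T @ RiC, f)


# Example with a sample covariance of stacked tap vectors (random data, illustration only).
rng = np.random.default_rng(0)
M, J = 4, 8
C = frost_constraint_matrix(M, J)
f = np.zeros(J)
f[J // 2] = 1.0
snapshots = rng.standard_normal((M * J, 500))
R = snapshots @ snapshots.T / snapshots.shape[1]
w_plain = frost_optimal_weights(R, C, f)
w_loaded = frost_optimal_weights(R, C, f, loading=0.1 * np.trace(R) / R.shape[0])
print("weight norms without / with loading:", np.linalg.norm(w_plain), np.linalg.norm(w_loaded))
```

Both solutions satisfy the constraints exactly; loading only changes how the remaining degrees of freedom are spent, trading some interference rejection for a smaller weight norm (better white-noise gain).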

4.2.1 Case 1 (ULA of Ten Omnidirectional Microphones)

Assume a uniform linear array (ULA) of 10 omnidirectional microphones receives the speech signals. The array elements have 5 cm spacing, and the microphone array (MA) provides the multichannel received signals. Two speech recordings and one laughter recording are included, where the laughter acts as the interference. The azimuth and elevation directions of the first and second speech signals are \( (-30^{\circ}, 0^{\circ}) \) and \( (-10^{\circ}, 10^{\circ}) \), respectively, while the interference (laughter) arrives from \( (20^{\circ}, 0^{\circ}) \) and masks the speech signals. In addition, a white noise signal of \( 10^{-4} \) W is added at each sensor to represent thermal noise. Each single-channel input signal is defined as received by a single microphone at the array origin. Figure 4.7 shows the signal received at channel 3.

Fig. 4.7

The received speech signal at channel 3

To compensate for the arrival time differences (ATD) across the array, a time-delay (TD) beamformer is applied to the signal arriving from a given direction. The TD beamformer is steered toward the incident direction of the first speech signal. Figure 4.8 illustrates the TD beamformer output.

Fig. 4.8

The TD beamformer output
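The time-delay (delay-and-sum) beamformer can be sketched as follows: each channel is advanced by the arrival delay predicted for the steering direction, and the aligned channels are averaged. The sketch assumes a far-field plane wave, a ULA whose element \( m \) receives the wave \( m\,d\sin\theta/c \) seconds after element 0, \( c = 343 \) m/s, and FFT-based (circular) fractional delays; the function names and the two-tone test signal are hypothetical, and this is not the MATLAB implementation in [9].

```python
import numpy as np


def fractional_delay(x, tau, fs):
    """Delay a real signal by tau seconds with an FFT phase shift (circular delay)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * tau), n=len(x))


def ula_delays(n_mics, spacing, theta_deg, c=343.0):
    """Plane-wave arrival delay of each ULA element relative to element 0 (assumed convention)."""
    return np.arange(n_mics) * spacing * np.sin(np.deg2rad(theta_deg)) / c


def time_delay_beamformer(x, fs, spacing, theta_deg, c=343.0):
    """Align the channels of x (n_mics, n_samples) for direction theta_deg and average them."""
    taus = ula_delays(x.shape[0], spacing, theta_deg, c)
    aligned = [fractional_delay(channel, -tau, fs) for channel, tau in zip(x, taus)]
    return np.mean(aligned, axis=0)


# Demo: a two-tone signal arriving from -30 deg at a 10-element ULA with 5 cm spacing.
fs, n = 16000, 1600
t = np.arange(n) / fs
s = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
x = np.array([fractional_delay(s, tau, fs) for tau in ula_delays(10, 0.05, -30.0)])
y = time_delay_beamformer(x, fs, 0.05, -30.0)
```

Steering toward a different direction leaves the channels misaligned, so averaging attenuates the signal; this spatial selectivity is what the TD beamformer relies on.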

Speech enhancement can be quantified by the array gain, defined as the ratio of the output to the input signal-to-interference-plus-noise ratio (SINR). A Frost beamformer can be used to obtain better performance: the FIR filter attached to each sensor gives it more weights for suppressing the interference, so nulls can be placed in the interference directions. The Frost beamformer achieved a 14 dB array gain, 4.5 dB higher than that of the TD beamformer. Furthermore, the Frost beamformer can be used to steer the array toward the second speech signal. Figure 4.9 illustrates the Frost beamformer output with diagonal loading, which improves its performance.

Fig. 4.9

The Frost beamformer output with diagonal loading
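In simulation, the signal and the interference-plus-noise components can be passed through a beamformer separately, so the array gain follows directly from its definition as output SINR over input SINR. The helper below is an illustrative sketch under that assumption, with the input SINR taken at a single reference channel; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def mean_power(x):
    """Average power of a (real or complex) signal."""
    return float(np.mean(np.abs(x) ** 2))


def array_gain_db(sig_in, int_noise_in, sig_out, int_noise_out):
    """Array gain = SINR_out / SINR_in in dB, from separately available components."""
    sinr_in = mean_power(sig_in) / mean_power(int_noise_in)
    sinr_out = mean_power(sig_out) / mean_power(int_noise_out)
    return 10.0 * np.log10(sinr_out / sinr_in)


# Toy check: the output keeps the signal but has 10x less interference-plus-noise power.
rng = np.random.default_rng(0)
sig = rng.standard_normal(8000)
noise = rng.standard_normal(8000)
gain = array_gain_db(sig, noise, sig, noise / np.sqrt(10.0))
print(f"array gain = {gain:.2f} dB")
```

On this scale, the reported 4.5 dB advantage of the Frost beamformer corresponds to an SINR improvement about \( 10^{0.45} \approx 2.8 \) times larger than that of the TD beamformer for the same input.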

To demonstrate the effect of the microphone array configuration and the number of elements, the following scenarios are considered.

4.2.2 Case 2 (ULA of 5 Omnidirectional Microphones)

In this case, the same signals from the same directions are received by a ULA of five omnidirectional microphones instead of ten. In addition, the element spacing is increased from 5 cm to 10 cm. Figures 4.10, 4.11 and 4.12 show the signal received at channel 3, the TD beamformer output, and the Frost beamformer output, respectively.

Fig. 4.10

The received speech signal at channel 3

Fig. 4.11

The TD beamformer output

Fig. 4.12

The Frost beamformer output with diagonal loading
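A quick geometric check helps interpret this change. Assuming \( c = 343 \) m/s, the sketch below computes, for the two ULA cases, the aperture \( (n-1)d \) and the frequency \( c/(2d) \) above which a ULA with spacing \( d \) is spatially undersampled and grating lobes can appear. It does not reproduce the simulated gains; the function name is illustrative.

```python
def ula_geometry(n_mics, spacing_m, c=343.0):
    """Return (aperture in m, highest frequency in Hz without spatial aliasing, c / (2 d))."""
    return (n_mics - 1) * spacing_m, c / (2.0 * spacing_m)


for n_mics, d in ((10, 0.05), (5, 0.10)):
    aperture, f_max = ula_geometry(n_mics, d)
    print(f"{n_mics} mics, d = {d * 100:.0f} cm: aperture {aperture:.2f} m, "
          f"alias-free up to {f_max:.0f} Hz")
```

Halving the number of elements while doubling the spacing keeps the aperture almost unchanged (0.45 m versus 0.40 m) but halves the alias-free bandwidth (about 3.4 kHz versus 1.7 kHz), and the lower limit falls inside the speech band.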

To demonstrate the effect of the array geometry, the following scenario is considered.

4.2.3 Case 3 (UCA of 5 Omnidirectional Microphones)

In this case, the same signals from the same directions are received by a uniform circular array (UCA) of five omnidirectional microphones with a radius of 1.5 cm. Figures 4.13, 4.14 and 4.15 show the signal received at channel 3, the TD beamformer output, and the Frost beamformer output, respectively.

Fig. 4.13

The received speech signal at channel 3

Fig. 4.14

The TD beamformer output

Fig. 4.15

The Frost beamformer output with diagonal loading
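For the UCA, the element coordinates and plane-wave delays follow from the radius. The sketch assumes five elements equally spaced on a circle in the xy-plane, azimuth measured from the x-axis toward the y-axis, elevation measured from the xy-plane, and \( c = 343 \) m/s; the function names are illustrative. With \( r = 1.5 \) cm, neighbouring elements are \( 2r\sin(\pi/5) \approx 1.76 \) cm apart and the delay differences across the array are at most \( 2r/c \approx 87\,\mu\text{s} \), far smaller than the up to about 1.3 ms spanned by the 45 cm ULA of Case 1.

```python
import numpy as np


def uca_positions(n_mics=5, radius=0.015):
    """(n_mics, 3) coordinates of a uniform circular array in the xy-plane."""
    phi = 2 * np.pi * np.arange(n_mics) / n_mics
    return np.column_stack([radius * np.cos(phi), radius * np.sin(phi), np.zeros(n_mics)])


def plane_wave_delays(positions, az_deg, el_deg, c=343.0):
    """Arrival delay at each element relative to the array origin (negative = earlier)."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    toward_source = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    return -(positions @ toward_source) / c


pos = uca_positions()
taus = plane_wave_delays(pos, -30.0, 0.0)   # direction of the first speech signal
print("neighbour spacing [cm]:", 100 * np.linalg.norm(pos[1] - pos[0]))
print("delays [us]:", np.round(1e6 * taus, 2))
```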

Figure 4.16 compares the array gains of the TD and Frost beamformers obtained in the preceding results for the different array configurations.

Fig. 4.16

The array gain using different configurations

4.3 Linear Microphone Array for Live Direction of Arrival Estimation

The four built-in microphones of the Microsoft Kinect™ are accessed in MATLAB 2017 and treated as a linear array with known coordinates [10]. The DOA is estimated independently for pairs of microphones, and the pairwise estimates are then combined into a single live DOA output. DOAE sensitivity increases with the inter-microphone distance. A bespoke arrow-based polar visualization (Fig. 4.17) illustrates the DOA estimate of the sound source obtained from multiple microphone pairs within the linear array, with the four microphones positioned at [−0.088, 0.042, 0.078, 0.11] m.

Fig. 4.17

The polar plot of the DOAE
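The pairwise procedure can be sketched as follows. The source text does not specify the per-pair estimator; this sketch assumes GCC-PHAT time-delay estimation for each microphone pair, converts each delay to an angle with \( \sin\theta = -c\,\tau/d \) (far-field source, \( \theta \) measured from broadside and positive toward the +x axis, \( c = 343 \) m/s), and averages the pairwise angles. Function names and the test geometry are hypothetical; longer pairs produce larger delays per degree, which is consistent with the remark that sensitivity grows with inter-microphone distance.

```python
import numpy as np
from itertools import combinations


def gcc_phat_delay(x, y, fs, max_tau):
    """Delay of x relative to y in seconds, from the GCC-PHAT peak within +/- max_tau."""
    n = len(x) + len(y)                                       # zero-pad for linear correlation
    cross = np.fft.rfft(x, n) * np.conj(np.fft.rfft(y, n))
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)     # PHAT weighting
    max_shift = max(1, int(np.ceil(max_tau * fs)))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))   # lags -max_shift..+max_shift
    return (np.argmax(np.abs(cc)) - max_shift) / fs


def pairwise_doa_deg(signals, positions, fs, c=343.0):
    """Average of the per-pair DOA estimates (degrees from broadside, + toward +x)."""
    estimates = []
    for i, j in combinations(range(len(positions)), 2):
        d = positions[j] - positions[i]
        tau = gcc_phat_delay(signals[j], signals[i], fs, abs(d) / c)   # mic j relative to mic i
        estimates.append(np.degrees(np.arcsin(np.clip(-c * tau / d, -1.0, 1.0))))
    return float(np.mean(estimates)), estimates


# Synthetic check: geometry chosen so a 30 deg source gives whole-sample delays at fs = 10 kHz.
fs = 10000
positions = np.array([0.0, 0.0686, 0.1372, 0.2058])           # hypothetical, not the Kinect layout
source = np.random.default_rng(0).standard_normal(4096)
signals = np.array([np.roll(source, -m) for m in range(4)])   # mic m hears the wave m samples early
doa, per_pair = pairwise_doa_deg(signals, positions, fs)
print(f"combined DOA: {doa:.2f} deg")
```

For the Kinect layout in the text, `positions` would be [−0.088, 0.042, 0.078, 0.11] and `signals` the four recorded channels. A practical version would also interpolate the correlation peak, because one sample at typical audio rates can correspond to several degrees for such short baselines.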

Figure 4.18 shows the polar plot of the DOAE when the microphone positions are changed to [−0.05, 0.062, 0.098, 0.19].

Fig. 4.18

The polar plot of the DOAE