
1 Introduction

The significance of effective methods for time delay estimation (TDE) lies primarily in their wide practical application in local positioning systems [1, 2]. The local positioning problem is shown schematically in Fig. 1 [1]. Depending on the specific circumstances of the scenario, positioning tasks can be considered passive or active. In passive tasks, the position of an object that directly emits the signal is determined. In active tasks, the mobile object being positioned reflects a dedicated signal emitted by the locator [3]. It is also possible to reverse the composition of the system shown in Fig. 1, so that the mobile object acts as the signal receiver and the stationary nodes become transmitters [4]. It should be noted that the reverse passive scenario does not differ significantly from the regular passive one when only the TDE problem is considered. Time delay estimation methods are normally applied to all of these scenarios [1].

Fig. 1. Generalized TDE scheme (passive non-reverse scenario): S – mobile object (active emitter); A, B, C, …, Z – array of stationary sensors; s0(t) – signal emitted by the source S; LA, LB, LC, …, LZ – exact positions of the corresponding sensors; sA(t), sB(t), sC(t), …, sZ(t) – signals received by the corresponding sensors; L0 – position of the source S.

The high demand for object positioning systems is, in turn, associated with the development of wireless technologies and advances in industrial automation. The recent development of the Internet of Things [5, 6] and the spread of smart devices [7, 8] have opened a large new field of application for positioning systems. The gradual introduction of such devices and systems into the consumer goods market makes it relevant to further reduce their cost. The principal way to reduce cost in this case is to use cheaper hardware for smart devices or sensor nodes [6]. Systems on a chip (SoC) and SoC-based single-board computers are often considered as core computing units for positioning systems. The advantage of the latter is their well-developed peripherals and the wide availability of hardware expansion modules, which are sufficient to solve most user tasks [9].

The main reasons for choosing single-board computers are their high portability and self-sufficiency, as well as their low power consumption and reduced cost compared to personal computers. However, these benefits come at the cost of lower performance and a smaller amount of available RAM. This has to be considered when implementing digital signal processing algorithms for practical applications.

In particular, a specific feature that must be taken into account when implementing local positioning techniques is the limited performance of the hardware platform, especially when operating in real time. This makes it relevant to study practical TDE methods and variants of their implementation in relation to various application scenarios. The expected result is a well-grounded choice of TDE methods and their variants applicable to such scenarios as real-time positioning of a mobile object, positioning of a signal source with an a priori known spectral composition, and some others.

This article presents a theoretical and empirical comparison of different implementations of time-domain and frequency-domain TDE methods in terms of computational performance and the amount of memory involved. In the experimental part of the work, a Raspberry Pi 4B single-board computer was used as the test hardware.

2 Overview of TDE Methods

Time delay estimation is the most common technique applied to the local positioning problem. The essence of TDE is to measure the time lag between signals received by an array of spatially distributed sensors. In the basic passive scenario, the source of those signals is the object being positioned.

For the sake of certainty, let us assume that only two sensors are used. In this case, the signals received by the sensors are described by the following expression [10]

$$ \begin{gathered} s_{a} (t) = K_{a} \cdot s_{0} (t -\uptau _{a} ) + n_{a} (t), \hfill \\ s_{b} (t) = K_{b} \cdot s_{0} (t -\uptau _{b} ) + n_{b} (t), \hfill \\ \end{gathered} $$
(1)

where s0 is the signal emitted by the object; Ka, Kb are the attenuation coefficients of the signals along the propagation path; τa, τb are the delays between the received and the emitted signals; na, nb are additive noises; sa, sb are the received signals. The required lag time is then expressed as

$$\uptau _{ab} = (t -\uptau _{a} ) - (t -\uptau _{b} ) =\uptau _{b} -\uptau _{a} . $$
(2)

The value τab is often referred to as the time difference of arrival (TDOA). A wide class of positioning methods is based on the analysis of a set of TDOA estimates [1]. Depending on the practical setting, they can be used to determine the coordinates of an object by multilateration in linear coordinates [11,12,13], in plane coordinates [14, 15], or in three dimensions [5, 8, 16]. Such problems are typical for fields such as local positioning and indoor navigation, wide-area positioning of mobile subscribers in communication networks [5, 8], and locating pipeline leaks with leak noise correlators [11,12,13].

The methods applied to this problem can be divided into two groups: time-domain and frequency-domain [17]. This division is based on the form in which the signals are represented during their analysis.

2.1 Time-Domain Methods

Time-domain TDE methods usually rely on analysis of the correlogram [3]. The lag time can be obtained as the argument of the correlation function at which it reaches its maximum value:

$$\uptau _{ab} = \arg \max \big(R_{ab} (\tau )\big), $$
(3)

where arg max is the operator that returns the argument value at which the function is maximized; Rab (τ) is the cross-correlation function of sa and sb.

The conventional way of calculating the correlation function of the sampled signals sa(ti) and sb(ti) applies the convolution theorem [11]:

$$ R_{ab} \left( {{\uptau }_{j} } \right) = {\text{F}}^{ - 1} \left( {{\text{F}}^*\left({s_{a} \left( {t_{i}}\right)}\right) \times {\text{F}}\left( {s_{b} \left( {t_{i} } \right)} \right)} \right), $$
(4)

where F is the discrete Fourier transform (DFT) operator; F–1 is the inverse DFT (IDFT) operator; * denotes the unary operation of element-wise complex conjugation; × denotes the binary operation of element-wise product.
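As an illustration, relations (3) and (4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this study (which relies on FFTW): the function names `fft` and `tde_time_domain` are hypothetical, the window length is assumed to be a power of two, the correlation is circular, and the lag is returned in samples.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (illustrative, not optimized).

    len(x) must be a power of two. The inverse variant is returned
    unnormalized; the caller divides by N.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def tde_time_domain(sa, sb):
    """Lag of sb relative to sa, in samples, following (3) and (4)."""
    n = len(sa)
    spec_a = fft(sa)
    spec_b = fft(sb)
    # Element-wise product F*(sa) x F(sb), then the inverse transform.
    cross = [a.conjugate() * b for a, b in zip(spec_a, spec_b)]
    r = [v.real / n for v in fft(cross, inverse=True)]
    # arg max of the (circular) correlation function.
    j = max(range(n), key=lambda i: r[i])
    # Indices in the upper half of the window correspond to negative lags.
    return j if j <= n // 2 else j - n
```

Dividing the returned lag by the sampling rate gives the delay in seconds. A positive value means that sb lags behind sa, which matches the sign convention of (2).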

Despite the prevalence of the approach based on (3) and (4), practical implementations of time-domain methods may differ noticeably. The variants differ mainly in the algorithm that is used to perform the DFT and IDFT operations.

2.2 Frequency-Domain Methods

Frequency-domain methods extract the lag time directly from the cross-spectrum of the analyzed signals. The discrete spectra Sa (fk) and Sb (fk) of the sampled signals sa(ti) and sb(ti) are complex valued, therefore we can write them as follows [18]

$$ S_{a,b} \left( {f_{k} } \right) = {\text{F}}\left( {s_{a,b} \left( {t_{i} } \right)} \right),\,\;S_{a,b} \left( {f_{k} } \right) = X_{a,b} \left( {f_{k} } \right) \cdot e^{j\Phi_{a,b} \left( {f_{k} } \right)}, $$
(5)

where Xa (fk), Xb (fk) are amplitude spectra; Φa (fk), Φb (fk) are phase spectra; j is the imaginary unit. The amplitude spectrum carries information about the energy properties of the signal. The phase spectrum carries information about the temporal features of the signal; in particular, it reflects the time shift.

The cross-spectrum, respectively, has the form

$$ S_{ab} \left( {f_{k} } \right) = S_{a}{}^*\left( {f_{k} } \right) \times S_{b} \left( {f_{k} } \right), $$
$$ S_{ab} \left( {f_{k} } \right) = X_{ab} \left( {f_{k} } \right) \cdot e^{j\Phi_{ab} \left( {f_{k} } \right)}, $$
$$ X_{ab} \left( {f_{k} } \right) = X_{a} \left( {f_{k} } \right) \times X_{b} \left( {f_{k} } \right),\;\Phi_{ab} \left( {f_{k} } \right) = \Phi_{b} \left( {f_{k} } \right) - \Phi_{a} \left( {f_{k} } \right). $$
(6)

The phase component of the cross-spectrum Φab (fk) is used to extract information about the TDOA. The following formula can be used directly to estimate the lag [18]

$$\uptau _{ab} = \frac{\sum\limits_{k} \Theta_{ab} (f_{k} ) \cdot f_{k} }{2\pi \sum\limits_{k} f_{k}^{2} }, $$
(7)

where Θab (fk) = U[Φab (fk)] is the result of applying the unwrapping operator U to Φab (fk) [18].
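The estimator (7) is a least-squares slope of the unwrapped phase through the origin, which can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the names `dft`, `unwrap` and `tde_frequency_domain` are hypothetical, a direct O(N²) DFT is used for brevity, and the DC and Nyquist bins are skipped. One further assumption concerns the sign: with the common e^(−j2πft) DFT kernel, delaying sb produces a negative phase slope, so the sketch negates the slope in order to return a positive lag when sb lags behind sa.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, used here only to keep the sketch short."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]


def unwrap(phases):
    """Remove 2*pi jumps between adjacent phase samples (operator U)."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out


def tde_frequency_domain(sa, sb, fs):
    """Lag of sb relative to sa in seconds, from the phase slope as in (7)."""
    n = len(sa)
    spec_a = dft(sa)
    spec_b = dft(sb)
    bins = range(1, n // 2)  # skip DC and the Nyquist bin
    phase = [cmath.phase(spec_a[k].conjugate() * spec_b[k]) for k in bins]
    theta = unwrap(phase)
    freqs = [k * fs / n for k in bins]
    num = sum(t * f for t, f in zip(theta, freqs))
    den = 2 * math.pi * sum(f * f for f in freqs)
    # Sign flipped because of the assumed e^{-j...} DFT kernel (see lead-in).
    return -num / den
```

Unwrapping in this form only works when the phase changes by less than π between adjacent bins, so it assumes a signal with energy in every bin used and a delay shorter than half the window.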

Expression (7) is based on the full spectral representation of the signals. Taking into account the equivalence of the information contained in the time and frequency representations of a signal, it is correct to consider (7) as an analogue of (3). The mathematical equivalence of these methods in terms of the potentially achievable accuracy is shown in [19]. However, it should be noted that in practical cases they give different results. This is due both to the difference between real signals and model signals and to the inevitable differences in their numerical implementation [20].

In [20], an alternative variant of (7) is proposed, which allows one to use an arbitrary set of samples of the phase cross-spectrum. This makes it possible to apply the frequency-domain method in situations where noise prevails at low frequencies. The use of the alternative formula also makes it possible to determine the time lag without computing the entire spectrum. This feature is discussed in detail in the following section.

2.3 Signal Processing in Practical TDE

In practical cases, the basic methods of TDE have low accuracy due to contamination of the source signal by additive noise on the side of the receiving sensors. Reduction in the negative impact of noise can be achieved by averaging estimates of the spectral characteristics of signals. Each spectral estimate is obtained by the short-time Fourier transform method [10]. In general, the time windows at the input of the transformation can overlap and have a shared subset of signal samples. The expressions for computing the correlation function for (3) and the phase cross-spectrum for (7) will respectively take the following form:

$$ R_{ab} \left( {{\uptau }_{j} } \right) = \text{F}^{ - 1} \left( {\frac{1}{Q} \cdot \sum\limits_{q} {\left[ {\text{F}^*\left( {s_{a}^{(q)} \big( {t_{i} } \big)} \right) \times \text{F}\left( {s_{b}^{(q)} \big( {t_{i} } \big)} \right)} \right]} } \right), $$
(8)
$$ \Theta_{ab} (f_{k} ) = \text{U}\left[ {\frac{1}{Q}\sum\limits_{q} {\left[ {\Phi \left( {s_{b}^{(q)} \big( {t_{i} } \big)} \right) - \Phi \left( {s_{a}^{(q)} \big( {t_{i} } \big)} \right)} \right]} } \right], $$
$$ \Theta_{ab} (f_{k} ) = \text{U}\left[ {\arg\left( {\frac{1}{Q}\sum\limits_{q}^{{}} {\left[ {\text{F}^*\left( {s_{a}^{(q)} \big( {t_{i} } \big)} \right) \times \text{F}\left( {s_{b}^{(q)} \big( {t_{i} } \big)} \right)} \right]} } \right)} \right], $$
(9)

where Q is the total number of time windows at the input of the DFT; arg is the operator that returns the argument of a complex number; sa^(q), sb^(q) are the subsets of signal samples belonging to the time window with index q.
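The averaging over overlapping windows that is shared by (8) and the second form of (9) can be sketched as follows. This is an illustrative sketch only: the function name `averaged_cross_spectrum` and its parameters are hypothetical, a direct O(N²) DFT stands in for an optimized transform, and no window function is applied.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here only to keep the sketch short."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]


def averaged_cross_spectrum(sa, sb, n, overlap):
    """Average F*(sa) x F(sb) over Q windows of n samples.

    overlap is the fraction of samples shared by adjacent windows
    (0 <= overlap < 1). Returns the averaged cross-spectrum and Q.
    """
    hop = max(1, int(n * (1 - overlap)))
    acc = [0j] * n
    q = 0
    for start in range(0, len(sa) - n + 1, hop):
        spec_a = dft(sa[start:start + n])
        spec_b = dft(sb[start:start + n])
        for k in range(n):
            acc[k] += spec_a[k].conjugate() * spec_b[k]
        q += 1
    if q == 0:
        raise ValueError("signals are shorter than one time window")
    return [v / q for v in acc], q
```

Applying the inverse DFT to the returned array gives the averaged correlation function of (8), while taking the argument of each element and unwrapping gives the phase cross-spectrum in the second form of (9).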

In addition to averaging spectral estimates, frequency-weighting functions of the form Ψab (fk) are used to further reduce the influence of noise [21]. The sample values of the frequency-weighting functions are positive and do not exceed unity. Values of Ψab (fk) close to unity indicate that at the frequency bin fk the signal prevails over the noise throughout the entire observation period. Frequency-weighting functions can be used with both time-domain [11, 21, 22] and frequency-domain [12, 23] TDE methods. Despite the variety of such functions, averaged spectral estimates are always used to obtain them. From a computational standpoint, obtaining these additional spectral estimates does not differ from obtaining the cross-spectrum estimate used in (8). For this reason, weighting in the frequency domain is not considered in the experimental study that follows.

2.4 Variants of Fourier Transform Implementation

As shown above, both time-domain and frequency-domain TDE methods require spectral transformations. In digital signal processing practice, DFT algorithms are used for this. Some of the most efficient solutions are classified as fast Fourier transforms (FFT). Among the latter, the Cooley-Tukey algorithm is the best known and most widespread [24].

Despite the significant computational advantages of the FFT over the direct computation of the DFT, its use has a number of inconvenient features. Firstly, most FFT algorithms impose restrictions on the number of samples at the input of the transform. Secondly, those algorithms allow only the computation of the entire spectrum. This is redundant in cases where the informative signal is localized in several a priori known frequency bins. Thirdly, the arrival of new data (a forward shift of the time window by several samples) requires a complete application of the FFT to obtain a new spectral estimate. This complicates the use of the FFT when operating in real time. The use of small windows is not always acceptable, since the size of the time window determines the frequency resolution and noise tolerance. In contrast, the use of large windows, in combination with calling the transform every time new data arrive, creates a large computational load.

Special DFT implementations have been proposed for all the cases described above in which the applicability of the FFT is limited. In particular, the chirp Z-transform (CZT) makes it possible to obtain an arbitrary number M of spectral bins using time windows composed of an arbitrary number N of samples [25]. It should be noted that the corresponding limitation of the FFT is not essential for the TDE problem, so we do not consider the CZT further in this study.

The Goertzel algorithm was initially proposed to compute individual frequency bins within the signal spectrum [26]. The use of this algorithm in conjunction with the frequency-domain methods of TDE allows one to obtain a time lag without computing the entire spectrum. This feature gives a computational advantage in some TDE scenarios, and therefore we will investigate it further.
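For reference, the textbook second-order form of the Goertzel recursion for a single bin can be sketched as follows. This is a generic illustration of the algorithm rather than the low-level implementation used in this study; the function name `goertzel` is hypothetical and k is assumed to be an integer bin index.

```python
import cmath
import math


def goertzel(x, k):
    """Return the DFT bin X[k] of the real sequence x (integer k)."""
    n = len(x)
    w = 2 * math.pi * k / n
    coeff = 2 * math.cos(w)
    s1 = 0.0  # s[i-1]
    s2 = 0.0  # s[i-2]
    for sample in x:
        # One real multiplication and two real additions per sample.
        s0 = sample + coeff * s1 - s2
        s2 = s1
        s1 = s0
    # Final complex correction step, performed once per bin.
    return cmath.exp(1j * w) * s1 - s2
```

Computing K bins amounts to calling the function K times, which is where the linear dependence of the cost on the number of requested bins comes from.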

A recursive sliding DFT (SDFT) algorithm can be used to obtain spectral estimates in real time [27]. The advantage of this algorithm is its ability to update the already available spectral estimates based on newly received data. We further investigate the performance of the SDFT and the corresponding amount of memory used in order to determine its applicability to the TDE problem.
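The basic sliding update can be sketched as follows: when the window advances by one sample, every bin is corrected by the difference between the incoming and the outgoing sample and then rotated by one precomputed multiplier. This is a generic illustration of the recursion, not the implementation used in this study; the class name `SlidingDFT` and its interface are hypothetical, and all N bins are updated with complex arithmetic for simplicity.

```python
import cmath
from collections import deque


class SlidingDFT:
    """Keeps the DFT of the last n samples up to date sample by sample."""

    def __init__(self, n):
        self.n = n
        # Precomputed rotation multipliers, one per frequency bin.
        self.twiddle = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
        # Internal buffer holding the samples of the current window.
        self.window = deque([0.0] * n, maxlen=n)
        self.spectrum = [0j] * n

    def push(self, sample):
        """Slide the window by one sample and return the updated spectrum."""
        oldest = self.window[0]
        self.window.append(sample)  # drops the oldest sample automatically
        delta = sample - oldest
        self.spectrum = [
            (value + delta) * w for value, w in zip(self.spectrum, self.twiddle)
        ]
        return self.spectrum
```

The window starts filled with zeros, so the spectrum describes real data only after n samples have been pushed. Accumulated rounding error is not addressed in this sketch.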

It should be noted that numerous other recursive DFT algorithms have been developed and described to date [28, 29]; they remain beyond the scope of this study. However, the potential of applying a few of them to the TDE problem is discussed in the conclusion.

3 Computational Study

To determine the operational capabilities of a sensor node based on a single-board computer, a series of computational experiments was carried out. During the experiments, an array of dummy data was processed with time-domain and frequency-domain TDE methods described in the previous section.

In the remainder of this section, we present estimates of computational performance and memory usage for the most critical stages of the implementation of the considered methods. The discussion section compares the empirical results with the theoretical estimates. The operating limits for a TDE device that follow from the study are also given in the discussion.

3.1 Raspberry Pi 4B Hardware

The computational experiments were run on a Raspberry Pi 4B single-board computer [30] with a HiFiBerry DAC + ADC Pro expansion board [31], shown in Fig. 2. The Broadcom BCM 2711 SoC is the core processing unit of the computer. This SoC incorporates a quad-core general-purpose processor with the Cortex-A72 microarchitecture and a VC6 graphics core, along with some peripheral components. The HiFiBerry sound card was used only in a few separate tests to verify real-time operability for specific input data rates and a particular preset of computational parameters. Therefore, its characteristics are not significant in the context of this study.

Fig. 2. Raspberry Pi 4B with HiFiBerry DAC + ADC Pro sound card attached on top.

3.2 Testing Software

For the purpose of this study, we developed software for automated experimentation and statistical preprocessing of the acquired data. We chose C++ as the main programming language; it was used to unify the program interfaces and to wrap the distinct computational functions.

Performance-critical software components were implemented at a low level in C. Our algorithmic implementation of the TDE methods largely corresponds to the description given in Sect. 2. To implement the FFT, we used the current version of the FFTW library, which is commonly considered the industry standard. We implemented the software components for the Goertzel transform and the SDFT at a low level, based on the algorithms described in [26] and [27], respectively.

We implemented a special class dedicated to the acquisition of statistical data on computation time. Raw time benchmarks were gathered on calls of the execution method of the wrapper class. Each benchmark was repeated 150 times. Then, the raw data underwent statistical processing. For each benchmark, we recorded the minimum and maximum execution times, the average time, and the standard deviation of the time. Sample code for gathering and processing benchmarks is shown in Fig. 3.
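The measurement procedure described above can be approximated by the following sketch. It is written in Python for brevity and is not the C++ class used in this study; the function name `benchmark` and the dictionary keys are hypothetical, and `time.perf_counter` is assumed to be an adequate timer for the workload.

```python
import statistics
import time


def benchmark(func, repeats=150):
    """Call func repeatedly and summarize its execution time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "t_min": min(samples),              # minimum execution time
        "t_max": max(samples),              # maximum execution time
        "t_ave": statistics.mean(samples),  # average execution time
        "dt": statistics.stdev(samples),    # sample standard deviation
    }
```

A call such as `benchmark(lambda: transform(window))`, where `transform` and `window` are placeholders for the routine and data under test, returns the four statistics for that routine.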

Fig. 3. Visual Studio screenshot showing the implementation of time measurements.

Only dynamic allocation was taken into account when evaluating memory usage. This is because a large share of memory usage is associated with buffers that store time series, complex spectra and precomputed constants for the transforms. Due to their size, these data arrays have to be stored in dynamic memory. Such an approach to the evaluation of memory usage is insensitive to memory that is used on the stack and is not directly related to the algorithms under study. The influence of the latter could not be excluded if the entire memory associated with the process were used as an estimate.

The Valgrind software was used to collect data on the allocated memory [32]. This tool is a specialized memory management service, debugging utility and profiler for software developers. Its functions include, but are not limited to, searching for memory leaks, registering attempts to access memory beyond the boundaries of allocated areas or to use uninitialized memory, and investigating other memory-related bugs.

3.3 Estimation of Computation Time

The variant of DFT implementation heavily influences the performance of a TDE method. This follows from (3) and (8), as well as from (7) and (9), and is consistent with the acquired experimental data. Any of these TDE methods requires at least \(2 \cdot Q\) DFTs. This operation dominates the computational cost of (9). The other operations are mostly computationally simple: element-wise products of complex values, the unary element-wise operation of taking the argument of complex numbers, and element-wise multiplication by a scalar value. On the other hand, (8), in addition to similar element-wise operations, requires a single execution of the IDFT, which is computationally equivalent to an additional forward DFT.

For this reason, we further provide runtime estimates related only to the implementation variants of DFT. The execution time of the rest of the operations is not of comparable interest, because it has an auxiliary effect on the performance of the TDE methods, and also usually depends on the size of the time window N linearly.

The estimations of FFT execution time for various sizes N of the time window are presented in Table 1. Here and further, the following designations are used: Tmin – minimum computation time; Tmax – maximum computation time; Tave – mean computation time; ∆T – standard deviation (half width) of computation time. Since many random factors can negatively affect the calculation time, we chose the minimum time as the most reliable estimate for the purpose of performance comparison. Key benchmarks for FFT are shown in Fig. 4.

When estimating the execution time of the Goertzel transform, we varied both the number of samples in the time windows and the number of calculated frequency bins. Since the theoretically predicted linear dependence of the execution time on the number of frequency bins was observed in all experiments, Table 2 shows the computation time for a single frequency bin. Key benchmarks for the Goertzel algorithm are shown in Fig. 5.

Similarly, when estimating the execution time of the SDFT, we varied the number of samples in the time windows as well as the overlap rate between adjacent windows. Since we found, as expected, a linear dependence of the computation time on the number of samples newly introduced relative to the previous time window, we chose to present the computation time for a single sample in Table 3. Key benchmarks for the SDFT are shown in Fig. 6.

3.4 Estimation of Memory Usage

Estimates for the memory usage are given only for DFT variants, for similar reasons. However, the memory requirements depend on a TDE method to a greater extent than the performance. For instance, the use of frequency weighting functions requires storing in memory several additional spectral estimates (usually power spectra) as well as a set of frequency coefficients. So time-domain methods require the storage of whole spectra and the full set of frequency coefficients, while frequency-domain methods can rely on a limited set of frequency samples that require less memory to store.

Empirical estimates of the memory usage are presented in Fig. 7. The results of the study indicate a slight superiority of the Goertzel transform in this respect. The actual advantage of the latter may be higher, given that the volume of required memory depends on the number of computed frequency bins (see Fig. 8). However, if we choose not to preserve the input data with the FFT, its memory usage can be made even lower than that of the Goertzel transform in the full-spectrum case.

Table 1. Time to compute full spectrum with FFT.

Fig. 4. Computation time vs number of samples within time window for FFT: for a full input window of N samples (top); for a single input sample (bottom).

Table 2. Time to compute one frequency bin with Goertzel transform.

Fig. 5. Computation time vs number of samples within time window for Goertzel transform (various rates of computed frequency bins are indicated by the color of a curve): for a full input window of N samples (top); for a single input sample (bottom).

Table 3. Time to compute SDFT with almost fully overlapping time windows and a precomputed spectrum for the previous window (all samples but one are in both time windows).

Fig. 6. Computation time vs number of samples within time window for SDFT (various rates of overlapping samples are indicated by the color of a curve): for a full input window of N samples (top); for a single input sample (bottom).

Fig. 7. Memory usage vs number of samples within time window for all considered DFT variants.

Fig. 8. Memory usage rate (compared to the maximum value used for computation of the full spectrum) vs rate of computed frequency bins for Goertzel transform (various window sizes are indicated by the color of a line).

Figure 8 clearly shows that the memory required for the Goertzel transform depends linearly on the number of bins that have to be calculated. The constant term in the linear equation becomes less significant as the size of the time window grows.

4 Discussion

Our empirical results correspond well to the theoretical estimates of complexity with regard to memory usage and the number of computational operations required. Calculating the DFT for a real input time series using the FFT requires (N/2)·log2(N) complex multiplications and N·log2(N) complex additions. Each complex multiplication is composed of 4 real multiplications and 2 real additions. A complex addition is composed of 2 real additions. Thus, computing a spectrum by applying the FFT to a time window of N samples requires 2N·log2(N) real multiplications and 3N·log2(N) real additions. The asymptotic computational complexity of the transform is O(N·log2(N)), which is consistent with Fig. 4.

Each recursive FFT call requires splitting and reordering the interim results obtained at the current step of the recursion. In the proposed implementation, a separate buffer was allocated for spectral data. It is possible to save about one third of the memory by overwriting the initial sequence during the computations. It is still necessary, however, to store the pre-calculated rotation multipliers prior to the transform; otherwise performance will be compromised. Asymptotically, the memory usage of the FFT is O(N), which is consistent with Fig. 7.

The Goertzel algorithm for calculating K frequency bins for an input time series of N samples requires K·4N real multiplications and K·5N real additions. Thus, its asymptotic computational complexity is O(K·N). A full-scale real-valued DFT by the Goertzel algorithm (K = N/2 + 1) is inefficient and requires 2(N² + 2N) real multiplications and 5(N²/2 + N) real additions. These estimates correspond well to the curves presented in Fig. 5. The implementation of the Goertzel algorithm requires storing K precomputed complex rotation multipliers. At the same time, it is also necessary to store the input series of N real samples as well as the K computed output spectral estimates. The asymptotic memory requirement of the Goertzel algorithm is O(N + K), which is consistent with Fig. 8.
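Taking the operation counts stated above at face value, a rough break-even point between the Goertzel algorithm and the FFT can be worked out. This is only an operation-count sketch: it ignores memory access, precomputation and library-level optimizations, so the empirically observed boundary may differ.

$$ K \cdot 4N < 2N \cdot \log_{2} N \;\Rightarrow\; K < \tfrac{1}{2}\log_{2} N, \qquad K \cdot 5N < 3N \cdot \log_{2} N \;\Rightarrow\; K < \tfrac{3}{5}\log_{2} N. $$

For example, for N = 16384 (log2(N) = 14), the Goertzel algorithm needs fewer real multiplications than the FFT only for K < 7 frequency bins, and fewer real additions only for K ≤ 8.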

The recursive SDFT algorithm relies on the already available spectral estimates when recomputing the spectrum as new input data arrive. Processing each new time sample requires N + 2 real multiplications and N + 2 real additions. The asymptotic complexity of the transform is O(M·N), where M is the number of newly arrived unprocessed time samples. Processing a full time window of N samples requires N² + 2N additions and N² + 2N multiplications. These estimates are consistent with the curves shown in Fig. 6. Like the other variants, the sliding transform uses an array of rotation multipliers as well as buffers to store the input and output sequences. The difference is that the SDFT input series need not comprise a complete time window of N samples. However, an additional internal buffer is required to store the N time samples that were used to obtain the current spectral estimates. Even though the SDFT can be called with any number of input samples, at least N samples in total must be supplied before the first spectral estimate is produced. The requirement for an additional buffer means that the SDFT slightly underperforms in terms of memory requirements, as can be seen in Fig. 7. The asymptotic memory requirement of the SDFT is O(M + N).

The comparison of DFT implementation variants has shown that the FFT is suitable in a wide range of scenarios, with a few rare exceptions. Figure 9 shows the range of parameters of a computational problem in which the Goertzel algorithm outperforms the FFT. Since a frequency-domain TDE method requires at least three frequency bins to draw a regression line, computational advantages can be achieved only with large time windows, which implies a high frequency resolution. So, to be practical in conjunction with a frequency-domain TDE method, the Goertzel algorithm requires accurate a priori knowledge of the frequency localization of the signal as well as the absence of scattering during its propagation.

Fig. 9. Area in the domain of computational parameters in which the Goertzel algorithm outperforms the FFT.

Figure 10 shows the range of parameters of a computational problem in which the SDFT has an advantage over the FFT in execution time. One can infer from the figure that the use of the recursive algorithm is advisable only if an exceptionally high rate of spectrum recalculation is required. The sampling frequency is usually 44100 Hz if we consider the problem of positioning a mobile object via an acoustic channel. In this case, the use of the SDFT is practical only when the position of the object (along with the spectral estimates) needs to be updated at a rate exceeding 5000 Hz. Such a scenario does not seem very realistic.

Fig. 10. Area in the domain of computational parameters in which the SDFT outperforms the FFT.

A comparison of the DFT implementation variants in terms of memory requirements showed that the Goertzel algorithm has a slight advantage. However, the amount of memory used is generally comparable for all variants, and this advantage appears to be of little practical importance. If memory is critically limited, it is possible to save about one third of the memory used by the FFT simply by not preserving the input.

In the following discussion, we assume that the functions of the sensor node are reduced to receiving continuously incoming signals, buffering them, processing them and outputting the results of the processing. Let us also assume that the incoming data rate remains constant throughout the operating session, and that the processing of the obtained results and their output are carried out asynchronously. A similar situation is described and modeled in [33].

Hence, real-time operating can be attributed to two parallel processes:

  • processing of incoming data and refinement of spectral estimates (usually by coherent averaging of instantaneous spectra);

  • using those spectra to measure time lags with TDE methods, and using the time lags to estimate the object position by solving the multilateration problem.

The second process is not tightly synchronized with the first one and can be performed on demand or with whatever resources remain. In contrast, incoming data should be acquired and processed as soon as they arrive in order to avoid buffer overflow and data loss.

Let us denote the intensity of the incoming data flow as B and define it as

$$ B = f_{d} \cdot n, $$

where fd – sampling rate; n – number of channels. So, the total number of time windows Q of size N that need to be processed during time period T0 can be defined as

$$ Q = T_{0} \cdot \frac{B}{{N \cdot \left( {1 - s} \right)}} = T_{0} \cdot \frac{{f_{d} \cdot n}}{{N \cdot \left( {1 - s} \right)}}, $$

where s – overlap rate between adjacent time windows (0 ≤ s < 1). Assuming that the processing time of a window is predominantly determined by the DFT computation time, we estimate the total time TQ required to process all Q windows:

$$ T_{Q} = Q \cdot \frac{T\left( N \right)}{{k\left( N \right)}} = T_{0} \cdot \frac{{f_{d} \cdot n \cdot T\left( N \right)}}{{N \cdot \left( {1 - s} \right) \cdot k\left( N \right)}}, $$

where T(N) – average computation time of DFT; k – the share of DFT in the total computational load. The ratio TQ to T0 is the fraction \( {\uprho }\) of time that the machine spends on processing the input data stream:

$$ {\uprho } = \frac{{T_{Q} }}{{T_{0} }} = \frac{{f_{d} \cdot n \cdot T\left( N \right)}}{{N \cdot \left( {1 - s} \right) \cdot k\left( N \right)}}. $$

Let us assume that about two-thirds of the computing resources need to be reserved in order to ensure timely and regular calculation and display of the object's position, so that ρ must not exceed one third. The number of channels that a computing device can serve can then be roughly estimated as

$$ n \le \frac{{N \cdot \left( {1 - s} \right) \cdot k\left( N \right)}}{{3 \cdot f_{d} \cdot T\left( N \right)}}. $$

It is safe to assume that k(N) ≥ 0.66 for any N ≥ 256. We have previously determined the empirical values of T(N) for the FFT and presented them in Table 1. A rough estimate using the formula above shows that the Raspberry Pi 4B is capable of acquiring and processing sound data from at least 16 channels at a sampling rate of 44100 Hz or from 4 channels at a sampling rate of 192000 Hz, with an overlap rate of 75% in both cases. This potentially makes it possible to re-evaluate the position of an object dozens of times per second, even when using reasonably large windows (for example, N = 16384). These estimates were confirmed empirically during test runs of the Raspberry Pi 4B with the HiFiBerry module. More detailed and accurate benchmarks are planned for the future.
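The channel estimate above is easy to evaluate numerically, as the following sketch shows. The function names `max_channels` and `update_rate` are hypothetical, and the DFT time passed in the usage example (1 ms) is a placeholder chosen for illustration only; it is not a value taken from Table 1.

```python
import math


def max_channels(n_window, overlap, k_share, fs, t_dft, budget=1.0 / 3.0):
    """Upper bound on the number of channels n.

    Implements n <= budget * N * (1 - s) * k(N) / (fd * T(N)), where
    budget is the fraction of machine time available for stream
    processing (one third, since two thirds are reserved).
    """
    bound = budget * n_window * (1 - overlap) * k_share / (fs * t_dft)
    return math.floor(bound)


def update_rate(n_window, overlap, fs):
    """Number of new time windows per second for a single channel."""
    return fs / (n_window * (1 - overlap))


# Usage with a placeholder DFT time of 1 ms (an assumption, not Table 1 data).
channels = max_channels(16384, 0.75, 0.66, 44100, 1e-3)
rate = update_rate(16384, 0.75, 44100)
```

With N = 16384, 75% overlap and a sampling rate of 44100 Hz, `update_rate` gives about 10.8 new windows per second per channel, regardless of the assumed DFT time.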

5 Conclusion

In this work, a comparative study of various implementations of TDE methods was carried out with respect to their applicability on a Raspberry Pi 4B single-board computer. In the course of the theoretical study and the empirical research, it was established that the DFT is the most computationally demanding operation and that it determines the performance of the methods to a large extent.

A comparison of various widely known DFT algorithms, in particular the FFT, the Goertzel algorithm and the recursive SDFT algorithm, showed significant advantages of the FFT in performance and their approximate equivalence in memory usage. Although there are some areas of the computational parameter space in which the Goertzel algorithm and the SDFT outperform the FFT, these are of little practical value.

The Raspberry Pi 4B single-board computer has sufficient computational capabilities to be used for positioning objects via an acoustic channel. The computer is able to process data streamed via 4 (or more) acoustical channels and to compute spectral estimates in a soft real time at a rate of at least 10 times per second. This way, FFT algorithms can be effectively utilized with time-domain and frequency-domain TDE methods.

In the future, other special DFT algorithms should be examined as well. In particular, the sliding Goertzel algorithm [34,35,36], which combines the features of both the SDFT and the classical Goertzel algorithm, may be able to compete with the FFT in the scenario of tracking a mobile object in real time.