1 INTRODUCTION

In this paper, we describe the basic principles of operation of well-known sound coding algorithms for cochlear stimulation, which allow patients with sensorineural hearing loss to perceive sound. Existing methods of sound processing are reviewed, and the determination of the electrode-stimulation frequency is explained.

The aim of our study was to simulate the sound coding algorithm for a new device called a cochlear implantation system and to assess the quality of its operation using the constructed model. We developed this device in order to replace such imported analogues as Cochlear, Med-El, and Advanced Bionics, which are all popular in our country.

We have developed and presented a new stimulation strategy for the cochlear implantation system that optimizes the computational efforts for sound processing; its main advantages have been demonstrated. The main stages of the sound coding algorithm are described. The quality of the algorithm was assessed by ear and the algorithm was simulated in the MATLAB system. This paper provides a visual comparison of the original sound and the sound perceived by the patient, as well as an analysis of the results. Amplitude spectra of the received signals are constructed, a diagram of the developed algorithm is shown, and graphs of the initial and final signals, as well as of the signal at intermediate stages of its processing, are constructed.

The cochlear implantation system technology, which helps hearing-impaired persons, requires the use of dedicated sound coding algorithms called stimulation strategies. Various methods for stimulating the cochlear receptors are known (CIS, ACE, HiRes, SPEAK, etc.). Only general information is known about the principles of their operation, since developments in this area are protected by the copyright of the device manufacturers.

As was the case with the CIS strategy, filtering was used in our strategy at the initial stage of processing. The Goertzel recursive algorithm, nonlinear compression, and a lowpass filter were then applied. The advantage of replacing the fast Fourier transform with the Goertzel algorithm to save computational resources was shown while analyzing the number of steps required for use of a particular method.

It is clear that comfortable use of the algorithm in the device requires minimal computational resources with the least loss in the quality of the transmitted sound. In this paper, we analyze the various stages of the algorithm, as well as their advantages and disadvantages, and discuss opportunities for improvement.

2 HUMAN PERCEPTION OF SOUND: THE DESIGN OF A COCHLEAR IMPLANTATION SYSTEM

The eardrum transmits sound vibrations to the middle ear, where they are passed on to the cochlea in the inner ear via the auditory ossicles (the malleus, incus, and stapes). Hair cells, which are the receptors of the auditory system, are located on the basilar membrane in the cochlea. Vibrations are transmitted to these receptor cells through the fluid that fills the cochlea. The cells convert the mechanical stimulus into an electrical pulse, which is perceived by the nerves.

Damage to hair cells may cause hearing loss. In this case, a cochlear implantation system can be used. Sound is picked up by a microphone and encoded by a speech processor. The encoded sound is transmitted through a coil placed under the skin to an array of electrodes located in the cochlea. The signal is then transmitted to the brain by stimulating the auditory nerve.

The cochlear implantation system consists of a cochlear implant and a wearable speech processor. The cochlear implant is a device used in the case of damage to the inner ear if functioning of the spiral ganglion neurons is preserved. The speech processor is equipped with a microphone, a sound processing device, and an inductor for transmitting energy and data to the implant. The implant itself consists of a receiving coil, a stimulation unit, and an electrode matrix.

The speech processor transmits digitally encoded sound to the implant through the coil. The stimulator−receiver of the cochlear implant digitally converts sound into electrical signals and then transmits them to an electrode array using a certain stimulation strategy. The number of electrodes varies from manufacturer to manufacturer and can range from 12 to 22.

After the frequencies have been selected for the stimulation, they need to be correctly assigned to the electrodes so that stimulation occurs exactly at those points where the cochlear receptors for these frequencies are located. For most people, these locations are distributed in the same way, as can be seen in Fig. 1.

Fig. 1. The distribution of frequencies along the depth inside the cochlea.

The distribution of frequencies perceived by the ear on the cochlea is known for humans from experimental data and was approximated by the empirical formula [1]

$$FC_{GP}(x) = 0.1654\;\text{kHz}\left( \exp \left( \frac{0.13815\,(35\;\text{mm} - x)}{1\;\text{mm}} \right) - 1 \right).$$
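The empirical formula can be evaluated numerically to see which frequency falls at a given depth. The following is a minimal sketch; the function name and the reading of x as a distance in millimeters measured from the high-frequency end are assumptions inferred from the formula (the frequency falls to zero at x = 35 mm), not statements taken from [1].

```python
import math


def place_to_frequency_khz(x_mm):
    """Evaluate the empirical frequency-place formula quoted in the text.

    x_mm is the position along the cochlea in millimeters. Treating x as a
    distance from the high-frequency end is an assumption inferred from the
    formula itself: the result decreases with x and is zero at x = 35 mm.
    """
    return 0.1654 * (math.exp(0.13815 * (35.0 - x_mm)) - 1.0)


# Print the frequency at a few illustrative depths.
for x in (0.0, 10.0, 20.0, 30.0, 35.0):
    print(f"x = {x:4.1f} mm -> {place_to_frequency_khz(x):7.3f} kHz")
```

With these constants, the formula spans roughly the audible range over the 35-mm length, which is why the electrode positions can be matched to the channel frequencies.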

3 VARIETIES OF STIMULATION STRATEGIES

The Advanced Combination Encoder (ACE) and continuous interleaved sampling (CIS) strategies are most similar to the stimulation strategy that we developed. Therefore, we consider them in more detail.

3.1 Description of the CIS Stimulation Strategy

The main stages of the CIS stimulation strategy are as follows (see Fig. 2). First, the signal is passed through bandpass filters that divide it into several frequency bands. Each filter corresponds to its own electrode. This is followed by rectification and lowpass filtering (the cutoff is usually 200 or 400 Hz). As a result, we obtain a low-frequency signal that follows the envelope of the original signal. Next, this signal is compressed using a nonlinear function, e.g., a logarithm, so that the electric signal subsequently transmitted to the electrodes falls within the range that is comfortable for the patient. This range is set by a doctor. The amplitudes of the resulting electric pulses correspond to the nonlinearly compressed amplitudes of the envelopes in each stimulation channel. A total of 16 electrodes are used. The important point is that they are all stimulated within a single cycle, but not at the same time [2].

Fig. 2. The block diagram of the CIS stimulation strategy: (BPF) bandpass filter and (LPF) lowpass filter.
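The rectification and lowpass stages of the CIS chain can be sketched for a single channel as follows. This is only an illustration under stated assumptions: the function name is hypothetical, a first-order (one-pole) lowpass is used for brevity rather than any particular filter of a commercial device, and the 1-kHz test tone stands in for the output of one bandpass filter.

```python
import math


def cis_envelope(band_signal, fs, cutoff_hz=400.0):
    """Full-wave rectify one band-limited channel and smooth the result."""
    # Coefficient of a one-pole lowpass filter (an illustrative choice).
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y = 0.0
    envelope = []
    for sample in band_signal:
        rectified = abs(sample)       # full-wave rectification
        y += a * (rectified - y)      # y[n] = y[n-1] + a * (|x[n]| - y[n-1])
        envelope.append(y)
    return envelope


fs = 16000.0  # sampling frequency used elsewhere in the text
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(1600)]
env = cis_envelope(tone, fs)
```

For a steady tone, the output settles near the mean of the rectified sinusoid with a small residual ripple; the nonlinear compression stage would then be applied to this envelope.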

3.2 Description of the ACE Stimulation Strategy

In the ACE strategy, as in the CIS strategy, bandpass filters are applied to the input signal (see Fig. 3). Envelopes are obtained in each channel after filtering, and the energy is calculated for each band. Then, the 6–10 frequency bands with the maximum energy are selected in each cycle. The energy of the remaining bands is not used for the stimulation. The selected envelopes are then converted into electric pulses in a range that is comfortable for the patient, and the selected electrodes are stimulated sequentially. Typically, 22 electrodes are used, and the cycle (selection of the channels with the maximum energy) is repeated [2].

Fig. 3. The block diagram of the ACE stimulation strategy.
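The channel-selection step of the ACE strategy amounts to keeping the bands with the largest energy in each cycle. A minimal sketch is given below; the function name and the example energies are hypothetical and serve only to illustrate the selection rule described above.

```python
def select_maxima(energies, n_selected):
    """Return the indices of the n_selected bands with the largest energy.

    The indices are returned in ascending order so that the selected
    electrodes can be stimulated sequentially along the array.
    """
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:n_selected])


# Hypothetical band energies for one analysis cycle (8 bands, keep 3).
band_energies = [0.10, 0.90, 0.30, 0.70, 0.05, 0.20, 0.60, 0.01]
print(select_maxima(band_energies, 3))
```

The bands that are not selected are simply skipped in that cycle, and the selection is repeated in the next cycle.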

4 THE DEVELOPED STIMULATION STRATEGY

The diagram shown in Fig. 4 was simulated in the MATLAB software environment.

Fig. 4. The block diagram of the developed stimulation strategy.

We consider the blocks of our algorithm in more detail. The initial part, i.e., filtering with bandpass filters, is similar to the CIS and ACE strategies. The bandpass filters split the signal into 16 channels. The use of these filters makes it possible to dispense with a window function that removes distortions caused by the edge effect in the Fourier transform calculations. Calculation of the discrete Fourier transform leads to discontinuities at the edges of the analyzed block, which manifest themselves by a distortion of the spectrum. This effect is less pronounced if a signal with a narrow spectrum is present in the analyzed block after applying a bandpass filter. The application of filters without the window function makes it possible to save significant computational resources and use recursive algorithms in the Fourier transform calculation. It is possible to use both serial and parallel stimulation of the electrodes at the output.

4.1 Input Bandpass Filters

The following frequencies were taken to split the signal into 16 channels [3]:

| Channel no. | Frequency, Hz |
|---|---|
| 1 | 333 |
| 2 | 450 |
| 3 | 540 |
| 4 | 642 |
| 5 | 762 |
| 6 | 906 |
| 7 | 1076 |
| 8 | 1278 |
| 9 | 1518 |
| 10 | 1803 |
| 11 | 2142 |
| 12 | 2544 |
| 13 | 3022 |
| 14 | 3590 |
| 15 | 4264 |
| 16 | 6665 |

A nonlinear phase−frequency response of a filter may introduce unwanted phase distortions when the signal is reconstructed; however, phase plays little role in the correct perception of human speech. Filters with a finite impulse response (FIR filters), which provide a linear phase−frequency response, were initially selected to obtain the best sound quality. Although the FIR filters are more computationally intensive, the sound they produce is only slightly better owing to the absence of phase delays between the signals, and this difference is not significant. Considering all of the above, as well as the fact that the human ear is not very sensitive to phase delays (they are more important at low frequencies), we decided to use filters with an infinite impulse response (IIR filters).

The reconstructed sound obtained using IIR filters was compared to the original sound (see graphs in Figs. 5a and 5b) in order to evaluate the efficiency of the developed algorithm. These graphs show that the original sound and the sound obtained after the use of our algorithm and the reconstruction differ only slightly.

Fig. 5. (a) The original signal received by the microphone and (b) the sound signal after restoration (using the IIR filters).

4.2 The Goertzel Algorithm and the Modified Goertzel Algorithm

The Goertzel algorithm is a method for calculating the Fourier transform for a selected frequency. The standard Goertzel algorithm calculates the Fourier coefficients for a set of frequencies:

$$\{ 2{{\pi }}k{\text{/}}N\} ,$$

where \(k = \overline {0,N - 1} \) and N is the number of signal samples [4].

The computational structure of the Goertzel algorithm is similar to that of an IIR filter; therefore, it is often called the Goertzel filter. We compare it to the fast Fourier transform (FFT) algorithm, which calculates the transform on a uniform grid of points (we can only choose the most convenient number of points). Returning to the frequencies listed above, we can see that the distance between the required frequencies is approximately 100 Hz. To obtain even a rough estimate, the FFT must therefore be applied to approximately 160 points (the sampling frequency of 16 kHz divided by 100 Hz), and even this fails to guarantee that the grid will contain exactly the selected frequencies. The calculation for 160 points obviously takes more resources than the calculation for only 16 specific frequencies with the Goertzel algorithm. Therefore, the Goertzel algorithm is preferable, especially in view of the existence of its modification, i.e., the recursive algorithm. The diagram of the classical Goertzel filter is shown in Fig. 6. We first consider its design and then its modifications [4]. Such a filter evaluates the kth bin (the spectral sample with the fixed number k) of the N-point discrete Fourier transform:

$$\begin{gathered} {{S}_{N}}(k) = \sum\limits_{n = 0}^{N - 1} {x(n)W_{N}^{{kn}}} , \\ W_{N}^{{kn}} = \exp \left( { - j\frac{{2{{\pi }}}}{N}kn} \right), \\ \end{gathered} $$

where x(n) is the signal.

Fig. 6. The diagram of the Goertzel filter.

We introduce the discrete exponential functions:

$$W_{N}^{(p + \theta )l} = \exp \left[ - j\frac{2\pi }{N}(p + \theta )l \right].$$

The spectrum value \(S_{N}^{{(k + {{\theta }})}}(r)\) can be calculated from a sliding window with length N samples at the step r:

$$S_{N}^{(k + \theta )}(r) = \sum\limits_{n = 0}^{N - 1} x(n + r)W_{N}^{(k + \theta )n} .$$

We use X(z) to denote the z-transform of x(n) and Y(z) to denote the z-transform of \(S_{N}^{(k + \theta )}(n)\). The transfer function of the IIR filter shown in Fig. 6 is:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{W_{N}^{ - (k + \theta )}}{1 - z^{ - 1}W_{N}^{ - (k + \theta )}},$$

where the integer variable k is replaced with the parametrically introduced variable –(k + θ), in which \(k = \overline {0,N - 1} \), 0 < θ < 1, and the set of analyzed frequencies is varied using the parameter θ: \(\left\{ \frac{2\pi (k + \theta )}{N} \right\}\).

The difference equations of the Goertzel filter are given below (the feedforward and feedback sections of the filter are to the right and to the left of the dashed line in Fig. 6, respectively):

for the feedback (recursive) section,

$$\upsilon (n) = 2\cos (2\pi k/N)\,\upsilon (n - 1) - \upsilon (n - 2) + x(n),$$

for the feedforward section,

$$y(n) = W_{N}^{ - k}\upsilon (n) - \upsilon (n - 1).$$

At the output of the Goertzel filter, we obtain the coefficients by which we can calculate the signal amplitude in each channel.
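The two difference equations can be checked numerically against the direct DFT sum. The sketch below is an illustration restricted to an integer bin number k (the case θ = 0); the function names are hypothetical, and the direct sum is included only as a reference for comparison.

```python
import cmath
import math


def goertzel_bin(x, k):
    """Evaluate the kth DFT bin of the block x with the Goertzel recursion."""
    n_points = len(x)
    omega = 2.0 * math.pi * k / n_points
    coeff = 2.0 * math.cos(omega)
    v1 = 0.0  # v(n - 1)
    v2 = 0.0  # v(n - 2)
    for sample in x:
        v0 = coeff * v1 - v2 + sample   # recursive section: one real multiply per sample
        v2, v1 = v1, v0
    # Feedforward section, applied once at the end of the block:
    # y(N - 1) = W_N^{-k} v(N - 1) - v(N - 2), with W_N^{-k} = exp(+j 2 pi k / N).
    return cmath.exp(1j * omega) * v1 - v2


def dft_bin(x, k):
    """Reference value: the direct DFT sum for the same bin."""
    n_points = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
               for n in range(n_points))


block = [math.cos(2.0 * math.pi * 3 * n / 32) for n in range(32)]
print(abs(goertzel_bin(block, 3)), abs(dft_bin(block, 3)))
```

Only the recursive section runs for every sample; the complex multiplication is needed once per block, which is the source of the savings when only a few frequencies are required.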

The Goertzel algorithm is convenient for reducing the computational effort spent on processing the sound received by the device, since it replaces several procedures of other algorithms at once, namely, rectification and lowpass filtering. The algorithm is a second-order IIR filter. Data are processed in blocks with a length of N points (a window), and the resulting values coincide with the corresponding coefficients of the discrete Fourier transform. The window of length N is shifted with an overlap, so that the coefficients are calculated at each sampling point.

In our work, the simulation was carried out using the Hanning window [5]. The modulus of the resulting complex values at each point of the signal gives its amplitude, i.e., the envelope. In this way, 16 envelopes are obtained, one for each of the selected frequency components of the signal.

Thus, the Fourier coefficients for each individual frequency (selected from the series corresponding to the stimulation channels) allow us to calculate the amplitudes of the envelopes for each frequency channel. This is where the similarities to the CIS strategy end. The CIS strategy also produces a rectified signal after the bandpass filter, which plays the same role as the envelope in our case. The CIS strategy first calculates the modulus of the signal and then applies a lowpass filter. In our strategy, these stages are replaced with the single Goertzel algorithm.

The efficiency of the algorithm can be estimated by comparing the frequency spectra of the signals (see graphs in Figs. 7a and 7b). It can be seen that noise appears at high frequencies (4.5–8.0 kHz) when the Goertzel algorithm is applied without input filters. We have shown that noise at these frequencies is suppressed when the Goertzel algorithm is applied with the input filters, as well as when the Hanning window is used without the filters.

Fig. 7. The amplitude spectrum of the resulting signal after applying the Goertzel algorithm (a) with the bandpass IIR filter and (b) without it.

A method exists that allows the recursive use of the Goertzel filter; its diagram is shown in Fig. 8. It turns out that the values need not be recalculated anew at all points each time. This allows us to obtain the values of the spectral samples in real time.

Fig. 8. The diagram of the modified Goertzel filter (the sliding discrete Fourier transform).

The payoff obtained by applying the recursive formula to the modified Goertzel filter can be seen by looking at the number of steps required for the operation of the filter (see the comparison in Table 1).

Table 1. The number of operations required by the filter [4]

The recurrent equation for the sliding discrete Fourier transform is:

$$S_{N}^{k}(n) = W_{N}^{{ - k}}[S_{N}^{k}(n - 1) + x(n) - x(n - N)],$$

where \(S_{N}^{k}(n)\) is the value of the kth spectral sample (bin) of the N-point discrete Fourier transform at the time instant n.
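The recurrence updates one spectral sample with a single complex multiplication and two additions per input sample. The following sketch illustrates it for an integer k; the function name and the test signal are hypothetical, and samples before the start of the signal are assumed to be zero.

```python
import cmath
import math


def sliding_dft(x, k, n_points):
    """Track the kth bin of an N-point DFT over a sliding window.

    out[n] is the bin value for the window that ends at sample n
    (samples with negative indices are taken to be zero).
    """
    twiddle = cmath.exp(2j * math.pi * k / n_points)   # W_N^{-k}
    s = 0j
    out = []
    for n, sample in enumerate(x):
        oldest = x[n - n_points] if n >= n_points else 0.0
        s = twiddle * (s + sample - oldest)   # S(n) = W^{-k} [S(n-1) + x(n) - x(n-N)]
        out.append(s)
    return out


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
bins = sliding_dft(signal, k=2, n_points=16)
```

Once the first full window has been accumulated, each new value equals the DFT bin of the most recent N samples, so the envelope can be followed sample by sample without recomputing the whole sum.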

It should also be noted that the Goertzel algorithm by itself does not replace the window function in the computation of the Fourier transform. Therefore, bandpass filters should be applied at the input so that the recursive algorithm can be used with a low consumption of computational resources. The use of filters requires less computational effort than the use of a window function, since a window is comparable in computational effort to FIR filters, whereas IIR filters are less computationally intensive. Thus, the least resource-intensive result is obtained when IIR filters are applied at the input and the modified recursive Goertzel algorithm is used.

Thus, after conducting a series of tests, we came to the conclusion that although the window function provides the same result as the filters, the filters without the window function are more convenient, since they allow the use of a modified (recursive) algorithm that requires significantly less computational resources. Therefore, the best option is to use IIR filters at the input and replace the window function with the recursive modified Goertzel algorithm.

4.3 The Reverse Transformation

It is clear that the quality of the signal perceived by a person is worse than that of the signal that initially entered the device. We carry out the inverse transformation in order to understand what kind of signal is perceived by a person, i.e., we restore the original signal with the coding losses. For this purpose, the resulting amplitude in each of the 16 channels is multiplied by a cosine at the frequency corresponding to that channel. The initial phases are not taken into account, since this information cannot be restored after sound coding and is not important for speech recognition. We obtain the reconstructed sound by adding the signals reconstructed using the cosines in all channels:

$$\text{signal}(j,i) = \text{amplitude}(j,i)\cos \left( \frac{2\pi j\,\text{freq}(i)}{F_{\text{s}}} \right),$$

where i is the channel number, j is the row (sample) number in the array containing the signal or its amplitude, Fs is the sampling frequency, amplitude(j, i) represents the coefficients that are equal to the signal amplitude, and freq(i) is the frequency corresponding to the channel. The recovered signal in each channel is thus stored in the signal(j, i) array.
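The reconstruction formula can be sketched as a sum of amplitude-modulated cosines. The function name and the array layout (a list of rows, one row per sample and one column per channel) are assumptions chosen to mirror the indices j and i of the formula; this is not the MATLAB code of the simulation.

```python
import math


def reconstruct(amplitude, freqs_hz, fs):
    """Sum amplitude(j, i) * cos(2 pi freq(i) j / Fs) over all channels i.

    amplitude[j][i] is the envelope of channel i at sample j; the initial
    phases are taken as zero, as in the formula above.
    """
    output = []
    for j, row in enumerate(amplitude):
        total = 0.0
        for a, f in zip(row, freqs_hz):
            total += a * math.cos(2.0 * math.pi * f * j / fs)
        output.append(total)
    return output


# One channel with a constant unit envelope at a quarter of the sampling frequency.
demo = reconstruct([[1.0], [1.0], [1.0], [1.0]], [4000.0], 16000.0)
```

Listening to the output of such a summation is what allows the coding losses to be judged by ear.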

Having received the reconstructed signal, we thus check the operation of the algorithm.

4.4 Nonlinear Compression

Cochlear nerve endings perceive current stimulation nonlinearly. With electrical stimulation, loudness is perceived according to the law:

$$L = k_{a}p^{\beta _{a}} = k_{e}i^{\beta _{e}},$$

where i is the current, p is the sound pressure, and the exponents are \(\beta _{a} = 0.6\) and \(\beta _{e} = 2.7\) [6]. The current and, therefore, the microphone voltage are determined by the law:

$$i = k_{m}p^{\beta _{m}},$$

where \(\beta _{m} = \beta _{a}/\beta _{e} \approx 0.22\).
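The value of the exponent follows directly from the loudness law. As a short worked step, equating the acoustic and electric expressions for the same loudness and solving for the current gives

$$k_{a}p^{\beta _{a}} = k_{e}i^{\beta _{e}}\;\; \Rightarrow \;\;i = \left( \frac{k_{a}}{k_{e}} \right)^{1/\beta _{e}}p^{\beta _{a}/\beta _{e}},\qquad \beta _{m} = \frac{\beta _{a}}{\beta _{e}} = \frac{0.6}{2.7} \approx 0.22,\qquad k_{m} = \left( \frac{k_{a}}{k_{e}} \right)^{1/\beta _{e}}.$$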

It can be seen that the perceived loudness is associated with stimulating impulses according to the power law. Therefore, it is necessary to perform nonlinear compression after obtaining the envelopes in order to stimulate the electrodes of the device.

Compression yields the amplitudes of the electric pulses y, which are confined between two levels: the threshold level B and the comfort level M. The purpose of the nonlinear compression is to map the entire signal into a comfortable range that can be adjusted by the doctor. The formula for the nonlinear signal compression is [6]:

$$y = \begin{cases} \dfrac{\log \left( 1 + \alpha \dfrac{\upsilon - B}{M - B} \right)}{\log (1 + \alpha )}, & B \leqslant \upsilon \leqslant M, \\ 0, & \upsilon \leqslant B, \\ 1, & \upsilon \geqslant M, \end{cases}$$

where υ is the envelope, α is the compression ratio, M is the saturation level, and B is the threshold value.
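The compression formula translates directly into a small function. The sketch below is illustrative; the function and parameter names are hypothetical, and the numerical values of B, M, and α are placeholders chosen for the example rather than values fitted for a patient.

```python
import math


def compress(envelope_value, threshold, comfort, alpha):
    """Map an envelope value into the range [0, 1] with logarithmic compression."""
    if envelope_value <= threshold:
        return 0.0                      # below the threshold level B
    if envelope_value >= comfort:
        return 1.0                      # saturates at the comfort level M
    normalized = (envelope_value - threshold) / (comfort - threshold)
    return math.log(1.0 + alpha * normalized) / math.log(1.0 + alpha)


# Placeholder settings: B = 0.1, M = 1.0, alpha = 100.
for v in (0.05, 0.2, 0.55, 1.0, 2.0):
    print(v, round(compress(v, 0.1, 1.0, 100.0), 3))
```

With a large α, weak envelope values are expanded toward the upper part of the output range, which is the intended effect of the logarithmic law.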

4.5 The Sliding-Average Algorithm

To smooth out the residual oscillations in the envelope, the sliding-average method can be applied to the envelope in each of the channels instead of the standard lowpass filter. In this case, to obtain the best result, we should take a third-order filter and select its length so that it is a multiple of the period of the stimulation-channel frequency. First, we need to determine the width of the window over which the averaging will be performed.

At each point, the function obtained after the transformation is numerically equal to the average value of the original function calculated over the specified smoothing interval (the number of values of the original function used in the calculation). As a result, the original data are smoothed, and the above algorithm implements envelope filtering.

The values for the sliding-average method are calculated according to the formula:

$$\bar {X}(k) = \frac{1}{n}\sum\limits_{t = k}^{k + n - 1} X(t) ,$$

where n is the window size (the smoothing period) and k is the number of the term of the series whose value is replaced by the average.
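A single pass of the sliding average can be sketched as follows, under the assumption that the window covers exactly n consecutive samples starting at index k. The function name is hypothetical; repeating the pass would correspond to a higher-order smoothing.

```python
def sliding_average(x, n):
    """Replace each term x[k] with the mean of the n samples x[k], ..., x[k + n - 1].

    Only positions where the full window fits inside the data are returned.
    """
    return [sum(x[k:k + n]) / n for k in range(len(x) - n + 1)]


print(sliding_average([1.0, 2.0, 3.0, 4.0, 5.0], 3))
```

Choosing n as a whole number of periods of the channel frequency makes the average over each window insensitive to an oscillation at that frequency.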

Figure 9 shows the original envelope (curve 1) and the envelope smoothed using the third-order sliding-average method (curve 2). It should be noted that the modulus of the envelope was calculated beforehand.

Fig. 9. (1) The envelope and (2) the smoothed envelope obtained using the sliding-average method in the 10th channel.

A 400-Hz lowpass filter was also used for comparison [6].

As mentioned above, this stage can be omitted and replaced with a single procedure, i.e., the Goertzel algorithm.

5 DETERMINATION OF THE ELECTRODE-STIMULATION FREQUENCY

The stimulation frequency is the number of fully completed pulses in all stimulation channels on the electrode array per second. The range of stimulation frequencies is quite wide, depending on the manufacturer.

We consider the HiRes™ Ultra cochlear implant from Advanced Bionics as an example. From the description of the device, we find that the maximum pulse duration is 229 μs, the maximum stimulation frequency is 83 kHz, and the number of channels is 16. Suppose that the stimulation period (it includes both the positive and negative pulses) is 192 μs; then the frequency in an individual channel is 1/(192 μs) ≈ 5.2 kHz, and the total frequency is 5.2 × 16 ≈ 83 kHz. The assumption about the determination of the stimulation frequency is thus confirmed.
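The arithmetic of this estimate can be reproduced in a few lines. The function name is hypothetical, and the 192-μs period is the value assumed in the text rather than a figure quoted from the device description.

```python
def stimulation_rates(period_s, n_channels):
    """Return the per-channel and total stimulation frequencies in hertz."""
    per_channel_hz = 1.0 / period_s
    return per_channel_hz, per_channel_hz * n_channels


per_channel, total = stimulation_rates(192e-6, 16)
print(f"per channel: {per_channel / 1e3:.1f} kHz, total: {total / 1e3:.1f} kHz")
# prints "per channel: 5.2 kHz, total: 83.3 kHz"
```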

This device can implement different strategies, including virtual channels (HiRes 120) or paired stimulation in each channel (the HiRes paired strategy). Let us consider the latter in more detail. The stimulation frequency for the strategies using 8 and 16 channels is presented in Table 2 [7].

Table 2. Stimulation frequency vs. the number of channels and the type of stimulation

Although the stimulation frequency is very high in some devices, it is not a decisive factor in the quality of sound transmission, since the refractory period of the nerve endings does not allow effective stimulation at a frequency higher than they are able to perceive. The refractory period in humans is approximately 500 μs [8]. In addition, when the paired stimulation strategy is used, simultaneous stimulation of several zones causes cross excitation of the nerve endings of neighboring zones, since the electrode array is in a conductive fluid, which does not allow stimulation of only one zone.

A study conducted on a group of 13 adults to compare paired and sequential stimulation [7] has shown that the sequential-stimulation strategy provides the best results. It also follows that the stimulation frequency of 40 kHz is quite enough for high-quality sound reproduction. The authors in [7] noted that more pronounced differences in perception between the two strategies are felt in the presence of background noise, which correlates well with real-life conditions.

Table 3 presents information on different manufacturers of cochlear implantation systems and their stimulation frequencies [9].

Table 3. The characteristics of cochlear implant systems of various manufacturers [9]

6 STIMULATION USING VIRTUAL CHANNELS

Along with the CIS strategy, in which the stimulation is strictly sequential, there is another interesting method. This technique uses virtual channels and several sources to increase the number of different frequencies in the transmitted sound. The electrodes to be stimulated are selected depending on the loudness. Two adjacent electrodes are stimulated by two pulses. The proportion of the current delivered to each of the two adjacent electrodes is also determined by the loudness ratio. The two outer electrodes are used as ground, directing the current through the inner electrodes (two electrodes generate one virtual channel). Virtual channels result from a shift of the current density in the region between the electrodes; this happens when the current is unevenly distributed between the electrodes. The region of the cochlea between the electrodes is stimulated, and the ratio of the supplied currents determines the place of stimulation.
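The division of the current between two adjacent electrodes can be illustrated with a simple weighting. This is a sketch only: the function name is hypothetical, and the linear dependence of the split on the loudness ratio is an assumption made for the example, not a description of any manufacturer's implementation.

```python
def steer_current(total_current, loudness_a, loudness_b):
    """Split a total current between two adjacent electrodes A and B.

    The share of each electrode is taken proportional to the loudness in
    its channel (an illustrative assumption).
    """
    loudness_sum = loudness_a + loudness_b
    if loudness_sum <= 0.0:
        return 0.0, 0.0                 # nothing to stimulate
    alpha = loudness_b / loudness_sum   # 0: all current on A, 1: all current on B
    return (1.0 - alpha) * total_current, alpha * total_current


current_a, current_b = steer_current(100.0, 1.0, 3.0)
print(current_a, current_b)
```

Intermediate values of the weighting shift the region of maximum current density between the two physical electrodes, which is what produces the additional (virtual) stimulation sites.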

In our case, the reference electrode is the body of an implant. If the current enters the reference electrode after leaving the stimulating zone, this is a monopolar stimulation mode. If the current leaves a certain electrode and enters an adjacent one, this is a bipolar mode.

7 CONCLUSIONS

In various countries that do not yet have a national cochlear implantation system, the scientific community is working towards developing their own stimulation strategies and devices to make hearing restoration operations more affordable. Thus, the UGR (University of Granada, Spain) team developed the Cochlear Implant Simulation version 2.0 program. It simulates the cochlear implantation system with the CIS strategy and presents the results of sound processing using known strategies [10]. This program allows the use of the Hilbert transform or the rectification and a lowpass filter for the calculation of the signal envelope. By adjusting various parameters, one can take the interaction of adjacent channels and some physiological characteristics of a patient into account, select the number of stimulation channels, etc. We provide open source code for the MATLAB program (see Appendix) of our strategy.

The first block of the developed algorithm contains filtering with bandpass filters that divide the signal into 16 channels. This stage is also used in other stimulation strategies. The filters reduce the edge distortions and make the window function unnecessary, owing to which we save computational resources when calculating the Fourier transform. The second block that we introduced contains the recursive Goertzel algorithm (it also has the advantage of saving resources), which is followed by nonlinear compression, lowpass filters, and, finally, signal transmission to the electrodes. Both sequential and parallel stimulation can be used at the output.

In analyzing the simulation results, we drew the conclusion that, although the windowed Fourier transform provided the same result as the bandpass filters, it is more convenient to use bandpass filters without a window function. This makes it possible to use the modified (recursive) Goertzel algorithm for the Fourier transform, due to which the number of calculations can be significantly reduced. Therefore, the best option is to use IIR filters at the input and replace the windowed Fourier transform with the recursive modified Goertzel algorithm. It can also be concluded that the algorithm reproduces a high-quality signal.

Thus, a proprietary stimulation strategy that optimizes the computational efforts for sound processing has been developed for the cochlear implantation system. The inverse transformation was carried out to compare the original and reconstructed signals and evaluate the quality of the algorithm by ear. The operation of the sound coding algorithm for the cochlear implantation system was simulated in the MATLAB system. We obtained a means to hear how the sound would change after it was processed by our stimulation strategy. We came to the conclusion that high-quality speech reproduction was achieved, but the perception of the full range of sound signals, e.g., music and sounds of nature, was difficult at that stage. Therefore, further research will focus on simulating the strategy using virtual channels to increase the frequency resolution of auditory perception for patients with a cochlear implantation system with the same number of physical stimulation channels.