1 Introduction

Speech is the most basic form of human communication, and the speech signal has characteristics that distinguish it from other sounds. Over the last few decades, speech communication has received considerable attention because of fast-growing technologies. A communication channel is probabilistic in nature, so measurable parameters are essential to ensure the quality of the speech signal passing through it. The parameters for good speech signal transmission are (i) a low bit rate, (ii) an SNR of more than 30 dB and (iii) a BER of less than 10^-5 [13]. Speech signals follow certain probability density functions (PDFs), e.g. Gaussian [4], Rayleigh [5] and Log Normal [6]. The authors identified the Log Normal distribution for characterizing the speech signal and compared it with previously tested speech signal distributions, i.e. the standard power spectral density (PSD) of the speech signal, the Gaussian distribution [7] and the Rayleigh distribution [8]. By employing these PDFs, the communication channel as well as the PSD of the speech signal can be used efficiently.

Subband coding (SBC) is a kind of transform coding. It divides a signal into a number of frequency bands and encodes each one independently. It enables data reduction by discarding information about frequencies that are masked. The result differs from the original signal, but if the discarded information is chosen carefully, the difference will not be noticeable or, more importantly, objectionable [9–12].
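For reference, the three densities named above can be written down directly. The following is a minimal sketch using only the Python standard library; the function names and the parameter values in the usage loop are illustrative assumptions and are not taken from the authors' MATLAB simulation.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale sigma; zero for negative x."""
    if x < 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))

def lognormal_pdf(x, mu, sigma):
    """Log-normal density: ln(x) is Gaussian with mean mu and std sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Illustrative evaluation at a few points (parameter values are assumptions).
for x in (0.5, 1.0, 2.0):
    print(x, gaussian_pdf(x, 1.0, 0.5), rayleigh_pdf(x, 1.0), lognormal_pdf(x, 0.0, 0.5))
```

The Rayleigh and Log Normal densities are defined only for non-negative arguments and are asymmetric, which is the property that separates them from the Gaussian model.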

2 Literature Survey

The paper “A low-complexity audio data compression technique using subband coding (SBC) and a recursively indexed quantizer (RIQ)” compared the combination of SBC and RIQ with conventional coding techniques. For wideband audio signals, the system shows an SNR 2–5 dB higher than that of other coders of similar computational complexity [7].

The basic concept of the “Frequency Domain Coding of Speech” methods is to divide the speech into frequency components by a filter bank (subband coding) or by a suitable transform (transform coding), and then to encode them using adaptive PCM (Pulse Code Modulation). Three basic factors in the design of such coders are: (1) the type of filter bank or transform, (2) the choice of bit allocation and noise shaping properties and (3) the control of the step size of the encoders. Short-time analysis/synthesis and practical realizations of subband and transform coding are interpreted within this framework. Spectral estimation, models of speech production and perception, and the “side information” can be represented and utilized efficiently in the design of the coder (particularly the adaptive transform coder) to control the dynamic bit allocation and quantizer step sizes. Recent developments and examples of the “vocoder-driven” adaptive transform coder for low bit-rate applications are also discussed [8].

In digital telecommunication systems, different signals are processed at different sampling rates, which can lead to significant errors. “Subband Coding of Speech Signals Using Decimation and Interpolation” proposes a two-channel quadrature mirror filter structure, built from a low-pass filter, a high-pass filter, decimators and interpolators, to perform subband coding of speech signals in the digital domain. The performance of the proposed structure is compared with that of delta-modulation encoding systems. The results show that the proposed structure significantly reduces error and achieves a considerable performance improvement over delta-modulation encoding systems [13].

The Gaussian distribution is well suited to describing the power spectral density of the speech signal. In statistical voice activity detection (VAD), the Rayleigh distribution has been used because it has a longer asymmetric tail than the Gaussian distribution. Minimum mean square error estimators (MMSEEs) for speech enhancement have employed various PDFs, such as the Gaussian and Log Normal distributions.
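The two-channel quadrature mirror filter structure surveyed above (low-pass and high-pass filtering, decimation by two, then interpolation and recombination) can be illustrated with the shortest possible filter pair. The sketch below assumes the two-tap Haar pair, which is chosen here only because it gives exact reconstruction in a few lines; the filters used in [13] are not stated in this chapter, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into low and high subbands, each at half rate.

    The two-tap filtering and the decimation by 2 are done in one step
    by working on consecutive sample pairs.
    """
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def qmf_synthesis(low, high):
    """Interpolate both subbands by 2 and recombine them into one signal."""
    y = []
    for l, h in zip(low, high):
        y.append((l + h) / SQRT2)
        y.append((l - h) / SQRT2)
    return y

# Usage: each subband carries half as many samples as the input.
signal = [0.0, 0.3, 0.7, 0.2, -0.4, -0.9, -0.1, 0.5]
low, high = qmf_analysis(signal)
restored = qmf_synthesis(low, high)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

In a coder, each half-rate subband would be quantized separately between the analysis and synthesis stages, so the reconstruction would no longer be exact.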

3 Basic Principles of the Proposed System Model [8]

3.1 Design Procedure for Subband Coding for Speech Signal

The Power Spectral Density (PSD) of a voice signal is taken to be restricted to 3.5 kHz, with the PSD expressed in watt/Hz or dB (Fig. 51.1).

Fig. 51.1 PSD of speech signal

In this figure, the frequency axis is divided into a number of subbands (say 0–f1, f1–f2, f2–f3, f3–f4, etc.). The band (0–f1) is a baseband signal, whereas (f1–f2), (f2–f3), (f3–f4), etc. are bandpass signals. Each bandpass signal is translated to baseband by multiplying it with the lowest frequency component of its subband. Here, seven subbands have been considered (Fig. 51.2).
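The translation step described above follows from the identity cos(a)cos(b) = ½[cos(a − b) + cos(a + b)]: multiplying a component at frequency f by a cosine at the lower band edge produces a difference term at f minus the band edge and a sum term at f plus the band edge. The sketch below checks this numerically. The sampling rate, the subband edges `F1` and `F2`, and the 1200 Hz test tone are hypothetical values chosen for illustration, since the chapter does not give the actual band edges.

```python
import math

FS = 8000   # sampling rate in Hz (assumed)
N = 800     # 0.1 s of samples, so every multiple of 10 Hz is an exact DFT bin

def tone(freq_hz, amplitude=1.0):
    """Cosine test tone of N samples at the given frequency."""
    return [amplitude * math.cos(2.0 * math.pi * freq_hz * n / FS) for n in range(N)]

def translate(x, carrier_hz):
    """Multiply x by a cosine at carrier_hz (the lowest frequency of its subband)."""
    return [v * math.cos(2.0 * math.pi * carrier_hz * n / FS) for n, v in enumerate(x)]

def amplitude_at(x, freq_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (single-bin DFT)."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

F1, F2 = 1000, 1500           # hypothetical subband edges in Hz
band_signal = tone(1200)      # a component lying inside the subband (F1, F2)
shifted = translate(band_signal, F1)

print(amplitude_at(shifted, 1200 - F1))   # difference term, now at baseband
print(amplitude_at(shifted, 1200 + F1))   # sum term, to be removed by low-pass filtering
print(amplitude_at(shifted, 1200))        # the original frequency is no longer present
```

Each of the two product terms carries half of the original amplitude, so a practical implementation needs a gain of two (or an equivalent scaling) somewhere in the chain.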

Fig. 51.2 Block diagram of subband coder

The transmitter consists of one LPF and six BPFs. At the multiplier block, each BPF output is multiplied by the lowest frequency component of its band. The outputs are then PCM encoded and added by a summer. Finally, the summed output is sent into the channel (Fig. 51.3).
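The PCM stage in each branch can be sketched as a uniform quantizer. The code below is a minimal illustration, assuming samples normalized to the range [-1, 1] and a mid-rise quantizer; the chapter does not specify the quantizer type or the word length per subband, so both are assumptions. The helper `min_bits_for_snr` uses the standard rule of thumb SNR ≈ 6.02 n + 1.76 dB, which holds for a full-scale sinusoid quantized with n bits.

```python
import math

def pcm_encode(x, n_bits):
    """Uniform mid-rise quantizer for samples in [-1, 1]; returns integer codes."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    codes = []
    for v in x:
        code = int(math.floor(v / step))
        # Clip to the available code range (only matters at the +1.0 edge).
        code = max(-(levels // 2), min(levels // 2 - 1, code))
        codes.append(code)
    return codes

def pcm_decode(codes, n_bits):
    """Map each integer code back to the centre of its quantization interval."""
    step = 2.0 / (2 ** n_bits)
    return [(c + 0.5) * step for c in codes]

def snr_db(x, y):
    """Signal-to-quantization-noise ratio of y relative to x, in dB."""
    signal = sum(v * v for v in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    return 10.0 * math.log10(signal / noise)

def min_bits_for_snr(target_db):
    """Smallest n with 6.02 n + 1.76 >= target_db (full-scale sinusoid rule of thumb)."""
    return math.ceil((target_db - 1.76) / 6.02)

# Usage: a full-scale test tone (437 Hz at an assumed 8 kHz sampling rate).
test_tone = [math.sin(2.0 * math.pi * 437 * n / 8000) for n in range(8000)]
for bits in (5, 8):
    decoded = pcm_decode(pcm_encode(test_tone, bits), bits)
    print(bits, "bits:", round(snr_db(test_tone, decoded), 1), "dB")
print("bits needed for 30 dB:", min_bits_for_snr(30.0))
```

Under this rule of thumb, the 30 dB SNR target quoted in the Introduction corresponds to at least five bits per sample for a full-scale sinusoid; real speech, which rarely fills the quantizer range, would need a margin on top of that.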

Fig. 51.3 Block diagram of subband receiver

At the receiver, the signals are decoded by seven decoders. Each decoded signal is then passed through an LPF with a cut-off frequency of f1, f2–f1, f3–f2, etc. The second to seventh outputs are multiplied by their respective lowest frequency components and then passed through BPFs of f2–f1, f3–f2, etc. Finally, the outputs are summed to obtain a replica of the original signal.

4 Proposed Method with Log Normal Distribution PSD

Speech coding can be based on different probability distributions. The authors have already worked with the Gaussian and Rayleigh distributions [7, 8]. Here, they have chosen the Log Normal distribution and followed the same procedure as before. The results are shown below.
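As a rough illustration of what a Log Normal PSD model implies for a seven-band coder, the sketch below computes the share of total power that falls in each subband when the PSD shape is taken to be a log-normal density over frequency. This is not the authors' MATLAB procedure, which is not reproduced in this chapter. The equal-width band edges, the median frequency of 500 Hz and the value of `SIGMA` are all hypothetical.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal variable, i.e. ln(x) ~ Normal(mu, sigma)."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def band_power_fractions(edges_hz, mu, sigma):
    """Share of the total in-band power falling in each subband."""
    total = lognormal_cdf(edges_hz[-1], mu, sigma) - lognormal_cdf(edges_hz[0], mu, sigma)
    return [
        (lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma)) / total
        for lo, hi in zip(edges_hz[:-1], edges_hz[1:])
    ]

# Hypothetical values: seven equal-width subbands up to 3.5 kHz and a
# log-normal shape whose median sits at 500 Hz. None of these come from the paper.
EDGES_HZ = [500 * k for k in range(8)]   # 0, 500, ..., 3500
MU, SIGMA = math.log(500.0), 0.8

fractions = band_power_fractions(EDGES_HZ, MU, SIGMA)
for (lo, hi), frac in zip(zip(EDGES_HZ[:-1], EDGES_HZ[1:]), fractions):
    print(f"{lo:4d}-{hi:4d} Hz: {frac:.3f}")
```

With a shape of this kind, most of the power sits in the lowest bands, which is the usual motivation for giving the upper subbands fewer bits.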

4.1 Mathematical Validation Using MATLAB Simulation

(See Figs. 51.4, 51.5 and 51.6; Tables 51.1 and 51.2)

Fig. 51.4 Cumulative data rate versus frequency

Fig. 51.5 SNR versus frequency

Fig. 51.6 Probability of bit error

Table 51.1 Data rate, SNRmin and BER of the Log Normal distribution
Table 51.2 Comparative list of different distributions with their data rates

5 Conclusion

It is evident from the above discussion that both subband coding and the existing 64 kbps line have an almost negligible probability of bit error, but subband coding offers the lowest data rate and bandwidth. The authors have used different probability distributions for speech coding for validation; among them, the Log Normal distribution shows the lowest data rate. Therefore, it can be deduced that subband coding takes significant steps towards data rate and bandwidth savings without losing any significant information, while the probability of bit error remains negligible. PCM requires a high bandwidth as well as a high data rate. However, PCM and DM have almost the same SNR up to 30 dB; beyond 30 dB, PCM performs better than DM, as shown by the MATLAB program. If more subbands are used, the data rate can be reduced further and a more accurate approximation of the original voice signal can be reconstructed. The authors therefore conclude that communication engineering will benefit considerably from this scheme. Many more distributions can model speech signals; these can be simulated in the same way and their results compared.