1 Introduction

Throat polyps detection is a field that demands more investigation. Traditionally, the diagnostic methods are indirect laryngoscopy, video laryngoscopy, and stroboscopic light [1]. However, most of these methods need special instruments and depend mainly on the experience of the pathologist. It would be desirable if throat polyps could be detected from the patient's vowel voices alone. Traditional pattern recognition techniques such as the Bayesian classifier, known as the optimal classifier, could be used if the voice samples followed a certain distribution; this approach belongs to model-based statistical processing. In human voices, the amplitude is highly bursty, and we believe that no statistical model can fully capture the uncertain nature of the voice. Fuzzy logic systems (FLS) are model free: their membership functions are not based on statistical distributions. In this paper, we therefore apply fuzzy techniques to the diagnosis of polyps patients.

In Sect. 54.2, we model voice samples using an interval type-2 Gaussian membership function (MF). In Sect. 54.3, we apply the short-time Fourier transform (STFT) and singular-value decomposition (SVD) to voice samples. In Sect. 54.4, a Bayesian classifier is proposed. The performance of the three classifiers is evaluated in Sect. 54.5. Conclusions are presented in Sect. 54.6.

2 Modeling Voice Samples Using Hidden Markov Model and Gaussian Primary MF with Uncertain Mean

In [3], an autoregressive hidden Markov model (HMM) was used to represent voice samples \({x}_{k}\), which means we could have

$${x}_{k} = -\sum\limits_{i=1}^{p}{b}_{ i}{x}_{k-i} + {n}_{k}$$
(54.1)

where \({n}_{k}\) is Gaussian noise, \({b}_{i}\) (\(i = 1, 2, \cdots, p\)) are the autoregression coefficients, and p is the autoregressive order. So

$${x}_{k} -\sum\limits_{i=1}^{p}{c}_{ i}{x}_{k-i} = {n}_{k}$$
(54.2)

where \({c}_{i} = -{b}_{i}\). This means that the difference between samples (or their linear combinations) follows a Gaussian distribution.

Based on the voice data we have collected, we observed that the vowel /a:/ samples (\({x}_{k}\)) do not follow a Gaussian distribution, as illustrated in Fig. 54.1a, but when we choose p = 5, \({c}_{1} = {c}_{2} = {c}_{3} = {c}_{4} = 0\), and \({c}_{5} = 1\), i.e.,

$${x}_{k} - {x}_{k-5} = {n}_{k}^{a}$$
(54.3)

the new sequence follows a Gaussian distribution, as illustrated in Fig. 54.1b. Similarly, we observed that the vowel /i:/ samples (\({x}_{k}\)) do not follow a Gaussian distribution, as illustrated in Fig. 54.2a, but if we choose p = 1 and \({c}_{1} = 1\), i.e.,

$${x}_{k} - {x}_{k-1} = {n}_{k}^{i}$$
(54.4)

the new sequence follows a Gaussian distribution, as illustrated in Fig. 54.2b.
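The lag choices in (54.3) and (54.4) can be checked numerically. The following is a minimal sketch, not taken from the original experiments: it forms the lagged difference and reports its excess kurtosis as a rough Gaussianity indicator (a value near zero is consistent with a Gaussian histogram). The function names, the synthetic signal, and the use of kurtosis as the check are all assumptions made for illustration.

```python
import numpy as np

def lagged_difference(x, lag):
    """Return n_k = x_k - x_{k-lag} for k = lag, ..., len(x) - 1."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

def excess_kurtosis(n):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    n = np.asarray(n, dtype=float)
    centered = n - n.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2 - 3.0

# Synthetic stand-in for recorded samples (an assumption, not the paper's data):
# a random walk, so its lagged differences are Gaussian by construction.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=100_000))
n_i = lagged_difference(x, lag=1)  # same form as (54.4)
n_a = lagged_difference(x, lag=5)  # same form as (54.3)
print(excess_kurtosis(n_i), excess_kurtosis(n_a))
```

With real recordings, `x` would be replaced by the sampled vowel waveform, and a histogram of the differences could be compared against a fitted Gaussian as in the figures.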

Fig. 54.1

(a) The histogram of 100,000 voice /a:/ samples \({x}_{k}\); (b) the histogram of the transformed voice samples \({x}_{k} - {x}_{k-5}\) and its match to a Gaussian distribution

Fig. 54.2

(a) The histogram of 100,000 voice /i:/ samples \({x}_{k}\); (b) the histogram of the transformed voice samples \({x}_{k} - {x}_{k-1}\) and its match to a Gaussian distribution

We therefore tried to model the new transformed voice sequences \({n}_{k}^{a}\) and \({n}_{k}^{i}\) to see whether a Gaussian MF can match their nature. For the 100,000 samples of \({n}_{k}^{a}\) and \({n}_{k}^{i}\) from each subject (human), we separated the sequence into ten equal segments and computed the mean \({m}_{i}\) and std \({\sigma }_{i}\) of the ith segment, \(i = 1, 2, \cdots, 10\). We also computed the mean m and std σ of the entire sequence (100,000 samples). To see which value, \({m}_{i}\) or \({\sigma }_{i}\), varies more, we normalized the mean and std of each segment as \({m}_{i}/m\) and \({\sigma }_{i}/\sigma\), and then computed the std of the normalized values, \({\sigma }_{m}\) and \({\sigma }_{std}\). We observed that \({\sigma }_{m} \gg {\sigma }_{std}\). We conclude, therefore, that if the transformed voice samples of each segment (short range) are Gaussian distributed, then the transformed voice samples of an entire voice sequence (long range) are more appropriately modeled as a Gaussian with uncertain mean. This justifies the use of Gaussian MFs with uncertain means to model the transformed voice samples.
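The segment-wise comparison described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the reported results: the function name `segment_variability` and the synthetic data are hypothetical, and the normalization by m presumes that the overall mean is not zero.

```python
import numpy as np

def segment_variability(n, num_segments=10):
    """Return (sigma_m, sigma_std): the std of the normalized segment means
    m_i / m and of the normalized segment stds sigma_i / sigma."""
    n = np.asarray(n, dtype=float)
    usable = len(n) - len(n) % num_segments      # drop any remainder samples
    segments = n[:usable].reshape(num_segments, -1)
    m_seg = segments.mean(axis=1)                # m_i
    s_seg = segments.std(axis=1)                 # sigma_i
    m_all, s_all = n.mean(), n.std()             # m and sigma of the whole sequence
    sigma_m = np.std(m_seg / m_all)              # assumes m_all != 0
    sigma_std = np.std(s_seg / s_all)
    return sigma_m, sigma_std

# Synthetic example (assumption): segment means drift while segment stds stay fixed.
rng = np.random.default_rng(1)
means = rng.uniform(0.5, 1.5, size=10)
n = np.concatenate([rng.normal(m, 1.0, size=10_000) for m in means])
sigma_m, sigma_std = segment_variability(n)
print(sigma_m, sigma_std)
```

A result with `sigma_m` much larger than `sigma_std` corresponds to the situation in which the mean, rather than the std, is the uncertain parameter.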

3 Identifying Polyps Patient Voice Using Short-Time Fourier Transform and Singular-Value Decomposition

The STFT uses a sliding window to determine the sinusoidal frequency and phase content of a signal as it changes over time. The STFT of the voices is a matrix, so how can its information be extracted for throat polyps detection? We use singular-value decomposition (SVD). The SVD is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and statistics. Applications that employ the SVD include computing the pseudoinverse, least-squares fitting of data, matrix approximation, and determining the rank, range, and null space of a matrix. Given \(P \in {C}^{N\times M}\) (assuming N > M) with \(\mathrm{rank}(P) = r \leq M\), we determine a numerical estimate r of the rank of the data matrix P by calculating the singular value decomposition

$$P = U\left [\begin{array}{lll} \Sigma&0\\0 &0\\\end{array} \right ]{V }^{T},$$
(54.5)

where U is an N × N matrix of orthonormalized eigenvectors of \(P{P}^{T}\), V is an M × M matrix of orthonormalized eigenvectors of \({P}^{T}P\), and Σ is the diagonal matrix \(\Sigma = \mathrm{diag}({\sigma }_{1}, {\sigma }_{2}, \cdots, {\sigma }_{r})\), where \({\sigma }_{i}\) denotes the ith singular value of P and \({\sigma }_{1} \geq {\sigma }_{2} \geq \cdots \geq {\sigma }_{r} > 0\). Using the SVD, the STFT of voices can be diagonalized, and the diagonal values in Σ can be used to represent the decay of the speaker's voice power in the frequency domain. Generally, \({\sigma }_{1}\) is much higher than \({\sigma }_{2}\), and the decay from \({\sigma }_{1}\) to \({\sigma }_{2}\) indicates how freely a person can control his or her voice. For illustration, we plot the singular values \({\sigma }_{i}\) (\(i = 1, 2, \cdots, 10\)) in Fig. 54.3 for the two patients whose spectrograms are plotted in Fig. 54.4. Fig. 54.3 shows that the voice power decay rate, i.e., \(Pd = {\sigma }_{1} - {\sigma }_{2}\), is higher for a normal person than for a patient with throat polyps, which means that a normal person can control his or her voice more freely (with larger power changes from one frequency to another). The voice power decay rate can therefore be used as an identifier for throat polyps detection. In this paper, we use the power decay rates of the vowels /a:/ and /i:/ in fuzzy classifiers for throat polyps detection.
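The computation of the power decay rate can be sketched as below. This is a minimal illustration under stated assumptions rather than the original implementation: the window size of 2,048 and overlap of 1,024 are taken from the spectrogram caption, while the Hann window, the use of the complex (rather than magnitude) STFT matrix, the function names, and the synthetic test tone are assumptions.

```python
import numpy as np

def stft_matrix(x, window_size=2048, overlap=1024):
    """Short-time Fourier transform as a (frequency bins x frames) complex matrix."""
    x = np.asarray(x, dtype=float)
    if len(x) < window_size:
        raise ValueError("signal is shorter than one window")
    hop = window_size - overlap
    num_frames = 1 + (len(x) - window_size) // hop
    window = np.hanning(window_size)             # window shape is an assumption
    frames = np.stack(
        [x[i * hop : i * hop + window_size] * window for i in range(num_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)

def power_decay_rate(x, window_size=2048, overlap=1024):
    """Return Pd = sigma_1 - sigma_2 of the STFT matrix and all singular values."""
    singular_values = np.linalg.svd(
        stft_matrix(x, window_size, overlap), compute_uv=False
    )                                            # returned in descending order
    return singular_values[0] - singular_values[1], singular_values

# Synthetic tone plus noise as a stand-in for a recorded vowel (an assumption).
rng = np.random.default_rng(2)
t = np.arange(20_000)
x = np.sin(2 * np.pi * 0.01 * t) + 0.1 * rng.normal(size=t.size)
pd, s = power_decay_rate(x)
print(pd, s[:10])
```

The signal must span at least two frames so that a second singular value exists; here 20,000 samples give 18 frames.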

Fig. 54.3

The ten largest singular values of the STFT of the two patients in Fig. 54.4

Fig. 54.4

Spectrogram using a short-time Fourier transform (STFT). The window size of the STFT is 2,048, and the overlap between two neighboring windows is 1,024. (a) A throat polyps patient speaking vowel /a:/; (b) a normal person speaking vowel /a:/; (c) the throat polyps patient speaking vowel /i:/; (d) the normal person speaking vowel /i:/

4 Bayesian Classifier for Throat Polyps Detection

Bayesian decision theory [2] provides the optimal solution to the general decision-making problem. We assume that each patient is equally likely to have throat polyps or not, i.e., with hypotheses \({H}_{1}\): polyps and \({H}_{2}\): normal, \(p({H}_{1}) = p({H}_{2}) = 0.5\). If the transformed vowel voice samples (/a:/ and /i:/) of patient j (denoted \({v}_{j}\)) follow a Gaussian distribution, and \({\mathbf{x}}_{j} \triangleq {[{x}_{j}^{a}, {x}_{j}^{i}]}^{T}\) stands for the samples from patient j for vowels /a:/ and /i:/, then

$$p({\mathbf{x}}_{j}\vert {v}_{j}) = \frac{1} {(2\pi ){\vert {\Sigma }_{\mathbf{j}}\vert }^{1/2}}\exp [-\frac{1} {2}{({\mathbf{x}}_{j} -{\mathbf{m}}_{j})}^{T}{{\Sigma }_{\mathbf{j}}}^{-1}({\mathbf{x}}_{ j} -{\mathbf{m}}_{j})]$$
(54.6)

where \({\mathbf{m}}_{j} \triangleq {[{m}_{j}^{a}, {m}_{j}^{i}]}^{T}\) and \({\Sigma }_{j} = \mathrm{diag}\{{({\sigma }_{j}^{a})}^{2}, {({\sigma }_{j}^{i})}^{2}\}\) are the mean vector (2 × 1) and covariance matrix (2 × 2) of \({\mathbf{x}}_{j}\). In this case,

$$\begin{array}{lll} p(\mathbf{x}\vert {H}_{1})& = \sum\limits_{i=1}^{10}p(\mathbf{x}\vert {v}_{ i})p({v}_{i})\end{array}$$
(54.7)
$$\begin{array}{lll}p(\mathbf{x}\vert {H}_{2})& = \sum\limits_{i=11}^{20}p(\mathbf{x}\vert {v}_{ i})p({v}_{i})\end{array}$$
(54.8)

Based on Bayes decision theory, since \(p({H}_{1}) = p({H}_{2}) = 0.5\), we obtain the decision rule:

$$\begin{array}{lll} \text{Claim throat polyps if } p(\mathbf{x}\vert {H}_{1}) > p(\mathbf{x}\vert {H}_{2})\end{array}$$
(54.9)
$$\begin{array}{lll} \text{No throat polyps if } p(\mathbf{x}\vert {H}_{1}) < p(\mathbf{x}\vert {H}_{2})\end{array}$$
(54.10)
$$\begin{array}{lll} \text{Not sure if } p(\mathbf{x}\vert {H}_{1}) = p(\mathbf{x}\vert {H}_{2})\end{array}$$
(54.11)

This Bayesian polyps detector will be used in Sect. 54.5.
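The decision rule in (54.6) to (54.11) can be sketched as follows. This is a hedged illustration, not the original code: the function names and the example parameters are hypothetical, and the within-class weights \(p({v}_{i})\) are assumed to be uniform because the text does not specify them.

```python
import numpy as np

def gaussian_2d_diag(x, m, sigma):
    """Density (54.6) with mean m = [m_a, m_i] and stds sigma = [sigma_a, sigma_i]."""
    x, m, sigma = (np.asarray(v, dtype=float) for v in (x, m, sigma))
    det_sqrt = sigma[0] * sigma[1]               # |Sigma|^(1/2) for a diagonal Sigma
    quad = np.sum(((x - m) / sigma) ** 2)        # (x - m)^T Sigma^{-1} (x - m)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * det_sqrt)

def bayes_detect(x, polyps_params, normal_params):
    """Apply (54.7)-(54.11). Each params list holds one (m, sigma) pair per
    training patient of that class."""
    # Assumption: p(v_i) is uniform within each class, so each sum is a mean.
    p_h1 = np.mean([gaussian_2d_diag(x, m, s) for m, s in polyps_params])
    p_h2 = np.mean([gaussian_2d_diag(x, m, s) for m, s in normal_params])
    if p_h1 > p_h2:
        return "polyps"
    if p_h1 < p_h2:
        return "normal"
    return "not sure"

# Made-up parameters for two training patients per class (illustration only).
polyps_params = [([0.2, 0.1], [1.0, 1.2]), ([0.3, 0.0], [0.9, 1.1])]
normal_params = [([-0.2, -0.1], [1.0, 1.0]), ([-0.3, 0.1], [1.1, 0.9])]
print(bayes_detect([0.25, 0.05], polyps_params, normal_params))
```

Because the two class priors are equal, comparing the class-conditional likelihoods is equivalent to comparing the posteriors.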

5 Simulations

We extract the general features and behavior of the /a:/ and /i:/ voices of 20 patients, of whom 10 have throat polyps and 10 do not, and determine one discriminant rule for each patient in the domain of interest. In choosing the antecedents of the fuzzy classifier, we make full use of the statistical knowledge (mean and std) obtained from the patient voices. We used 100,000 samples each of vowels /a:/ and /i:/ to establish a discriminant rule for each patient. In all, we obtained 20 rules, one per patient.

To evaluate the performance of the two fuzzy detectors, we used another group of 20 patients (the testing group), which has no overlap with the first group of 20 patients whose vowel samples were used to build the fuzzy rules. This helps to demonstrate that our classifiers are robust. We also collected 100,000 voice samples each for /a:/ and /i:/ from every patient in the testing group. To demonstrate that our classifiers can detect throat polyps using a small number of samples, we based each detection on 5,000 samples, which gives 20 independent detections (20 × 5,000) for each patient. During testing, we obtain the mean \({\mathbf{m}}^{t} = [{m}_{a}^{t}, {m}_{i}^{t}]\) of each block of 5,000 /a:/ and /i:/ samples.

5.1 Design of Three Throat Polyps Detectors

5.1.1 Design of Type-1 Fuzzy Polyps Detector

For a type-1 fuzzy classifier, the lth rule, \({R}^{l}\), is (\(l = 1, \cdots, 10\)):

\({R}^{l}\): IF the transformed /a:/ voice is \({\mathrm{F}}_{1}^{l}\) and the transformed /i:/ voice is \({\mathrm{F}}_{2}^{l}\) and the /a:/ power decay rate is \({\mathrm{F}}_{3}^{l}\) and the /i:/ power decay rate is \({\mathrm{F}}_{4}^{l}\) THEN this patient has throat polyps (+1) [or a normal throat (−1)].

The antecedents \({\mathrm{F}}_{k}^{l}\) (k = 1, 2, 3, 4) are described by type-1 Gaussian MFs whose mean \({m}_{k}^{l}\) and std \({\sigma }_{k}^{l}\) are determined from known patient voice samples. More specifically, \({m}_{1}^{l}\) and \({\sigma }_{1}^{l}\) are the mean and std of the /a:/ samples in the 100,000 samples of patient l in the first group; \({m}_{2}^{l}\) and \({\sigma }_{2}^{l}\) are the mean and std of the /i:/ samples in the 100,000 samples of patient l in the first group. To determine \({m}_{3}^{l}\), \({\sigma }_{3}^{l}\), \({m}_{4}^{l}\), and \({\sigma }_{4}^{l}\), we partition the voice samples into ten segments for /a:/ and /i:/ respectively, obtain the STFT of each segment, and apply the SVD to the STFT matrix to obtain the power decay rate of each segment. The mean and std of the ten power decay rates are \({m}_{3}^{l}\) (\({m}_{4}^{l}\)) and \({\sigma }_{3}^{l}\) (\({\sigma }_{4}^{l}\)). The consequent corresponds to \({y}^{l} = +1\) (polyps) or \({y}^{l} = -1\) (normal) in the fuzzy detector.

For a type-1 fuzzy detector, the input \({\mathbf{m}}^{t} = [{m}_{a}^{t}, {m}_{i}^{t}, P{d}_{a}, P{d}_{i}]\) is obtained from 5,000 vowel samples of a patient in the testing group, where \(P{d}_{a}\) and \(P{d}_{i}\) are the power decay rates for /a:/ and /i:/.
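A rule base of this form could be evaluated as in the sketch below. The text does not state the inference or defuzzification method, so the product t-norm, the height defuzzifier, and the sign threshold used here are assumptions, as are the function names and the toy rule parameters.

```python
import numpy as np

def gaussian_mf(x, mean, std):
    """Type-1 Gaussian membership grade (peak value 1, not a probability density)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2)

def type1_fuzzy_detect(inputs, rule_means, rule_stds, consequents):
    """inputs: [m_a, m_i, Pd_a, Pd_i]; rule_means and rule_stds: (num_rules, 4)
    arrays; consequents: +1 (polyps) or -1 (normal) for each rule."""
    inputs = np.asarray(inputs, dtype=float)
    grades = gaussian_mf(inputs,
                         np.asarray(rule_means, dtype=float),
                         np.asarray(rule_stds, dtype=float))
    firing = grades.prod(axis=1)                 # product t-norm (assumption)
    total = firing.sum()
    if total == 0.0:                             # no rule fires: undecided
        return 0, 0.0
    y = float(np.dot(firing, consequents) / total)  # height defuzzifier (assumption)
    return (1 if y > 0 else -1), y

# Toy rule base with two rules (illustration only, not fitted to any data).
rule_means = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
rule_stds = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
consequents = np.array([+1, -1])
label, y = type1_fuzzy_detect([0.1, 0.0, 0.2, 0.1], rule_means, rule_stds, consequents)
print(label, y)
```

The output y lies in [−1, +1]; its sign gives the decision and its magnitude indicates how strongly the fired rules agree.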

5.1.2 Design of Type-2 Fuzzy Polyps Detector

For type-2 fuzzy classifiers, the lth rule, \({R}^{l}\), is (\(l = 1, \cdots, 10\)):

\({R}^{l}\): IF the transformed /a:/ voice is \({\tilde{\mathrm{F}}}_{1}^{l}\) and the transformed /i:/ voice is \({\tilde{\mathrm{F}}}_{2}^{l}\) and the /a:/ power decay rate is \({\mathrm{F}}_{3}^{l}\) and the /i:/ power decay rate is \({\mathrm{F}}_{4}^{l}\) THEN this patient has throat polyps (+1) [or a normal throat (−1)].

The antecedents \({\tilde{\mathrm{F}}}_{k}^{l}\) (k = 1, 2) are described by a type-2 MF, i.e., a Gaussian MF with uncertain mean, whose mean \({m}_{k}^{l} \in [{m}_{k1}^{l}, {m}_{k2}^{l}]\) and std \({\sigma }_{k}^{l}\) are determined by the voice samples of patients in the first group. \({\mathrm{F}}_{3}^{l}\) and \({\mathrm{F}}_{4}^{l}\) are the same as those in the type-1 fuzzy detector.

More specifically, \({\sigma }_{k}^{l}\) (k = 1, 2) are determined using the same method as described in Sect. 54.5.1.1, and \({m}_{k1}^{l}\) and \({m}_{k2}^{l}\) are determined as follows. We divided the 100,000 samples of the lth known patient into 10 equal-length segments (10,000 samples each) and computed the mean \({m}_{1}^{lj}\) of the /a:/ samples in the jth segment (\(j = 1, \cdots, 10\)). Let

$$\begin{array}{lll}{ m}_{11}^{l}= \min\limits_{j=1,\cdots,10}{m}_{1}^{lj}\end{array}$$
(54.12)
$$\begin{array}{lll}{m}_{12}^{l}= \max\limits_{j=1,\cdots,10}{m}_{1}^{lj}\end{array}$$
(54.13)

so \([{m}_{11}^{l}, {m}_{12}^{l}]\) is the range of the uncertain mean of the /a:/ voice samples of the lth known patient. We obtained the range of the uncertain mean of the /i:/ samples, \([{m}_{21}^{l}, {m}_{22}^{l}]\), in a similar manner.
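The interval in (54.12) and (54.13), and the membership bounds it induces, can be sketched as follows. The lower and upper membership functions are the standard ones for a Gaussian primary MF with uncertain mean in the interval type-2 fuzzy logic literature; that they match the detector used here is an assumption, and the function names and example values are hypothetical.

```python
import numpy as np

def uncertain_mean_interval(samples, num_segments=10):
    """[m_1, m_2] as in (54.12)-(54.13): min and max of the segment means."""
    samples = np.asarray(samples, dtype=float)
    usable = len(samples) - len(samples) % num_segments
    seg_means = samples[:usable].reshape(num_segments, -1).mean(axis=1)
    return seg_means.min(), seg_means.max()

def it2_gaussian_bounds(x, m1, m2, std):
    """Lower and upper membership grades at x for a Gaussian primary MF
    with fixed std and mean anywhere in [m1, m2]."""
    def grade(mean):
        return float(np.exp(-0.5 * ((x - mean) / std) ** 2))
    if x < m1:
        upper = grade(m1)          # closest admissible mean is m1
    elif x > m2:
        upper = grade(m2)          # closest admissible mean is m2
    else:
        upper = 1.0                # some admissible mean equals x
    # The lower bound uses whichever end of the interval is farther from x.
    lower = grade(m2) if x <= 0.5 * (m1 + m2) else grade(m1)
    return lower, upper

# Illustration with made-up numbers.
m1, m2 = uncertain_mean_interval(np.repeat([1.0, 3.0, 2.0], 4), num_segments=3)
print(m1, m2, it2_gaussian_bounds(2.0, m1, m2, std=1.0))
```

The gap between the two bounds is the footprint of uncertainty; it collapses to a type-1 Gaussian MF when m1 equals m2.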

For a type-2 fuzzy detector, the input \({\mathbf{m}}^{t} = [{m}_{a}^{t}, {m}_{i}^{t}, P{d}_{a}, P{d}_{i}]\) is obtained from 5,000 vowel samples of a patient in the testing group.

5.1.3 Design of Bayesian Classifier

Observe from (54.6) that the Bayesian classifier needs \({\mathbf{m}}_{i} = {[{m}_{i}^{a}, {m}_{i}^{i}]}^{T}\) and \({\Sigma }_{i} = \mathrm{diag}\{{({\sigma }_{i}^{a})}^{2}, {({\sigma }_{i}^{i})}^{2}\}\). In our design, \({m}_{i}^{a}\) and \({\sigma }_{i}^{a}\) are the mean and std of vowel /a:/ in the 100,000 samples of patient i in the first group; similarly, \({m}_{i}^{i}\) and \({\sigma }_{i}^{i}\) are the mean and std of vowel /i:/ in the 100,000 samples of patient i in the first group. Its input is \(\mathbf{x} \triangleq {\mathbf{m}}^{t}\), where \({\mathbf{m}}^{t}\) is obtained from the mean value of 5,000 voice samples from a patient in the testing group.

5.2 Performance Analysis

We computed the average probability of missed detection, \({p}_{r}(\epsilon)\), for each fuzzy detector as well as for the Bayesian detector over 20 × 20 = 400 independent classifications (20 patients, each with 20 segments of 5,000 samples). Note that the voices of the first group of patients were used to design the fuzzy rules, and the testing group has no overlap with the first group. Simulations show that \({p}_{r}(\epsilon) = 25\%\) for the Bayesian classifier, \({p}_{r}(\epsilon) = 18\%\) for the type-1 fuzzy classifier, and \({p}_{r}(\epsilon) = 14\%\) for the type-2 fuzzy classifier.

6 Conclusions

Based on human voice samples and a hidden Markov model, we showed that transformed voice samples (linearly combined samples) follow a Gaussian distribution. We further demonstrated that a type-2 fuzzy MF, i.e., a Gaussian MF with uncertain mean, is the most appropriate model for the transformed voice samples. We also applied the STFT and SVD to the vowel voice samples and observed that the voice power decay rate can be used as an identifier in throat polyps detection. Two fuzzy classifiers and a Bayesian classifier were designed for throat polyps detection based only on the human vowel voices /a:/ and /i:/, and the fuzzy classifiers were compared against the Bayesian classifier. Simulation results showed that the interval type-2 fuzzy classifier performs best among the three classifiers.