Abstract
In this paper, we present a new approach to throat polyps detection based on patients’ vowel voices using fuzzy classifiers. Based on human voice samples and a Hidden Markov Model, we show that transformed voice samples (linearly combined samples) follow a Gaussian distribution. We further demonstrate that a type-2 fuzzy membership function (MF), i.e., a Gaussian MF with uncertain mean, is most appropriate for modeling the transformed voice samples. We also apply the short-time Fourier transform (STFT) and singular-value decomposition (SVD) to the vowel voice samples and observe that the power decay rate can be used as an identifier in throat polyps detection. Two fuzzy classifiers and a Bayesian classifier are designed for throat polyps detection based only on the human vowel voices /a:/ and /i:/, and the fuzzy classifiers are compared against the Bayesian classifier. Simulation results show that the interval type-2 fuzzy classifier performs best of the three classifiers.
Keywords
- Polyps detection
- Fuzzy logic systems
- Bayesian classifier
- Interval type-2 fuzzy classifier
- Fuzzy membership functions
1 Introduction
Throat polyps detection is a field that demands more investigation. Traditionally, the methods of diagnosis are the indirect laryngoscope, the video laryngoscope, and stroboscopic light [1]. However, most of these methods need special instruments and depend mainly on the experience of the pathologist. It would be desirable if throat polyps could be detected based on the patient’s vowel voices only. Traditional pattern recognition techniques such as the Bayesian classifier, known as the optimal classifier, could be used if the voice samples follow a certain distribution; this belongs to model-based statistical processing. In human voices, the voice amplitude is highly bursty, and we believe that no statistical model can fully capture the uncertain nature of the voice. Fuzzy logic systems (FLS) are model free, and their membership functions are not based on statistical distributions. In this paper, we therefore apply fuzzy techniques to the diagnosis of polyps patients.
In Sect. 54.2, we model voice samples using interval type-2 Gaussian membership function. In Sect. 54.3, we apply STFT and SVD to voice samples. In Sect. 54.4, a Bayesian classifier is proposed. Performances of the three classifiers are evaluated in Sect. 54.5. Conclusions are presented in Sect. 54.6.
2 Modeling Voice Samples Using Hidden Markov Model and Gaussian Primary MF with Uncertain Mean
In [3], an autoregressive Hidden Markov Model (HMM) was used to represent voice samples \(x_k\), which means we could have
\[ x_k = \sum_{i=1}^{p} b_i x_{k-i} + n_k \]
where \(n_k\) is Gaussian noise, and \(b_i\) (i = 1, 2, ⋯ , p) are the autoregression coefficients, with p the autoregressive order. So
\[ n_k = x_k + \sum_{i=1}^{p} c_i x_{k-i} \]
where \({c}_{i} = -{b}_{i}\). This means that a linear combination of the current and previous samples (in the simplest case, a difference between samples) follows a Gaussian distribution.
Based on the voice data we have collected, we observed that the vowel /a:/ samples (\(x_k\)) do not follow a Gaussian distribution, as illustrated in Fig. 54.1a, but when we choose p = 5, \({c}_{1} = {c}_{2} = {c}_{3} = {c}_{4} = 0\), \(c_5 = 1\), i.e.,
\[ n_k^{a} = x_k + x_{k-5}, \]
the new sequence follows a Gaussian distribution, as illustrated in Fig. 54.1b. Similarly, we observed that the vowel /i:/ samples (\(x_k\)) do not follow a Gaussian distribution, as illustrated in Fig. 54.2a, but if we choose p = 1 and \(c_1 = 1\), i.e.,
\[ n_k^{i} = x_k + x_{k-1}, \]
the new sequence follows a Gaussian distribution, as illustrated in Fig. 54.2b.
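The transformation described above can be sketched in a few lines of Python. This is only an illustration, assuming the transformed sequence has the form \(n_k = x_k + \sum_{i=1}^{p} c_i x_{k-i}\) with \(c_i = -b_i\); the function name `transform` and the toy data are hypothetical and are not taken from the voice recordings.

```python
def transform(samples, coeffs):
    """Return n_k = x_k + sum_i c_i * x_{k-i} for k >= p, where p = len(coeffs).

    coeffs[0] is c_1, coeffs[1] is c_2, and so on (assumed convention).
    """
    p = len(coeffs)
    out = []
    for k in range(p, len(samples)):
        acc = samples[k]
        for i, c in enumerate(coeffs, start=1):
            acc += c * samples[k - i]
        out.append(acc)
    return out


# Coefficients reported in the text for the two vowels.
COEFFS_A = [0, 0, 0, 0, 1]  # /a:/ : p = 5, c_5 = 1
COEFFS_I = [1]              # /i:/ : p = 1, c_1 = 1

# Toy data for illustration only, not real voice samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
n_a = transform(x, COEFFS_A)  # x_k + x_{k-5}
n_i = transform(x, COEFFS_I)  # x_k + x_{k-1}
```

The first p samples are dropped because they have no complete history; a histogram of the returned sequence could then be compared with a fitted Gaussian.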
We, therefore, tried to model the transformed voice sequences \(n_k^{a}\) and \(n_k^{i}\) to see whether a Gaussian MF can match their nature. For each subject, we took 100,000 samples of \(n_k^{a}\) and of \(n_k^{i}\), divided each sequence into ten equal segments, and computed the mean \(m_i\) and std \(\sigma_i\) of the ith segment, i = 1, 2, ⋯ , 10. We also computed the mean m and std σ of the entire sequence (100,000 samples). To see which value, \(m_i\) or \(\sigma_i\), varies more, we normalized the mean and std of each segment as \(m_i/m\) and \(\sigma_i/\sigma\), and then computed the std of the normalized values, \(\sigma_m\) and \(\sigma_{std}\). We observed that \(\sigma_m \gg \sigma_{std}\). We conclude, therefore, that if the transformed voice samples in each segment (short range) are Gaussian distributed, then the transformed voice samples in the entire sequence (long range) are more appropriately modeled as a Gaussian with uncertain mean. This justifies the use of Gaussian MFs with uncertain means to model the transformed voice samples.
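The segment-wise comparison above can be illustrated with the following sketch. It assumes population standard deviations and ten equal segments; the helper name `segment_variability` and the synthetic sequence (Gaussian segments whose mean drifts) are hypothetical and only mimic the reported behavior.

```python
import random
import statistics


def segment_variability(seq, n_segments=10):
    """Return (sigma_m, sigma_std): the std of the normalized segment means
    m_i / m and of the normalized segment stds sigma_i / sigma."""
    seg_len = len(seq) // n_segments
    m = statistics.fmean(seq)
    s = statistics.pstdev(seq)
    norm_means, norm_stds = [], []
    for j in range(n_segments):
        seg = seq[j * seg_len:(j + 1) * seg_len]
        norm_means.append(statistics.fmean(seg) / m)
        norm_stds.append(statistics.pstdev(seg) / s)
    return statistics.pstdev(norm_means), statistics.pstdev(norm_stds)


# Synthetic stand-in for a transformed voice sequence: each segment is
# Gaussian with unit std, but the segment mean drifts from 5.0 to 9.5.
rng = random.Random(0)
toy = [rng.gauss(5.0 + 0.5 * j, 1.0) for j in range(10) for _ in range(1000)]
sigma_m, sigma_std = segment_variability(toy)
```

For such a sequence the normalized means vary far more than the normalized stds, which is the situation in which a Gaussian with uncertain mean is the natural model. Note that the normalization by m is only meaningful when the overall mean is not close to zero.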
3 Identifying Polyps Patient Voice Using Short-Time Fourier Transform and Singular-Value Decomposition
The STFT uses a sliding window to determine the sinusoidal frequency and phase content of a signal as it changes over time. The STFT of a voice signal is a matrix, so how can its information be extracted for throat polyps detection? We use the singular-value decomposition (SVD). The SVD is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and statistics, including computing the pseudoinverse, least-squares fitting of data, matrix approximation, and determining the rank, range, and null space of a matrix. Given \(P \in C^{N\times M}\) (assuming N > M) with rank(P) = r ≤ M, we determine a numerical estimate r′ of the rank of the data matrix P by calculating the singular value decomposition
\[ P = U \Sigma V^{T} \]
where U is an N × N matrix of orthonormalized eigenvectors of \(PP^{T}\), V is an M × M matrix of orthonormalized eigenvectors of \(P^{T}P\) (for a complex P, the transpose is understood as the conjugate transpose), and Σ is the diagonal matrix \(\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)\), where \(\sigma_i\) denotes the ith singular value of P, and \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\). Using the SVD, the STFT of a voice can be diagonalized, and the diagonal values in Σ can be used to represent the decay of the speaker’s voice power in the frequency domain. Generally, \(\sigma_1\) is much larger than \(\sigma_2\), and the decay from \(\sigma_1\) to \(\sigma_2\) reflects how freely a person can control his or her voice. For illustration purposes, we plot the singular values \(\sigma_i\) (i = 1, 2, ⋯ , 10) in Fig. 54.4 for the two patients whose spectrograms are plotted in Fig. 54.3. As Fig. 54.4 shows, the voice power decay rate, i.e., \(Pd = {\sigma }_{1} - {\sigma }_{2}\), is higher for a normal person than for a patient with throat polyps, which means that a normal person can control his or her voice more freely (with larger power changes from one frequency to another). The voice power decay rate can therefore be used as an identifier in throat polyps detection. In this paper, we use the power decay rates of the vowels /a:/ and /i:/ in the fuzzy classifiers for throat polyps detection.
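As a rough illustration of the power decay rate \(Pd = \sigma_1 - \sigma_2\), the sketch below builds an STFT matrix with NumPy and takes its singular values. The window type (Hann), window length, hop size, and the synthetic test tone are assumptions made for this example; the text does not state which STFT parameters were used.

```python
import numpy as np


def stft_matrix(x, win_len=256, hop=128):
    """Complex STFT of a 1-D signal, shape (frequency bins, frames).

    Hann window, win_len and hop are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def power_decay_rate(x, win_len=256, hop=128):
    """Pd = sigma_1 - sigma_2, from the singular values of the STFT matrix."""
    singular_values = np.linalg.svd(stft_matrix(x, win_len, hop),
                                    compute_uv=False)
    return float(singular_values[0] - singular_values[1])


# Synthetic tone plus noise, for illustration only (not a voice recording).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
toy = np.sin(2 * np.pi * 220.0 * t) + 0.05 * rng.standard_normal(t.size)
pd_toy = power_decay_rate(toy)
```

`np.linalg.svd` returns the singular values in descending order, so the first two entries are \(\sigma_1\) and \(\sigma_2\) directly.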
4 Bayesian Classifier for Throat Polyps Detection
Bayesian decision theory [2] provides the optimal solution to the general decision-making problem. We assume that each patient is equally likely to have throat polyps or not, i.e., for the hypotheses \(H_1\): polyps and \(H_2\): normal, \(p({H}_{1}) = p({H}_{2}) = 0.5\). If the transformed vowel voice samples (/a:/ and /i:/) of patient j follow a Gaussian distribution and \(\mathbf{x}_j \triangleq [x_j^{a}, x_j^{i}]^{T}\) stands for the samples from patient j for the vowels /a:/ and /i:/, then
\[ p(\mathbf{x}_j) = \frac{1}{2\pi |\Sigma_j|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}_j - \mathbf{m}_j)^{T} \Sigma_j^{-1} (\mathbf{x}_j - \mathbf{m}_j)\right] \]
where \(\mathbf{m}_j \triangleq [m_j^{a}, m_j^{i}]^{T}\) and \(\Sigma_j = \mathrm{diag}\{(\sigma_j^{a})^2, (\sigma_j^{i})^2\}\) are the mean vector (2 × 1) and covariance matrix (2 × 2) of \(\mathbf{x}_j\). In this case, the class-conditional densities \(p(\mathbf{x} \mid H_1)\) and \(p(\mathbf{x} \mid H_2)\) are obtained from the Gaussian densities of the polyps patients and of the normal patients, respectively.
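Based on Bayes decision theory, since \(p({H}_{1}) = p({H}_{2}) = 0.5\), we obtain the decision rule:
\[ p(\mathbf{x} \mid H_1) \underset{H_2}{\overset{H_1}{\gtrless}} p(\mathbf{x} \mid H_2), \]
i.e., we decide \(H_1\) (polyps) when the left-hand side is larger and \(H_2\) (normal) otherwise.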
This Bayesian polyps detector will be used in Sect. 54.5.
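A minimal sketch of such a detector is given below. The per-patient density is the two-dimensional Gaussian with diagonal covariance described above. How the per-patient densities are combined into a class-conditional density is an assumption of this example (an equally weighted mixture over the patients of each class), and the function names and toy parameters are hypothetical.

```python
import math


def gaussian_2d(x, mean, std):
    """Density of a 2-D Gaussian with covariance diag(std[0]**2, std[1]**2)."""
    norm = 2.0 * math.pi * std[0] * std[1]  # 2*pi*|Sigma|^(1/2)
    quad = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std))
    return math.exp(-0.5 * quad) / norm


def class_likelihood(x, patients):
    """ASSUMPTION: equal-weight mixture of the per-patient Gaussians.

    patients is a list of (mean, std) pairs, one pair per design patient.
    """
    return sum(gaussian_2d(x, m, s) for m, s in patients) / len(patients)


def bayes_decide(x, polyps_patients, normal_patients):
    """Return +1 (H1, polyps) or -1 (H2, normal) under equal priors."""
    p1 = class_likelihood(x, polyps_patients)
    p2 = class_likelihood(x, normal_patients)
    return +1 if p1 > p2 else -1


# Toy parameters for illustration only.
polyps = [((0.0, 0.0), (1.0, 1.0)), ((0.5, 0.2), (1.0, 1.0))]
normal = [((5.0, 5.0), (1.0, 1.0)), ((4.5, 5.5), (1.0, 1.0))]
decision_near_polyps = bayes_decide((0.2, -0.1), polyps, normal)
decision_near_normal = bayes_decide((4.8, 5.1), polyps, normal)
```

With equal priors, comparing the two likelihoods is equivalent to comparing the posteriors, so no prior term appears in `bayes_decide`.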
5 Simulations
We extract the general features and behavior of the /a:/ and /i:/ voices of 20 patients, of whom 10 have throat polyps and 10 do not, and determine one discriminant rule for each patient in the domain of interest. In choosing the antecedents of the fuzzy classifier, we make full use of the statistical knowledge (mean and std) obtained from the patient voices. We used 100,000 samples each of the vowels /a:/ and /i:/ to establish a discriminant rule for each patient. In all, we obtained 20 rules, one per patient.
To evaluate the performance of the two fuzzy detectors, we used another group of 20 patients (the testing group), which has no overlap with the first group of 20 patients whose vowel samples were used for the fuzzy rules. This separation helps to demonstrate that our classifiers are robust. We also collected 100,000 voice samples each of /a:/ and /i:/ for each patient in the testing group. To demonstrate that our classifiers are able to detect throat polyps using a small number of samples, we made each detection based on 5,000 samples, giving 20 independent detections (20 × 5,000) for each patient. During testing, we obtain the mean \(\mathbf{m}^{t} = [m_a^{t}, m_i^{t}]\) of each block of 5,000 /a:/ and /i:/ samples.
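The construction of the test inputs can be sketched as follows: each 100,000-sample sequence is cut into consecutive blocks of 5,000 samples and the block means are paired across the two vowels. The function names and the toy sequences are hypothetical, and the use of consecutive, non-overlapping blocks is an assumption.

```python
from statistics import fmean


def block_means(samples, block_len=5000):
    """Means of consecutive, non-overlapping blocks of block_len samples."""
    n_blocks = len(samples) // block_len
    return [fmean(samples[b * block_len:(b + 1) * block_len])
            for b in range(n_blocks)]


def make_detector_inputs(a_samples, i_samples, block_len=5000):
    """Pair the /a:/ and /i:/ block means into inputs (m_a^t, m_i^t)."""
    return list(zip(block_means(a_samples, block_len),
                    block_means(i_samples, block_len)))


# Toy sequences for illustration only (not voice data).
toy_a = [float(k % 10) for k in range(100_000)]
toy_i = [float(k % 4) for k in range(100_000)]
inputs = make_detector_inputs(toy_a, toy_i)
```

For 100,000 samples per vowel this yields 20 input pairs per patient, matching the 20 independent detections described above.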
5.1 Design of Three Throat Polyps Detectors
5.1.1 Design of Type-1 Fuzzy Polyps Detector
For a type-1 fuzzy classifier, the lth rule, \(R^l\), is (l = 1, ⋯ , 20):
\(R^l\): IF the transformed /a:/ voice is \(F_1^l\) and the transformed /i:/ voice is \(F_2^l\) and the /a:/ power decay rate is \(F_3^l\) and the /i:/ power decay rate is \(F_4^l\) THEN this patient has throat polyps (+1) [or a normal throat (−1)].
The antecedents \(F_k^l\) (k = 1, 2, 3, 4) are described by type-1 Gaussian MFs whose mean, \(m_k^l\), and std, \(\sigma_k^l\), are determined from known patient voice samples. More specifically, \(m_1^l\) and \(\sigma_1^l\) are the mean and std of the /a:/ samples in the 100,000 samples of patient l in the first group; \(m_2^l\) and \(\sigma_2^l\) are the mean and std of the /i:/ samples in the 100,000 samples of patient l in the first group. To determine \(m_3^l\), \(\sigma_3^l\), \(m_4^l\), and \(\sigma_4^l\), we partition the voice samples into ten segments for /a:/ and /i:/ respectively, and obtain the STFT of each segment. We then apply the SVD to each STFT matrix to obtain the power decay rate of each segment. The mean and std of the ten power decay rates are \(m_3^l\) (\(m_4^l\)) and \(\sigma_3^l\) (\(\sigma_4^l\)). The consequent corresponds to \({y}^{l} = +1\) (polyps) or \({y}^{l} = -1\) (normal) in the fuzzy detector.
For a type-1 fuzzy detector, the input, \(\mathbf{m}^{t} = [m_a^{t}, m_i^{t}, Pd_a, Pd_i]\), is obtained from 5,000 vowel samples from a patient in the testing group. \(Pd_a\) and \(Pd_i\) are the power decay rates for /a:/ and /i:/.
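To make the rule structure concrete, the following sketch evaluates a small type-1 rule base. The text does not specify the inference details, so the product t-norm, the firing-strength-weighted average of the consequents, and the sign threshold are all assumptions of this example; the function names and the two toy rules are hypothetical.

```python
import math


def gauss_mf(x, mean, std):
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def type1_detect(inputs, rules):
    """Evaluate a type-1 fuzzy detector.

    inputs: (m_a, m_i, Pd_a, Pd_i)
    rules:  list of (means, stds, y), four antecedents per rule, y in {+1, -1}
    Returns (decision, score), where decision is +1 (polyps) or -1 (normal).
    """
    num = den = 0.0
    for means, stds, y in rules:
        firing = 1.0
        for x, m, s in zip(inputs, means, stds):
            firing *= gauss_mf(x, m, s)  # product t-norm (assumption)
        num += firing * y
        den += firing
    score = num / den if den > 0 else 0.0  # weighted average (assumption)
    return (+1 if score > 0 else -1), score


# Two toy rules for illustration only.
rules = [
    ((0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0), +1),
    ((3.0, 3.0, 4.0, 4.0), (1.0, 1.0, 1.0, 1.0), -1),
]
decision_a, score_a = type1_detect((0.1, -0.2, 1.1, 0.9), rules)
decision_b, score_b = type1_detect((3.0, 3.0, 4.0, 4.0), rules)
```

In a full design there would be one such rule per patient in the first group, with the means and stds estimated as described above.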
5.1.2 Design of Type-2 Fuzzy Polyps Detector
For type-2 fuzzy classifiers, the lth rule, \(R^l\), is (l = 1, ⋯ , 20):
\(R^l\): IF the transformed /a:/ voice is \({\tilde{\mbox{ F}}}_{1}^{l}\) and the transformed /i:/ voice is \({\tilde{\mbox{ F}}}_{2}^{l}\) and the /a:/ power decay rate is \(F_3^l\) and the /i:/ power decay rate is \(F_4^l\) THEN this patient has throat polyps (+1) [or a normal throat (−1)].
The antecedents \({\tilde{\mbox{ F}}}_{k}^{l}\) (k = 1, 2) are described by a type-2 MF, i.e., a Gaussian MF with uncertain mean, whose mean \(m_k^l \in [m_{k1}^l, m_{k2}^l]\) and std \(\sigma_k^l\) are determined by the voice samples of patients in the first group. \(F_3^l\) and \(F_4^l\) are the same as those in the type-1 fuzzy detector.
More specifically, \(\sigma_k^l\) (k = 1, 2) are determined using the same method as described in Sect. 54.5.1.1, and \(m_{k1}^l\) and \(m_{k2}^l\) are determined as follows. We divided the 100,000 samples of the lth known patient into 10 equal-length segments (10,000 samples each), and computed the mean \(m_1^{lj}\) of the /a:/ samples in the jth segment (j = 1, ⋯ , 10). Let
\[ m_{11}^{l} = \min_{j} m_1^{lj}, \qquad m_{12}^{l} = \max_{j} m_1^{lj}, \]
so \([m_{11}^l, m_{12}^l]\) is the range of the uncertain mean of the /a:/ voice samples of the lth known patient. We obtained the range of the uncertain mean of the /i:/ samples (\([m_{21}^l, m_{22}^l]\)) in a similar manner.
For a type-2 fuzzy detector, the input, \(\mathbf{m}^{t} = [m_a^{t}, m_i^{t}, Pd_a, Pd_i]\), is obtained from 5,000 vowel samples from a patient in the testing group.
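The Gaussian primary MF with uncertain mean is bounded by an upper and a lower membership function, which the sketch below evaluates using their standard closed forms. Taking the interval \([m_1, m_2]\) as the smallest and largest segment mean is an assumption of this example, and the function names and toy data are hypothetical.

```python
import math
from statistics import fmean


def gauss(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def uncertain_mean_interval(samples, n_segments=10):
    """ASSUMPTION: [m1, m2] spans the smallest and largest segment mean."""
    seg_len = len(samples) // n_segments
    means = [fmean(samples[j * seg_len:(j + 1) * seg_len])
             for j in range(n_segments)]
    return min(means), max(means)


def upper_mf(x, m1, m2, std):
    """Upper MF: equals 1 on [m1, m2], Gaussian tails outside."""
    if x < m1:
        return gauss(x, m1, std)
    if x > m2:
        return gauss(x, m2, std)
    return 1.0


def lower_mf(x, m1, m2, std):
    """Lower MF: the smaller of the two Gaussians centered at m1 and m2."""
    if x <= (m1 + m2) / 2.0:
        return gauss(x, m2, std)
    return gauss(x, m1, std)


# Toy data for illustration only.
m1, m2 = uncertain_mean_interval([0, 0, 2, 2, 4, 4], n_segments=3)
grades = [(lower_mf(x, 1.0, 2.0, 0.5), upper_mf(x, 1.0, 2.0, 0.5))
          for x in (0.0, 1.0, 1.5, 2.0, 3.0)]
```

Each input therefore produces an interval of membership grades (lower, upper) for an antecedent, and the rule firing strength becomes an interval as well; the type-reduction step that turns these intervals into a decision is not shown here.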
5.1.3 Design of Bayesian Classifier
Observe from the Gaussian density (54.6) in Sect. 54.4 that the Bayesian classifier needs \(\mathbf{m}_j = [m_j^{a}, m_j^{i}]^{T}\) and \(\Sigma_j = \mathrm{diag}\{(\sigma_j^{a})^2, (\sigma_j^{i})^2\}\). In our design, \(m_j^{a}\) and \(\sigma_j^{a}\) are the mean and std of vowel /a:/ in the 100,000 samples of patient j in the first group; similarly, \(m_j^{i}\) and \(\sigma_j^{i}\) are the mean and std of vowel /i:/ in the 100,000 samples of patient j in the first group. The input is \(\mathbf{x} \triangleq \mathbf{m}^{t}\), where \(\mathbf{m}^{t}\) is obtained from the mean value of 5,000 voice samples from a patient in the testing group.
5.2 Performance Analysis
We computed the average probability of misdetection, \(p_r(\varepsilon)\), for each fuzzy detector as well as for the Bayesian detector over 20 × 20 = 400 independent classifications (20 patients, each with twenty 5,000-sample segments). Note that the voices of the first group of patients were used to design the fuzzy rules, and that the testing group has no overlap with the first group. Simulations show that \(p_r(\varepsilon)\) = 25 % for the Bayesian classifier, \(p_r(\varepsilon)\) = 18 % for the type-1 fuzzy classifier, and \(p_r(\varepsilon)\) = 14 % for the type-2 fuzzy classifier.
6 Conclusions
Based on human voice samples and a Hidden Markov Model, we showed that transformed voice samples (linearly combined samples) follow a Gaussian distribution. We further demonstrated that a type-2 fuzzy MF, i.e., a Gaussian MF with uncertain mean, is most appropriate for modeling the transformed voice samples. We also applied the STFT and SVD to the vowel voice samples, and observed that the voice power decay rate can be used as an identifier in throat polyps detection. Two fuzzy classifiers and a Bayesian classifier were designed for throat polyps detection based only on the human vowel voices /a:/ and /i:/, and the fuzzy classifiers were compared against the Bayesian classifier. Simulation results showed that the interval type-2 fuzzy classifier performs best of the three classifiers.
References
1. de Oliveira Rosa M, Pereira JC, Grellet M (2000) Adaptive estimation of residue signal for voice pathology diagnosis. IEEE Trans Biomed Eng 47(1):96–104
2. Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
3. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
© 2012 Springer Science+Business Media New York
Zhong, Z., Chen, Z., Liang, Q., Xiao, S. (2012). Throat Polyps Detection Based on Patient Voices. In: Liang, Q., et al. Communications, Signal Processing, and Systems. Lecture Notes in Electrical Engineering, vol 202. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5803-6_54
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5802-9
Online ISBN: 978-1-4614-5803-6