Throat Polyps Detection Based on Patient Voices

Zhong, Zhen; Chen, Zhangliang; Liang, Qilian; Xiao, Shuifang

doi:10.1007/978-1-4614-5803-6_54

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 202)

Abstract

In this paper, we present a new approach to throat polyps detection based on patients' vowel voices using fuzzy classifiers. Based on human voice samples and a Hidden Markov Model, we show that transformed voice samples (linearly combined samples) follow a Gaussian distribution. We further demonstrate that a type-2 fuzzy membership function (MF), i.e., a Gaussian MF with uncertain mean, is most appropriate for modeling the transformed voice samples. We also apply the Short-Time Fourier Transform (STFT) and Singular-Value Decomposition (SVD) to the vowel voice samples, and observe that the power decay rate can be used as an identifier in throat polyps detection. Two fuzzy classifiers and a Bayesian classifier are designed for throat polyps detection based on the human vowel voices /a:/ and /i:/ only, and the fuzzy classifiers are compared against the Bayesian classifier. Simulation results show that an interval type-2 fuzzy classifier performs the best of the three classifiers.

1 Introduction

Throat polyps detection is a field that demands more investigation. Traditionally, the methods of diagnosis are the indirect laryngoscope, the video-laryngoscope, and stroboscope light [1]. However, most of these methods need special instruments and depend mainly on the experience of the pathologists. It would be desirable if throat polyps could be detected based on the patient's vowel voices only. Traditional pattern recognition techniques such as the Bayesian classifier, known as the optimal classifier, could be used if the voice samples follow a certain distribution; this belongs to model-based statistical processing. In human voices, the voice amplitude is highly bursty, and we believe that no statistical model can really capture the uncertain nature of the voice. Fuzzy logic systems (FLS) are model free: their membership functions are not based on statistical distributions. In this paper, we therefore apply fuzzy techniques to the diagnosis of polyps patients.

In Sect. 54.2, we model voice samples using an interval type-2 Gaussian membership function. In Sect. 54.3, we apply the STFT and SVD to voice samples. In Sect. 54.4, a Bayesian classifier is proposed. The performance of the three classifiers is evaluated in Sect. 54.5. Conclusions are presented in Sect. 54.6.

2 Modeling Voice Samples Using Hidden Markov Model and Gaussian Primary MF with Uncertain Mean

In [3], an autoregressive Hidden Markov Model (HMM) was used to represent voice samples $x_k$, which means that we can write

$${x}_{k} = -\sum\limits_{i=1}^{p}{b}_{ i}{x}_{k-i} + {n}_{k}$$

(54.1)

where $n_k$ is Gaussian noise, $b_i$ ($i = 1, 2, \cdots, p$) are the autoregression coefficients, and $p$ is the autoregressive order. So

$${x}_{k} -\sum\limits_{i=1}^{p}{c}_{ i}{x}_{k-i} = {n}_{k}$$

(54.2)

where ${c}_{i} = -{b}_{i}$. This means that the difference between samples (or their linear combinations) follows a Gaussian distribution.

Based on the voice data we collected, we observed that the vowel /a:/ samples ($x_k$) do not follow a Gaussian distribution, as illustrated in Fig. 54.1a, but when we choose $p = 5$, ${c}_{1} = {c}_{2} = {c}_{3} = {c}_{4} = 0$, and $c_5 = 1$, i.e.,

$${x}_{k} - {x}_{k-5} = {n}_{k}^{a}$$

(54.3)

the new sequence follows a Gaussian distribution, as illustrated in Fig. 54.1b. Similarly, we observed that the vowel /i:/ samples ($x_k$) do not follow a Gaussian distribution, as illustrated in Fig. 54.2a, but if we choose $p = 1$ and $c_1 = 1$, i.e.,

$${x}_{k} - {x}_{k-1} = {n}_{k}^{i}$$

(54.4)

the new sequence follows a Gaussian distribution, as illustrated in Fig. 54.2b.
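The lagged differences in (54.3) and (54.4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `lagged_difference` and `excess_kurtosis` are hypothetical, the input is synthetic Gaussian noise rather than patient recordings, and the excess-kurtosis check is a simple stand-in for the histogram comparison of Figs. 54.1 and 54.2, not the authors' procedure.

```python
import random


def lagged_difference(samples, lag):
    """Return n_k = x_k - x_{k-lag} for k = lag, ..., len(samples) - 1."""
    if lag < 1 or lag >= len(samples):
        raise ValueError("lag must satisfy 1 <= lag < len(samples)")
    return [samples[k] - samples[k - lag] for k in range(lag, len(samples))]


def excess_kurtosis(values):
    """Sample excess kurtosis; values near 0 are consistent with a Gaussian."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


# Synthetic stand-in for a voice record (assumption: NOT real patient data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]

n_a = lagged_difference(x, lag=5)  # Eq. (54.3), used for vowel /a:/
n_i = lagged_difference(x, lag=1)  # Eq. (54.4), used for vowel /i:/
print(len(n_a), len(n_i), excess_kurtosis(n_a), excess_kurtosis(n_i))
```

With real recordings, `x` would be the sampled vowel waveform, and the kurtosis of `x` itself could be compared against that of the transformed sequence.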

We therefore tried to model the new transformed voice sequences $n_k^a$ and $n_k^i$, to see whether a Gaussian MF can match their nature. For the 100,000 samples of $n_k^a$ and $n_k^i$ from each subject, we split the sequence into ten equal segments and computed the mean $m_i$ and std $\sigma_i$ of the $i$th segment, $i = 1, 2, \cdots, 10$. We also computed the mean $m$ and std $\sigma$ of the entire sequence (100,000 samples). To see which value, $m_i$ or $\sigma_i$, varies more, we normalized the mean and std of each segment as $m_i/m$ and $\sigma_i/\sigma$, and then computed the std of these normalized values, $\sigma_m$ and $\sigma_{std}$. We observed that $\sigma_m \gg \sigma_{std}$. We conclude, therefore, that if the transformed voice samples of each segment (short range) are Gaussian distributed, then the transformed voice samples of the entire sequence (long range) are more appropriately modeled as a Gaussian with uncertain mean. This justifies the use of Gaussian MFs with uncertain means to model the transformed voice samples.
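The segment statistics described above can be sketched as follows. The function name `segment_variation` is hypothetical, and the use of population standard deviations is an assumption, since the text does not say which estimator was used.

```python
import statistics


def segment_variation(samples, num_segments=10):
    """Return (sigma_m, sigma_std): the std of the normalized segment means
    m_i / m and of the normalized segment stds sigma_i / sigma."""
    seg_len = len(samples) // num_segments
    used = samples[:seg_len * num_segments]
    m = statistics.fmean(used)
    sigma = statistics.pstdev(used)
    if m == 0.0 or sigma == 0.0:
        raise ValueError("overall mean and std must be nonzero to normalize")
    norm_means, norm_stds = [], []
    for i in range(num_segments):
        seg = used[i * seg_len:(i + 1) * seg_len]
        norm_means.append(statistics.fmean(seg) / m)
        norm_stds.append(statistics.pstdev(seg) / sigma)
    return statistics.pstdev(norm_means), statistics.pstdev(norm_stds)


# Toy data (assumption): ten segments with drifting mean but identical spread.
toy = []
for base in range(1, 11):
    toy.extend([base - 1.0, base + 1.0] * 50)

sigma_m, sigma_std = segment_variation(toy)
print(sigma_m, sigma_std)
```

One point worth keeping in mind when applying this to differenced data: the normalization divides by the overall mean $m$, so the ratio $m_i/m$ becomes ill-conditioned whenever $m$ is close to zero.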

3 Identifying Polyps Patient Voice Using Short-Time Fourier Transform and Singular-Value Decomposition

The STFT uses a sliding window to determine the sinusoidal frequency and phase content of a signal as it changes over time. The STFT of a voice signal is a matrix, so how can its information be extracted for throat polyps detection? We use the singular-value decomposition (SVD). The SVD is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and statistics, including computing the pseudoinverse, least-squares fitting of data, matrix approximation, and determining the rank, range, and null space of a matrix. Given $P \in C^{N \times M}$ (assuming $N > M$) with $\mathrm{rank}(P) = r \leq M$, we determine a numerical estimate $r'$ of the rank of the data matrix $P$ by calculating the singular value decomposition

$$P = U\left[\begin{array}{cc} \Sigma & 0\\ 0 & 0 \end{array}\right]{V}^{T},$$

(54.5)

where $U$ is an $N \times N$ matrix of orthonormalized eigenvectors of $PP^T$, $V$ is an $M \times M$ matrix of orthonormalized eigenvectors of $P^TP$, and $\Sigma$ is the diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$, where $\sigma_i$ denotes the $i$th singular value of $P$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. Using the SVD, the STFT of a voice can be diagonalized, and the diagonal values in $\Sigma$ can be used to represent the decay of the speaker's voice power in the frequency domain. Generally, $\sigma_1$ is much larger than $\sigma_2$, and the decay from $\sigma_1$ to $\sigma_2$ indicates how freely a person can control his or her voice. For illustration, we plot the singular values $\sigma_i$ ($i = 1, 2, \cdots, 10$) in Fig. 54.4 for the two subjects whose spectrograms are plotted in Fig. 54.3. Figure 54.4 shows that the voice power decay rate, i.e., $Pd = {\sigma }_{1} - {\sigma }_{2}$, is higher for a normal person than for a patient with throat polyps, which means that a normal person can control his or her voice more freely (with larger power changes from one frequency to another). The voice power decay rate can therefore be used as an identifier in throat polyps detection. In this paper, we use the power decay rates of the vowels /a:/ and /i:/ in the fuzzy classifiers for throat polyps detection.
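A minimal, standard-library-only sketch of the power decay rate $Pd = \sigma_1 - \sigma_2$ is given below. Several choices here are assumptions that the text does not specify: the Hann window, the window length and hop size, the use of the magnitude (rather than complex) STFT, and computing the two largest singular values by power iteration with deflation on $P^TP$. All function names are hypothetical, and the input is a synthetic tone, not patient data.

```python
import cmath
import math
import random


def magnitude_stft(signal, window_len, hop):
    """Magnitude STFT as rows (frequency bins) by columns (time frames)."""
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / window_len)
            for n in range(window_len)]
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(window_len)]
        spectrum = []
        for f in range(window_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * f * n / window_len)
                      for n in range(window_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return [list(row) for row in zip(*frames)]  # transpose: bins x frames


def _matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_singular_values(matrix, count=2, iterations=300, seed=0):
    """Largest singular values of a real matrix via power iteration on P^T P."""
    rng = random.Random(seed)
    cols = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(cols)]
            for i in range(cols)]
    values = []
    for _ in range(count):
        vec = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        for _ in range(iterations):
            nxt = _matvec(gram, vec)
            norm = math.sqrt(sum(c * c for c in nxt))
            if norm == 0.0:
                break
            vec = [c / norm for c in nxt]
        scale = math.sqrt(sum(a * a for a in vec))
        unit = [a / scale for a in vec]
        # Rayleigh quotient: eigenvalue of P^T P, i.e. a squared singular value.
        eig = sum(a * b for a, b in zip(unit, _matvec(gram, unit)))
        values.append(math.sqrt(max(eig, 0.0)))
        # Deflate so the next pass converges to the next singular value.
        for i in range(cols):
            for j in range(cols):
                gram[i][j] -= eig * unit[i] * unit[j]
    return values


def power_decay_rate(matrix):
    """Pd = sigma_1 - sigma_2 of the given (real) matrix."""
    s1, s2 = top_singular_values(matrix, count=2)
    return s1 - s2


# Synthetic tone (assumption): 2 cycles per 16-sample window.
tone = [math.sin(2.0 * math.pi * 2.0 * n / 16.0) for n in range(64)]
spec = magnitude_stft(tone, window_len=16, hop=8)
print(len(spec), len(spec[0]), power_decay_rate(spec))
```

Power iteration is used here only to keep the sketch dependency-free; a numerical library's SVD routine would normally be preferable for real spectrograms.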

4 Bayesian Classifier for Throat Polyps Detection

Bayesian decision theory [2] provides the optimal solution to the general decision-making problem. We assume that each patient has an equal probability of having throat polyps, i.e., with hypotheses $H_1$: polyps and $H_2$: normal, $p({H}_{1}) = p({H}_{2}) = 0.5$. If the transformed vowel voice samples (/a:/ and /i:/) of patient $j$ follow a Gaussian distribution, and $\mathbf{x}_j \triangleq [x_j^a, x_j^i]^T$ stands for the samples from patient $j$ for the vowels /a:/ and /i:/, then

$$p({\mathbf{x}}_{j}\vert {v}_{j}) = \frac{1}{2\pi {\vert {\Sigma }_{j}\vert }^{1/2}}\exp \left[-\frac{1}{2}{({\mathbf{x}}_{j} -{\mathbf{m}}_{j})}^{T}{\Sigma }_{j}^{-1}({\mathbf{x}}_{j} -{\mathbf{m}}_{j})\right]$$

(54.6)

where $\mathbf{m}_j \triangleq [m_j^a, m_j^i]^T$ and $\Sigma_j = \mathrm{diag}\{(\sigma_j^a)^2, (\sigma_j^i)^2\}$ are the mean vector ($2 \times 1$) and covariance matrix ($2 \times 2$) of $\mathbf{x}_j$. In this case,

$$p(\mathbf{x}\vert {H}_{1}) = \sum\limits_{i=1}^{10}p(\mathbf{x}\vert {v}_{i})\,p({v}_{i})$$

(54.7)

$$p(\mathbf{x}\vert {H}_{2}) = \sum\limits_{i=11}^{20}p(\mathbf{x}\vert {v}_{i})\,p({v}_{i})$$

(54.8)

Based on Bayes decision theory, since $p({H}_{1}) = p({H}_{2}) = 0.5$, we obtain the decision rule:

$$\text{Claim throat polyps if } p(\mathbf{x}\vert {H}_{1}) > p(\mathbf{x}\vert {H}_{2})$$

(54.9)

$$\text{No throat polyps if } p(\mathbf{x}\vert {H}_{1}) < p(\mathbf{x}\vert {H}_{2})$$

(54.10)

$$\text{Not sure if } p(\mathbf{x}\vert {H}_{1}) = p(\mathbf{x}\vert {H}_{2})$$

(54.11)

This Bayesian polyps detector will be used in Sect. 54.5.
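The decision rule (54.6) to (54.11) can be sketched as below. The function names are hypothetical, the two toy training patients are made-up numbers rather than results from the paper, and the equal weights $p(v_i) = 1/(\text{number of patients in the class})$ are an assumption, since the text does not state the values of $p(v_i)$.

```python
import math


def gaussian_density(x, mean, std):
    """2-D Gaussian with diagonal covariance diag(std[0]^2, std[1]^2), Eq. (54.6)."""
    det_sqrt = std[0] * std[1]  # |Sigma|^(1/2) for a diagonal covariance
    quad = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * det_sqrt)


def class_likelihood(x, patients):
    """Mixture over the training patients of one class, Eqs. (54.7)-(54.8).
    Equal weights p(v_i) are an assumption."""
    weight = 1.0 / len(patients)
    return sum(weight * gaussian_density(x, mean, std) for mean, std in patients)


def bayes_decision(x, polyps_patients, normal_patients):
    """Apply the rule of Eqs. (54.9)-(54.11) with p(H1) = p(H2) = 0.5."""
    p1 = class_likelihood(x, polyps_patients)
    p2 = class_likelihood(x, normal_patients)
    if p1 > p2:
        return "polyps"
    if p1 < p2:
        return "normal"
    return "not sure"


# Toy training statistics (assumption): (mean vector, std vector) per patient.
polyps = [((1.0, 1.0), (0.5, 0.5))]
normal = [((-1.0, -1.0), (0.5, 0.5))]
print(bayes_decision((0.9, 1.2), polyps, normal))
```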

5 Simulations

We extracted the general features and behavior of the /a:/ and /i:/ voices of 20 patients, of whom 10 have throat polyps and 10 do not, and determined one discriminant rule for each patient in the domain of interest. In choosing the antecedents of the fuzzy classifiers, we made full use of the statistical knowledge (mean and std) obtained from the patient voices. We used 100,000 samples each of the vowels /a:/ and /i:/ to establish a discriminant rule for each patient. In all, we obtained 20 rules, one per patient.

To evaluate the performance of the two fuzzy detectors, we used another group of 20 patients (the testing group), which has no overlap with the first group of 20 patients whose vowel samples were used to build the fuzzy rules. This helps to demonstrate that our classifiers are robust. We also collected 100,000 voice samples each for /a:/ and /i:/ from each patient in the testing group. To demonstrate that our classifiers can detect throat polyps using a small number of samples, we based each detection on 5,000 samples, giving 20 independent detections ($20 \times 5{,}000$) for each patient. During testing, we obtain the mean $m^t = [m_a^t, m_i^t]$ of each block of 5,000 /a:/ and /i:/ samples.

5.1 Design of Three Throat Polyps Detectors

5.1.1 Design of Type-1 Fuzzy Polyps Detector

For a type-1 fuzzy classifier, the $l$th rule, $R^l$, is ($l = 1, \cdots, 10$):

$R^l$: IF the transformed /a:/ voice is $F_1^l$ and the transformed /i:/ voice is $F_2^l$ and the /a:/ power decay rate is $F_3^l$ and the /i:/ power decay rate is $F_4^l$ THEN this patient has throat polyps ($+1$) [or a normal throat ($-1$)].

The antecedents $F_k^l$ ($k = 1, 2, 3, 4$) are described by type-1 Gaussian MFs whose means, $m_k^l$, and stds, $\sigma_k^l$, are determined from known patient voice samples. More specifically, $m_1^l$ and $\sigma_1^l$ are the mean and std of the /a:/ samples in the 100,000 samples of patient $l$ in the first group; $m_2^l$ and $\sigma_2^l$ are the mean and std of the /i:/ samples in the 100,000 samples of patient $l$ in the first group. To determine $m_3^l$, $\sigma_3^l$, $m_4^l$, and $\sigma_4^l$, we partition the voice samples into ten segments for /a:/ and /i:/, respectively, and obtain the STFT of each segment. We then apply the SVD to each STFT matrix to obtain the power decay rate of each segment. The mean and std of the ten power decay rates are $m_3^l$ ($m_4^l$) and $\sigma_3^l$ ($\sigma_4^l$). The consequent corresponds to ${y}^{l} = +1$ (polyps) or ${y}^{l} = -1$ (normal) in the fuzzy detector.

For a type-1 fuzzy detector, the input, $m^t = [m_a^t, m_i^t, Pd_a, Pd_i]$, is obtained from 5,000 vowel samples from a patient in the testing group. $Pd_a$ and $Pd_i$ are the power decay rates for /a:/ and /i:/.
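As a rough illustration of how such a rule base could be evaluated, the sketch below fires each rule with a product of the four Gaussian memberships and combines the consequents with a weighted average. The product t-norm, the weighted-average (height) defuzzifier, the sign-based decision, the function names, and the toy rule parameters are all assumptions; the text above does not specify the inference mechanism.

```python
import math


def gaussian_mf(x, mean, std):
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def type1_output(inputs, rules):
    """Weighted average of rule consequents (+1 polyps, -1 normal).

    inputs: [m_a, m_i, Pd_a, Pd_i]
    rules:  list of (means, stds, consequent), four antecedents per rule.
    """
    num = 0.0
    den = 0.0
    for means, stds, consequent in rules:
        firing = 1.0
        for x, m, s in zip(inputs, means, stds):
            firing *= gaussian_mf(x, m, s)  # product t-norm (assumption)
        num += firing * consequent
        den += firing
    if den == 0.0:
        return 0.0  # no rule fired: undecided
    return num / den


def type1_decision(inputs, rules):
    y = type1_output(inputs, rules)
    if y > 0.0:
        return "polyps"
    if y < 0.0:
        return "normal"
    return "not sure"


# Toy rule base (assumption): one polyps rule and one normal rule.
rules = [
    ((0.2, 0.1, 1.0, 1.2), (0.1, 0.1, 0.5, 0.5), +1),
    ((-0.2, -0.1, 3.0, 3.5), (0.1, 0.1, 0.5, 0.5), -1),
]
print(type1_decision([0.2, 0.1, 1.0, 1.2], rules))
```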

5.1.2 Design of Type-2 Fuzzy Polyps Detector

For a type-2 fuzzy classifier, the $l$th rule, $R^l$, is ($l = 1, \cdots, 10$):

$R^l$: IF the transformed /a:/ voice is $\tilde{F}_1^l$ and the transformed /i:/ voice is $\tilde{F}_2^l$ and the /a:/ power decay rate is $F_3^l$ and the /i:/ power decay rate is $F_4^l$ THEN this patient has throat polyps ($+1$) [or a normal throat ($-1$)].

The antecedents $\tilde{F}_k^l$ ($k = 1, 2$) are described by a type-2 MF, i.e., a Gaussian MF with uncertain mean, whose mean $m_k^l \in [m_{k1}^l, m_{k2}^l]$ and std $\sigma_k^l$ are determined by the voice samples of patients in the first group. $F_3^l$ and $F_4^l$ are the same as those in the type-1 fuzzy detector.

More specifically, $\sigma_k^l$ ($k = 1, 2$) are determined using the same method as described in Sect. 54.5.1.1, and $m_{k1}^l$ and $m_{k2}^l$ are determined as follows. We divided the 100,000 samples of the $l$th known patient into 10 equal-length segments (10,000 samples each), and computed the mean $m_1^{lj}$ of the /a:/ samples in the $j$th segment ($j = 1, \cdots, 10$). Let

$${m}_{11}^{l} = \min\limits_{j=1,\cdots,10}{m}_{1}^{lj}$$

(54.12)

$${m}_{12}^{l} = \max\limits_{j=1,\cdots,10}{m}_{1}^{lj}$$

(54.13)

so $[m_{11}^l, m_{12}^l]$ is the range of the uncertain mean of the /a:/ voice samples of the $l$th known patient. We obtained the range of the uncertain mean of the /i:/ samples, $[m_{21}^l, m_{22}^l]$, in a similar manner.
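The range in (54.12) and (54.13), together with the upper and lower membership functions commonly used for a Gaussian primary MF with uncertain mean, can be sketched as follows. The function names and the toy segment means are hypothetical; the closed-form upper and lower MFs are the standard ones for this MF type, and the text above does not confirm that the authors' implementation used exactly this form.

```python
import math


def uncertain_mean_range(segment_means):
    """Eqs. (54.12)-(54.13): [m_1, m_2] = [min_j m^j, max_j m^j]."""
    return min(segment_means), max(segment_means)


def _gauss(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


def upper_mf(x, m1, m2, std):
    """Upper membership grade: flat top of 1 between m1 and m2."""
    if x < m1:
        return _gauss(x, m1, std)
    if x > m2:
        return _gauss(x, m2, std)
    return 1.0


def lower_mf(x, m1, m2, std):
    """Lower membership grade: the smaller of the two extreme Gaussians."""
    if x <= (m1 + m2) / 2.0:
        return _gauss(x, m2, std)
    return _gauss(x, m1, std)


# Toy segment means for one patient (assumption: not data from the paper).
m1, m2 = uncertain_mean_range([0.3, 0.1, 0.2, 0.25, 0.15])
for x in (0.0, 0.2, 0.4):
    print(x, lower_mf(x, m1, m2, 0.1), upper_mf(x, m1, m2, 0.1))
```

The interval `[lower_mf(x), upper_mf(x)]` is the membership interval that an interval type-2 rule would use in place of the single grade of a type-1 MF.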

For a type-2 fuzzy detector, the input, $m^t = [m_a^t, m_i^t, Pd_a, Pd_i]$, is obtained from 5,000 vowel samples from a patient in the testing group.

5.1.3 Design of Bayesian Classifier

Observe from (54.6) that the Bayesian classifier needs $\mathbf{m}_i = [m_i^a, m_i^i]^T$ and $\Sigma_i = \mathrm{diag}\{(\sigma_i^a)^2, (\sigma_i^i)^2\}$. In our design, $m_i^a$ and $\sigma_i^a$ are the mean and std of the vowel /a:/ in the 100,000 samples of patient $i$ in the first group; similarly, $m_i^i$ and $\sigma_i^i$ are the mean and std of the vowel /i:/ in the 100,000 samples of patient $i$ in the first group. The input is $\mathbf{x} \triangleq m^t$, where $m^t$ is obtained from the mean value of 5,000 voice samples from a patient in the testing group.

5.2 Performance Analysis

We computed the average probability of missed detection, $p_r(\varepsilon)$, for each fuzzy detector as well as for the Bayesian detector over $20 \times 20 = 400$ independent classifications (20 patients, each with 20 segments of 5,000 samples). Note that the voices of the first group of patients were used to design the fuzzy rules, and that the testing group has no overlap with the first group. Simulations show that $p_r(\varepsilon) = 25\%$ for the Bayesian classifier, $p_r(\varepsilon) = 18\%$ for the type-1 fuzzy classifier, and $p_r(\varepsilon) = 14\%$ for the type-2 fuzzy classifier.

6 Conclusions

Based on human voice samples and a Hidden Markov Model, we showed that transformed voice samples (linearly combined samples) follow a Gaussian distribution. We further demonstrated that a type-2 fuzzy MF, i.e., a Gaussian MF with uncertain mean, is most appropriate for modeling the transformed voice samples. We also applied the STFT and SVD to the vowel voice samples, and observed that the voice power decay rate can be used as an identifier in throat polyps detection. Two fuzzy classifiers and a Bayesian classifier were designed for throat polyps detection based on the human vowel voices /a:/ and /i:/ only, and the fuzzy classifiers were compared against the Bayesian classifier. Simulation results showed that an interval type-2 fuzzy classifier performs the best of the three classifiers.

References

1. de Oliveira Rosa M, Pereira JC, Grellet M (2000) Adaptive estimation of residue signal for voice pathology diagnosis. IEEE Trans Biomed Eng 47(1):96–104
2. Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
3. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286

Author information

Authors and Affiliations

Dept of ENT and Neck Surgery, Peking University First Hospital, Beijing, 100034, China
Zhen Zhong & Shuifang Xiao
College of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
Zhangliang Chen
Dept of Electrical Engineering, University of Texas at Arlington, Arlington, TX, 76019, USA
Qilian Liang

Corresponding author

Correspondence to Zhen Zhong.

Editor information

Editors and Affiliations

University of Texas at Arlington, 416 Yates St, Rm 518, Box 19016, Arlington, 76019-0016, Texas, USA
Qilian Liang
College of Physical and Electronic Infor, Tianjin Normal University, Bingshui West Road, XiQing District, Tianjin, 300387, China, People's Republic
Wei Wang
College of Physical and Electronic Infor, Tianjin Normal University, Bingshui West Road, XiQing District, Tianjin, 300387, China, People's Republic
Jiasong Mu
School of electronic engineering, University of Electronic Science and Tec, Xiyuan Road, Gaoxin District, Chengdu, 611731, China, People's Republic
Jing Liang
College of Physical and Electronic Infor, Tianjin Normal University, Bingshui West Road, XiQing District, Tianjin, 300387, China, People's Republic
Baoju Zhang
School of Electronic Engineering, University of Electronic Science and Tec, Xiyuan Road, Gaoxin district, Chengdu, 611731, China, People's Republic
Yiming Pi
School of Information and Communication, Beijing University of Posts and Telecomm, Xitucheng Road 10, Beijing, 100876, China, People's Republic
Chenglin Zhao

About this paper

Cite this paper

Zhong, Z., Chen, Z., Liang, Q., Xiao, S. (2012). Throat Polyps Detection Based on Patient Voices. In: Liang, Q., et al. Communications, Signal Processing, and Systems. Lecture Notes in Electrical Engineering, vol 202. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5803-6_54

DOI: https://doi.org/10.1007/978-1-4614-5803-6_54
Published: 30 October 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5802-9
Online ISBN: 978-1-4614-5803-6
eBook Packages: Engineering, Engineering (R0)
