1 Introduction

Throat polyp detection is a field that demands more investigation. Throat polyps are common, and people may be completely unaware of them, particularly when they are fairly small. Some polyps break off and disappear inside the body or clear up by themselves; others, however, can grow large enough to affect a person's ability to speak. The traditional methods of diagnosis are the indirect laryngoscope, the video-laryngoscope, and the stroboscope light [1]. Most of these methods need special instruments and depend mainly on the experience of the pathologists, and they usually cause the patient uncomfortable pain. It would be desirable if throat polyps could be detected based only on the patient's vowel voices [2].

Traditional pattern recognition techniques such as the Bayesian classifier, known as the optimal classifier, can be used if the voice samples follow a certain distribution; this belongs to model-based statistical processing. In [3], the statistical characteristics of root-mean-square delay spread and standard deviation were employed to describe the frequency-domain characteristics of speech and used as two antecedents of a fuzzy logic system for diagnosing polyp patients. The results demonstrated that the proposed method could detect throat polyps with a low probability of missed detection and a 0 % false-alarm rate. In [4], several methods of speech analysis for diagnosing laryngeal function are discussed. In human voices, the amplitude is highly bursty, and we believe that no statistical model can truly capture the uncertain nature of the voice [5].

Because of the complexity and unpredictability of voice, the relationships in the data and information, in tasks such as analysis and decision making, are nonlinear and complex. As an important branch of artificial intelligence, the neural network implements a mapping from input to output, and mathematical theory has proved that a three-layer neural network can approximate any nonlinear continuous function with arbitrary precision [6]. This strong nonlinear mapping ability makes it particularly suitable for solving complex problems.

This paper builds a BP neural network that judges whether a patient has a throat polyp. Several established techniques, namely normalization, principal component analysis, and neural network classification, are combined appropriately.

2 Theory

2.1 Principal Component Analysis

There are random variables X1, X2, …, Xp, whose sample standard deviations are recorded as S1, S2, …, Sp. Consider linear combinations of the form \( {C}_j={a}_{j1}{X}_1+{a}_{j2}{X}_2+\dots +{a}_{jp}{X}_p \), j = 1, 2, …, p. We have the following definitions:

  • If \( {C}_1={a}_{11}{X}_1+{a}_{12}{X}_2+\dots +{a}_{1p}{X}_p \), subject to \( {a}_{11}^2+{a}_{12}^2+\dots +{a}_{1p}^2=1 \), and Var(C1) is the largest, C1 is called the first principal component;

  • If \( {C}_2={a}_{21}{X}_1+{a}_{22}{X}_2+\dots +{a}_{2p}{X}_p \), the coefficient vector (a21, a22, …, a2p) is perpendicular to (a11, a12, …, a1p), and Var(C2) is the largest among such combinations, C2 is called the second principal component;

Similarly, there are the third, fourth, fifth, … principal components, at most p in total.

Principal component analysis is a statistical approach for reducing dimensionality. It is implemented by an orthogonal transformation that translates the correlated components of the original random vector into uncorrelated components of a new random vector. Algebraically, this means that the covariance matrix of the original random vector is transformed into a diagonal matrix. Geometrically, it means that the original coordinate system is transformed into a new orthogonal coordinate system whose axes point in the p orthogonal directions along which the sample points spread most widely. The dimensionality of the multidimensional variable system can then be reduced [7]. The mathematical algorithm is as follows.

The p-dimensional random vector of the standardized original data is \( x={\left({x}_1,{x}_2,\dots, {x}_p\right)}^T \), and the n samples are \( {x}_i={\left({x}_{i1},{x}_{i2},\dots, {x}_{ip}\right)}^T \), i = 1, 2, …, n.

When n > p, construct the sample matrix and apply the standardizing transformation to each element of the sample array as follows:

$$ {z}_{ij}=\frac{x_{ij}-{\overline{x}}_j}{s_j},\quad i=1,2,\dots, n;\ j=1,2,\dots, p $$
(89.1)
$$ {\overline{x}}_j=\frac{{\displaystyle {\sum}_{i=1}^n{x}_{ij}}}{n},\quad {s}_j^2=\frac{{\displaystyle {\sum}_{i=1}^n{\left({x}_{ij}-{\overline{x}}_j\right)}^2}}{n-1} $$
(89.2)

and the standardized matrix is denoted Z.

Second, compute the correlation coefficient matrix of Z:

$$ R={\left[{r}_{ij}\right]}_{p\times p}=\frac{Z^TZ}{n-1} $$
(89.3)
$$ {r}_{ij}=\frac{{\displaystyle {\sum}_{k=1}^n{z}_{ki}\cdot {z}_{kj}}}{n-1},\quad i,j=1,2,\dots, p $$
(89.4)

Third, solve the characteristic equation of the sample correlation matrix R, \( \left|R-\lambda {I}_p\right|=0 \), to obtain the p characteristic roots that determine the principal components. Then choose m according to \( \frac{{\displaystyle {\sum}_{j=1}^m{\lambda}_j}}{{\displaystyle {\sum}_{j=1}^p{\lambda}_j}}\ge 0.85 \), which ensures that the utilization rate of the information exceeds 85 %. For each \( {\lambda}_j \), j = 1, 2, …, m, solve the equations \( Rb={\lambda}_jb \) to obtain the unit eigenvector \( {b}_j^o \).

Last, convert the standardized indicator variables into principal components, \( {U}_{ij}={z}_i^T{b}_j^o \), j = 1, 2, …, m, where U1 is the first principal component.
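
As a minimal MATLAB sketch of these steps (run here on toy data; all variable names are ours, not from the experiment):

X = randn(50, 10);                                  % n x p toy sample matrix
[n, p] = size(X);
Z = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);  % Eqs. 89.1-89.2
R = (Z' * Z) / (n - 1);                             % correlation matrix, Eq. 89.3
[B, L] = eig(R);                                    % eigenvectors and eigenvalues of R
[lambda, idx] = sort(diag(L), 'descend');           % characteristic roots, descending
B = B(:, idx);
m = find(cumsum(lambda) / sum(lambda) >= 0.85, 1);  % smallest m keeping 85 % of information
U = Z * B(:, 1:m);                                  % principal component scores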

2.2 Neural Network Algorithm

An artificial neural network (ANN) has the characteristics of self-adaptation, self-organization, and self-learning. It is well known that an ANN consists of a number of artificial neurons and the connections among them. An artificial neuron is generally regarded as a nonlinear device with multiple inputs and a single output. An ANN model is shown in Fig. 89.1.

Fig. 89.1 An ANN model

where \( {x}_n(t) \) is the output of the n-th neuron at time t, which is also the n-th input to the i-th neuron at the same time; \( {w}_{in} \) is the weight representing the connection strength between the n-th and the i-th neuron; \( {\mathrm{net}}_i(t) \) is the net total input to the i-th neuron at time t; \( {a}_i(t) \) is the activation of the i-th neuron at time t, which is a function of \( {\mathrm{net}}_i(t) \); \( {\theta}_i \) is the threshold of the i-th neuron; and \( {y}_i\left(t+1\right) \) is the output, which is a nonlinear function of \( {a}_i(t) \) and \( {\theta}_i \), as shown in Eqs. 89.5, 89.6, and 89.7, respectively [8].

$$ {\mathrm{net}}_i(t)={\displaystyle \sum_{n=1}^N{w}_{in}{x}_n(t)} $$
(89.5)
$$ {a}_i(t)=g\left({\mathrm{net}}_i(t)\right) $$
(89.6)
$$ {y}_i\left(t+1\right)=f\left({a}_i(t),{\theta}_i\right) $$
(89.7)
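
For concreteness, the following toy evaluation steps through Eqs. 89.5–89.7 for one neuron; the identity form of g and the sigmoid form of f are illustrative assumptions, since the paper does not specify them:

x   = [0.2; -0.5; 0.8];          % inputs x_n(t) from the previous layer
w   = [0.4;  0.1; -0.3];         % connection weights w_in
th  = 0.1;                       % threshold theta_i
net = w' * x;                    % Eq. 89.5: net total input net_i(t)
a   = net;                       % Eq. 89.6: activation a_i(t), g taken as identity
y   = 1 / (1 + exp(-(a - th))); % Eq. 89.7: output y_i(t+1), f assumed sigmoid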

The BP neural network algorithm is the most widely used. Based on error back-propagation, it continually adjusts the network weights and thresholds to establish a network that minimizes the sum of squared errors. In operation, the data flow forward through the network while the error propagates backward [9].
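
In the standard formulation, each backward pass moves every weight down the gradient of the squared-error cost; with \( {d}_k \) the desired output and η a learning rate (the symbol is ours), the update is

$$ E=\frac{1}{2}{\displaystyle \sum_k{\left({d}_k-{y}_k\right)}^2},\qquad {w}_{in}\leftarrow {w}_{in}-\eta \frac{\partial E}{\partial {w}_{in}} $$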

3 Experiment and Simulation

MATLAB is a useful industrial, educational, and research tool that can help users find out what can and cannot be done, and it helps develop and broaden the field of neural network research. In this experiment, each person provides two sound samples, /a:/ and /i:/. In MATLAB, the function premnmx is used to normalize the original data between −1 and 1, and the function princomp is used to extract the characteristic values. We take five characteristic values from the /a:/ and /i:/ samples of each person, arranged in a column, as one input of the net. There are 18 training samples and 15 testing samples [10].
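
A possible MATLAB form of this pipeline is sketched below; raw_a and raw_i are placeholders for the recorded /a:/ and /i:/ data, and the exact matrix layout is our assumption, not taken from the paper:

[na, mina, maxa] = premnmx(raw_a);   % normalize each row of /a:/ data to [-1, 1]
[ni, mini, maxi] = premnmx(raw_i);   % normalize each row of /i:/ data to [-1, 1]
[ca, sa] = princomp(na);             % principal component scores of /a:/
[ci, si] = princomp(ni);             % principal component scores of /i:/
P = [sa(:, 1:5)'; si(:, 1:5)'];      % five values per vowel, one column per person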

3.1 The Result of Experiment

3.1.1 The Result of the Data Normalization

The main idea of data normalization is to confine the data to a fixed range by applying a normalization mapping to the original data, which at the same time helps eliminate systematic errors in the experimental data. In this experiment, the collected voice samples were normalized between −1 and 1. Normalized data are convenient to handle and ensure faster convergence of the programs in the subsequent processing. Figure 89.2 shows the data distribution of one sample after normalization.
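
For reference, premnmx applies the linear mapping below, so the minimum of each row goes to −1 and the maximum to 1:

$$ y=\frac{2\left(x-{x}_{\min}\right)}{{x}_{\max}-{x}_{\min}}-1 $$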

Fig. 89.2 The data distribution after normalization

3.1.2 The Result of Principal Component Analysis

In the samples, the original data of each recording is a matrix of n rows and one column; after reshaping, it becomes a matrix of m rows and 100 columns. Principal component analysis was performed on the raw data, yielding a coefficient matrix of 100 rows and 100 columns and achieving the effect of reducing the dimensions. In MATLAB, we can obtain the characteristic value of each vector directly.

Dimension reduction causes a loss of information. However, the loss is small, because the main part of the information is extracted. Figure 89.3 shows the principal components of one sample.
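
The retained share of information can be checked directly from the third output of princomp; a small sketch on toy data, applying the 85 % criterion of Sect. 2.1:

Z = randn(100, 20);                      % placeholder for a sample matrix
[coeff, score, latent] = princomp(Z);    % latent: variance of each component
explained = cumsum(latent) / sum(latent);
m = find(explained >= 0.85, 1);          % smallest m preserving 85 % of the variance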

Fig. 89.3 The principal components of one sample

3.1.3 The Result of Neural Network Classifying

The experiment chooses the back-propagation (BP) network algorithm, which is the most widely used, and the learning rule is the quasi-Newton method.

Compared with the standard steepest descent method, the quasi-Newton method converges quickly in the area close to the optimal solution, so it can improve the learning speed of the neural network. In addition, the iterative directions of the quasi-Newton method are conjugate, which gives it the finite quadratic-termination property. In fact, this property is one of the criteria for judging whether an algorithm is good: a convergent algorithm without it can hardly achieve a superlinear convergence rate.

The experiment uses eight hidden layers. Increasing the number of layers can reduce the learning error and improve the training accuracy, but the network becomes more complex and training the network weights takes longer.
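
A minimal sketch of the network setup with the legacy newff interface is given below; for brevity it uses a single hidden layer of eight nodes (the paper's deeper topology would extend the layer-size vector), and the epoch count and error goal are illustrative assumptions. trainbfg is MATLAB's BFGS quasi-Newton training function, matching the learning rule above.

net = newff(minmax(P), [8 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 1000;    % maximum number of training epochs (assumed)
net.trainParam.goal   = 1e-3;    % target error (assumed)
net = train(net, P, T);          % P: training inputs, T: expected outputs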

The expected outputs of the training samples are [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], and the actual outputs are [0.9975, 0.9973, 0.9973, 0.9975, 0.9974, 0.9961, 0.9902, 0.9975, 0.9975, 0.9971, 0.0611, 0.0339, 0.0200, 0.0560, 0.0339, 0.0339, 0.0129, −0.0103], where an output of 1 indicates illness and 0 indicates health. The training results are shown in Figs. 89.4 and 89.5.

Fig. 89.4 The output of training

Fig. 89.5 The target error of training

3.2 The Result of Simulation

There are 15 testing samples. The expected output is [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], and the real output is [0.9975, 0.9975, 0.9967, 0.9797, 0.0339, 0.9975, 0.0070, 0.0273, 0.9267, 0.9975, 0.9922, 0.0069, 0.9975, 0.0345, 0.3247]. The results are shown in Figs. 89.6 and 89.7.
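
A sketch of how such test outputs become a diagnosis follows; the round-to-nearest decision rule is our assumption, consistent with the error analysis below:

Y = sim(net, Ptest);             % network outputs for the 15 test columns
pred = double(Y >= 0.5);         % round each output to the nearer of 0 and 1
acc = mean(pred == Ttest);       % fraction of correct diagnoses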

Fig. 89.6 The result of the 10-dimensional simulation

Fig. 89.7 The result of the 1-dimensional simulation

3.3 Error Analysis

The simulation results show that the correct rate of prediction is believable under different numbers of samples and different random measurement matrices. Rounding each output to the nearer of 0 and 1, the judgment accuracy for the sick reaches 70 % (7 of the 10 patients), and the judgment accuracy for the healthy is 60 % (3 of the 5 healthy subjects), so the overall accuracy of judging whether a subject is sick or healthy is about 67 % (10 of 15).

4 Conclusion

In this experiment, we analyzed the data, extracted its characteristic values, and then used neural network learning to train on the samples, obtaining the classification results. The simulation results on the test samples are also encouraging, indicating that the approach could be applied in practice to ease the pain patients experience during diagnosis.

The results show that the correct rate of prediction is stable under different numbers of samples and different random measurement matrices. However, more voice data should be sampled to reach a better diagnosis result, and the test accuracy still needs to be improved through better algorithms. We will continue this study in the future.