1 Introduction

Incorporating expert (human) knowledge into intelligent diagnosis systems, or Computer-Aided Diagnosis (CAD) systems, is one of the most interesting but also one of the most difficult problems in the field. Among the difficulties that make this problem challenging are the need for several knowledge representations, fine-grained classification, and decision-making with a certain degree of reliability.

In many applications of interest, it is desirable for the system to not only identify the possible causes of the problem, but also to suggest suitable remedies (systems capable of advising) or to give a reliability rate of the identification of possible causes.

Recently, several decision support systems and intelligent systems have been developed [9, 10], and diagnosis approaches based on such intelligent systems have been developed for biomedical applications [11,12,13,14,15]. Indeed, several approaches have been developed to analyze and classify biomedical signals: electroencephalography signals [12], electrocardiogram signals [13], and particularly signals based on the Auditory Brainstem Response (ABR) test, which is a test of hearing and brain (neurological) functioning [11, 16,17,18].

The analysis and recognition of ABR signals is a medical problem of great importance, since ABR is the best-known technique for evaluating the auditory organs. Constructing a fully automatic method of ABR recognition presents considerable technical difficulties, because the signals are in general hard to read, and the part of the data obtained at low intensities of the audio stimulus is especially difficult to evaluate. The methods of analysis and recognition of ABR signals may also be of interest to other investigators who are not directly interested in audiology but are trying to cope with the interpretation and recognition of entirely different signals.

The aim of this work is not to replace the human specialist but to propose a decision support system with a satisfactory degree of reliability for CAD systems. In this paper we present an original approach for CAD systems, applied in biomedicine to auditory diagnosis based on the ABR test.

We propose to use Bayesian models for the classification of electrical signals that come from a medical test; these signals are called Auditory Evoked Potentials (AEP). AEP are scalp-recorded electrical responses of the brain elicited by acoustical stimuli. Indeed, for about twenty years, otoneurological functional exploration has had a tool to analyze objectively the state of nervous conduction along the auditory pathway. The classification of these signals is difficult, because one class of signal is hard to distinguish from the others, and the results can differ between test sessions for the same patient. Today, taking into account the progress accomplished in the area of intelligent computation or artificial intelligence, it becomes conceivable to develop a diagnosis tool assisting the medical expert. One of the first steps in the development of such a tool is AEP signal classification.

We then propose to use Hidden Markov Models (HMM), and we attempt to illustrate applications of this theory to real, complex pattern-matching problems, such as those related to AEP biomedical diagnosis or to social behavior modeling. We also focus on the K-Means clustering algorithm, which is one of the most widely used iterative partitional clustering algorithms and is based on vector quantization. In particular, we review the theory of discrete HMM and show how the concept of hidden states, with the observation sequences provided by the k-means algorithm, can be used effectively for AEP classification.

In the pattern recognition domain, HMM techniques hold an important place, for two reasons. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. Nowadays, HMM are considered a specific form of dynamic Bayesian networks, based on Bayes' theory [22]. They are a dominant technique for sequence analysis, and they owe their success to the existence of many efficient and reliable algorithms.

HMM are used in many areas in modern sciences or engineering applications, e.g. in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges. Other areas where the use of HMM and derivatives becomes more and more interesting are biosciences, bioinformatics and genetics [1925].

This paper is organized as follows. In Sects. 2 and 3, we introduce the theory and foundations of HMM and vector quantization. In Sect. 4, we present the AEP signals and describe our biomedical pattern classifier, implemented with HMM and vector quantization ideas. In the same section, we present the classification results obtained using a database of 213 AEP-like waveforms, and compare them with alternative implementations based on neural network methods. Finally, in Sect. 5, we summarize the ideas, discuss the potential of the presented technique for social behavior modeling, and give the prospects that follow from our work.

2 Foundation of HMM

An HMM system is typically characterized by the following quantities [6, 7].

2.1 Elements of an HMM

We define the following notation for an HMM:

  1. A set of states \( S_i \), \( i \in [1, N] \), that are unobservable, though there is often a physical meaning attached to them.

  2. A set of M observations. In a discrete HMM [5], M is the number of codebook vectors, i.e., the number of all possible observations. This implies that any observation \( v_t \) is quantized into the set \( \{x_1, x_2, \ldots, x_M\} \), where \( x_m \) is the m-th codebook vector (see Sect. 3).

  3. A set of state “transition” probabilities represented by the matrix \( A = [a_{ij}] \), where \( a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i, \lambda) \), with \( q_t \) being the state visited at time t, \( S_i \) state i, and λ the model defined by the object class and the corresponding training data.

  4. A set of observation probabilities represented by the matrix \( B = [b_i(x_k)] \), where \( b_i(x_k) = P(x_k \mid q_t = S_i, \lambda) \) is the “emission” probability of the k-th quantized observation \( x_k \) at time t from state \( S_i \); if the emission process is assumed not to depend on t, this reduces to \( P(x_k \mid S_i, \lambda) \).

  5. An initial state distribution, i.e., the probability of starting in a given state: \( \pi_j = P(q_1 = S_j \mid \lambda) \).

Given the number of states N and the number of observations M, the parameters A, B and π define the model λ. Three main issues [5] must be addressed in order to maximize the performance of the HMM and identify the model in practical applications. They are briefly described in the following; an in-depth discussion can be found in [5,6,7].
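As a concrete illustration, the model λ = (A, B, π) can be held in a small container that also checks the stochastic constraints (each row of A and of B, and π itself, must sum to 1). The sketch below is only illustrative: the class name `DiscreteHMM`, its `validate` method, and the toy values with N = 2 states and M = 3 symbols are assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    """Hypothetical container for a discrete HMM, lambda = (A, B, pi)."""
    A: List[List[float]]   # A[i][j] = P(q_t = S_j | q_{t-1} = S_i), N x N
    B: List[List[float]]   # B[i][k] = P(x_k | q_t = S_i), N x M
    pi: List[float]        # pi[i] = P(q_1 = S_i), length N

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if shapes or row sums are inconsistent."""
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A and B must have one row per state")
        m = len(self.B[0])
        if any(len(row) != n for row in self.A):
            raise ValueError("A must be N x N")
        if any(len(row) != m for row in self.B):
            raise ValueError("B must be N x M")
        for row in self.A + self.B + [self.pi]:
            if abs(sum(row) - 1.0) > tol:
                raise ValueError("each probability row must sum to 1")


# Toy values, chosen only for illustration (N = 2 states, M = 3 symbols).
toy = DiscreteHMM(
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    pi=[0.6, 0.4],
)
toy.validate()
```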

2.2 The Three Basic Problems of HMM

Given the form of HMM of the previous section, there are three basic problems of interest that must be solved for the model to be useful in real-world applications. These problems are the following:

Problem 1.

Given the observation sequence \( O = o_1 o_2 \ldots o_T \) and a model λ = (A, B, π), how do we efficiently compute P(O|λ), the probability of the observation sequence given the model?

Problem 2.

Given the observation sequence \( O = o_1 o_2 \ldots o_T \) and the model λ, how do we choose a corresponding state sequence \( Q = q_1 q_2 \ldots q_T \) which is optimal in some meaningful sense (i.e., best “explains” the observations)?

Problem 3.

How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?

2.3 Solutions to the Three Problems

Solution to Problem 1: Computing Model Probability.

The answer to the first problem is the forward-backward procedure [5]. This procedure defines the forward variable \( \alpha_t(i) = P(o_1 o_2 \ldots o_t, q_t = S_i \mid \lambda) \), the probability of the partial observation sequence \( o_1 o_2 \ldots o_t \) (up to time t) and of state \( S_i \) at time t, given the model λ; summing \( \alpha_T(i) \) over all states i gives P(O|λ). The forward-backward procedure also provides a backward variable \( \beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = S_i, \lambda) \), which gives the probability of the partial observation sequence from t + 1 to the end, given state \( S_i \) at time t and the model λ. Even though the backward variable is not needed for the first problem, it becomes useful when solving Problem 3 [5].
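The forward and backward recursions can be sketched in a few lines. The code below is a minimal, unscaled illustration in plain Python, assuming observations are already integer codebook indices; the function names and the list-of-lists layout are choices made here, not the paper's implementation. Each recursion costs on the order of N²·T operations, whereas enumerating every state sequence would cost on the order of N^T.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def forward(A: Matrix, B: Matrix, pi: Sequence[float],
            obs: Sequence[int]) -> Matrix:
    """alpha[t][i]: probability of the first t + 1 observations and of
    being in state i at that time (t is 0-based here)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """beta[t][i]: probability of the observations after index t,
    given state i at index t."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def sequence_probability(A: Matrix, B: Matrix, pi: Sequence[float],
                         obs: Sequence[int]) -> float:
    """P(O | lambda), obtained by summing the last forward variables."""
    return sum(forward(A, B, pi, obs)[-1])
```

For long sequences the raw products underflow, so a practical implementation would rescale α and β at each step or work with logarithms; that refinement is omitted here.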

Solution to Problem 2: Optimal State Sequence.

Problem 2 consists in finding the state sequence that best fits the observations. The Viterbi algorithm solves this and finds the single best state sequence \( Q = (q_1 q_2 \ldots q_T) \) for the given observation sequence \( O = (o_1 o_2 \ldots o_T) \) [5, 6].
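A minimal sketch of the Viterbi recursion follows, again assuming integer-coded observations and plain Python lists. The function name and return convention (path plus its joint probability) are illustrative choices. The recursion mirrors the forward pass but replaces the sum over predecessor states by a maximum, and keeps back-pointers to recover the path.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def viterbi(A: Matrix, B: Matrix, pi: Sequence[float],
            obs: Sequence[int]) -> Tuple[List[int], float]:
    """Return the most probable state path and its joint probability."""
    n = len(pi)
    # delta[i]: probability of the best partial path ending in state i.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back_pointers: List[List[int]] = []
    for o in obs[1:]:
        pointers, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        back_pointers.append(pointers)
        delta = new_delta
    # Backtrack from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]
```

As with the forward pass, a production version would work in the log domain to avoid underflow on long sequences.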

Solution to Problem 3: Maximization of P(O|λ).

The third problem is to adjust the HMM parameters to maximize the probability of the observation sequence. Given a finite observation sequence, there is no known way to solve analytically for the parameters that globally maximize this probability. However, it is possible to choose λ = (A, B, π) such that P(O|λ) is locally maximized, by using the Baum-Welch algorithm [5]. The Baum-Welch algorithm works by assigning initial probabilities to all the parameters. Then, until the training converges, it adjusts the probabilities of the parameters so as to increase the probability the model assigns to the training set [7].

The Baum-Welch algorithm (or Baum-Welch expectation maximization algorithm) makes use of both the forward variable \( \alpha_t(i) \) and the backward variable \( \beta_t(i) \) when it determines updated parameters for the HMM. Because of this, the Baum-Welch algorithm is also known as the Forward-Backward algorithm.

To properly estimate the local maximum of P(O|λ), the Baum-Welch algorithm needs several iterations. The algorithm is either repeated a predetermined number of times or until the local maximum is reached, which is taken to be the case when the difference between \( P(O \mid \lambda_{new}) \) and \( P(O \mid \lambda_{old}) \) falls below a chosen threshold [5].
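The re-estimation loop and its stopping rule can be sketched as follows. This is a simplified illustration under several assumptions that are not from the paper: a single training sequence (the paper trains each class on several instances), no scaling, no smoothing of zero counts, and hypothetical function names and default values for `tol` and `max_iter`.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _forward(A, B, pi, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def _backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def baum_welch_step(A: Matrix, B: Matrix, pi: Sequence[float],
                    obs: Sequence[int]) -> Tuple[Matrix, Matrix, List[float], float]:
    """One re-estimation step; also returns P(O | lambda) of the input model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = _forward(A, B, pi, obs), _backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: posterior probability of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_A, new_B, new_pi, likelihood


def train(A, B, pi, obs, tol: float = 1e-9, max_iter: int = 100):
    """Iterate until the likelihood change falls below tol or max_iter is hit."""
    previous = None
    for _ in range(max_iter):
        A, B, pi, likelihood = baum_welch_step(A, B, pi, obs)
        if previous is not None and abs(likelihood - previous) < tol:
            break
        previous = likelihood
    return A, B, pi
```

Because each step cannot decrease P(O|λ), the sequence of likelihoods is non-decreasing, which is what makes the threshold test a usable stopping rule. The result still depends on the initial parameters, since only a local maximum is reached.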

3 Vector Quantization

Several approaches to finding groups in a given database have been developed in the literature, but we focus on the K-Means algorithm (vector quantization) [2], as it is one of the most widely used iterative partitional clustering algorithms and because it may also be used to initialize more expensive clustering algorithms (e.g., the EM algorithm).

k-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori.

As can be seen in Fig. 1, where the pseudo-code is presented, the k-means algorithm starts from an initial partition of the database, and the centroids of these initial clusters are calculated. Then, the instances of the database are relocated to the cluster represented by the nearest centroid, in an attempt to reduce the square-error. When a relocation (step 3) changes the cluster membership of an instance, the centroids of the two clusters involved, \( C_s \) and \( C_t \), and the square-error are recomputed. This process is repeated until convergence, that is, until the square-error cannot be further reduced, which means that no instance changes its cluster membership [1].

Fig. 1. The pseudo-code of the k-means algorithm.
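For readers who prefer code to pseudo-code, a minimal k-means sketch is given below. It is a batch variant (all instances are reassigned, then all centroids are recomputed), which differs slightly from the one-instance-at-a-time relocation described in the text, and it initializes the centroids from randomly sampled instances rather than from an initial partition. The function names, the `seed` parameter and the squared Euclidean distance are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(data: Sequence[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (centroids, labels) after convergence or max_iter passes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(data), k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assign every instance to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:   # no instance changed cluster: converged
            break
        labels = new_labels
        # Recompute the centroid of every non-empty cluster.
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return centroids, labels
```

The stopping test is the one described in the text: iteration ends when no instance changes its cluster membership.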

For the case in which we wish to use an HMM with a discrete observation symbol density, rather than continuous observation vectors, a vector quantizer (VQ) is required to map each continuous observation vector into a discrete codebook index. Once the codebook of vectors has been obtained, the mapping between continuous vectors and codebook indices becomes a simple nearest neighbor computation, i.e., the continuous vector is assigned the index of the nearest codebook vector. Thus the major issue in VQ is the design of an appropriate codebook for quantization.

Fortunately a great deal of work has gone into devising an excellent iterative procedure for designing codebooks based on having a representative training sequence of vectors [2]. The procedure basically partitions the training vectors into M disjoint sets (where M is the size of the codebook), represents each such set by a single vector (\( v_m \), 1 ≤ m ≤ M), which is generally the centroid of the vectors in the training set assigned to the m-th region, and then iteratively optimizes the partition and the codebook (i.e., the centroids of each partition). Associated with VQ is a distortion penalty since we are representing an entire region of the vector space by a single vector. Clearly it is advantageous to keep the distortion penalty as small as possible.
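The nearest-neighbor mapping and the distortion penalty can be made concrete with a short sketch. It assumes a codebook that has already been designed (for instance by k-means) and uses the squared Euclidean distance as the distortion measure, which is a common choice but an assumption here; the function names are illustrative.

```python
from typing import Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the nearest codebook vector."""
    return min(range(len(codebook)),
               key=lambda m: sq_dist(vector, codebook[m]))


def average_distortion(vectors: Sequence[Vector],
                       codebook: Sequence[Vector]) -> float:
    """Mean squared distance between each vector and its codebook entry."""
    total = sum(sq_dist(v, codebook[quantize(v, codebook)]) for v in vectors)
    return total / len(vectors)
```

Evaluating `average_distortion` for codebooks of increasing size M is one way to obtain a distortion-versus-M curve of the kind used later to choose the codebook size.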

4 Validation on Biomedical Classification Paradigm

For our experiment, consider using HMM to build a biomedical classifier, or CAMD tool. Assume we have a vocabulary of 3 classes to be recognized (Normal class, Endocochlear class, Retrocochlear class), and that each category is modeled by a discrete HMM. For this type of HMM, there is a limited number of observations (in our case, the number of clusters) that can be made.

Further assume that for each category in the vocabulary we have a training set of k occurrences (instances), where each instance constitutes an observation sequence and the observations are some appropriate representation of the characteristics of the class. In order to build a CAMD tool, we perform the following operations:

  1. For each class v in the vocabulary (3 classes in our work), we must build an HMM \( \lambda^{v} \), i.e., we must estimate the model parameters (A, B, π) that optimize the likelihood of the training set observation vectors for the v-th class.

  2. For each unknown instance to be recognized, the processing of Fig. 2 must be carried out, namely: measurement of the observation sequence \( O = \{o_1, o_2, \ldots, o_T\} \) via a feature analysis of the corresponding signal; calculation of the model likelihoods \( P(O \mid \lambda^{v}) \), 1 ≤ v ≤ V, for all possible models; and selection of the class whose model likelihood is highest, i.e., \( v^{*} = \arg\max_{1 \le v \le V} P(O \mid \lambda^{v}) \)

    Fig. 2. Block diagram of a biomedical database HMM recognizer.

The probability computation step is generally performed using the Viterbi algorithm (i.e., the maximum likelihood path is used) and requires on the order of \( V \cdot N^{2} \cdot T \) computations.
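The decision rule amounts to scoring the observation sequence against every class model and keeping the best one. The sketch below scores with the forward probability P(O|λ) rather than the Viterbi path probability mentioned above, which is a simplification chosen here; the three models and all their values are hypothetical and serve only to make the example runnable.

```python
from typing import Dict, List, Sequence, Tuple

# A model is the triple (A, B, pi).
Model = Tuple[List[List[float]], List[List[float]], List[float]]


def likelihood(model: Model, obs: Sequence[int]) -> float:
    """P(O | lambda) computed with the forward recursion."""
    A, B, pi = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def classify(models: Dict[str, Model], obs: Sequence[int]) -> str:
    """Return the name of the model with the highest likelihood."""
    return max(models, key=lambda name: likelihood(models[name], obs))


# Hypothetical two-state, two-symbol models; the values are illustrative only.
models: Dict[str, Model] = {
    "normal": ([[0.9, 0.1], [0.2, 0.8]],
               [[0.9, 0.1], [0.6, 0.4]], [0.5, 0.5]),
    "endocochlear": ([[0.9, 0.1], [0.2, 0.8]],
                     [[0.2, 0.8], [0.3, 0.7]], [0.5, 0.5]),
    "retrocochlear": ([[0.5, 0.5], [0.5, 0.5]],
                      [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]),
}
```

With V models of N states and a sequence of length T, the loop over models costs on the order of V·N²·T operations, in line with the count given above.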

4.1 Background of Brainstem AEP Clinical Test

When a sense organ is stimulated, it triggers a chain of complex neurophysiological processes. Brainstem auditory evoked potentials (BAEP) are electrical responses caused by the brief stimulation of a sensory system. The stimulus initiates a train of action potentials that can be recorded along the course of the nerve, or at a distance from the activated structures. The ABR comprises the early portion (0–12 ms) of the AEP and is composed of several waves or peaks. BAEP are generated as follows (see Fig. 3): the patient hears clicks or tone bursts through earphones, and these auditory stimuli evoke an electrical response.

Fig. 3. Brainstem auditory evoked potentials clinical test. (Source: Ref. [3], p. 120)

In fact, the stimulus triggers a number of neurophysiological responses along the auditory pathway. An action potential is conducted along the eighth nerve, the brainstem, and finally to the brain. Shortly after the initial stimulation, the signal evokes a response in the area of the brain where sounds are interpreted. AEP are considered the most objective measure currently available for determining the functional integrity of the peripheral auditory nervous system.

These response signals have small amplitude, and so they are frequently masked by the background noise of electrical activity. Indeed, the response is obtained by extraction from the noise by the principle of averaging. The firing of neurons results in small but measurable electrical potentials. The specific neural activity arising from acoustic stimulation, a pattern of voltage fluctuations lasting about one half second, is an AEP. With enough repetitions of an acoustic stimulus, signal averaging permits AEPs to emerge from the background spontaneous neural firing (and other non-neural interferences such as muscle activity and external electromagnetic generators), and they may be visualized in a time-voltage waveform.
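The averaging principle can be illustrated with a small simulation: a fixed response buried in independent zero-mean noise is recovered more accurately as more sweeps are averaged, with the residual noise shrinking roughly as one over the square root of the number of sweeps. Everything in the sketch is synthetic and assumed for illustration (the sinusoidal template, its amplitude, the Gaussian noise model and the function names); it does not reproduce the actual acquisition chain.

```python
import math
import random
from typing import List, Sequence


def average_sweeps(sweeps: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point average of equally long sweeps."""
    return [sum(column) / len(sweeps) for column in zip(*sweeps)]


def rms(values: Sequence[float]) -> float:
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate(n_sweeps: int, noise_sd: float = 1.0, seed: int = 0) -> float:
    """RMS error between the averaged sweeps and the noise-free template."""
    rng = random.Random(seed)
    # Synthetic stand-in for the evoked response (hypothetical shape).
    template = [0.2 * math.sin(2 * math.pi * t / 32) for t in range(128)]
    sweeps = [[s + rng.gauss(0.0, noise_sd) for s in template]
              for _ in range(n_sweeps)]
    averaged = average_sweeps(sweeps)
    return rms([a - s for a, s in zip(averaged, template)])
```

Calling `simulate` with increasing `n_sweeps` shows the residual error falling as more repetitions of the stimulus are averaged, which is why several hundred acquisitions are needed before the waveform becomes readable.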

Depending upon the type and placement of the recording electrodes, the amount of amplification, the selected filters, and the post-stimulus timeframe, it is possible to detect neural activity arising from structures spanning the auditory nerve to the cortex. Estimating hearing threshold from BAEP signals is a time consuming and labor intensive procedure, and therefore one which recommends itself to computerized automation. The important step is the classification of the signals into Response (R) and No Response (NR) classes, the main difficulties being a poor signal-to-noise ratio and the differentiation of response peaks from artifacts.

The ABR waves or peaks, labelled using Roman numerals I–VII as shown in Fig. 4, are typically 1 ms apart and have amplitudes of about 100–500 nanovolts. Waves I, III and V are generally considered major peaks, generated by the synchronous electrical activity of the auditory nerve, caudal and rostral auditory brainstem structures, respectively, in response to onset of auditory stimuli. This test provides an effective measure of the integrity of the auditory pathway up to the upper potential level.

Fig. 4. Perfect AEP. (Source: Ref. [3], p. 85)

An extraction technique presented in [5] allows us, following 800 acquisitions such as those described above, to visualize AEP estimates computed on averages of 16 acquisitions. Thus, a surface of 50 estimates, called the Temporal Dynamic of the Cerebral trunk (TDC), can be visualized. The software developed for the acquisition and processing of the signals is called ELAUDY. It provides the average signal, which corresponds to the average of the 800 acquisitions, and the TDC surface. Figure 5 (extracted from [5]) shows two typical surfaces, one for a patient with normal hearing (A) and the other for a patient who suffers from an auditory disorder (B). This figure shows the large variety of AEP signals, even for the same patient. Moreover, this software automatically determines, from the average signal, the five significant peaks and gives the latency of these waves. It also allows us to record a file for each patient, which contains administrative information (address, age, …), the results of the tests and the doctor’s conclusions (pathology, cause, confidence index of the pathology, …).

Fig. 5. TDC surfaces (A: normal patient, B: patient with auditory disorder). (Source: Ref. [3], p. 85)

The AEP signal and the TDC technique are important for diagnosing auditory pathologies. However, medical experts still have to visualize all the results of the auditory tests before making a diagnosis.

4.2 Classification and Decision-Making

We first emphasize, through some examples, the difficulty of classification in diagnosis systems. In the biomedical application described in this section, three patient classes are studied: patients with a retro-cochlear auditory disorder (Retro-cochlear Class: RC), patients with an endo-cochlear auditory disorder (Endo-cochlear Class: EC), and healthy patients (Normal Class: NC). The AEP signals obtained from the exam and their associated pathologies are stored in a database containing the files of 11 185 patients. We chose 3 categories of patients (3 classes) according to the type of their disorder. The categories of patients are the following:

  1. Normal: the patients in this category have normal hearing (normal class).

  2. Endocochlear: these patients suffer from a disorder that affects the part of the ear situated before the cochlea (endocochlear class).

  3. Retrocochlear: these patients suffer from a disorder that affects the part of the ear situated at the level of the cochlea or after the cochlea (retrocochlear class).

We selected 213 signals (each corresponding to a patient). Every signal contains 128 parameters; we kept the parameter values used in the work described in [3, 4], which uses the LVQ and RBF neural structures respectively. Of the 213 signals, 92 belong to the normal class, 83 to the endocochlear class, and 38 to the retrocochlear class. Figure 6 shows two examples of signal representations for each class (RC, EC, and NC), i.e., six patients in total. Figure 7 shows image representations for the same six patients. These figures illustrate the fact that signal or image representations can be very similar for patients belonging to different classes, and very different for patients belonging to the same class, which demonstrates the difficulty of their classification.

Fig. 6. Two examples of signal representations for RC patients, EC patients, and NC patients.

Fig. 7. Two examples of image representations for RC patients, EC patients, and NC patients.

4.3 Analysis of the Signal

Raw BAEPs were amplified and bandpass filtered (100–3000 Hz) to remove the EEG component and high frequency noise. A post stimulus signal of 12.8 ms was sampled at 40 kHz to give 512 data points. Since these raw signals are extremely noisy, standard procedure was to coherently average 1024 of such signals to give a single BAEP signal. This signal can be used for classification, but in this study the signals were further reduced by sampling every eighth value between 1 ms and 11 ms. The resulting signal of 50 data points was normalized between 0 and 1. A data set of 321 such input signals was obtained, which included various combinations of hearing impaired and normal subjects and varying stimulus intensities.
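The arithmetic of this reduction is easy to check: 12.8 ms at 40 kHz gives 512 samples, the window from 1 ms to 11 ms covers 400 of them, and keeping every eighth one leaves 50 points. The sketch below reproduces these steps under some assumptions made here: the window includes its start sample and excludes its end sample, the normalization is a min-max rescaling, and the function name and parameter names are hypothetical.

```python
from typing import List, Sequence


def reduce_baep(signal: Sequence[float], fs_hz: float = 40000.0,
                start_ms: float = 1.0, stop_ms: float = 11.0,
                step: int = 8) -> List[float]:
    """Keep every `step`-th sample in [start_ms, stop_ms) and rescale to [0, 1]."""
    start = round(start_ms * fs_hz / 1000.0)   # 40 with the default values
    stop = round(stop_ms * fs_hz / 1000.0)     # 440 with the default values
    reduced = list(signal[start:stop:step])
    low, high = min(reduced), max(reduced)
    if high == low:                            # flat signal: avoid dividing by zero
        return [0.0] * len(reduced)
    return [(v - low) / (high - low) for v in reduced]
```

Applied to a 512-point averaged signal, the function returns 50 values whose minimum is 0 and whose maximum is 1.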

4.4 Case Study and Experimental Results

The aim of our work is to classify the AEP signals using HMM models. In our case, the components of input vectors are the samples of the BERA average signals and the output vectors correspond to 3 different possible classes.

To build our training set, we chose the signals corresponding to pathologies indicated as certain by the physician. All AEP signals come from the same experimental system. To evaluate this work and to compare its performance with that of the work described in [3, 4], which uses neural network structures based on RBF and LVQ networks, the training set contains 141 signals, of which 25 correspond to the retrocochlear class, 55 to the endocochlear class and 61 to the normal class. The ratio of class sizes (the R:NR ratio) in the training set was chosen as 3:1, reflecting the approximate ratio in a clinical setting.

The test set consisted of 213 signals with the same ratio of three R signals to one NR signal. No signals from any of the same subjects used in the training set were included, which added considerably to the difficulty of the learning task.

After the training phase, when signals not seen during training are presented to the HMM, the corresponding class must be identified.

The k-means process converged at the 11th iteration. Figure 8 illustrates this convergence, and Fig. 9 the tradeoff of quantization distortion versus M (on a log scale). Although the distortion steadily decreases as M increases, it can be seen from Fig. 9 that only small decreases in distortion accrue beyond a value of M = 26. Hence HMM with codebook sizes from M = 26 to 64 vectors have been used in the biomedical database recognition experiments.

Fig. 8. Convergence process of the k-means algorithm on the AEP database.

Fig. 9. Curve showing the tradeoff of VQ average distortion as a function of the size of the VQ, M (shown on a log scale).

Table 1 presents a sample of the clustering for two instances taken at random, after the convergence iteration. Note that each instance has been divided into 16 windows of 8 parameters each. To create the HMMs in this project, we used an HMM implementation for Matlab called “Hidden Markov Model (HMM) Toolbox”.

Table 1. The result of clustering of two instances after the iteration of convergence.
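The step from a 128-parameter instance to a 16-symbol observation sequence can be sketched as follows. The sketch assumes non-overlapping consecutive windows (consistent with 16 × 8 = 128, but not stated explicitly) and a squared Euclidean nearest-neighbor rule; the function names and the two-entry codebook in the usage note are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_observation_sequence(signal: Sequence[float],
                            codebook: Sequence[Vector],
                            window: int = 8) -> List[int]:
    """Cut the signal into consecutive windows and quantize each one."""
    if len(signal) % window != 0:
        raise ValueError("signal length must be a multiple of the window size")
    symbols = []
    for start in range(0, len(signal), window):
        frame = signal[start:start + window]
        # Nearest codebook vector gives the discrete observation symbol.
        symbols.append(min(range(len(codebook)),
                           key=lambda m: sq_dist(frame, codebook[m])))
    return symbols
```

For example, with a toy codebook of two 8-dimensional vectors, a 128-point signal yields a list of 16 cluster indices, which is the kind of sequence a discrete HMM takes as input.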

We have three HMM representing respectively: HMM1: normal class; HMM2: endo class; HMM3: retro class. The final parameters of the model that represents the class of normal patients are as follows:

The most likely sequence of states and the observing probability of “Instance1” to the HMM1, HMM2, HMM3 respectively are as follows:

With the HMM, the training set was recognized correctly, with a 100% rate. In the generalization phase, the results obtained by the HMM recognition system on the biomedical test set are as follows.

  1. 98.38% for the normal class.

  2. 58.93% for the endocochlear class.

  3. 96.15% for the retrocochlear class.

Thus, the average success rate is 84.48% over the whole database, compared with classification rates of 63.7% and 62.5% for the systems using the LVQ and RBF neural structures respectively, described in [3, 4]. The results obtained for each class with the system proposed in this paper, compared with those of the neural network structures, are presented in Table 2.

Table 2. Results of classification of the AEP

Table 2 presents the results obtained with 26 clusters, the number needed to obtain satisfactory classification results. The first observation is that the application of HMM to this biomedical database proved very effective for both the normal and retro classes, with classification rates of 98.38% and 96.15% respectively, while performance degrades for the endo class, with a rate of 58.93%. In fact, the signals within the endo class can be very different from one another compared with the other classes, which demonstrates the difficulty of their classification. Comparing these results with those obtained with the LVQ and RBF connectionist approaches shows an improvement in overall performance.

5 Conclusion and Perspective

In this paper, Bayesian approaches are proposed for CAD systems in a biomedical application: auditory diagnosis based on the ABR test. The aim is to achieve an efficient and reliable CAD system for three classes: two auditory pathologies, RC and EC, and normal hearing, NC. Implementation and experimental results are presented and discussed.

These experiments were carried out offline, and much remains to be done. The results we obtained concern only a qualitative approach in a static context. Nevertheless, this preliminary study will allow us to propose other models that compensate for the shortcomings of HMM.

The main idea is to define a fusion scheme: the cooperation of HMM with a multi-network structure, in order to obtain a hybrid model providing more effective results than those presented in this paper. In addition, we have studied the potential of HMM modeling for estimating the behavior state of artificial bots in a social negotiation context, in the framework of the European Sistine project [8].

In that case, the bot's state may be neutral, aggressive or conciliatory. These bots are used in the development of innovative training practices for the teaching of negotiation, leading to the development of new teaching and evaluation methodologies. Although such a technique shows a number of strong theoretical advantages, its implementation in a real-time interactive multi-user tool unfortunately remains unsuitable.