1 Introduction

The loss or the disruptions of sleep result in sleepiness during periods when the person should usually be fully awake. The loss of even one night’s sleep can lead to extreme short-term sleepiness. The effects of sleep loss are cumulative and regularly losing one or two hours of sleep a night can result to chronic sleepiness over time

Sleep deprivation and related phenomena of excessive fatigue, prolonged inattention, hypovigilance and stress are among the key causes of serious industrial accidents such as nuclear, chemical and environmental disasters, as well as fatal driving accidents [1].

An automated sleepiness monitoring system could watch over people to make sure that the alertness and attention levels are high and warn or even take predefined measures when extreme hypovigilance is detected, in order to prevent an accident. This kind of system could increase the level of safety for everyone since it can be applied to a wide range or users, from regular drivers to sensitive equipment operators.

Several monitoring systems for the automatic hypovigilance detection have been developed over the past years. The majority of those systems focus on the diagnosis of the physiological demonstration of sleepiness, by recording and analyzing features that in most cases are related to the person’s blinking behavior.

Even though blink-related features intuitively and experimentally [2] seem to be the most suitable candidates for hypovigilance detection, studies show that these features are not accurate and reliable enough since they exhibit strong interpersonal (between persons) and intrapersonal (same person different times) variabilities. Aiming to address the limitations of the current hypovigilance detection and accident warning systems, we develop a new multimodal sleep prediction algorithm, which will be integrated into an automatic accident warning and sleep prediction prototype for drivers within the Integrated Project SENSATION (http://www.sensation-eu.org).

The major objective of SENSATION is the development of new, unobtrusive sensors, capable of providing measurements that allow the online extraction of advanced physiological features that are not currently available to the existing warning systems. The exploitation of these features will potentially allow more accurate hypovigilance detection and the development of more reliable sleep prediction systems (less false warnings).

In this paper we describe the framework for such a multimodal physiological sleep prediction system, which is based on fuzzy logic expertise and trained with the use of a real-coded Genetic algorithm. Also some preliminary results from the analysis of the training data, concerning the accident prediction effectiveness of blink-related features are reported.

2 The fuzzy expert system (FES) in general

Fuzzy logic is a research area based on the principles of approximate reasoning and computational intelligence. It departs from classical sets, logic and strict Boolean (True or False) decisions and assignments. Instead, it uses soft linguistic variables (e.g., small, medium, large), and a continuous range of truth-values in the interval [0, 1]. Fuzzy models are employed in cases where a system is difficult to model exactly (but an inexact model is available), or ambiguity and vagueness is encountered in the problem formulation.

A typical fuzzy system comprises the following key parts:

  • A rule base containing a number of IF-THEN rules.

  • A fuzzy inference unit, which performs the inference operations of the rules.

  • The fuzzification interface which transforms crisp inputs into fuzzy variables that are processed by the fuzzy inference unit.

  • The defuzzification interface that transforms the fuzzy output into a crisp number.

Expert knowledge can be “stored” in a fuzzy system’s IF-THEN rules. This transfusion of knowledge in the system can take place either by the manual definition of the fuzzy rules, or by the training of the system using training cases or patterns. After the fuzzy rules are defined, the system is capable of making inferences and its output or decision simulates the one of an expert‘s. In that way the system is called FES.

The fuzzy inference system suggested by Takagi, Sugeno and Kang (TSK fuzzy model) [12] has gained a great importance in several applications in fuzzy modelling and control. The TSK fuzzy models consist of linguistic fuzzy rules represented in the following form:

$${\begin{aligned} \,& \hbox{R}^{(j)}: \mathbf{IF} (x_{\rm p,1} \,\hbox{is}\, A^{j}_{1} )\,\hbox{AND} \ldots \hbox{AND}\, (x_{p,{\rm NPI}}\,\hbox{is}\, A^{j}_{{\rm NPI}})\\ \,&\quad \mathbf{THEN}\,y_{j} = \hbox{F}_{j} (x_{\rm c,1}, \,x_{\rm c,2}, \ldots, x_{\rm c,NCI}) j=1,...,\hbox{NR} \end{aligned}}$$
(1)

where NR is the number of fuzzy rules.

The “IF” precondition statements define the premise part while the “THEN” rule functions constitute the consequent part of the fuzzy model.

  • \({\bar{X}_{p} = [x_{p,1}, \ldots, x_{p,{\rm NPI}}]^{\rm T}}\) is the input vector to the premise part comprising NPI input variables.

  • A j i are labels of fuzzy sets describing linguistically the input component x p,i  i = 1, ..., NPI. (e.g., “low”, “medium”, “high”).

  • \({\bar{X}_{c} = [x_{c,1}, \ldots, x_{c,\hbox{NCI}} ]^{\rm T}}\) denotes the input vector to the consequent part of R (j) containing NCI input variables.

Finally, \({y_j = F(\bar{X}_{c})}\) represents the output of the j-th rule which is a function of the consequence part input components x c,i , i = 1,..., NCI. A special case of particular importance is encountered when the rule functions are linear polynomials of the consequent inputs:

$${y_{j} = F(\bar{X}_{c}) =\lambda^{j}_{0} + {\sum\limits_{i = 1}^{\rm NCI} {\lambda^{j}_{i}\,x_{{c,i}}}}}$$
(2)

where λj i are weight coefficients and λj 0 is a bias term.

Each linguistic label A j i is associated with a membership function μj i (x p,i ). These are usually unimodal functions (triangular, Gaussian, bell shaped, etc.), taking values in the interval [0, 1]. In this paper we employ Gaussian type memberships described by

$${\mu^{j}_{i} (x_{{p,i}}) = \exp {\left[ {- \frac{1}{2}\frac{{{\left({x_{{p,i}} - m^{j}_{i}} \right)}^{2}}}{{{\left({\sigma^{j}_{i}} \right)}^{2}}}} \right]}}$$
(3)

where m j i and σj i are the mean value and the standard deviation of the membership function, respectively (Fig. 1a).

Fig. 1
figure 1

a Assuming the “very long blinks duration per minute” feature is a premise input, x p,1, three fuzzy sets A 1,1,   A 1,2 and  A 1,3 can express the linguistic propositions that the measured “very long blinks duration per minute” is “Low”, “Medium” or “High”, respectively. Thus, for a specific sample x p,1 = 4 s the memberships for each of the fuzzy sets are 0.2, 0.62, and 0.0 respectively and the measured “very long blinks duration” is linguistically described as “medium to low”. b Three membership functions Ai,1(xp,i),  Ai,2(xp,i),  and Ai,3(xp,i) are used for each premise input i = 1,2, to express linguistic properties of the inputs, forming nine fuzzy regions that define the boundaries of the system’s fuzzy rules.

The firing strength of the rule R (j), representing the degree to which R (j) is excited by a particular premise input vector \({\bar{X}_{p} },\) is determined by

$${\mu_{j} (\bar{X}_{c}) = {\prod\limits_{i = 1}^{\rm NPI} {\mu^{j}_{i} (x_{{p,i}})}} .}$$
(4)

The antecedent fuzzy sets pertaining to a rule R (j) define a fuzzy region within the premise space (Fig. 1b)

$${{\mathbf{A}}^{(j)} = A^{j}_{1} \times A^{j}_{2} \times \cdots \times A^{j}_{{\rm NPI}}. }$$
(5)

Essentially, A (j) represents a multidimensional fuzzy set with a membership distribution defined by (4).

Using the notation above, the TSK rule can be brought in the following compact form:

$${R^{(j)} : \quad \hbox{IF} \quad \bar{X}_{p} \,is\, A^{(j)} \quad \hbox{THEN} \quad y_{j} = F_{j} {\left({\bar{X}_{c}} \right)}.}$$
(6)

Given the input vectors \({\bar{X}_{p} }\) and \({\bar{X}_{c} },\) the final output of the fuzzy model is inferred using the weighted average defuzzification method [12] as follows:

$${y = \frac{{{\sum\nolimits_{j = 1}^{\rm NR} {\mu_{j} (\bar{X}_{p}) \cdot F_{j} (\bar{X}_{c})}}}}{{{\sum\nolimits_{j = 1}^{\rm NR} {\mu_{j} (\bar{X}_{p})}}}}}$$
(7)

From the above description, it can be seen that the basic philosophy of the TSK model is to decompose the premise space into fuzzy regions \({{\bf A}^{(j)} }\) and approximate the system’s behaviour in every region by a simple submodel \({F(\bar{X}_{c})}.\) Thus, the overall model can be regarded as a fuzzy blending of linear submodels with simpler structure.

3 Accident prediction fuzzy expert system

Our objective is to develop a TSK fuzzy model that provides early warnings for accidents that are due to driver’s hypovigilance or sleep onset, based on physiological features. The fuzzy decomposition of the premise space should allow the discrimination between different physiological demonstrations of extreme sleepiness and address the inter-personal variability. To detect all the different signs by which people exhibit extreme hypovigilance just before the sleep onset, we have to select the appropriate physiological features that describe adequately these ways.

3.1 Selection of the physiological inputs

To construct the fuzzy model structure, a number of premise inputs x p,1, ..., x p, NPI should be properly selected. These are the decision variables that constitute the premise space and will allow the formulation of rules (discrete cases). Each premise variable will then be partitioned by a certain number of fuzzy sets that cover adequately its universe of discourse as shown in Fig. 1a.

The number of premise inputs should be as small as possible. A reasonable choice is to select one or two inputs. This is dictated by our requirement to keep the number of rules to an acceptably low level. However the great inter-personal variability of the physiological signs that characterise the phase prior to the onset of sleep may require the use of several features that will serve as FES premise inputs. There are several studies in the literature that aim to determine the appropriate physiological signals that allow hypovigilance diagnosis from a broad set of candidate inputs [5, 6]; however, most of them are inconclusive and there seems to be no golden standard in feature selection or combination of features that can lead to a fool-proof prediction system.

The physiological features that are related to hypovigilance are

  • EEG features such as alpha and theta waves,

  • eyelid activity features such as long blinks,

  • eye activity related features such as slow eye movements (SEM)

  • and pupillography.

However, since EEG and SEM data can only be acquired via electrodes, they cannot be used for online predictions due to restrictions stemming from user unobtrusiveness requirements. Because of this, EEG analysis is only used as a reference and we can only utilise eyelid activity features (blinks) that can be recorded unobtrusively with CMOS cameras.

For the proposed FES, the decision on the blink-related features selection was taken following a two-step process:

  1. 1.

    Literature review study in order to pinpoint the most promising features for the discrimination of the various behaviours prior to sleep [5, 6] and also following the guidelines over the use of various physiological signals (eyelid related, eye movement related and EEG related) for hypovigilance diagnosis and sleep prediction provided by [7, 10].

    Fig. 2
    figure 2

    Parametric analysis of the “number of long blinks” feature. The variables are the duration of a “long blink” and the number of long blinks during a 20 s window that slides every five seconds. Even though the sensitivity of the feature is good (e.g., when 5 detected blinks with duration over 0.2 s around 70% of the hits are predicted within the next 2 min), the specificity of the system is not acceptable (more than 40% of the warnings are inaccurate)

  2. 2.

    Experimental parametric analysis of the above-mentioned features using real driving data from 37 subjects [3], in order to select the features with the highest correlation to accidents (Fig. 2).

4 Genetic algorithm (GA) training of the FES parameters

The objective of the FES training is to set the values of the premise and consequence part variables in such way as to predict as accurately as possible the accidents, based on the eyelid-related features that are used as inputs. The training patterns have the following structure:

$${{\left[ {\left. {\bar{X}_{p}} \right| \quad \left. {\bar{X}_{c}} \right| \quad Y_{{\rm ACCIDENT}}} \right]}}$$

where \({\bar{X}_{p} }\) and \({\bar{X}_{c} }\) are the input vectors to the premise and the consequent part, respectively (blink-related features) and Y ACCIDENT is a binary value that indicates whether an accident happened at that moment (“1”) or not (“0”). We must note here that the accidents are filtered based on EEG and EOG analysis in order to take into considerations only those accidents that are due to hypovigilance [4].

For the training of the accident prediction FES a real-coded GA is used. For this GA implementation the parameters of the premise and the consequence parts are concatenated in order to form a genotype or chromosome which is a consolidated representation of a FES. The premise parameters are the mean values and standard deviations of the membership functions that partition the premise inputs. These variables define fully the membership functions and also set the boundaries of the fuzzy rules (the IF part of the rules).

The consequence part parameters are the λ j i i = 0,..., NCI, j = 1, ...,NR coefficients that define the output of each fuzzy rule as shown in (7).

All training parameters, as well as the training patterns’ data are normalised in the [0, 1] space. An obvious advantage of the real-coded GA over binary-coded GAs is that with the direct encoding of floating-point numbers in the chromosomes we achieve absolute precision, overcoming the critical decision of the number of bits to be used for the encoding of each FES parameter.

The training process of the FES using GA begins with the random generation of an initial population of m genotypes. The quality of the solution that a specific genotype represents is measured by calculating its fitness following the next steps:

  1. (a)

    Decomposition of the chromosome into FES premise and consequence parameters.

  2. (b)

    Calculation of FES output for each training pattern.

Equation (3) provides the memberships of the training pattern to the fuzzy sets that partition the premise inputs. Then the pattern’s firing strength for each fuzzy rule is calculated (4). Each rule has an output that corresponds to the specific pattern as is shown in (2). The overall output of the FES for the specific pattern is the weighted average of the fuzzy rules’ outputs as shown in (7). Each rule’s contribution to the final solution is analogous to the degree that the pattern triggers the specific rule.

  1. (c)

    Calculation of the chromosome’s fitness.

The FES output is compared with a threshold. The threshold is also part of the chromosome, hence trainable as well. If the output of the FES is larger than the threshold then the expert system produces an accident warning (“1”). If not, the system’s output is “0”. The outputs of the system are compared to the actual accidents and a measure of accuracy is calculated:

$${\hbox{FitnessFunction} = \frac{{1 + \hbox{shp}(\%)}}{{1 + \hbox{far}(\%)}}}$$
(8)

where shp(%) is the successful hit prediction ratio, defined as the percentage of hits that were predicted and far(%) is the false alarm ratio which is defined as the percentage of FES warnings that did not correspond to an accident up to 2 min ahead.

As it can be seen from (8) this fitness function promotes the sensitivity (promoting accurate predictions) and the specificity of the system (by minimising false alarms).

The GA is allowed to evolve for a number of generations. The evolution takes place using the well-known genetic operators of selection, crossover [9] and mutation [8]. The final FES derives from the elite solution of the GA at the final generation. Upon termination of the training process, the quality of the obtained model is verified with the validation dataset. While GA training lasts from minutes to some hours, depending on the size of the measurements database, the on-line predictions that are based on real time measurements are attained instantly.

5 Experimental results

A driving simulator experiment was carried out at the VTI simulator in Sweden in order to collect data for the development of SENSATION physiological prediction algorithm [3].

In order to acquire an objective accident reference, rumble strips were used in the simulator run (Fig. 3a). Hitting a rumble strip can indicate a critical event since it can be attributed to either inattention or sleepiness. In the latter case this is the event that should have been predicted (or detected) by the FES. Forty-four shift workers participated in the pilot programme. All drove during morning hours directly after a full night-shift with no sleep. Driving took place on a two-lane rural road with milled rumble strips both in connection to the centre line and the right side line (hard shoulder). All drivers drove between 45 and 90 min. A total of over 500 rumble strip hits were recorded. These rumble hits were afterwards assessed offline by a VTI expert in order to keep only the hits that were clearly due to extreme hypovigilance. A wide range of data was also collected: pre and post questionnaires, sleep diary 3 days prior to the test, simple reaction time test, pupillometry, driving behaviour, subjective sleepiness ratings (KSS), physiological data (EEG, EOG, EMG) and eye gaze and eye lid opening with a video based system (SmartEye). Furthermore, all driving sessions were recorded on a DVD with a 4-split screen showing the road, the driver’s face, side view of the driver and the physiological recording (Fig. 3b).

Fig. 3
figure 3

a The typical driving simulation environment. The rumble strips are located in the middle and at the edges of the road [3]. b DVD recording layout SmartEye display (top left), physiological signals from the VITAPORT 2 system (right top), driver front view—figures at top show subject number, driven distance, speed and type of rumble strip design [3]

A FES was developed and trained as described in Sects. 2, 3 and 4 using 28 driving sessions and it was tested using seven new driving sessions. The other nine sessions were discarded due to technical problems during the recordings. Since a high-speed camera was not available at that time only blink data from the SmartEyes system (60 Hz cameras) were available, namely blink start, peak, end timestamps and the blink duration (Fig. 4).

Fig. 4
figure 4

Blink induced waveform in the EOG-signal and a concurrent signal shift due to vertical eye movements. “T” is the blink duration, and “1/2” indicates half the amplitude of the waveform within the rise-part and the drop-part

From this data only conventional blinking features could be extracted for use with the FES such as:

  • Long blinks duration: the blinks in the 20 s window are filtered and only the ones lasting more than 0,3 s are kept. If the number of long blinks is larger than 2 the sum of their durations is the long blink duration feature. Else the LBD = 0.

  • Maximum closing duration: closing duration is calculated from the data as the interval between the peak and the end of the blink. The maximum closing interval registered within the 20 s window is the MCD feature.

  • Maximum interval between blinks is defined as the interval between the end of the current blink and the beginning of the next.

The parameters of the features e.g., the duration that characterises a blink as “long” were selected based on the parametric analysis described in Sect. 3 using the sensitivity/specificity ratio as criterion. Preliminary tests resulted in 92% accuracy in hit prediction accompanied by 30% false alarms (alarms fired more than 5 minutes prior to the “accident”) and as Table 1 indicates the fusion of the features results in better prediction accuracy compared to predictions that are based only on one feature.

Table 1 Prediction up to 5 min ahead (Accuracy/False alarms rate) using all hits

Even thought these results are very encouraging, there is a consideration whether using all the hits is correct for the practical application of the system. This is based on two reasons. The first is that some drivers exhibit a very large number of hits compared to other drivers that hit the rumble strips only once or twice and this results in the domination of the training and validation sets by the individuals with a lot of hits. The second reason is that the algorithm’s primary objective should be the prediction of the first hit that would result in an accident in real driving conditions.

The results for the prediction of only the first hit per driver using the same features and method are shown in Table 2.

Table 2 Prediction up to 5 min ahead (Accuracy/False alarms rate) using only the first hit per driver

It is obvious from Table 2 that the accuracy of the first hit prediction is much lower than the one achieved when taking into account all of the hits. This is because the drivers are not as hypovigilant when the first hit occurs as they are during the hits that follow and also because at this case all drivers have equal weight in the training and testing sets.

6 Future work

In future, work will focus on improving the accuracy of the first hit prediction and in order to achieve this,the following directions will be explored:

  • Study and integration of new more advanced eyelid-activity related features when they are available, in order to develop a more reliable accident prediction system. These features include amplitude, peak closing velocity as well as lid closure and opening speed [6].

  • In addition to the eyelid-related features, the eye-gaze parameter that is already recorded by the SmartEyes system may also be utilised in order to extract features related with the gaze distribution over time. Such features are the GAZEDIS and PERSAC [11] that potentially allow the detection of eye fixations which is a sign of extreme hypovigilance and microsleeps. This measurement should also allow the detection of sleep episodes for people that fall asleep while driving, keeping their eyes open.

  • A future plan is to integrate EEG features into the expert system when the online extraction of EEG features, such as alpha and theta waves becomes available. EEG is one of the primary indicators used by human experts to diagnose the level of alertness of a person. At this point and within SENSATION EEG measurements cannot be included in driver applications due to user requirements restrictions; however, there is the intention to test EEG in pilots that are related to professional drivers or crane operators, where the requirements for sensor unobtrusiveness are more relaxed while the safety requirements are stricter.

  • Finally a control set of alert drivers will be created in order to estimate the real false alarms ratio of the final system.

7 Conclusions

In this paper a FES for the prediction of accidents due to extreme hypovigilance was presented. Even though at this stage it fuses a limited number of blinking features, the architecture of such a system allows the easy integration of other physiological modalities such as EEG, gaze and more advanced eyelid activity features. At the moment the prediction results for the first hypovigilance-related hits are not satisfactory; however, it is clearly demonstrated that the fuzzy fusion of features outperforms the prediction accuracy of individual features and addresses more efficiently the interpersonal and intrapersonal differences that characterise the physiological manifestation of extreme hypovigilance.