1 Introduction

Advances in Brain Computer Interfaces (BCI) have made this technology attractive to dependent people, as it offers an alternative way to control home devices using only brain activity [1]. ElectroEncephaloGram (EEG) signals are captured from the brain using electrodes placed along the scalp according to a standard layout, the best known of which is the 10-20 standard. Two main approaches are used in the BCI community during the validation of a BCI system. On the one hand, the offline approach, widely applied for testing, uses existing data sets available on the Internet, such as those provided by the BCI Competition [2, 3], which are considered benchmarks for EEG signal processing. These data sets were recorded using the so-called cue paradigm: the subject sits in front of a computer screen, and an arrow appears before the recording of each trial to indicate a right-hand or left-hand imagination task. If the arrow points to the left or to the right, the subject should imagine moving the left or the right hand, respectively.

On the other hand, in the online approach, the data are recorded and processed immediately at the end of each acquired trial. Online validation is the final step of the validation process. It is worth noting that it is very difficult to compare the obtained results with those presented in the literature, because there is no guarantee that the EEG acquisition was performed under the same conditions, in the same environment and with the same tools. Furthermore, the subjects are not the same and, consequently, the relevant EEG frequency bands differ considerably.

A typical BCI chain is shown in Fig. 1. The system records and analyzes EEG brain activity and recognizes the EEG patterns associated with specific brain activity; i.e., it matches the EEG data to classes corresponding to a mental task, here imagined right-hand and left-hand movement. To control home devices, the user has to produce two different brain activity patterns: Left-Hand (LH) and Right-Hand (RH). The acquired signal is then processed by dedicated signal processing components that decode the activity into commands, enabling the artificial actuator to control basic home devices. However, the acquired EEG signal is contaminated with several artifacts arising from factors such as poor electrode placement, dirty skin, etc. [4]. These artifacts are also caused by interference from signals originating in other parts of the body, such as heart and muscle activity. It is therefore mandatory to remove the artifacts and enhance the signal-to-noise ratio (SNR) by filtering the acquired data. The filtering block aims to remove artifacts, improve stationarity and increase the classification accuracy. Unfortunately, pre-processing may introduce spurious information and may cause the loss of valuable data, which can degrade system performance. To avoid these problems, undesirable signals should be carefully removed using an appropriate technique such as FIR or IIR filtering [5, 6]. Another challenge that should be taken into consideration is the large inter-subject variance of EEG signal properties. The assumption that the motor imagery information is located in the α-rhythm and β-rhythm is not always true [7]. For this reason, the band should be adjusted for each subject to maximize that subject's classification accuracy.

Figure 1

A typical block diagram of a BCI architecture.

Moreover, artifacts in recorded EEG signals result from contamination sources such as muscle activity and eye blinks. For these reasons, selecting the filter parameters is one of the most challenging problems in EEG processing [8] and must be done very carefully. One of the main objectives of this paper is to provide an efficient BCI with adaptive pre-processing techniques customized for each subject to improve the classification accuracy. Many digital filtering techniques can be used to remove undesirable frequencies [9]. Unfortunately, if the same filter is applied to all subjects in the data set, its effect will vary from subject to subject and might skew the results.

In general, the EEG signal is described in terms of rhythmic activity split into frequency bands, each associated with a specific brain function. The most interesting EEG brain waves presented in [10] are:

  • The delta (δ) activity ([0.5-4]Hz) more related to sleep and anesthesia.

  • The theta (𝜃) activity ([4-8]Hz) describing sleep and the micro-sleep stage leading to drowsiness.

  • The alpha (α) activity ([8-13]Hz) recorded over the somatosensory cortex and temporal cortex during reduced visual attention.

  • The beta (β) activity ([13-30]Hz) resulting from active thinking or during solving concrete problems.
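The band limits listed above can be captured in a small lookup table. The following is a minimal sketch, not part of the original system: the names `EEG_BANDS`, `band_of` and `in_motor_imagery_band` are hypothetical, and treating each band as a half-open interval is an assumption made only to avoid overlapping edges.

```python
# Band edges in Hz, taken from the list above. Treating each band as the
# half-open interval [low, high) is an assumption of this sketch.
EEG_BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_of(frequency_hz):
    """Return the name of the band containing frequency_hz, or None."""
    for name, (low, high) in EEG_BANDS.items():
        if low <= frequency_hz < high:
            return name
    return None


def in_motor_imagery_band(frequency_hz):
    """True if the frequency lies in the alpha or beta band (8-30 Hz)."""
    return band_of(frequency_hz) in ("alpha", "beta")
```

For instance, `band_of(10.0)` falls in the alpha band, while a 50 Hz component lies outside all four bands and would be a candidate for removal.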

Although the proposed pre-processing approach is promising for removing contaminating artifacts while keeping the meaningful data in the α-rhythm and β-rhythm [11, 12], it comes with a significant increase in processing time. Moreover, for a data set with long EEG recordings involving many subjects, filtering techniques require significant computing capabilities. To reduce this time overhead, a hardware/software solution is proposed and presented as an embedded system in which the pre-processing component is implemented as a customizable co-processor controlled by an embedded soft-core processor. An FPGA-based platform has been used to validate our filtering approach using offline data sets from twelve subjects.

The next step of the BCI chain is feature extraction, for which several techniques have been reported in the literature, including power spectral density [13], the Short-Time Fourier Transform (STFT) [14], Common Spatial Pattern (CSP) [11, 15], wavelet analysis [14] and band power. The choice of a particular technique depends mainly on the application domain. For example, CSP is appropriate for motor imagery applications because it effectively extracts the ERD/ERS effect [16]. The main idea of this technique is to design a pair of spatial filters such that the variance of the filtered signal is maximal for one class and minimal for the other [15]. Finally, the extracted features have to be classified by a high-accuracy classification component. A review of many BCI techniques can be found in [17], where the most effective and widely used classification algorithms are: Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Neural Networks (NN), Hidden Markov Models (HMM) and Mahalanobis distance (MD).

Although BCI theory is well established, hardware implementations operating in real time still lag far behind the computationally intensive signal processing techniques currently available. It is well known that the computational complexity of a home device control system using EEG signals is high due to the large data sets and complex mathematical operations involved [16]. Our design methodology is based on an in-depth software performance analysis to identify and extract the critical functions and components to be moved to hardware and integrated with the software. The entire system is co-simulated and validated at different abstraction levels. Its performance is evaluated on an FPGA-based platform, which can be configured for different applications.

The proposed system-on-chip (SoC) architecture of the home device control system is implemented on an Altera Stratix-IV EP4SGX FPGA chip. The design has two main targets. First, minimum processing delay with maximum accuracy should be achieved. Second, a practical implementation of an adaptive filter is provided, based on the automatic selection of the best filter parameters for each subject and the self-adjustment of the frequency bands containing useful information.

This paper is organized as follows. Section 2 presents brain computer interface fundamentals and theory, as well as hardware-based BCI platforms. Section 3 provides the design exploration of the BCI chain with an evaluation of its complexity and timing. Section 4 elaborates the design exploration of the proposed solution and briefly discusses the improvement in system performance and execution time. Finally, Section 5 concludes the paper and outlines our future work.

2 Related Work

To control home devices using brain signals, it is essential that imagery-related brain activity be detected with high accuracy in the ongoing EEG signals. The motor imagery detection process requires sophisticated pre-processing techniques. A typical BCI scheme (cf. Fig. 1) consists of signal acquisition, pre-processing, feature extraction, classification and device command generation. The generated output signal allows subjects to control basic home devices such as automatic door locks, automatic lighting control switches, AC, etc. To interact with these external devices, the user imagines left-hand or right-hand movement according to the state machine presented in Fig. 2. A right-hand EEG signal moves to the next device, whereas a left-hand movement selects the current device and enters the next state machine, in which the ON/OFF action is completed or the user returns, again using only RH and LH movements.

Figure 2

State machines to control home devices.
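The two-command navigation described above can be illustrated with a small state machine. This is only a sketch under stated assumptions: the class `HomeDeviceMenu`, the state names `SELECT` and `ACTION`, and the rule that LH toggles the device while RH returns without change in the second state are hypothetical choices, since the exact transitions of Fig. 2 are not spelled out in the text.

```python
class HomeDeviceMenu:
    """Two-level menu driven only by RH and LH commands (illustrative)."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.power = {name: False for name in self.devices}
        self.index = 0
        self.state = "SELECT"  # hypothetical state names

    @property
    def current(self):
        return self.devices[self.index]

    def step(self, command):
        if command not in ("LH", "RH"):
            raise ValueError("command must be 'LH' or 'RH'")
        if self.state == "SELECT":
            if command == "RH":
                # RH: pass to the next device, wrapping around the list.
                self.index = (self.index + 1) % len(self.devices)
            else:
                # LH: select the current device and enter the action state.
                self.state = "ACTION"
        else:
            # Assumed rule: LH toggles ON/OFF, RH comes back unchanged.
            if command == "LH":
                self.power[self.current] = not self.power[self.current]
            self.state = "SELECT"
        return self.state, self.current
```

With the devices `["door lock", "light", "AC"]`, the command sequence RH, LH, LH moves to the light, selects it and switches it on.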

The acquired EEG signals have to be recognized and converted into a control command through one of the following signal processing techniques:

  • The steady-state visual evoked potential (SSVEP) is based on the sensory stimulation of the visual field. Visual stimuli flashing at the center of the visual field create a higher potential than those flashing at its periphery. Based on pattern analysis, the SSVEP technique recognizes the stimulus gazed at by the subject. The amplitude and the phase of the SSVEP have proved to be highly sensitive to stimulus parameters such as flicker frequency, contrast and spatial frequency, and to environmental conditions [18]. This technique does not seem to be suitable for people with severe disabilities, because it requires a high degree of concentration.

  • The event-related desynchronization/synchronization (ERD/ERS), which reflects a decrease and an increase of the oscillatory activity related to an internal or external event, is considered a subject-related technique. It does not require any interaction with stimuli, and the power increase and decrease of the acquired EEG signal can be quantified as a function of time and space, providing more flexibility for EEG signal analysis [19]. Although this technique requires a training phase, the overhead induced by the training does not affect the overall system performance, as long as it is done during the initialization phase.

  • The Movement Related Potentials (MRPs) occur before and during voluntary movements such as standing, walking and pointing [20]. Two main components can be distinguished in MRPs: the late component, which begins within 500 ms before movements, and the early component, which begins within 200 ms after movements. The late component consists of a rapidly increasing negative potential from the contralateral primary motor cortex area. The early component, on the other hand, is a slowly increasing negative potential at the vertex area of the brain.

Even if BCI users produce only slight left-right differences accompanied by artifacts, advanced pre-processing techniques can enhance these differences and improve BCI control accuracy through filtering. Many signal processing techniques have been widely used for pre-processing purposes, based on:

  • Using time and frequency domain transforms such as the fast Fourier transform (FFT) or the discrete wavelet transform (DWT). For example, the FFT can be applied to each channel to compute the discrete Fourier transform and efficiently extract the amplitude and the phase of the ongoing EEG signal. The Fourier components located in the α and β rhythms are selected for pre-processing, and the signal is then reconstructed by taking the inverse fast Fourier transform (IFFT). The Fourier transform components are well localized in frequency but not in time, whereas wavelet coefficients provide a trade-off in time-frequency localization. This technique is successfully used for removing undesirable signals as long as the SNR is maintained above 10 dB. However, the wavelet technique does not provide good de-noising of EEG signals contaminated with noise, especially at high frequencies [21].

  • Subtracting artifacts from the acquired signal: this technique requires the estimation of an average artifact template, which is subtracted from the original EEG signal. For instance, average artifact subtraction (AAS) techniques require a high sampling frequency and are only capable of eliminating repetitive artifact patterns [22]. Independent component analysis (ICA) can also be applied to a multichannel EEG signal by decomposing it into multiple source components. The components related to ocular activity can then be discarded to remove the main ocular artifacts. Unfortunately, some non-ocular data can also be removed, and the classification accuracy remains limited [23].

  • Using the same static filtering for all subjects, such as finite impulse response (FIR) and infinite impulse response (IIR) filters: FIR filters such as Equiripple ($F_{Fe}$) and Kaiserwin ($F_{Fk}$) are based on the Parks-McClellan algorithm, which uses the Remez exchange algorithm and Chebyshev approximation theory to design filters with an optimal fit between the desired and the actual frequency responses [24]. The main classical IIR filters are Butterworth ($F_{Ic1}$), Chebyshev type I and type II ($F_{Ic2}$) and elliptic ($F_{Iec1}$), each of which is optimal for a specific context. For example, the Butterworth filter, based on a Taylor series approximation, provides the best representation of an ideal band-pass filter response, whereas elliptic filters yield equal ripples in both the pass-band and the stop-band. The Chebyshev technique minimizes the absolute difference between the ideal and the actual frequency responses over the entire pass-band by incorporating an equal ripple in the pass-band for the type I filter, and an equal ripple in the stop-band for the type II filter. These filters are frequently used with an order less than or equal to eight, providing a steep transition band and uniform ripples in the pass-band and stop-band regions. Consequently, the attenuation of the EEG signal in the stop-band region is limited to -6 dB and cannot be pushed to a greater value such as -80 dB [24].

  • Using adaptive filtering techniques: the electrooculographic (EOG) artifacts are removed from the EEG signal using independent component analysis (ICA), which allows information to be extracted from the electrodes close to the eyes [25]. The interference of EOG with EEG is then estimated using the recursive least squares (RLS) algorithm, based on the adaptive adjustment of all filters for each ICA by modifying the offset of the total band from 8 to 30 Hz to get higher accuracy. Similarly, adaptive filtering techniques are used in [26], where the best band is retained by optimizing the objective function of the common spatial pattern (CSP). This technique depends on the CSP outputs: when the CSP fails to provide the feature vector, the filtering approach becomes useless. The so-called adaptive signal enhancer (ASE) is defined as an adaptive filter capable of adjusting its parameters in order to minimize the mean square error (MSE). This method is used to detect a single-sweep event-related potential in EEG records [27]. The adaptive recursive band-pass filter (ARBF) is employed to estimate and track the center frequency of the dominant signal of each EEG channel. The main disadvantage of the ARBF is that it updates only one coefficient in order to adjust the center frequency of the band-pass filter to match the noise signal provided as an input. Thus, this technique is not suitable for unpredictable noise [28].

Even if these techniques are suitable for some subjects and for a specific data set, they cannot provide the same accuracy for other subjects belonging to other data sets [26]. To address this issue, an exploration of filter design is proposed to find an appropriate filter for each candidate based on the SNR maxima. Given a set of FIR and IIR filters, the filter type, order and coefficients are defined during the training phase. Furthermore, a customizable filtering architecture is also proposed, designed and validated using data sets from the BCI competition. More precisely, the SNR providing the maximum accuracy for each subject is identified. This parameter is then used as an input to the filter design, which calculates the filter order and coefficients accordingly. Consequently, each subject has its own filter to guarantee the maximum accuracy, based on the filter design technique applied during the training phase using a 5-fold cross-validation approach [29]. The test can then be processed for any data set.
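The 5-fold cross-validation mentioned above can be sketched as follows. This is an illustrative outline rather than the authors' code: the helper names `k_fold_indices` and `cross_validated_accuracy` are hypothetical, the folds are contiguous and unshuffled (shuffling and class stratification are omitted for brevity), and `evaluate` stands for any user-supplied function that trains on one index set and returns the accuracy on the other.

```python
def k_fold_indices(n_trials, k=5):
    """Split range(n_trials) into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_trials:
        raise ValueError("k must lie between 2 and n_trials")
    base, extra = divmod(n_trials, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(n_trials, evaluate, k=5):
    """Average evaluate(train_idx, test_idx) over the k folds.

    evaluate is a hypothetical callback: it trains on train_idx and
    returns the accuracy measured on test_idx.
    """
    folds = k_fold_indices(n_trials, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [t for j, fold in enumerate(folds) if j != i for t in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

Each candidate filter setting would be scored with `cross_validated_accuracy`, and the setting with the highest average retained for the subject.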

The theoretical aspects of BCI systems have been well developed, but only a few attempts to implement a complete hardware system have been reported in the literature. Lun-De Liao et al. [30] developed a wearable mobile EEG-based brain computer interface system (WMEBCIS) for the long-term EEG recording required for drowsiness detection. Kuo-Kai Shyu [4] implemented a low-cost FPGA-based architecture using the SSVEP to develop a BCI multimedia control system. The same architecture is applied to control a hospital nursing bed. Gao et al. [31] used SSVEP to control environmental devices, such as TVs or air-conditioners. It is worth noting that SSVEP systems need gaze movements. Hence, a significant effort is required from the user to acquire EEG signals for such applications. The SSVEP approach therefore seems inappropriate for people with concentration difficulties or with sight problems, for whom the acquisition process becomes unfeasible. Moreover, the SSVEP approach needs fast actions from a user who is directly in front of the stimulation panel [32]. To be easily usable by people with severe disabilities, we propose an EEG-based control system operated only by thought instead of the SSVEP approach. The proposed technique allows the user to move freely and interact easily without any constraint [33]. Our proposed design methodology is based on an in-depth software performance analysis to identify and extract the critical functions and components to be moved to hardware and integrated with the embedded software component. The entire system is co-simulated and validated at different abstraction levels, and a performance evaluation is conducted using an FPGA-based platform.

3 BCI Approach

To implement the EEG signal processing techniques efficiently, a novel design approach is proposed, as depicted in Fig. 3.

Figure 3

The EEG filter design approach.

The main idea is to find a suitable filter with appropriate filtering parameters for each subject. We perform offline validation, in which the EEG data set is divided into training and validation parts of 60 % and 40 % respectively, a split chosen on the basis of several experiments. For each subject, the EEG data are filtered, their features are extracted, and each EEG trial is linked to its corresponding action. The filtering block is controlled by a multiplexer according to its type (from 1 to $N_p$), where $N_p$ is limited to 60 in our proposed system design. We explored our filter design architecture for all SNR values from 10 to 100 dB. Thus, for each subject, sixty parameter sets are applied to the data to find the best filter. Once the filter providing the best performance has been identified, the CSP is applied to generate the feature vector. Finally, the classification is performed using the Mahalanobis Distance (MD) technique.
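The per-subject search over the candidate filters can be outlined as a simple exhaustive loop. The sketch below rests on an assumption that is not stated explicitly in the text: the sixty candidates are taken to be six filter types combined with ten SNR values in 10 dB steps. The names `FILTER_TYPES`, `SNR_VALUES_DB`, `candidate_parameters` and `select_best_filter` are hypothetical, and `training_accuracy` is a placeholder for the full filter, CSP and MD chain evaluated on the training data.

```python
FILTER_TYPES = (
    "equiripple", "kaiserwin",                              # FIR
    "butterworth", "chebyshev1", "chebyshev2", "elliptic",  # IIR
)
# Assumed grid: 10 dB steps from 10 to 100 dB, giving 6 x 10 = 60 settings.
SNR_VALUES_DB = tuple(range(10, 101, 10))


def candidate_parameters():
    """Enumerate every (filter type, SNR in dB) pair of the assumed grid."""
    return [(ftype, snr) for ftype in FILTER_TYPES for snr in SNR_VALUES_DB]


def select_best_filter(training_accuracy):
    """Return (filter type, SNR, accuracy) with the highest accuracy.

    training_accuracy(filter_type, snr_db) is a hypothetical callback
    that runs the processing chain on one subject's training data.
    """
    best = None
    for ftype, snr in candidate_parameters():
        accuracy = training_accuracy(ftype, snr)
        if best is None or accuracy > best[2]:
            best = (ftype, snr, accuracy)
    return best
```

The selected pair would then be stored per subject and reused at run-time.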

3.1 Data Description

Two public data sets of the BCI competition, provided by Graz University of Technology, are used in our experiments. These data sets contain motor imagery EEG signals recorded from twelve subjects performing two different motor imagery tasks (Left Hand ’LH’ and Right Hand ’RH’). These data sets are organized as follows:

  • Data set IIa [3], from BCI competition IV: It consists of EEG data acquired from nine subjects performing four different motor imagery tasks, i.e., LH, RH, foot and tongue. The data were recorded in two different sessions using 25 electrodes, three of which record EOG activity. EEG signals were sampled at 250 Hz and filtered between 0.5 and 100 Hz. The recorded data for each subject contain 288 trials. In this study, we have only used EEG signals corresponding to left-hand and right-hand motor imagery (MI) tasks.

  • Data set IVa [2], from BCI competition III: This data set contains EEG signals from three subjects performing four different motor imagery tasks, i.e., LH, RH, foot and tongue. The data were acquired through 60 electrodes, sampled at 250 Hz and filtered between 1 and 50 Hz. The recorded data contain 80 trials for each class. For our experiments, only EEG signals corresponding to LH and RH were used.

Data set IIa thus uses 25 electrodes and provides all frequency components between 0.5 and 100 Hz, whereas data set IVa uses 60 electrodes with frequency components from 1 to 50 Hz only. The 25 electrodes used in data set IIa are shown in grey in Fig. 4, whereas the 60 electrodes used in data set IVa are those shown in grey and white (all electrodes). Both data sets were sampled at the same frequency of 250 Hz.

Figure 4

Position of EEG Electrodes.

Prior to initiating the EEG signal analysis, all samples are extracted from the recorded data according to the state of their associated trigger through the acquisition process. The trigger enables the beginning of each trial during the acquisition step. Moreover, it serves as an indicator to start the EEG signal analysis. The length of each trial is fixed to 500 samples (2 seconds), taken from the total period of 7 seconds representing motor imagery (MI) actions.
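The trigger-based extraction of fixed-length trials can be sketched as below. This is a minimal illustration, not the authors' implementation: `extract_trials` and its `offset_samples` argument are hypothetical, and the position of the 2-second window inside the 7-second trial is left as a parameter because it is not specified in the text.

```python
SAMPLING_RATE_HZ = 250
TRIAL_LENGTH_SAMPLES = 500  # 2 s at 250 Hz


def extract_trials(recording, trigger_indices, offset_samples=0,
                   length=TRIAL_LENGTH_SAMPLES):
    """Cut one fixed-length window per trigger from a continuous recording.

    recording is a list of samples, each sample being a list with one
    value per channel. offset_samples (hypothetical) shifts the window
    start relative to the trigger.
    """
    trials = []
    for start in trigger_indices:
        begin = start + offset_samples
        end = begin + length
        if begin < 0 or end > len(recording):
            continue  # skip windows that fall outside the recording
        trials.append(recording[begin:end])
    return trials
```

Each returned trial holds 500 samples per channel, matching the 2-second window at 250 Hz.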

3.2 Filter Design

The EEG-based motor imagery classifier discriminates between right-hand and left-hand movements by analyzing particular frequency bands of brain activity. Existing methods cannot accurately find the multiple frequencies specific to different subjects [34]. The frequencies of the received data are in the range of 0.5-100 Hz; frequencies outside the α-rhythm and β-rhythm bands are removed to make sure that the detected MI is not due to any muscular activity of the arm [35]. It is essential that the pre-processing steps do not introduce any spurious information while preserving all useful data: if pre-processing is conducted inappropriately, the classification accuracy will be severely impaired. To address these issues, we propose a new approach based on the automatic selection of the most suitable filter and filter parameters for each subject. Six types of filters are used: Equiripple and Kaiserwin as FIR filters, and Butterworth, Chebyshev type I and type II, and elliptic as IIR filters. The filter parameters used in our design are the stop band (SB), pass band (PB), bandwidth (BW) and transition width (TW); the filter order depends strongly on these parameters. All coefficients and filter orders are identified through a filter design process so that the SNR is optimized for each subject during the training phase. Increasing the SNR in the stop-band automatically increases the filter order, as illustrated in Fig. 5A, which yields a more accurate filter and a better-filtered EEG signal. The transition width is consequently decreased, so that the bands adjacent to α and β are removed [36].

Figure 5

The order and the execution time of the pre-processing filters depending on their SNR values.
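The growth of the filter order with the required stop-band attenuation can be illustrated with Kaiser's well-known empirical estimate for window-designed FIR filters. This is a generic textbook formula used here as a stand-in, not the design procedure of the paper: the function `kaiser_order` is hypothetical, and equating the stop-band attenuation with the SNR value swept in the paper is an assumption.

```python
import math


def kaiser_order(attenuation_db, transition_width_hz, sampling_rate_hz=250.0):
    """Kaiser's estimate: N ~ (A - 7.95) / (2.285 * delta_omega).

    delta_omega is the transition width in radians per sample. The
    estimate is only valid for attenuations above about 21 dB.
    """
    if attenuation_db <= 21.0:
        raise ValueError("Kaiser's estimate assumes more than 21 dB")
    delta_omega = 2.0 * math.pi * transition_width_hz / sampling_rate_hz
    return math.ceil((attenuation_db - 7.95) / (2.285 * delta_omega))


# Order versus attenuation for a 2 Hz transition width (illustrative value).
orders = [kaiser_order(a, 2.0) for a in range(30, 101, 10)]
```

The estimate grows linearly with the attenuation and inversely with the transition width, which is consistent with the trend described for Fig. 5A.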

The filter design is applied for each subject during the training process, in which the SNR providing the best accuracy is identified. This SNR value is used as an input to the filter design to calculate both the filter order and the coefficients for all the proposed types of filters. Among these six types of filters, we select the best one for the current subject, and we repeat the process for all subjects. Fig. 5B presents the execution time of each filter as a function of the filter order. It is worth noting that the higher the filter order, the more efficient and the more selective the filter becomes. However, this comes at the expense of an increased execution time. Figure 5A shows the complexity of each filter as the SNR increases from 10 to 100 dB. The worst case in terms of filtering delay is found for the Kaiserwin filter. The proposed EEG filter design takes 12.31 seconds to process 144 EEG trials, which is equivalent to approximately 0.085 seconds per trial. This execution time was measured in Matlab running on a laptop with a 2.4 GHz CPU.

3.3 Feature Extraction

Feature extraction is the second step in the signal processing chain and combines time-domain and frequency-domain signal features. One of the most widely used algorithms for extracting useful information from motor imagery data is the CSP [37]. It reduces the dimension of the feature set by selecting a subset of features, which lightens the work of the classifier. Formally, CSP computes the normalized covariance matrices by applying the following equation:

$$ C_{i}= \frac{EE^{T}}{trace(EE^{T})} $$
(1)

where trace(x) is the sum of the diagonal elements of x, i is the class index (LH, RH) and E is the data of each trial, of dimension $N_{Ch} \times N_s$, where $N_{Ch}$ is the number of channels and $N_s$ is the number of samples. The overall composite spatial covariance matrix is then calculated by adding the covariance matrices of the two classes. In the next step, the composite matrix ($C_c$) is decomposed according to the following equation:

$$ C_{c}= U_{c}\lambda_{c}{U_{c}^{T}} $$
(2)

where $U_c$ is the matrix of eigenvectors and $\lambda_c$ is the diagonal matrix of eigenvalues sorted in ascending order. The whitening transform of Eq. 3 is then computed to equalize the variances in the space spanned by $U_c$.

$$ P= \sqrt{\lambda_{c}^{-1}}{U_{c}^{T}} $$
(3)

The transformed covariance matrix $S_i$, $i \in \{1,2\}$, is obtained according to the following equation:

$$ S_{i}= PC_{i}P^{T} =B\lambda_{i}B^{T} $$
(4)
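The shared eigenvector matrix B in Eq. 4 follows from the whitening step. As a short worked derivation, assuming the composite matrix is $C_c = C_1 + C_2$ and taking the whitening matrix in its usual form $P = \lambda_c^{-1/2} U_c^T$ (both stated here as assumptions of the sketch):

$$ \begin{aligned} P C_{c} P^{T} &= \lambda_{c}^{-1/2} U_{c}^{T} \left(U_{c}\lambda_{c}U_{c}^{T}\right) U_{c} \lambda_{c}^{-1/2} = I \\ S_{1} + S_{2} &= P\left(C_{1} + C_{2}\right)P^{T} = I \\ S_{2} &= I - B\lambda_{1}B^{T} = B\left(I - \lambda_{1}\right)B^{T} \quad\Rightarrow\quad \lambda_{1} + \lambda_{2} = I \end{aligned} $$

Hence a direction with a large eigenvalue for one class necessarily has a small eigenvalue for the other, which is what makes the variances of the projected signals discriminative.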

Then, the projection matrix W is obtained through the following equation:

$$ W= B^{T}P $$
(5)

The features that optimally discriminate the two classes are obtained from the rows of Z (see Eq. 6) associated with the $N_f/2$ smallest and $N_f/2$ largest eigenvalues, where $N_f$ is the number of selected features. In our case, the number of features is fixed to six.

$$ Z=WE $$
(6)

Finally, the returned feature vectors are calculated based on the following equation:

$$ F_{i}=log(\frac{var(Z_{i})}{var(Z_{1})+var(Z_{2})}) $$
(7)

CSP is computationally demanding, especially during the computation of the eigenvalues and the covariance matrix. The processing time depends strongly on two parameters: the number of trials and the number of channels $N_{Ch}$. For instance, extracting the feature vector from EEG trials of dimension 500×22 takes close to 75 ms on the aforementioned platform.
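Two of the CSP steps, the normalized covariance of Eq. 1 and the log-variance features of Eq. 7, can be written in a few lines. The sketch below is illustrative only and omits the eigendecomposition steps; the function names are hypothetical, a trial is assumed to be stored as a list of channels (each a list of samples), and the denominator of the feature is taken as the sum over all rows passed in, which reduces to Eq. 7 when two rows are given.

```python
import math


def normalized_covariance(trial):
    """Eq. 1: C = E E^T / trace(E E^T), with E stored channels x samples."""
    n_ch = len(trial)
    product = [
        [sum(a * b for a, b in zip(trial[i], trial[j])) for j in range(n_ch)]
        for i in range(n_ch)
    ]
    trace = sum(product[i][i] for i in range(n_ch))
    return [[value / trace for value in row] for row in product]


def variance(values):
    """Population variance of a sequence of samples."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def log_variance_features(projected):
    """Eq. 7 applied to the selected rows of Z = W E.

    The denominator is the sum of the variances of all rows passed in
    (an assumption; it matches Eq. 7 for two rows).
    """
    variances = [variance(row) for row in projected]
    total = sum(variances)
    return [math.log(v / total) for v in variances]
```

The normalization by the trace makes the covariance insensitive to the overall amplitude of a trial, and the logarithm brings the distribution of the variance ratios closer to a normal one.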

Figure 6 shows an example of ERD/ERS maps for one subject from the IIa data set, obtained with our filtering techniques and with the local average reference (LAR) filtering technique. We observe that our best filtering technique offers a better separation of the ERD/ERS distribution. For example, the ERS distribution, shown in red, is clearly localized in the motor cortex area [16, 38], which improves its classification. In contrast, the LAR-based distribution lies in the middle of the color scale, which indicates a high feature overlap.

Figure 6

Example of ERD/ERS maps for one subject from the IIa data set, obtained with the proposed filtering techniques and with local average reference (LAR) filtering.

3.4 Classification

The obtained feature vectors are then classified using the Mahalanobis distance (MD): the statistical distance function MD is minimized to assign the EEG pattern to one of the two classes. The MD avoids the limitation of linear classifiers based on the Euclidean metric, since it automatically takes into account the correlation between different features [17, 39]. The training phase starts by calculating the mean vectors $\mu_R$ and $\mu_L$, corresponding to the average of the features for the right hand and the left hand respectively. The MD is computed according to the following equation:

$$ {d_{i}^{2}}=(F_{v}-\mu_{i})^{T}{\Sigma}^{-1}_{i}(F_{v}-\mu_{i});i=\{R,L\} $$
(8)

where $\Sigma_i$ is the covariance matrix for the imagined movement under consideration (left or right hand) and T is the transposition operator. Thus, to classify an incoming feature vector $F_v$, the Mahalanobis distance $d_i$ between $F_v$ and each class mean $\mu_i$ is measured, and the feature vector is assigned to the class with the minimal distance. The processing time of the MD algorithm depends strongly on the number of trials and on the number of features provided by CSP; the execution time of the classifier is highly correlated with the number of features, as shown in Fig. 7.

Figure 7

The execution time of the classifier under Matlab.
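The minimum-distance rule of Eq. 8 can be sketched in plain Python as follows. This is an illustrative outline, not the embedded implementation: all function names are hypothetical, the covariance uses the unbiased (n - 1) estimate as an assumption, and the product with the inverse covariance is obtained by solving a linear system with Gaussian elimination instead of forming the inverse explicitly.

```python
def mean_vector(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance (unbiased, n - 1 in the denominator)."""
    mu = mean_vector(rows)
    n, d = len(rows), len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis_sq(feature, mu, sigma):
    """Eq. 8: (F_v - mu)^T Sigma^{-1} (F_v - mu)."""
    diff = [f - m for f, m in zip(feature, mu)]
    return sum(d * y for d, y in zip(diff, solve(sigma, diff)))


def train(features_by_class):
    """Estimate (mean, covariance) for each class label."""
    return {label: (mean_vector(rows), covariance_matrix(rows))
            for label, rows in features_by_class.items()}


def classify(feature, model):
    """Assign the feature vector to the class with the minimal distance."""
    return min(model, key=lambda label: mahalanobis_sq(feature, *model[label]))
```

With an identity covariance the distance reduces to the squared Euclidean distance, which is a convenient sanity check for the implementation.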

Our proposed adaptive filter approach identifies the best filter parameters based on the class labels, which are available only during training. Indeed, the system performance is measured during the training phase for each of the 60 filter parameter sets, as shown in Fig. 3. The maximum accuracy $P_a$ (max, in %) for each subject is obtained after selecting the best filter, and the selected parameters are then used at run-time for that subject. Note that the user can readjust the system by re-initiating the training phase when it produces many wrong classifications during the test phase.

4 Embedded EEG-based BCI System (3EGBCI)

To build the design, the Altera environment organized around the Nios-II soft-core processor was used, following the workflow presented in Fig. 8. We started with high-level programming of the EEG-based signal processing techniques in the Matlab environment to validate the system. A multi-subject data set was used to check the accuracy of the proposed architecture after interconnecting all filtering, feature extraction and classification components. The EEG-based BCI Matlab code was then migrated to an embedded system architecture integrating both hardware and software components, with the hardware part developed in Verilog and the software component written in ANSI C. Designing and implementing a BCI system as an SoC architecture requires considering several issues, according to the following design flow:

  • Hardware design step: in this step, the hardware architecture of the embedded system is defined. The available Nios-II family of embedded core processors implements a common instruction set architecture. Moreover, each core is optimized for a specific price/performance point and supported by the same software tool chain. Altera provides three versions of the core processor: the Nios-II/s standard version, a smaller processor with limited performance; the Nios-II/e economy version, designed to use the fewest FPGA logic and memory resources; and the Nios-II/f fast version, with a high performance of over 300 MIPS. Our proposed architecture incorporates the fast version of the Nios-II core processor with on-chip memories and appropriate interfaces to connect co-processors to the standard bus interface provided by Altera. This interface is used in all Altera system designs to simplify the interconnection and to manage the communication within a complex architecture, including a multiprocessor organization.

  • Software design step: this step consists of the design of the embedded software. The BCI soft-core application is written in ANSI-C and is first developed on the Instruction Set Simulator (NiosII-ISS) via the Nios-II IDE environment of Altera. Once simulated and checked, the code is integrated on the FPGA to be executed by the Nios-II processor. Since the proposed system is designed to work in real time, it is important to have a powerful real-time software package to reduce the execution time of the embedded software. The developed ANSI-C code is therefore combined with the GNU Scientific Library (GSL) within our embedded architecture. GSL is a free, open-source C library providing a wide range of mathematical routines that help us to implement complex operations such as covariance, eigenvalues, generalized eigenvectors and matrix inversion. All the 3EGBCI blocks have been implemented in ANSI-C. Prior to exporting the code to the embedded system, it was checked on an Intel-based processor platform and evaluated in terms of execution time and classification accuracy.

  • System integration step: both the FPGA-based hardware architecture and the software code have been integrated within the same platform. The code runs on the Nios-II processor within the FPGA. The critical parts of the proposed embedded system are identified and subsequently exported as hardware modules or co-processors to improve the performance of the system. Further optimizations have been performed on the system-level architecture (memory organization, cache optimization) to provide the best accuracy and timing performance. The cache optimization is done during the configuration of the Nios-II processor. The size of the cache memory has been increased from 16 to 64 KB to ensure that all program data are managed correctly on the Nios-II processor. This management relies on the data cache flushing and bypassing facilities to move data between the shared memory and the data cache as required. For our system architecture, we fixed the size of the cache memory at 64 KB and enabled the burst transfer option to accelerate the data transfer between the Nios-II and all remaining components of the architecture.

According to our design flow, after validating the developed code, the compiled GSL library and the developed ANSI-C code are then integrated into the Eclipse environment of Altera to be uploaded onto the Stratix-IV Altera platform. To demonstrate the interaction between the hardware and the embedded code, the Stratix-IV EP4SGX230-KF40C2 has been configured to support our system-on-chip design.

Figure 8 The proposed design flow for the 3EGBCI system.

5 Experimental Results and Discussion

The target architecture is based on the FPGA technology built with the Altera environment and dedicated integrated tools such as: Qsys for the hardware design components and Eclipse for the embedded software development. Figure 9 shows the organization of the proposed embedded system which includes:

  • The fast version of the Nios-II, with a 64 KB data cache and a 4 KB instruction cache.

  • A timer to measure the execution time, with a 32-bit counter and a timeout period of 10 microseconds.

  • A JTAG-UART to establish communication between Eclipse and the Stratix-IV board.

  • A DDR2 memory with a size of 1 GB.

  • A DMA (Direct Memory Access) controller to transfer data as efficiently as possible, reading and writing data in the maximum space allocated by the source or destination.

  • An on-chip memory with a size of 4 KB to synchronize data transfer between source and destination through the DMA interface.

  • A PLL for clock generation and system synchronization.

Figure 9 The embedded software solution.

Once the components and their connections have been added in Qsys, the HDL files are generated to instantiate each IP in the SoC. Thereafter, the Quartus-II tool, an integrated synthesis and place & route engine, is used to obtain the virtual EEG-System prototype.

As previously mentioned, the 3EGBCI system is trained with the BCI competition data sets [2, 3] to customize the pre-processing techniques for each subject and provide high accuracy for all LH and RH class samples. The system is validated on both the nine-subject and the three-subject data sets. The EEG signals of each subject are applied to the EEG-based prototype system, whereby each trial is classified according to a two-label approach. The accuracy is computed as the ratio of correctly classified test trials to the total number of trials in the given data set [40].

5.1 Software Results

The EEG data set is uploaded into the DDR2 memory of the 3EGBCI system in a 16-bit format. The procedure for real-time signal processing in the home device controller is as follows: all trials are extracted from the data set based on the trigger, each with a duration of 2 seconds. The 500 uploaded EEG samples belonging to one of the 144 trials are filtered by the appropriate filter before the feature vector is built using the CSP technique. It is worth noting that all coefficients of the adaptive filters are stored in the cache memory of the Nios-II processor. For comparison purposes, the outputs of the feature extraction block have been checked against the Matlab results. An evaluation based on statistical feature analysis is conducted for both the Matlab and C codes, using the standard deviation, mean and smoothness. All these measurements provided approximately the same values (with an error of $10^{-3}$) for both the Matlab and C code. Finally, the extracted features are used to estimate the physiological state index and send out the appropriate command to control home devices. The interaction between the Stratix-IV hardware platform and the software is handled via the Nios-II Software Build Tools provided by Altera. We have also used the GCC compiler for Nios-II to compile both the GSL library and the ANSI-C code. Communication between the embedded system and the Eclipse console is performed through the JTAG interface to show the results of the classifier, the system accuracy and the time spent in each 3EGBCI block. For instance, classification results obtained for subject 1 are presented in Fig. 10 for different filters. The accuracy fluctuates according to the filtering technique, and its best value is obtained with the Kaiserwin filter. Additional results are shown in Table 1, which evaluates the effect of the adaptive filtering technique for each subject in reaching the predefined SNR value required to maximize accuracy. These results confirm the variability of the bandwidth limits due to the intrinsic characteristics of each subject's EEG signals [15].
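The trial extraction step at the start of this procedure can be sketched as below. This is an illustrative assumption rather than the authors' code: the function name `extract_trials` is hypothetical, a single channel is shown for brevity, and the 250 Hz sampling rate is inferred from the 500 samples per 2-second trial stated above.

```python
def extract_trials(signal, triggers, fs=250, duration_s=2.0):
    """Cut fixed-length trials out of one continuous EEG channel.

    signal     : sequence of samples for one channel
    triggers   : sample indices at which each trial starts
    fs         : sampling rate in Hz (assumed: 500 samples / 2 s)
    duration_s : trial length in seconds
    """
    n = int(round(fs * duration_s))  # 500 samples with the defaults
    trials = []
    for start in triggers:
        # Skip a trigger whose window would run past the recording.
        if start >= 0 and start + n <= len(signal):
            trials.append(list(signal[start:start + n]))
    return trials
```

Each returned window would then be passed to the subject-specific filter before CSP feature extraction.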

Figure 10 The Nios-II accuracy for subject 1.

Table 1 Summary of accuracy (%) by subject for different filters

To complete the proposed classification system evaluation, the information transfer rate (ITR) has been measured for all EEG signal processing of the system during the off-line validation process. The ITR can be expressed by Eq. 9 as in [41]:

$$ ITR=L\left[p_{s}\log_{2}(p_{s})+\log_{2}(N_{t})+(1-p_{s})\log_{2}\left(\frac{1-p_{s}}{N_{t}-1}\right)\right] $$
(9)

where L is the number of decisions per minute, and $p_{s}$ is the accuracy of the decision made among the $N_{t}$ targets. A set of metrics is used to evaluate the performance of our proposed system, as well as of similar architectures based on P300, SSVEP and motion paradigms, including the ITR and scores incremented by one when the symbol selected by the system matches the target symbol. As depicted in Table 2, our solution provides an average accuracy of about 94.47 % over twelve subjects, with an ITR close to 20.74 bit/min. Table 3 presents, for each subject, the best filter providing the highest classification accuracy. These parameters are obtained during the training phase and kept fixed when testing the remaining data. Furthermore, the ITR is reasonable compared with the values reported in similar works [42, 43].

The execution time is evaluated on the embedded software solution running on the Nios-II soft-core processor at a clock frequency of 250 MHz. As shown in Table 5, the execution times of feature extraction and classification are quite similar, whereas the pre-processing is time-consuming and can be considered a critical part whose reduction would enhance the interaction between the proposed embedded system and its environment. A highly accurate internal timer of the Stratix IV board has been used to evaluate the execution time of the EEG-based embedded system. For a given EEG trial, the system takes about 0.941 seconds to decide whether it belongs to the LH or RH class, and the critical path is given by the filtering block. To reduce this execution time, the filter block is implemented in HW as a co-processor.

The complexity of the design in terms of hardware resources, obtained after synthesis with the Quartus II tools, is shown in Table 4. About 5 % of the look-up table (LUT) resources are consumed, and only 4 % of the block memories are used. In addition, about 1 % of the DSP blocks are used, leading to a low-complexity embedded software solution. This has allowed us to integrate a hardware IP to reduce the execution time presented above.
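As a worked check of Eq. 9, the short sketch below evaluates the ITR for the two-class case. The function name is hypothetical, and the rate of 30 decisions per minute is an assumption (one decision per 2-second trial) rather than a value stated in the text; under that assumption the reported average accuracy of 94.47 % gives an ITR of roughly 20.7 bit/min, in line with the figure quoted above.

```python
import math


def itr_bits_per_min(p_s, n_t, decisions_per_min):
    """Information transfer rate of Eq. 9.

    p_s               : decision accuracy, in (0, 1]
    n_t               : number of targets (classes), at least 2
    decisions_per_min : L in Eq. 9
    """
    if not 0.0 < p_s <= 1.0 or n_t < 2:
        raise ValueError("need 0 < p_s <= 1 and n_t >= 2")
    bits = math.log2(n_t)
    if p_s < 1.0:
        # Both correction terms vanish in the limit p_s -> 1.
        bits += p_s * math.log2(p_s)
        bits += (1.0 - p_s) * math.log2((1.0 - p_s) / (n_t - 1))
    return decisions_per_min * bits


# Two classes (LH/RH), 94.47 % accuracy, assumed 30 decisions per minute.
itr = itr_bits_per_min(0.9447, 2, 30)
```

At chance level (50 % accuracy with two classes) the same function returns zero, which is a convenient sanity check of the formula.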

Table 2 ITR comparison of different applications in BCI competition [42].
Table 3 Summary of the best filter parameters and accuracy for different subjects
Table 4 Resource utilization of the Stratix IV FPGA using pure software approach
Table 5 Processing time (ms) and power consumption (W) by trial using embedded software and Matlab tools.

5.2 Hardware/Software Issues

To accelerate the system, the EEG filters have been implemented in HW as a co-processor of the Nios-II. The FPGA hardware accelerator requires the transfer of many kilobytes of EEG data between the Nios-II processor and the IP accelerator. To address this issue, a DMA has been added to the FPGA accelerator to speed up data transfers. With a single DMA interface, additional on-chip memories were needed to synchronize the data transfer and avoid any loss of data [44]. The transfer follows a specific protocol, shown in Fig. 11, which synchronizes the data exchange between the processor and the slave FIR and IIR components.

Figure 11 Communication protocol between Nios and FIR/IIR IPs.

To obtain a compact implementation, the intensive computations are carried out in fixed-point arithmetic. The filter coefficients and EEG signals are coded on 16 bits, giving a fixed-point representation with 4 bits for the integer part (the amplitude values of the EEG signals are very small) and the remaining 12 bits for the fractional part. The error measured with this encoding is close to $10^{-5}$, which is quite reasonable for our application. The design has then been extended by adding new hardware components, namely the FIR and IIR filters, as shown in Fig. 12.
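The 16-bit format with 4 integer and 12 fractional bits can be illustrated with the following sketch. It assumes a signed two's-complement word in which the sign bit is counted among the 4 integer bits, with saturation on overflow; neither choice is stated in the text, and the helper names are hypothetical.

```python
FRAC_BITS = 12                            # fractional bits
SCALE = 1 << FRAC_BITS                    # 4096 steps per unit
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1  # signed 16-bit range (assumed)


def saturate(raw):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, raw))


def to_fixed(value):
    """Quantize a real value to the nearest representable code."""
    return saturate(int(round(value * SCALE)))


def to_float(raw):
    """Convert a fixed-point code back to a real value."""
    return raw / SCALE


def fixed_mul(a, b):
    """Multiply two codes; the double-width product is shifted back."""
    return saturate((a * b) >> FRAC_BITS)
```

With round-to-nearest, the error on a single in-range value is at most half a step, that is $2^{-13}$. The error figure reported above depends on how it was measured, which this sketch does not attempt to reproduce.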

Figure 12 The 3EGBCI Nios-II-based embedded system.

FIR and IIR filters are widely used in EEG processing. An FIR filter with T taps is given by Eq. 10, and an IIR filter with T taps is represented by Eq. 11:

$$ y[i]=\sum\limits_{k=0}^{T-1} b[k]x[i-k] $$
(10)
$$ y[i]=\sum\limits_{k=0}^{T-1} b[k]x[i-k]-\sum\limits_{k=1}^{N-1} a[k]y[i-k] $$
(11)

where x[i] is the i-th value of the EEG signal, a[k] and b[k] are the k-th coefficients and y is the output. All filter parameters, such as the number of taps, the sampling frequency, the size of the time window of the trials and the SNR in the stop-band, have been considered and adjusted adaptively to enhance performance. The number of taps of the FIR filter is fixed at 500 according to the trial length, while the number of taps of the IIR filter is maintained at 128. When the EEG signal requires a filter with an order lower than 128, all upper coefficients are padded with zeros so that the same architecture can be used. The FIR and IIR hardware accelerators have been designed in Verilog, and the ModelSim tool has been used to verify that the design behaves correctly. Figure 13 shows the internal architecture of the two proposed filter banks. During the initialization phase, the two accelerators receive $N_{c}$ coefficients during the first $N_{c}$ clock cycles. These filters are then launched in parallel to take full advantage of the FPGA resources. Once the synchronization of the design has been successfully completed, the software development is carried out; it involves the design of a frontend and backend driver along with a real device driver for the FPGA accelerator [44]. Communication between the Nios-II processor and the filter accelerator is improved using an integrated DMA interface, which allows direct access to the DDR2 memory for EEG data retrieval and storage. In this way, the Nios-II embedded system is offloaded so that it can perform other tasks.
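Eqs. 10 and 11 translate directly into the reference sketch below. It is a plain floating-point, software-only illustration of the two difference equations, not a model of the Verilog accelerators; the function names are hypothetical, samples before the start of the signal are taken as zero, and a[0] is assumed to be normalized to 1 and is therefore not used.

```python
def fir_filter(b, x):
    """Eq. 10: y[i] = sum_{k=0}^{T-1} b[k] * x[i-k]."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if i - k >= 0:  # samples before the signal start count as 0
                acc += b[k] * x[i - k]
        y.append(acc)
    return y


def iir_filter(b, a, x):
    """Eq. 11: the FIR part minus sum_{k=1}^{N-1} a[k] * y[i-k]."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if i - k >= 0:
                acc += b[k] * x[i - k]
        for k in range(1, len(a)):  # a[0] is assumed to be 1
            if i - k >= 0:
                acc -= a[k] * y[i - k]
        y.append(acc)
    return y
```

Padding the coefficient lists with trailing zeros leaves the output unchanged, which is why a lower-order filter can reuse a fixed 128-tap structure as described above.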

Figure 13 Internal architecture of adaptive filter.

Furthermore, to reduce the power consumption, the overall design frequency has been reduced. Moreover, the system has been accelerated by hardware IPs running in parallel with the Nios-II to make a fast decision for each trial. The new design organization provides hardware (HW) and software (SW) components running over the master Nios-II processor [45]. The new HW/SW design has been downloaded onto the Stratix-IV board to measure the timing improvement and evaluate all BCI functionality using adaptive filtering. The results presented in Table 5 show that the system takes approximately 0.4 seconds, instead of 0.941 seconds for the full software implementation. Furthermore, the delay associated with the filtering module is decreased by a factor of 80. Although the timing constraints of this real-time application are relaxed, the system-on-chip processing time is shorter than the execution time of several other embedded BCI systems.

As shown in Table 6, an EEG-based smart living environmental control system takes about 2 seconds to estimate the physiological state [46]. The hospital bed nursing control system proposed in [47] takes about 5.2 seconds to process one trial, although it is based on a very simple algorithm and uses only 3 channels. Other works report that the time needed to process one EEG trial based on 5 channels is about 3 seconds [48]. For comparison purposes, and to show the benefits of the embedded BCI implementation, the system has been run on an Intel platform with a clock frequency of 2.4 GHz. Table 5 clearly shows the advantage of the embedded implementation by comparing the execution times of the Matlab and C codes running in the same environment. With the ANSI-C version, our system is 21 times faster than the Matlab solution, despite a small decrease in system accuracy caused by the computation of the eigenvalues in the CSP algorithm. As shown in Table 5, the execution time of the Nios-II-based 3EGBCI system is up to 1500 times longer than on the Intel-based platform. This cost is primarily due to the difference in clock frequency, with a factor of 16 between them. Furthermore, the architectures of the two processors are completely different, which can be assessed in terms of millions of instructions per second (MIPS): the Nios-II delivers 300 MIPS, while the Intel processor delivers 145700 MIPS. Finally, the HW/SW implementation reduces the execution time of the software version, leading to a reduction in the power consumption of the system. The HW/SW version of the 3EGBCI runs at a clock frequency of 150 MHz with an estimated static power consumption of about 1.067 W, whereas the software implementation running at a 250 MHz clock rate has an estimated power consumption close to 1.77 W. However, this acceleration comes at the cost of an increase in FPGA resources, as presented in Table 7. Note that for the HW/SW solution, the resources in terms of ALUTs, registers and DSPs are 15.29 %, 22.15 % and 50 % respectively, compared with the pure software solution resources presented in Table 4. The effect of this increase is limited, since the design does not exceed 50 % of the overall FPGA resources.
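The ratios discussed in this section can be checked directly from the figures quoted in the text. The following is a worked check using only those stated numbers:

$$ \frac{0.941\ \text{s}}{0.4\ \text{s}} \approx 2.35, \qquad \frac{2.4\ \text{GHz}}{150\ \text{MHz}} = 16, \qquad \frac{145700\ \text{MIPS}}{300\ \text{MIPS}} \approx 486 $$

The first ratio is the overall speed-up of the HW/SW version over the pure software version. The clock factor of 16 corresponds to the 150 MHz clock of the HW/SW version; taken against the 250 MHz clock of the software-only version, the ratio would be 9.6.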

Table 6 Comparison with existing BCI application.
Table 7 Resources utilization of the embedded 3EGBCI system.

6 Conclusion

In this paper, we have proposed a HW/SW implementation of an entire brain computer interface chain, including the training and classification steps. The design exploration is performed for a home-self application allowing people to control home devices by thought using two motor imagery actions. Our results show a clear improvement in system performance obtained by integrating adaptive filters controlled by an adaptive process that selects the appropriate filter parameters. We have adopted the co-processor approach to export this component to hardware because of its critical execution time on the Nios-II processor. The proposed system is verified and validated on public data sets from the BCI competition, executed on the Stratix-IV development kit (EP4SGX230KF40c2). The timing evaluation shows that the processing delay of one trial is approximately 0.399 seconds after integration of the adapted filter as a hardware component, instead of 0.94 seconds as a software component. Thanks to its significant improvement in classification accuracy and processing speed, its embedded nature and its low development cost, this programmable hardware makes a good BCI platform and opens new research opportunities in further useful applications for people with disabilities or severe impairments.

As future work, we will extend the 3EGBCI system to support three classes instead of two by adding foot or tongue imagery. This will facilitate navigation through the state machine and will allow better control (multiple parameters) of home devices such as HDTV, AC, etc. Furthermore, the proposed HW design could be exported as an ASIC to minimize the size of the system and to decrease its power consumption. Finally, an on-line evaluation of the proposed system will be carried out by connecting an acquisition board and measuring the system performance on many subjects.