1 Introduction

The growing number of BCI research groups, journals, conferences, articles and attendees is evidence of the rapid growth of the research field. In addition, numerous companies have approved projects to develop BCI-related applications and have announced roadmaps for collaborating with research groups on the development of BCI-based applications.

Many annual conferences, workshops and seminars disseminate the latest developments in the field and give prominent scientists a platform to present their research projects. Examples of such venues and supporting bodies include the National Center for Medical Rehabilitation Research of the National Institute of Child Health and Human Development of the National Institutes of Health (USA), the International Conference on Multimodal Interaction (ICMI), the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Intelligent User Interfaces (IUI), IEEE Transactions on Neural Systems and Rehabilitation Engineering, and the Journal of Neural Engineering.

An influx of researchers from diverse disciplines, including rehabilitation, psychology, computer science, mathematics, medical physics, neurology, neurosurgery and biomedical engineering, is the reason behind the unusual growth of BCI research. Brain-Computer Interface is at the Innovation Trigger stage of the emerging technology mega-trends in Gartner’s 2016, 2017 and 2018 Hype Cycles. The Hype Cycle predicts that mainstream adoption of BCI is more than 10 years away. This is shown in Fig. 1.

Fig. 1 Gartner’s Hype Cycle

Designing a complex BCI requires knowledge of BCI modes of operation, experimental strategy, signal recording, types of measurable brain signals and the feedback system [1,2,3,4,5,6]. BCIs can be categorized by their mode of operation as synchronous or asynchronous, and as exogenous or endogenous. An exogenous BCI uses brain signals generated in the presence of external stimuli, such as visual or auditory stimuli, that elicit a large response in the form of neuronal activity. Steady State Visual Evoked Potentials (SSVEPs) and the P300 are examples of control signals used by exogenous BCIs. The response of an exogenous BCI is therefore a spontaneously generated brain pattern that does not require extensive user training. The advantages of such systems are minimal user training, single-channel recording, easy and quick set-up of control signals and a high information transfer rate. However, the user has to be more focused during the training phase, which may cause tiredness and fatigue. In contrast, an endogenous BCI uses self-regulated brain rhythms and potentials generated in the brain without any external stimuli. The user needs extensive neurofeedback training to learn to generate specific brain patterns, so this category of BCI depends directly on the user’s will and capability to learn the patterns. The endogenous BCI is beneficial for cursor control applications using brain activity and for users with sensory disabilities.

The other criterion for categorizing BCI systems is the input data processing modality, i.e. synchronous or asynchronous. Synchronous BCI systems are cue-based: one set of features is extracted and processed before the next set may be extracted and processed. A predefined time window is fixed and the signals belonging to that window are analyzed first, so the user can send commands only within the predefined time frame. Regardless of the user’s ability to modulate his/her brain signals, the cueing process enables early and accurate detection of the control task. This increases the confidence, autonomy and interest of the user while training BCI skills. Although a synchronous BCI is simpler to design and evaluate than an asynchronous BCI, it is not very helpful in real-world set-ups. An asynchronous, or non-cue-based, BCI offers a more practical approach to human-computer interaction. It does not require any sequence for extracting and processing the feature set, and there is no predefined time frame for accepting and processing it. The user can act more naturally and can initiate communication at will. It is also known as a self-paced BCI. Independent of any cue, this BCI system continuously analyzes the user’s brain activity, which makes it suitable for real-world set-ups.
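The difference between cue-based and self-paced processing can be illustrated with a minimal sketch. The function names, the sampling rate, the window length and the step size below are assumptions chosen for illustration only; they do not come from a specific BCI system.

```python
import numpy as np

def cue_locked_windows(signal, cue_samples, window_len):
    """Synchronous mode: analyze only a fixed window after each cue."""
    return [signal[c:c + window_len] for c in cue_samples
            if c + window_len <= len(signal)]

def sliding_windows(signal, window_len, step):
    """Asynchronous mode: analyze the ongoing signal continuously."""
    return [signal[s:s + window_len]
            for s in range(0, len(signal) - window_len + 1, step)]

fs = 250                                                  # assumed sampling rate in Hz
eeg = np.random.default_rng(0).standard_normal(10 * fs)   # 10 s of synthetic data

# Two cues at 2 s and 6 s, each followed by a 2 s analysis window.
cue_based = cue_locked_windows(eeg, cue_samples=[2 * fs, 6 * fs], window_len=2 * fs)

# A 2 s window moved in 0.5 s steps over the whole recording.
self_paced = sliding_windows(eeg, window_len=2 * fs, step=fs // 2)

print(len(cue_based), len(self_paced))
```

In the cue-based case the number of analyzed windows is fixed by the experimental protocol, whereas the self-paced case produces a window at every step, which is why an asynchronous system must also decide when no command is being sent.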

Invasive BCI uses surgical implantation of microelectrode arrays inside the grey matter of the brain. Electrocorticography (ECoG) and intracortical neuron recording are the two invasive modalities in BCI research. In electrocorticography, electrodes are placed on the surface of the cortex, either outside the dura mater (epidural electrocorticography) or under the dura mater (subdural electrocorticography) [1,2,3,4,5,6]. Intracortical neuron recording, on the other hand, places the microelectrodes inside the cortex. Both modalities involve a significant risk of infection and tissue damage in the brain, and scar-tissue build-up causes problems with long-term stability. Despite these risks, invasive modalities provide high signal quality, very good spatial resolution and a higher frequency range.

Non-invasive BCI does not require any surgical procedure. The electrical activity generated by millions of neurons can be recorded by placing small disc-shaped sensors, known as electrodes, on the scalp. This conventional and cost-effective method has been used successfully in clinical and BCI research settings. It records signals at good temporal resolution, i.e. it captures changes in the signal over very short time intervals. However, the spatial resolution and frequency range are limited by brain and non-brain artifacts, and the signal-to-noise ratio (SNR) decreases as the frequency increases.

The brain computer interface is not a solitary mission. This vast multidisciplinary endeavor includes neurology, instrumentation engineering and brain activity measurement, signal processing, computer science algorithms and statistical methods for identifying brain activity patterns, and training and feedback for the user.

2 Brain Anatomy

The most important part of a BCI system is the human brain. With advances in neuroscience research, researchers are able to describe the complex structure and functions of the human brain. It is indispensable to know the anatomy of the human brain, its different activities, its measurable signals and the prerequisites of BCI design.

2.1 Essential Brain Anatomy Brief

The human central nervous system consists of the brain and the spinal cord. The peripheral nervous system connects the central nervous system to the rest of the body. The brain is the control center of the whole body: through the peripheral nervous system it sends instructions to other body parts such as the sensory organs, other organs, muscles, glands and blood vessels. Anatomically, the brain is divided into the cerebrum, the cerebellum and the brainstem. The largest part of the brain is the cerebrum, which is composed of the left and right hemispheres. The hemispheres are connected to each other via the corpus callosum (a collection of white matter fibers). Each hemisphere is further divided into four lobes: the frontal lobe, the parietal lobe, the occipital lobe and the temporal lobe. The responsibilities of these lobes are given in Table 1. Interpreting touch, vision and hearing, as well as speech, reasoning, emotions, learning and fine control of movement, are associated with different locations on the cerebrum. The cerebellum, located under the cerebrum, maintains body balance and posture and coordinates muscle movements. Last but not least, the brainstem connects the cerebrum and cerebellum to the spinal cord. Its main functions include regulating heart rate, breathing, body temperature, digestion, sneezing, wake and sleep cycles, vomiting, swallowing and coughing.

Table 1 Different responsibilities of human brain lobes

Billions of neurons in the human brain, connected via thousands of synapses, generate electrochemical pulses called action potentials. This activity can be measured as an electrical waveform known as a brain wave or brain rhythm. Information is transmitted across a specialized connection, the synapse, to a neighboring neuron, which receives it through its dendrites. In this way the brain forms a dynamic neural network every time it experiences new facts or a newly remembered event. This network grows stronger as the transmission of signals between the neurons increases. Besides electrical signals, the human brain contains thousands of neurotransmitter molecules stored in vesicles of the axon, which amplify, relay and modulate signals between neurons. Glutamate, GABA, acetylcholine, dopamine, adrenaline, histamine, serotonin and melatonin are some common neurotransmitters of the human brain. These chemical messengers help the signal to travel from neuron to neuron, so information transmission between neurons is achieved with the help of chemicals.

3 Brain Computer Interface

In 1999, the First International Meeting on Brain Computer Interface Technology [5] took place in the USA with 50 participants from 22 research groups. The review of the meeting proposed a BCI taxonomy, methods and approaches. Two main approaches were discussed: (1) the operant conditioning approach and (2) the pattern recognition approach. The former relies on self-regulation of brain potentials or rhythms. The thought-translation device (TTD) developed in 2003 by the authors of [3] was based on self-regulation of slow cortical potentials (SCPs). Wolpaw et al. [7] also used self-regulation of brain rhythms for BCI. In this approach, no stimulus is presented to the user; instead, the user receives real-time feedback, correct behavior is reinforced according to that feedback, and the user is given appropriate training [8]. The latter approach, pattern recognition, uses different mental tasks that activate potentials at specific cortical areas of the brain. These mental tasks include motor imagery tasks, arithmetic baseline tasks, visual tasks, and speech and emotion tasks. Different mental tasks activate different EEG patterns close to the cortical areas detectable by scalp electrodes. Many BCIs [9,10,11,12,13] are based on this approach.

3.1 BCI Components

Figure 2 shows the typical framework of a brain computer interface, comprising signal acquisition, pre-processing of the acquired signals, feature extraction and selection, classification of these features into control actions and, finally, feedback to the user for training. The orchestration of these components determines the performance of the whole brain computer interface. Feature extraction, feature selection and classification can also be replaced by deep learning algorithms [14]. The following sections describe each step in detail.

Fig. 2 Typical framework of brain computer interface
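The chain of components in this framework can be sketched as a sequence of function calls. The following toy example only illustrates how the stages hand data to each other; every function body is a deliberately simplistic placeholder (an assumption for illustration), not a method used in any real BCI.

```python
def preprocess(channels):
    # Placeholder for real filtering: remove each channel's mean.
    return [[x - sum(ch) / len(ch) for x in ch] for ch in channels]

def extract_features(channels):
    # Placeholder feature: mean power of each channel.
    return [sum(x * x for x in ch) / len(ch) for ch in channels]

def select_features(features, keep=(0,)):
    # Placeholder selection: keep only the listed feature indices.
    return [features[i] for i in keep]

def classify(features, threshold=1.0):
    # Placeholder classifier: threshold on the first selected feature.
    return "command_A" if features[0] > threshold else "command_B"

def run_bci(channels):
    """Acquired signals -> preprocessing -> features -> selection -> control action."""
    return classify(select_features(extract_features(preprocess(channels))))

# Two hypothetical channels with four samples each.
print(run_bci([[0.0, 4.0, 0.0, 4.0], [1.0, 1.0, 1.0, 1.0]]))
```

In a complete system the returned control action would drive an application and be fed back to the user, closing the loop shown in the figure.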

3.1.1 Signal Acquisition

The human brain produces different types of signals, comprising thermal, mechanical, electrical, chemical, metabolic and magnetic activity, generated by its intrinsic activity. These signals can be recorded and become the basis for alternative modes of communication and control. Brain signals can be acquired by three methods: (1) non-invasive, (2) partially invasive and (3) invasive acquisition. Figure 3 shows the positioning of electrodes on the human brain for each acquisition method. Only the non-invasive method avoids surgery; the others require a surgical procedure to place the electrodes inside the skull. Scalp electroencephalography (EEG), epidural electrodes and electrocorticography (ECoG), local field potentials (LFPs) and intracranial EEG (iEEG) are different methods of measuring the electrical activity of the human brain. Magnetoencephalography (MEG) [15] is a neuroimaging technique that measures the magnetic fields produced by the electrical activity of the brain. Neural activity is also accompanied by changes in blood flow inside the brain, which can be imaged using functional magnetic resonance imaging (fMRI) and positron emission tomography (PET). Magnetic resonance spectroscopy (MRS) measures the chemicals (neurotransmitters) produced by the neural activity of the brain. Invasive and non-invasive are the two broad approaches to acquiring brain signals [1,2,3,4,5,6].

Fig. 3 Brain’s electrical activity acquisition methods

3.1.1.1 Electroencephalograph (EEG)

Among the various methods, EEG is the most explored and experimented method for BCI systems. Electroencephalography is a non-surgical method for measuring the electrical activity generated inside the brain. The temporal resolution of EEG is in milliseconds or better, which is very good in terms of signal processing. The spatial resolution, however, is poor and in the range of centimeters. Spatial resolution depends on the number of electrodes placed on the scalp. An electrode position is also referred to as a channel, and the distance between channels is a few centimeters. Available EEG recording caps use at most 256 channels. Amplitude and frequency are the two basic features that characterize EEG signals. The amplitude of EEG signals varies between 10 and 100 µV and the frequency ranges between 10 and 1000 Hz. EEG patterns can be tracked at sampling rates above 256 Hz, and their frequency components range approximately between 10 and 100 Hz [16,17,18,19]. Figure 4 gives a glimpse of the International 10/20 Standard for 64 + 2 channel EEG placement positions [20] for signal acquisition.

Fig. 4 International 10/20 standard for 64 + 2 channels EEG placement positions [20]

Electrical activity never stops, because the brain always remains active, even during sleep or unconsciousness. However, this does not mean that general patterns are always present; most of the time brain waves are highly irregular. According to Allison [21], the activity of a neural network produces a visible pattern in the EEG signal only if the following prerequisites are met: (1) the sign of the electrical activity produced by each neuron is the same; (2) the axis of the electrical activity generated by most of the neurons is perpendicular to the scalp; (3) neuronal synchrony is high; (4) the neuronal dendrites are aligned in parallel, so that their potentials summate into a signal that is detectable at some distance. Finding patterns of neuronal communication is therefore a complex task. Nevertheless, some characteristics of EEG can serve as the basis of a BCI system: (1) rhythmic brain activity; (2) event-related potentials (ERP); (3) event-related synchronization (ERS) and event-related desynchronization (ERD) [1,2,3].

Brain Rhythms

The brain is always working and, depending on the perception level, shows different rhythmic activity. The rhythms are affected by thoughts and by the preparation of actions; for example, an eye blink can attenuate a particular rhythm. The fact that mere thoughts affect the rhythms can become the basis of a BCI system. Different brain rhythms can be identified in EEG by their frequency ranges [22]. They are denoted by the Greek letters delta, theta, alpha, beta, gamma and mu (δ, θ, α, β, γ and μ). The order and meaning of the letters are not logical. Figure 5 shows the different brain wave patterns present in the brain’s electrical activity [23].

Fig. 5 Different brain waves [23]

The delta wave can be recorded in the frequency range from 0.1 to 3.5 Hz with an amplitude of 50–100 μV. This irregular rhythmic activity is found in infants (around 2 months) in the waking stage. In adults the delta rhythm is found only in the deep sleep stage, below 3.5 Hz. Hence this wave is not useful in BCI research. Next is the theta wave, whose frequency ranges from 4 to 7.5 Hz and whose amplitude is below 100 μV [24]. It can be recorded over the frontal midline area of the scalp. It is rarely found in children aged two or below in the waking stage. In adults theta waves can be recorded during drowsiness and sleep, especially in females. It can be blocked by eye opening and disappears when alpha activity occurs. It has been used in different applications, such as quadcopter control [25]. The alpha wave has already been used in many BCI applications. Its frequency ranges from 8 to 13 Hz and its amplitude varies but stays below 50 μV. It appears in EEG mostly over the posterior regions of the brain, mainly the occipital areas. It can be seen clearly in EEG during physical relaxation and relative mental inactivity, and it is attenuated by attention, especially visual attention. Another important brain rhythm is the mu rhythm. Its frequency and amplitude are similar to those of the alpha wave (around 10 Hz and below 50 μV respectively), but it is topographically and physiologically different from the latter. This wave is present over the precentral motor cortex, essentially at the C3, Cz and C4 EEG electrode positions [7]. It is blocked or attenuated when a person performs a motor activity or, after training, when the person imagines the motor activity. Rather than being merely suppressed, it shifts from the idle state to a higher frequency when a motor action is performed. These facts make the mu rhythm important in BCI research. The beta rhythm comes next; it ranges from 13 to 30 Hz with an amplitude of around 30 μV and is present over the frontal and central regions of the brain. It is further divided into beta 1 (13–20 Hz), beta 2 (21–30 Hz) and gamma (30–60 Hz) [26]. Beta waves are involved in conscious focus, problem solving and memorizing, and tend to have a stimulating effect. In adults beta can be observed in the awake state during thinking and logical reasoning. It also plays an important role in BCI research. The brain rhythms are summarized in Table 2.
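As a rough illustration of how these frequency ranges can be used, the sketch below sums FFT power inside each band of a synthetic signal. The band limits are taken from the paragraph above; the function name, the sampling rate and the test signal are assumptions, and the plain FFT sum stands in for more careful spectral estimators.

```python
import numpy as np

# Band limits in Hz as quoted in the text above.
BANDS_HZ = {
    "delta": (0.1, 3.5),
    "theta": (4.0, 7.5),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}

def band_powers(signal, fs):
    """Sum the squared FFT magnitudes that fall inside each band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {name: spectrum[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS_HZ.items()}

fs = 256                                   # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)                # 4 s of synthetic data
# Strong 10 Hz (alpha-range) component plus a weaker 20 Hz (beta-range) one.
signal = np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)

powers = band_powers(signal, fs)
dominant = max(powers, key=powers.get)
print(dominant)
```

Note that a 10 Hz component over the occipital area and one over C3/C4 fall in the same frequency band, so separating alpha from mu additionally requires the electrode location, which this single-channel sketch ignores.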

Table 2 Different brain rhythms [29]

Pineda [27] studied the use of the mu rhythm in BCI and concluded that “mu rhythm is not only modulated by the expression of self-generated movement but also by the observation and imagination of movement.” Wolpaw and McFarland [28] have used the self-regulation of the mu rhythm or central beta rhythm amplitude in their BCI.

3.1.1.2 Event Related Potentials (ERP)

The event-related potential recording technique is useful for human electrophysiology research. It has good and precise temporal resolution, which makes it a basis for testing theories of perception, attention and cognition that are unobservable with behavioral methods. It allows brain activity to be recorded at a resolution of 1 ms or coarser when a stimulus is presented or an event occurs. The potential changes are so small that EEG samples must be averaged in order to find the pattern. Event-related potentials can further be divided into exogenous and endogenous components according to their latency: exogenous potentials occur within 100 ms of stimulus onset, and endogenous potentials occur from 100 ms onwards. They depend on the properties of the stimulus and on the physiological and behavioral processes related to the event. The main characteristics of an ERP are polarity (positive- or negative-going signals), sensitivity to task manipulation, spatial distribution and timing. Figure 6 shows ERPs generated in response to visual as well as auditory stimuli presented to the user [29].
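The averaging step mentioned above can be sketched as follows. This is a minimal illustration on synthetic data, assuming a single channel, known event sample indices and a simple pre-stimulus baseline correction; the function name, the noise level and the shape of the simulated response are all assumptions.

```python
import numpy as np

def average_erp(eeg, event_samples, pre, post):
    """Average baseline-corrected epochs around each event.

    eeg: 1-D array for one channel; pre/post: samples before/after the event.
    """
    epochs = []
    for s in event_samples:
        if s - pre < 0 or s + post > len(eeg):
            continue  # skip events too close to the recording edges
        epoch = eeg[s - pre:s + post]
        epochs.append(epoch - epoch[:pre].mean())  # subtract pre-stimulus baseline
    return np.mean(epochs, axis=0)

# Synthetic demonstration (all values below are assumptions, not recorded data).
fs = 250                                   # sampling rate in Hz
pre, post = int(0.2 * fs), int(0.8 * fs)   # 200 ms before, 800 ms after the event
rng = np.random.default_rng(0)
n_events = 400
eeg = rng.normal(0.0, 2.0, size=(n_events + 2) * fs)      # background activity
events = np.arange(1, n_events + 1) * fs                   # one event per second
t = np.arange(post) / fs
template = 2.0 * np.exp(-0.5 * ((t - 0.3) / 0.05) ** 2)    # bump peaking at 300 ms
for s in events:
    eeg[s:s + post] += template

erp = average_erp(eeg, events, pre, post)
peak_latency_s = (np.argmax(erp) - pre) / fs
print(f"peak latency: {peak_latency_s * 1000:.0f} ms")
```

The simulated response has the same amplitude as the standard deviation of the background activity, so it is hard to see in a single epoch; averaging many epochs reduces the uncorrelated background by roughly the square root of the number of epochs.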

Fig. 6 Different evoked potentials present in brain electrical activity [29]

The P300 is the most commonly explored ERP. This positive ERP component peaks 300 ms or more (up to 900 ms) after stimulus onset. Because it peaks later than 100 ms, it is an endogenous ERP. A P300-based BCI system has the advantage of minimal user training. In such a system, the user chooses one of the choices given in the stimulus and designates it as the target. Evoked potentials (EPs) are the subset of ERPs caused by sensory stimulation in response to a physical stimulus (auditory, visual, somatosensory etc.). Their amplitude ranges from 1 μV to a few microvolts. They are present in different areas of the nervous system, such as the cerebral cortex, brain stem, spinal cord and peripheral nerves. Visual evoked potentials (VEP), auditory evoked potentials (AEP) and steady state evoked potentials (SSEP) are some typical evoked potentials that reflect the output features of the pathways of different sensory activities. The thought-translation device (TTD), a training device and spelling program for completely paralyzed patients using slow cortical potentials, was developed by Birbaumer et al. [3].

3.1.1.3 Event-Related Desynchronization (ERD) and Event Related Synchronization (ERS)

Event-related desynchronization (ERD) is a decrease in the amplitude of certain rhythms due to movement or preparation of movement. Conversely, an increase in the amplitude of a rhythm is called event-related synchronization (ERS). ERD and ERS mostly involve the mu and beta rhythms, and they can be presented in both the spatial and the time domain. ERD/ERS is measured by calculating the amplitude of a certain brain wave before and after the presentation of an external/internal stimulus over a number of EEG trials. The power averaged over the trials is then expressed as a percentage relative to the power in a reference interval, e.g. a 1 s interval between 3.5 and 2.5 s before the event.
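The percentage measure described above can be sketched numerically. The example assumes the commonly used form (A − R) / R × 100, where R is the trial-averaged power in the reference interval and A the power in the interval of interest, so that negative values indicate ERD and positive values ERS. The function name, the trial layout and the synthetic rhythm are assumptions for illustration.

```python
import numpy as np

def erd_ers_percent(trials, fs, ref_window, act_window):
    """trials: (n_trials, n_samples) band-pass filtered EEG; windows in seconds."""
    power = np.mean(np.asarray(trials) ** 2, axis=0)   # squared samples averaged over trials
    r0, r1 = (int(round(t * fs)) for t in ref_window)
    a0, a1 = (int(round(t * fs)) for t in act_window)
    R = power[r0:r1].mean()   # power in the reference interval
    A = power[a0:a1].mean()   # power in the interval of interest
    return (A - R) / R * 100.0

fs = 250                                  # assumed sampling rate in Hz
t = np.arange(0, 8, 1 / fs)               # 8 s trials, event assumed at t = 4 s
rng = np.random.default_rng(1)
amplitude = np.where(t < 4, 1.0, 0.5)     # rhythm amplitude halves after the event
trials = np.array([
    amplitude * np.sin(2 * np.pi * 10 * t + rng.uniform(0, 2 * np.pi))
    + 0.05 * rng.standard_normal(t.size)
    for _ in range(40)
])

# Reference interval 3.5 to 2.5 s before the event, i.e. 0.5 to 1.5 s into the trial.
change = erd_ers_percent(trials, fs, ref_window=(0.5, 1.5), act_window=(4.5, 5.5))
print(f"{change:.1f} %")
```

Halving the amplitude of the rhythm reduces its power to a quarter, so the sketch reports a decrease of roughly 75 %, which would be read as an ERD.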

The interval between two events should be random and not shorter than a few seconds, to keep the power in the reference interval stable. In 1990, Pfurtscheller and Berghold [24] developed the Graz-BCI, a mu rhythm ERD/ERS-based system that uses motor imagery as the mental task. The generalized ERS and ERS with respect to a constant referencing scheme are shown in Fig. 7 [30].

Fig. 7 Generalized ERS and ERS w.r.t. constant referencing scheme [30]

3.1.1.4 Electrocorticogram (ECoG)

Electrocorticogram (ECoG) signals are recorded by placing electrodes on the surface of the cortex [31]. A surgical procedure, craniotomy, is used to open the skull and cut the membrane that covers the brain. ECoG signals resemble EEG signals but have better spatial resolution and less attenuation because the skull and scalp are not in the signal path. The location and arrangement of the electrodes, as well as the implant duration, vary and depend solely on the application requirements. The recording electrodes are typically platinum electrodes of 4 mm diameter, arranged in an 8 × 8 grid or in a strip of 4–6 electrodes. The distance between electrodes is most often 10 mm. The spatial resolution of ECoG signals ranges from 1.25 to 1.4 mm and their amplitude from 50 to 100 μV. They are also less affected by brain and non-brain artifacts, as the task-related signals are larger than the noise floor of the amplifier/digitizer. Thus the signal-to-noise ratio of ECoG signals is much higher than that of EEG signals, and they carry a substantial amount of information about cognitive, motor and language tasks. The neurons stay undamaged because the electrodes do not penetrate the brain. From the literature [32], it can be concluded that ECoG electrodes are likely to provide longer stability than fully invasive intracortical electrodes. In spite of these advantages over EEG and intracortical recording, ECoG is generally not used for research because major surgery is involved. It is typically used for medical purposes, especially to determine the actual site and extent of epilepsy symptoms [33]. In the future, nanotechnology might produce nano-detectors that can be implanted inertly in the brain and may solve the problems of long-term invasive applications. Furthermore, a wireless link between the microelectrode and the external hardware is needed to reduce the risk of infection. Wireless transmission of neuronal signals has already been tested in animals [34,35,36]. Further refinements of recording and analysis techniques will probably increase the performance of both invasive and non-invasive modalities.

3.1.2 Preprocessing of Acquired EEG Signals

Digital EEG recordings offer flexibility, user-specific montage selection, horizontal scaling such as compression and time resolution, filtering, vertical scaling of sensitivity etc. An EEG recording is a digital time series or a set of discrete time series, which makes it possible to apply a variety of digital signal processing techniques. Raw EEG data is contaminated with other neurological or non-neurological signals known as artifacts, e.g. eye blinks, muscle activity and electrode movement [37, 38]. Electromyogram (EMG) artifacts are caused by muscle activity such as facial, tongue and neck movements. The noise created by eye blinks consists of electrooculogram (EOG) signals, which have a higher amplitude than neural signals. These artifacts interfere with the control signals for BCIs, lower the signal-to-noise ratio (SNR) and alter the characteristics of interest in the EEG data. Removing artifacts from raw EEG data is therefore necessary to improve the SNR and BCI performance.

The EEG signal must be amplified, filtered, digitized and referenced before features are extracted from it. Many signal preprocessing methods exist; only EEG signal preprocessing methods are discussed here. The EEG signal, which is only a few microvolts, must be amplified up to a million-fold to avoid artifacts. The amplified signal is then filtered in the range of 0.5–50 Hz to keep the necessary oscillatory components of EEG and to filter out unwanted components such as high-frequency muscle activity (EMG, >50 Hz) and eye blinks (EOG). Most researchers have used subject-dependent band filters, such as notch or finite impulse response (FIR) filters, on the raw EEG signal [39,40,41,42,43,44]. This method is also known as temporal filtering and eliminates both low and high frequencies from the signal. Signals can be spatially filtered using referencing schemes such as common average referencing (CAR) [45], bipolar referencing and the surface Laplacian. These filters apply high-pass spatial filtering to enhance focal activity, such as mu and beta rhythms, from local sources. The authors of [46, 47] used subject-specific filtering based on Independent Component Analysis (ICA) for blind source separation, which models EEG data as a linear superposition of independent components, to remove artifacts. Artifacts in fNIRS signals due to breathing and heartbeat can be filtered by moving average filters [48], IIR low-pass filters [43] or wavelet denoising [49].
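A minimal sketch of the temporal and spatial filtering steps is given below. It assumes a multichannel array of shape (channels, samples) and, to stay dependency-free, uses a crude FFT mask as the 0.5–50 Hz band-pass instead of the notch or FIR designs cited above; the function names and the synthetic data are assumptions.

```python
import numpy as np

def bandpass_fft(data, fs, low=0.5, high=50.0):
    """Zero all FFT bins outside [low, high] Hz; data has shape (channels, samples)."""
    spectrum = np.fft.rfft(data, axis=-1)
    freqs = np.fft.rfftfreq(data.shape[-1], d=1.0 / fs)
    spectrum[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=data.shape[-1], axis=-1)

def common_average_reference(data):
    """Subtract the instantaneous mean over all channels from every channel."""
    return data - data.mean(axis=0, keepdims=True)

fs = 256                                   # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)
rhythm = np.sin(2 * np.pi * 10 * t)        # focal 10 Hz activity on channel 0 only
line = np.sin(2 * np.pi * 60 * t)          # 60 Hz interference common to all channels
raw = np.vstack([3 * rhythm + line + 5.0,  # the constant 5.0 models a DC offset
                 line + 5.0,
                 line + 5.0])

clean = common_average_reference(bandpass_fft(raw, fs))
print(np.round(np.abs(clean).max(axis=1), 2))
```

An abrupt FFT mask can cause ringing on real recordings, so a properly designed FIR or IIR filter would normally replace `bandpass_fft`; the order of the two steps does not matter here because both operations are linear.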

3.1.3 Feature Extraction

Feature extraction in BCI design is the identification of signal characteristics (features) that help in recognizing the specific pattern related to the user’s intent in the filtered, amplified, digitized and referenced EEG signal. These features are the basis of the pattern recognition algorithms that classify mental activity [7]. The aim of the feature extraction step is to find the most distinctive features and thus enhance the signal-to-noise ratio (SNR). This important step becomes difficult when signal and noise are similar, e.g. EMG is very similar to beta rhythms and EOG is very similar to slow cortical potentials (SCPs). EEG signals are spread over space, time and frequency, and can be studied in the time domain, the frequency domain or the time-frequency domain. Bashashati et al. [50] reviewed different types of feature extraction methods in 2007. Many features, such as signal amplitude values, autoregressive (AR) model coefficients, band power, power spectral density (PSD), correlation coefficients, entropy and wavelet coefficients, have been studied and proven to be good for pattern matching algorithms. Common spatial patterns (CSP) is one of the most efficient methods for feature extraction from EEG signals [51]. Several variants of CSP have been established for capturing the spatial information of brain signals, such as probabilistic common spatial patterns [52] and bank regularized common spatial pattern ensemble [53].

Band power (BP) features are the signal power/energy levels in given frequency bands at different locations over the scalp [1, 2]. After band power estimation, these values can be used to compute event-related synchronization/event-related desynchronization (ERS/ERD) maps that visualize certain activity/events in the signal. The raw signal is band-pass filtered within defined frequency bands, squared and then averaged over consecutive time intervals. ERS/ERD is visualized from these values for each subject, and the bands with the most distinctive information are selected and stored for classification [6]. The authors of [54] compared CSP and BP features in a four-class BCI experiment and improved the BP feature by adding phase information to the time information.

Power spectral density (PSD) is the distribution of signal power over frequency. The power of a signal can be the actual physical power or simply the squared value of the signal. Strictly, the PSD exists only if the signal is a wide-sense stationary process: it is the Fourier transform (FT) of the autocorrelation of the wide-sense stationary signal. It does not exist for non-stationary signals, because their autocorrelation function depends on two variables. However, some researchers have estimated a time-varying spectral density as a distinctive feature [55]. Autoregressive (AR) model coefficients have also shown good results for classifying different mental tasks/events from EEG signals [56,57,58,59]. The coefficients are found by linear regression of the current sample of the series on one or more previous samples. Many variants of linear regression can be applied to estimate them, such as least-squares regression and recursive least-squares methods. The Burg method is another well-known method, which estimates the reflection coefficients of autoregressive models. Differential entropy is also used as a distinctive feature by the authors of [60, 61]. Wavelet coefficients have likewise been employed to extract features for EEG signal classification [62,63,64]. Wavelet fuzzy approximate entropy, clustering techniques, cross-correlation techniques and many other techniques exist for feature extraction from raw EEG signals. The following points may be of interest when deciding which features to use:

  • Usage of BCI: a BCI can be used online or offline. Feature extraction for an online BCI application is more complex than for an offline BCI design, so low-complexity features computed within a small time frame are the advisable choice.

  • Robust BCI: noisy EEG signals have poor SNR and are more sensitive to outliers, so robustness towards artifacts and noise must be taken into account in the BCI design.

  • Distinctiveness: the more distinctive the extracted features are with respect to brain events, the easier and more accurate the classification task. Distinctiveness can be quantified with a measure/index, e.g. the Fisher Index or DBI [6], or directly by classifier accuracy.

  • Non-stationarity: for online BCI systems, detecting non-stationarity-related time-varying shifts within or between sessions of EEG data could be a point of interest. Some features, such as approximate entropy, are less affected by these shifts in EEG signals.
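As an example of a low-complexity feature, the band power procedure described earlier in this section (band-pass filter, square, average over consecutive time intervals) can be sketched as follows. The FFT mask used as band-pass filter, the function name, the 1 s window and the synthetic signal are assumptions made to keep the example short and dependency-free.

```python
import numpy as np

def band_power_series(signal, fs, band, window_s=1.0):
    """Band-pass filter, square and average over consecutive windows."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0   # crude band-pass
    filtered = np.fft.irfft(spectrum, n=len(signal))
    squared = filtered ** 2
    win = int(window_s * fs)
    n_win = len(squared) // win
    # One mean-power value per consecutive, non-overlapping window.
    return squared[:n_win * win].reshape(n_win, win).mean(axis=1)

fs = 256                                     # assumed sampling rate in Hz
t = np.arange(0, 4, 1 / fs)
amplitude = np.where(t < 2, 2.0, 1.0)        # 10 Hz rhythm weakens after 2 s
signal = amplitude * np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 25 * t)

bp = band_power_series(signal, fs, band=(8.0, 13.0))
print(np.round(bp, 2))
```

The 25 Hz component lies outside the 8–13 Hz band and does not contribute, while the drop in the 10 Hz amplitude shows up as lower band power in the later windows.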

The choice of features and the application of the BCI system are correlated: features can be ignored or selected on the basis of the application. Traditional feature extraction techniques such as the AR model, PSD or band power treat the EEG signal as a superposition of independent wave (mostly sinusoidal) components and ignore the phase information. Higher-order statistics and non-linear feature extraction can be used to tackle this problem [65, 66].
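To make the AR model feature concrete, the sketch below estimates AR coefficients by ordinary least squares, i.e. by regressing each sample on its preceding samples. It is only one of the estimation variants mentioned in this section (it is not the Burg method), and the function name, the model order and the synthetic AR(2) process are assumptions.

```python
import numpy as np

def ar_coefficients(x, order):
    """Least-squares fit of x[t] = a1*x[t-1] + ... + ap*x[t-p] + noise."""
    x = np.asarray(x, dtype=float)
    # Column k holds the samples at lag k + 1 for every target sample x[order:].
    lagged = np.column_stack(
        [x[order - k - 1:len(x) - k - 1] for k in range(order)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs

# Synthetic AR(2) process with known coefficients (assumed values).
rng = np.random.default_rng(0)
true_coeffs = np.array([1.2, -0.5])
x = np.zeros(5000)
for i in range(2, len(x)):
    x[i] = true_coeffs[0] * x[i - 1] + true_coeffs[1] * x[i - 2] + rng.standard_normal()

estimated = ar_coefficients(x, order=2)
print(np.round(estimated, 2))
```

In a BCI setting the coefficients estimated per channel and per time window would form the feature vector; the model order is a design parameter that has to be chosen for the data at hand.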

3.1.4 Feature Selection

The extracted features can be high-dimensional vectors, depending upon the number of channels, number of trials, number of sessions from multiple modalities and the sampling rate of each modality. It is neither realistic nor useful to consider all features for classification. So selecting a smaller subset of distinctive features, or projecting the feature space, is an important step in pattern recognition for classification. The aim of the feature selection process is to remove redundant and uninformative features and to find distinctive features that do not overfit the training set and that classify real data with higher accuracy even in the presence of noise and artifacts [67]. Projection techniques can be useful when the relevant information is spread all over the feature space; the data are then transformed in order to retrieve the discriminative information. In some applications channel selection might be helpful: a score is assigned to the features of different channels, and the channels whose features have the highest scores are selected for further classification. Thus, there are three approaches to handle the problem of high dimensionality:

  • Feature Selection: here the goal is to find the best subset of features using search-based methods like genetic algorithms, the wrapper approach, the filter approach, sequential forward floating search etc. There are two basic requirements for finding a good feature set: (1) an optimized search method and (2) a performance measure to evaluate the subset of features found by (1). Finding the appropriate subset of features is considered NP-hard [68]. Figure 8 depicts the four-stage feature selection process demonstrated by Liu and Yu [69].

    Fig. 8
    figure 8

    Feature selection steps [69]

Both heuristic and intuitive search methods can be used for the search. Based on these factors, the wrapper approach or the filter approach can be used to evaluate the performance of a feature subset. In the wrapper approach a classifier is defined first; each subset of features is given as input to the classifier for training, the classification accuracy is evaluated in a validation phase, and finally these accuracies are compared across the subsets. The filter approach, on the other hand, evaluates the goodness of features on the basis of measures/indexes that are independent of the classifier. Examples are cluster distance measures like the Davies-Bouldin Index (DBI) [6], information measures like information gain, and dependency measures like the correlation coefficient between features or a similarity index. There is also a hybrid approach which uses both wrapper and filter to reach higher accuracy at lower computational cost [65].

  • Dimensionality Reduction: here the feature space is reduced by projecting high-dimensional features into a lower-dimensional feature space. The methods that deal with this curse of dimensionality can be divided into categories like linear/non-linear and supervised/unsupervised. Linear methods like principal component analysis (PCA) and factor analysis (FA) consider the covariance of the data and transform it linearly to reduce the dimensions of the observable random variables. Most non-linear unsupervised methods for dimensionality reduction are based on manifold learning theory. In these methods a weighted graph of the data points, built from their neighborhood relations, is projected into a lower-dimensional space [65]. These methods use structural knowledge like locality or proximity relations while maintaining the relationships among the data points. They can be grouped into the following three categories [70]: (1) methods which preserve global properties of the data in the lower dimension, e.g. Isomap and Kernel PCA; (2) methods which preserve local properties of the data in the lower dimension, e.g. Laplacian Eigenmaps; (3) methods which globally align a mixture of linear models, e.g. Manifold Charting.

  • Channel Selection: here the main aim is to find the combination of channels which generates the most relevant and distinctive information for the specific application. In some cases these methods are more advantageous than feature selection methods, e.g. for finding the spatial distribution of motor imagery events. The first approach to channel selection is to apply feature selection methods and then map the selected features to their associated channels. This method is limited to some specific applications. Direct channel selection, on the other hand, incorporates prior knowledge into the analysis of results or into the selection process, which leads to a better understanding of the spatial information that can further be used to implement the required control.
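Two of the approaches listed above can be sketched in a few lines. The example below, which assumes NumPy and a two-class problem, ranks features with a Fisher-score filter (a classifier-independent criterion) and, separately, projects the features onto their leading principal components. The function names and the synthetic data are hypothetical and serve only to illustrate the idea; a wrapper approach would instead retrain a classifier for every candidate subset.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score for two classes labelled 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    between = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    within = X0.var(axis=0) + X1.var(axis=0)
    return between / (within + 1e-12)     # small constant avoids division by zero

def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]

def pca_project(X, n_components):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Synthetic example: only features 3 and 7 carry class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 10))
X[y == 1, 3] += 3.0
X[y == 1, 7] += 2.0

selected = select_top_k(X, y, k=2)        # filter-based feature selection
X_reduced = pca_project(X, n_components=2)  # unsupervised linear projection
```

Feature selection keeps original, interpretable features (which can be mapped back to channels), whereas the PCA projection mixes all features into new axes.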

3.1.5 Pattern Matching

The ultimate goal of BCI design is to translate the mental events of the user into control commands. The acquired raw EEG signal has to be converted into a real action in the surrounding environment. So, classification or pattern matching of the signal into predefined classes is naturally the next step after preprocessing, feature extraction and feature selection. Machine learning has played an important role not only in identifying the user intent but also in handling the variation in the user’s ongoing signals. Considering the traditional approach of pattern matching [71], the classification algorithms for mental task recognition in EEG signals can be grouped into four categories: (1) adaptive classifiers, (2) transfer learning based classifiers, (3) matrix and tensor classifiers and (4) deep learning based classifiers.

  • Adaptive Classifiers: in the mid-2000s adaptive classifiers were used for EEG-based BCI design [72,73,74]. An adaptive classifier updates its parameters (e.g. weights, error) incrementally over time and so adapts to changes in the incoming EEG data. This enables the classifier to work efficiently even if there is drift in the data. These classifiers can use supervised or unsupervised adaptation [75, 76]. Supervised adaptation uses prior knowledge of the output classes. Figure 9 shows the typical supervised classification approach for EEG-based BCI design. The dotted lines denote the algorithms that can be optimized from the available data in the training phase. The optimized algorithms can then be used in the testing phase or in actual use to translate electrical brain signals into real-time control commands. A real-time or free-running BCI cannot take advantage of supervised adaptation techniques, as the true labels of the incoming EEG data are unknown. Unsupervised adaptation, in contrast, does not use any prior knowledge of the output classes, so the output labels are unknown. The class labels can be estimated and used to retrain/update the classifier, or the classifier can be adapted without class labels, e.g. by updating the mean or the correlation matrix of the variables. The combination of both types of adaptation is known as semi-supervised adaptation, which uses both the labeled and the unlabeled data for training the classifier. First the classifier is trained on the available data along with their output class labels. Then the unlabeled test data are classified by this trained classifier. Finally, the classifier is retrained/updated incrementally with the unlabeled and the available labeled data.

    Fig. 9
    figure 9

    Typical classification approach for EEG based BCI design

Various state-of-the-art classification algorithms have been employed by different groups to infer the mental task. Linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) [72], the adaptive Bayesian classifier [77], the adaptive support vector machine (SVM) [78, 79], the adaptive probabilistic neural network [80], radial basis function (RBF) kernels [81] and L2-regularized linear logistic regression classifiers [82] are linear or non-linear state-of-the-art algorithms that have been used with supervised adaptation. Ensemble and extreme learning have also been implemented by Li and Zhang [83]. Unsupervised learning is complex and difficult to implement due to the unavailability of class-specific information. Adaptive LDA and Gaussian mixture model (GMM) [84], adaptive LDA with fuzzy C-means [85], incremental logistic regression [86], incremental SVM [87], semi-supervised SVM [88] and an unsupervised linear classifier [89] are some semi-supervised or unsupervised algorithms used in different modalities in BCI design.
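The idea of unsupervised adaptation by updating the mean, mentioned above, can be sketched as follows. This is a minimal illustration and not the method of any cited reference: a two-class LDA is trained once on labeled data, and during use only its bias is adapted by tracking the global feature mean with an exponential moving average. The names `fit_lda`, `predict_adaptive` and the update rate `eta` are assumptions made for this example, which also assumes NumPy and roughly balanced classes.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA: returns the weight vector and the midpoint of the class means."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = Xc.T @ Xc / (len(X) - 2)            # pooled within-class covariance
    w = np.linalg.solve(cov, mu1 - mu0)
    return w, (mu0 + mu1) / 2.0

def predict_adaptive(X, w, mean, eta=0.05):
    """Classify samples one by one; eta=0 gives the static (non-adaptive) classifier."""
    mean = mean.copy()
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        labels[i] = int(w @ (x - mean) > 0)
        mean = (1 - eta) * mean + eta * x     # label-free update of the global mean
    return labels

def make_session(n, shift, rng):
    """Hypothetical 2-D features; `shift` mimics a between-session drift."""
    y = rng.integers(0, 2, n)
    X = rng.normal(scale=0.5, size=(n, 2))
    X[:, 0] += np.where(y == 1, 1.0, -1.0) + shift
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_session(200, shift=0.0, rng=rng)
X_test, y_test = make_session(400, shift=3.0, rng=rng)

w, mean = fit_lda(X_train, y_train)
acc_static = np.mean(predict_adaptive(X_test, w, mean, eta=0.0) == y_test)
acc_adaptive = np.mean(predict_adaptive(X_test, w, mean, eta=0.05) == y_test)
```

Because the update uses no labels, it can run during free use of the BCI; a supervised variant would update the class means separately whenever true labels become available.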

  • Matrix and Tensor Based Classifiers: these classifiers take an alternative to the approach used with adaptive classifiers, i.e. feature extraction followed by selection of the relevant features. Instead of optimizing a dual problem, these classifiers map the data directly to the classification domain, e.g. a geometrical space. The idea behind them is the assumption that spatial distribution and power can be considered fixed and can thus be represented in covariance matrices. These covariance matrices can be used directly as input to the classifier. Figure 10 shows both approaches, adaptive feature learning and direct learning from matrices, for pattern matching in EEG signal classification. The matrix approach can be applied to both oscillation-based and ERP-based BCI systems. A regularized discriminative framework for EEG analysis, in which the data are represented as augmented covariance matrices, has used this approach [90]. The Riemannian Geometry Classifiers (RGC) in [91] are based on the same concept, in which the data are mapped directly into a geometrical space with a suitable metric. These approaches offer higher accuracy, but their complexity and high dimensionality make them more demanding than traditional approaches.

    Fig. 10
    figure 10

    Two approaches for classification of EEG data. The dotted area is interchangeable

Tensors are multi-way arrays, and EEG data can be arranged into higher-order tensors. For example, a 3rd-order tensor for EEG classification can be represented as space × frequency × time. The number of such modes defines the order of the tensor, also known as the number of dimensions of the tensor. Almost all classification algorithms can be generalized using tensors, but this field is yet to be explored [92, 93].
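The direct use of covariance matrices can be illustrated with a minimum-distance-to-mean classifier. The sketch below assumes NumPy and trials stored as a 3rd-order array of shape (trials, channels, samples). It uses the affine-invariant Riemannian distance between symmetric positive-definite matrices and, as a simplification chosen for this example, the log-Euclidean mean as class prototype; it is not claimed to reproduce the classifiers of [90] or [91]. All function names and the simulated data are hypothetical.

```python
import numpy as np

def trial_covariance(trial):
    """Spatial covariance of one trial shaped (channels, samples)."""
    trial = trial - trial.mean(axis=1, keepdims=True)
    return trial @ trial.T / (trial.shape[1] - 1)

def _spd_apply(C, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * fn(vals)) @ vecs.T

def riemann_distance(A, B):
    """Affine-invariant distance between SPD matrices A and B."""
    inv_sqrt = _spd_apply(A, lambda v: 1.0 / np.sqrt(v))
    vals = np.linalg.eigvalsh(inv_sqrt @ B @ inv_sqrt)
    return np.sqrt(np.sum(np.log(vals) ** 2))

def log_euclidean_mean(covs):
    """Log-Euclidean mean, a simple stand-in for the Riemannian mean."""
    logs = [_spd_apply(C, np.log) for C in covs]
    return _spd_apply(np.mean(logs, axis=0), np.exp)

def mdm_predict(covs, class_means):
    """Assign each covariance matrix to the nearest class mean."""
    return np.array([
        int(np.argmin([riemann_distance(M, C) for M in class_means]))
        for C in covs
    ])

def simulate_trials(label, n_trials, rng):
    """Hypothetical data: channel `label` has a larger amplitude."""
    scale = np.ones((4, 1))
    scale[label] = 2.0
    return rng.normal(size=(n_trials, 4, 200)) * scale

rng = np.random.default_rng(0)
class_means = [
    log_euclidean_mean([trial_covariance(t) for t in simulate_trials(c, 20, rng)])
    for c in (0, 1)
]
test_covs = [trial_covariance(t) for c in (0, 1) for t in simulate_trials(c, 20, rng)]
predicted = mdm_predict(test_covs, class_means)
```

No explicit feature extraction or selection step appears here: the covariance matrix of each trial is itself the object being classified.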

  • Transfer Learning: most machine learning algorithms rest on the hypothesis that the training and testing data belong to the same feature space and follow the same probability distribution. Contrary to this hypothesis, in BCI design the data distribution in the real-time testing phase differs across time or subjects. Transfer learning handles this problem by exploiting knowledge about one task while learning another, related task. The effectiveness of transfer learning therefore depends entirely on how strongly the tasks are related. For instance, transfer between motor imagery tasks performed by two subjects is more effective than transfer between a motor imagery task and a P300 speller task performed by the same subject. Transfer learning plays an important role where labeled data are available in the source domain for one task, while data for the other task in the target domain are scarce or hard to acquire. Transfer learning can be categorized by domains, tasks and learning setting. Homogeneous transfer learning is the setting where the source-domain task and the target-domain task are the same, but the probability distribution or the conditional probability distribution differs between source and target domain. Inductive transfer learning, in contrast, is the setting where the source task and target task are different and labeled data are available in both the source and the target domain. For instance, left-hand and right-hand movements could be labeled in both source and target domain, whilst the target domain additionally involves tongue movement. Transductive transfer learning, finally, is the setting where the source and target domains are different but the tasks are similar. It occurs frequently in BCI systems, as inter-session, intra-session or inter-subject variability usually arises.

Many transfer learning approaches work by transforming the data so that their distributions match. The transformation can be linear or non-linear. Figure 11 illustrates an example of domain adaptation in transfer learning where the source domain and target domain are labeled differently. A normal classifier trained on the source domain will perform poorly on the target domain. Applying a domain adaptation technique [94], however, transforms the data so that the source and target domain distributions match. A detailed survey of transfer learning has been presented by Pan and Yang [95].

Fig. 11
figure 11

Domain adaptation [94] in transfer learning
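A linear transformation that matches distributions can be sketched with simple second-order moment matching: the target features are whitened with their own covariance, re-coloured with the source covariance and shifted to the source mean. This generic sketch is similar in spirit to correlation alignment and is not the specific technique of [94]; it assumes NumPy, feature matrices of shape (samples, features) with full-rank covariance, and the hypothetical function name `align_to_source`.

```python
import numpy as np

def _sym_power(C, p):
    """Raise a symmetric positive-definite matrix to the power p."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals ** p) @ vecs.T

def align_to_source(X_target, X_source):
    """Map target features so that their mean and covariance match the source."""
    mu_s, mu_t = X_source.mean(axis=0), X_target.mean(axis=0)
    cov_s = np.cov(X_source, rowvar=False)
    cov_t = np.cov(X_target, rowvar=False)
    # Whiten with the target covariance, then re-colour with the source covariance.
    transform = _sym_power(cov_t, -0.5) @ _sym_power(cov_s, 0.5)
    return (X_target - mu_t) @ transform + mu_s

# Hypothetical sessions: the target is a scaled, mixed and shifted version.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(300, 3))
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X_target = rng.normal(size=(250, 3)) @ mixing + np.array([5.0, -2.0, 1.0])

X_aligned = align_to_source(X_target, X_source)
```

The alignment needs no target labels, so a classifier trained on the source session could be applied to `X_aligned` directly. It only corrects differences in mean and covariance; class-conditional shifts would require labeled target data or a non-linear method.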

  • Deep Learning: a special branch of machine learning whose algorithms learn directly from the data instead of from an extracted feature set. It is inspired by the way the human brain forms patterns from data and learns from them for decision making. In recent years deep learning has shown good classification results and improved the accuracy of pattern recognition systems. Like other machine learning, it can be supervised, unsupervised or semi-supervised. A built-in cascade of feature extractor modules handles the non-linearity of the data domain. Figure 12 shows the difference between traditional machine learning algorithms and deep learning algorithms.

    Fig. 12
    figure 12

    Traditional versus deep learning approach

Deep Boltzmann Machine (DBM), Recurrent Neural Network (RNN), Recursive Neural Network (RvNN), Deep Belief Network (DBN), Convolutional Neural Network (CNN) and Auto Encoder (AE) are some examples of deep learning algorithms.

The Deep Extreme Learning Machine (ELM) has been used by the authors of [96] for finding slow cortical potentials (SCP) in EEG signals. This ELM consists of multiple extreme learning machine layers followed by a final kernel ELM layer. The features of a motion-onset visual evoked potential BCI have been extracted using a deep belief network (DBN) composed of three restricted Boltzmann machines (RBMs) [97]. Yin and Zhang [98] employed an adaptive deep neural network (DNN), built as a stack of Auto Encoders (AE), to classify both workload and emotions. They retrained the first layer of the network with an adaptive learning algorithm, taking as labeled input the data with their estimated classes.

Deep learning classifiers are advantageous as they lead to better features and classification accuracy. But they need a large amount of training data for calibration. Since a BCI is a user-specific application, the subject has to perform thousands of relevant tasks for calibration before actual use. For online systems this is quite expensive in terms of money as well as time.

3.1.6 System Feedback and User Training

Finally, before feedback can be given to the user about whether a specific mental state has been recognized, the EEG signals must be classified on the basis of the selected features and converted into a control command. System feedback and user training are thus an important step in BCI design. Many research findings have shown that inaccurate feedback to the user degrades the accuracy of the BCI system [18]. Feedback can be a continuous or discrete audio or video signal, or a virtual or realistic 1D, 2D or 3D environment. Feedback makes the BCI an adaptive closed-loop system between the human brain and the computer.

4 BCI Performance Measures

The evaluation of a BCI system depends upon its design and its target application. Some of the common BCI performance measures are classification accuracy, the kappa metric, bit rate, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), uncertainty, mutual information and entropy. In a closed-loop BCI every step of the design has different components for performance evaluation, depending upon the design. The basic and most commonly used measure is classification accuracy, which is suitable when the samples are equally distributed per class and the classifier is unbiased [77,78,79,80,81]. Alternatively, the kappa metric or the confusion matrix is used to measure the sensitivity-specificity pair for unbalanced classes and less biased data [54]. The bit rate, or information transfer rate, takes both the accuracy and the speed of a BCI into account [99]. It is expressed in bits/min and is calculated as a channel capacity under several assumptions. The entropy and uncertainty of a classifier can also be used to appraise the performance of a BCI system [60].
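Three of these measures can be computed directly from a confusion matrix, as in the following sketch in plain Python. The kappa value is Cohen's kappa. The bit rate uses the widely used definition due to Wolpaw and colleagues, B = log2(N) + P log2(P) + (1 − P) log2((1 − P)/(N − 1)) bits per trial, which assumes N equally probable classes and errors spread evenly over the wrong classes; whether this is exactly the definition intended in [99] is an assumption. The function names and the example numbers are illustrative.

```python
import math

def accuracy(confusion):
    """Fraction of trials on the diagonal (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

def cohen_kappa(confusion):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in confusion)
    p_observed = accuracy(confusion)
    p_chance = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)

def bits_per_trial(p, n_classes):
    """Wolpaw bit rate for accuracy p and n_classes equiprobable classes."""
    bits = math.log2(n_classes)
    if p > 0:
        bits += p * math.log2(p)
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n_classes - 1))
    return bits

def itr_bits_per_minute(p, n_classes, trials_per_minute):
    """Information transfer rate in bits/min."""
    return bits_per_trial(p, n_classes) * trials_per_minute

# Hypothetical two-class result: 100 trials, 85 classified correctly.
confusion = [[45, 5],
             [10, 40]]
acc = accuracy(confusion)
kappa = cohen_kappa(confusion)
itr = itr_bits_per_minute(acc, n_classes=2, trials_per_minute=10)
```

At chance-level accuracy (P = 1/N) the bit rate is zero, which is why a fast BCI with low accuracy may transfer less information than a slower, more accurate one.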

5 Conclusion

This new pathway to the human brain can open many doors to complex and previously unimaginable solutions for many applications. Many more types of diseases could be diagnosed. There is great scope for enhancing existing BCIs using artificial intelligence and machine learning algorithms. High-performance computing devices together with transfer learning, tensor and deep learning algorithms could serve this purpose. Security and privacy remain an open challenge that has gained significant attention and can be explored further [100].