1 Introduction

Multirobot systems have extremely promising applications, such as search and rescue, environmental monitoring, autonomous construction, and geographic mapping. The topic has been extensively studied from various perspectives, including swarm robotics (Brambilla et al. 2013), collective robotics (Kernbach 2013), and distributed robotics (Martinoli et al. 2012), each of which refers to a different form of interaction among the robots. In swarm robotics, researchers and engineers have successfully designed scalable (Rubenstein et al. 2012), robust (Winfield and Nembrini 2006), efficient (compared to a single robot) (Bonani et al. 2012), and affordable distributed multirobot systems (Rubenstein et al. 2014). On top of the challenge of designing autonomous control strategies, researchers have recently shown an increasing interest in another aspect of swarm robotics: human–robot interaction. While well-established control interfaces exist for single-robot scenarios, human–swarm interaction (HSI) is still an open research field (Kolling et al. 2016).

A majority of researchers addressing human interaction with a robot swarm use remote control strategies, based on a centralized approach that allows the operator to have an overview of the mission (Kolling et al. 2016). This approach stands in stark contrast to several fundamental principles of swarm robotics, which relies on simple mechanisms, local interactions, and spatially targeted communication, among others. These principles, normally applied to robots only, can also be considered for human–robot interaction. This is possible, for instance, when the human and the robot swarm share the same physical environment. In such situations, the operator can interact locally with the part of the swarm close to him/her and observe the same environment that the robots observe. In the literature, this interaction is called proximal, as opposed to remote interaction (Kolling et al. 2016).

We therefore consider an application scenario in which an operator is surrounded by mobile robots that behave semi-autonomously. This might be the case, for instance, in an inspection or construction task. The operator simply interacts with the robots that are close to her/him and share the same environment. The robots can either act independently or be part of a swarm. In our application scenario, when the robots meet a predefined condition, find some interesting information, or cannot solve an issue, they stop and request a command from the operator. In the case of a swarm, the robots stopping and asking for interaction with the operator could be either single robots or leaders of a subgroup of the swarm (Goodrich et al. 2013). As several robots may be in this situation, the operator must select one of them, based on criteria that are application dependent and left to the operator's judgment. Triggering interaction with a single robot within a group is a challenging HSI problem: the communication channel should be easily accessible to the operator, combined with an infrastructure that is distributed and compatible with the swarm robotics approach. Fong et al. (2003) proposed a simple selection protocol that uniquely identifies each robot using a numbering system; the selection and manipulation of the robots were performed via a remote control. Such systems require several explicit coding rules that add overhead on top of the communication channel, reducing efficiency, and are incompatible with a distributed system. Other, more intuitive methods have been studied, such as gesture recognition (Couture-Beil et al. 2010; Jones et al. 2010; Monajjemi et al. 2013; Nagi et al. 2014), robot vision-based user-gaze interpretation (Couture-Beil et al. 2010; Monajjemi et al. 2013; Pourmehr et al. 2013), and speech recognition (Pourmehr et al. 2013). Several relevant literature reviews exist on the topic (Goodrich and Schultz 2007; Kolling et al. 2016; Yanco and Drury 2004).

Most of the aforementioned methodologies have been tested on real robots. For example, automated vision-based detection of hands and faces, combined with machine learning-based spatial gesture analysis, enabled the selection of a single drone from a group of four using robot vision alone; the research team claimed that their algorithm can scale up to 20 drones (Nagi et al. 2014). Similar research has discussed the capacity of vision-based systems with regard to the varying distance between the operator and the robot; in this case, the studied range was 1–4 m (Couture-Beil et al. 2010). However, speech and gesture interaction systems have some practical limitations: (1) they require prior training of the operator to use specific coded words or gestures that can be culture dependent (Trovato et al. 2013), limiting intuitive interaction (Kirchner et al. 2015); (2) they make the detection of the intention to interact unreliable, as they use communication channels that are shared with other tasks (Rzepecki et al. 2012); and (3) they are based exclusively on explicit communication, which generates heavy protocols (Kirchner et al. 2015).

To address these issues, we studied the use of electroencephalography (EEG) signals as a robot selection mechanism. This approach does not require the definition and learning of explicit communication codes, as it is based on implicit information extracted via EEG while the operator observes the robot. We define implicit information as information provided by the operator in a passive way, as opposed to explicit information, which is exchanged actively (Kirchner et al. 2015). We define implicit communication as an exchange of implicit information. EEG-based implicit communication is not culture dependent, and EEG techniques are more reliable than gesture- and speech-based techniques in detecting the intention to interact (Rzepecki et al. 2012). Recent advances in neuroscience provide us with reliable and affordable devices that allow the acquisition of two reliable and well-documented EEG neural responses: the P300 and the steady-state visually evoked potential (SSVEP) (Beverina et al. 2003; Bi et al. 2013; Zhu and Bieger 2010). The P300 neural response is elicited as a reaction to salient stimuli. The SSVEP, on the other hand, is measured when a visual stimulus is repeatedly shown at a certain frequency. Although the P300 response has received more attention, recent studies show that target selection can be achieved efficiently using SSVEP, because different SSVEP responses corresponding to different frequencies can be reliably distinguished through computational analysis (Gao et al. 2003). Therefore, we used the SSVEP response to lights blinking at different frequencies in our robot selection scenario to detect the target being watched by the operator. This new communication channel is compatible with the swarm robotics approach but does not solve the question of the distributed infrastructure, which will not be addressed in this paper. For this layer of HSI, we refer the reader to the latest results in protocols implementing spatially targeted communication (Mathews et al. 2015).

The SSVEP response can be extracted from an EEG signal following several approaches (Bi et al. 2013). Most studies use machine learning, but this approach requires a training phase, which we want to avoid in order to keep the communication purely implicit. We therefore applied two other techniques: a signal processing approach using canonical correlation analysis (CCA) and a simpler chain based on the short-time Fourier transform (STFT) (Durak and Arikan 2003). The CCA-based approach was chosen because it does not require training and showed very promising results on the same equipment used in our study (Lin et al. 2014). The STFT is also relevant in such a scenario because it can provide shorter response times, and the response delay of the system is probably the major limitation of most SSVEP-based approaches.

To obtain the best possible results, we began by exploring the role of three key system parameters: the frequency of the blinking light, the distance between the operator and the robot, and the color of the visual stimuli. Once the optimal parameters were set, we tested our approach on ten subjects, most of whom had no experience using EEG-based interfaces.

This paper is structured as follows. Section 2 presents the state of the art in SSVEP-based brain–computer interfaces (BCI). Section 3 gives further details about the experimental setup and, in particular, about the EEG device, the robot, and the general data collection protocol. Section 4 presents the study of the three key parameters of our setup: the frequency, the distance, and the color of the targets. Section 5 builds on the chosen parameters to study the performances of ten subjects using the CCA and STFT approaches. A discussion section concludes the paper.

2 State of the art

After the pioneering example of BCI for the control of a wheelchair by Millan et al. (2004), the research community has shown a growing interest in this mobile robot interaction technique (Bi et al. 2013). The main motivation behind these studies is to enable severely disabled people to control wheelchairs. With a better understanding of these techniques, however, other usages have appeared, including the control of mobile robots by healthy subjects in various applications. The work by Kishore et al. (2014), targeting the control of a humanoid robot, is a representative example of the most common approach: the interaction is made through a screen, where all possible commands are associated with visual stimuli (Volosyak et al. 2009). When the subject looks at a given command on the screen, the associated stimulus frequency is detected in the EEG signal and the command is triggered. Stawicki et al. (2016) follow the same approach, using a screen, but illustrate the commands in an interface based on the subjective view of the robot, generated by a camera located on the mobile robot itself. A slightly more sophisticated approach consists of introducing an avatar to represent the possible actions (Faller et al. 2010). An additional abstraction can be introduced by selecting a goal that can be achieved by a combination of actions, for instance by selecting the destination in the scenario of driving a car (Xa et al. 2015). Most BCI studies targeting the control of mobile systems follow this same approach, using a computer screen as support for the visual stimulus (Bi et al. 2013). Computer screens offer flexibility in the graphical expression of the commands and in the placement of the stimuli.

However, the fixed refresh rate of a screen reduces the usable frequencies to divisors of the refresh rate, which can be seriously limiting. Güneysu and Akin (2013) control a humanoid robot with a panel of LEDs instead of a computer screen. Although the principle of displaying a set of possible commands on an LED matrix is identical to the principle used with computer screens, LEDs allow greater flexibility in the choice of frequencies. Ortner et al. (2010) also use LEDs on a control panel to define the movement direction of a mobile robot, but they introduce a specially designed shape for their panel, better fitting its purpose. Still, none of these studies allows a direct proximal interaction with the robot; all of them introduce a control panel between user and robot. To our knowledge, only Jacobs (2013) studied a direct interaction, with the visual stimuli created by LEDs on the robot itself. In his study, the LEDs are placed at the end of three arms fixed on the robot. The three arms correspond to three directions (forward, right, and left) that the user can choose by looking at the corresponding LEDs. This work was very preliminary and tested on very few subjects.

Concerning the choice of the neural response used to detect user intention, SSVEP is increasingly chosen because it achieves acceptable performance with most people (Guger et al. 2012). SSVEP-based target selection procedures allow choosing among many items. Gao et al. (2003) claim that their algorithm could successfully detect 45 different target frequencies using green blinking LED lights. The performance of SSVEP-based systems can be improved by coupling them with other neural responses, like the P300 (Yin et al. 2015). In the domain of rehabilitation, the combination of SSVEP and P300 signals has been used to control actual wheelchairs (Li et al. 2013). This performance comes at a cost: it requires EEG acquisition systems that are extremely expensive and not portable, and experiments must be carried out under tightly controlled conditions.

The goal of reaching practical applications pushed the development of affordable and portable EEG headsets, but most consumer headsets have fewer than five electrodes and do not allow exploration of a sufficiently large number of signals. Only two affordable systems acquire signals on 14 or 16 electrodes: the OpenEEG and the Emotiv EPOC headsets. The OpenEEG is an affordable system targeting research experiments (Salehuddin et al. 2011), but it requires substantial deployment effort. The Emotiv EPOC is simpler to deploy (Jian and Tang 2014; Van Vliet et al. 2012); compared to traditional systems that require gel on the scalp as well as cumbersome wiring, Emotiv uses saline solution and a radio connection. However, ease of use and affordability come at the price of reduced signal quality. Still, a comparative analysis of SSVEP data acquired from the EPOC and a medical-grade EEG system found that the data acquired from the EPOC are reliable (Liu et al. 2012), although Duvinage et al. (2013) caution that the Emotiv should not be used for medically critical applications. The radio connection is also a limitation, but studies have shown its reliable use in real-time applications (Hvaring and Ulltveit-Moe 2014).

3 Materials and methods

Our goal is to explore the use of neural responses for proximal interaction with a swarm of robots without a computer screen, a panel of LEDs, or any other interfacing tools between the robots and the operator.

For the acquisition of EEG signals, we used the Emotiv EPOC EEG headset (Stytsenko et al. 2011). As described in Sect. 2, this headset is a good trade-off between affordable price and level of performance. While it is affordable with respect to medical-grade devices, it is expensive (approximately $700 with drivers to access raw data) compared to other “consumer” headsets because of its 14 electrodes (see Fig. 1 for their positioning on the skull), which allow several types of data acquisition. A final advantage is its compatibility with open-source EEG signal acquisition and processing software for BCI design. This study uses OpenViBE, a well-established open-source BCI design software (Renard et al. 2010).

Fig. 1 Top view of the location of the electrodes of the Emotiv EPOC EEG headset on the skull (forward looking direction toward the top of the image), with their international code labeling

We used the Thymio II as the robot for our experiments; this programmable robot features a differential drive system, an infrared (IR) remote control receiver, and LEDs that change its body color (Riedo et al. 2013). Its small size (\(11 \times 11 \times 5\,\hbox {cm}\)) and affordable price (approximately $130) make it well suited for multirobot experiments. The communication between the computer and the robot was supported by a USB-controlled infrared emitter dongle. In this configuration, the computer only plays the role of the processing and communication unit of the operator, establishing local communication with the robots that are in the field of view of the operator.

Fig. 2 Configuration of the experiment, showing the signal acquisition setup and the processing infrastructure

Figure 2 summarizes the experimental setup. The EEG signal is acquired and transmitted to a laptop running the various software tools: a driver to access the EEG data, the OpenViBE software to manage the EEG data processing, an interface toward the infrared remote control of the robots, and MATLAB to analyze the results of the experiments.

Each experiment was composed of a set of trials. In each trial, the subjects were instructed to look at an indicated target robot. One second after the instruction, the robot began to flicker and continued for 7 s. During the stimulus, the subjects were asked to look at the blinking light; they were requested to blink as little as possible to limit EEG artifacts. A break of 3 s was then introduced to avoid tiring the subject.

4 Preliminary study: parameter optimization

To optimize the extraction of the SSVEP response within the EEG signal, we studied the effect of three important interaction parameters on the strength of the SSVEP response: the blinking frequency, the distance to the stimulus, and the blinking color. These studies not only make sense within the context of HSI but are also of fundamental scientific interest.

The LED blinking frequency is the first important parameter. The blinking frequencies used in the literature vary from 4.5 to 50 Hz (Zhu and Bieger 2010). However, since the signal-to-noise ratio in EEG is higher in the lower part of the spectrum, some researchers have suggested using low frequencies for SSVEP-based applications (Akhtar et al. 2014). In particular, Gao et al. (2003) found empirically that the usable range of frequencies for SSVEP-based BCI is 6–24 Hz. This is the range we used in our first experiment.

The distance between the target and the operator is the second critical parameter. Wu and Lakany (2013) have studied the impact of distance on the SSVEP response, but using a medical-grade EEG headset.

The third key parameter is the color. In the literature, white was predominantly preferred over red, green, or blue (Akhtar et al. 2014; Aljshamee et al. 2014, 2016; Cao et al. 2012; Yin et al. 2013). Cao et al. (2012) justified this preference: white is a combination of all the primary colors and therefore excites the cone cells associated with red, green, and blue light simultaneously. Some studies, however, have successfully used red (Faller et al. 2010; Jian and Tang 2014; Li et al. 2013) and green (Chua et al. 2004; Duvinage et al. 2013; Gao et al. 2003; Hvaring and Ulltveit-Moe 2014; Li et al. 2013; Mouli et al. 2013) alone as stimuli. Some studies found red to be more effective than white (Faller et al. 2010; Hvaring and Ulltveit-Moe 2014), while others found green to be more effective under similar conditions (Chua et al. 2004; Duvinage et al. 2013). The evidence comparing red and green is similarly contradictory: Mouli et al. (2013) observed green to be more effective, while others were more successful using red (Cao et al. 2012).

Based on these observations, we decided to conduct our own study on the impact of these parameters on the SSVEP neural response when the stimulus is generated by the body of several robots.

4.1 Evaluation metrics

To evaluate the quality of the SSVEP response, we computed a metric that indicates the prominence of the stimulus frequency in the EEG signal. To compute this metric, we applied a fast Fourier transform to the EEG signal from each trial to obtain the averaged frequency spectrum. To quantify the detectability of the SSVEP response, we used the first-to-second-peak ratio (FSR) (Zheng and Zhang 2010): given a particular frequency f, let F and R be two disjoint subsets of the averaged spectrum such that F contains the spectrum over the frequencies \([f-1, f+1]\), and R contains the remaining frequencies, that is, the range \([6, f-1[ \,\cup \, ]f+1, 24]\); the FSR is then defined as:

$$\begin{aligned} q := \frac{\max F}{\max R} \end{aligned}$$
(1)

The FSR provides the ratio of the highest peak within \([f-1, f+1]\) to the highest peak in the rest of the spectrum. The SSVEP neural response to a regularly blinking stimulation is characterized by a peak in the spectrum of the signal at the same frequency as the blinking frequency. Thus, if the FSR computed at the stimulation frequency is above 1, then the highest peak is within 1 Hz of f, and the SSVEP can be considered detectable and recognized. Otherwise, the SSVEP cannot be detected. We therefore call q the recognition ratio. Note that we decided to consider peaks within 1 Hz of the stimulation frequency as valid SSVEP responses because there is always at least a 2-Hz difference between two stimulation frequencies. This band could be narrowed, as the existing literature shows that neural responses are, in general, very accurate (Gao et al. 2003).
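As an illustration, the following minimal sketch computes this recognition ratio from a single-channel trial; the function name, the 128-Hz sampling rate, and the synthetic test signal are our own assumptions, not part of the original protocol.

```python
import numpy as np

def recognition_ratio(signal, fs, f, band=1.0, fmin=6.0, fmax=24.0):
    """FSR of Eq. (1): highest spectral peak within `band` Hz of the
    stimulation frequency f, divided by the highest peak elsewhere
    in [fmin, fmax]."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_range = (freqs >= fmin) & (freqs <= fmax)
    near_f = in_range & (np.abs(freqs - f) <= band)
    return spectrum[near_f].max() / spectrum[in_range & ~near_f].max()

# Synthetic check: a 12-Hz SSVEP-like component in noise should yield q > 1.
fs = 128                                   # assumed sampling rate (Hz)
t = np.arange(0, 7, 1.0 / fs)              # one 7-s stimulation trial
trial = np.sin(2 * np.pi * 12 * t) + 0.5 * np.random.randn(t.size)
print(recognition_ratio(trial, fs, f=12))  # typically well above 1
```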

4.2 Parameter: stimulation frequency

Six frequencies were tested (9, 12, 15, 18, 21, and 24 Hz). For each frequency condition, five trials were performed on three different subjects. The subjects had normal or corrected-to-normal vision and no history of major head injury. The blinking robot was set 1 m away from the subject. Figure 3 confirms the decrease in the amplitude of the neural response as the frequency grows, as already described in the existing literature (Herrmann 2001); furthermore, it shows that the detection fails beyond 15 Hz. This is lower than what is described in the literature with medical-grade EEG headsets; in Gao et al. (2003), the range used is 6–24 Hz. We therefore deduced that SSVEP activity can be measured with this headset and in these physical conditions, provided that low frequencies are chosen. Based on these observations, we restricted the frequency band in the following two studies to the interval [7, 17] Hz.

Fig. 3 Recognition of a red visual stimulus in the EEG spectrum based on its blinking frequency. Each of the three subjects was subjected to five trials for each frequency; the trial period is 7 s. The plotted recognition ratios for each frequency represent the values of the averaged power spectrum of the five stimulation trials

4.3 Parameter: stimuli distance

As a second parameter, we analyzed the impact of varying the distance between the operator and the blinking target robots, taking into consideration the frequencies 7, 9, 12, 15, and 17 Hz; the tested distances were 30 cm, 1 m, and 2 m. Considering the small size (12 cm in diameter) and the weak light-emitting power of the robot (\({<}300\,\hbox {mW}\) electrical power), these experimental distances correspond to a range of 1.5–10 m for a robot with a diameter of 60 cm and a 7.5 W light, corresponding to a standard LED lamp. This range is compatible with the proximal interaction of an operator directly in contact with the robots. Existing interactions using explicit communication channels have a maximal range varying between 2.5 m (Pourmehr et al. 2013) and 5 m (Nagi et al. 2014), enabling good supervision of the robot.
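This correspondence follows from keeping the perceived stimulus constant: the visual angle is preserved when the distance is scaled with the target diameter, and the received light intensity is preserved when the emitted power is scaled with the square of the distance:

$$\begin{aligned} \frac{60\,\hbox {cm}}{12\,\hbox {cm}} = 5, \qquad 0.3\hbox {--}2\,\hbox {m} \times 5 = 1.5\hbox {--}10\,\hbox {m}, \qquad 300\,\hbox {mW} \times 5^2 = 7.5\,\hbox {W} \end{aligned}$$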

The experiment was conducted on three subjects, and four trials were performed for each subject at each frequency and each distance. Figure 4 summarizes the results: there is little difference in neural response between 30 cm and 1 m; however, the response starts to deteriorate at 2 m. Indeed, the recognition ratio at 2 m falls below 1.0 around 13 Hz. This is because (1) the target appears smaller with increasing distance and (2) the perceived LED light intensity decreases, leading to a weaker SSVEP response.

Fig. 4 Recognition of a red visual stimulus in the EEG spectrum based on the distance from the robot. Four trials per subject were performed for each distance and frequency combination. The plotted recognition ratio for each frequency and distance combination represents the values of the averaged power spectrum of all the stimulation trials on all the subjects

4.4 Parameter: stimulation color

The experiment featuring stimulus color was similar to the stimulus-distance experiment. Four trials were conducted for each combination of frequency (7, 9, 12, 15, and 17 Hz) and LED color (red, green, and white). The target robot was placed 1 m from the subjects. Figure 5 shows that the best results were obtained using the red or green stimuli, which agrees with the part of the literature that directly compared white with red or green stimuli (Chua et al. 2004; Duvinage et al. 2013; Faller et al. 2010; Hvaring and Ulltveit-Moe 2014). White light did not increase the neural response, contrary to the findings of Cao et al. (2012). This can be explained by the specific configuration used by Cao et al., who displayed the stimuli on a black background, achieving a high contrast with white.

Fig. 5 Recognition of a visual stimulus in the EEG spectrum based on its color. Four trials per subject were performed for each color and frequency combination. The plotted recognition ratios for each frequency and color combination represent the values of the averaged power spectrum of all the stimulation trials on all the subjects

5 Robot selection by SSVEP response

Based on the results of the studies described above, we designed an experiment to implement and test the robot selection methodology using CCA-based and STFT-based SSVEP analysis. The layout of this setup is shown in Fig. 6. Three Thymio robots blinking in red at frequencies of 8, 10, and 12 Hz are placed in a half circle, 90 degrees apart. In addition to the general architecture presented in Fig. 2, we equipped the subject with an IR remote control. The subject looks at the robot she/he wants to control, and the EEG signals acquired from the Emotiv device are used to make a prediction with the processing chain. This information is transmitted via IR to the robots. The selected robot turns green and executes the command received from the IR remote control, while the other robots remain red and ignore these commands.
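The robot-side logic runs on the Thymio itself; the following host-side Python sketch only illustrates the selection behavior described above. All names, the 0.5-Hz matching tolerance, and the message content are hypothetical.

```python
# Hypothetical simulation of the selection protocol: the IR broadcast carries
# the frequency detected by the SSVEP processing chain, and only the robot
# blinking at that frequency switches to green and accepts commands.
class Robot:
    def __init__(self, robot_id, blink_freq_hz):
        self.robot_id = robot_id
        self.blink_freq_hz = blink_freq_hz
        self.color = "red"                      # unselected robots stay red

    def on_ir_message(self, selected_freq_hz, command):
        if abs(self.blink_freq_hz - selected_freq_hz) < 0.5:
            self.color = "green"                # selected robot turns green
            print(f"robot {self.robot_id}: executing '{command}'")
        # Other robots ignore the command and remain red.

robots = [Robot(i, f) for i, f in enumerate((8.0, 10.0, 12.0), start=1)]
detected = 10.0   # frequency predicted from the operator's EEG
for robot in robots:
    robot.on_ir_message(detected, "move_forward")
```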

The subjects underwent 15 trials: 5 trials at each frequency. Before each trial, the subject was told which of the three robots to look at and was given 4 s to prepare. During the trial, the subject had to look only at that robot even though all three robots were blinking; a 3-s break followed each trial. To assess the reliability of this methodology, the experiment was conducted on 10 different subjects, seven of whom had no previous experience with EEG. The subjects were between 17 and 48 years of age: three women (aged 17, 32, and 44) and seven men (aged 18, 18, 19, 29, 35, 37, and 48).

Fig. 6 Setup of the experiment, showing the configuration of the subject with respect to the robots and the communication channels used for interaction. The detailed schematics of the computational unit (signal acquisition and processing chain) are the same as shown in Fig. 2

Fig. 7 The signal processing chain uses the occipital signals O1 and O2. These signals are first buffered; only the last part of the buffer is used for processing. The length of this period is variable and increases at each processing loop. The signal is compared with ideal signals, and the best fit is selected. Four consecutive coinciding predictions are required to make a final selection. The loop is terminated when such a selection is made or when the whole buffer of 8 s has been used

5.1 Signal processing

The objective of the signal processing methods used in this study is to classify the SSVEP response from the occipital region of the brain (O1 and O2) into one of the following three categories: 8, 10, and 12 Hz. The occipital region of the brain is known to be neurologically important in the SSVEP process, as it contains the visual cortex.

Figure 7 shows the details of the CCA signal processing chain. The signal processing consists of a loop that is repeated until a successful classification is made. In the event of a classification failure, a new attempt is made with a signal length increased by 0.25 s. Initially, this signal length parameter is set to 2 s; it represents the length of the signal used during the classification attempt. Increasing the length improves the chances of success of the new classification attempt by reducing the impact of the noise present in the signal; however, it also introduces longer recognition delays, because changes in the observed target affect the predictions more slowly. If the signal length parameter reaches 8 s, the classification is interrupted and no prediction is made. Each loop iteration ends with a classification attempt. A classification is considered successful only if four consecutive classification attempts reach the same prediction. This measure significantly reduces false positives; the choice of four consecutive attempts is based on the results of Lin et al. (2014).
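A minimal sketch of this loop follows, assuming a `classify(window)` function such as the CCA-based one outlined further below; in the live system the buffer fills continuously, whereas here it is taken as given.

```python
import numpy as np

def select_frequency(buffer, fs, classify, init_len=2.0, step=0.25,
                     max_len=8.0, required=4):
    """Grow the analysis window by `step` seconds per attempt, starting at
    `init_len`, until `required` consecutive attempts agree or `max_len`
    is reached (in which case no prediction is made)."""
    length, streak, last = init_len, 0, None
    while length <= max_len and int(length * fs) <= buffer.shape[-1]:
        window = buffer[..., -int(length * fs):]   # most recent samples
        pred = classify(window)
        streak = streak + 1 if pred == last else 1
        last = pred
        if streak >= required:
            return pred                            # successful classification
        length += step
    return None                                    # classification interrupted
```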

During each iteration, the classification attempt is made using CCA: the measured EEG signal is correlated with three other signals that are precomputed, and then the signal frequency with the highest correlation to the measured signal is chosen. The CCA can be thought of as a generalization of the correlation measure to multivariate signals and has shown good results in SSVEP recognition (Lin et al. 2014). The principle of this approach is as follows: given two multivariate signals X, Y, the optimization problem of CCA is to find \(\rho \) such that

$$\begin{aligned} \rho = \max _{a, b \in \mathbb R^n} r_{ a^\top X, b^\top Y} \end{aligned}$$
(2)

Here, \(r_{a^\top X, b^\top Y}\) is the correlation between \(a^\top X\) and \(b^\top Y\). The maximum is achieved when a is the eigenvector associated with the largest eigenvalue of \(S(X, X)^{-1} S(X,Y) S(Y, Y)^{-1} S(Y, X)\); similarly, b is an eigenvector of \(S(Y, Y)^{-1} S(Y, X) S(X, X)^{-1} S(X, Y)\), where S(X, Y) denotes the cross-covariance matrix of X and Y. The proof can be found in Rencher (2003).

In our case, the multivariate signals are precomputed models of an idealized reaction to one of the three blinking stimulations (blinking frequencies of 8, 10, and 12 Hz). For a given stimulation frequency, the model is composed of the sine, the cosine, and the first harmonic of that frequency, which is known to be present in SSVEP responses (Herrmann 2001). Linear combinations of these multidimensional signals make it possible to arbitrarily modulate the model of the SSVEP response of the brain and to search for a maximal correlation with the measured signal.
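A minimal sketch of one classification attempt follows, with the canonical correlation computed from the covariance blocks as in Eq. (2). The matrix X stacks the O1/O2 channels as rows; including both the sine and the cosine of the first harmonic in the reference model is our reading of the description above.

```python
import numpy as np

def canonical_correlation(X, Y):
    """Largest canonical correlation between multivariate signals X (p x n)
    and Y (q x n), via the eigenvalue formulation given in the text."""
    p = X.shape[0]
    C = np.cov(np.vstack([X, Y]))          # joint covariance, block structure
    Sxx, Sxy = C[:p, :p], C[:p, p:]
    Syx, Syy = C[p:, :p], C[p:, p:]
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)
    # Eigenvalues of M are the squared canonical correlations.
    return float(np.sqrt(np.max(np.real(np.linalg.eigvals(M)))))

def reference_model(f, n_samples, fs):
    """Idealized SSVEP model: sine/cosine at f and at its first harmonic."""
    t = np.arange(n_samples) / fs
    return np.vstack([np.sin(2 * np.pi * k * f * t) for k in (1, 2)] +
                     [np.cos(2 * np.pi * k * f * t) for k in (1, 2)])

def classify(window, fs=128, freqs=(8.0, 10.0, 12.0)):
    """Return the stimulation frequency whose model best fits the window."""
    return max(freqs, key=lambda f: canonical_correlation(
        window, reference_model(f, window.shape[1], fs)))
```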

For comparison, we applied a standard STFT to the same signals (Durak and Arikan 2003). Starting at the beginning of the stimulation period, the STFT was computed using the longest time frame possible (up to 4 s) given the available signal, as longer time frames give higher spectral resolution. We therefore used a time frame of 0.5 s during the first second, 1 s during the second second, 2 s during the third and fourth seconds, and a time frame of 4 s thereafter.
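A comparable sketch of the STFT-based decision is given below, assuming a single occipital channel; the frame-length schedule follows the text, while the peak-picking rule within 1 Hz of each stimulation frequency is our assumption.

```python
import numpy as np

def stft_classify(x, fs, elapsed_s, freqs=(8.0, 10.0, 12.0)):
    """Pick the stimulation frequency with the strongest spectral peak in the
    longest available frame (0.5/1/2/4 s depending on elapsed time)."""
    if elapsed_s < 1.0:
        frame_s = 0.5
    elif elapsed_s < 2.0:
        frame_s = 1.0
    elif elapsed_s < 4.0:
        frame_s = 2.0
    else:
        frame_s = 4.0
    segment = x[-int(frame_s * fs):]               # most recent frame
    spectrum = np.abs(np.fft.rfft(segment))
    bins = np.fft.rfftfreq(segment.size, d=1.0 / fs)
    return max(freqs, key=lambda f: spectrum[np.abs(bins - f) <= 1.0].max())
```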

Fig. 8 Frequency recognition rate versus time for two processing methods: canonical correlation analysis (CCA, left) and short-time Fourier transform (STFT, right). Only the first 6 s are shown, as the performance does not increase afterward. These numbers are averaged over 10 subjects, considering the 5 trials of 15 s each and the stimulation frequencies (8, 10, and 12 Hz)

Fig. 9 Frequency recognition rate per stimulation frequency and per delay between the start of stimulation and the start of the recognition process. These numbers are averaged over 10 subjects, considering the 5 trials of 15 s each and the stimulation frequencies (8, 10, and 12 Hz). The values for each subject are detailed in Fig. 10

Table 1 Frequency recognition rate per stimulation frequency and per delay between the start of stimulation and the start of the recognition process
Fig. 10 Frequency recognition rate per subject and per stimulation frequency, considering the different stimulation durations. The numerical values are given in Table 2

5.2 Results and discussion

Figure 8 shows the recognition rate as a function of time; the data presented were averaged over all predictions made on all 10 subjects in all stimulations. The recognition rate starts at chance level and increases gradually, plateauing around 75 %. The same increase in recognition reliability after 4 s can also be seen in Fig. 9 (Table 1); this graph shows the average recognition rate per frequency. We can observe that the lowest reliability is at 12 Hz, while the highest is at 10 Hz, with a very small standard deviation. The variance between the subjects is shown in more detail in Fig. 10 and the corresponding Table 2. The predominant reliability of 10 Hz can be seen in different subjects, but especially in Subjects 5 and 7, where the recognition rate at 10 Hz is double that at 12 Hz. This graph also shows the divergences between different people: Subject 1 has a 98 % recognition rate at 8 Hz, while Subject 5 has a recognition rate around 40 % for the same frequency. This very high variability is a characteristic that makes EEG analysis delicate and must be carefully considered when developing new applications. For this reason as well, an average of 75 % is considered a good result.

For comparison, we also computed the STFT on the same data sets. Figure 8 shows that the STFT performed significantly worse than CCA.

Based on these results, the time required to recognize and select the robot in a reliable way is four seconds. The CCA approach and the loop processing structure allow the first prediction, using exclusively EEG signals acquired during the current stimulation, to be made as early as three seconds after the beginning of the stimulation. An additional second is required to reach the best performance, which matches results achieved in the literature (Xa et al. 2015; Gao et al. 2003; Jian and Tang 2014; Li et al. 2013). Although this signal processing approach does not require a training session, as opposed to systems that use machine learning algorithms, this delay of 4 s is a clear drawback of this prediction system. With further study, this issue could perhaps be addressed using a hybrid processing chain combining the reliability of CCA with the rapidity of STFT. Nonetheless, the stability of this setup is remarkable: it shows that despite the numerous artifacts, it is possible to achieve, on average, a recognition rate of 75 % at any time after the first 4 s.

Finally, we conducted further experiments combining the use of EEG signals as illustrated above with some processing of the gyroscope mounted on the EEG headset. In our tests, we used the lateral movement of the head to trigger recognition. This allows the operator not only to start a recognition by moving the head toward a new target but also to restart the process after an inaccurate recognition by briefly shaking the head laterally. A video illustrating the approach can be accessed at www.bit.ly/ssvep-bot. These preliminary tests significantly improved the whole interaction and show the merit of combining EEG-based implicit communication with other human–robot interaction methods.
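A possible detector for this trigger is sketched below; the threshold, the window length, and access to the headset's horizontal gyroscope channel are all assumptions, as the processing is not detailed here.

```python
import numpy as np

def detect_lateral_shake(gyro_x, fs, threshold=50.0, window_s=1.0):
    """Heuristic shake detector: the horizontal angular velocity must swing
    strongly in both directions within the last `window_s` seconds."""
    recent = np.asarray(gyro_x[-int(window_s * fs):], dtype=float)
    return bool(recent.max() > threshold and recent.min() < -threshold)
```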

6 Conclusions

This study systematically analyzes two SSVEP classification techniques and some of their key parameters in an effort to tackle the robot selection problem in proximal HSI using implicit communication. In comparison with the literature based on explicit proximal communication, such as gestures or voice, this approach uses implicit information that is not culture dependent and does not require prior learning. However, the SSVEP approach depends on the operator's brain activity, which varies from subject to subject. This results in a modest average success rate of 75 %, but with some subjects reaching success rates higher than 85 % and a peak success rate of 98.2 % on specific frequencies. These results are comparable with the success rates of other approaches such as gesture- or speech-based HSI (Nagi et al. 2014; Pourmehr et al. 2013). This variability indicates that some subjects will perform poorly as operators of these interfaces or will need training to obtain better performance.

Although distance is a parameter considered in gesture- and speech-based interactions when evaluating the success rate, this is the first study to examine the effect of distance on the recognition of the SSVEP neural response among several stimulus sources. Although the range used in our experiments was limited to less than 2 m, this distance must be considered with respect to the size of the robot and the type of visual stimuli. Indeed, the setup used in this experiment is equivalent to a robot with a diameter of 60 cm placed up to 10 m away and carrying a blinking 7.5 W LED. This is a reasonable range for proximal interaction; the maximal distance for existing interactions using explicit communication channels varies from 2.5 m (Pourmehr et al. 2013) to 5 m (Nagi et al. 2014).

One limitation of the current setup comes from the number of available frequencies. Although an 8- to 12-Hz frequency range could theoretically allow the classification of up to 20 different frequencies (Gao et al. 2003), the number of robots involved in the interaction might exceed the number of usable frequencies, limiting the scalability of the approach. This limitation can be overcome by reducing the range of interaction or by combining the SSVEP-based selection technique with other approaches, such as detection of head orientation, allowing operators to preselect part of the swarm before applying the EEG-based technique. The allocation of frequencies among the various robots still requires specific distributed protocols (Mathews et al. 2015).

Another limitation of this approach is the required delay of four seconds before recognition. This delay is similar to the delay of gesture recognition or speech interaction when considering the complete time of interaction, and it is compatible with many applications. Even in a search and rescue scenario where there is time pressure, repeating the selection and losing another four seconds in one selection out of five increases the selection time by only 20 %. Considering that selection is not the most time-consuming communication action, this should only marginally impact the whole activity. Still, further studies should verify whether this delay can be reduced using more sophisticated processing methods, for instance by combining CCA with STFT. More importantly, such limitations of EEG processing techniques could be mitigated using one of the greatest advantages of this approach: the possibility of combining it with other HRI channels. Indeed, because the information is implicit, integrating EEG analysis in other scenarios could enhance the global performance of the setup without requiring any additional effort from the operator.

Some factors that are uncontrollable in real-world applications, such as muscular artifacts or the personal attitude of the operator, could negatively impact the performance of such a solution. This can be particularly significant if the robots are moving and the operator must track them visually. Other factors, such as the surrounding brightness, the variable distance to the targets, and blinking-light interference from other robots, should be carefully considered to reach optimal performance.

In conclusion, we believe that despite the limiting factors described here, the use of an implicit EEG-based communication in the proximal interaction of a human with a robot swarm could open new and interesting possibilities in HSI.