1 Introduction

A brain-computer interface (BCI) offers alternative pathways for communication and control, bypassing the brain’s normal output channels of peripheral nerves and muscles [1]. By measuring brain activity, a BCI enables individuals to interact with computer programs and control devices [2], giving people with conditions such as amyotrophic lateral sclerosis (ALS), spinal cord injury, or brain injury an essential channel for transmitting messages directly from the brain [3]. Scalp electroencephalography (EEG), a widely accepted non-invasive technique, records the brain’s electrical signals without the need for implantation [4]. This technique is gaining attention in BCI applications for its safety, flexibility, and superior temporal resolution compared with invasive alternatives.

The BCI typing system, also known as the BCI speller, is a widely recognized application that facilitates communication [5]. It interprets brain signals to allow users to select letters or words on a computer screen. This system comprises several interrelated components (Fig. 1), including signal acquisition, processing and analysis, output generation, user interface (UI), and feedback mechanisms, all designed to ensure effective operation. While manual keyboard input is the standard for text entry in daily life, the BCI speller enables communication with a virtual keyboard through thought, bypassing the need for hand movements. The initial BCI speller, introduced in 1988 with a typing speed of 2.3 characters per minute [6], demonstrated the potential of brain-signal-based text entry systems. As contemporary technologies rapidly advance, BCI has gained significant attention for its applications in controlling wheelchairs, drones, and robotic limbs [7]. Concurrently, researchers are dedicated to enhancing the UI, refining EEG data processing, and improving analysis methods within speller systems.

Fig. 1
figure 1

A BCI typing system empowers individuals to communicate by selecting letters or words on a computer screen solely through their brain signals. The system comprises an integration of several components, including signal acquisition, signal processing and analysis, desired output generation, and a user interface complete with feedback mechanisms

Two prevalent brain signals extracted from EEG for BCI speller control inputs are the P300 and steady-state visual evoked potential (SSVEP). Characters are strategically arranged in a matrix and highlighted to elicit the desired EEG signal, facilitating the typing process (Fig. 2a). Once processed and analyzed, the spellers interpret the intended command. The P300 speller, featuring a 6 × 6 matrix design for text input, was first developed in 1988 [6] and has become the most widely used BCI communication system, with modern iterations achieving over 90% text input accuracy [8, 9]. In contrast, the SSVEP speller is more user-friendly and requires minimal training [10]. It leverages the visual cortex’s synchronized response to specific visual stimulus frequencies. Recent advancements include a calibration-free system [11] and flexible spatial information decoding [12]. The SSVEP paradigm has also achieved an impressive information transfer rate (ITR) of 265.23 bits/min [13]. To enhance performance further, researchers have developed hybrid spellers that integrate multiple brain signals. For instance, the P300-SSVEP hybrid QWERTY speller has surpassed the performance of traditional P300 and SSVEP spellers [14], with Uma et al. [15] achieving a remarkable average accuracy of 97% using a rapid serial visual presentation (RSVP) paradigm and a convolutional neural network (CNN) for feature extraction.

Fig. 2
figure 2

Typical user interface designs for a BCI speller. a Matrix layout, where buttons are sequentially highlighted in specific orders. b Pie layout [23], featuring multiple characters per partition, highlighted in a clockwise direction. c Row layout [22], allowing characters to be highlighted either left or right based on user commands

Each signal type presents distinct advantages and challenges. The P300 paradigm excels in target character detection but necessitates repeated screen stimulation, potentially causing eye strain [16]. SSVEP spellers, while eliminating calibration and training, are limited by the number of usable frequencies and can induce visual fatigue with extended use [17]. Both paradigms involve flashing buttons, which may fatigue the eyes and restrict UI design possibilities. To address these limitations, eye movement signals have emerged as a promising alternative in BCI spellers, particularly for individuals with limited limb mobility but intact eye control. Eye movement signals are the electrical or mechanical responses that correspond to the physical motion and behaviors of the eyes [18]. These signals accurately represent a range of oculomotor behaviors, including steady fixations, smooth pursuits, saccades, blinks, and other related activities. Captured and recorded through methods such as electrooculography (EOG) and sophisticated eye tracking systems, eye movement signals constitute an intuitive form of input, presenting a flexible approach to human–computer interaction [19, 20]. This modality operates rapidly, inherently reflects the user’s focus, and requires little training [21, 22]. Consequently, this review dedicates its discussion to BCI spellers that employ eye movement signals as part of their input mechanism.

The discourse on BCI spellers predominantly centers on UI design, the typing process, and system performance metrics such as text input accuracy, speed, and information transfer rate (ITR). ITR gauges a system’s communication efficiency and was initially employed in the communications industry; Jonathan R. Wolpaw first introduced it to the BCI domain [23]. The ITR calculation assumes a trial with N potential targets, each equally likely to be selected; the target is chosen correctly with probability P, and the remaining probability 1 − P is divided equally among the other N − 1 targets, i.e., (1 − P)/(N − 1) each. ITR is calculated as follows:

$$ITR=\left[\log_{2}N+P\log_{2}P+\left(1-P\right)\log_{2}\left(\frac{1-P}{N-1}\right)\right]\times \frac{60}{T}$$

In this formula, \(T={t}_{s}+{t}_{b}\) is the single-target selection time, comprising the stimulus duration for target selection (\({t}_{s}\)) and the inter-selection pause (\({t}_{b}\)). ITR is commonly expressed in bits per minute (bits/min), indicating the average rate at which the BCI system conveys information. Numerous research articles assess the efficacy and reliability of BCI spellers by measuring their ITR.
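The formula above is straightforward to implement. Below is a minimal sketch in Python; the example values (36 targets, 95% accuracy, a 3-second selection time) are illustrative assumptions, not figures from any study cited here.

```python
import math

def itr_bits_per_min(n_targets: int, accuracy: float, selection_time_s: float) -> float:
    """Wolpaw ITR: bits per selection, scaled to bits per minute.

    n_targets:        N, number of equally likely targets
    accuracy:         P, probability of a correct selection
    selection_time_s: T = t_s + t_b, seconds per selection
    """
    n, p = n_targets, accuracy
    bits = math.log2(n)
    if p > 0:                      # P·log2(P) -> 0 as P -> 0
        bits += p * math.log2(p)
    if p < 1:                      # error mass spread over the other N-1 targets
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / selection_time_s

# Hypothetical speller: 36 targets, 95% accuracy, 3 s per selection
print(round(itr_bits_per_min(36, 0.95, 3.0), 2))
```

Note that at chance-level accuracy the formula can still yield a small positive value, a known quirk of the Wolpaw definition.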

2 Literature review methodology

Our literature search was conducted using PubMed, with Google Scholar employed to supplement the findings with relevant papers. We sought studies on the rapid evolution of BCI spellers over the past decade, using the keywords “brain-computer interface”, “BCI”, “mental speller”, “spellers”, “text entry”, “text input”, “typewriter”, and “spelling system”.

The included studies featured BCI spellers developed with non-invasive EEG and eye movement signal acquisition technologies, with no limitations on the eye movement acquisition methods. Participants involved in the validation of speller systems were required to have functional ocular motility without visual impairment. Studies employing invasive technologies were excluded. We focused on complete systems, thus excluding papers that only discussed signal processing algorithms. Additionally, we did not consider spellers that incorporated supplementary signals like electromyography (EMG) or functional near-infrared spectroscopy (fNIRS). The articles had to be in English. The screening process is depicted in Fig. 3, and the literature search was performed in December 2023.

Fig. 3
figure 3

Methodology for the literature review

3 BCI spellers with eye movement signals

Eye movement input is characterized by rapid operation and an intuitive conveyance of user attention, providing ample freedom while capturing a rich array of information. The following section delves into the potential applications and significance of eye movement signals within the BCI typing system context. Figure 2 showcases exemplary UI designs that visually elucidate these concepts.

3.1 Introduction of eye movement signals in BCI

EEG signals are vulnerable to interference from eye movements and related actions [24]. Eye movement artifacts in EEG are disturbances caused by the electrical activity that accompanies eye movements: these movements alter the current distribution on the scalp, contaminating the recorded signal [25]. To reduce the impact of these artifacts, researchers employ various correction techniques, including reference electrodes, motion correction algorithms, and independent component analysis (ICA). These methods improve the accuracy of EEG analysis and interpretation.
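The ICA-based correction mentioned above can be sketched on synthetic data. The toy example below, using scikit-learn’s FastICA rather than any specific pipeline from the cited studies, mixes one oscillatory “neural” source with a blink-like artifact, identifies the independent component most correlated with an EOG reference channel, and zeroes it before reconstruction. Channel counts, mixing weights, and signal shapes are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)

# Synthetic sources: a 10 Hz "neural" oscillation and smoothed blink spikes
neural = np.sin(2 * np.pi * 10 * t)
blink = np.zeros_like(t)
blink[::200] = 1.0
blink = np.convolve(blink, np.hanning(50), mode="same")

# Mix the two sources into 4 scalp channels (samples x channels)
mixing = rng.normal(size=(2, 4))
eeg = np.column_stack([neural, blink]) @ mixing
eog_ref = blink + 0.05 * rng.normal(size=t.size)  # noisy EOG reference

# Unmix, find the component most correlated with the EOG reference, zero it
ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(eeg)
corrs = [abs(np.corrcoef(sources[:, k], eog_ref)[0, 1]) for k in range(2)]
sources[:, int(np.argmax(corrs))] = 0.0
cleaned = ica.inverse_transform(sources)  # artifact-suppressed channels
```

In practice the artifact component is often selected by correlation with dedicated EOG electrodes, exactly as the reference channel stands in for here.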

Nevertheless, other studies have demonstrated that eye movements recorded through EEG can be processed to yield meaningful signals for analysis. Specific eye movements, such as blinks, upward, downward, leftward, rightward movements, and eye closure, can be detected, extracted, and classified from EEG data. These movements can then be mapped to different command outputs for a BCI typing system.

Raheel et al. [26] developed a text speller based on eye movements detected by EEG sensors. Characters were typed using EEG signals classified as left, right, or blinking eye movements. Their UI featured 26 characters, a backspace, and three special characters arranged linearly (Fig. 2c). Users selected a character by moving their eyes left or right until the desired character was highlighted, then typed it by blinking, akin to a mouse click. When 15 volunteers used this speller to input their names, the classification accuracy for the three eye movement types reached 76.66%. Another speller used EEG signals to identify double blinks and eye opening/closing [27]. Its virtual keyboard comprised the 26 letters of the alphabet and one special symbol, totaling 27 characters, evenly distributed across three segments arranged in a circular, pie-chart-like configuration (Fig. 2b). During typing, the three segments were sequentially illuminated in green in a clockwise direction, and users selected the desired segment by closing their eyes. Entering a character required a three-stage selection process, and a double blink executed the UNDO command to delete the last character entered. Ten healthy volunteers tested this speller by typing “bcispeller,” achieving an average input accuracy of 92.3% at a rate of 4.94 letters per minute.

BCI spellers leveraging eye movement signals require fewer EEG channels and commands, so the UI should be designed flexibly to accommodate these constraints and optimize eye movement patterns. The comparatively low classification accuracies reported underscore the need for further research into more effective paradigms.

3.2 Hybrid EEG and EOG typing system

Hybrid BCIs integrate control signals from one or more bio-signals to enhance system performance and usability [25, 28]. This multi-modal approach allows for superior detection and interpretation of brain signals, capturing a broader spectrum of neural activity and providing a more nuanced understanding of user intentions.

Electrooculography (EOG) captures potential changes between the cornea and retina during eye movements [29]. As the eyes move, EOG signals vary with the gaze angle, offering a clearer representation of eye movements with a higher signal-to-noise ratio (SNR) compared to EEG [30]. EOG signals can be acquired using the same electrode setup as EEG [31], typically with five electrodes: two for horizontal movements, two for vertical, and one as a reference (placed on the forehead or below the ears). The number of electrodes may vary depending on the specific eye movement patterns required.

Duraisamy et al. [32] introduced a hybrid speller system that integrates SSVEP and EOG. In the initial typing phase, the EOG module registered eight distinct eye movements, such as double and triple blinks, double left and right winks, and gazes directed upwards, downwards, leftwards, and rightwards. Users were guided to select one of the nine target boxes, each containing four characters (36 characters in total), by performing the corresponding eye movements. The central box was chosen by default if no eye movement was detected. Subsequently, in the second phase, SSVEP signals were utilized, with the four characters from the selected target box presented in various directions, each with a unique stimulus frequency. Another design by the same team inverted the signal usage, with SSVEP for initial selection and EOG for the final stage [33]. During the initial phase, each of the five boxes was stimulated with a unique frequency, prompting users to look at their desired box to activate the corresponding SSVEP signal for selection. In the subsequent stage, eight characters encircling the target box flashed red in an oddball pattern, allowing users to select the highlighted character with a simple blink. This system required only three EOG channels and achieved an average accuracy of 99.38% and an ITR of 116.58 bits/min in tests with ten individuals.

Lee et al. [34] developed a spelling system based on the conventional row-column (RC) SSVEP speller, enhanced with EOG for decision-making. Rows and columns were illuminated in a sequential manner, with the EEG signal being recorded continuously throughout the process. Utilizing EEG data, the researchers developed an algorithm to assess the probability of each potential character. They then offered users visual feedback by displaying the four most likely characters as footnotes labeled with the numerals 1 through 4. To finalize the selection, users could execute an upward eye movement when their chosen character was indicated with the footnote ‘1’. This model achieved 100% typing accuracy with an ITR of 57.8 bits/min in tests on six volunteers.

Yu et al. [35] created a Chinese character input speller by fusing P300 and EOG signals, featuring a UI that displays Chinese Pinyin, consisting of consonants and vowels. Monopolar electrodes positioned over the user’s left and right eyes monitored blinking movements. The system was activated by a triple blink, signaling the user’s readiness, while a double blink was used for swiftly selecting the intended characters. Additionally, they incorporated statistical methods from natural language processing for character prediction, aiming to enhance the speed of text input. Users achieved a rate of approximately 2.39 Chinese characters per minute with an average accuracy of 93.6% when entering a poem, suggesting a viable method for logographic language users.

Zhang et al.’s hybrid BCI combined SSVEP and EOG to improve performance [36]. The graphical UI presented 16 buttons, grouped into four distinct sectors, which illuminated simultaneously to draw the user’s focus to a chosen target. Post-illumination, the buttons moved in a counterclockwise direction, eliciting corresponding eye movements. These movements were accurately traced by waveform analysis, which utilized the time-domain features of EOG data. Meanwhile, the task-related component analysis (TRCA) algorithm was applied to detect SSVEP. This hybrid system achieved an average classification accuracy of 90.77% and an ITR of 73.73 bits/min. Further research with refined signal analysis methods improved these figures to 94.75% and 108.63 bits/min [37].

Ha et al. [38] introduced an innovative calibration-free hybrid BCI system that integrates SSVEP and EOG for a nine-target SSVEP-based BCI within a virtual reality (VR) setting. This system utilized EOG to ascertain the user’s horizontal eye movement and subsequently pinpointed the target stimulus among three vertically aligned stimuli within the chosen column, employing the extended multivariate synchronization index (EMSI) algorithm. The system’s performance was evaluated with 20 participants using a commercial VR head-mounted display (HMD), and the results demonstrated that the proposed hybrid BCI outperformed conventional SSVEP-based BCIs in terms of both accuracy and ITR within VR environments.

EOG acquisition is straightforward, cost-effective, and does not necessitate complex signal processing. Researchers have developed custom hardware for electrode placement and signal transmission, often employing multi-threshold algorithms and machine learning techniques like support vector machines (SVM) for eye movement classification. However, the natural occurrence of blinking in daily life can introduce ambiguity for systems that rely on blink detection for text input, necessitating a balance between eye movement patterns and classification accuracy.
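The multi-threshold approach mentioned above can be illustrated with a crude sketch: amplitude thresholds on the vertical and horizontal EOG channels distinguish blinks from lateral gaze shifts. The threshold values and epoch format below are illustrative assumptions; real systems calibrate them per user, as the accuracy/pattern trade-off discussed here implies.

```python
import numpy as np

# Illustrative thresholds in microvolts; real systems calibrate per user.
H_THRESH = 100.0      # horizontal-channel deflection for left/right gaze
BLINK_THRESH = 200.0  # vertical-channel peak indicating a blink

def classify_eog(h_chan, v_chan):
    """Crude multi-threshold classifier for one EOG epoch.

    h_chan, v_chan: 1-D arrays of horizontal / vertical EOG (microvolts).
    Returns one of 'blink', 'left', 'right', 'rest'.
    """
    if np.max(v_chan) > BLINK_THRESH:     # blinks dominate the vertical channel
        return "blink"
    peak = h_chan[np.argmax(np.abs(h_chan))]  # signed extreme deflection
    if peak > H_THRESH:
        return "right"
    if peak < -H_THRESH:
        return "left"
    return "rest"
```

An SVM trained on features such as peak amplitude, slope, and duration would replace these fixed thresholds in the machine-learning variants cited above.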

3.3 Hybrid EEG and eye tracking typing system

Eye tracking technologies employ cameras to capture either visible light or infrared images of the eyes, which are then analyzed to ascertain the precise locations of eye fixations [39]. These devices are adept at precisely tracking eye movements, and when integrated with EEG signals, they can significantly improve the performance of high-speed typing applications [18]. In BCI typing systems, the combination and processing of eye tracking data with EEG signals are implemented through diverse methodologies. In this article, we systematically review and summarize these methods into two distinct categories.

1) Serial approach: This method initiates with eye tracking to identify the target character area, followed by precise selection using the SSVEP paradigm. For instance, Mannan et al. [40] designed a graphical UI with 48 characters distributed across eight boxes, each holding six characters. Users first fixated on the box containing the desired character, then SSVEP was employed to stimulate the final selection from six predetermined frequencies within that box. Cue-guided experiments familiarized users with the typing method before transitioning to a free-spelling task, resulting in an overall spelling accuracy of 90.35% and an ITR of 190.73 bits/min. Stawicki’s UI design [41] featured the 26 English letters, three symbols, and a delete button, with 20 hidden gaze points defined between characters. If a user’s gaze landed on a gaze point, the surrounding four characters were selected for the next SSVEP-stimulated stage, achieving an overall text entry accuracy of 93.87% with an ITR of 46.13 bits/min. Similar studies [42,43,44] have also shown the efficacy of hybrid BCI spellers, which have demonstrated enhanced input accuracy and speed over traditional SSVEP spellers. Notably, Lin et al. [44] developed a 112-character UI to explore the feasibility of a more extensive command set in BCI systems. Their innovative typing system achieved an average ITR of 233.3 bits/min.

2) Parallel approach: This approach fuses eye tracking and EEG inputs at a certain level to locate the target character. Kalika et al. [45] combined eye-gazing data with a P300 speller, employing a fusion technique to estimate the probability of selecting each target. This integration of eye-gaze information improved accuracy by 6% and decreased the number of flashes needed for character selection. Li et al. [46] also integrated eye tracking into a P300 speller, designing a hybrid BCI system for Chinese character input. Following the traditional RC paradigm, eye tracking data was combined with the P300 signal to select target consonants and vowels, achieving an average typing speed of 1.14 sinograms per minute.

Ma et al. [47] applied SSVEP and eye tracking for text input in a VR device, utilizing a fusion decision-making system to accurately identify the gaze direction for text entry, achieving an average ITR of 270 bits/min with 96% accuracy. Tan et al. [48] devised a strategy for autonomous control and a fusion technique for integrating EEG and eye tracking data. They employed a sliding window approach to process eye-gaze data, which initiated target recognition when the variance fell below a predefined threshold. The EEG and eye-gaze data were collected synchronously and fused using a particle swarm optimization (PSO) algorithm to determine the most effective fusion weights. In assessments involving 15 participants, the PSO-based fusion method demonstrated superior performance, yielding increased accuracy and ITR compared to systems relying on a single modality.
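The sliding-window variance trigger described for Tan et al.’s system can be sketched as follows: gaze samples stream in, and target recognition is initiated when the windowed variance of the gaze position drops below a threshold, indicating a fixation. The window length and threshold below are illustrative assumptions, not values from the study.

```python
import numpy as np

WIN = 30          # sliding-window length in samples (assumed)
VAR_THRESH = 4.0  # combined x+y gaze-variance threshold in px^2 (assumed)

def fixation_onsets(gaze_xy):
    """Indices where windowed gaze variance first drops below threshold.

    gaze_xy: (n_samples, 2) array of x/y gaze coordinates.
    Returns the window-start index of each detected fixation onset.
    """
    onsets = []
    triggered = False
    for i in range(WIN, len(gaze_xy) + 1):
        v = gaze_xy[i - WIN:i].var(axis=0).sum()  # combined x+y variance
        if v < VAR_THRESH and not triggered:
            onsets.append(i - WIN)                # fire once per fixation
            triggered = True
        elif v >= VAR_THRESH:
            triggered = False                     # re-arm once gaze moves again
    return onsets
```

In the full system, each detected onset would trigger the EEG/eye-gaze fusion step (PSO-weighted in Tan et al.’s case) rather than an immediate selection.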

Jiang et al.’s hybrid system [49] employed low-frequency stimulations (12 classes, 0.8–2.12 Hz) to elicit visual evoked potential (VEP) and pupillary response (PR). Accuracy was enhanced in the hybrid BCI system through the application of supervised and unsupervised classification methods, coupled with a decision fusion technique that integrated VEP and PR data. In online experiments with 10 subjects, this system demonstrated superior accuracy and ITR compared to standard SSVEP-BCI systems, notably for shorter data lengths. Participants also reported a more comfortable experience and better user satisfaction with the low-frequency stimulation protocol.

When a speller relies solely on eye tracking, character selection is dwell-time based, which can cause inaccuracies because the system cannot differentiate between intentional and accidental fixations. Incorporating EEG signals for target selection removes this reliance on dwell time and improves accuracy. Additionally, the hybrid speller offers a more user-friendly experience and ensures reliable control, as demonstrated in Stawicki’s study [41].

Table 1 offers an overview of the key characteristics and performance metrics of BCI spellers that integrate eye movement signals and EEG.

Table 1 Overview of the BCI spellers involving eye movement signals

4 Discussion

4.1 Diverse functions in system design

Eye movement signals provide significant flexibility in UI design for BCI spellers, as they are unrestricted by limits on target quantity, display patterns, or the complexity of the selection process. Button layouts can be designed around typical character-usage frequencies [43]. For instance, commonly used characters can be positioned at the center of a virtual keyboard to reduce user effort and gaze travel, thereby increasing typing speed. Conversely, some spellers split text input into two or three stages, organizing characters into groups; this can improve classification accuracy, albeit potentially at the expense of longer selection times.

Context-aware word prediction is an additional feature that significantly boosts the efficiency of text input. This functionality expedites the typing process by offering word suggestions after just a few characters have been typed, aligning with the ultimate goal of communication – constructing words and sentences. Word-based algorithms are capable of analyzing the current text and forecasting probable completions [50]. Nonetheless, compiling an exhaustive dictionary necessitates sophisticated computational methods, and the timing of word suggestions must be meticulously managed to prevent them from appearing prematurely or belatedly.
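A word-based prediction algorithm of the kind described above can be reduced to its core idea in a few lines: rank the dictionary words matching the typed prefix by corpus frequency. The tiny frequency table below is a made-up illustration; a deployed speller would build it from a large corpus and typically condition on preceding words as well.

```python
from collections import Counter

# Tiny illustrative frequency dictionary; a real system would use a large corpus.
FREQ = Counter({"the": 500, "there": 120, "these": 90, "hello": 40, "help": 60})

def suggest(prefix: str, k: int = 3) -> list[str]:
    """Top-k dictionary words starting with `prefix`, most frequent first."""
    matches = [(w, c) for w, c in FREQ.items() if w.startswith(prefix)]
    return [w for w, _ in sorted(matches, key=lambda wc: -wc[1])[:k]]
```

Limiting suggestions to k candidates keeps the selection set small enough to map onto a handful of eye movement commands, which is why suggestion timing matters as noted above.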

In practice, it is advisable to use questionnaires to evaluate user workload after text input tasks, as they offer critical insight into the user experience with BCI systems [26, 33, 40, 51]. Research evidence shows that extended use of BCI spellers can cause user fatigue, underscoring the need to address this factor in BCI system design.

4.2 Limitations and potential improvements

There are several promising aspects for system enhancement. Firstly, many existing studies have involved a limited number of participants, with some BCI spellers being tested by as few as one or two users. This restricted sample size may result in system performance that is not generalizable. Secondly, in addition to conventional EEG signals such as P300 and SSVEP, other signals like motor imagery (MI) could be integrated into hybrid BCI speller systems. MI signals represent spontaneous EEG activity that does not rely on external stimuli [52,53,54], offering users greater flexibility. However, studies relying solely on MI for text input have achieved modest typing speeds due to a limited command set [55,56,57,58]. Further investigation into the combination of MI with eye movement signals is warranted. Additionally, the use of error-related potentials (ErrP) for error correction has been shown to enhance text entry accuracy [20, 59]. Future research should delve into the characteristics of various brain signals. Moreover, feature extraction and classification algorithms require refinement. Advanced deep neural networks, such as CNN and EEGNet, have demonstrated superior performance in classifying EEG data over traditional approaches [60, 61]. A deeper exploration of deep learning, tailored to the properties of eye movement signals, could yield significant advancements in BCI speller applications.

5 Conclusion

BCI typing systems have become a focal point of research, offering a critical communication modality for individuals with motor and speech impairments. Eye movement signals, characterized by their adaptability and ease of control, present a compelling complement to EEG. These signals not only augment typing efficiency but also facilitate user interaction with minimal effort. The strategic amalgamation of the unique strengths of EEG and eye movement signals is poised to drive innovative breakthroughs in the field of BCI technology.