
1 Introduction

Through the considerable progress of recent decades, the technology of Head Mounted Displays (HMD) has improved significantly. In addition to improved display technologies [7], the number of integrated sensors for measuring physiological indicators has increased. The manufacturer HP, for example, announced the HP Reverb G2 Omnicept Edition for May 2021 [13]. This HMD integrates sensors for eye tracking, pupillometry, heart rate, pulse rate variability and even a face camera. Alongside this hardware integration, the first software solutions for measuring cognitive load in near real time are already being developed. However, these solutions are often insufficient in precision and quality because only individual sensors are interpreted. Sensor characteristics and software misinterpretations increase the number of errors while simultaneously decreasing the quality of measurements. By using more precise sensors or fusing sensor data, the quality of measurement can be increased.

In addition to sensor fusion, the interpretation of sensor data is a key discipline. Here, human-computer interaction (HCI) forms the interface between humans and computers. The collected sensor data is processed into a machine-readable representation; for this purpose, the signals are interpreted and evaluated in the form of logical relationships. These evaluated data sets can then be visualised and translated into a form understandable to humans.

The next step of HCI on the way to further connect humans and technology is seen in the creation of interfaces between the human brain and computers through so-called BCIs [9]. BCIs are basically intended to translate signals from the human brain directly into commands for interactive applications [18]. Methods for measuring brain activity can be divided into two groups: invasive and non-invasive [27].

Invasive measurement methods such as electrocorticography (ECoG) record signals directly from the cortical surface or from inside the brain [24, 34]. In this context, the decoding of bio-mechanical parameters has already been successfully used for monkeys and humans to control prostheses [12, 16, 31,32,33]. However, the high risk of the necessary surgical intervention and the gradual degradation of the recorded signals over time entail considerable disadvantages, which is why non-invasive approaches such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), near-infrared spectroscopy (NIRS), and EEG are increasingly used in the context of human users [1]. Although fMRI, for example, offers higher spatial resolution, EEG has become the most popular method due to its direct measurement of neuronal activity, comparative ease of portability and low cost [27].

Although significant progress has been made in this field of research, BCIs still face a number of challenges due to inaccuracy, unreliability and latency, as it is difficult to obtain and accurately process information from the brain. For this reason, the inclusion of other signals in order to decode data for BCI users is becoming increasingly important. A BCI system that additionally uses other physiological indicators, also called bio-signals, is referred to as a hybrid BCI (hBCI) [5]. Several works [6, 22] have already shown improvements for BCI, e.g. in accuracy and information transfer rate, by incorporating other bio-signals such as electromyography, electrooculography or electrocardiography (ECG). Nevertheless, the search for valid and reliable hBCI systems remains a serious challenge [4, 23].

In this context, we propose a conceptual framework that facilitates and improves the generation of meaningful measurement data through their collection within a corresponding VR application and set-up. Furthermore, the measured sensor data should also be used to analyse other sensors in the same setup, with the ultimate goal of replacing one sensor with the measurements from others. AI-based EEG emulations, derived from other directly related bio-signals, could therefore potentially replace EEG measurements, which indirectly enables the use of BCIs in situations where it was previously not possible.

In addition to a comprehensive concept proposal, our contribution also provides valuable information about the potential procedure for an expected subsequent implementation.

The presented work is structured as follows. In Sect. 2, we discuss the currently relevant fields of literature. Section 3 describes the general conceptual framework as well as the detailed theoretical procedure in five consecutive steps, followed by a discussion of the presented approach in Sect. 4. There, the potential advantages and expected difficulties, especially in the interaction of the measurement data collected in the context of the VR application, are addressed. While Sect. 5 summarizes our approach and intentions, Sect. 6 gives an outlook on potential next steps and further work.

2 Related Work

We identified three areas of relevant related work. The first is the machine-learning-based analysis of BCI data, often with a focus on deep learning. The second is concerned with the connection of BCIs and the VR ecosystem, while the third addresses the idea of analysing correlations between data from different existing sensors. As far as we have observed, the amount of research decreases across these three areas in this order. Moreover, there is currently no research that connects these three aspects, namely an analysis of sensor correlation for hBCIs and HMDs. Some relevant work on each of these three areas is highlighted in turn below.

The use of physiological measurements for emotion detection has been a research subject for a long time. While in many papers emotion detection is performed using one or two individual measurements, there have recently been more publications in which a multi-modal sensor set is used. Ali et al., for example, created an emotion recognition system in which multiple physiological signals such as ECG, EDA and skin temperature were studied and the most commonly used statistical features were extracted [3]. The public HCI tagging database “MAHNOB” was used for training, testing and initial validation, while their own data was used for final validation. The study also compared the results of using the same sensor brands in training and testing with the results of using different brands in order to generalize the findings.

Regarding the use of deep learning in EEG signal analysis, Zhang et al. give a comprehensive review of the advancements in this field [38]. The different deep learning methods for EEG analysis are explained, reviewed and discussed, and a selection of applications in different fields is listed. In particular, Kanjo et al. [14] use an approach similar to ours, taking data from multiple sensors and processing the input from all sensors at the same time using deep learning models. Their use case for classifying the sensor data is emotion detection. Kanjo et al. differ by using over twenty sensors, covering a wide range of data types and sensor types such as physiological and environmental sensors, and by using deep learning models to enhance the predictive capabilities of the sensor data rather than to analyse the sensors themselves.

For the practical implementation of EEG sensor measurement while wearing VR headsets, Tauscher et al. examine how VR headsets and EEG caps can be physically combined for parallel measurements so that signal quality from both is preserved [29]. Weibel et al. furthermore present a framework for conducting physiological experiments in VR environments that includes standardized modules for data collection using questionnaires, the synchronization of physiological measurements, and data storage, as well as tools for data visualization and evaluation [35]. A reproducible protocol for conducting experiments with this framework is also provided.

Although the literature already mentions certain improvements for BCI through the integration of other bio-signals [6, 22], the variety of additionally integrated bio-signals and the amount of prior research remain limited. Besides potential correlations of data from different sensors, possible redundancies of certain sensors have been studied even less frequently so far. In addition, these research topics are not interconnected across disciplines, suggesting a need for a general concept spanning research disciplines. Agarwal et al. [2] directly address the idea of replacing sensors with the help of other sensor data. Their goal is to use existing sensors more prudently in order to improve energy management. Although the basic idea of sensor replacement is present, there is no analysis of the sensor data itself.

Yan et al. [37] and Gao et al. [11] take data from multiple sensors and perform correlative analysis on the sensors to address data loss in wireless sensor networks. Yan et al. propose a multiple linear regression model to recover data, while Gao et al. propose an approach to find the spatial-temporal correlation in the sensor data. This approach is also used by Tayeh et al. [30], who follow the data reduction and energy saving paradigm for environmental sensors in wireless sensor networks. In a sense, Yan et al. attempt the inverse of the model in this paper: instead of making one sensor redundant because other sensors contain the same information, they attempt to recover this information where it is missing for a specific use case. Even though Yan et al. and our approach share the idea of enhancing sensor networks, they differ in the type of sensors (environmental sensors in greenhouses), the reason for analysing the sensors, and the machine learning tools used. Gao et al. and Tayeh et al. use an even more similar approach, but the premise is different: both papers focus on data reduction in systems with many identical sensors distributed across a region rather than an array of different sensors at the same location.

This paper thus fills multiple gaps in the current research: it not only performs sensor data correlation analysis using deep learning models, but does so specifically in order to analyse the sensors themselves. The gained knowledge of sensor correlation can thereby be applied to a large number of use cases.

3 Conceptual Framework

In this section, we describe the conceptual framework (Fig. 1) for EEG sensor replacement by AI-based emulation.

Fig. 1. Concept: The time-dependent signals (S, t) of different sensors (eye tracking, heart rate, pupillometry, face camera) are detected in the course of measurement. The signals are evaluated further in the process and read into a buffer as a data set (S_M). Various AI methodologies are used to analyse the data from the buffer, and a cluster analysis based on patterns, rules, and principles takes place. If a data modification is made, the parameters are stored again, and new data sets are deposited in the buffer. As a result of the AI analysis, a temporal and signal-based adjustment is made to the actual number of sensors required as well as to the temporal querying of signals. During the entire measurement, user-machine communication takes place via log control; the user can intervene in the system at any time. The concept can be divided into the five steps of measurement, clustering, user control, data set, and AI/results.

Contrary to the previous literature on sensor-sensor interaction, the goal of the presented concept is to replace one sensor with the measurements from other sensors. Thus, sensors are not investigated for their ability to predict certain use cases like emotion or stress, but rather used to analyse the data of other sensors in the same setup. If the concept is successful, a costly or difficult-to-measure sensor could be left out of a setup completely without losing its functionality. The basic idea of the concept is to cluster the sensor data from one sensor and to classify the data from the other sensors into these clusters. Afterwards, the sensor data in each cluster is analysed for patterns that make statements about the output of the original sensor possible. The individual steps of the concept are each described below, followed by a discussion of the potential uses and the significance of the concept.

Step 1: Measurement of Sensor Data in a VR Setup. The experimental setup consists of a VR environment in which a test person plays a game or is exposed to specific situations that trigger different emotions. Physiological sensors measure the user’s responses throughout the experiment and the results are logged. As long as a timestamp is attached to the data, the concept is agnostic to the origin of the data. As mentioned above, some retail HMDs already provide several sensors to measure bio-signals, including eye tracking, ECG, gyroscope and electro-dermal activity. The impractical head caps used for EEG measurement represent both a physical constraint on the experimental setup and an inspiration for the concept. Even though the literature has shown the possibility of combining an EEG head cap and an HMD [29], it is desirable to ultimately eliminate the need for a head cap through a successful implementation of our concept.

The main challenge for this step is the elusive nature of emotions. Research on emotions is at the intersection of philosophy, linguistics, sociology, anthropology, psychology and neuroscience. One approach is to look at emotions as a discrete variable and to define certain basic emotions that participants can choose from as a response. A prominent model for this is the differentiation by Ekman, in which the six well-established basic emotions anger, disgust, fear, happiness, sadness, and surprise are proposed as distinguishable across cultures [10]. A different approach looks at emotions on a continuous scale between pleasant and unpleasant and between low and high intensity [20]. Although breaking up emotions into a small set of discrete categories is appealing for an experimental study, a number of problems are associated with this approach: Taking the six basic emotions as the basis for an experiment would be a misapplication of Ekman’s research. Even though the basic emotions might be distinguishable for humans across cultures, there is no claim that these are the only emotions or the building blocks of other emotions. In fact, Ekman identifies candidate emotions like relief, guilt, embarrassment or excitement that are not as easily distinguishable. Ignoring emotions that are not part of the six basic emotions because of their unclear boundaries is to simplify the concept of emotions beyond recognition. At the same time, there might be a disconnect between what humans feel, the signals their brains produce and the way they express these feelings in words or physical movement. The dimensional model may avoid the problem of oversimplification of the available labels but introduces an element of subjectivity that is more difficult to measure and control in an experimental setup. Because of the complexity and ambiguity of the term “emotions”, an experimental setup should attempt to investigate specific emotions rather than emotions in general.

Beyond the problem of emotion classification there is the problem of emotion elicitation in VR, which is equally challenging for the experimental setup proposed here. Even if Ekman’s six basic emotions are taken as a basis for what the term “emotions” means, the setup needs to trigger at least one of the six emotions reliably in order to deliver quality data. Common techniques to achieve this are active or passive in nature [17]. In the passive version, participants are faced with stimuli without any interaction beyond the sensory experience. The International Affective Picture System and the International Affective Digitalised Sound System, for example, are databases that can be consulted for visual and auditory passive elicitation of emotions and have been used in VR setups as well [25]. In the active version of emotion elicitation, a participant could, for example, interact with a situation or an avatar in a VR environment to elicit emotions [26]. Even though passive elicitation poses a lower threshold to the experimental setup, VR is uniquely suited for the active elicitation of emotions and has been shown to be more effective than 2D elicitation [28]. For a systematic literature review of emotion elicitation using VR, see Marin-Morales et al. [21]. Furthermore, emotion elicitation has to remain ethical, so an experimental setup that elicits strong negative emotions would have to be designed prudently and conducted with the prior knowledge of the participants. Experiments that use jump scares (e.g. [36]) have to weigh the benefits of the jump scare for emotion elicitation on the one hand against the biases introduced by properly informing participants of the risks beforehand on the other. As a word of caution for the interpretation of results of any VR-based emotion elicitation study, more research is needed on the relatedness of emotions triggered in real-world situations and emotions triggered in artificially created VR setups [19, 21].

There are several consequences for an experimental design based on the concept proposed here. Firstly, a choice has to be made as to which emotions should be induced. Secondly, a decision should be made between active (interaction-based) and passive (content-consumption-based) methods of elicitation. This paper does not propose a right way for either of these design choices, as the concept proposed here engages with the resulting sensor data, and these results are in principle independent of the design choices of the underlying experiment. With regard to the emotions chosen in an experimental setup, the next step of the concept is to cluster the measured sensor data. It may seem contradictory to first classify emotions for the experimental setup but to disregard this classification in the next step; the significance becomes clear in the third step, where the researchers can check whether the clustering has produced results similar to those envisioned from the experimental setup. This may lead not only to an adjustment of the clustering algorithm but also to an adjustment of the experimental design if some emotional triggers appear to produce ambiguous results.

For a larger setup, a higher number of participants as well as a control group would be needed. A control group in sensor measurements would have to provide a baseline of EEG data. Providing no stimulus would probably not yield a good control group, as participants’ wandering minds would cause unforeseen sensor data. We rather propose an easy task or game, such as light Sudoku puzzles or Solitaire, played while wearing the same HMD and sensor cap, as appropriate for a control group.

Step 2: Clustering of Sensor Data. Even though the data is time series data, on-stream clustering is not necessary: recording and storing the data makes it possible to work with it at will and without the constraint of real-time processing. Here, one sensor is chosen as a possible replacement candidate. The data from this sensor has to be broken up into different clusters, each representing one emotional category. Several approaches can be used to achieve this. In the vein of deep learning, autoencoders could be used as feature extractors, as sketched below. It is possible that the experimental setup does not yield enough data for clustering to be useful or necessary. If labelling by hand is feasible, there is the option of labelling the experimental data outright; this has the downside of being labour-intensive and costly. A middle ground would be either to take one occurrence as a prototype and classify the rest of the data by similarity, or to create pseudo-labelled data with a semi-supervised machine learning algorithm.
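
As an illustration of the autoencoder option, the following minimal sketch trains a small autoencoder on windows of the candidate sensor's signal (produced by the sliding-window strategy described next) and clusters the resulting latent features. All layer sizes, the window length and the number of clusters are illustrative assumptions, not recommendations.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

WINDOW = 256  # samples per window (hypothetical; depends on sampling rate)

class WindowAutoencoder(nn.Module):
    """Compresses each sensor window into a small latent feature vector."""
    def __init__(self, window=WINDOW, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, 64), nn.ReLU(),
                                     nn.Linear(64, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                     nn.Linear(64, window))
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def cluster_sensor_windows(windows: np.ndarray, n_clusters: int = 6):
    """Train the autoencoder on the windows, then cluster the latent codes."""
    x = torch.tensor(windows, dtype=torch.float32)
    model = WindowAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):  # illustrative epoch count
        opt.zero_grad()
        recon, _ = model(x)
        loss = nn.functional.mse_loss(recon, x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        _, latent = model(x)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(latent.numpy())
    return labels  # one cluster id per window
```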

Regardless of the model used, the time series data could be approached with a sliding window strategy in which the step size is smaller than the window size. The resulting clusters contain lists of timestamps at which certain significant events occur. These lists of timestamps have to be checked for meaningfulness, as they are the basis for further processing.
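
A minimal sketch of this segmentation, assuming the raw signal and its per-sample timestamps are available as NumPy arrays; the window and step sizes are hypothetical choices.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, timestamps: np.ndarray,
                    window: int = 256, step: int = 64):
    """Return overlapping windows plus the timestamp at each window's start."""
    starts = range(0, len(signal) - window + 1, step)
    windows = np.stack([signal[s:s + window] for s in starts])
    window_times = np.array([timestamps[s] for s in starts])
    return windows, window_times
```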

Step 3: Labelling of Clusters and Reassessment. If a clustering mechanism is used, the clusters in and of themselves are not meaningful. Instead, it has to be checked how well a certain human-understandable label can be attached to each cluster. If, for example, a cluster contains timestamps from EEG data that were all connected to joyful moments in the VR cues, the cluster may safely be labelled as a positive emotional category. This depends not only on the experimental setup but also on the envisioned use case. A more or less fine-grained approach will yield different numbers and qualities of clusters. If the clusters do not reflect the cues in the VR simulation, the clustering parameters, such as the bias, the learning rate or the loss function of the algorithm, can be changed until a useful clustering is achieved.

As a last resort, if the clusters do not produce comprehensible groupings, the VR simulation can be tagged with timestamps at which certain emotions are cued. These timestamps can then be used for manual classification or a semi-supervised deep-learning-based model.
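
Before resorting to a full semi-supervised model, a simple majority vote over such tags could serve as a first labelling pass, as in the following sketch. It assumes the cue timestamps and labels are available as arrays, and the tolerance window (in seconds) is a hypothetical parameter.

```python
from collections import Counter
import numpy as np

def label_clusters(window_times, cluster_ids, cue_times, cue_labels,
                   tolerance=1.0):
    """Map each cluster id to the most frequent nearby VR cue label."""
    cluster_votes = {}
    for t, c in zip(window_times, cluster_ids):
        # find the closest tagged cue; ignore windows far from any cue
        i = int(np.argmin(np.abs(cue_times - t)))
        if abs(cue_times[i] - t) <= tolerance:
            cluster_votes.setdefault(c, Counter())[cue_labels[i]] += 1
    return {c: votes.most_common(1)[0][0]
            for c, votes in cluster_votes.items()}
```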

Step 4: Synchronization and Operationalisation of the Other Sensor Data. Each cluster will correspond to a certain type of emotion or set of emotions, depending on the label it is assigned in step 3. Each of the clusters contains a list of timestamps at which the clustered data points occurred. Now the sensor data from all other sensors is taken into account. The list of timestamps from a cluster is used as the index of a relational database: each column represents the data from a different sensor, and each row represents a point in time. As a first approach, each cell could contain the measurement of one sensor at one point during the experiment.
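
A sketch of this table construction using pandas, assuming each sensor's recording is available as a time-indexed Series with a sorted, unique index; the sensor names in the comment are hypothetical.

```python
import pandas as pd

def build_cluster_table(cluster_timestamps, sensors: dict) -> pd.DataFrame:
    """One row per clustered timestamp, one column per remaining sensor."""
    table = pd.DataFrame(index=pd.Index(cluster_timestamps, name="timestamp"))
    for name, series in sensors.items():  # e.g. {"eda": ..., "hr": ...}
        # take the measurement closest in time to each clustered timestamp
        table[name] = series.reindex(table.index, method="nearest").to_numpy()
    return table
```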

Taking the raw sensor measurements at the exact timestamp would, however, not be fine-grained enough for some sensors. Rather, each sensor has to be considered individually. For example, the value from a sensor measuring electro-dermal activity may be taken at face value, whereas heart rate data or ocular movement would be better represented by a gradient of change. Ideally, each sensor’s data is preprocessed, for instance using recurrent neural networks, to look for patterns around each timestamp. This preprocessing transitions seamlessly into data analysis, as patterns in the data of one sensor sampled by timestamps already indicate a correlation between the cluster results and the sensor results. The cells of the resulting table would contain single sensor values or short periods of sensor data, depending on which has the highest information density.
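
For sensors better represented by a gradient of change, a local rate of change around each timestamp could be computed as in the following sketch, assuming a pandas Series with a numeric time index (seconds); the half-width of the neighbourhood is an illustrative assumption.

```python
import numpy as np

def local_gradient(series, t, half_width=2.0):
    """Mean gradient of a time-indexed pandas Series in [t - hw, t + hw]."""
    segment = series[(series.index >= t - half_width) &
                     (series.index <= t + half_width)]
    if len(segment) < 2:
        return np.nan  # not enough samples near this timestamp
    return float(np.mean(np.gradient(segment.to_numpy(),
                                     segment.index.to_numpy())))
```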

It is possible that, during analysis, a sensor produces data that is very different from the originally clustered data. In order to prevent false classification in the analysis stage, data considered to be an outlier is captured by a buffer that is fed back into the clustering stage. In this way, not all the data is classified immediately by brute force; rather, data that is easily classified is passed through, while some data awaits later processing. After an initial setup period, the concept can even run in soft real time.
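
This buffering mechanism could look like the following sketch, assuming the cluster centres from step 2 are available as an array; the distance threshold is an assumption that would need tuning per setup.

```python
import numpy as np

def route_window(latent_code, centres, threshold, buffer):
    """Return the cluster id, or None after buffering an out-of-range window."""
    distances = np.linalg.norm(centres - latent_code, axis=1)
    nearest = int(np.argmin(distances))
    if distances[nearest] <= threshold:
        return nearest
    buffer.append(latent_code)  # deferred; fed back into re-clustering later
    return None
```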

Step 5: Analysis of the Sensor Data and Interpretation. Statistical analysis of the table data can be done using standard statistical methods or machine learning models. The latter requires a machine learning pipeline including preprocessing, feature selection and the adjustment of weights; this pipeline can be bypassed using a deep learning approach. While an artificial neural network could take the form of any neural network for pattern recognition, of which there is a large choice of different types [8], a promising approach in the area of emotion classification based on input from multiple sensors has been a hybrid of a convolutional neural network and a long short-term memory recurrent neural network [14], although any other deep learning approach could be used as well, depending on which model produces the best results.
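
A minimal sketch of such a hybrid architecture, assuming each training example is a multi-sensor window of shape (sensors, samples); the layer sizes are illustrative assumptions and do not reproduce the architecture of [14].

```python
import torch
import torch.nn as nn

class CnnLstmClassifier(nn.Module):
    """Convolutional front end feeding an LSTM, as a hybrid CNN-LSTM model."""
    def __init__(self, n_sensors: int, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2))
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):               # x: (batch, sensors, samples)
        feats = self.conv(x)            # (batch, 32, samples // 2)
        feats = feats.permute(0, 2, 1)  # LSTM expects (batch, time, features)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])       # logits per emotion class
```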

The most basic analysis of the table would be to look for patterns within a single column, i.e. one sensor. If all data points in a single column are similar, there might be a correlation between this sensor and the one to be replaced. Yet this similarity can only be assessed in comparison to the rest of that sensor’s data that is not in the table. If, for example, a sensor has the value 0.75 for all occurrences in the table, the meaningfulness is determined by the sensor’s values outside of the selected timestamps: if all values are 0.75, there is no meaning; if all other values are 0.50, there is a strong correlation between that sensor and the EEG data. The analysis therefore has to find patterns within a column as a first step and compare them to the rest of the data produced by that sensor.
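
This comparison could be operationalised, for instance, as a standardised mean difference between the in-cluster values and the remaining values of the same sensor, as in the following sketch; any other effect-size measure or test statistic could be substituted.

```python
import numpy as np

def column_contrast(in_cluster: np.ndarray, rest: np.ndarray) -> float:
    """Standardised difference between in-cluster and out-of-cluster values."""
    mean_diff = in_cluster.mean() - rest.mean()
    pooled_std = np.sqrt((in_cluster.var(ddof=1) + rest.var(ddof=1)) / 2.0)
    if pooled_std == 0:
        # both groups constant: either identical (e.g. 0.75 everywhere, no
        # signal) or perfectly separated (0.75 inside vs 0.50 outside)
        return 0.0 if mean_diff == 0 else np.inf
    return float(mean_diff / pooled_std)
```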

The more interesting and more intricate approach is to look for patterns in the combined data of multiple sensors. Even though there might not be a pattern within one sensor, the relationship between multiple sensors might be distinctive. These patterns could be found through regular machine learning models or using deep learning models again. The advantage of deep learning models is that the required domain knowledge is reduced drastically. It is, however, not eliminated, as the assessment of a model still requires human evaluation of the results.

The result of the concept would be a list of correlations between the sensors. If sensor A is investigated, there would be individual correlation values of sensor A’s data with each other individual sensor, but also values for sensor A’s data with all possible combinations of the other sensors. This table would allow insights into how much of the information content of the input sensor A could be gleaned from any combination of the other sensors. In the exemplary Table 1, for instance, the combination of sensors B and C together contains 90% of the information content of sensor A. A sketch of how such scores could be computed follows the table.

Table 1. Each cell contains an exemplary value of how closely the combined data of the sensors named in its row and column correlates with the data of target sensor A
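
One way such scores could be computed, sketched below under assumptions, is to train a regressor to predict sensor A from each subset of the other sensors and use the cross-validated R² as the shared-information score (0 = no shared information, 1 = total); the choice of random forests and five folds is illustrative.

```python
from itertools import combinations
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def combination_scores(table: pd.DataFrame, target) -> dict:
    """Score every sensor subset by how well it predicts the target sensor."""
    scores = {}
    sensors = list(table.columns)
    for r in range(1, len(sensors) + 1):
        for combo in combinations(sensors, r):
            model = RandomForestRegressor(n_estimators=100, random_state=0)
            r2 = cross_val_score(model, table[list(combo)], target,
                                 cv=5, scoring="r2").mean()
            scores[combo] = max(0.0, float(r2))  # clip negative R² to 0
    return scores  # e.g. {("B",): 0.4, ("B", "C"): 0.9, ...}
```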

4 Discussion

There are several underlying assumptions attached to the concept. The first is that it is beneficial for certain use cases to use multiple sensors at the same time. The interaction of sensors, which the concept in this paper seeks to investigate, can only be relevant if multiple sensors provide a benefit in the first place. This assumption is backed up by the related work, in which many papers utilize multiple sensors. The authors consider this assumption to be reasonable in the face of the complexity of use cases.

Another assumption is that there is overlap in the information content each sensor measures with regard to a use case. Again, interaction can only take place if the data measured has some significance for the use case. If, for example, heart rate measurements and EEG measurements are completely disconnected, investigating overlap in their respective information content would be futile. This possibility should not be considered as a shortcoming but as a justification of the very model proposed here. Ultimately, knowing that heart rate and EEG measurements in emotion detection are disconnected is a result that is as relevant as finding a strong connection. In this way, there is a strong connection between possible shortcomings of the concept and opportunities for further research.

Overall, the results of the concept would be percentages, not absolute answers. If overlap in the sensors’ information content is found, the overlap would not be binary (yes/no) but rather on a scale between 0 for no correlation and 1 for a total correlation. This result depends not only on the efficacy and the error margin of the models used but also to a large extent on the quality of the input data. It might be that some sensors do not produce data reliable enough for repeatable measurements. It could also be that the categories chosen per use case (e.g. emotional categories such as fear, joy, etc.) do not correspond to measurable physiological phenomena. Here, setting up the experimental design with specialists for the specific use case is crucial to prevent spurious research. Also, a strict ethical review of the experimental setup is advisable for two reasons. Firstly, a simulation triggering strong emotions repeatedly can cause harm to participants without the right oversight. Secondly, physiological sensor data is amongst the most sensitive personal data and has to be strictly protected.

If the experimental setup is sufficient, the concept holds great potential to fill a research gap on the interaction between sensors in VR applications. In a best-case scenario, sensors could be replaced outright because their information content can be taken from a combination of other sensors. This could lead to cheaper devices, a reduction of data transfer or an extended hardware life cycle, as individual sensor failures would not result in a broken device. In a worst-case scenario, the concept can enhance our knowledge of the information content of sensors in specific use cases. This has immense scientific value, which is further described in the section on opportunities for future work.

5 Conclusion

The aim of this work was to propose a theoretical concept for potentially replacing one sensor with the measurements from other sensors. Hence, costly or difficult-to-measure sensors could be left out of a setup completely without losing their functionality. An AI-based EEG emulation, derived from other directly related bio-signals, could therefore potentially replace EEG measurements, which indirectly enables the use of BCIs in situations where it was previously not possible.

The conceptual framework thereby builds upon the generation of meaningful measurement data through collection within a corresponding VR application and set-up. The evaluation of the collected sensor data proceeds in five consecutive steps: the data of one sensor is clustered and the data from other sensors is classified into these clusters. Afterwards, the sensor data in each cluster is analysed for regularities that make statements about the output of the original sensor possible.

Although the concept presented has not yet been implemented, the detailed description of the individual processing steps enables a targeted and comprehensible implementation. Nevertheless, the success of a possible implementation stands and falls with the validity of a number of assumptions: first, that it is advantageous for certain use cases to use several sensors simultaneously; second, that there is overlap in the information content that each sensor measures in relation to a use case. It must also be fundamentally emphasised that the results of the concept are in any case statistical probability values and not absolute answers.

In the best case, sensors, not limited to EEG, could be indirectly integrated or completely removed based on the findings. In turn this could lead to a reduction in the size of devices, a reduction in data transfer or an extension of the life cycle of the hardware in general. In the worst case, the concept can at least expand the general knowledge about the information content of sensors for specific use cases.

Therefore, the proposed approach offers considerable added value in any case, especially given the results of the literature review carried out and the relevance to the field of BCI established in the process. A successful implementation could also serve a number of use cases in which the measurement of brain activity previously could not be used, or not in a practicable way, for interaction between the user and the application.

6 Future Work

There are two main avenues for future work. The first is the realisation of the setup described in this paper; the second is the further development of the idea presented here. The first question, how this concept could be realised, requires a good VR application to trigger emotions. An issue here is that most VR applications aim to elicit some kind of emotion at some point during their use. A more direct approach to investigating emotion would be to create a simulation that produces certain emotions repeatedly in a controlled manner, since the data is most distinctive if the patterns occur repeatedly and the relevant events in the data are not outliers. A good VR application for this experiment would therefore seek to produce similar emotions reliably. Here, working with neuroscientists is imperative, as the everyday-language concept of emotions can be at odds with a scientific definition of what is being measured.

Another area of research is how well the parameters of a clustering algorithm and the patterns found in the data can be transferred between participants. A study here would take one person’s patterns and apply them to other participants’ data to control for individual differences in the sensor data that is to be replaced.

The second avenue for future work, the further development of the concept, is an open-ended field of research. Here, the idea of a sensor hierarchy could be developed for each use case. Since the results would be percentage-based, a final result might be a large Venn diagram per use case in which sensors are represented as circles that overlap by the amount they correlate. If one sensor can be replaced by others outright, that sensor would be represented as lying entirely inside an overlapping area.

Additionally, more sensors and different types of sensors could be added to the analysis. The close connection between emotional and environmental data in emotion detection [15] raises the question of whether there is a correlation between environmental sensor measurements and physiological sensor measurements when testing for emotion.