
1 Introduction

The intricate relationship between music and its emotional impact on listeners has long captivated researchers across various disciplines. This intersection of music psychology, physiology, and computational analysis offers profound insights into the human emotional experience. The “Emotion in Motion” database, renowned as the world’s largest repository of psychophysiological responses to musical stimuli, provides an unprecedented opportunity to explore this domain [1, 2]. Our research harnesses this extensive dataset to unravel the complex dynamics of emotional responses elicited by music, leveraging advanced analytical and machine learning techniques.

The primary objective of our study is threefold. Firstly, we aim to democratize access to this rich dataset by creating an interactive online platform. This platform is designed not only for researchers in the field but also for a broader audience interested in the study of music-induced emotions. It facilitates an intuitive exploration of the data, enabling users to visualize and interact with the diverse array of physiological signals recorded in response to musical pieces.

Secondly, we delve into the dataset to conduct a comprehensive correlation analysis. By categorizing the physiological signals based on musical genres—classical and modern—and further stratifying them across three distinct age groups, we seek to uncover patterns and correlations that may illuminate how demographic factors influence emotional responses to different types of music. This analysis is pivotal in understanding the subjective nature of musical perception and its physiological manifestations.

Lastly, at the forefront of our study is the development of a predictive model using Long Short-Term Memory (LSTM) networks. This model is tailored to process two key physiological signals: Electrodermal Activity (EDA) and Pulse Oximetry (POX). Employing a sequence-to-vector prediction approach, the model is designed to predict a spectrum of seven emotional attributes in response to music. This innovative application of LSTM in the realm of music psychology represents a significant leap in predictive analytics, offering a nuanced understanding of the emotional effects of music.

Through this multifaceted approach, our research not only contributes valuable tools and analyses to the field of music-induced emotion study but also underscores the potential of integrating machine learning with psychophysiological data. As we explore these emotional undercurrents, we shed light on the universal yet deeply personal experience of music, opening pathways for future interdisciplinary research in this fascinating area.

2 Literature Review

The study of psychophysiology in response to music, particularly focusing on EDA and POX, presents a unique area of research. Boucsein’s book on EDA [3] is a comprehensive source that covers its role in psychophysiology, offering a deep dive into its biological aspects and practical applications. Complementing this, Figner and Murphy [4] discuss the use of skin conductance, a key component of EDA, in judgment and decision-making research. Tobin [5] provides essential knowledge on POX, particularly its importance in intensive care monitoring, contributing to our understanding of physiological monitoring.

The emotional impact of music is an area rich with research. Juslin and Västfjäll examine the various emotional responses triggered by music and the processes behind them, highlighting the complexity of this interaction [6]. Similarly, the work of Tomic and Janata on temporal patterns in music [7] offers insights into the importance of rhythm and its psychological effects. In terms of computational analysis, the use of Hidden Markov Models in music mood classification by Eghbalzadeh et al. [8] and Müller’s examination of dynamic time warping [9], showcase the application of these techniques in understanding music’s influence on emotions.

Research by Salimpoor et al. sheds light on the physiological basis of musical experiences, particularly the role of dopamine in emotional experiences related to music [10]. Hodges further expands on this by discussing various psychophysiological measures used in the context of musical emotion [11]. The foundational work by Cacioppo et al. [12] provides a broad overview of psychophysiology, setting the groundwork for understanding these measurements. The challenges in recording and visualizing psychophysiological data are discussed by Stern et al. [13] and Fairclough [14], emphasizing the complexities involved in interpreting such data.

Visualization techniques, important for understanding psychophysiological data, are explored in various studies. Keil et al. offer guidelines for data visualization in electroencephalography and magnetoencephalography [15], while Francois and Miall present 3D visualization techniques for functional data [16].

The practical application of these visualization techniques in gaming research is highlighted by Kivikangas et al. [17], and Mandryk and Atkins demonstrate their use in continuous emotion modeling [18]. Finally, looking at future trends, Healey and Picard discuss the application of physiological sensors in real-world settings, pointing towards the growing importance of wearable technology and data visualization in psychophysiological research [20].

3 Methodology

Our methodology encompasses a multifaceted approach to understanding the emotional impact of music through psychophysiological data. Initially, we established an interactive platform for the “Emotion in Motion” database, enabling effective visualization and interaction with extensive psychophysiological data, including EDA and POX signals. This platform not only facilitated access to detailed participant data but also provided a synchronized view of EDA and POX responses with audio stimuli, allowing for a simulated real-time analysis of participants’ emotional responses.

Building upon this foundation, we employed Dynamic Time Warping (DTW) for a thorough correlation analysis, categorizing musical pieces into classical and modern genres and further dividing listener responses by age groups. This stratification resulted in detailed heatmaps, elucidating patterns across different demographics.

Finally, we developed an LSTM predictive model, intricately designed to process combined EDA and POX sequences. This model, through its complex architecture comprising multiple LSTM layers and a dense output layer, was trained to predict emotional attributes from physiological responses, providing a deep understanding of the interplay between music, emotion, and physiological change.

3.1 Emotion in Motion Platform

The foundation of our research methodology involved the development of a specialized platform for visualizing the “Emotion in Motion” database. This platform was designed to facilitate access to participant data, allowing for an in-depth exploration of their physiological responses to musical stimuli.

A key feature of this platform is its ability to provide a coordinated multiple-view display, integrating both EDA and POX responses. This integration is meticulously synchronized with the audio tracks, enabling a simulated real-time feedback mechanism. Such a setup offers a dynamic and comprehensive view of the participants’ reactions, capturing the nuances of their psychophysiological responses as they experience different pieces of music.

By leveraging this platform, we were able to not only observe but also quantitatively analyze the intricate interplay between the emotional impacts of music and corresponding physiological changes. This innovative approach to data visualization and interaction stands as a cornerstone of our methodology, paving the way for a more nuanced understanding of the relationship between music and its emotional and physiological effects on listeners.

3.2 Physiological Responses Analysis

T-Test Analysis of Rating-Based Emotion Responses for People with and Without Hearing Impairment: Each individual was asked to rate their feelings for 7 different emotions on a scale of 1 to 5. To examine the impact of hearing impairments on individuals’ emotional and experiential responses to music, we conducted a series of Independent Samples T-tests. Our objective was to compare the ratings across seven attributes (activity, engagement, familiarity, tension, positivity, power, and like/dislike) between two distinct groups: individuals with hearing impairments and those without.

Our initial dataset comprised responses from a collection of trials, each associated with multiple media items and corresponding ratings for the aforementioned attributes. To ensure a balanced comparison, we first identified the top 10 most popular media items within the dataset based on their frequency of occurrence in the specified experiment. This step ensured that our analysis focused on media items with sufficient data coverage across trials.

Given the potential for missing or incomplete ratings within the trials, we implemented a preprocessing step to handle such instances. Specifically, for trials where the ratings for a particular media item were not provided, we replaced the missing values with zeros. This approach allowed us to maintain consistency in the dataset, ensuring that each trial contributed equally to the subsequent analysis without introducing bias from incomplete data.

We segregated the trials into two groups based on the presence of hearing impairments, as reported in the trial responses. This segregation resulted in two distinct sets of data for comparison: one representing ratings from individuals with hearing impairments (215 trials) and another from individuals without hearing impairments. To balance the groups for statistical comparison, we randomly selected 215 trials from the larger group of individuals without hearing impairments, matching the sample size of the group with impairments.

For each of the seven attributes, we conducted an Independent Samples T-test to compare the mean ratings between the two groups. The T-statistic was computed to measure the difference in means relative to the variability observed within the groups, while the P-value was used to assess the statistical significance of the observed differences. A P-value threshold of 0.05 was predetermined to denote statistical significance (Table 1).
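As an illustration, such a comparison can be carried out with SciPy’s ttest_ind; the rating arrays below are hypothetical stand-ins for the per-trial ratings of a single attribute:

    import numpy as np
    from scipy.stats import ttest_ind

    # Hypothetical per-trial ratings (1-5) for one attribute, e.g. engagement;
    # 215 trials in each balanced group, as in the comparison described above.
    rng = np.random.default_rng(42)
    with_impairment = rng.integers(1, 6, size=215)
    without_impairment = rng.integers(1, 6, size=215)

    t_stat, p_value = ttest_ind(with_impairment, without_impairment)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}, significant: {p_value < 0.05}")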

The analysis yielded the following results:

Table 1. Results of the T-test analysis

Activity, Engagement, Positivity, Like/Dislike: The negative T-statistics for “activity”, “engagement”, “positivity”, and “like and dislike” suggest that the mean ratings for these attributes are lower in the group with hearing impairments compared to the group without, but none of these differences are statistically significant (P-values are all well above 0.05).

Familiarity, Power: Conversely, the positive T-statistics for “familiarity” and “power” suggest higher mean ratings in the group with hearing impairments compared to those without. Again, these differences are not statistically significant, as indicated by P-values above 0.05.

Tension: “Tension” has a positive T-statistic, suggesting a slightly higher mean rating among the group with impairments, but the difference is not statistically significant (P-value = 0.566).

None of the attributes exhibited statistically significant differences between the groups, as all P-values exceeded the 0.05 threshold. This suggests that within the scope of our dataset and analysis, hearing impairments do not significantly affect how individuals rate their experience across the tested music attributes. It is important to note that the lack of statistical significance does not imply an absence of differences but rather indicates that any potential differences were not detectable with the employed statistical tests under the study conditions.

T-Test Analysis of EDA Signals for People with and Without Hearing Impairment: Continuing our analysis, we examined physiological responses through Electrodermal Activity (EDA) signals among individuals with and without hearing impairments. To facilitate a balanced comparison, we selected a sample of 150 individuals for each group, ensuring equal representation of participants with and without hearing impairments. This sample size was chosen to provide sufficient statistical power for detecting meaningful differences between the groups while keeping the detailed signal analysis manageable.

The EDA signals were acquired under controlled environmental conditions to minimize external influences on physiological responses. Following data collection, a series of preprocessing steps were applied to each signal to ensure consistency and comparability across the participant pool. These preprocessing steps included:

Noise Filtering: Application of low-pass filters to remove high-frequency noise, which is not relevant to the EDA responses of interest.

Normalization: Adjustment of signal amplitude across participants to a common scale, accounting for individual variations in baseline skin conductance levels.

Length Standardization: To facilitate direct comparison of signals across all participants, each EDA signal was standardized to a uniform length, by truncating longer signals and padding shorter ones with zeros, so that all processed signals contained an identical number of data points, conducive to aggregate analysis.

These preprocessing efforts aimed to refine the EDA signals into a format amenable to feature extraction and subsequent statistical analysis, laying a foundation for a rigorous comparison of physiological responses between individuals with and without hearing impairments.
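A minimal sketch of these three steps, assuming a Butterworth low-pass filter, z-score normalization, and a target length matching the model input described later (the sampling rate and cutoff are illustrative assumptions, not values reported in the study):

    import numpy as np
    from scipy.signal import butter, filtfilt

    def preprocess_eda(signal, fs=250.0, cutoff=1.0, target_len=2200):
        # Noise filtering: low-pass Butterworth filter removes high-frequency noise.
        b, a = butter(4, cutoff / (fs / 2), btype="low")
        filtered = filtfilt(b, a, signal)
        # Normalization: rescale each participant's signal to zero mean, unit variance.
        normalized = (filtered - filtered.mean()) / filtered.std()
        # Length standardization: truncate long signals, zero-pad short ones.
        if len(normalized) >= target_len:
            return normalized[:target_len]
        return np.pad(normalized, (0, target_len - len(normalized)))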

From the preprocessed EDA signals, we extracted key features representing the physiological responses of interest. These features included the frequency and amplitude of Skin Conductance Responses (SCRs), reflective of autonomic arousal in response to stimuli. Each EDA signal was analyzed to identify SCR events, with the mean SCR amplitude and frequency calculated for each participant.
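A hedged sketch of this feature-extraction step, using generic peak detection as a stand-in for a dedicated SCR-detection pipeline (the sampling rate and prominence threshold are illustrative assumptions):

    import numpy as np
    from scipy.signal import find_peaks

    def scr_features(eda, fs=250.0, min_prominence=0.01):
        # Detect SCR events as prominent peaks in the preprocessed EDA signal.
        peaks, props = find_peaks(eda, prominence=min_prominence)
        minutes = len(eda) / fs / 60.0
        scr_frequency = len(peaks) / minutes                      # SCRs per minute
        scr_amplitude = props["prominences"].mean() if len(peaks) else 0.0
        return scr_frequency, scr_amplitude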

To assess the impact of hearing impairment on autonomic arousal, we conducted independent samples T-tests comparing the mean SCR amplitude and frequency between groups (individuals with versus without hearing impairments). The analysis aimed to determine if hearing impairment was associated with significant differences in physiological responses to stimuli.

The T-test comparing the mean SCR amplitude yielded a T-statistic of 0.959 and a P-value of 0.338. The comparison of SCR frequency produced similarly non-significant values. Overall, the statistical analysis revealed no significant differences in the EDA signal features between individuals with and without hearing impairments: the T-statistics indicated minimal differences in mean SCR amplitude and frequency between the two groups, and the P-values (above the conventional alpha level of 0.05) showed that these differences were not statistically significant.

Our findings suggest that, within the scope of the analyzed EDA signal features, hearing impairment does not significantly affect the physiological responses measured through EDA. This outcome contributes to our understanding of the autonomic nervous system’s response to stimuli in populations with sensory impairments, indicating that the presence of hearing impairment may not substantially alter the physiological markers of arousal and emotional engagement as captured by EDA.

Dynamic Time Warping for Physiological Signal Responses Analysis: A pivotal aspect of our research methodology was the implementation of Dynamic Time Warping (DTW) for the analysis of physiological responses. Initially, we applied DTW to analyze the EDA signals, comparing them against each other to identify patterns and correlations. A similar approach was adopted for POX signals, allowing us to delve into the intricate dynamics of these physiological responses.

The crux of the DTW algorithm lies in the construction and computation of a cost matrix, which encapsulates the distance between each pair of elements from two sequences being compared. Consider two sequences \(X = \{x_1, x_2, \ldots, x_m\}\) and \(Y = \{y_1, y_2, \ldots, y_n\}\), where \(m\) and \(n\) represent their respective lengths. The cost matrix D is initialized with dimensions \((m+1) \times (n+1)\), and all elements are set to infinity, except for D[0][0], which is initialized to 0. This matrix will store the cumulative distances between points across both sequences.

The core of the DTW algorithm involves iteratively filling this matrix. For each element D[i][j], the algorithm calculates the distance between \(x_i\) and \(y_j\), typically using a measure such as the Euclidean distance. This distance is then added to the minimum of the three adjacent elements \(D[i-1][j]\), \(D[i][j-1]\), and \(D[i-1][j-1]\) in the cost matrix, corresponding to the operations of insertion, deletion, and match, respectively. This process is succinctly captured in the following Python code snippet:

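The snippet below is a minimal reconstruction following the recurrence just described, using absolute difference as the per-sample distance for one-dimensional signals:

    import numpy as np

    def dtw_cost(x, y):
        m, n = len(x), len(y)
        # Cost matrix of size (m+1) x (n+1), initialized to infinity
        # except D[0][0] = 0, as described above.
        D = np.full((m + 1, n + 1), np.inf)
        D[0][0] = 0.0
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = abs(x[i - 1] - y[j - 1])  # distance between x_i and y_j
                # insertion, deletion, and match, respectively
                D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
        return D[m][n]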

Here, the cost term computes the chosen distance metric between the elements \(x_i\) and \(y_j\) (absolute difference in the sketch above). The cumulative minimum cost at each matrix cell ultimately leads to the total cost of aligning the two sequences, found at D[m][n].

To deepen our analysis, we segmented the audio signals into two distinct categories: classical music and modern music. This classification was crucial in understanding how different genres of music elicited varying physiological responses. Furthermore, we divided the participants into three age groups (20 to 30, 30 to 50, and 50 to 80 years) to explore age-related variations in response to these musical categories. For each age range and music genre, we performed DTW, leading to a comprehensive set of analyses across six distinct groupings for each signal type.

This systematic approach yielded a total of 12 different results, visualized as heatmaps: EDA for classical music (Fig. 1), EDA for modern music (Fig. 2), POX for classical music (Fig. 3), and POX for modern music (Fig. 4). These heatmaps provided a clear, intuitive representation of the correlations within each group, offering insights into how age and music genre influenced the EDA and POX signals. In addition to these 12 results, we also generated two additional heatmaps by applying DTW to compare EDA signals with each other and POX signals with each other.
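Each such heatmap can be produced, as a rough sketch, by evaluating the DTW cost for every pair of signals in a bucket and rendering the resulting matrix. The sketch below reuses the dtw_cost function shown earlier; the plotting details are illustrative assumptions, not the exact figures shown:

    import numpy as np
    import matplotlib.pyplot as plt

    def dtw_heatmap(signals, title):
        # Pairwise DTW distances within one age-group/genre bucket.
        n = len(signals)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = dtw_cost(signals[i], signals[j])
        plt.imshow(dist)
        plt.colorbar(label="DTW distance")
        plt.title(title)
        plt.show()
        return dist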

The heatmaps served as a visual guide to understanding the complex relationships between the physiological responses and the musical stimuli, considering both the type of music and the age of the listener. This comprehensive approach allowed us to uncover nuanced patterns and trends in the data, providing a deeper understanding of the psychophysiological impact of music across different demographics and genres.

3.3 Model to Predict Emotional Attributes

Building upon the foundation of our methodology, we developed an LSTM predictive model with a carefully structured architecture to analyze and predict emotional responses based on physiological data (EDA and POX signals). The model was designed to process sequences of combined EDA and POX data, mapping these to predictions about emotional states as induced by musical stimuli.

The LSTM model’s efficacy in handling sequential data stems from its unique gating mechanisms, governed by several key equations. Each LSTM unit comprises three gates: the input gate i (Eq. 1), the forget gate f (Eq. 2), and the output gate o (Eq. 3), alongside a cell state C (Eq. 4) that holds the unit’s memory and drives the hidden-state update h (Eq. 5).

$$\begin{aligned} i_t = \sigma (W_i \cdot [h_{t-1}, x_t] + b_i) \end{aligned}$$
(1)
$$\begin{aligned} f_t = \sigma (W_f \cdot [h_{t-1}, x_t] + b_f) \end{aligned}$$
(2)
$$\begin{aligned} o_t = \sigma (W_o \cdot [h_{t-1}, x_t] + b_o) \end{aligned}$$
(3)
$$\begin{aligned} \tilde{C}_t = \tanh (W_C \cdot [h_{t-1}, x_t] + b_C) \qquad C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \end{aligned}$$
(4)
$$\begin{aligned} h_t = o_t *\tanh (C_t) \end{aligned}$$
(5)

Here, \(\sigma\) represents the sigmoid activation function, \(\tanh\) is the hyperbolic tangent activation function, W and b denote the weights and biases of the respective gates, \(h_{t-1}\) is the previous hidden state, \(x_t\) is the current input, and \(*\) denotes element-wise multiplication. These equations collectively enable the LSTM to regulate the flow of information, making selective decisions about what to retain or discard over time, which is crucial for learning from long and complex sequences.
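For concreteness, a single LSTM step implementing Eqs. (1) through (5) can be sketched in NumPy; the concatenated weight layout below is one common convention, not a detail taken from the study:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, W, b):
        # W[k] has shape (hidden + input, hidden); b[k] has shape (hidden,).
        z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
        i_t = sigmoid(z @ W["i"] + b["i"])      # input gate, Eq. (1)
        f_t = sigmoid(z @ W["f"] + b["f"])      # forget gate, Eq. (2)
        o_t = sigmoid(z @ W["o"] + b["o"])      # output gate, Eq. (3)
        C_tilde = np.tanh(z @ W["C"] + b["C"])  # candidate cell state
        C_t = f_t * C_prev + i_t * C_tilde      # cell state update, Eq. (4)
        h_t = o_t * np.tanh(C_t)                # hidden state, Eq. (5)
        return h_t, C_t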

Model Architecture and Layers. The model was instantiated using Keras’ Sequential API, a linear stack of layers that simplifies building deep learning models. The first layer in our model is an LSTM layer with 150 units, a number chosen to give the model enough capacity to capture the nuances in the data without causing overfitting. The input shape for this layer was set to (2200, 2), reflecting our data’s structure of 2200 time steps with two features per time step (one for EDA, one for POX). The return_sequences parameter was set to True so that the layer returns the full sequence of outputs for each input sequence, a necessary configuration when stacking a second LSTM layer on top.

A second LSTM layer, also with 150 units, was added. With return_sequences set to False, this layer returns only the last output in the output sequence, reducing the dimensionality of the output and preparing it for the final dense layer.

The model’s output layer is a Dense layer with 7 units, corresponding to the seven emotional attributes we aim to predict. Each unit in this layer provides a prediction for one of the emotional attributes, giving us a multi-dimensional output that encapsulates the predicted emotional state based on the input physiological data.
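Assuming TensorFlow’s Keras implementation, the architecture described above reduces to the following sketch:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Input, LSTM, Dense

    model = Sequential([
        Input(shape=(2200, 2)),             # 2200 time steps; EDA and POX per step
        LSTM(150, return_sequences=True),   # full output sequence feeds the next layer
        LSTM(150, return_sequences=False),  # only the final output is kept
        Dense(7),                           # one unit per predicted emotional attribute
    ])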

Compilation and Training. The model was compiled using the Adam optimizer, an adaptive learning rate method that has proven effective across deep learning applications and is particularly well suited to datasets with noisy and sparse gradients. For the loss function, Mean Squared Error was employed, aligning with our goal of predicting continuous variables (ratings of emotional attributes); it computes the mean of the squared differences between predicted and actual values, making it suitable for regression tasks.

The training process involved feeding the combined EDA and POX sequences and the corresponding emotional ratings into the model, with 150 epochs and a batch size of 32. The choice of 150 epochs was based on observing the loss value, which did not change noticeably beyond that point, so we concluded it was sufficient for the model to converge without overfitting. A validation split of 20 percent was used during training to monitor the model’s performance on unseen data and guard against overfitting.
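Compilation and training, under the hyperparameters just described, might then look as follows (X_train and y_train are hypothetical names for the prepared arrays):

    # X_train: combined EDA/POX sequences, shape (num_trials, 2200, 2).
    # y_train: ratings for the seven attributes, shape (num_trials, 7).
    model.compile(optimizer="adam", loss="mean_squared_error")
    history = model.fit(X_train, y_train, epochs=150, batch_size=32,
                        validation_split=0.2)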

Model Evaluation. Post-training, the model’s performance was evaluated using a separate test dataset. The loss on this test dataset was computed to assess the model’s predictive accuracy on new, unseen data. This evaluation step is crucial to understand the model’s practical applicability and its ability to generalize beyond the training data.
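Under the same hypothetical naming, the held-out evaluation is a single call:

    # X_test and y_test are a hypothetical held-out split of the same shapes.
    test_loss = model.evaluate(X_test, y_test)
    print(f"Test MSE: {test_loss:.4f}")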

The outcome of the LSTM model is a set of predictions for the seven emotional attributes, based on the input physiological data sequences. These predictions represent the model’s understanding and mapping of complex physiological signals to specific emotional states, offering a novel approach to deciphering the emotional impact of music through machine learning. This model, with its intricate architecture and careful training, stands as a significant contribution to the field, demonstrating the potential of deep learning in understanding and predicting human emotional responses.

4 Evaluation and Results

In our sequence analysis, we utilized DTW to investigate relationships within our dataset, focusing on two music categories: classical and modern. For each category, we conducted DTW analysis on three age ranges (20 to 30, 30 to 50, and 50 to 80 years) for both EDA and POX signals. For each age range, the sample comprised 50 EDA signals and 25 POX signals, and each music category comprised 10 songs. This led to six DTW heatmaps per music type, twelve in all, each visualizing the physiological response patterns across different age groups. These heatmaps effectively demonstrated the variance in responses to music genres across demographics, showcasing DTW’s utility in elucidating complex physiological data patterns.

The analysis of the distance matrix reveals distinct patterns in how physiological responses vary with musical genres.

Age 20 to 30 (EDA): In Fig. 1, the left panel shows that for the 20 to 30 age group, the participants’ EDA signals are highly similar to one another when listening to classical music.

Age 30 to 50 (EDA): In Fig. 1, the center panel shows that for the 30 to 50 age range, there is somewhat less similarity between the EDA signals during classical music, although overall the signals remain mostly similar to each other.

Age 50 to 80 (EDA): In Fig. 1, the right panel shows that for the 50 to 80 age range, the participants’ EDA signals are again highly similar to one another when listening to classical music.

Age 20 to 30 (EDA): In Fig. 2, the left panel shows that for the 20 to 30 age group, the EDA signals are somewhat more dissimilar during modern music than when this group listened to classical music.

Age 30 to 50 (EDA): In Fig. 2, the center panel shows that for the 30 to 50 age range, there is somewhat less similarity between the EDA signals during modern music; compared with their responses to classical music, however, the EDA signals appear similar overall.

Age 50 to 80 (EDA): In Fig. 2, the right panel shows that for the 50 to 80 age range, the EDA signals are considerably dissimilar during modern music, a marked difference from this group’s responses to classical music.

Age 20 to 30 (POX): In Fig. 3, the left panel shows that for the 20 to 30 age group, the POX signals during classical music exhibit both very noticeable dissimilarity and, for some signal pairs, very noticeable similarity. Compared to modern music, this age group’s POX signals resonate with each other better during classical music.

Age 30 to 50 (POX): In Fig. 3, the center panel shows that for the 30 to 50 age group, there is slightly less similarity between the POX signals during classical music than in the 20 to 30 age group, and less resonance between the POX signals than during modern music.

Age 50 to 80 (POX): In Fig. 3, the right panel shows that for the 50 to 80 age group, the responses to classical music are more dissimilar than this group’s responses to modern music, but overall no distinct pattern is discernible. This result is in line with the other age groups listening to classical music.

Age 20 to 30 (POX): In Fig. 4, the left panel shows that for the 20 to 30 age group, the POX signals during modern music are very noticeably dissimilar.

Age 30 to 50 (POX): In Fig. 4, the center panel shows that for the 30 to 50 age group, the POX signals during modern music are less dissimilar than in the 20 to 30 age range.

Age 50 to 80 (POX): In Fig. 4, the right panel shows that for the 50 to 80 age group, the POX signals during modern music are more similar than in the 20 to 30 and 30 to 50 age ranges.

From the heatmaps generated for classical and modern music across the three age groups, we observe a noticeable difference between the participants’ EDA signals when listening to classical music versus modern music for the age groups of 20 to 30 and 50 to 80.

Unlike the EDA signals, the POX signals do not demonstrate any clear differentiation between classical and modern music, or between the age groups. This observation leads to an intriguing conclusion: EDA signals appear to be more reflective of, and sensitive to, variations in musical genre and age group than POX signals. This difference in response patterns underscores the potential of EDA signals as more effective indicators for distinguishing between different types of musical experiences across age groups.

Fig. 1. Distance matrix of EDA signals for classical music. Left: age 20 to 30. Middle: age 30 to 50. Right: age 50 to 80.

Fig. 2. Distance matrix of EDA signals for modern music. Left: age 20 to 30. Middle: age 30 to 50. Right: age 50 to 80.

Fig. 3. Distance matrix of POX signals for classical music. Left: age 20 to 30. Middle: age 30 to 50. Right: age 50 to 80.

Fig. 4. Distance matrix of POX signals for modern music. Left: age 20 to 30. Middle: age 30 to 50. Right: age 50 to 80.

In our study, the developed model demonstrates a proficient capability to predict seven distinct emotional states that individuals are likely to experience while listening to music. Given the inherently subjective nature of emotions, where definitive accuracy is difficult to establish, the model’s performance is notable, and it marks a substantial advance in comprehending human emotional responses to music.

Furthermore, this achievement has promising implications for the development of recommendation systems. Such systems, informed by our model, could analyze user inputs to tailor selections more closely aligned with individual preferences. This represents not only a stride in understanding emotions but also in enhancing user experience through personalized content curation based on emotional responses (Table 2).

Table 2. Accuracy obtained from LSTM for each emotion

5 Conclusion

This study represents a stride in the field of psychophysiology and its intersection with music. By harnessing the extensive data from the “Emotion in Motion” database, we have developed an interactive platform that allows for in-depth exploration and visualization of psychophysiological responses to musical stimuli. Our application of DTW to analyze EDA and POX signals across different musical genres and age groups has yielded insightful findings. In particular, we observed that EDA signals differentiate between classical and modern music genres more distinctly than POX signals, suggesting a higher sensitivity of EDA in reflecting emotional responses to different types of music and across age groups.

Furthermore, the implementation of an LSTM predictive model has been an important part of our research. This model’s ability to predict seven different emotional states from physiological data showcases the potential of machine learning in decoding complex human emotions. The implications of this are twofold: first, it enhances our understanding of the nuanced relationship between music and its emotional impact on listeners; second, it paves the way for developing sophisticated recommendation systems that can personalize content based on an individual’s emotional responses.

Our research highlights the intricate connections between music, emotion, and physiological responses. It indicates new avenues for future investigations into how music can evoke a spectrum of emotions and how these can be quantified and utilized in practical applications. The convergence of psychophysiological data analysis and machine learning, as demonstrated in this study, sets a precedent for further interdisciplinary research that can expand our understanding of the human emotional experience.