1 Introduction

Social dances have been analyzed extensively, with contributions from many sociological, cultural, and psychological areas. This comes as no surprise, as social dances have existed for centuries, are embedded in many cultures and ethnic groups, and are often related to a social and/or religious context [33]. Couple dance is a specific form of social dance performed in pairs, traditionally by one man and one woman, in a mechanical interaction that makes complicated moves possible, with each partner having a specific role (the man leads and the woman follows). This type of dance is found widely across the world, for example Forro in Brazil, Tango in Argentina, Fox-trot in the USA, and Valse viennoise in Austria. More recent studies of social couple dances also come from the fields of biomechanics, Human-Robot Interaction (HRI), and Human-Computer Interaction (HCI), which examine the features of these dances and their application in the digital domain. Within the latter context, we focus on the predominantly cognitive connection between the dancers while performing a social couple dance. The full-body movements of the two dancers are coordinated and fine-tuned to each other and, in most cases, attuned to the music, which dictates the rhythm and the “way” a dance is carried out (e.g., slow vs. energetic). Another aspect of the interaction is the “lead” and “follow” roles, which refer to the impulse and response pattern during the dance and the connectivity between the couple. The highly dynamic and interactive situations of social couple dances involve a plethora of interdependent parameters, derived from the physical and cognitive interaction and from the musical interpretation and listening (e.g. body “drive”), and comprehending and analyzing them is a tremendous challenge.
Salsa is a very social couple dance that is popular around the world and whose learning has its challenges:

  • Learning in (large) collective classes, which is less effective in spotting errors on individual students.

  • The need to practice with another partner on location, meaning the risk of inadequate facilities and/or not having a partner to practice with (either by lack of dance partners or due to personal time schedules).

  • Other parameters can influence the study, such as mood, stress, fatigue, and other external social factors.

  • Time and location constraints due to other obligations (e.g. studies, work).

In addition, when a student approaches the skill level of the teacher, the student may dispute the teacher’s advice as to what is “correct”. The status of an expert in social dance can be a source of confusion, as there is no official state-validated diploma but rather a public recognition of skills by peers. In many cases, the learning process can become less effective, be halted, or be reconsidered, depending on the relationship between student and teacher. Virtual reality exercises have proved relevant for training in a range of essential jobs (army, pilots, firefighters, etc.) and show a real improvement in the learner’s skills by placing the learner in situations as complicated as those of real life. Given the complexity of salsa dance, virtual reality is a promising alternative for learning to dance, since it provides the required mechanical interaction between the user and the virtual character and allows tracking of full-body movements over an area similar to the one needed for dancing. The main objective of this paper is to demonstrate that we can guide and help users to improve their salsa dancing skills through a Virtual Reality (VR) game that simulates salsa practice. In previous work [40], we showed that six criteria are important for learning salsa: Rhythm, Guidance, Fluidity, Sharing, Styling and Musicality. In this work, we focus on the evaluation of three main skills: Guidance, Rhythm, and Style. To that end, we have designed a VR application that provides a virtual partner in an interactive environment and simulates dancing in a couple. Each user wears a VR headset with hand controllers and performs along with a virtual partner. The motion of the users is recorded using an optical motion capture system, and their movements are linked to the virtual avatar using Inverse Kinematics.
The user goes through a series of exercises, and the system returns an overall score to motivate the user to compete against others. We performed an extensive analysis of the recorded exercises and evaluated the skills and progress of the users at different learning stages with regard to the aforementioned criteria; the analysis was conducted using a number of Music-related Motion Features (MMF) and Laban Movement Analysis (LMA) features. The results show that the dancing qualities of the non dancers improve and tend to converge to those of the regular dancers. Figure 1 shows our VR environment, where a user interacts with the virtual environment.

Fig. 1

Our learning salsa gamified VR application. The user wears a virtual reality headset and interacts, in real-time, with a virtual partner using hand controllers. The images of the user in the real-world (left side) and the dancer in the virtual environment (right side) are blended

The main contributions of this paper are itemized below:

  • A VR environment that guides and helps users to practice and improve their dancing skills through dance gamification, and more specifically, via interaction with a virtual avatar. This application also provides seamless motion capture that can be used for further processing and studies.

  • A motion analysis that evaluates the influence of our application on the dance skills of users, in terms of three main criteria: the guidance, rhythm, and style. We extract, evaluate, and validate the important MMF and LMA features using a two-class dataset of regular and non dancers, while their movement is synchronized with music.

2 Related work

Human motion during a dance often carries emotion and is connected with the whole cognitive-motor and psychological system. It has been investigated in multiple scientific studies, including dance motion generation [21, 41], synchronization to music [10, 43], and emotion recognition and stylization [8], and it poses many challenges for learning [34]. Besides the health benefits of social dances, such as improving balance and cognition in the elderly [25,26,27], their interactive aspect has been touched upon by the HRI domain. For example, the user’s movements, detected through sensors, have been transcribed into an intermediary data set to generate poetry [12, 13]. Human to human interaction has also been explored via a setup of patches [42] and scene ranking [47] in the context of an animated character. Another example is the use of robots acquiring the knowledge and skills to perform a dance [30]. However, this research is limited to single instances of a dancer and thus does not take into account the simultaneous act of dancing. The interaction between performers themselves has been studied in the psychological domain [29, 46]. The interaction between the public and the performers has also been investigated [44]. Closer to our focus, a set of studies evaluates dance performance using various methods, such as Kinect [2].

Extracting the motion features from continuous movement is a crucial element for describing, evaluating, and understanding dance and movement in general. For instance, locomotion has been studied with gait analysis and classification with extreme machine learning and leg joint angles data [31, 37]. Studies on everyday actions [18] propose a set of features inspired by psychology and physiology to characterize behaviors and the emotions involved. More specifically, LMA-based features have proved to work well in different situations, such as motion retrieval, indexing, and comparison [4, 6], and are therefore well suited as a basis for building a machine learning classifier, as demonstrated for theatre emotional expression [38] or for evaluating the performer’s emotion [3]. Other studies focused on a specific motion feature, for example the fluidity of movement, a critical dance parameter investigated in [32]. That study examines how fluidity can help to describe and classify dance performance through interdisciplinary research including biomechanics, psychology, and experiments with choreographers and dancers, and proposes a definition based on minimum energy dissipation when the human body is viewed as a kinematic chain. Another work [1] elaborated upon expressive qualities, such as rigidity, fluidity, and impulsiveness, to investigate intra-personal synchronization for full-body movement classification. In our previous studies [39, 40], we proposed a set of motion features that take into account the particular context of salsa dance: motion synchronized with music and interaction with a partner.

Learning is an essential aspect of enhancing dance performance, and virtual reality brings immersion, visualization, and interactivity to it, with promising results [11, 22]. Important studies in the field of human visual appearance [9] provide advice on how to represent a virtual human for better interaction. A commercial application has even been proposed for learning Salsa dance in virtual reality with a coach [14]. A first study on Forro dance [16] evaluates how users can learn and improve their dance skills through repetitive training monitored by their smartphone. The proposed evaluation features are computed from the user’s motion data, taken from the smartphone’s Inertial Measurement Unit (IMU) sensor, and from the music data: first the “Rhythm Beats Per Minute (BPM): We calculate the average beats per minute.”, then the “Rhythm consistency: we calculate the coefficient of variation of the student’s BPM across the full dancing exercise”. This study brings valuable insights into characterizing Forro dance learning and relevant dance features, but the restricted data source (a single IMU) is an obvious limitation. In a recent study, a VR interactive simulation of salsa dance was developed that uses a Hidden Markov Model to predict the dance behavior of the virtual partner [28]. Although this is a kind of “Top-down” approach, the introduction of jump transitions makes sense, as it moves towards the structure of Salsa dance as learned in classes (based on cycles of 8 beats). This study received good feedback from users regarding the naturalness of the motion and the dance-following feeling. It would be interesting to understand which specific motion features produced by the Markov model enable such perception by the users.
Another work developed a dance game based on motion capture technology [15], addressing the real-time estimation of the user’s performance in order to determine what interactive motion a virtual dance partner should display. The real-time prediction was based on body parts indexing in conjunction with flexible matching to estimate the completion of motion and reject unwanted motions. A method to control a virtual character in real time using a motion capture system has also been proposed [24]. In this method, the character’s motion is extracted from a database and pre-processed using a two-layer method: a Markov process is used in the first layer, and a clustering technique in the second. Finally, a framework has been developed [20] for synthesizing, in real time, the motion of a virtual character in response to the actions performed by a user-controlled character.

In comparison to the previously mentioned approaches, our work is based on simulating a Salsa dance environment in VR with a focus on user experience and dance skills learning. Indeed, gamification is a promising way of improving user engagement in learning systems [19]. We aim at providing a Salsa simulation convincing enough to induce performance improvements. Unlike studies that take into account only the basic steps or style elements, we consider the whole behavior of each partner and their relationship to the music. To analyze and validate the application, comparative motion analysis is required. This analysis uses, on the one hand, the well-known LMA features, which many papers have shown to be accurate in depicting style in dance, and, on the other hand, the MMF, a more recent proposal dedicated to interrelated music and dance motion.

3 Design

3.1 Overview

Our objective is to develop an interactive dance learning system that is able to improve the dance skills of the engaged users. To achieve that, we propose a framework consisting of three components that fulfill the following technical requirements: a VR salsa simulator, a gamified learning system, and motion recording for further analysis. The VR salsa simulator recreates the conditions of salsa dance from the leader’s side, involving: (a) visual contact with the partner, (b) natural and physical interaction, (c) adequate music to dance to with the virtual partner, together with the ability to guide the partner through the dance, and (d) enough space to allow freedom of movement. The gamified learning system supports the development of dance skills through pedagogical training: it embeds a series of exercises that are easy to understand and start with, repetitions based on timed hand gestures and full-body movements, different musical tempos for dynamic training, and a final score, displayed at the end of the session, to maintain motivation and engagement. During all exercises, the full-body motion is recorded at a high frame rate to allow real-time or post-processing motion analysis. Figure 2 shows a person testing our VR environment.

Fig. 2

Example of a person testing the game

3.2 Salsa simulator

The first step of our work is the design of a VR application based on real salsa practice. For that, we based our work on the observation of real body movements during dance. An important point is the role of each partner: there is one leader and one follower. Both dance on the rhythm independently, but the leader influences the follower’s motion via his hands, chest, or other “connection” tools, and the follower “listens” to these indications and changes dance pattern accordingly. In our game, the user has the role of the leader, and a virtual partner is the follower. As in real dance scenarios, the behavior of our virtual partner is structured into two animation layers working in parallel: moving the body and feet on the tempo of the music, and reacting to the user’s guidance. The latter reaction has to look natural with respect to the user’s stimulus. Inverse Kinematics (IK) is therefore used, as it allows animating the full body by placing time and position constraints on the end-effectors (such as the hands, feet, and head). A reliable VR setup is necessary to ensure good immersion. We used the HTC Vive as our VR system, since it provides high-fidelity tracking over a wide space, large enough to cover the area needed when dancing Salsa, and it allows the use of additional tracked markers.

3.2.1 Virtual partner model and music-synchronised dance animation

A visually pleasing but slightly cartoony model was chosen among commercial solutions for the appearance of the virtual partner, so as to engage the user in the interaction. A layer of inverse kinematics with physical constraints (bending of the upper body and other limbs) is added to the rigged model, making it easy to manipulate the end-effectors while achieving consistent motion. The knowledge of the motion in space of the basic salsa step comes from previous studies [39, 40], from which we extract a motion profile for each foot, as illustrated in Fig. 3. This motion profile serves as a base to set the position in space of the IK targets corresponding to the right and left foot of the Virtual Partner (VP). The time length of the motion profile is scaled to the music tempo, ensuring that the virtual partner always dances “in rhythm”. In addition, we place the root halfway between the two feet, so that the upper body always stays straight and balanced. The result is a natural-looking motion that matches the theoretical description of the basic salsa steps.

Fig. 3

The distance of the right and left foot of the virtual partner relative to a neutral position. This motion profile is repeated every 8 beats of the music. In the application, the curves were smoothed to achieve better naturalness
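As a minimal sketch of how such a per-foot profile could be replayed at any tempo, the following Python fragment stretches one cycle of samples so that it always spans 8 beats, and places the root halfway between the two feet. The function names, the linear interpolation, and the evenly spaced sampling of the profile are assumptions made for illustration; the actual application runs inside Unity3D and may proceed differently.

```python
from typing import Sequence


def cycle_duration(bpm: float, beats_per_cycle: int = 8) -> float:
    """Duration in seconds of one cycle of `beats_per_cycle` beats."""
    return beats_per_cycle * 60.0 / bpm


def sample_profile(profile: Sequence[float], t: float, bpm: float) -> float:
    """Foot offset at time t (seconds), read from a repeating 8-beat profile.

    `profile` holds evenly spaced samples over one cycle (hypothetical
    layout). The cycle is stretched or compressed to last exactly 8 beats.
    """
    duration = cycle_duration(bpm)
    phase = (t % duration) / duration        # position in the cycle, [0, 1)
    x = phase * len(profile)
    i = int(x) % len(profile)
    j = (i + 1) % len(profile)               # wrap around: the profile repeats
    frac = x - int(x)
    return (1.0 - frac) * profile[i] + frac * profile[j]


def root_offset(left_foot: float, right_foot: float) -> float:
    """Root placed halfway between the two foot offsets."""
    return 0.5 * (left_foot + right_foot)
```

At 120 BPM one cycle lasts 4 seconds, so the same profile sample is reached again every 4 seconds; at a slower tempo the cycle simply takes longer.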

The direction of the basic step using this motion profile can be divided into two main directions, giving us two dance patterns: A forward-backward motion called “Mambo” and a right-left motion called “Cucaracha”, visualized in Fig. 4. The user can follow the steps of the VP in order to catch the music tempo. A drawing of footsteps is placed in front of the VP to help the user be rightly positioned.

Fig. 4

Our two salsa dance patterns. On the left side, for “Mambo”, the VP steps backward during beats 1 and 3, then moves forward during beats 5 and 7. On the right side, for “Cucaracha”, the VP alternately moves his right foot to his right and his left foot to his left

3.2.2 User interaction: guiding the virtual partner

To simulate the feeling of guidance, the user can control the transition of the VP dance pattern via interactive gesture and timing. To give the feeling of holding hands as in Salsa, the hands of the VP are placed near the user’s hands in real time (as an IK position constraint), and the rest of the arm is animated through IK, as when manipulating a rag doll. The user hand gesture required to trigger the transition is detected through the computation of forces. The IK system computes the push force applied from each hand to the respective VP shoulder. We then extract, with the dot product, a forward force (whether the user is pushing or pulling the VP’s arm to the front) and a side force (whether the user is pushing or pulling the VP’s arms to the sides). This information is calculated in real time and tells us how much force the user is applying to the VP, and in which direction. This analysis gives us two important pieces of information: the time at which the force is applied and the direction of the force (sides or front). A gesture is considered valid for a transition if the direction of the force is perpendicular to the direction of the current dance pattern and the force occurs between beats 7 and 8 (in a similar manner to [39]). The result gives the user the feeling of guiding the VP, as illustrated in Fig. 5.

Fig. 5

Detail of the gestures required to make the virtual partner transition between the two possible dance patterns. To guide the virtual partner from a “Cucaracha” motion to a “Mambo” motion, the user has to push the virtual partner between beats 7 and 8 with his left hand. To do the reverse transition, the user has to pull the virtual partner to his left with the same timing
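The gesture test of Section 3.2.2 can be sketched as follows. This is an illustration under stated assumptions, not the Unity3D implementation: forces are reduced to 2D vectors on the floor plane, the VP axes are given as unit vectors, and the `threshold` parameter, the function names, and the rule that the dominant component decides the force direction are all hypothetical.

```python
from typing import Tuple

Vec2 = Tuple[float, float]

# Axis along which each basic pattern moves (from the pattern descriptions).
PATTERN_AXIS = {"mambo": "forward", "cucaracha": "side"}


def dot(a: Vec2, b: Vec2) -> float:
    return a[0] * b[0] + a[1] * b[1]


def decompose(force: Vec2, forward: Vec2, side: Vec2) -> Tuple[float, float]:
    """Project the hand force on the VP forward and side unit axes."""
    return dot(force, forward), dot(force, side)


def beat_position(t: float, first_beat_time: float, bpm: float) -> float:
    """Fractional beat index within the 8-beat cycle, in [1, 9)."""
    beats = (t - first_beat_time) * bpm / 60.0
    return 1.0 + beats % 8.0


def is_valid_gesture(force: Vec2, forward: Vec2, side: Vec2, pattern: str,
                     t: float, first_beat_time: float, bpm: float,
                     threshold: float = 1.0) -> bool:
    """True if the force is strong enough (hypothetical threshold), directed
    across the current pattern axis, and applied between beats 7 and 8."""
    fwd, sd = decompose(force, forward, side)
    dominant = "forward" if abs(fwd) >= abs(sd) else "side"
    strong_enough = max(abs(fwd), abs(sd)) >= threshold
    perpendicular = dominant != PATTERN_AXIS[pattern]
    in_window = 7.0 <= beat_position(t, first_beat_time, bpm) < 8.0
    return strong_enough and perpendicular and in_window
```

With this decomposition, a push along the forward axis during a “Cucaracha” is accepted between beats 7 and 8, whereas the same push during a “Mambo” is rejected because it is parallel to the current pattern.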

3.2.3 Software design

The VR application is developed with the Unity3D game engine, including all the plugins necessary to work with our VR device. When the application starts, an initialisation phase waits for the user inputs (e.g., the name) that are used to automatically label the saved motion data. In the meantime, the IK animation is activated, allowing the user to manipulate the virtual partner by holding hands and to get familiar with the environment. When the user starts the training, a countdown is shown, and the virtual partner’s dance animation, its transition system, and the music are all triggered at the same time. Finally, at the end of the training, the application briefly displays the final score and returns to the initialisation phase.

3.3 Learning and gamification

The main focus of our implementation is to give users the essentials to develop two main dance skills, rhythm and guidance, through pedagogical and fun exercises. We set up in our VR application a series of repetitive exercises containing two dance tasks: the user has to move his feet to the music and to guide the VP with his hands so that it changes its basic dance pattern every two cycles of 8 beats (two things must be attended to simultaneously). There are eight exercises at different tempos, in order to vary the difficulty of the task and keep the training dynamic, with a short pause between them. Feedback, in the form of a final score, is computed from the number of successful guidance attempts compared to a reference number, and is provided at the end of the session as a reward. Between the first and the last exercise (which are at the same tempo), the user is expected to show an improvement in terms of guidance, style, and rhythm. The gamified aspect of this application is important for user engagement, with a focus on usability, playability, and fun.

3.4 Motion data recording

A post-process motion analysis allows us to evaluate the ability of the learning system to improve dance skills and, subsequently, the relevance of our design. The movements of the user are captured via the default VR setup (hands and head) and additional tracking markers placed on the hips and feet. We thus obtain a six-point pose representation. The coordinates of each point are recorded during the training session at a high frame rate (100 frames per second) to capture all kinematic components accurately, including fast movements. This pose representation gives us enough information for meaningful motion analysis.

4 Experiment

One way to show that our VR platform helps users to improve their skills is by computing their MMF and LMA features on the early stage of the training, and then comparing them with the corresponding features at the end of the training. To test our application and evaluate its ability to help users to improve their salsa learning, we conduct experiments using two dancer categories with different experience:

  • Non dancers: people who have never taken a class and have no experience in salsa dance,

  • Regular dancers: people who take salsa classes and have at least one year of practice.

We expect that the performance of the non dancers, at the end of the experiment, will converge towards that of the regular dancers, indicating an improvement in their skills. For each user, the objective is to go through a series of eight exercises. In each exercise, salsa music is played, and the VP moves in synchronization. The aim is to follow the music and guide the VP to change its dance pattern every two cycles of 8 beats. Each exercise lasts about 60 intense seconds, during which the users make constant physical effort to keep up with the rhythm of the music and the guidance task. The criteria for evaluation are the same for each exercise, with minor variation in difficulty to keep the training dynamic.

The tempo varies in order to stimulate the user but is the same at the beginning and the end of the training for consistent analysis. A summary of the exercises performed by each user is listed in Table 1.

Table 1 Summary of the exercises constituting the application

We invited 40 people to participate; half of the participants were regular dancers and half were non dancers. Note that data acquisition is challenging, mainly because it requires the participants to be physically present in our lab and to use our devices. Nevertheless, as shown in Section 6, a sample of 40 people shows a clear learning trend and suffices to validate this direction.

The setup is not as light as the simplest VR devices, but it is light enough for the participant to move freely (this is also due to the wireless system used). After a short tutorial, each participant went through 8 exercises and got a final score. This score is based on their success in accomplishing the given aim (changing dance pattern every 2 cycles) and serves mainly as a motivation for the user to compete against others. With 40 users over 8 exercises, the resulting database contains 320 motion capture sequences, each recorded as 4500 frames of a 6-point skeleton.

5 Motion analysis

In this work, we used two well-known motion analysis systems to evaluate the movement of the participants: the Musical Motion Features and the Laban Movement Analysis system.

5.1 Musical motion features - MMF

Salsa is a specific type of dance in which movements are highly correlated with the music and with the partner. To take that into consideration, we have previously proposed the MMF framework [39, 40], which contains the relevant motion features. MMF performed well in classifying motion data with regard to three essential salsa dance skills: rhythm, guidance, and style. In our previous study, we proposed (following dance experts’ suggestions) six criteria. However, only three of them were investigated, mainly because each of the remaining three requires complex analysis and a full study of its own. In this study we use the same three criteria, which nevertheless provide the essentials for developing an accurate prototype for the analysis and evaluation of the learning performance of our participants. This framework has been used to distinguish beginner from expert dancers, and was validated through a user study (participants were separated based on their dance level: Beginner, Intermediate, and Expert) on a large amount of motion data (26 couples dancing over 10 songs of 120 minutes). These MMF features carry information relative to dance skills and are therefore a sort of interface between low-level and high-level data. Here, the goal of our analysis is to evaluate the performance of one person dancing with a virtual partner that has a predefined behavior.

We consider only a subset of the proposed MMFs, given that features concerning the VP will not vary. We use sixteen measurements \(\mathfrak {\mu }_{j}\) that belong to five feature categories, derived from three dance skills, as shown in Table 2. All measurements are observed on a temporal window whose number of frames corresponds to 8 beats. Previous experiments, e.g., [39], show that 25 frames per second work best for extracting meaningful results. Thus, we downsampled the recordings from the initial frame rate (100 Hz) to 25 frames per second (fps) without loss of the temporal information (see Forbes and Fiume [17]). Finally, each measurement \(\mathfrak {\mu }_{j}\) is normalized between 0 and 1.
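The preprocessing described above can be sketched in a few lines of Python. The sketch assumes plain decimation (keeping every fourth frame) and min-max scaling; the text does not state which downsampling or normalization scheme was used, so both choices, as well as the function names, are assumptions.

```python
from typing import List, Sequence


def downsample(frames: Sequence, factor: int = 4) -> list:
    """Keep every `factor`-th frame (100 Hz to 25 fps when factor is 4)."""
    return list(frames[::factor])


def frames_per_cycle(bpm: float, fps: float = 25.0, beats: int = 8) -> int:
    """Number of frames covered by an 8-beat window at the given tempo."""
    return round(beats * 60.0 / bpm * fps)


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, at 120 BPM an 8-beat window lasts 4 seconds, which corresponds to 100 frames at 25 fps.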

Table 2 Subset of the musical motion features in our case of virtual reality

5.1.1 Dance skill: rhythm

Step accuracy (\(\mathfrak {\mu }_{1}\) - \(\mathfrak {\mu }_{4}\))

One of the essential features when learning salsa dance is rhythm, i.e., the ability of the user to follow and stay synchronized with the music beats. To measure it, we consider the velocity magnitude of each foot over 8 musical beats. For example, when dancing the “Mambo” pattern, two peaks occur that indicate a movement of the foot on the music. The first peak corresponds to a step forward (beat 1), and the second peak to a step back to the neutral position (beat 3). The same occurs for beats 5 and 7. Given the temporal location of each musical beat, we compute the step accuracy for each beat as the difference between the time of the musical beat and the time of the user’s foot motion. Thus, via filtering and peak detection, we evaluate the temporal location of each of the user’s steps and compare it to the musical beats, once these are extracted from the music, as shown in Fig. 6. The result is 16 measurements that are extracted through a sliding window of width proportional to the music tempo. Beat 1 is marked by hand at the beginning of each song to ensure the temporal accuracy of each sliding window.

Fig. 6

Velocity peak that occurs during the basic step for the user, and the measurement of step accuracy
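The step accuracy measurement can be illustrated with the following Python sketch: foot speed is computed from consecutive positions, lightly smoothed, searched for local maxima, and each musical beat is then compared with the nearest detected step. The moving-average filter, the `min_height` parameter, and the nearest-peak matching are assumptions for illustration; the text only states that filtering and peak detection are used.

```python
import math
from typing import List, Sequence


def speed(positions: Sequence[Sequence[float]], fps: float) -> List[float]:
    """Velocity magnitude between consecutive position samples."""
    return [math.dist(p, q) * fps for p, q in zip(positions, positions[1:])]


def moving_average(x: Sequence[float], width: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at both ends."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def find_peaks(x: Sequence[float], min_height: float) -> List[int]:
    """Indices of local maxima whose value is at least `min_height`."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > x[i - 1] and x[i] >= x[i + 1] and x[i] >= min_height]


def step_accuracy(peak_times: Sequence[float],
                  beat_times: Sequence[float]) -> List[float]:
    """Absolute gap (seconds) between each beat and the nearest step.

    Raises ValueError if no step was detected (empty `peak_times`).
    """
    return [min(abs(b - p) for p in peak_times) for b in beat_times]
```

Peak indices are converted to seconds by dividing by the frame rate before calling `step_accuracy`; values close to zero indicate steps that land on the beat.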

Rhythm difference between partners (\(\mathfrak {\mu }_{5}\) - \(\mathfrak {\mu }_{8}\))

These features have been placed under the Rhythm skill since the partner motion, in our application, is predefined and therefore acts as a tempo reference. During the dance, the foot motions of the user and the VP are in opposition. Similarly to the aforementioned Step Accuracy feature, we detect the temporal location of the user’s steps and compare each of them to those of the VP. Values close to zero indicate good synchronization.

5.1.2 Dance skill: guidance

Correlation between foot movements (\(\mathfrak {\mu }_{9}\) - \(\mathfrak {\mu }_{10}\))

Computing the 2D correlation coefficient between the velocity magnitudes of the user’s and the VP’s foot motions over 8 beats gives insight into the synchronization of the couple, given that their respective feet are supposed to move oppositely and simultaneously (the left foot of the user at the same time as the right foot of the VP).
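As an illustration, the correlation between two speed curves over one 8-beat window can be computed as below. The sketch takes the coefficient to be the Pearson correlation over all samples of the window, which is an assumption about how the 2D correlation coefficient is evaluated here; the function name is hypothetical.

```python
import math
from typing import Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (undefined correlation).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length (>= 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)
```

Because speeds are magnitudes, two feet that move at the same instants in opposite directions still yield a coefficient close to 1.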

5.1.3 Dance skill: styling

Area (\(\mathfrak {\mu }_{11}\) - \(\mathfrak {\mu }_{14}\))

During a cycle of 8 beats, the displacement of the feet is measured by the integration of the velocity over time. In addition, the net velocity change is measured by the integration of the velocity’s derivative. These values are computed for each foot, and provide insightful information on the dynamic of the stepping action.
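A possible numerical reading of these two quantities is sketched below, assuming evenly spaced speed samples, trapezoidal integration for the displacement, and forward differences for the derivative. These numerical choices and the function names are assumptions. Note that, read literally, integrating the derivative of the speed over the window reduces to the difference between the final and initial speed.

```python
from typing import List, Sequence, Tuple


def integrate(samples: Sequence[float], dt: float) -> float:
    """Trapezoidal rule over evenly spaced samples."""
    return dt * sum(0.5 * (a + b) for a, b in zip(samples, samples[1:]))


def derivative(samples: Sequence[float], dt: float) -> List[float]:
    """Forward finite differences (one value per interval)."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


def foot_area_features(speed: Sequence[float],
                       fps: float = 25.0) -> Tuple[float, float]:
    """Return (displacement, net velocity change) for one foot and window."""
    dt = 1.0 / fps
    displacement = integrate(speed, dt)
    # Rectangle rule on the per-interval derivative; the sum telescopes
    # to speed[-1] - speed[0] up to rounding.
    net_change = dt * sum(derivative(speed, dt))
    return displacement, net_change
```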

Hands movements (\(\mathfrak {\mu }_{15}\) - \(\mathfrak {\mu }_{16}\))

The dynamics of hand movements provide insight and help in characterizing the styling aspect of salsa. They are computed by taking the mean distance between the left / right hand and the hips over 8 beats.
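For one hand, this measurement amounts to averaging a point-to-point distance over the frames of the window, as in the short sketch below. The function name and the representation of positions as coordinate tuples are assumptions.

```python
import math
from typing import Sequence


def mean_hand_hip_distance(hand: Sequence[Sequence[float]],
                           hips: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance between a hand and the hips over a window."""
    if len(hand) != len(hips) or not hand:
        raise ValueError("need two non-empty tracks of equal length")
    return sum(math.dist(h, c) for h, c in zip(hand, hips)) / len(hand)
```

The same function is applied once to the left hand and once to the right hand, giving the two measurements of this category.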

5.2 Laban movement analysis - LMA

Analyzing human motion is particularly challenging, especially when the goal is to evaluate the learning skills with parameterized geometry and style control. In order to identify and evaluate the learning skills of our platform, we learn motion characteristics based on the LMA principles [23], drawing from the framework described in Aristidou et al. [7]. This framework was strategically designed to capture the diversity of stylistic and geometric characteristics of a set of dancing motions [3], and has been used to analyze and compare folkloric dances [6]. In contrast, the goal of our analysis is to learn features that are characteristic of learning skills among performers with different experiences in dancing.

In this work, we define, as local spatiotemporal descriptors, one-dimensional arrays that encode the LMA-derived features from selected key joints. We have considered 29 low-level spatiotemporally varying features (\(f_{i}\)) of the human body, which were chosen according to the four LMA components (Body, Effort, Shape, Space). For each feature, the minimum, maximum, mean, and standard deviation values were computed, resulting in 114 different feature measurements (\(\phi _{j}\)). These measurements are taken by observing each feature over a short temporal window around a given frame (a 30-frame right-anchored sliding window, at 25 frames per second) through each motion sequence, with a step of 20 frames (10 frames overlap). These feature measurements are then normalized so that their values range between 0 and 1.
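The windowing scheme can be sketched as follows for a single feature signal. The sketch interprets “right-anchored” as a window that ends at the anchor frame, and uses the population standard deviation; both are assumptions, as is the function name.

```python
import statistics
from typing import List, Sequence, Tuple


def window_stats(signal: Sequence[float], width: int = 30,
                 step: int = 20) -> List[Tuple[float, float, float, float]]:
    """(min, max, mean, std) of each window of `width` frames.

    Windows end at frames width, width + step, ... so that consecutive
    windows overlap by width - step frames (10 with the defaults).
    """
    stats = []
    for end in range(width, len(signal) + 1, step):
        window = signal[end - width:end]
        stats.append((min(window), max(window),
                      statistics.fmean(window), statistics.pstdev(window)))
    return stats
```

A 70-frame signal, for example, yields three windows covering frames 0 to 29, 20 to 49, and 40 to 69.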

Thereafter, and similarly to [8], we select those features that are consistent within the same group of performers (regular dancers vs. non dancers) and effective across the two groups. This allows us to make a meaningful mapping from the low-level feature space of the underlying motion into the learning skills. To achieve this, we consider in our analysis the mean and standard deviation of the sample values for each feature, for both classes. We define as effective and consistent those features whose standard deviation is small for motions of the same group (< 10% of the value) and whose mean values differ significantly between the two classes (> 20%). Since the movements in our dataset are strictly structured, and the variation in motion is limited, not all LMA features are important in separating the two classes. Based on our LMA feature analysis, we concluded that only twenty LMA feature measurements are useful for separating the two classes, which are listed in Table 3.

Table 3 The consistent and effective LMA-derived feature measurements used for separating the two classes, which provide insights about the learning skills of the participants. The feature measurements indexing follows the numbering of [7]

6 Results and discussion

Two complementary methods are used to describe the learning effect of the game. We use the MMF features to assess guidance and rhythm (including synchronization), and the LMA features to assess movement style (including effort, volume, and space). To evaluate the improvement of the skills in learning salsa, we compare the values of the corresponding MMF and LMA features for the second and the last exercises. Note that we chose not to use the first exercise, since it serves as a training step for the dancers to become familiar with the VR environment.

6.1 MMF study

For each performer and exercise, we extract one-dimensional arrays (the windows of MMF measurements using a sliding window of width proportional to the music’s tempo), and represent each performance by the mean value of all these local descriptors. Our target is to evaluate the performances of the two categories (regular dancers vs. non dancers) over time, and observe potential changes in the quality of dance after training.

Figure 7 (left) shows the mean values of the MMF measurements μj of the performers of the two classes over all exercises, while Fig. 7 (right) shows the mean values for the second (top) and the last (bottom) exercises. It can be observed that the mean MMF measurements of the regular dancers are larger than those of the non dancers with regard to the MMF styling and guidance skills. This is in line with our expectations since regular dancers, due to their long experience, have better guidance than non dancers and put more effort into dancing, making wider steps and moving their hands more intensely. Another important observation is the significant improvement in the guidance feature for the non dancers between the beginning and the end of the training, as well as the notable decrease in their rhythmic error (and hence the increase in their rhythmic accuracy). These two observations indicate an advancement in the performance of the non dancers, which supports our claim that our system helps users improve their salsa learning ability and skills. It is also important to note that regular dancers improved their performance only slightly: their MMF features stay relatively the same, with a small reduction in their rhythmic error. This indicates that their dance behavior did not change much during and after the training, which was expected since they already know the basic salsa steps. Most of the improvement in the regular dancers' performance seems to be attributable to their familiarization with the system.

Fig. 7

The mean values over all exercises for the MMF-derived feature measurements μj are shown on the left. On the right, the top plot shows the mean values for the second exercise, and the bottom plot shows the mean values for the last exercise

Another notable observation, as shown in Fig. 8, is that the standard deviation (std) of the MMF measurements regarding guidance is much larger for the non dancers than for the regular dancers. This indicates that the movements and guidance skills varied considerably among the non dancers. A plausible explanation is that non dancers, being inexperienced in salsa moves, differ in their sensitivity to the music and in how they synchronize their body movements to it. In contrast, the movements of the regular dancers vary less, since they have prior experience in leading a salsa dance and have better control of their body movements and gestures.

Fig. 8

The standard deviation of the MMF-derived feature measurements μj for the regular dancers (blue) and non dancers (red)

To visualize the differences between the two classes, we project the high-dimensional arrays that represent the performance of each participant into a 2-dimensional space using t-Distributed Stochastic Neighbor Embedding (t-SNE) [45]. We used t-SNE for dimensionality reduction, rather than Multi-Dimensional Scaling (MDS) [36], since it is particularly well suited for the visualization of high-dimensional datasets such as ours. Figure 9 shows the 2D embedding of the two classes, regular dancers and non dancers. The most significant observation is that the two classes can be separated at the beginning of the training, but as the performers gain more experience and training (e.g., in the last exercise), the two classes become mixed. Assuming that regular dancers have good learning skills, this is a good indication that the overall guidance and rhythm profiles of the users have improved and are converging toward a more homogeneous profile, thus validating the learning effect of our training.

Fig. 9

t-SNE dimension reduction of the MMFs for the second exercise (left) and the last exercise (right)

6.2 LMA study

To evaluate the learning skills and the improvement of the performers in terms of style, we performed the following LMA analysis. For each performer and each learning stage (exercise), we extracted the one-dimensional arrays (the windows of LMA-derived feature measurements, obtained using a sliding window), and represented each performance by the mean value of all these local spatiotemporal descriptors. In this way, we aim to study how the learning skills of each performer or group of performers change over time, and to observe the differences in style between users with different dance experience.

During our motion analysis, we made some important observations regarding the two classes (regular dancers vs. non dancers). First, the mean LMA feature measurements of the regular dancers are larger than those of the non dancers, especially in the early exercises of the training. This means that the users with regular dance experience put more effort into performing the task than the non dancers. Figure 10 shows the mean values of the LMA-derived feature measurements ϕj of the performers of the two classes over all exercises (left), and for the second (top right) and the last (bottom right) exercises. It can be clearly observed that the two classes are easily distinguishable in the early exercises, but these differences become smaller in the later exercises. Another important observation is that the standard deviation (std) of the LMA feature measurements is larger for the regular dancers than for the non dancers (refer to Fig. 11). This indicates that the movements of the regular dancers are more varied, while the movements of the non dancers are more compact. One would expect professional dancers to be more consistent in their movements, and non dancers to show larger variation. However, there are several possible reasons for this peculiarity in the dancers' motion measurements. Unlike non dancers, who put in the minimum effort required to complete the experiment and only perform the basic steps required by the VR application, dancers tend to put more effort into their movements. Each dancer has his or her own individual dancing style, improvisation, and accent, which may differ from those of the others, resulting in larger variation in their LMA feature measurements. In addition, since the dancers who participated in our experiments had no experience with VR environments, while the non dancers had, we believe that previous VR experience has a substantial impact on the performance of the participants.

Fig. 10

The mean values for the LMA-derived feature measurements ϕj. The mean values of the performers over all exercises are shown on the left. On the right, the top plot shows the mean values for the second exercise, and the bottom plot shows the mean values for the last exercise

Fig. 11

The average standard deviation of the LMA features for the regular dancers (blue) and non dancers (red) over all exercises (left image). On the right, top image, is the standard deviation of the LMA features for the second exercise, and at the bottom, for the last exercise

We have also studied the effect of our system on the personal style of the dancers. As illustrated in Fig. 11, the std of the LMA features for the non dancers remains unchanged over time, since non dancers usually simplify their movements to only those steps that are required by the system. In contrast, the std of the LMA features for the regular dancers seems to converge in later exercises, as they set aside their personal style, the stylistic nuance of their movement, and their improvisation; the std in the last exercise declined by 20% compared to the second exercise. This indicates that, as with real teachers, users become familiar with the VR environment and assimilate the style of the system (the teacher).

Similarly to Section 6.1, we visualize the differences between the two classes using t-SNE. Figure 12 illustrates the 2D embedding of the two classes, regular dancers and non dancers, for the second (left) and the last (right) exercises. It can be observed that the two classes can be separated, at least in the early exercises, but as users become more familiar with the VR environment and its tasks, the classes mix and become more difficult to separate.

Fig. 12

The 2D embedding of the two classes using the mean LMA-derived arrays for the second (left) and final (right) exercises. It can be observed that, in the early exercises of the experiment, the two classes can be separated based on the performers' learning skills, but in later exercises the dancing skills of the non dancers converge to those of the regular dancers

In addition to the LMA analysis, we evaluated the stylistic behavior (signature) of the movement of the participants, and how it evolves over time. More specifically, we extracted the LMA-derived arrays for all the performances and, similarly to Aristidou et al. [5], represented each performance by the distribution of its LMA-derived arrays. We positioned all these arrays in a d-dimensional space (d = 10) using Multi-Dimensional Scaling [36], clustered them in this space using K-means (K = 100), and then computed the normalized histogram of the frequency of these arrays for each performance (similar to the concept of bag-of-words). Thus, each performance is succinctly characterized by the distribution of its LMA-derived arrays; stylistically similar performances have similar distributions, while stylistically dissimilar performances have different distributions. The distance between performances was computed using the Earth Mover's Distance (EMD) metric [35]; note that EMD performs better than the Euclidean distance or the Pearson Correlation Coefficient that was originally used in [3]. We then applied t-SNE for dimensionality reduction; the 2D embedding of the two classes for the second and last exercises is illustrated in Fig. 13. As before, the two classes are separable in the early exercises, but tend to converge and become inseparable in the later exercises.
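The bag-of-words step can be illustrated with a short sketch. It is a simplified illustration rather than the pipeline of [5]: the function names are hypothetical, the cluster centroids are assumed to have been computed already (e.g., by K-means in the MDS space), and the EMD comparison between the resulting histograms is not shown.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))


def signature(descriptors, centroids):
    """Normalized histogram of cluster assignments for one performance.

    descriptors: the embedded LMA-derived arrays of a single performance.
    centroids: cluster centers, assumed to be precomputed (hypothetical input).
    """
    counts = [0] * len(centroids)
    for d in descriptors:
        counts[nearest_centroid(d, centroids)] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(centroids)
    return [c / total for c in counts]
```

With K clusters, every performance is reduced to a K-bin histogram that sums to 1, regardless of its duration, which is what makes performances of different lengths comparable.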

Fig. 13

The 2D embedding using t-SNE, when the two classes are represented by the distribution of their LMA-derived arrays [5]. Again, it can be observed that the two classes can be partly separated in the early exercises (left), but as we move forward to the latest exercises (right), the two classes are mixed together. This indicates that our dance VR application helps the participants to improve their dancing stylistic behavior

7 Conclusions and future work

We have designed a VR application that simulates salsa dance practice. In our VR environment, the user interacts with a virtual partner via hand-to-hand contact using controllers, and can control the transitions between salsa dance patterns as in a real dance situation. A six-point skeleton of the user is motion captured to provide enough data for analyzing the performance. As a validation, we conducted an experiment consisting of a series of 8 exercises with different tempos, in which the user leads the movements of the virtual partner with specific gestures at given times, as in real-life salsa scenarios. We acquired the motion of 40 participants divided into two groups with different dance experience, the non dancers and the regular dancers. The performance was evaluated using MMF and LMA features, which show a clear difference before and after training with our dance VR environment, and which are significant for classifying people according to their learning profile. The results demonstrate an overall improvement of the dance skills of the non dancers, and a more uniform profile that converges towards the profile of the regular dancers after training.

Our method has some limitations. First, the gesture and timing required to trigger the dance pattern transition did not feel natural enough for some users, as there is a more complex mechanical interaction to be taken into account. Secondly, the learning duration of our training was too short for some users to show an understanding of the VR technology. With longer learning sessions, we expect that the users will feel more comfortable and familiar with the application. In future work, we aim to extend the learning study over a longer period, e.g., one month with two training sessions per week, to evaluate a more significant impact in terms of performance improvement. Moreover, the diversity of the users' dance profiles was quite broad, and thus it was challenging to reach definite conclusions. For example, some of the non-dancer participants had some minor dance experience or extensive experience with virtual reality applications, and this was not taken into account in our analysis and classification. We want to investigate a wider diversity of dancers, categorized by their experience, e.g., expert dancers, regular dancers, amateur dancers, and non dancers. Other information, such as previous experience with virtual reality platforms and applications, age, and gender, will also be taken into consideration. For future work, we also plan to provide real-time hints, such as audio cues or the presence of a virtual teacher, to help users better assimilate the given tasks and improve their skills. The mechanical interaction with the virtual partner can be improved with a more complex vibration-feedback system. Beyond the two motions used in this study, more salsa movements, such as turns and spins, can be investigated. Finally, we look forward to investigating the remaining criteria reported in [39], which are more challenging.