1 Introduction

Many educators seek to develop in students the competencies they will need in today's labor market. However, most educational systems still rely on traditional methods of instruction [37], largely ignoring the technological advances made for educational purposes over the last decades. Interactive learning environments (ILE), learning management systems (LMS), intelligent tutoring systems (ITS), and personalized learning environments (PLE) have proliferated across all education sectors, producing vast amounts of data. Nevertheless, although these learning environments store user data automatically, the exploitation of these data for both learning and teaching is still very limited [13, 33].

Educational datasets offer opportunities for evaluating learning theories, providing feedback and learning support, building early warning systems, and developing learning technologies and future applications. In this regard, learning analytics (LA) is an interesting approach, since it allows measuring, collecting, analyzing, and presenting data about students, their contexts, and their interactions. These data help to understand the learning process as it unfolds, as well as to optimize the environments in which it occurs [34]. However, LA also has important limitations. On the one hand, it mostly focuses on online courses and cognitive tutors with written and structured activities [5]. On the other hand, the entire interaction process occurs in front of a computer, i.e., during human-computer interactions that do not allow analyzing interactions between participants [6]. These limitations are overcome by the multimodal learning analytics (MMLA) approach, which analyzes natural communication modalities (e.g., speech, writing, gestures, gaze) during educational processes [27].

Nowadays, companies require highly competitive professionals who have the skills necessary to confront new challenges [10, 29]. However, current evaluation techniques do not make it possible to determine consistently whether a student (and future professional) has developed skills highly valued in the work environment, such as collaboration, teamwork, or effective communication. Given the above, we present an application that integrates data generated from voice sensors (microphones) in order to analyze behavior in collaborative activities, for later visualization and analysis through multimodal learning analytics techniques.

Once the data are collected, the collaborative discussion groups are modeled as influence graphs [20], i.e., labeled, weighted, directed graphs that represent the dynamic flow of information circulating among the individuals. Influence graphs have previously been used in multiagent systems [19], mediation systems [18], and decision systems [21]. This model enables the use of social network analysis techniques, such as centrality measures, to identify either the most influential or the most active individuals during the discussion [15]. The results obtained provide evidence that integrating this type of solution is highly valuable for both educators and students.

This article continues as follows. Section 2 discusses relevant work related to collaborative learning and MMLA. Section 3 presents the problem description. In Sect. 4, the developed solution is detailed in terms of the technical environment and the social network model. Section 5 presents the case study, corresponding to different discussion groups, each of which collaborates on the successful execution of a task. The analysis of results is presented in Sect. 6. We finish by presenting our main conclusions and future work in Sect. 7.

2 Related work

Traditional approaches to social theory and data analysis focus on individual decisions without considering the behavior of others, thus ignoring the participant's social context. In social network analysis (SNA), the relationships between participants become the main priority, and individual properties move to a secondary level [25]. SNA employs a broad approach to sociological analysis and a set of methodological techniques whose aim is to describe and explore the patterns apparent in the relationships between individuals and groups [32].

According to Dillenbourg [9], the term “collaborative learning” describes a situation in which there is an expectation that certain interactions will occur in order to derive learning mechanisms. However, there are no guarantees that these interactions will occur. For this reason, the concern is to find ways to increase the chances that these types of interactions will occur.

For Blikstein and Worsley [6], a key objective of learning analytics is to develop methods that quantify nonstandardized forms of learning. In the current context, which integrates project-based learning and student-centered pedagogies, it becomes necessary to find ways to measure parameters that allow learning to be quantified. This is where MMLA becomes important, since it takes advantage of advances in the capture and processing of multimodal data signals to study a variety of aspects relevant to learning in complex collaboration environments [24]. In addition, Oviatt [26] points out that most learning evaluations rely on precise metrics as obligatory functions. The author indicates that MMLA is expected to produce a greater variety of perceptible, objective metrics linked to behaviors within a learning context. Finally, she emphasizes that the discrete, continuous, and automatic analysis provided by MMLA can be carried out in natural field settings and is more affordable than traditional educational evaluations. In her work with MMLA, Oviatt [26] set out to determine the level and consolidation of expertise in a fast, reliable, and objective manner. To this end, she analyzed groups of students who had to solve math problems of different levels. The data were captured from different sources: video cameras, digital pens, and microphones. The results showed that the performance of the most expert groups during the collaborative activities was entirely different from that of the nonexpert ones. The groups with the most expertise showed at least four times more collaborative activity in solving the problems and were four times more likely to provide correct solutions.

Andrade et al. [3] showed how traditional education research could benefit from MMLA. They used different data sources, such as posture, gesture, gaze, language, and speech, to predict the different epistemological frames that students adopt during interviews. They then established the level of reasoning reached by the students according to the grouping of these frames.

Chen et al. [7] developed an experiment to find useful multimodal features for the automatic estimation of public speaking skills. In addition to being a study based on multimodal experimentation, the article demonstrates the feasibility of automatically evaluating public speaking skills through MMLA.

Recently, Cukurova et al. [8] conducted an empirical study on young students participating collaboratively in problem solving. They showed that differences in group behavior suggest a possible relationship between a collaborative group's competence and its observed behavior, similar to the previously mentioned work by Oviatt [26].

The works above reflect researchers' interest in improving learning environments by collecting data and then analyzing them. The main idea is to analyze the natural environment where the learning process occurs (naturally subject to multiple forms of interaction) and to categorize the participants according to previously defined criteria. The information provided by this type of analysis offers a much broader view than that obtained with traditional evaluation methods, since each captured interaction reveals some aspect of the learning process that may be key to understanding the relationship between what happened during the activity and the result obtained. The final purpose is to measure learning objectively, which would allow delivering feedback much more aligned with the observed behavior of the student.

3 Problem description

Educational institutions have the responsibility to train highly qualified students [2], equipped with the skills necessary to confront new challenges. However, the evaluations used to gauge skills are standardized tests that, in most cases, do not reliably evidence students' skills [12]. The same phenomenon occurs in classrooms: students are evaluated according to test results rather than through a behavioral analysis of what they developed during a given activity. Failing an evaluation therefore implies uncertainty about the skills the student has actually acquired.

Qualitative approaches based on video-recorded sessions, written student reports, or rubrics have been explored [1]. However, these approaches do not reliably capture and characterize students' problem-solving processes on a large scale [36], nor do they allow monitoring of learning progress. The way learning is measured differs greatly from how teaching-learning situations actually occur; hence, it is difficult to ensure that genuine learning outcomes are being achieved.

Obtaining genuine learning results requires, first, that teachers monitor the development of student behavior and, second, evaluations based on metrics representative of the student's progress. This kind of monitoring is complicated in classes with a large number of students [17]. Moreover, it becomes difficult to visualize the performance of heterogeneous student groups [22], i.e., how the behavior of one student can influence another student or even the entire group.

In addition to the challenges mentioned, feedback is a crucial aspect of the educational process, as it can support students' academic performance and promote their motivation and self-reflection skills [31]; it can thus be a strategy to reduce the gap between current and expected performance. However, properly obtaining and analyzing data for student feedback is a time-consuming task for educators [14]. Hence, appropriate formative feedback is not a very frequent practice [28], even though, according to [4], academic feedback is more strongly and consistently related to achievement than any other teaching behavior. Faced with these challenges, our objective is to provide a tool that allows the teacher to visualize, in a simple manner, aspects of the interaction of social groups, in this case students collaborating to solve a problem, together with the possibility of generating reports of the results obtained. In this way, we hope to generate additional input that allows the teacher to objectively evaluate professional skills and to encourage self-reflection in students.

4 Developed solution

To address this problem, a computer application has been developed for the capture, storage, analysis, and visualization of data coming from collaborative discussion groups. Speech data are captured by multidirectional microphones, and social network analysis techniques are used for data analysis. Next, we introduce the system's technical environment, including the high-level architecture, the developed interfaces, and the technologies used. After that, we define the social network model used for data analysis.

4.1 Technical environment

Figure 1 illustrates the high-level architecture of the developed system. For data collection, we use ReSpeaker devices, which are low-cost multidirectional microphone arrays. These microphones provide voice activity detection (VAD) and direction-of-arrival (DOA) estimation for up to four individuals within a 3-meter capture radius.

Fig. 1 High-level architecture of the developed system

The application receives the data from the microphones, which are preprocessed on Raspberry Pi devices and stored in a centralized database. Then, on the server, social network analysis techniques are used to process the data and generate the visualizations displayed to the client. In summary, the application provides the following functionalities:

  • Group visualization, i.e., visualizations regarding a specific group. It includes visualizations of participants' interactions; precedence and intervention relationships; and specific visualizations for each participant: voice activation, number of interventions, voice intensity, activity and influence measures, speaking time, and interventions time;

  • Environment visualization, i.e., visualizations regarding multiple groups. It includes general visualizations for group comparisons: group interventions, total speaking time, total number of interactions, average voice intensity per group, and the most active and most influential participant of each group.

Note that we distinguish between “speaking time” and “interventions time.” By speaking time, we mean the time during which the microphone detected that the same individual was speaking continuously and without interruptions. By interventions time, we mean the time during which the individual was speaking without being interrupted by another person. Namely, if an individual is speaking, stops for a moment, and then continues, two different speaking times are counted but only one interventions time. Thus, an individual's speaking time is always less than or equal to his/her interventions time.
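To make the distinction concrete, the following sketch is an illustrative approximation (not the system's actual code): it derives both quantities for a single participant from a list of voice-activity segments, assuming a maximum pause length (PAUSE_THRESHOLD, our own parameter) below which a silence does not end the intervention; interruptions by other participants are not modeled here.

```python
# Illustrative sketch: speaking time vs. interventions time for one participant.
# A segment is a (start, end) pair in seconds in which VAD detected continuous speech.
PAUSE_THRESHOLD = 1.0  # assumed maximum pause (s) that still belongs to the same intervention


def speaking_time(segments):
    """Total time of continuous speech; pauses are excluded."""
    return sum(end - start for start, end in segments)


def interventions_time(segments):
    """Total time of interventions; segments separated by short pauses are merged."""
    if not segments:
        return 0.0
    total = 0.0
    cur_start, cur_end = segments[0]
    for start, end in segments[1:]:
        if start - cur_end <= PAUSE_THRESHOLD:
            cur_end = end                      # same intervention: extend it
        else:
            total += cur_end - cur_start       # close the previous intervention
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start)


segments = [(0.0, 2.5), (3.0, 4.0), (10.0, 12.0)]  # toy data
print(speaking_time(segments))       # 5.5 -> sum of the three speech segments
print(interventions_time(segments))  # 6.0 -> the first two segments form one intervention
```

As expected, the speaking time (5.5 s) is less than or equal to the interventions time (6.0 s), since the latter absorbs the short pause between the first two segments.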

All the generated files can be downloaded in comma-separated values (CSV) format, which facilitates the reuse of the data in other applications. The software developed for data collection and preprocessing was programmed in Python 2.7, using the following libraries (a minimal capture sketch is shown after the list):

  • Numpy: handling multidimensional arrays;

  • Wave: audio files manipulation;

  • Csv: read–write in CSV files;

  • Webrtcvad: voice activity detection (VAD);

  • Voice_engine: direction of arrival (DOA);

  • Mic_array: ReSpeaker functionalities;

  • Pyaudio: audio recording;

  • Gpiozero: interfaces manipulation on Raspberry Pi (GPIO12 Button);

  • Unittest: unit testing.
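As an illustration of how the main capture libraries fit together, the following sketch (our own simplification, not the project's actual code) records short audio frames with Pyaudio and flags voice activity with Webrtcvad; the ReSpeaker-specific DOA handling provided by voice_engine and mic_array is omitted, and all parameter values are assumptions.

```python
import pyaudio
import webrtcvad

RATE = 16000                            # sample rate supported by webrtcvad
FRAME_MS = 30                           # webrtcvad accepts 10, 20, or 30 ms frames
FRAME_SAMPLES = RATE * FRAME_MS // 1000

vad = webrtcvad.Vad(2)                  # aggressiveness 0-3 (assumed value)
audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=FRAME_SAMPLES)

try:
    for _ in range(10 * 1000 // FRAME_MS):      # capture roughly 10 seconds
        frame = stream.read(FRAME_SAMPLES)      # 16-bit mono PCM bytes
        if vad.is_speech(frame, RATE):
            print("speech detected")            # the real system would log a segment here
finally:
    stream.stop_stream()
    stream.close()
    audio.terminate()
```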

The Web application was developed with the Django 2.0 framework and Python 3.5. PostgreSQL 9.4 was used as the database management system. For the server and client logic, the following libraries and plug-ins were used:

  • Numpy: handling multidimensional arrays (Python);

  • Dateutil: functionalities for handling dates (Python);

  • Unittest: unit testing (Python);

  • Bootstrap: toolkit for responsive applications development (CSS, JS);

  • Morris: graphics creation (JS);

  • D3js: visualizations (graphs, shapes, graphics, etc.) (JS);

  • JQuery: event handling, animations, and AJAX requests (JS).

Figure 2 presents some of the interfaces available for a given group. Interface (a) shows two speech detection charts representing the participants' activity during the whole event, with each participant represented by a different color. The upper chart shows speaking times, while the lower one shows intervention times. Interface (b) illustrates the average voice intensity of each intervention of each participant, together with the average intensity over all the interventions associated with each participant. Interface (c) displays the interactions between participants, as well as the amount of time each participant spoke, which is proportional to the size of the corresponding node. Each node represents a participant and each edge the interaction between two participants, while the arrow indicates the direction of the speech flow. The width of each edge is proportional to the number of occasions on which the interaction occurred. This graph is explained more formally in Sect. 4.2. Finally, interface (d) shows the number of interventions associated with each participant. Each circle distinguished by a color represents a participant, and his or her interventions are found within it; the size of each intervention (orange circles) is directly proportional to its duration.

Fig. 2 Interfaces of the developed solution
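As a rough illustration of the graph view in interface (c), the following sketch (our own; the actual system renders these views with D3js on the client, so this is only an offline approximation with hypothetical data) draws a directed graph in which node size is proportional to speaking time and edge width to the number of interactions.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical data: speaking time per participant (seconds) and
# interaction counts between pairs of participants.
speaking_time = {"S1": 40.0, "S2": 25.0, "S3": 10.0, "S4": 18.0}
interactions = {("S1", "S2"): 6, ("S2", "S1"): 5, ("S1", "S4"): 3, ("S4", "S3"): 2}

G = nx.DiGraph()
for (a, b), count in interactions.items():
    G.add_edge(a, b, weight=count)

pos = nx.spring_layout(G, seed=1)
node_sizes = [30 * speaking_time[n] for n in G.nodes()]    # node size ~ speaking time
edge_widths = [G[a][b]["weight"] for a, b in G.edges()]    # edge width ~ interaction count

nx.draw(G, pos, with_labels=True, node_size=node_sizes, width=edge_widths,
        node_color="lightblue", arrows=True)
plt.show()
```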

4.2 Social network model

In this section, we delve into the social network analysis techniques used in the server module of the system architecture (see Fig. 1).

A discussion group can be represented by a directed multigraph \((V, E)\), where the vertex set V represents the collaborators and the edge set E the speech flow formed between the collaborators over time. More precisely, an edge \((a,b)\in E\) means that the sender a has transmitted a message to the group, which has been directly received by the receiver b, who will be the next sender to intervene in the discussion. In this way, each edge has an associated time stamp, which allows recreating the entire discussion from beginning to end. The ReSpeaker cannot detect participants speaking exactly in unison; therefore, when it detects two sounds or voices from different sources, it takes the direction of the signal with higher intensity as the dominant one.

In a collaborative work session, the number of interventions can be very high (see Sect. 5), which translates into a multigraph with too many edges that is difficult to understand visually. Therefore, we propose to use influence graphs as a more compact way to represent discussion groups without loss of information [20]. An influence graph \((V, E, w, f)\) is a labeled, weighted, directed graph, where \(w:E\rightarrow {\mathbb {R}}\) is a weight function that assigns a weight \(w(a,b)\) to each edge \((a,b)\), and \(f:V\rightarrow {\mathbb {R}}\) is a label function that assigns a label \(f(a)\) to each vertex a. In this context, V represents the same set of collaborators as before, and E represents the total speech flow between the collaborators during the whole discussion session. The weight \(w(a,b)\) of edge \((a,b)\) represents the number of interventions issued by the sender a that were replied to by the receiver b. The label \(f(a)\) of vertex a represents the interventions time of collaborator a (in seconds).
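As an illustration of this construction (hypothetical data and names; not the authors' implementation), the following sketch aggregates a chronological list of interventions into the weight function w and the label function f, taking the next speaker as the receiver of each intervention.

```python
from collections import defaultdict

# Each intervention is (speaker, start, end) in seconds, in chronological order.
interventions = [("S1", 0.0, 3.0), ("S2", 3.5, 5.0), ("S1", 5.2, 9.0), ("S3", 9.5, 10.5)]

w = defaultdict(int)     # w[(a, b)]: interventions by a that were replied to by b
f = defaultdict(float)   # f[a]: total interventions time of a (seconds)

for (speaker, _, _), (receiver, _, _) in zip(interventions, interventions[1:]):
    if receiver != speaker:   # assumption: consecutive turns by the same speaker are merged upstream
        w[(speaker, receiver)] += 1

for speaker, start, end in interventions:
    f[speaker] += end - start

print(dict(w))  # {('S1', 'S2'): 1, ('S2', 'S1'): 1, ('S1', 'S3'): 1}
print(dict(f))  # {'S1': 6.8, 'S2': 1.5, 'S3': 1.0}
```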

Influence graphs provide several advantages, such as the possibility of a more quantitative analysis approach through social network analysis techniques. In particular, we use centrality measures, which quantify the relevance of each actor within the network. There are dozens of centrality measures, and new ones appear constantly [30]. However, the measures must be chosen according to the context, and many of them do not adapt naturally to discussion groups. For instance, measures based on eigenvector centrality, such as PageRank [16], do not seem adequate in collaborative networks with few vertices, where all individuals interact with each other. Similarly, measures such as closeness are not interesting in this context, since the shortest path between two vertices will usually be 1 or very close to 1. Therefore, here we use measures based on variations of the classic in-degree and out-degree centrality [11], considering both the weight function and the label function of the influence graph. We consider two kinds of centrality measures: activity measures and influence measures. Regarding the activity measures, we consider the permanence (\(A_1\)), based on the total time of the individual's interventions, and the persistence (\(A_2\)), based on the total number of the individual's interventions. Regarding the influence measures, we consider the prompting (\(I_1\)), which allows us to identify the idea starters [35]. Formally, let \((V, E, w, f)\) be an influence graph representing a discussion group; then, for every individual \(i\in V\) in the network, we have:

$$\begin{aligned} A_1(i) = \frac{f(i)}{\sum _{a\in V}f(a)}, \qquad A_2(i) = \frac{\sum _{a\in V}w(a,i)}{\sum _{e\in E}w(e)}, \qquad I_1(i) = \frac{\sum _{b\in V}w(i,b)}{\sum _{e\in E}w(e)} \end{aligned}$$

As usual, the denominators normalize the measures.
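A possible implementation of these measures on top of the w and f mappings from the previous sketch (again illustrative, not the system's actual code) is shown below.

```python
def permanence(i, f):
    """A1(i): share of the total interventions time attributed to individual i."""
    return f[i] / sum(f.values())


def persistence(i, w):
    """A2(i): normalized weight of i's incoming edges (interventions in which i replies)."""
    return sum(weight for (a, b), weight in w.items() if b == i) / sum(w.values())


def prompting(i, w):
    """I1(i): normalized weight of i's outgoing edges (interventions by i that received replies)."""
    return sum(weight for (a, b), weight in w.items() if a == i) / sum(w.values())


# Example use with the mappings computed in the previous sketch:
# print({i: round(permanence(i, f), 2) for i in f})
```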

5 Case study

To test the utility of the developed system, an activity was designed in which 44 undergraduate students had to solve an engineering problem with everyday materials [38]. First, the students were randomly separated into groups of 4. Then, each group had to build a structure capable of supporting a 0.8-kilogram book. In addition, the groups had to satisfy three restrictions for their task to be considered successful:

  1. Material restriction: the material used had to be a piece of standard letter paper;

  2. Time restriction: students had a maximum of 4 min to complete the challenge;

  3. Height restriction: the structure had to hold the book at a minimum distance of 3 cm from the surface (in this case, the table).

Figure 3 shows the environment where the activity was carried out and some of the solutions found for the challenge.

Fig. 3 Work environment

To analyze the results, we generated 11 influence graphs with four vertices each. Due to the small size of each group and the nature of the task (people talking for several minutes in a highly collaborative dynamic, without any formal mediation), typically all individuals end up interacting with each other, and hence almost all of these graphs are complete.

6 Results analysis

In this section, we will explain how influence graphs may help to find and visualize nontrivial information regarding student interactions in collaborative working groups. We shall see that this information can be useful to classify the different working groups, as well as to support complex decision-making processes.

Figure 4 illustrates the 11 groups considered in the case study, represented as influence graphs. At the top left of each graph, it is indicated whether or not the group successfully completed the task within the stipulated time. In what follows, we denote by S1, S2, S3, and S4 the students associated with each microphone. S1 is marked in red, S2 in blue, S3 in green, and S4 in yellow. The size of each node i reflects its label f(i); the thickness of each edge e reflects its weight w(e).

Fig. 4 Influence graphs generated by the application developed

At first glance, it can be noted that all influence graphs, except for those of Groups 4, 8, and 9, are complete. By definition, the lack of an edge means that, during the entire activity, there was a student who never intervened after the comments of one of his or her teammates. In the context of this study, in which group collaboration is encouraged, we might expect that the longer the activity, the more likely it is to obtain complete graphs. In practice, however, the interventions between individuals can vary considerably from one group to another. In fact, asymmetrical communication dynamics may even arise, opposed to the collaborative nature of the activity. Furthermore, note that Groups 4, 8, and 9, despite their asymmetry, successfully completed the activity. As a matter of fact, the type of communication (symmetrical or asymmetrical) cannot by itself explain the success or failure of a collaborative activity.

That is why it is interesting to notice how different dynamics can lead to a successful task. For instance, the edge weights show that the relationships in Groups 5, 7, 10, and 11 are much more symmetrical than in Groups 4, 8, and 9. Moreover, the number and characteristics of the most dominant students also differ from one group to another. Group 4, for example, presents a strong and successful collaboration between students S1 and S4, leaving the other two students in a rather secondary role. Remarkably, in all the successful groups there is a strong collaboration among the most active participants of the network. In what follows, we analyze in more detail the data obtained with the ReSpeaker devices; throughout the analysis, the influence graphs will help visualize the results found.

Table 1 details the number of interventions of each student, the total number of interventions of each group, and the total duration of each group's activity, i.e., the time from when the microphones were turned on (initial silence, without activity from any student) until they were turned off. Note that the instructions were given to each group before the microphones were turned on. Group 1 reached the highest number of interventions, with a total of 303. By contrast, Group 8 was the one with the fewest interactions, with only 36 interventions. On the other hand, Group 3 was the slowest in the task, with almost 4.9 min, while Group 8 was the fastest, with a little over 1 min. In fact, Groups 1, 2, 3, and 6 failed to complete the task, since they exceeded the 4 min indicated in the restrictions without being able to solve the problem. Note also that the total number of interventions does not correlate with the total duration of the event. Indeed, as the last column of Table 1 shows, Group 10 was the group that interacted the most, in proportion to the short time it took to solve the task. By contrast, other groups such as Group 8, although they also resolved the task quickly, show a much lower intervention rate. This means that, if we considered collaboration (in a very simplified way) as the simple act of dialogue, then although both groups were successful, Group 10 achieved the goal in a more collaborative way than Group 8. The latter can also be seen in the graphs of Fig. 4: Group 10 has three large nodes with strong interactions between them, while Group 8 is incomplete and has only one large node, with asymmetrical interactions with the rest of the nodes.

Table 1 Number of interventions, total duration of each event (in seconds), and intervention rate (interventions per second)

Speaking and intervention times are shown in Tables 2 and 3, respectively. As discussed in Sect. 4.1, speaking time is always less than or equal to the corresponding interventions time, because the latter also includes short silences within the same intervention. Moreover, the total interventions time of a group is always less than or equal to the event duration (see Table 1), because the duration also includes the initial silence of the activity, when there is still no activity from any of the students. It is interesting to note that, although Group 1 presented by far the highest speaking time, Groups 2 and 3 exceed it in terms of interventions time. None of these three groups successfully completed the task. However, the dynamics generated were quite different, since of the three, Group 1 had a much more active collaboration than the other two.

Table 2 Speaking times (in seconds)
Table 3 Intervention times (in seconds)

With the previous data, it is possible to compute the centrality measures defined in Sect. 4.2. The results are shown in Table 4 and can be complemented with the influence graphs illustrated in Fig. 4: the sizes (i.e., labels) of the nodes are related to the permanence of each student, the thickness (i.e., weights) of a node's incoming edges is related to the student's persistence, and the thickness of its outgoing edges to the student's prompting.

From these data, relevant information about particular students can be extracted. For instance, the most active and influential student in Group 1 was S1, who has the highest values of permanence, persistence, and prompting. However, this activity and leadership did not allow the group to complete the task. Another interesting example is Group 4. Here, the student with the highest permanence was S4, who, proportionally, is even more active than student S1 of Group 1. However, in this group student S1 was more persistent and prompting than S4, and thus more influential. Therefore, here we see a network dynamic with two relevant collaborators instead of a single dominant one. In this case, the dynamic paid off, since they managed to complete the activity successfully.

If we focus on the groups that managed to accomplish the task successfully, we note that collaboration was present in all of them. However, we were able to identify at least three different types of collaborative dynamics that worked. As already mentioned, Group 4 and also Group 9 have two relevant collaborators with a symmetrical interaction. In contrast, Groups 5, 7, 10, and 11 presented a more extended collaboration, which included at least three of the four students, also following a relatively symmetrical relationship. From all these cases, it seems that strongly symmetrical relationships (that is, those based on dialogue) favor the accomplishment of collaborative tasks. However, a third interesting type of dynamic occurred in Group 8. From Fig. 4, we can see that there is a circular relationship among students S1-S4-S3, with S4 being the most dominant actor. In this case, there is no symmetry: S4 used to respond to S1 (with long interventions), then S3 responded to S4, who in turn was replied to by S1, and so on. If we also consider the small number of interactions between the different participants (Table 1) and their scarce intervention and speaking times (Tables 2 and 3), we could interpret the dynamics of this group as constructive and silent, rather than collaborative. In effect, in this case each participant proposed alternatives or improvements based on what the previous partner had proposed. This shows that, although less common, there are alternative dynamics to the strongly collaborative ones that can arise spontaneously to solve this type of problem.

This also demonstrates the need to use different centrality measures to understand a phenomenon, since each measure provides a different criterion for identifying the most relevant collaborators within a social network. Finally, note that the persistence and prompting measures returned very similar values, and even the same values for several groups. This is due, on the one hand, to the fact that networks with a small number of collaborators generate fairly symmetrical relationships and, on the other hand, to the cooperative nature of the activity. It would be interesting to see how the two measures diverge in less collaborative contexts, such as meetings guided by a formal mediator.

Table 4 Centrality measures for each student

7 Conclusions and future work

Disciplines evolve when they can be assessed. Under this premise, incorporating learning analytics in contexts where transformations occur is of vital importance for the continuous improvement of education processes [23]. Learning analytics is very important in education, since it helps to evaluate students' performance not only at the end of the process but throughout the entire process. Furthermore, information technologies are a vital support for multimodal learning analytics, since they can support complex performance measurements, facilitate the storage of large amounts of data, and perform intelligent data analysis.

In this article, collaborative discussion networks among students were modeled through influence graphs. This allows immediate visualization of interrelations between subjects, in order to support complex decision-making processes. Furthermore, it allows the use of social network analysis techniques, such as centrality measures, to improve data analysis.

The computational environment proposed in this article focuses exclusively on the analysis of voice interventions using ReSpeaker devices. A first limitation is the recording of environmental noise, which makes the system difficult to use during outdoor activities or in noisy spaces. Second, the data obtained do not consider various relevant aspects, such as the profile of the students, their previous knowledge, the way in which the groups are formed, or their nonverbal language during the activity. Given the above, it would be interesting to complement the studies with video recordings and surveys to profile the study groups.

Modeling discussion groups as social networks opens up numerous questions and possible lines of research. In the future, it would be interesting to study other types of work groups and compare their behavior: for example, to compare the collaborative dynamics studied in this article with other mediated dynamics, in which leaders and mediators emerge spontaneously. It is worth considering to what extent the idea starters considered here relate to mediators, or whether a new measure would be necessary to identify the latter. Furthermore, a future aspect to develop is related to collaboration between groups. To this end, we intend, as future work, to integrate beacons into group tables in order to measure how students interact when solving a complex problem that requires collaboration from other groups.