Introduction

Computer-Supported Collaborative Learning (CSCL) has gained broader usage in multiple educational scenarios (Stahl et al. 2006). CSCL technologies facilitate the development of learning environments in which knowledge is collaboratively built and shared (Stahl 2006), based on the intertwining of collective and individual learning processes (Cress 2013). Moreover, CSCL has become a viable alternative to classic learning environments as it can be employed in various settings, such as Massive Open Online Courses (MOOCs) or collaborative serious games (Hummel et al. 2011). At the same time, the need for automated tools capable of supporting and evaluating the involved actors has become more evident given the time-consuming processes involved in the analysis of multi-participant conversations (Holmer et al. 2006). For example, Trausan-Matu (2010a) reported that the time required for a thorough analysis greatly exceeds the actual duration of the chat session, rendering the manual evaluation process impossible for large corpora.

In terms of defining the variables of our analysis, collaboration can be perceived as a measure of interaction among participants centered on sharing ideas, fostering creativity for working in groups (Trausan-Matu 2010b), and influencing others’ points of view during the discussion. Complementarily, participation represents the active involvement of members in ongoing CSCL conversations, which can be seen as independent processes that do not solely consist of collaboratively exchanging ideas with other participants. The number of uttered contributions can be considered the baseline for participation, whereas collaboration is reflected in the utterances addressed to other participants that bring a contribution to the knowledge-building process. As such, our principal interest lies in automatically assessing collaboration within CSCL text-based, multi-participant interactions, and in particular, those performed within educational contexts. In order to achieve this goal, we propose two computational models based on dialogism and cohesion, two core concepts that can be used to highlight collaboration zones and become signatures of collaboration between different participants.

Dialogism

The concept of dialogism was introduced by Bakhtin (1981) and covers a broader, more abstract and comprehensive perspective of continual dialogue that exists in any type of text. Dialogism is centered on the dialogue reflected in “any kind of human sense-making, semiotic practice, action, interaction, thinking or communication” (Linell 2009, pp. 5–6). This definition of dialogism, besides the intrinsic dialogue between individuals, may well be present in any text as “life by its very nature is dialogic … when dialogue ends, everything ends” (Bakhtin 1984, p. 294). In addition, dialogue can be also perceived as ‘internal dialogue within the self’ or ‘internal dialogue’ (Linell 2009, ch. 6), ‘dialogical exploration of the environment’ (Linell 2009, ch. 7), ‘dialogue with artifacts’ (Linell 2009, ch. 16) or ‘dialogue between ideas’ (Marková et al. 2007, ch. 6). Regardless of context, discourse is modeled from a dialogical perspective as interaction with others, essentially towards building meaning and understanding.

Dialogism offers a well-grounded theoretical framing for automated discourse analysis and, in particular, for CSCL. Its key features are multivocality and polyphony (Koschmann 1999), both tightly connected to the core concept of voice. In a nutshell, a voice expresses a distinct point of view, a position within the dialogue, and is reflected in concepts, utterances or events that will further influence the conversation (Trausan-Matu 2010a). Therefore, a voice can be perceived as individual or collective perspectives on topics (Linell 2009) that are socially generated and sustained in the “circulation of ideas” (François 1993; Hudelot 1994; Salazar Orvig 1999). Individuals internalize and assimilate these ideas, and re-emit them as personal points of view or voices centered on the topics of the conversation. The overall conversation becomes analogous to a “voting” of uttered ideas, followed by an alignment to other individuals who share a similar perspective (Linell 2009).

Starting from the definition of voices, multivocality is centered on the multitude of meanings and the dialogue between multiple voices. Even further, polyphony, a central concept within our analysis, encapsulates multiple points of view or voices while focusing on their inter-animation, as well as the inter-relationships captured by their co-occurrence and overlap. Moreover, in addition to multivocality, polyphony is also characterized by a coherent achievement of the participating voices.

Following the perspective of Bakhtin (1981), the inter-animation of voices is generated by the influences between utterances, their interaction with one another, as well as one’s reflection onto another (Trausan-Matu et al. 2007b). This process of voice inter-animation occurs progressively from simple repetitions to complex referential relationships between utterances. Moreover, aside from providing a theoretical starting point for developing tools to instruct thinking skills (Wegerif 2006), dialogism and the underlying inter-animation of voices become key components for ensuring the success of a collaborative learning activity. To further elucidate the concepts of polyphony and voice inter-animation, Tables 1 and 2 present chat excerpts corresponding to different scenarios.

Table 1 Conversation sample highlighting a dense inter-animation of voices (e.g., “”, “”, “” technologies used to define the best “” in the context of the semantically related concepts of “” that can be perceived as a background voice), as well as a high collaboration between participants
Table 2 Conversation sample denoting a low inter-animation of voices as the dialogue is centered on only the “” voice that is presented in terms of “” and “” voices, as well as low collaboration due to the monologue of one participant

Text cohesion

Besides dialogism, a key element of analysis in terms of discourse structure is cohesion. Halliday and Hasan introduced the notion of cohesion as “relations of meaning that exist within the text, and that define it as a text” (Halliday and Hasan 1976, p. 4). Cohesion provides overall unity and is used to establish the underlying structure of meaning. In other words, cohesion addresses the connections in a text based on features that highlight relations between constituent elements (words, sentences, or utterances). Overall, textual cohesion can be perceived as the sum of lexical, grammatical, and semantic relations that link together textual units. High cohesion usually models a consistent information flow, whereas cohesion gaps indicate in most cases topic changes corresponding to different discussion threads or off-topic contributions (see Tables 3 and 4).

Table 3 Conversation sample denoting a lower cohesion between adjacent contributions specific to brainstorming sessions: multiple topics and intertwined discussion threads can be observed (e.g., all discussion topics are clearly highlighted as voices that pertain to multiple users: , , , )
Table 4 Conversation sample denoting a high cohesion between more elaborated contributions centered on the benefits of “” and their corresponding “

Transition toward automated computational models

To date, only a few CSCL models based on dialogism have been proposed, and even fewer approaches provide automated analytic tools. Examples include Dong’s use of Latent Semantic Analysis (LSA) of design-team communication (Dong 2005), Polyphony (Trausan-Matu et al. 2007a), the Knowledge Space Visualizer (Teplovs 2008), and PolyCAFe (Trausan-Matu and Rebedea 2010; Dascalu et al. 2011; Trausan-Matu et al. 2014). As a detailed comparison to other computational models is more suitable after providing an in-depth view of our models, the Discussion section highlights similarities and differences to three major approaches: the contingency graph (Medina and Suthers 2009; Suthers and Desiato 2012), transacts (Joshi and Rosé 2007; Rosé et al. 2008), and the Knowledge Space Visualizer (Teplovs 2008).

In this paper, we propose two computational models integrated within our framework, ReaderBench (Dascalu et al. 2013a). The first one, the dialogical voice inter-animation model described in the following section, evaluates collaboration as an intertwining or overlap of voices pertaining to different speakers (Dascalu et al. 2015a). The second approach, the social knowledge-building model (Dascalu et al. 2013a, 2015c), represents a refinement of gain-based collaboration assessment (Trausan-Matu et al. 2012b) and takes full advantage of the cohesion graph (Trausan-Matu et al. 2012a). In order to implement this model, we introduce Cohesion Network Analysis (CNA) in Section 3 as a means to score utterances and to analyze discourse structure within collaborative conversations. Both models are then used to assess the degree of collaboration between participants and to identify intense collaboration zones. Table 1 is a representative example of such an intense collaboration zone, which has, in that particular case, both a dense inter-animation of voices and a high cohesion between contributions.

As an initial comparison between our two models, collaboration is regarded within the dialogical voice inter-animation model as the intertwining or overlap of voices pertaining to different speakers, therefore enabling a transversal analysis of subsequent discussion slices. On the other hand, the social knowledge-building model based on CNA (Dascalu et al. 2013a) can be perceived as a longitudinal analysis accounting for collaboration from a social knowledge-building perspective. Afterwards, in Section 4, we validate the two computational models by comparing the predictions generated by ReaderBench with human annotations of collaborative conversations. In the end, we compare our models to other computational approaches, discuss their benefits and limitations, and conclude with future research paths. As an overview of the performed analyses, Fig. 1 presents the key concepts and methods of both computational models, as well as all of the automated indices used to predict collaboration, which are described in detail in the results section.

Fig. 1 Visual representation of collaboration assessment based on both dialogical and social knowledge-building models

From a more pragmatic perspective, this study represents an extension of the initial model (Dascalu et al. 2013a), which has now been further validated within an educational setting. Moreover, this paper represents an integrated view of dialogism (Dascalu et al. 2015a) and cohesion-based (Dascalu et al. 2015c) models, which were previously presented separately. In contrast to simpler models that rely on counting the number of utterances exchanged between different speakers or the underlying links (Mislove et al. 2007), our models support the idea that dialogism and cohesion are salient predictors of collaboration. Therefore, signatures of collaboration emerge by modeling the interactions between participants through textual cohesion and voices’ inter-animation. In addition, tutors commonly attempt to detect either breaks in conversations with limited or no collaboration, or intense collaboration zones in learners’ productions. Automated methods, such as those implemented in ReaderBench (Dascalu et al. 2013a; Dascalu 2014), provide crucial support to tutors in extracting such zones.

The polyphonic model and collaboration derived from voice inter-animation

Philosophical implications of dialogism and the polyphonic model

One of the most important ideas of CSCL is that learning can be seen as a collaborative knowledge-building process (Bereiter 2002; Scardamalia and Bereiter 2006). Small groups of students interact (Stahl 2006) and inter-animate (Trausan-Matu and Stahl 2007), rather than participate within knowledge transfer from the teacher to the learner. Moreover, if students receive tasks in their Zone of Proximal Development (ZPD) (Vygotsky 1978), the learning process may be seen as having two intertwining cycles: a personal one and a social knowledge-building one (Stahl 2006).

In order to properly introduce the polyphonic model presented in detail later on within this section, we must first present the three core and inter-dependent concepts of discourse analysis: utterances, voices and echoes. While utterances are defined as the main units of the analysis, voices may be considered to represent distinctive points of view emerging from the ongoing discussion. On the other hand, echoes represent the replication of a certain voice, the overtones and repetitions of the specific point of view that occur later on, with further implications in the discourse. Although the complexity of an utterance may vary greatly from a simple word to an entire novel (Bakhtin 1986), our analysis adheres to Dong’s perspective of separating utterances based on turn-taking events between speakers (Dong 2009). Therefore, a new point of view or contribution from a different participant may divide the discourse by potentially modifying the inner, ongoing perspective of the current speaker. At a more fine-grained level, words, seen as the constituents of utterances, provide the liaisons between utterances and deepen the perspective of others’ contributions into one’s discourse. Obviously, utterances may contain more than a single voice, as well as alien voices to which the current voice refers (Trausan-Matu and Stahl 2007). An alien voice is part of a turn uttered by a given participant that is later replicated in another one, marking therefore the transfer among different participants and their corresponding points of view with regard to the voice’s central word.

In addition, if we consider the case of CSCL using instant messenger (chat), the collaborative knowledge construction in small groups necessitates the negotiation of participants’ perspectives (Stahl 2006). Any negotiation comprises both divergences and agreements among participants’ opinions. In CSCL chats, students articulate personal beliefs (Stahl 2006), they write utterances that contain ideas mediated by words. These utterances contain each student’s personal ideas but they also contain others’ ideas. We may say that they revoice others’ utterances (Trausan-Matu et al. 2014). Following the musical metaphor introduced by Bakhtin (1981), during the chat conversations, the divergences and agreements among participants’ opinions may be seen as dissonances and consonances among voices (Trausan-Matu et al. 2007b).

The utility of the musical metaphor for CSCL may be more evident if we refer to the polyphony phenomenon, which was considered as an ideal model for collaborative sessions (Trausan-Matu 2010a). Polyphony can be described as a group of voices jointly constructing a harmonious musical piece while each voice keeps its individuality. An important aspect of polyphony is that dissonances appear and are needed for assuring novelty, but these are eventually resolved. Therefore, conflicting views, various angles, and multiple perspectives can emerge, generating a truly collaborative conversation. However, as voices express ideas and opinions, the polyphony perspective can be used to perform a deep dialogical discourse analysis by summing up multiple voices co-occurring within the same discussion thread.

Starting from the polyphony phenomenon, Trausan-Matu and colleagues introduced a polyphonic model of CSCL (Trausan-Matu et al. 2006, 2007b; Trausan-Matu 2010a). The topics of discussion in students’ CSCL chats can be seen as voices that inter-animate. Due to the specific individual features of each voice, differences appear manifested in dissonances that, for the sake of a coherent discussion, need to be resolved towards consonances, as in a polyphonic music piece. Each utterance contains both individual (inner) and alien (echoed) voices. The analysis of knowledge construction in groups should consider both these contributions. Therefore, the polyphonic model focuses on the notion of identifying voices in the analysis of discourse and building an internal graph-based representation, whether relying on the utterance graph (Trausan-Matu et al. 2007a) or the previously defined cohesion graph (Dascalu et al. 2013a). To this end, links between utterances are analyzed using repetitions, lexical and semantic chains, as well as cohesive links, and a graph is built in order to highlight discussion threads. Lexical and semantic cohesion between any two utterances can be considered the central liaison between the analysis elements within the graph.

Moreover, of particular interest is the multi-dimensionality of the polyphonic model (Trausan-Matu 2013). First, following the conversation timeline, the longitudinal dimension is reflected in the explicit or implicit references between utterances. This grants an overall image of the degree of inter-animation of voices spanning the discourse. This polyphony provides a signature for collaboration, as the quality of interactions between multiple participants in a conversation is reflected within their voices. Second, threading affords the highlighting of voices’ evolution in terms of the interaction with other discussion threads. Third, the transversal dimension is useful for observing a differential positioning of participants, when a shift of their point of interest occurs towards discussing other topics.

Finally, we must also emphasize an intrinsic problem, namely that “it is indeed impossible to be ‘completely dialogical’, if one wants to be systematic and contribute to a cumulative scientific endeavor” (Linell 2009, p. 383). The latter point of view also augments the duality between individual involvement and actual collaboration throughout a given CSCL conversation, as it is impossible to both focus on the inter-animation with other participants’ utterances and sustainably provide meaningful contributions. In the end, a balance needs to be achieved between individuals, without facing discourse domination.

Polyphonic model

Until recently, the goals of discourse analysis in existing approaches oriented towards conversation analysis were to detect topics and links (Adams and Martell 2008), dialog acts (Kontostathis et al. 2009), lexical chains (Dong 2006), or other complex relations (Rosé et al. 2008). The polyphonic model makes use of advanced NLP techniques by taking full advantage of cohesion, and integrates multiple semantic models, namely Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and semantic distances from WordNet, as well as Social Network Analysis (Wasserman and Faust 1994). Several computer-based support systems were developed for assisting the polyphonic analysis: Polyphony (Trausan-Matu et al. 2007a), PolyCAFe (Trausan-Matu et al. 2014), and ReaderBench (Dascalu et al. 2013a; Dascalu 2014), the latter being used within the current experiments.

The automated voice identification process starts by building lexical chains spanning throughout the conversation, which are afterwards merged into semantic chains by using the previously defined cohesion function (Dascalu et al. 2013b). Because the discovery of lexical chains (Galley and McKeown 2003) is limited to words with the same part of speech, the merge step is beneficial as it unites groups of concepts based on the degree of cohesion. In this context, we have proposed an iterative algorithm similar to an agglomerative hierarchical clustering algorithm (Hastie et al. 2009) for merging lexical chains (Dascalu et al. 2015a). Groups of already clustered words are merged if the cohesion among them exceeds an imposed threshold. The empirically selected values for our experiments were .75 for LSA and .85 for LDA, which best associated concepts pertaining to different lexical chains.
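The merge step described above can be illustrated with a minimal sketch. It is not the ReaderBench implementation: the function names, the average-linkage definition of cohesion between two chains, and the toy similarity table are all assumptions made for illustration. Only the threshold idea (merge while cohesion exceeds an imposed value, such as the .75 reported for LSA) comes from the text.

```python
from itertools import combinations


def chain_cohesion(chain_a, chain_b, similarity):
    """Average pairwise similarity between the words of two chains (assumed definition)."""
    scores = [similarity(a, b) for a in chain_a for b in chain_b]
    return sum(scores) / len(scores)


def merge_chains(chains, similarity, threshold):
    """Repeatedly fuse the most cohesive pair of clusters while it exceeds the threshold."""
    clusters = [set(chain) for chain in chains]
    while len(clusters) > 1:
        best_score, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            score = chain_cohesion(clusters[i], clusters[j], similarity)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None:  # no remaining pair is cohesive enough
            break
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy similarity table standing in for an LSA or LDA model (made-up values).
TOY_SCORES = {
    frozenset(("chat", "talk")): 0.9,
    frozenset(("chat", "debate")): 0.8,
    frozenset(("talk", "debate")): 0.8,
}


def toy_similarity(a, b):
    return 1.0 if a == b else TOY_SCORES.get(frozenset((a, b)), 0.1)


lexical_chains = [["chat"], ["talk"], ["debate"], ["database"]]
semantic_chains = merge_chains(lexical_chains, toy_similarity, threshold=0.75)
```

With these toy values, the three discussion-related chains end up in one semantic chain while "database" stays separate, because its cohesion with the merged group stays below the threshold.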

As semantic chains span across the discourse, the context generated by the co-occurrence or repetitions of tightly cohesive concepts is similar to the longitudinal dimension of voices. Echoes can be highlighted through cohesion based on semantic relationships between voice occurrences in different contributions, whereas attenuation is reflected in the considered distance between analytic elements. Moreover, by intertwining different semantic chains within the same textual fragment (sentence or utterance), we are able to better grasp the transversal dimension of voice inter-animation. Therefore, after manually selecting the voices of interest, the user can visualize the conversation as an overlap of co-occurring semantic chains that induce polyphony (see Fig. 2). A voice is displayed within the interface as the three most frequent semantically related word lemmas. Its occurrences throughout the conversation are marked accordingly within the overall timeframe. Different speakers who utter a particular voice are demarcated with randomly assigned colors, consistent throughout a conversation for each participant. Each utterance may incorporate more than a single voice, as it may include, in addition to the current participant’s voice, at least one other, an alien or echoed voice, re-uttered later on in the discourse after its first occurrence (Bakhtin 1981; Trausan-Matu and Stahl 2007). Overall, voices are reflected in the individual occurrences of the concepts from each semantic chain and, in return, are used to highlight the cohesive links that span throughout the discourse (Dascalu et al. 2013c).

Fig. 2 Chat voice inter-animation visualization covering participants’ voices and implicit (alien) voices

Based on the previous rules of representation, the chart from Fig. 2 follows the conversation timeline expressed in utterance identifiers and depicts the occurrences of five dominant voices, manually selected by the user for visualization purposes: a) use, application, technology; b) need, thing, want; c) chat, talk, debate; d) information, database, password; and e) forum, meeting, conference. Each of the five chat participants has a corresponding color and each voice occurrence reflects a speaker’s assigned color.

In order to better grasp the importance of each voice within the discourse, we have devised a series of indices, some inspired by ‘rhythmanalysis’ (Lefebvre 2004) and ‘polyrhythm’ (Randel 2003). First, the number of contained words within each voice is used as a purely quantitative factor. Second, the cumulative scores of the analysis elements provide a broader qualitative perspective of the importance of the context of each voice's occurrences. Third, the recurrence of voices, inspired by rhythm analysis and seen as the distance between two analysis elements in which consecutive occurrences of the voice appear, is used to reflect the spread of each voice. Moreover, in accordance with Miller’s law (Miller 1956), we have applied a moving average (Upton and Cook 2008) on the voice distribution for five data points representing consecutive utterances. In other words, we have weighted the importance of each concept occurrence over five adjacent utterances, provided that no break in the discourse is larger than an imposed, experimentally determined threshold of 1 min. Exceeding this value would clearly mark a stopping point in the overall chat conversation, making it unnecessary to expand the singular occurrence of the voice over this break. The imposed values were experimentally determined, as there were extremely few explicit links manually added by the users that exceeded these thresholds. This step of smoothing the initial discrete voice distribution plays a central role in subsequent processing, as the expanded context of a voice’s occurrence is much more significant than the sole consideration of the concept uttered by a participant in a given contribution. In this particular case, entropy (Shannon 1948) has been applied on the smoothed distribution in order to highlight irregularities of voice occurrences throughout the entire conversation.
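The smoothing, recurrence, and entropy indices above can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function names, the centered window, the equal weighting of an occurrence over its neighbours, and the use of the mean gap as recurrence are assumptions. The window of five utterances and the one-minute break come from the text.

```python
import math


def smooth_voice(counts, timestamps, window=5, max_gap_s=60.0):
    """Spread each occurrence over a centered window without crossing a long pause.

    counts[i] is the number of voice occurrences in utterance i and
    timestamps[i] is that utterance's time in seconds (hypothetical inputs).
    """
    n = len(counts)
    half = window // 2
    smoothed = [0.0] * n
    for i, count in enumerate(counts):
        if count == 0:
            continue
        lo = i  # extend left while within the window and no break is crossed
        while lo > 0 and i - lo < half and timestamps[lo] - timestamps[lo - 1] <= max_gap_s:
            lo -= 1
        hi = i  # extend right under the same conditions
        while hi < n - 1 and hi - i < half and timestamps[hi + 1] - timestamps[hi] <= max_gap_s:
            hi += 1
        for j in range(lo, hi + 1):
            smoothed[j] += count / window
    return smoothed


def recurrence(counts):
    """Mean distance, in utterances, between consecutive occurrences of a voice."""
    positions = [i for i, count in enumerate(counts) if count > 0]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def entropy(distribution):
    """Shannon entropy (natural log) of a distribution normalized to sum to 1."""
    total = sum(distribution)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in distribution if v > 0)


# Utterance times in seconds; a pause of 160 s separates utterances 4 and 5.
times = [0, 10, 20, 30, 40, 200, 210]
no_break = smooth_voice([0, 0, 5, 0, 0, 0, 0], times)
with_break = smooth_voice([0, 0, 0, 0, 5, 0, 0], times)
```

In the second call the occurrence at utterance 4 is spread only backwards, since the pause before utterance 5 exceeds the one-minute threshold.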

By considering all of the previous indices used to estimate the importance of a voice, Table 5 presents an image of their correlations when considering a conversation of approximately 420 utterances. All 75 automatically identified voices, including the ones presented in Fig. 2, are considered with the sole constraint that each voice include at least five word occurrences in order to have a quantifiable overall impact. Overall, all factors, besides recurrence, correlate positively and can be used to estimate the overall impact of a voice within the conversation. In contrast, recurrence is more specific and can be used to pinpoint whether the concepts pertaining to a voice are collocated or are more equally dispersed throughout the discourse. Nevertheless, small correlation values are acceptable as our aim was to identify meaningful factors that can be used to better characterize a voice’s importance. Further evaluations need to be performed in order to determine the most representative factors, but our aim was to identify specific measures that are generated as effects of different underlying assessment factors. For example, the use of the number of utterances in which the voices occurred or of statistics applied on the initial distribution would have been inappropriate as all of these indices would have been directly linked to the number of words within each semantic chain.

Table 5 Cross-correlation matrix between factors used to estimate the importance of voices (*p < .05; **p < .01)

As voice synergy emerges as a measure of co-occurrence of semantic chains, mutual information (Manning et al. 2008) can be used to quantify the global effect of voice overlapping between any pair of contiguous voices. Therefore, by computing the Pointwise Mutual Information (PMI) (Fano 1961) between the moving averages of all pairs of voice distributions that appear in a given context, we obtain a local degree of voice inter-weaving or overlap. To clarify the rationale for using PMI, Fig. 3 presents three progressive measures of synergy (Dascalu et al. 2013c).

Fig. 3 Evolution of voice synergy: (a) timeline evolution of voice occurrences (baseline for comparison); (b) number of co-occurrences; (c) evolution of cumulated moving average; (d) average Pointwise Mutual Information

The first and simplest estimator of overlap, the actual number of voices (co-)occurring, is misleading as we encounter a large number of singular values (meaningless, as only one voice is present) and double values, which are also not that interesting in observing the global trend. Also, the first spike with a value of 5 in Fig. 3 is locally representative, but because it is isolated from the rest of the conversation, its importance should be mediated globally. The second estimation, the cumulated moving average, is better as the smoothing effect has a positive impact on the overall evolution. Nevertheless, it is misleading in some cases; for example, a spike is obtained around utterance 400, where the overall inter-animation of voices is quite low. The third estimator, the average PMI applied on the moving averages, best grasps the synergic zones (e.g., just before utterance 60 where we have four selected voices co-occurring, as well as around 90, 110, 220, and 260 due to the overlap of all five voices). Therefore, by observing the evolution of PMI using a sliding window that follows the conversation flow, we obtain a trend in terms of voice synergy that can later be generalized to Bakhtin’s polyphony (Bakhtin 1984).
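The third estimator, the average PMI over a sliding window, can be sketched as below. The text does not spell out how probabilities are estimated from the smoothed distributions, so this sketch assumes the simplest option: within a window, a voice counts as present in an utterance when its smoothed value is positive, and PMI follows the standard definition log(p(a, b) / (p(a) p(b))). The function names, the window size, and the choice to score non-overlapping pairs as 0 are assumptions.

```python
import math
from itertools import combinations


def pmi(voice_a, voice_b):
    """PMI of two voices over one window, using presence indicators (assumed estimator)."""
    n = len(voice_a)
    p_a = sum(1 for x in voice_a if x > 0) / n
    p_b = sum(1 for y in voice_b if y > 0) / n
    p_ab = sum(1 for x, y in zip(voice_a, voice_b) if x > 0 and y > 0) / n
    if p_ab == 0:
        return 0.0  # assumption: no overlap contributes nothing instead of -infinity
    return math.log(p_ab / (p_a * p_b))


def average_pmi(voices, start, size):
    """Mean PMI over all pairs of voices inside the window [start, start + size)."""
    windows = [voice[start:start + size] for voice in voices]
    pairs = list(combinations(windows, 2))
    if not pairs:
        return 0.0
    return sum(pmi(a, b) for a, b in pairs) / len(pairs)


# Three smoothed voice distributions over six utterances (made-up values).
voices = [
    [0.2, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
]
window_size = 4
synergy = [
    average_pmi(voices, start, window_size)
    for start in range(len(voices[0]) - window_size + 1)
]
```

Sliding the window along the conversation yields one synergy value per position. Note that under this estimator two voices that are present in every utterance of a window score 0, since PMI rewards co-occurrence beyond what their individual frequencies would predict.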

We opted to present the evolution of voice synergy as our computational model uses co-occurrence and overlap of voices within a given context. In order to emphasize further the effect of inter-animation that would induce true polyphony, we envisage the use of argumentation acts and discourse patterns (Stent and Allen 2000). The latter approaches enable a deeper discourse analysis by highlighting the interdependencies between voices and how a particular voice can shed light onto another.

Dialogical voice inter-animation model

In order to achieve genuine collaboration, the conversation must contain threads of utterances integrating voices that inter-animate in a similar way to counterpoint in polyphonic musical fugues (Trausan-Matu et al. 2005; Trausan-Matu and Stahl 2007). As collaboration is centered on multiple participants, a split of each voice into multiple viewpoints pertaining to different participants is required. A viewpoint consists of a link between the concepts pertaining to a voice and a participant through their explicit use within one’s contributions in the ongoing conversation. We opted to represent this split in terms of implicit (alien) voices (Trausan-Matu and Stahl 2007) (see Fig. 4) because the accumulation of voices through transitivity in inter-linked cohesive utterances clearly highlights the presence of alien, echoed voices. In addition, this split presentation of semantic chains per participant is useful for observing each speaker’s coverage and distribution of dominant concepts throughout the conversation.

Fig. 4 Chat-conversation voice split per participant, with examples from the last occurrences highlighting the voice’s echo between different participants

Afterwards, starting from the polyphonic model, collaboration is determined as the cumulated PMI value obtained from all possible pairs of contiguous voices pertaining to different participants (different viewpoints) within subsequent contexts of the analysis. From an individual point of view, each participant’s overall collaboration is computed as the cumulated mutual information between an individual’s personal viewpoint and all other participant viewpoints. In other words, by comparing individual voice distributions that span throughout the conversation, collaboration emerges from the overlap of voices pertaining to different participants.
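A minimal sketch of this aggregation is given below, under stated assumptions. Each viewpoint is represented as a (voice, participant) pair mapped to a smoothed distribution, and the PMI estimator is a presence-based stand-in, since the exact estimator is not given in the text. All names and values are hypothetical.

```python
import math
from itertools import combinations


def pmi(a, b):
    """Presence-based PMI over one window (assumed estimator; 0 when there is no overlap)."""
    n = len(a)
    p_a = sum(1 for x in a if x > 0) / n
    p_b = sum(1 for y in b if y > 0) / n
    p_ab = sum(1 for x, y in zip(a, b) if x > 0 and y > 0) / n
    return math.log(p_ab / (p_a * p_b)) if p_ab > 0 else 0.0


def collaboration(viewpoints, start, size):
    """Cumulate PMI over all pairs of viewpoints that belong to different participants.

    Returns the overall score for the window and a per-participant breakdown.
    """
    total = 0.0
    per_participant = {}
    for (key_1, dist_1), (key_2, dist_2) in combinations(viewpoints.items(), 2):
        speaker_1, speaker_2 = key_1[1], key_2[1]
        if speaker_1 == speaker_2:
            continue  # overlap within one speaker's own voices is not collaboration
        score = pmi(dist_1[start:start + size], dist_2[start:start + size])
        total += score
        for speaker in (speaker_1, speaker_2):
            per_participant[speaker] = per_participant.get(speaker, 0.0) + score
    return total, per_participant


# Hypothetical viewpoints: (voice label, participant) -> smoothed distribution.
viewpoints = {
    ("chat", "ana"): [0.4, 0.4, 0.0, 0.0],
    ("forum", "bob"): [0.2, 0.2, 0.0, 0.0],
    ("chat", "bob"): [0.0, 0.0, 0.6, 0.6],
}
overall, by_participant = collaboration(viewpoints, start=0, size=4)
```

Here only the overlap between one participant's "chat" viewpoint and the other's "forum" viewpoint contributes; the two viewpoints of the same speaker are skipped, and viewpoints that never co-occur add nothing.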

The inter-animation frame from Fig. 5 presents the voices with the longest semantic chain span throughout the conversation. Each peak of collaboration obtained through PMI corresponds to a zone with a high transversal density of voices emitted by different speakers (e.g., around utterances with the following identifiers: 110, 136, 225, 280, or 350). Two important aspects need to be mentioned. First, because the algorithm uses the moving averages and applies PMI on sliding windows, the user must also consider a five-utterance frame in which each individual occurrence is equally dispersed. Second, all of the voices from the conversation are considered (even those that have as few as three constituent words), which explains the greater cumulative values encountered in the graph. As an example, Table 6 presents the chat sample centered on utterance 136 in which all conversation participants are engaged and multiple voices inter-animate.

Fig. 5 Collaboration evolution viewed as voice overlap between different participants (intertwining of different viewpoints)

Table 6 Conversation sample highlighting a dense inter-animation of voices pertaining to different participants (e.g., “”, “”, “” and “”)

Cohesion network analysis and the social knowledge-building model

Discourse structure and cohesion network analysis

Cohesion is a central linguistic feature of discourse (McNamara et al. 2010) and is often regarded as an indicator of its structure. More specifically, cohesion can derive from various discourse connectors including cue words or phrases (e.g., ‘but’, ‘because’), referencing expressions identified through co-reference resolution, as well as lexical and semantic similarity between concepts (Jurafsky and Martin 2009; Raghunathan et al. 2010; McNamara et al. 2014). Semantic relatedness can be determined as semantic distances in lexicalized ontologies (Budanitsky and Hirst 2006) or by using semantic models, such as LSA (Landauer and Dumais 1997) or Latent Dirichlet Allocation (LDA) (Blei et al. 2003).

Within our implemented model, cohesion is determined as the average of several semantic similarity measures between textual segments, which can be words, phrases, contributions or the entire conversation. This semantic similarity considers, on the one hand, lexical proximity, identified as semantic distances (Budanitsky and Hirst 2006) within WordNet (Miller 1995). On the other hand, semantic similarity is measured through LSA and LDA semantic models trained on the Touchstone Applied Science Associates (TASA) corpus (http://lsa.colorado.edu/spaces.html, containing approximately 13 M words) for the English version of our system used in the current experiments. Additionally, specific natural language processing (NLP) techniques (Manning and Schütze 1999) are applied to reduce noise and to improve the system’s accuracy: (a) the reduction of inflected forms to their lemmas, (b) the annotation of each word with its corresponding part of speech, and (c) stop word elimination. Finally, individual word occurrences in the term-document LSA matrix are weighted using term frequency-inverse document frequency (Tf-Idf) (Manning and Schütze 1999).

Our previous studies (Dascalu 2014) showed that Wu-Palmer ontology-based semantic similarity (Wu and Palmer 1994) combined with LSA and LDA models can be used to complement each other. Underlying semantic relationships are more likely to be identified if multiple complementary approaches are combined after normalization, reducing the errors that can be induced by using a single semantic model. To estimate cohesion using CNA, we combine information retrieval techniques (reflected by word repetition and term frequency) with semantic distance, estimated using ontologies (i.e., WordNet), LSA, and LDA. Cohesive links are defined as connections between textual elements that have high values for cohesion (i.e., a value that exceeds the mean value of all semantic similarities between constituent textual elements). In the end, a cohesion graph (Trausan-Matu et al. 2012a; Dascalu et al. 2013a), which is a generalization of the utterance graph previously proposed by Trausan-Matu et al. (2007b), is used to model all underlying cohesive links, providing a semantic, content-centered representation of discourse.

The cohesion graph is a multi-layered mixed graph consisting of three types of nodes (see Fig. 6) (Dascalu 2014). Starting from a central node, the entire conversation is split into utterance nodes (i.e., contributions per participant), which are divided into corresponding sentence nodes. Hierarchical links are enforced to reflect the inclusion of sentences into contributions, and of utterances within the entire conversation. Mandatory links are established between adjacent contributions and sentences, and are used to model information flow, making it possible to identify cohesion gaps within the discourse. In the particular case of chats, explicit links defined by users, such as those added in the ConcertChat (Holmer et al. 2006) graphical interface, are also included in the cohesion graph and are considered mandatory. Additional optional relevant links are added to the cohesion graph to highlight the semantic relatedness between distant elements. In our experiments, in order to reflect a high degree of similarity between the selected textual fragments, we opted to include only the cohesive links that have values exceeding the mean of all cohesion values by one standard deviation.

Fig. 6 Cohesion graph generic representation

In addition, due to the high number of contributions within a chat conversation, we opted to limit the search space for significant implicit cohesive links to 20 adjacent utterances. Rebedea (2012) has shown that links explicitly defined by users span a maximum of 20 utterances and are usually generated when a user feels that an implicit link is not obvious. Therefore, from a computational perspective in which the search space of similar utterances needs to be limited, we have adopted an equivalent window.
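As a rough illustration of how optional cohesive links could be selected, the sketch below examines only pairs of utterances at most 20 positions apart and keeps those whose cohesion exceeds the mean by one standard deviation. The `similarity` callable is a hypothetical stand-in for the combined WordNet, LSA, and LDA score, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def cohesive_links(similarity, n_utterances, window=20):
    """Return (i, j, value) for pairs of utterances with strong cohesion.

    similarity: callable (i, j) -> float, a stand-in for the combined
    semantic similarity score (hypothetical interface).
    Only pairs at most `window` utterances apart are examined, and a pair
    is kept when its value exceeds mean + one standard deviation.
    """
    candidates = [
        (i, j, similarity(i, j))
        for j in range(n_utterances)
        for i in range(max(0, j - window), j)
    ]
    if not candidates:
        return []
    values = [v for _, _, v in candidates]
    threshold = mean(values) + pstdev(values)
    return [(i, j, v) for i, j, v in candidates if v > threshold]
```

Restricting the inner loop to the window keeps the number of similarity evaluations linear in the number of utterances rather than quadratic.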

Cohesion-based utterance scoring

Within the CNA approach, we perform a content-centered analysis of utterances based on NLP and a cohesion-based discourse analysis. A central constituent for the evaluation process is the utterance score that reflects topics’ coverage and the strength of the relatedness of each utterance to other contributions. Our approach can be compared to a purely quantitative approach that uses only the number of contributions as a signal of collaboration. Here, we estimate an utterance’s impact from the underlying concepts’ relevance and cohesive links. Nevertheless, we cannot ignore the existing intrinsic link to the number of contributions, as more related words, even off-topic, determine the trend of the conversation.

In order to evaluate the importance of each utterance, we must first determine the value of its constituents or, more specifically, the relevance of each contained word. With regards to the process of evaluating each word’s relevance in relation to its corresponding textual fragment (e.g., sentence, utterance, or entire conversation), there are several classes of factors that play important roles in the final analysis (Dascalu et al. 2015b, c) (see Table 7).

Table 7 Factors used to measure a word’s relevance

The most straightforward factor consists of computing the statistical presence of each word. The next factor is focused on determining the semantic relatedness between a word and its corresponding textual fragment, whereas the last evaluates the semantic coverage of each concept. Semantic coverage is reflected by the length and the span of the semantic chains that contain semantically related concepts. This provides a reliable global estimate for the importance of each concept with regards to the entire conversation. Based on the previous classes of factors, the keywords of the conversation are determined as the words with the highest cumulative relevance based on their individual occurrences.

In terms of the scoring model, each utterance is initially assigned an individual score equal to the sum, over its words, of each word’s normalized term frequency multiplied by its previously determined relevance (Dascalu 2014). This score measures to what extent each utterance conveys the main concepts of the overall conversation, as an estimation of on-topic relevance. Afterwards, these individual scores are augmented through cohesive links to other inter-linked textual elements by using the previously defined cohesion values as weights. Keywords reflect the local importance of each word, whereas cohesive links are used to transpose the local relevance upon other inter-linked elements.
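The two scoring steps can be sketched as follows. The relevance values are assumed to be precomputed, term frequency is normalized by utterance length, and augmentation is read as a symmetric cohesion-weighted sum; all three choices are assumptions of this sketch rather than details confirmed by the text.

```python
from collections import Counter

def utterance_score(words, relevance):
    """Sum of normalized term frequency times word relevance.

    words: lemmatized content words of one utterance.
    relevance: dict word -> relevance (hypothetical, precomputed values).
    """
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return sum((c / total) * relevance.get(w, 0.0) for w, c in counts.items())

def augment_scores(scores, links):
    """Add to each utterance the cohesion-weighted scores of linked ones.

    scores: list of individual scores; links: iterable of (i, j, cohesion).
    The symmetric weighted sum is an assumption of this sketch.
    """
    augmented = list(scores)
    for i, j, cohesion in links:
        augmented[i] += cohesion * scores[j]
        augmented[j] += cohesion * scores[i]
    return augmented
```

For instance, an utterance made of the words chat, chat, wiki, and forum, with hypothetical relevances of 0.8 for chat and 0.4 for wiki, receives 0.5 × 0.8 + 0.25 × 0.4 = 0.5.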

Special attention is given in our approach towards utterances pertaining to the same speaker, considered as inner links, expressed as a continuation of the discourse that might potentially follow alien voices belonging to different participants. For some conversations, the importance of the links can be comparable in strength to the sum of all other out-going links, marking an individual behavior instead of collaboration, an aspect that we elaborate upon in the following section.

Social knowledge-building model

The social knowledge-building model considers both personal and social knowledge-building (KB) processes (Bereiter 2002; Scardamalia 2002; Stahl 2006). First, a personal dimension emerges by considering utterances by the same speaker, therefore modeling a kind of inner voice or continuation of the discourse. Second, inter-changed utterances with different speakers define a social perspective that models collaboration as a cumulative effect. This information exchange can also be perceived as “alien” voices that model the replication of the initial voice to different participants and their corresponding points of view with regards to the voice’s central concept.

Our model is similar to some extent to the gain-based collaboration model (Trausan-Matu et al. 2012b) and marks a transition towards Stahl’s model of collaborative knowledge building (Stahl 2006) by representing a conversation thread as our multi-layered cohesion graph. Whereas the previous section emphasized participatory analysis, our aim now shifts towards idea sharing, fostering creativity for working in groups (Trausan-Matu 2010b) and influencing the other participants’ points of view, thus enabling a truly collaborative discussion.

As presented in Fig. 7, the continuation of ideas or explicitly referencing utterances of the same speaker builds an inner dialogue or personal knowledge explicitly expressed in the discourse. In other words, personal knowledge building addresses individual voices, more specifically participant voices and/or alien voices re-uttered by the speaker. In contrast, social knowledge building, derived from explicit dialog that by definition is between at least two different individuals, sustains collaboration and highlights external voices. Moreover, by referring to the dialogic model of discourse analysis, echoes are reflected by cohesion in terms of the information transferred between utterances. In addition, the echo attenuation effect considers the distance between the contributions and diminishes the strength of the cohesion link proportionally to the increase in distance.

Fig. 7 Slice of the cohesion graph depicting inter-utterance cohesive links used to measure personal and social knowledge-building effects (Dascalu 2014)

Therefore, each contribution now has its previously defined importance score and a knowledge-building effect, both personal and social (see Fig. 7). The personal effect is initialized as the utterance’s score, whereas the social effect is zero. Later on, by considering all of the links from the cohesion graph, each dimension is correspondingly augmented. If the link is between utterances having the same speaker, the previously built knowledge (both personal and social) from the referred utterance is transferred through the cohesion function to the personal dimension of the current utterance. Otherwise, if the pair of utterances is between different participants, the social knowledge-building dimension of the currently analyzed utterance is increased by the same amount of information (previous knowledge multiplied by the cohesion measure). As such, we measure collaboration as the sum of social knowledge-building effects, starting from each utterance score corroborated with the cohesion function.
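The propagation described above can be sketched in a few lines. The data layout (parallel lists plus link tuples) and the processing order are assumptions, and the echo attenuation with distance is left out for brevity, so this is an illustration of the idea rather than the actual model.

```python
def knowledge_building(scores, speakers, links):
    """Propagate knowledge along cohesive links, in utterance order.

    scores: utterance importance scores; speakers: speaker id per utterance;
    links: (earlier, later, cohesion) tuples with earlier < later.
    Returns (personal, social) lists; collaboration is sum(social).
    """
    personal = list(scores)        # personal effect starts as the score
    social = [0.0] * len(scores)   # social effect starts at zero
    # Process links by target utterance so that the referred utterance
    # has already accumulated its own knowledge.
    for earlier, later, cohesion in sorted(links, key=lambda k: (k[1], k[0])):
        transferred = (personal[earlier] + social[earlier]) * cohesion
        if speakers[earlier] == speakers[later]:
            personal[later] += transferred   # same speaker: inner dialogue
        else:
            social[later] += transferred     # different speakers: collaboration
    return personal, social
```

Summing the returned social list gives the collaboration estimate for the whole conversation, while summing it per speaker gives a per-participant view.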

We must also consider the limitations of our implemented model in terms of personal knowledge-building analysis. Through cohesion, collaboration emerges from social knowledge transfer and is perceived as the influence of one’s contributions over other participants’ discourse. In contrast, the approximation of personal knowledge building represents an upper bound of the explicitly expressed information transfer between one’s personal contributions. Similarly to the gain-based approach (Dascalu et al. 2010; Trausan-Matu et al. 2012b), we use a quantifiable approximation of inner dialogue, without being able to evaluate the overall cognitive and inference processes performed behind the scenes by the learner. Personal knowledge building is seen as a reflection of one’s thoughts expressed explicitly within the ongoing conversation as cohesive links between utterances of the same chat participant. But this reflection does not necessarily induce personal knowledge building, only a cohesive discourse. Therefore, we can consider that the computed value of personal knowledge building is a maximum value of the explicit personal knowledge-building effect, modeled during the discourse through cohesive links.

Results

Validation experiment

Our validation experiment is focused on the assessment of 10 chat conversations, selected from a corpus of more than 100 chats that took place in an academic environment. The 10 conversations were manually selected as being the most informative ones while covering most usage scenarios: combinations of highly collaborative sections with monologues, on-topic discussions versus off-topic ones, equitable versus off-balanced involvement of participants, limited time-span versus extensive and long discussions. Within each chat, Computer Science undergraduate students from the fourth year undergoing the Human-Computer Interaction course at our university debated on the advantages and disadvantages of CSCL technologies (e.g., chat, blog, wiki, forum, or Google Wave). Each conversation involved four or five participants, with an equitable gender distribution, who previously knew each other by attending the same class. Each participant first debated on the benefits and disadvantages of a given technology, and then proposed an integrated alternative that encompassed the previously presented advantages.

Afterwards, 110 fourth-year undergraduate and master students were asked to manually annotate three chat conversations, grading the entire conversation and each participant individually on a 1–10 scale in terms of collaboration and, separately, participation. We opted to distribute the evaluation of each conversation due to the high amount of time required to manually assess a single discussion (on average, users reported 1.5 to 4 h for a deep understanding) (Trausan-Matu 2010a). Initially, for each conversation, we had on average 35 annotations, out of which raters with no variance or with a correlation lower than 0.3 in terms of intra-class correlations (ICC) with the other raters were disregarded. The weak relationships to the other raters were, in most cases, due to erroneous or superficial evaluations. In the end, we had more than 20 ratings for each conversation. This resulted in an increased Cronbach’s alpha from an average of 0.9 to a value of 0.96 (see Table 8). These high values demonstrate a very good agreement between raters and are justifiable by taking into consideration the high number of evaluations per conversation.
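For reference, Cronbach's alpha can be computed from the raw grades with the standard formula, treating each rater as an item. The sketch below assumes that every rater graded the same ordered set of targets and uses sample variances; the rater filtering step based on ICC is not reproduced.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as items.

    ratings: list of per-rater lists, each holding that rater's grades for
    the same ordered set of targets (conversation plus participants).
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    totals = [sum(column) for column in zip(*ratings)]
    rater_variance = sum(variance(r) for r in ratings)
    return k / (k - 1) * (1 - rater_variance / variance(totals))
```

Two raters who give identical grades yield an alpha of 1, and the value drops as their grades diverge.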

Table 8 Collaboration agreement among raters

Raters were specifically instructed to evaluate collaboration as the exchange of ideas with other participants, not as the active involvement throughout the conversation. Raters had previous knowledge about each debated CSCL technology, but were unaware of the dialogical implications (e.g., polyphony) or of the automated models that would be enforced. In addition, raters were asked to identify intense collaboration zones as segments from the conversation with a high degree of collaboration among participants. These non-overlapping segments determined by each rater were defined as the start and end indexes of utterances among which participants actively collaborated. We opted not to request a rating per segment as from the overlap of more than 20 evaluations, collaboration peaks would emerge.

With regards to the pre-processing phase of the chat conversation logs exported from ConcertChat (Holmer et al. 2006), all emoticons and non-dictionary words (e.g., typos) were disregarded because they are not represented in any semantic model space. Although chats are in most cases considered a noisy text-based interaction medium, in our experiments students retained an academic conduct as they were afterwards graded based on their involvement throughout the conversation. Moreover, although ConcertChat includes a second interaction space (a shared whiteboard), no corresponding information was processed because learners were instructed to use the chat facility for brainstorming, without necessarily needing the whiteboard facilities. Therefore, we were faced with only a few typos and extremely limited slang and abbreviations, which made our approach of disregarding such words adequate. Afterwards, natural language processing (NLP) techniques (Manning and Schütze 1999) were applied to improve the system’s accuracy: the reduction of inflected forms to their lemmas, part of speech tagging, and stop word elimination.

Validation of collaboration assessment

In order to have a broader analysis of collaboration, besides the two indices derived from the computational models presented in detail, we considered it adequate to introduce additional indices of collaboration. First, we introduce in-degree and out-degree as Social Network Analysis (SNA) metrics applied on the interaction graph (Dascalu et al. 2013a, 2014b). This graph models the interaction between participants based on CNA, drawing on both the cohesion graph and the utterance importance scores, as links reflect the cohesion similarity between the utterances of different participants. Second, the number of nouns is used as an estimator of the descriptive concepts expressed by each participant. Third, the number of verbs estimates each participant’s commitment towards action and involvement with other participants. The simplest quantitative index mentioned in the Introduction section (number of exchanged utterances to other participants) is not feasible in this case because there are only a few explicit links added by users. All implicit links that are used to model the discourse are identified via CNA.

Pearson correlations (see Table 9) and non-parametric correlations (Spearman’s Rho) (see Table 10) were determined between automated and human mean ratings for each conversation. As an interpretation of the results presented in Tables 9 and 10, we can observe that predictions are accurate except for four conversations in which we could identify atypical behaviors highlighted in bold. In chats 2 and 10, similar rankings of collaboration for multiple participants highlight the difficulty in differentiating between participants due to similar involvement, therefore making the evaluation more prone to error. Chat 3 is overall off-balanced due to the focus on only one technology (“blog”), which shifted the overall equilibrium with the other technologies that should have been debated. Chat 8 had specific zones in the conversation dominated by certain participants who misled the evaluation since monologue was not accordingly differentiated by raters in contrast to collaboration.
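Both coefficients can be computed directly from paired lists of automated scores and mean human ratings. The sketch below uses only the standard library; the function names are illustrative, and Spearman's Rho is obtained as the Pearson correlation of average ranks, which is the usual definition in the presence of ties.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman's Rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With only four or five participants per conversation, a single swapped rank changes Spearman's Rho considerably, which is consistent with the sensitivity noted above for chats in which participants had similar involvement.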

Table 9 Pearson correlations between indices and mean rater collaboration (*p < .05; **p < .01)
Table 10 Spearman correlations between indices and mean rater collaboration (*p < .05; **p < .01)

While there are reliable predictors of collaboration for each conversation, we must also consider that the overall evaluations are partially biased because some raters took into consideration quantitative factors to estimate collaboration (i.e., the number of utterances). Instead of focusing on the quality of the dialogue and on the way utterances pertaining to different participants inter-animate, quantity became the determinant factor for some raters.

The indices were checked for multicollinearity (see Table 11) and all of the indices except the Social KB model were considered in further analyses, as this index was highly correlated with in-degree derived from CNA. We have opted to use in-degree because it has higher individual correlations per conversation and it better grasps collaboration in terms of social involvement.

Table 11 Correlation matrix among collaboration indices (*p < .05; **p < .01)

Overall, individual chat assessments support the reliability of the proposed qualitative indices in assessing collaboration, as well as the complementarity of the implemented indices: when one is skewed due to atypical behavior, the others compensate. Moreover, since our intent was to create a unitary predictive model for evaluating all conversations, we performed the same measurements after combining all individual ratings for all conversations (see Table 12). The latter significant correlations support the adequacy of our proposed computational models. The lower values for the dialogical PMI model are justifiable, as the voice identification process requires further enhancements.

Table 12 Correlation between indices and mean rater collaboration for all conversations together (*p < .05; **p < .01)

A final stepwise regression analysis was conducted to determine the degree to which the automated indices predicted the human ratings of collaboration. This regression yielded a significant model, F(1, 45) = 46.426, p < .001, r = .713, R² = .508. One variable was a significant predictor in the regression analysis and accounted for 51% of the variance in the manual annotations of collaboration: number of verbs [β = .713, t(1, 45) = 6.814, p < .001]. This is understandable from the point of view of collaboration, as verbs induce action among participants. Moreover, regression analyses based on each collaboration model separately yielded significant models as well: for CNA in-degree, F(1, 45) = 45.960, p < .001, r = .711, R² = .51 (extremely close to the stepwise model), and for dialogical voice PMI, F(1, 45) = 24.533, p < .001, r = .594, R² = .35.
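The reported statistics are mutually consistent, which can be checked with the standard identity linking the F statistic of a regression to its coefficient of determination, R² = F·df1 / (F·df1 + df2). The helper name below is illustrative.

```python
def r_squared_from_f(f_value, df_residual, df_model=1):
    """Recover R^2 from an F statistic: R^2 = F*df1 / (F*df1 + df2)."""
    return f_value * df_model / (f_value * df_model + df_residual)

# Values reported in the text, each with 1 and 45 degrees of freedom.
for label, f_value in [("number of verbs", 46.426),
                       ("CNA in-degree", 45.960),
                       ("dialogical voice PMI", 24.533)]:
    r2 = r_squared_from_f(f_value, 45)
    print(f"{label}: R^2 = {r2:.3f}, r = {r2 ** 0.5:.3f}")
```

With a single predictor, the square root of R² equals the absolute correlation, so the three F values correspond to r ≈ .713, .711, and .594, matching the figures above.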

Validation of the identification of intense collaboration zones

In addition to the estimation of collaboration based on both previous assessment models, ReaderBench automatically identifies intense collaboration zones. These zones are defined as utterance intervals in which participants are actively involved, collaborating and generating ideas related to the ongoing context of the discussion. With regards to the social knowledge building model, these collaboration zones emerge as conversation segments with multiple cohesive links between different participants, therefore modeling the information transfer among them in a cohesive context. As a complementary view, the dense inter-animation of voices pertaining to different speakers also generates similar collaboration zones represented as voice overlap or co-occurrence.

From a computational perspective, the first step within our greedy algorithm (Dascalu et al. 2013a) that builds up intense collaboration zones consists of identifying social knowledge building or voice PMI peaks as maximum local values. Afterwards, each peak is expanded sideways within a predefined slack (experimentally set at 2.5 % of the utterances). This slack was important due to our focus on the macro-level analysis of collaboration and due to the possible intertwining of multiple discussion threads. In the end, only zones above a minimum spread of five utterances are selected as intense collaboration zones.

In other words, after identifying the utterances with the greatest collaborative effect (highest social KB score or highest voice PMI pertaining to different speakers), the algorithm expands each zone to the left and to the right in a non-overlapping manner. If, in the end, the zone covers more than the specified minimum spread, it is considered an intense collaboration zone. From a different point of view and highly related to dialogism, cohesion and voice synergy bind utterances within an intense collaboration zone in terms of topic relatedness. For example, in Fig. 5, we start with the maximum value of estimated collaboration around the utterance with ID 108 and we expand sideways, in the end obtaining the first intense collaboration zone: [87; 159]. All utterances within that interval have a high PMI score and denote voice overlap between different participants. Afterwards, the algorithm expands around the utterance with ID 375, resulting in the [311; 391] zone, as well as around the utterance with ID 274, resulting in the third most important collaboration zone: [256; 282].
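A minimal sketch of such a greedy procedure is given below. The 2.5 % slack and the minimum spread of five utterances follow the text, whereas the expansion criterion (values above the overall mean, tolerating up to `slack` consecutive lower values) and the reading of the minimum spread as "at least five" are assumptions, since the exact stopping rule is not spelled out here.

```python
def collaboration_zones(values, slack_ratio=0.025, min_spread=5):
    """Greedy sketch: grow non-overlapping zones around local maxima.

    values: per-utterance collaboration estimates (social KB or voice PMI).
    A zone grows sideways while values stay above the overall mean; up to
    `slack` consecutive lower values are tolerated (assumed criterion).
    """
    n = len(values)
    if n == 0:
        return []
    threshold = sum(values) / n
    slack = max(1, round(slack_ratio * n))
    peaks = [
        i for i in range(n)
        if values[i] > threshold
        and (i == 0 or values[i] >= values[i - 1])
        and (i == n - 1 or values[i] >= values[i + 1])
    ]
    taken = [False] * n
    zones = []
    # Highest peaks are handled first, so they claim their zone before others.
    for peak in sorted(peaks, key=lambda i: -values[i]):
        if taken[peak]:
            continue
        left = _expand(values, taken, peak, -1, threshold, slack)
        right = _expand(values, taken, peak, +1, threshold, slack)
        for i in range(left, right + 1):
            taken[i] = True
        if right - left + 1 >= min_spread:  # assumption: "at least" five
            zones.append((left, right))
    return zones

def _expand(values, taken, start, step, threshold, slack):
    """Walk from `start` in direction `step`; return the last strong index."""
    edge, misses, i = start, 0, start + step
    while 0 <= i < len(values) and not taken[i] and misses <= slack:
        if values[i] > threshold:
            edge, misses = i, 0
        else:
            misses += 1
        i += step
    return edge
```

Zones are returned in decreasing order of peak height, which mirrors the ordering of the zones in the example above.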

With regards to the validation experiment, all manual annotations were cumulated in a histogram that presented, for each utterance, the number of raters who considered it to be part of an intense collaboration zone. In the end, the same greedy algorithm was applied on this histogram in order to obtain an aggregated version. As presented in Table 13, there is good overlap in terms of accuracy measured as precision, recall, and F1 score between the annotated collaboration zones and the two computational models. This indicates that the models are consistent with one another, but are also good estimators of the annotated zones, therefore demonstrating the feasibility of our two approaches. Moreover, the manual annotation process was a subjective and bias-prone task as there were no constraints imposed in terms of the overall coverage of these zones and the raters’ perceptions of interaction among multiple participants.
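The aggregation and comparison steps can be sketched as follows. Zones are represented as inclusive (start, end) utterance indexes, and precision, recall, and F1 are computed at the utterance level; this granularity is an assumption, as the text does not state how the overlap was counted.

```python
def annotation_histogram(rater_zones, n_utterances):
    """Count, per utterance, how many raters placed it in a zone.

    rater_zones: one list of inclusive (start, end) index pairs per rater.
    """
    histogram = [0] * n_utterances
    for zones in rater_zones:
        for start, end in zones:
            for i in range(start, end + 1):
                histogram[i] += 1
    return histogram

def zone_overlap(predicted, annotated):
    """Utterance-level precision, recall and F1 between two zone lists."""
    pred = {i for s, e in predicted for i in range(s, e + 1)}
    gold = {i for s, e in annotated for i in range(s, e + 1)}
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

The histogram can be passed to the same zone-detection routine used for the automated scores, so that annotated and predicted zones are obtained in a comparable way before computing the overlap.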

Table 13 Evaluation of identification of intense collaboration zones

Discussion

Although constructed differently, both collaboration models are centered on dialogism and reflect cohesion. As voices are represented as points of view covering semantically related concepts, their recurrence reflects cohesive links within the discourse. Subsequently, the cohesive links from the cohesion graph represent the echoes of voices and model their span throughout the dialogue. Therefore, based on our results, we can consider cohesion as a binder between the utterances within an intense collaboration zone. Cohesion measures the topic relatedness between the utterances, whereas social interaction in a cohesive context determines collaboration. Moreover, the voice synergy effect between different participants captures a similar cohesive information flow in which alien voices shed light on each other. In other words, cohesion among the utterances of different speakers becomes a signature of collaboration within both models. In addition, the identified collaboration peaks and synergies built on text cohesion and voices’ inter-animation become traces of dialogism and productive polyphony.

In order to better grasp the specificity of our analysis, we must also consider a comparison to other computational models of CSCL discourse, namely the contingency graph (Medina and Suthers 2009; Suthers and Desiato 2012) and transactivity (Joshi and Rosé 2007; Rosé et al. 2008). First, the contingency graph is used as a representational foundation for abstract transcriptions and considers contingencies between events. As an analogy, our cohesion graph also considers temporal proximity while performing cohesion-centered and dialogical analyses in sliding conversation windows, as well as semantic relatedness that, in our case, is computed based on multiple semantic models.

Second, transactivity (Joshi and Rosé 2007) can be perceived as a complementary approach to our information flow. In contrast to modeling information transfer between participants through cohesion and voice inter-animation, transacts are used to represent the relationship between competing positions of different speakers similar to that of dialogue acts (Stolcke et al. 2000), but at a different semantic granularity. Therefore, we consider transacts as a potential extension of our two computational models that could be used to better reflect the synergy or juxtaposition of participants’ points of view.

In terms of automated systems, the Knowledge Space Visualizer – KSV (Teplovs 2008) might be considered to have many similarities to ReaderBench. However, while both systems envision the visualization of interactions between users through Social Network Analysis and semantic similarities, their respective approaches are fundamentally different. ReaderBench evaluates collaboration via a deep analysis of each conversation that employs multiple NLP techniques, including semantic distances, LSA and LDA. By contrast, KSV provides a more shallow perspective of individuals and links that can be structural (e.g., reply-to, build-on, reference, annotation, contains), authorial, or semantic (based only on LSA). In a nutshell, KSV was designed to provide an overview of interactions, with an emphasis on visualization, whereas ReaderBench makes use of in-depth discourse analysis.

There are also certain limitations of our models. Foremost, the models address only specific educational situations in which participants share, continue, debate, or argue certain topics or key concepts of the conversation. In other words, collaboration is particularly derived from idea sharing between participants who exchange cohesive utterances. It becomes evident that specific discourse markers or speech acts (e.g., confirmations or negations) (Austin 1962; Searle 1969) should also be considered for modeling collaboration. Moreover, as CNA and voice synergy capture cohesion through semantic similarity, additional discourse markers for identifying intertwined epistemic and argumentative moves, as well as social modes of interaction and consensus building (Weinberger and Fischer 2006) need to be considered. But for our specific educational scenario presented in the validation experiments from Section 4, cohesion and voice synergy by themselves proved to be reliable predictors. As the students debated on specific topics, both textual cohesion and voice PMI highlighting the exchange or continuation of ideas represented a reliable estimator of the generated collaborative effect.

From a different perspective, the ReaderBench framework has also been used to assess the textual complexity of texts by providing a wide range of complexity indices covering surface, lexical, syntactic and semantic levels of discourse (Dascalu et al. 2014a, 2015a). In future research, we will examine the assessment of learning and comprehension in the context of collaborative discourse using analogous indices adapted for chat conversations (characterized by short contributions). Moreover, key concepts from the ConcertChat shared whiteboard will be considered as potential measures of relatedness to the extracted keywords from the conversation.

Overall, our models should not be perceived as rigid structures, but as adaptable ones that evolve based on the cohesion to other participants’ utterances. Nevertheless, we must highlight additional limitations in terms of personal knowledge building, social knowledge transfer, noise within the experiment, and underlying cognitive processes. As an initial assumption, we consider personal knowledge building as the reflection of one’s thoughts continued into subsequent utterances through cohesive links. This is only partially valid because the written form expressed within the conversation can be substantially less representative than the processes and inferences performed in the learner’s mind. Also, with regards to the dialogism model, further refinements of the automated identification of semantic chains need to be enforced in order to exclude less relevant voices identified at present.

From a higher level perspective built on top of cohesion, coherence—used to “jointly integrate forms, meanings, and actions to make overall sense of what is said” (Schiffrin 1987, p. 39)—becomes a salient factor for collaboration. Furthermore, coherence can be considered a “semantic property of discourses, based on the interpretation of each individual sentence relative to the interpretation of other sentences” (van Dijk 1977, p. 93). Moreover, coherence can be perceived as a generalization of cohesion due to its multiple additional perspectives (e.g., reader’s skill level, background knowledge, and motivation, each helping to form the situation model) (Tapiero 2007). Based on these definitions, collaboration that emerges from cohesion or voice inter-animation among the utterances of different speakers supports discourse coherence. Therefore, collaboration becomes an additional constituent specific to CSCL conversations that is required to achieve a coherent discourse.

This does not necessarily mean that collaboration determines coherence. However, the exchange of ideas and of points of view in a cohesive and dialogical manner greatly facilitates the processes of achieving a coherent mental representation, commonly called a situation model (van Dijk and Kintsch 1983). To further argue this point, a monologue within a conversation is likely to be relatively coherent as it expresses only a participant’s perspective, but it completely lacks collaboration. On the contrary, multiple participants could be actively involved in the conversation, collaborating with one another, but on different topics and generating nested sub-conversations. The overall effect would be of discourse segmentation due to multiple concurrent discussion threads, not to mention the frequent case of off-topic or irrelevant utterances, which further reduce discourse coherence. However, these contributions might nonetheless be considered stimulants for collaboration, and ultimately, coherence.

Starting from the definition provided by Graesser et al. (2004, p. 193) that coherence is a “characteristic of the reader’s mental representation of the text content”, we further argue that, in the case of CSCL, we are dealing with a collective representation whose overall coherence is determined by the synergistic effect of each individual’s points of view or voices. Therefore, discourse coherence can be achieved collectively through collaboration and is built on cohesion, which can become an indicator of collaboration if the exchange of information is performed between different participants.

Conclusions and future research directions

Starting from a dialogic model of discourse centered on cohesion, we validated our system in terms of assessing collaboration by employing a longitudinal model based on social knowledge building and a different transversal model based on voice inter-animation. Within the social knowledge building model, collaboration was evaluated using a bottom-up approach. Initially, the importance of an utterance was measured with regard to the overall discourse in terms of topic coverage, and each contribution was assigned a corresponding score. Afterwards, collaboration was estimated as the impact on other speakers’ utterances, thereby modeling information exchange between participants. In the second, dialogical model, collaboration emerges from the co-occurrence and overlap of voices within a given context, emphasizing the tight inter-dependencies between collaboration and true polyphony.
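The bottom-up scoring described above can be illustrated with a minimal sketch. It is not the implementation used in our system: the `Utterance` class, the precomputed `importance` field, the `window` parameter, and the word-overlap `cohesion` function are all hypothetical simplifications, the last one standing in for a proper semantic cohesion measure. The sketch only shows how a cohesion-weighted effect can be split into a personal part (links to the same speaker) and a social part (links to other speakers), with the social part serving as a rough collaboration estimate.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str
    importance: float  # hypothetical topic-coverage score, assumed precomputed


def cohesion(a: str, b: str) -> float:
    """Toy stand-in for semantic cohesion: Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def knowledge_building(utterances, window=3):
    """Return one (personal, social) pair per utterance.

    Each earlier utterance inside the (hypothetical) window contributes
    importance * cohesion; the contribution counts as personal when the
    speaker is the same and as social when the speaker differs.
    """
    scores = []
    for i, current in enumerate(utterances):
        personal = social = 0.0
        for previous in utterances[max(0, i - window):i]:
            effect = previous.importance * cohesion(previous.text, current.text)
            if previous.speaker == current.speaker:
                personal += effect
            else:
                social += effect
        scores.append((personal, social))
    return scores


if __name__ == "__main__":
    chat = [
        Utterance("A", "wikis support shared editing", 1.0),
        Utterance("B", "shared editing in wikis helps teams", 1.0),
        Utterance("B", "forums are better for debate", 1.0),
    ]
    for utterance, (personal, social) in zip(chat, knowledge_building(chat)):
        print(utterance.speaker, round(personal, 3), round(social, 3))
```

In this toy conversation, only the second utterance receives a non-zero social score, because it is the only one that is cohesive with a contribution from a different speaker.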

Based on the performed analyses, we were able to extend the perspective of collaboration in terms of achieving a coherent representation of the discourse through the inter-animation of participants’ points of view. Therefore, starting from dialogism as a framework of CSCL (Koschmann 1999), we were able to model the exchange and sharing of ideas among participants in a conversation through specific computational linguistics techniques. In conclusion, as the validations supported the accuracy of the models built on dialogism, we can state that dialogism derived from the overlapping of voices, as well as textual cohesion, can be perceived as a signature for collaboration.

In addition, our analyses have a broad spectrum of applications, extending from utterance cohesion towards group cohesion rooted in collaboration. For example, one line of our research will further examine the relation between students’ collaboration in forums and their completion rates in MOOCs. We also envision the use of this dialogical perspective to assess narrative features of novels, highlighting different points of view pertaining to different characters. Furthermore, another set of experiments might focus on the assessment of students’ self-explanations, which can be perceived as a ‘dialogue’ between the author’s text and students’ thoughts viewed as echoes of the voices from the initial text. Overall, the range of potential applications for this approach is limited only by the presence of dialog in which collaboration emerges from the interactions between participants marked by textual cohesion and voice inter-animation.