1 Definitions and Scope: CSCL Methodological Practices

CSCL is an active and growing field that embraces diverse methodological practices. In this chapter, we provide an overview of research methods and methodological practices in CSCL. Research methods are the specific techniques designed to answer the questions central to CSCL; methodological practices are how those techniques are applied. These questions revolve around how technology supports and mediates collaborative learning and sensemaking (Jeong & Hmelo-Silver, 2016; Suthers, 2006). We consider research methods and practices to include the types of questions being asked, research designs, settings in which the research is conducted, sources of data, and approaches taken to analyze the data. Research methods are, at their very essence, ways to produce knowledge that a professional community considers to be legitimate (Kelly, 2006). These methods are the ways that we gather and analyze data and produce evidence to support inferences that address research questions and problems (Shavelson & Towne, 2002). We consider qualitative, quantitative, and mixed-method approaches. That said, the choice of method (and the resulting methodological practice) is often driven by the overriding theoretical commitments of the authors (see Stahl & Hakkarainen, this volume) as well as by the goals of the research.

This chapter describes the results of an examination of an expanded corpus of CSCL research from 2005 to 2014 comprising 693 articles (McKeown et al., 2017), updating our earlier work covering 400 articles from 2005 to 2009 (Jeong, Hmelo-Silver, & Yu, 2014), which we refer to as “the earlier results.” The corpus consists of systematically collected journal articles reporting CSCL research from 2005 to 2014. The earlier results included a review of both STEM and non-STEM domains, whereas the updated corpus is focused on STEM domains and on use in education programs. In the earlier results, these disciplines accounted for 78% of the papers reviewed (Jeong & Hmelo-Silver, 2012), suggesting that narrowing the focus to STEM and education disciplines will not have a substantial effect on the review presented here. To bring the review up to date, we also include newer examples, selecting empirical articles from recent ijCSCL volumes to use as running examples. These examples were selected because they studied CSCL environments using mixed methods or multiple measures as well as new analytic tools. We consider overall historical trends and the challenges in CSCL research in formal educational settings, with a particular focus on STEM and education disciplines.

In considering methodological practices in CSCL, it is important to consider the nature of the questions being asked, the research designs used, the settings in which research is conducted, the types of data collected, and the analytical tools used. Research designs indicate the plan for addressing the research question. They differ depending on whether the study goals are descriptive or explanatory. Research settings refer to the contexts in which the research is conducted, generally either classrooms or controlled laboratory settings. Data are defined as the sources and materials that are analyzed in the research. Analysis methods are the processes conducted on the data sources and include both qualitative and quantitative approaches. These are defined further in their respective sections. Within all these different aspects there is considerable diversity—a theme to which this chapter returns. CSCL research methods are unique in that they address written and spoken language as well as nonverbal aspects of interaction in examining collaborative processes and outcomes.

2 History and Development

Research methods have been an important topic of discussion in the CSCL community. This is not surprising given the complexity of CSCL environments, with their interplay of pedagogies, technology, and modes of collaboration (Kirschner & Erkens, 2013; Major, Warwick, Rasmussen, Ludvigsen, & Cook, 2018; McKeown et al., 2017). Methods have been the subject of numerous workshops at CSCL conferences over the years, which resulted in a special issue of Computers and Education in 2003 on “Documenting Collaborative Interactions: Issues and Approaches” (Puntambekar & Luckin, 2002) and another special issue on “Methodological Issues in CSCL” in 2006 (Valcke & Martens, 2006). In 2007, a special issue of Learning and Instruction focused on “Methodological Challenges in CSCL” (Strijbos & Fischer, 2007). The interest has continued in the CSCL book series, with two methods-focused volumes (Analyzing Interactions in CSCL: Methodology, Approaches, and Issues, Puntambekar, Erkens, & Hmelo-Silver, 2011; Productive Multivocality in the Analysis of Group Interactions, Suthers, Lund, Rosé, & Teplovs, 2013).

The methods of CSCL have been derived from psychology, education, cognitive science, linguistics, anthropology, and computer science. Over time, the kinds of research questions asked have become more diverse. In the earlier research, most studies focused on examining the effects of technology and instruction; although these were still a major focus in the later years of the larger corpus, researchers also increasingly addressed questions about the effects of learner characteristics and about affective outcomes. There have also been trends over time toward a smaller percentage of qualitative studies that are not connected to a well-described research method and a higher percentage of studies using inferential statistics and content analysis.

3 State of the Art: Current Methodological Practices in CSCL

3.1 Research Questions in CSCL

Research methods are designed to help answer specific types of research questions. Such questions organize research activities. CSCL researchers use research questions to determine the appropriateness of data collection and analysis methods and to evaluate the relevance and meaningfulness of results (Onwuegbuzie & Leech, 2006). Research questions can be distinguished from research problems or goals (Creswell & Creswell, 2017; Onwuegbuzie & Leech, 2006). A research problem is an issue or dilemma within the broad topic area that needs to be addressed or investigated, such as shared regulation in online learning. A research purpose or objective follows from the research problem and specifies the intent of the study, such as whether it intends to describe variable relationships, explain the causality of those relationships, or explore a phenomenon (e.g., whether to search for causes or seek a remedy). A research question is a specific statement of inquiry that the researcher seeks to investigate (e.g., whether shared regulation can be improved under certain conditions). A hypothesis, unique to quantitative research, is a formal prediction that arises from a research question (e.g., that group awareness tools can improve shared regulation).

In the corpus reviewed from 2005 to 2014, most research questions were, not surprisingly, organized around examining technology interventions (37% of studies) or instructional interventions (24%). In addition, 13% of studies looked at the effects of learner characteristics such as motivation. But looking at these general questions in isolation does not tell the whole story. Often CSCL research asks questions about learner and/or group characteristics such as gender and group cohesion, as well as about different kinds of knowledge outcomes and collaborative processes. The CSCL studies reviewed often asked multiple questions in a given study. Similarly, a review focused on classroom discourse and digital technologies found that similar themes emerged in examining the affordances of technologies and learning environments more broadly (Major et al., 2018). However, that review also found a theme focused on how digital technologies enhance dialogic activities and support knowledge co-construction. Even without the emphasis on STEM, Major et al. (2018) found similar themes, providing converging evidence for the generality of the kinds of questions asked in CSCL. A recent study exemplifies this trend toward multiple questions: Borge, Ong, and Rosé (2018) asked whether and how a pedagogical framework and technology support can affect group regulation, as well as how individual reflective scripts affect collaborative processes. The Borge et al. study also epitomizes the challenges in categorizing the kinds of research questions that CSCL researchers ask—it tests a theoretically grounded framework, but this framework is embodied in a pedagogical approach and technology that are themselves being studied.

3.2 Research Designs and Settings

Research designs refer to strategies for inquiry that allow researchers to address their questions (Creswell & Creswell, 2017). Traditionally, they are divided into three types: qualitative, quantitative, and mixed methods; within each of these, designs can be descriptive or explanatory. Descriptive designs focus on what is happening, whereas explanatory designs can investigate causal processes or mechanisms (Shavelson & Towne, 2002).

Quantitative designs tend to test objective theories by looking at the relationships among variables that can be measured (Creswell & Creswell, 2017), but they can also be used for exploratory data analysis to generate and refine theories and hypotheses (Behrens, 1997). Experimental designs describe studies in which researchers actively manipulate variables to examine causal relationships among them (e.g., whether the use of particular kinds of scaffolds increases interaction; see Janssen & Kollar, this volume). Statistical tests would be used to determine whether any difference between conditions was greater than might occur by chance. Experiments can be further classified as randomized or quasi-experimental (e.g., assignment to conditions is nonrandom, as in assigning different classes to different conditions; Shadish, Cook, & Campbell, 2002; What Works Clearinghouse, 2017). Both randomized and quasi-experimental designs allow causal inferences to be made. In contrast, pre-post designs look at change in a variable measured before an intervention and again after the intervention. In such designs, it is harder to draw causal conclusions about the effects of an intervention; in that sense, such designs, like correlational studies, are descriptive. Although descriptive quantitative research can show what happened and experimental designs can explain why, they may not be able to explain how the change occurred and/or what would happen when these variables covary simultaneously. Qualitative designs may be better suited to address such “how” questions about phenomena that emerge from a complex set of interactions.
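To make the logic of such an experimental comparison concrete, here is a minimal sketch in Python (not drawn from any study in the corpus): it compares hypothetical interaction counts between a scaffolded condition and a control condition with an independent-samples t-test, asking whether the observed difference is greater than might occur by chance. The data and condition labels are invented for illustration.

```python
# Minimal sketch of an experimental comparison: do scaffolded groups
# interact more than control groups? Counts below are hypothetical.
from scipy import stats

scaffolded = [14, 18, 11, 16, 20, 15]  # interaction counts per group
control = [9, 12, 8, 13, 10, 11]

# Independent-samples t-test across the two conditions.
t, p = stats.ttest_ind(scaffolded, control)
print(f"t = {t:.2f}, p = {p:.3f}")  # small p -> difference unlikely by chance
```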

Qualitative research designs involve emerging questions and are conducted in natural settings (Creswell & Creswell, 2017). These designs can be either descriptive or explanatory. They are interpretive and tend to involve emergent themes. Such designs include ethnographies and case studies (see also Uttamchandani & Lester, this volume; Koschmann & Schwarz, this volume). There is often a focus on the meaning that participants bring to the setting as researchers try to construct holistic accounts. Such designs may focus on explaining the how and why in their account (so they might be explanatory), or they might be more descriptive. Descriptive designs are studies aimed at providing an account of a phenomenon or intervention. Such studies seek to uncover regularities in the data without actively manipulating variables (Creswell & Creswell, 2017). Case studies, observational studies, and surveys are examples of descriptive designs. Case studies are detailed analyses of a program, event, or activity that is clearly bounded in some way. In contrast, ethnographic designs may seek to be more explanatory, showing how processes unfold as researchers study a group in a natural setting over an extended time frame. Many of these forms of research involve collecting rich forms of data such as interviews, observations, and artifacts.

Mixed methods research designs integrate qualitative and quantitative methods. An important research design for CSCL is design-based research (DBR). DBR is a research strategy in which theoretically driven CSCL designs and interventions are enacted and progressively refined over several iterations (Brown, 1992; Collins, 1992; Sandoval, 2014). DBR refers to a research framework or strategy that can transcend the design of individual iterations (e.g., Zhang, Scardamalia, Reeve, & Messina, 2009). The goal of DBR is to design learning environments that succeed in natural settings and to advance theories of learning and the design of learning environments. This involves developing theory-driven designs of learning environments and iteratively refining both theory and design. In particular, DBR aims to understand what works for whom, how, and under what circumstances (Design-Based Research Collective, 2003). Such programs of research may stretch over several years (e.g., Zhang et al., 2009). All of these research designs have philosophical underpinnings that are beyond the scope of this chapter (see Creswell & Creswell, 2017 for further details).

Of the studies examined in the corpus, descriptive (including qualitative) designs were most frequent (50%), followed by randomized experiments (25%), quasi-experiments (16%), DBR (7%), and pre-post designs (3%). These relative frequencies are generally consistent with the earlier results and demonstrate that CSCL continues to use a range of research designs. DBR is somewhat infrequent in this dataset. This likely signals difficulties and challenges associated with DBR, and yet it is also not clear when and how a program of research might be represented as several separate articles rather than as a single program of design-based CSCL research. These difficulties may be due both to publication pressures and to limits on article length.

An important characteristic of CSCL research studies is that they tend to be conducted “in the wild.” Classroom settings are formal learning situations that are guided by teachers. Laboratory settings are controlled settings where data collection is carried out outside the context of classrooms or other authentic learning situations. Other settings include CSCL outside laboratories or classrooms, such as workplaces, online communities, or informal learning environments (e.g., teacher workshops, professional conferences). In the corpus, 80% of the research was conducted in classroom settings (up from 74% in the earlier research). Of the classroom studies, 52% were descriptive, 20% were randomized experiments, 17% were quasi-experimental, and 8% were DBR. As anticipated, the DBR studies were almost exclusively set in classrooms. For example, the Zhang et al. (2009) study used Knowledge Forum and examined how different ways of forming groups affected collective knowledge building over three iterations of instruction. As part of a design-based classroom study, Looi, Chen, and Ng (2010) examined the effectiveness of Group Scribbles (GS) in two Singapore science classrooms, in lessons codesigned by teachers, and found that the GS classroom performed better than the traditional classroom on traditional assessments. In an experimental classroom study, Borge, Ong, and Rosé (2018) compared the effects of two different individual reflective scripts on group regulation in threaded discussions as part of a course on information sciences and technology. It is clear that CSCL research focuses on ecologically valid settings and is oriented toward being useful.

3.3 Data Sources and Analysis in CSCL

To make sense of the messiness of the classroom context (Kolodner, 2004), CSCL researchers use different types and sources of data. For example, text, video, and log process data can reveal CSCL learning processes (Derry et al., 2010; Strijbos & Stahl, 2007). Outcome data concern student or group performance, achievement, or other artifacts; they provide information about change in knowledge, measured in a test or through artifacts that learners construct. There are also miscellaneous data providing evidence about noncognitive and/or situational aspects of CSCL, such as questionnaires that assess students’ perceptions and motivation, interviews, or researcher field notes. In our corpus, most studies used multiple data sources.

Analysis methods refer to the ways that CSCL researchers make sense of the data. CSCL research uses a variety of data ranging from video to synchronous and asynchronous messages. Process data as well as outcome data are collected. Analysis of such a diverse range of data requires the application of quite different methodological approaches. The general categories of quantitative and qualitative analysis are often used to differentiate analysis methods, but each has several subcategories. Quantitative analyses are typically used to analyze test, survey, or questionnaire data or other data in numeric form. Code-and-count approaches, often called verbal analysis or (quantitative) content analysis, are used to quantify qualitative data such as texts or dialogues. The outcome of a code-and-count analysis can then be subjected to inferential statistics or other more advanced quantitative analysis (Chi, 1997; Jeong, 2013; Neuendorf, 2002). Simple descriptive statistics are another commonly used approach and include common data reduction methods such as frequencies or means. Inferential statistics can be used to make inferences about group differences, whereas modeling refers to more complex analytic techniques, such as multilevel analyses, that seek to explain causal relationships among different variables. One challenge in quantitative analyses is that individuals within groups are not independent of each other, which requires either using the group as the unit of analysis or multilevel modeling (Cress, 2008; Paulus & Wise, 2019). Note that the last three types of quantitative analysis are hierarchically related: modeling presumes the use of inferential statistics, which in turn presumes the use of descriptive statistics. When properly combined, they can serve as a useful tool to analyze the range of data collected in CSCL settings (see Chiu & Reimann, this volume).
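As a hedged illustration of the non-independence problem noted above, the following sketch fits a multilevel (mixed-effects) model with a random intercept for each group using the statsmodels library; the dataset, scores, and variable names are all invented for illustration, and a real analysis would involve many more groups.

```python
# Sketch of multilevel modeling for nested CSCL data: students (rows)
# are nested in groups, so an ordinary regression would violate the
# independence assumption. All data below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "score":     [72, 75, 71, 80, 83, 79, 65, 68, 70, 88, 85, 90],
    "condition": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "group":     [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# A random intercept per group absorbs group-level dependence.
model = smf.mixedlm("score ~ condition", data, groups=data["group"])
result = model.fit()
print(result.summary())
```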

Of the studies in the earlier corpus, 40% of the papers used two or more different analytic techniques. In reporting on the research, 88% used at least one quantitative analysis and 47% used at least one qualitative analysis. Of the quantitative analysis approaches used, inferential statistics were most common, followed by code and count and simple descriptive statistics. In some cases, elaborate statistical analyses were used. Hong et al. (2013) designed archaeology games around digital archives in Taiwanese museums and collected survey data from teams of high school participants after they played the game. Here, structural equation modeling was used to test a model of the effects of gameplay self-efficacy on game performance and perceived ease of gameplay. Such sophisticated modeling techniques are not common, which may be because small sample sizes are often a limiting factor in CSCL research.

Not all data can be usefully quantified with code and count. Coding and counting the frequency of certain codes, even when it can be done, may not reveal what is going on during collaboration. A deeper and more holistic analytic approach is needed to analyze data sources such as interview data or field notes, which were collected in 25% and 13% of the studies in the corpus, respectively. Within qualitative analyses, (qualitative) content analysis refers to systematic text analysis (Mayring, 2000). Conversation and discourse analysis examine conversations or discourse but can vary considerably in their approaches and techniques (e.g., Gee & Green, 1998; Uttamchandani & Lester, this volume). Grounded theory refers to qualitative analytic techniques that emphasize the discovery of theory through the systematic analysis of data. Codes, concepts, and/or categories can be formed in the process of formulating a theory, but they are interpreted quite differently from the way codes are used in quantitative analysis (Strauss & Corbin, 1990). Interaction analysis examines the details of social interaction as they occur in practice and generally relies on collaborative viewing of video (Derry et al., 2010; Jordan & Henderson, 1995). There are also several other established qualitative methods, such as narrative analysis, thematic analysis, and phenomenography. Qualitative methods are not merely about analysis but often refer to a whole approach to inquiry that prescribes the research objective, design, and data collection method as well as the analysis. The boundaries between different qualitative analyses are not always clear-cut. In many cases, the approach to qualitative analysis was what Jeong et al. (2014) referred to as loosely defined: analysis that did not appear to be linked to any specific analytic tradition.

For qualitative analyses, loosely defined qualitative analyses remained the most common, appearing in 25% of the studies (consistent with the earlier results). In addition, qualitative content analysis appeared in 12% of the papers; other well-defined techniques were each reported in less than 5% of the papers (e.g., interaction analysis, conversation analysis, and grounded theory, in decreasing order of frequency). The quality of “loosely defined” analyses was quite variable. Some studies used them to complement statistical analysis and as a tool to illustrate and explore differences that were identified quantitatively (Schwarz & Glassner, 2007). In studies that received a loosely defined code, these analyses were often verbatim examples or other excerpts from data supporting the researchers’ observations and/or conclusions (e.g., Minocha, Petre, & Roberts, 2008). Another form of loosely defined qualitative analysis was qualitative summaries of data, often supplemented with simple descriptive statistics (Menkhoff, Chay, Bengtsson, Woodard, & Gan, 2015; Rick & Guzdial, 2006). Some of the limited specificity in the description of qualitative methods may be due to limited journal space. We need to explore ways to report qualitative findings while making the rigor of the research clear.

3.4 Mixing Methods in CSCL

Consistent with prior work, the results from this corpus suggest that CSCL research mixes analytical methods and multiple data sources. Although more than half of the papers used exclusively quantitative analysis and some used only qualitative analysis, 35% used multiple analytic methods. These choices of analytic tools were related to the particular research designs, as shown in Fig. 1. Mixed methods were most often used in descriptive research designs, whereas quantitative-only methods were used mostly in experimental designs. Design-based research studies were more likely to use mixed methods than a single method. Although the scope of their study was slightly different, Major et al. (2018) also found that 46% of the studies were quantitative, but an almost equal number used mixed methods. One typical example of a mixed-methods study is that of Zhang, Tao, Chen, Sun, Judson, and Naqvi (2018), who used quantitative methods to compare the scientific sophistication of explanations and topics discussed between two different classroom interventions with the Idea Thread Mapper, a tool to help students and teachers organize and monitor their collective inquiry. Here, the inferential statistics allowed them to determine where there were differences. Qualitative methods allowed the researchers to see how the collective inquiry unfolded with the different instructional designs. Video analysis documented the temporal unfolding and elaboration of collaborative inquiry structures. In a study of CSCL in threaded discussions, Borge, Ong, and Rosé (2018) counted indicators related to regulation in the group discourse, used inferential statistics to compare across conditions, and used a qualitative case study to demonstrate how a group’s collaborative activity and regulation changed over time. In both of these examples, quantitative analysis allowed researchers to see what changes occurred, and qualitative analyses were used to examine how change unfolded.

Fig. 1 Analysis types by research design

Other ways of mixing methods do not always show up in this kind of analysis. For example, Suthers et al. (2013) brought together CSCL researchers from different methodological traditions to engage in what they called productive multivocality in the analysis of group interactions. This series of workshops demonstrated how different methods might be applied to the same datasets to illuminate new understandings that the original researcher might not have considered (Suthers et al., 2013). For example, one dataset was about peer-led team learning in chemistry; the different methods applied to it included ethnographic analysis, two different multidimensional approaches to coding and counting, and social network analysis. The application of different methods revealed tensions in how data need to be prepared for analysis as well as in how to conceptualize a given learning event. Different analytic approaches, often driven by different research questions, contributed toward constructing a richer understanding of the events. The process of resolving differences also led to deeper elaboration of the underlying collaborative mechanisms. These different approaches provided new insights on the construct of leadership in groups.

3.5 On the Relation Between Theory and Method

Although other chapters focus on theory (Stahl & Hakkarainen, this volume), this section addresses the diversity of theories and their specific relationship to research methods. Consistent with the earlier review in Jeong et al. (2014), analysis of the expanded corpus from 2005 to 2014 (McKeown et al., 2017) shows that CSCL uses diverse theoretical frameworks, including information processing, socio-cognitive, constructivist, sociocultural, communication, social psychology, motivation, and other frameworks. This aligns well with the argument in Wise and Schwarz (2017) that CSCL has multiple explanatory frameworks. Some of these differences are related to the multidisciplinary nature of CSCL (Strijbos & Fischer, 2007).

Jeong, Hmelo-Silver, and Yu (2014) identified several clusters organized around theory and method. These clusters represent patterns of theoretical perspectives and research designs. One pattern identified is that socio-cultural frameworks tended to be used in qualitative studies, contextualized in classrooms, and with descriptive research designs (e.g., Ares, 2008; Berge & Fjuk, 2006). Another pattern used general constructivist perspectives with quasi-experimental classroom studies (e.g., Dori & Belcher, 2005; Van Drie, van Boxtel, Jaspers, & Kanselaar, 2005). Other patterns were more eclectic with multiple theoretical orientations guiding either descriptive classroom designs or experimental laboratory designs.

3.6 Challenges in CSCL Research Methods

There are many challenges in conducting research in CSCL environments. The technology is only one piece of a complex CSCL system that also includes pedagogy and collaborative groups in particular contexts (Arnseth & Ludvigsen, 2006). In such environments, one challenge is identifying the appropriate unit of analysis. For studies arising from cognitive and socio-cognitive perspectives, this might be the individual nested within a group; for studies coming from a socio-cultural perspective, the group itself and the emergent dialogue might be the appropriate unit of analysis (Janssen, Cress, Erkens, & Kirschner, 2013; Ludvigsen & Arnseth, 2017; Stahl, 2006). For others, the overall activity system would be the focus of analytic interest, as in Danish’s (2014) study of young children learning about complex systems through collaborative engagement with a simulation in a carefully designed activity system. The curriculum unit was designed around four key activities, and the analyses demonstrated how these activities organized students’ collective engagement with the target concepts.

Related to the challenge of identifying appropriate units of analysis is that of segmentation and coding. For example, in CSCL when chat and threaded discussions are used, reconstructing the response structure can be a challenge as it is not always clear what participants are responding to (Strijbos & Stahl, 2007). The ability to appropriately segment units for analysis also presents challenges in terms of reliability of coding.

The multidisciplinarity of CSCL creates additional challenges for the field. The field is not as cumulative as it might be because researchers tend to ignore results from methodologies unlike their own rather than looking to triangulate across studies with different kinds of methods (Strijbos & Fischer, 2007). The diversity of methods, and even hybrid methods, leads to a lack of standards for how research results are reported, which makes the rigor of studies difficult to evaluate. This highlights the importance of documenting how methods have been adapted and combined so that other researchers can use them in the future. Developing standards for such research is what Kelly (2004) has called the argumentative grammar for the evidentiary support needed to warrant claims. An argumentative grammar is a clear and explicitly stated logic of inquiry that provides the basis for the particular methods selected and for how claims can be warranted. This remains a challenge because of the multiple theoretical frames and methodological tools used in CSCL research.

Another important challenge in CSCL, and in much learning sciences research, is incorporating situational and contextual factors into the analysis of CSCL. Arvaja, Salovaara, Häkkinen, and Järvelä (2007) combined individual- and group-level perspectives as a way to account for context, but this remains one of the tensions between qualitative and quantitative research methods. Qualitative methods tend to be excellent at providing rich descriptions of contexts but are also focused very much on the particular. Mixed methods become especially important in accounting for different aspects of cognition, learning, and the learning contexts. Nonetheless, many current approaches to accounting for context are highly labor-intensive and require many person-hours for data collection, management, and analysis, whether the data are video recordings or thousands of lines of chat, when trying to understand processes in CSCL settings.

4 The Future: Addressing Challenges and New Horizons

The corpus and literature analyzed here present recent methodological trends in CSCL, but they do not necessarily account well for certain aspects of CSCL research. In this last section, we highlight several important trends in methods for analyzing CSCL: temporality, visualizations, automation, and units of analysis. These trends address challenges related to the messy, contextualized, labor-intensive, and collaborative nature of CSCL research.

One way of dealing with the situated nature of learning in CSCL is for research methods to account for the temporal dimension. CSCL researchers have the opportunity to study learning processes that unfold over extended periods of time (Reimann, 2009). A number of researchers have argued that this dimension needs to be addressed explicitly in CSCL research (Järvelä, Malmberg, & Koivuniemi, 2016; Kapur, 2010; Reimann, 2009; Suthers, Dwyer, Medina, & Vatrapu, 2010; Zhang et al., 2018). To accomplish that, Reimann (2009) has argued for events being considered as a central unit of analysis in which entities participate. Reimann notes the importance of formalizing methods of analyzing these kinds of events. Suthers et al. (2010) argue for interactions and uptake as a way of capturing temporal relationships. Uptake is defined as “the relationship present when a participant’s coordination takes aspects of prior or ongoing events as having relevance for an ongoing activity” (Suthers et al., 2013, p. 13). This latter approach considers the importance of contingencies between events, which can include both temporal and spatial coordination across actors and media. Approaches to dealing with the temporal nature of CSCL can take highly sophisticated statistical forms (e.g., Chiu, 2018; Csanadi, Egan, Kollar, Shaffer, & Fischer, 2018; Kapur, 2010; Reimann, 2009) as well as more qualitative forms. For example, quantitative approaches such as sequential data analysis can estimate the probability that one event will be followed by another. In CSCL, these events could be different kinds of dialogue moves, such as argumentation moves (Jeong, Clark, Sampson, & Menekse, 2011). To model dynamic collaboration processes, statistical discourse analysis (SDA; Chiu, 2018) estimates the likelihood of particular discourse moves during each turn of talk or online message and the influences of explanatory factors at multiple levels (e.g., individual, group, class). SDA detects pivotal moments that substantially change an interaction and models factors affecting the likelihood of these moments (Chiu, Molenaar, Chen, Wise, & Fujita, 2013). The analyses of these temporal dimensions of CSCL can help identify events critical to idea development or failures in social regulation, which in turn can help teachers and designers determine when and perhaps how to intervene (Oshima, Oshima, & Fujita, 2018).
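A minimal sketch of the sequential-analysis idea described above, estimating how likely one coded dialogue move is to be followed by another, might look like the following; the sequence of codes is invented, and real analyses (e.g., lag-sequential analysis or SDA) add significance testing and multilevel structure beyond this toy example.

```python
# Toy sequential analysis: estimate transition probabilities between
# coded dialogue moves. The coded sequence below is hypothetical.
from collections import Counter, defaultdict

moves = ["claim", "evidence", "claim", "challenge", "evidence",
         "claim", "evidence", "challenge", "claim", "evidence"]

# Count each adjacent pair (move_t, move_t+1).
transitions = Counter(zip(moves, moves[1:]))
totals = defaultdict(int)
for (src, _), n in transitions.items():
    totals[src] += n

# Conditional probability that dst follows src.
for (src, dst), n in sorted(transitions.items()):
    print(f"P({dst} | {src}) = {n / totals[src]:.2f}")
```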

Visualizations can provide both temporal and relational information that can aid in the qualitative interpretation of complex CSCL data, going beyond coding and counting utterances (Csanadi et al., 2018; Hmelo-Silver, Liu, & Jordan, 2009; Suthers et al., 2010). This helps address the challenge of dealing with the rich contextual information found in CSCL environments. One way of integrating across different kinds of data and multidimensional coding schemes is through the use of visualizations. Visualizations take advantage of human perceptual capabilities, allowing one to view and search a large amount of information at a glance (Larkin & Simon, 1987). They can support perceptual inference and pattern recognition. When computer tools are used to create these visualizations, the representations can be manipulated, allowing an analyst to examine different parts of interactions, zooming in and out as needed (Hmelo-Silver, Liu, & Jordan, 2009; Howley, Kumar, Mayfield, Dyke, & Rosé, 2013; Huang et al., 2018). For example, Huang et al. used CORDTRA (Chronologically-Oriented Representations of Discourse and Tool-Related Activity) diagrams to analyze collaborative modeling practices in a citizen science community, showing tool-mediated interaction on a longer time scale (a working session), and then zoomed into a 5-min excerpt of interest when the modeling tool was particularly salient as a boundary object supporting collaboration. Howley et al. (2013) used TATIANA’s sliding window to look at how distributions of discourse codes changed over time. Suthers et al. (2010) constructed contingency graphs that demonstrate how interaction is distributed among participants and tools over time to support their framework for uptake analysis. More recently, Csanadi et al. (2018) introduced epistemic network analysis to visualize and quantify the temporal co-occurrence of codes. This latter approach is unique in allowing statistical comparisons across networks.
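In the spirit of the CORDTRA-style timelines described above (though not the CORDTRA tool itself), the following toy sketch plots invented discourse and tool codes against turn number so that temporal patterns become visible at a glance.

```python
# Toy timeline of coded events: each point is one coded turn, plotted by
# time (x) and code category (y). Events and codes are hypothetical.
import matplotlib.pyplot as plt

events = [(1, "question"), (2, "tool_use"), (3, "explanation"),
          (4, "tool_use"), (5, "question"), (6, "explanation"),
          (7, "tool_use"), (8, "explanation")]

codes = sorted({c for _, c in events})
y = {c: i for i, c in enumerate(codes)}  # one row per code category

plt.scatter([t for t, _ in events], [y[c] for _, c in events])
plt.yticks(range(len(codes)), codes)
plt.xlabel("Turn number")
plt.title("Discourse and tool codes over time")
plt.show()
```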

Many of the approaches described in this chapter require intensive work to code verbal discourse from observational data, online interactions, and patterns of activity from a variety of sources (Law, Yuen, Wong, & Leng, 2011). Future work in CSCL will include automated approaches to data analysis and multimodal learning analytics (see Schneider, Worsley, & Martinez-Maldonado, this volume; Wise, Knight, & Buckingham Shum, this volume). Law, Yuen, Wong, and Leng (2011) developed automated coding and visualization focused on indicators of high-quality asynchronous discussions in Knowledge Forum. Howley and Rosé (2016) used automated systemic functional linguistic analysis of discussions to illuminate social dimensions of interaction. Nistor et al. (2015) used automated dialogue analysis to identify clusters of central and peripheral participation in virtual communities of practice; moreover, they found a substantial correlation between automated and hand coding of the discourse data. In addition to making sense of discourse and computer-generated data, multimodal learning analytics can be used to make sense of CSCL processes that go beyond what can be captured by a computer (Blikstein & Worsley, 2016; Noroozi, Alikhani, Järvelä, Kirschner, Juuso, & Seppänen, 2019). Blikstein and Worsley noted that, in addition to text analysis similar to the work on automated discourse coding, there are technological capabilities for the analysis of speech, handwriting, gesture, and movement. In addition, sensors detecting different physiological markers can be used to infer affective states (e.g., Malmberg et al., 2019). Eye-tracking data can be used to track what learners are attending to and the strategies they are using. Indeed, Blikstein and Worsley (2016) argue that the most promising use of gaze data is in small groups, to track joint visual attention. Noroozi et al. (2019) note the importance of coordinating multimodal data with video to support the study of such complex phenomena as socially shared regulation. Buckingham-Shum and Ferguson (2012) argue that opportunities for designing learning analytics that involve social aspects of learning are particularly timely and challenging and need to consider ethical as well as technical issues.
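As a rough illustration of automated discourse coding in general (not any of the specific systems cited above), the following sketch trains a simple text classifier on a handful of hand-coded messages and applies it to a new message; the tiny training set and code labels are invented, and a real system would require a large hand-coded corpus and validation against human coders.

```python
# Toy automated coding: learn discourse codes from hand-coded messages,
# then predict the code for an unseen message. All data are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = ["I think the answer is X because of Y",
            "Can you explain what you mean?",
            "I agree with your point",
            "Why do you think that is true?"]
codes = ["claim", "question", "agreement", "question"]

# TF-IDF features feed a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(messages, codes)
print(clf.predict(["What makes you say that?"]))  # expected: "question"
```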

5 Conclusion

The methodological landscape of CSCL is complex and, like other areas of the learning sciences, often involves multiple methodological approaches and techniques (Major et al., 2018; Yoon & Hmelo-Silver, 2017). This chapter builds on the review of Jeong et al. (2014) with an update covering an additional 5 years of literature. The update suggests that the research methods used in the field are stable overall; however, some newer techniques have been introduced since the corpus reported here was generated. These multiple research methods in CSCL are consistent with a survey of learning scientists demonstrating that learning sciences researchers are involved in research that uses a broad range of methods (Yoon & Hmelo-Silver, 2017). It was surprising not to see design-based research featured more prominently in the sample, given that it is the signature methodology taught in many learning sciences programs (Sommerhoff, Szameitat, Vogel, Chernikova, Loderer, & Fischer, 2018). Although many challenges remain, CSCL researchers are developing new ways of mixing methods and new techniques to help deal with the messy real-world contexts that characterize research in the learning sciences more generally (e.g., Brown, 1992; Kolodner, 2004).

Although this review covers a broad range of CSCL literature, it is limited primarily to STEM and education domains and to the years covered in the systematic review. It is important to learn more about the generality of these research methods across other disciplinary contexts. This review also focuses only on research published in peer-reviewed journals, which may miss some current trends in research methods represented in CSCL conference proceedings. However, given the limited space in such proceedings, it would be challenging to extract the necessary methodological features from studies reported in those venues.

In general, CSCL research asks a range of research questions, but these are often organized around specific technologies and/or pedagogical interventions. Although most studies have a clear theoretical orientation, few studies engaged explicitly in theory development and testing. This poses challenges for building a cumulative science of CSCL. Encouraging, however, is the range of data sources and analytic tools being used to address these questions. Although CSCL research is resource-intensive to analyze and complex to understand, research methods are an active topic for reflection and inquiry within this research community. These reflective discussions acknowledge the challenges and provide impetus for the development and appropriation of new methods for understanding learning in CSCL environments.