1 Introduction

Collaboration has become increasingly vital, especially with the development of the internet. The online education paradigm has shifted towards collaboration-based knowledge building (Harasim, 2017; Scardamalia & Bereiter, 2016). Many studies have shown that well-designed online collaborative learning can improve students' critical thinking, reflection, satisfaction, and attitude (Altınay, 2017; Ku et al., 2013; Lee, 2007; Thompson & Ku, 2006). Collaborative activity promotes initiative, creativity, critical thinking, co-creation of knowledge, reflection, and transformative learning (Brookfield, 1995; Palloff & Pratt, 2010). Moreover, having a group or community encourages the exchange of dialogue and reflection (Duffy & Cunningham, 1996).

Asynchronous discussion forums provide students the opportunity to exercise their critical thinking in online learning (Beckman & Webber, 2016). Students are encouraged to apply their higher-order thinking skills in meaningful discussions. Furthermore, the asynchronous nature of discussion gives them time to construct their thoughts and arguments through reflective thinking (Garrison et al., 1999). On the other hand, as a facilitator, the teacher plays a pivotal role in directing the discussion through triggers and interventions. Therefore, the transcripts of discussion forums hold high potential for measuring learning engagement in online learning. The use of discussion forums in online learning over the years has produced big data in education. These data provide education practitioners with essential information for understanding the learning process.

Artificial intelligence is instrumental in exploring big data in education. Recently, research in educational data mining (EDM) and learning analytics (LA) has been on the rise. Aldowah et al. (2019) classified the applications in these areas into four categories: computer-supported learning analytics, computer-supported predictive analytics, computer-supported behavioral analytics, and computer-supported visualization analytics. Various techniques in EDM and LA, such as prediction, text mining, visualisation, association, clustering, and process mining, have been successfully implemented to solve educational problems (Romero & Ventura, 2020). However, Baker (2019) identified challenges in the EDM and LA areas, such as transferability, effectiveness, interpretability, applicability, and generalisability. In analysing discussion transcripts, text mining techniques play the leading role. The current study focuses on the recent development of automatic content analysis of asynchronous discussion forum transcripts.

1.1 Content analysis

Krippendorff (2018) defined content analysis as a research method for making reliable, replicable, and valid inferences from texts to the contexts of their use. This scientific method follows systematic procedures to investigate phenomena in a text-based environment using specific coding schemes. The typical phases of conventional content analysis are theory or rationale, conceptualisation, operationalisation, coding scheme, sampling, training or pilot reliability, human coding, final reliability, and reporting (Neuendorf, 2002). Furthermore, the types of units of analysis are the message, paragraph, illocution, sentence, and thematic unit (Rourke et al., 2001; Strijbos et al., 2006). Content analysis can therefore be used to analyse various types of educational data. Table 1 below shows the types of data sources used in the content analysis of educational data.

Table 1 Type of sources used in content analysis of educational data

Many previous studies conducted content analysis on various types of data sources, such as asynchronous communication (Junus et al., 2019), synchronous communication (Morais & Sampson, 2010), and text-based assessment (Chang, 2019; Poldner et al., 2012). Ferreira-Mello et al. (2019) found that discussion forums were the most-used data source in analysing text-based educational data between 2016 and 2018. Studies on content analysis of discussion forum transcripts have been used to extract meaningful information such as knowledge construction (Gunawardena et al., 1997), educational experience (Garrison et al., 1999), learning process (Henri, 1992), and critical thinking (Newman, 1995).

The conventional content analysis approach raises several issues, such as coping with big data (Garrison et al., 2006). Fahy (2001) also noted its lack of discriminant capability and reliability, caused by the complexity of the instrument or an unsuitable unit of analysis. It is also a time-consuming task. Moreover, it cannot provide real-time analysis, which is essential in learning. Manual interpretation is used extensively in the coding process, which requires experienced and trained coders to interpret the data. In addition, learning processes are dynamic and change rapidly, so instructors should monitor the discussion routinely to maintain learning engagement. Consequently, there is a need for an automated approach to evaluating texts to understand and improve learning.

1.2 Automatic content analysis

In recent years, with the development of technology, the techniques of automatic content analysis have become increasingly advanced. Text mining techniques play a significant role in analysing educational text automatically. Text mining aims to convert unstructured text into high-quality information (Berry, 2014). Ferreira-Mello et al. (2019) identified five main techniques of text mining for exploring educational data: text classification, natural language processing, information retrieval, text clustering, and summarisation.

Figure 1 illustrates the automatic content analysis of online learning discussion forums. Automatic content analysis of discussion transcripts can be used for various aspects of measurement in course resources, learner products of learning, and learner social interactions (Kovanović et al., 2017). On the teacher side, it enables monitoring, directing, and obtaining the latest information about the dynamics of discussions in forums, such as social presence, cognitive presence, participation, and engagement. On the learner side, it provides important information about the discussion, such as topics covered, summaries, and level of participation.

Fig. 1

Overview of automatic content analysis of online discussion forums. Source: Developed by the authors using online design tool Piktochart (https://piktochart.com/)

The issues that have emerged in the fields of content analysis and automatic content analysis concern quality (Fahy, 2001; Garrison et al., 2006; Rourke et al., 2001). Neuendorf (2002) defined three standards for evaluating the quality of a content analysis procedure: accuracy, precision, and reliability (see Table 2). These quality standards are essential for measuring the performance of automatic content analysis.

Table 2 Quality standards in evaluating content analysis procedures (Neuendorf, 2002)

Recently, there have been many systematic literature reviews on educational data mining and learning analytics in general (Aldowah et al., 2019; Koedinger et al., 2015; Romero & Ventura, 2020). In addition, previous studies reviewed the implementation of text mining in education (Ferreira-Mello et al., 2019; Mohammed et al., 2021). However, in recent years, no systematic literature review has explicitly focused on the automatic content analysis of asynchronous discussion forum transcripts. Hence, the current study presents a systematic literature review on the development of automatic content analysis of asynchronous discussion forum transcripts. This article investigates the research area's current state, methods, limitations, and future work between 2016 and 2021.

2 Research questions

The current study aims to explore the current state of the art of researching automatic content analysis in online learning discussion transcripts. Therefore, the following research questions (RQs) guide this study:

  1. RQ1. What kinds of measurements were used in analysing online learning discussion transcripts automatically?

  2. RQ2. What theoretical frameworks were used in analysing online learning discussion transcripts automatically?

  3. RQ3. What kind of learning contexts were used to analyse the transcripts of online learning discussions automatically?

  4. RQ4. What kind of methods were used in analysing online learning discussion transcripts automatically?

3 Methodology

According to Jesson et al. (2011), a systematic review must be defined with clearly stated purposes, research questions, and search approaches, stating inclusion and exclusion criteria to produce a qualitative appraisal of the articles. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework, developed by Page et al. (2021), was our primary guideline for the systematic literature review. The PRISMA framework provides comprehensive tools, such as checklists and flow diagrams, for reporting a transparent systematic review. Many studies have used this review guideline in e-learning research (Jayakumar et al., 2015; Moon & Park, 2021; Peixoto et al., 2021). In addition, the snowballing procedure was added to the PRISMA framework to identify additional relevant studies (Wohlin, 2014). This snowballing procedure complements the PRISMA results and minimises the risk of missing relevant studies.

3.1 Information sources

This study used five primary electronic academic and citation databases: ACM Digital Library, IEEE Xplore, ScienceDirect, Scopus, and Education Resources Information Centre (ERIC). The publication types reviewed were conference or journal articles that have gone through a peer-review process. The publication dates range from 1 January 2016 to 31 October 2021.

3.2 Eligibility criteria

In conducting the literature review, eligibility criteria were applied to select the relevant articles. These criteria are based on the context and the credibility of the literature studied. Table 3 shows the eligibility criteria used as a guide in our study.

Table 3 Eligibility criteria in the current study

3.3 Search strategies

Initially, relevant studies were searched in the five primary electronic databases using a search string. The search string consisted of three concepts related to our research objectives:

  1. The method. The first concept comprises the methods of automatic content analysis used to analyse text. The keywords are automatic content analysis, text categorisation, text classification, text mining, text segmentation, topic model, and summarisation.

  2. The source of data. The second concept is the data source used in the research. The keyword is asynchronous discussion forum.

  3. The context. The third concept is the research context. The keywords are online learning and e-learning.

Based on the three concepts above, our final search string is (‘automatic content analysis’ OR ‘text categorisation’ OR ‘text classification’ OR ‘text mining’ OR ‘text segmentation’ OR ‘topic analysis’ OR ‘summarisation’) AND (‘asynchronous discussion forum’) AND (‘online learning’ OR ‘e-learning’).
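
As a rough illustration, the Python sketch below applies the three AND-ed concept groups of this search string to screen a record locally. The keyword lists mirror the concepts above; the helper names and the example record are our own simplification and do not reflect any database's actual query engine.

import re

METHOD_TERMS = [
    "automatic content analysis", "text categorisation", "text classification",
    "text mining", "text segmentation", "topic analysis", "summarisation",
]
SOURCE_TERMS = ["asynchronous discussion forum"]
CONTEXT_TERMS = ["online learning", "e-learning"]

def matches_any(text: str, terms: list[str]) -> bool:
    """Case-insensitive whole-phrase match for any of the given terms."""
    return any(re.search(re.escape(t), text, re.IGNORECASE) for t in terms)

def keep_record(title_abstract: str) -> bool:
    """A record is kept only if all three AND-ed concept groups match."""
    return (matches_any(title_abstract, METHOD_TERMS)
            and matches_any(title_abstract, SOURCE_TERMS)
            and matches_any(title_abstract, CONTEXT_TERMS))

# Example: this record satisfies all three concept groups.
print(keep_record("Text mining of asynchronous discussion forums in e-learning"))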

3.4 Selection process

The selection process has three main stages: identification, screening, and inclusion. The PRISMA flow chart in Fig. 2 illustrates the article selection process in this study. In the identification stage, searches were conducted through five online academic database portals: Scopus, IEEE Xplore, ACM Digital Library, ScienceDirect, and ERIC. Articles that did not meet the eligibility criteria were removed.

Fig. 2

PRISMA flow chart for selection process. Source: Developed by the authors using PRISMA Framework (Page et al., 2021)

The second stage is screening. The first screening was conducted based on the title, abstract, and keywords. We then assessed the eligibility of the articles in greater depth based on their full text. From this screening stage, 48 articles met the criteria for inclusion. The last stage is the finalisation of the articles included in this study. Here, the snowballing method was used to complement the previous screening process. The snowballing method analyses the references of the articles obtained in the previous screening stage to find additional relevant articles. In the end, 54 relevant articles were selected for inclusion in our systematic literature review.

4 Findings

The selection process using the PRISMA framework and snowballing procedure resulted in 54 relevant articles. Table 4 shows that most studies were published in conference proceedings, with 32 articles; the remaining 22 are journal articles. IEEE Xplore and ACM Digital Library are the databases that store the most relevant articles, with 15 articles.

Table 4 Articles included by academic and citation databases

By year of publication, most of the articles selected in this literature study were from 2018, with 17 articles (see Fig. 3). The fewest, three articles, were published in 2017. This suggests that interest in the topic was still limited in the early years, 2016–2017.

Fig. 3

Articles included by year. Source: Developed by the authors

The articles included in this literature study come from credible academic venues that combine technology and education research. The International Conference on Learning Analytics & Knowledge contributed the most articles to this study, with six. This conference is well known in learning analytics research and is organised by the Society for Learning Analytics Research (SoLAR). Among journals, The Internet and Higher Education contributed the most, with three articles. This journal is ranked first in the field of e-learning according to SCIMAGO. This shows that the topic is interesting and widely studied by researchers in both computer and social sciences.

A keyword co-occurrence network was constructed using VOSViewer, as depicted in Fig. 4 (van Eck & Waltman, 2010); a minimal counting sketch follows Fig. 4. Nineteen related keywords were identified with a co-occurrence threshold of five. These keywords represent the main concepts of the included studies. There are three clusters in the co-occurrence network, visualised in different colours.

Fig. 4

Visualisation of keyword co-occurrence network. Source: Developed by the authors using VOSViewer (van Eck & Waltman, 2010)
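
To make the network construction concrete, the following sketch counts keyword co-occurrences in the way such networks are typically built before a threshold is applied; VOSViewer performs this internally, and the keyword sets below are invented for illustration.

from itertools import combinations
from collections import Counter

# Stand-in for the author keywords of the included studies.
article_keywords = [
    {"learning analytics", "text mining", "discussion forum"},
    {"learning analytics", "discussion forum", "machine learning"},
    {"text mining", "machine learning"},
]

pair_counts = Counter()
for keywords in article_keywords:
    # Count each unordered keyword pair once per article.
    for a, b in combinations(sorted(keywords), 2):
        pair_counts[(a, b)] += 1

THRESHOLD = 2  # the review itself used a co-occurrence threshold of five
edges = {pair: n for pair, n in pair_counts.items() if n >= THRESHOLD}
print(edges)  # {('discussion forum', 'learning analytics'): 2}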

As shown in Table 5, 54 relevant articles on the automated analysis of learning discussion transcripts were obtained from the last six years. Quantitative and qualitative analyses were conducted to examine four aspects: measurement, learning context, methods, and the theoretical frameworks or models used in these studies. Appendix Table 16 shows a summary of the selected studies in our literature review.

  1. Measurement. What is assessed from the text data of online learning discussion transcripts.

  2. Learning context. The learning context or scope of previous studies in terms of language and learning mode.

  3. Method. The methods or techniques used in related studies, in terms of their approach, algorithm, and results.

  4. Theoretical framework. The frameworks used as the theoretical background for these studies.

Table 5 Publisher of articles included

Figure 5 below illustrates the scope of our systematic literature review in investigating the automatic content analysis of online learning discussion transcripts.

Fig. 5

Automatic content analysis of online learning discussion transcripts review. Source: Developed by the authors using online design tool Piktochart (https://piktochart.com/)

4.1 Addressing RQ1. What kinds of measurements were used in analysing online learning discussion transcripts automatically?

Based on in-depth qualitative analysis, eight measurement dimensions were identified in this literature study: cognitive, social, relevance/importance, summary, pattern, behaviour, topic, and learning resources. However, these dimensions are not definitive. Hence, dependable criteria were associated with these dimensions: the objective, variables, key terms, and learning contexts of all selected studies. Table 6 below summarises the dependable criteria used in analysing the dimensions.

Table 6 Dependable criteria of the measurement dimensions

Table 7 summarises the measurements identified in the selected studies. The social dimension was the most explored during 2016–2021, with fifteen articles. This shows that social measurement in learning discussions is of great interest to researchers. According to Garrison et al. (1999), social presence is one of the main prerequisites for achieving successful learning discussions. Ferreira et al. (2020) proposed a model that can automatically measure the extent of a learner's social presence (affective, interactive, and cohesive) in a discussion. In addition, Barbosa et al. (2021) explored an automatic approach to classifying social presence in Portuguese discussion transcripts.

Table 7 Measurement of selected studies in literature review

The social aspect is also closely related to learners' level of collaboration and interaction in discussions. One way of visualising social interactions in learning discussions is through networks. Social network analysis is the most widely used technique in this literature study (Boroujeni et al., 2017; Chen et al., 2018; Gašević et al., 2019; Wise & Cui, 2018b; Wise et al., 2017a, 2017b; Xie et al., 2018). In contrast to social networks, epistemic networks visualise the relationships between different concepts used in the discussion text data (Ferreira et al., 2020; Gašević et al., 2019).
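
As an illustration of the social network analysis used in these studies, the sketch below builds a reply graph and ranks participants by degree centrality; the reply pairs are invented, and the cited studies used richer data and measures.

import networkx as nx

# Each edge points from a replier to the author being replied to.
replies = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"),
           ("carol", "alice"), ("dave", "carol")]

G = nx.DiGraph()
G.add_edges_from(replies)

# Degree centrality highlights the most connected participants.
centrality = nx.degree_centrality(G)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))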

Sentiment is another measurement in the social dimension widely studied by researchers (Fu et al., 2018; Kang et al., 2018; Liu et al., 2018; Moreno-Marcos et al., 2018, 2019). Sentiment analysis is used to measure whether messages or posts from learners are positive, negative, or neutral. This measurement is important for teachers in gauging learners' attachment, emotion, and motivation in an ongoing discussion, so that they can provide fast and appropriate feedback.
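
A minimal lexicon-based sketch of this kind of sentiment classification is shown below, using NLTK's VADER analyser; the cited studies used various tools and models, not necessarily VADER, and the example posts are invented.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

posts = ["This explanation really helped, thank you!",
         "I still don't understand the assignment at all."]

for post in posts:
    # The compound score aggregates the lexicon hits into [-1, 1].
    score = analyzer.polarity_scores(post)["compound"]
    label = "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"
    print(label, round(score, 2), post)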

The second most studied dimension is relevance or importance. Measurement of the relevance dimension aims to identify the level of relevance or importance of threads, sentences, and posts in the discussion. Given the volume of information in discussion forums, this automated analysis helps students find which threads are related to the substance of learning (Wise & Cui, 2018a, 2018b; Wise et al., 2016, 2017b, 2017a). This importance measurement can take two forms, namely sentences (Le et al., 2018) or posts (Almatrafi et al., 2018; Liu et al., 2016; Machado et al., 2019; Sun et al., 2019).

The next most explored dimension is cognitive, the dimension closest to the main goal of a learning discussion, namely acquiring new knowledge. According to the Community of Inquiry (CoI) framework (Garrison et al., 1999), the degree to which learners construct meaning through reflection and critical conversation is represented by cognitive presence. Therefore, automatic cognitive presence analysis can help teachers measure the quality of discussion quickly (Barbosa et al., 2020; Barbosa et al., 2021; Farrow et al., 2019; Hayati et al., 2020; Hayati et al., 2019; Kovanovic et al., 2016; Neto et al., 2018). Furthermore, Yang et al. (2022) used cognitive presence to measure students' level of critical thinking: low, medium, or high.

The measurement of cognitive engagement has also been extensively researched to see how engaged learners are with the activities in the discussion (Hayati et al., 2020; Wu & Wu, 2018). Cognitive engagement was measured using the ICAP framework, with interactive, constructive, active, and passive modes of behaviour (Chi & Wylie, 2014). Cognitive processing was also investigated based on analytical thinking, influence/clout, and authenticity (Moore et al., 2019). In addition, Huang et al. (2019) proposed an automatic analysis method to distinguish students' confident from unconfident sentences.

Topic measurement is also one of the trends in discussion transcript analysis research (Atapattu et al., 2016; Brinton et al., 2018; Liu et al., 2018; Peng et al., 2020, 2021; Rolim et al., 2019; Wong et al., 2016; Yang et al., 2018; Zarra et al., 2018). Extracting and modelling topics helps reveal what is being discussed in the forum. This measurement is also often combined with others, such as summarisation. Gottipati et al. (2019) proposed a method of summarising the content of a discussion based on a topic model. Unfortunately, however, there has been very little research on auto-summarising the content of discussions in recent years.

Some of the measurements analysed in the learner behaviour dimension are speech acts, speaking behaviour, leadership, and questions or answers. Speech act analysis aims to categorise discussion messages into seeking information or providing information (Boroujeni et al., 2017; Hecking et al., 2016; Joksimovic et al., 2020). This analysis is similar to identifying whether a message is a question or an answer (Brinton et al., 2018; Irish et al., 2020; Rolim et al., 2019). Chen et al. (2018) also analysed lexical features to identify speaking behaviour in a message. Xie et al. (2018) identified learners who become leaders based on the messages they convey in the discussion. Bosch et al. (2020) also proposed a solution for hiding confidential information in discussions with an automated classification process.

Pattern measurement is also an interesting aspect to study. This measurement is closely related to prediction. Wong and Li (2016) analysed patterns in discussions with chance discovery analysis to predict students' academic performance. However, research on pattern prediction with the concept of serendipity learning is still limited. In addition, An et al. (2019) proposed a method to link text in discussions with resources in online learning.

The eight measurement dimensions above show the research trends in automatic content analysis of asynchronous discussion forum transcripts, and they indicate different research foci. Fifty-nine percent of the studies focus on helping teachers understand learning processes in the discussion forum. These learning processes are measured through the social, cognitive, and behaviour dimensions, which are vital in constructing knowledge and improving critical thinking through meaningful discussions. Teachers should obtain this information to provide appropriate feedback. These measurements are generally more difficult to perform than others due to their implicit nature. Table 8 below shows the trend of studies.

Table 8 The trend of studies based on the measurement dimensions

Ten studies aim to extract the primary information from a discussion to help students learn better from it. Automatic topic extraction and summary measurements help students understand discussions quickly. Most of these studies also developed supporting tools to detect and visualise the topics and summaries. The other studies focus on helping participants conduct discussions better through relevance and learning resources measurements, which help participants organise and select related items in discussions such as posts, threads, and sentences. In addition, one study focuses on predicting students' performance through pattern measurement.

4.2 Addressing RQ2. What theoretical frameworks were used in analysing online learning discussion transcripts automatically?

The theoretical framework is fundamental in content analysis research. Analysis of online learning discussion transcripts certainly requires guidance, especially in classifying or categorising messages, threads, or sentences. In the current study, several criteria were used for including theoretical frameworks. A theoretical framework must be equipped with a coding scheme or basic guideline for analysing text. It must also be reliable and valid, as shown by its impact; therefore, the number of citations of the original paper should be high. Six theoretical frameworks were identified: Community of Inquiry (Garrison et al., 1999), Post Classification (Stump et al., 2013), Speech Acts (Arguello & Shaffer, 2015), ICAP (Chi & Wylie, 2014), Speaking Behaviour (Wise et al., 2014), and Bloom’s Taxonomy Revised Version (Anderson et al., 2001). However, only 41% of the studies used these theoretical frameworks. Table 9 shows which frameworks have been used in recent years.

Table 9 Theoretical frameworks of selected studies in literature review

As Table 9 shows, the CoI framework is the most-used framework for content analysis in a text-based learning environment. This framework consists of three prerequisite elements for achieving successful discussion: cognitive, social, and teaching presences. Six studies explored automated cognitive presence measurement approaches (Barbosa et al., 2020; Farrow et al., 2019; Hayati et al., 2019; Hayati et al., 2020; Kovanovic et al., 2017; Neto et al., 2018). Ferreira et al. (2020) also proposed a method for analysing social presence. However, over the last six years, no one has researched an automated approach to measuring teaching presence.

Another popular theoretical framework used in analysing cognitive engagement is the ICAP framework (Chi & Wylie, 2014). The ICAP framework has four modes, namely Interactive, Constructive, Active, and Passive. These four modes show the learner's behaviour or cognitive engagement in a learning discussion. Wu and Wu (2018) analysed cognitive behaviour and used it to measure predicted learning gain. In addition, Hayati et al. (2020) also investigated learners' cognitive engagement and social interaction in discussion.

On the other hand, Stump’s framework has been used by five studies (Stump et al., 2013). This framework has three main categories of messages, namely seeking information, providing information, and others. All five studies were by the same first author (Wise & Cui, 2018a, 2018b; Wise et al., 2016, 2017b, 2017a). The Speech Acts framework is also used to identify questions, answers, issues, issue resolutions, positive acknowledgements, and negative acknowledgements in a message (Arguello & Shaffer, 2015). Hecking et al. (2016) used it to examine the level of information exchange in discussions. Boroujeni et al. (2017) used this framework to predict discussion activity from the content dimension. In addition, this framework was also used to analyse the dynamics of discussions (Joksimovic et al., 2020).

Speaking behaviour in discussions has also been used as a basis for analysis research. Wise et al. (2014) proposed a collaborative learning framework for exploring listening (message access) and speaking (message writing) activities in online discussions. Chen et al. (2018) combined this framework with social network analysis to generate insights for discussion participants. In addition, Bloom’s taxonomy was also used to measure cognitive skills in discussions. Cheng et al. (2021) proposed automatic evaluation of cognitive level based on a deep learning network.

4.3 Addressing RQ3. What kind of learning contexts were used to analyse the transcripts of online learning discussions automatically?

The learning context is very influential in educational research. The results of research on online learning discussions cannot be separated from the type of learning, its nature, and the language used in the dataset. Each language has different vocabulary, rules, and lexical features. In recent years, seven languages have been used in the selected studies (see Table 10).

Table 10 Languages of dataset in selected studies

English has been the most studied language in the last six years, with 42 studies. Recent developments indicate that text mining of English text is more advanced than that of other languages. Many English datasets, dictionaries, tools, and libraries, such as LIWC, Stanford CoreNLP, Coh-Metrix, and SpaCy, help researchers study discussion transcripts. The general approach to analysing text is to use features or dictionaries in the same language. However, there is a specific approach that utilises other languages to analyse discussion transcripts, using datasets in multiple languages in the experiment. Table 11 below shows the studies that used this cross-language approach.

Table 11 Cross-language studies

Three studies proposed cross-language-based approaches. Feng et al. (2018) used several language features to categorise threads in English, French, Chinese, and Spanish. Barbosa et al. (2020) used a model trained on English data to classify transcripts of Portuguese discussions. In addition, Barbosa et al. (2021) proposed an automatic text translation method on English and Portuguese datasets to identify social and cognitive presences. In their experiment, performance improved when using English translations (Portuguese translated to English), whereas it decreased with Portuguese translations. This shows that English has more significant linguistic features than other languages for classifying messages in discussion transcripts.

In terms of the types of online learning, we found two types examined in this literature study: fully online and blended learning. However, one more category was added because fourteen articles do not clearly describe their type of online learning. Fully online learning is the most widely studied, possibly because this type of learning results in extensive discussion data; generally, more data improves the performance of an algorithm. Fully online learning has two educational-level contexts: across education levels and higher education. There is also one study in a high school setting. On the other hand, there has been little research on automatic content analysis in blended learning, which mainly uses online discussion as a supporting tool. This is quite different from fully online learning, where discussion forums are one of the primary communication tools. Table 12 shows the types of online learning of the selected studies in this literature review.

Table 12 Type of online learning of selected studies in literature review

From a knowledge area perspective, there are two categories of datasets: science, technology, engineering, and mathematics (STEM), and humanities, arts, and social sciences (HASS). These terms are common knowledge areas broadly used in the educational curricula of many countries (Gonzalez & Kuenzi, 2012; Turner & Brass, 2014). These knowledge areas should be acknowledged when conducting automatic content analysis, as they shape the nature of the data. Table 13 compares the knowledge areas used in the datasets.

Table 13 Knowledge areas of courses used as datasets

From the data in Table 13, it is apparent that 59% of the studies used single-disciplinary datasets, drawing only on either STEM or HASS courses. STEM courses have been used more than HASS courses, with 27 studies. The STEM datasets cover programming, machine learning, software engineering, statistics, medicine, information systems, artificial intelligence, logic, technology, and discrete mathematics, while the HASS datasets cover psychology, history, law, general education, and English literature. These single-disciplinary studies also acknowledged limitations regarding generalisability.

Among multidisciplinary studies, fifteen used both STEM and HASS courses as datasets in their experiments. These studies focused on generalisability issues in analysing discussion transcripts across different knowledge areas. Wise et al. (2017a, 2017b) proposed a model to detect content-related threads on cross-domain datasets from statistics and psychology courses. However, the generalisability experiments showed poor results. Similar unsatisfactory results were obtained in other studies. Almatrafi et al. (2018) used medicine and education courses as datasets for identifying urgent posts. Ntourmas et al. (2019) also reported unacceptable results when testing their model on Python programming and history courses.

These unsatisfactory results could be due to differences in distribution and the unbalanced nature of the datasets (Neto et al., 2021). Farrow et al. (2019) emphasised rebalancing classes and avoiding data contamination to improve generalisability. These findings indicate that generalisability is the main challenge of automatic content analysis and needs further investigation, especially in cross-domain contexts. In addition, there are significant differences between STEM and HASS courses (Bates, 2015). Hence, researchers need to define this context in their content analysis studies.

4.4 Addressing RQ4. What kind of methods were used in analysing online learning discussion transcripts automatically?

In recent years, previous studies used machine learning, rule-based, text-statistics, and co-occurrence methods to analyse online learning discussion transcripts automatically. Ninety-three percent of the studies used machine learning methods; it can be concluded that machine learning still plays the leading role in this research area. Table 14 below summarises the methods used in the selected studies.

Table 14 The methods of selected studies in literature review

There are two approaches to the automated analysis of text data in online learning discussions, namely supervised and unsupervised. The supervised approach uses pre-labelled datasets, so training data needs to be prepared by assigning labels or classes in advance. This data labelling process can be carried out manually, based on the assessment of several coders, or automatically. A supervised machine learning method trains on the provided dataset, from which the algorithm learns predictive features. The model is then evaluated by validating and testing on different datasets. Thirty-five studies explored supervised machine learning methods. The common problems in the supervised approach are classification and regression modelling. In addition, five studies used deep learning methods. Deep learning imitates the workings of the human brain with artificial neural networks and is generally used to solve complex problems with extensive data. However, these studies are relatively new, published between 2019 and 2021.

Unsupervised machine learning is used to solve clustering problems. This approach analyses text that has no class or label. The algorithm measures the level of similarity or difference to group data, such as extracting the topics discussed in a discussion. Topic modelling is a research problem that has been widely studied in recent years, with fourteen studies. Automatic topic extraction is beneficial for both students and teachers in seeing the main conversation in the discussion. The most widely used algorithm for topic modelling in this literature study is Latent Dirichlet Allocation (LDA), with twelve studies.
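
For illustration, the sketch below runs LDA over a toy corpus with scikit-learn; the posts and parameter values are invented, and real studies tune the number of topics and preprocessing carefully.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = ["the loop never terminates in my code",
         "my for loop prints the wrong output",
         "when is the final exam scheduled",
         "is the exam open book or closed book"]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Show the top words per topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")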

In general, topic modelling methods were used to obtain the main topics of discussions. Recently, these methods were implemented to develop tools for detecting and visualising topics (Atapattu et al., 2016; Wong et al., 2016; Zarra et al., 2018). These methods were also combined with other methods to achieve different objectives. Brinton et al. (2018) investigated whether a student asks or answers a question using the topic modelling approach. Rolim et al. (2019) identified students' weaknesses and strengths using LDA and SVM. Gottipati et al. (2019) developed a tool to generate a summary based on the topics of online learning discussion transcripts. Gašević et al. (2019) presented a method to analyse collaborative learning using a topic model, social network analysis, and epistemic network analysis. Three studies proposed models to understand forums using time, emotion, and topic features (Liu et al., 2018; Peng et al., 2020, 2021). We predict that this method-combining approach will be essential in the future, as topic modelling methods can be used from other perspectives. The trend shows that LDA is a fundamental method in topic modelling with the potential for further development; three studies developed new methods based on LDA (Liu et al., 2018; Peng et al., 2020, 2021).

Random forest is also predicted to be essential in the future. This algorithm is the most popular for solving text classification problems, with thirteen studies. Random forest is an ensemble of random decision trees; it is relatively fast to train compared with other algorithms (Kowsari et al., 2019) and also reduces variance. The algorithm has proven to perform well in analysing discussion transcripts (Barbosa et al., 2021; Farrow et al., 2019; Ferreira et al., 2020; Kovanovic et al., 2017; Neto et al., 2021). However, special attention is needed when selecting the number of decision trees to avoid overfitting, as too many decision trees increase complexity.
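
A minimal sketch of random-forest message classification with tf-idf features is shown below; the tiny labelled sample and the question/answer label set are invented for illustration and are far smaller than any real training set.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

messages = ["How do I submit the assignment?",
            "You can submit it through the course portal.",
            "What does this error message mean?",
            "It means the variable was never initialised."]
labels = ["question", "answer", "question", "answer"]

# Vectorise the messages and train an ensemble of decision trees.
model = make_pipeline(TfidfVectorizer(),
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(messages, labels)

print(model.predict(["Where can I find the lecture slides?"]))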

Dictionary or rule-based methods differ conceptually from machine learning and deep learning. These methods use a previously formulated guide, namely a dictionary or a set of rules. Sixteen studies used this approach with existing dictionaries such as LIWC, Coh-Metrix, and NLPIR. The dictionary-based method is becoming increasingly attractive for the future because it offers advantages in processing speed and collaborative dictionary development (Scharkow, 2017). Combining machine learning with dictionary or rule-based methods is also a promising approach to examining texts in discussion transcripts.
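
The sketch below illustrates the dictionary-based idea with two hand-made word categories; these categories are invented stand-ins, not actual LIWC categories, and real dictionaries are far larger and validated.

# Invented mini-dictionary; each category maps to a set of cue words.
DICTIONARY = {
    "affective": {"happy", "glad", "worried", "frustrated", "thank"},
    "cognitive": {"think", "because", "therefore", "understand", "reason"},
}

def dictionary_features(post: str) -> dict[str, float]:
    """Return the proportion of tokens hitting each dictionary category."""
    tokens = post.lower().split()
    total = max(len(tokens), 1)
    return {cat: sum(t.strip(".,!?") in words for t in tokens) / total
            for cat, words in DICTIONARY.items()}

print(dictionary_features("I think I understand now, thank you!"))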

Other methods used are text statistics and co-occurrence analysis, the earliest methods in automatic content analysis. Text statistics uses statistical measurements to analyse discussion transcripts automatically. Irish et al. (2020) used term frequency-inverse document frequency (tf-idf) to find similar questions in discussion forums. Co-occurrence analysis is an extension of the text statistics method; it uses the co-occurrence of specific features in the discussion transcripts, such as word frequency, semantics, and time, to construct associations. Fu et al. (2018) used co-occurrence analysis to group users in discussion forums with dense-subgraph visualisation.
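
In the spirit of Irish et al. (2020), the sketch below uses tf-idf vectors and cosine similarity to retrieve the most similar existing question; the question bank is invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

questions = ["How do I reset my password?",
             "When is assignment two due?",
             "What is the deadline for the second assignment?"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(questions)

# Vectorise a new question and compare it against the existing bank.
query = vectorizer.transform(["assignment two deadline?"])
scores = cosine_similarity(query, matrix)[0]
print(questions[scores.argmax()])  # most similar existing question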

As the discussion above shows, each method offers a different approach and characteristics. We suggest combining machine learning and rule-based methods. Eleven studies showed that this combined method yields good performance (Barbosa et al., 2020; Barbosa et al., 2021; Farrow et al., 2019; Ferreira et al., 2020; Kovanovic et al., 2017; Liu et al., 2018; Moreno-Marcos et al., 2018; Moreno-Marcos et al., 2019; Neto et al., 2018; Peng et al., 2020, 2021). The experiments show that dictionary features are also good predictors. Moreover, researchers should implement more than one classifier in their experiments to obtain the best model based on the evaluation results.

The latest approaches in natural language processing, such as BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020), will become more significant in the future. These fine-tuning approaches use pre-trained language models with hyper-parameters for understanding text. Zou et al. (2021) showed that BERT produced promising results, with accuracy, precision, and recall above 0.80. However, no study has used GPT-3 to explore discussion transcripts automatically. It will be fascinating to study how far these latest algorithms will develop in this research area.
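
As an illustration of applying a pre-trained transformer to forum posts, the sketch below uses the Hugging Face pipeline API with a generic sentiment checkpoint; this is not the model fine-tuned by Zou et al. (2021), and the posts are invented.

from transformers import pipeline

# Load a generic pre-trained sentiment classifier as a stand-in for a model
# fine-tuned on discussion transcripts.
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

posts = ["This course is fantastic!", "I am completely lost in this module."]
for result, post in zip(classifier(posts), posts):
    print(result["label"], round(result["score"], 3), post)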

All of these studies are experimental. In machine learning, the standard evaluation measurements are the kappa statistic, accuracy, precision, recall, and F-measure. These also represent the quality of automatic content analysis. The experiments' results in Table 15 show that the performance of the previous studies is promising. Nineteen studies reported high performance, with an accuracy level above 80%; good performance was also shown in the other evaluation metrics: precision, recall, and F-measure. However, no studies reported kappa statistics above 80%. Seventeen studies reported kappa statistics ranging from moderate to substantial levels of agreement. The kappa statistic is used to measure the reliability of content analysis (Landis & Koch, 1977). These results indicate an excellent opportunity for future research to improve reliability.

Table 15 The experiments’ results in selected studies

It should be noted that most studies used different datasets and learning contexts, so their results cannot be compared directly and should be interpreted in the context of their datasets. The details of the experiments' results can be seen in Appendix Table 16.
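
For reference, the standard evaluation measurements discussed above can be computed as in the sketch below; the gold and predicted labels are invented for illustration.

from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_recall_fscore_support)

gold = ["social", "cognitive", "social", "other", "cognitive", "social"]
pred = ["social", "cognitive", "other", "other", "social", "social"]

accuracy = accuracy_score(gold, pred)
kappa = cohen_kappa_score(gold, pred)  # chance-corrected agreement
precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.2f} kappa={kappa:.2f} "
      f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")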

5 Discussions, Limitations and Research Directions

The systematic literature review yielded insightful findings for future research in analysing discussion transcripts automatically. Eight dimensions of measurement were found in previous studies: cognitive, social, relevance/importance, summary, pattern, behaviour, topic, and learning resources. The top three most widely studied dimensions are social, relevance/importance, and cognitive. Theoretical frameworks guide researchers in analysing these dimensions. Six theoretical frameworks were identified, namely Community of Inquiry (Garrison et al., 1999), Post Classification (Stump et al., 2013), Speech Acts (Arguello & Shaffer, 2015), ICAP (Chi & Wylie, 2014), Speaking Behaviour (Wise et al., 2014), and Bloom’s Taxonomy Revised Version (Anderson et al., 2001). The CoI framework was studied the most, in ten studies. This suggests that the CoI framework can serve as an excellent comprehensive guide for measuring educational experiences in a text-based environment.

The context of learning is vital in educational research. This study found that most studies examine fully online learning, whether across education levels (MOOCs) or at the higher education level. Fully online learning, which can generate much data, is advantageous for researchers. However, blended learning should also receive special attention in future research. The selected studies used seven languages: English, Chinese, Portuguese, Indonesian, French, Spanish, and Greek. The majority of studies (78%) explored English, which opens opportunities for researchers to delve deeper into other languages. Furthermore, the advanced state of research on English allows it to be combined with other languages through a cross-language approach.

Two approaches are used in research on the automatic analysis of learning discussion transcripts, namely supervised and unsupervised. The supervised approach aims to solve classification (thread, message, and sentence) and prediction problems in discussions, whereas the unsupervised approach is used for modelling topics and message associations in online learning discussions. In our selected studies, machine learning is the favoured method for evaluating texts in discussion forums, in both supervised and unsupervised tasks. Out of 54 studies, 38 used machine learning as their primary method. In addition, nine studies combined machine learning with dictionary methods. This signifies that the machine learning approach remains a trend for future research.

The literature review presented in this article has some limitations. The selection criteria are specific: this article does not include studies that do not align with the research objectives. Moreover, the current study only investigates publications from the last six years (2016–2021). Despite the promising findings, the challenges in this research area need to be acknowledged. Educational contexts play an essential role in learning, and generalisability and reliability are still the main issues. The selected studies show that generalisability experiments in cross-domain contexts yielded unsatisfactory performance. On the other hand, reliability has the potential to improve in the future. In addition, most of the studies in this research area are still in the method phase; practical implementations exist only for the topic, social, and summary measurements. For the other measurements, there is a lack of evidence of the impact of automated applications on improving learning.

Future research should focus on tackling the challenges of automatic content analysis of asynchronous discussion forum transcripts, where there is much room for improvement. The main challenge is to examine automatic content analysis in different learning contexts, especially language and subject. Besides English, other languages need to be explored further. Furthermore, there is a need to develop automatic applications at the cognitive level. The practical implementation of this automatic content analysis approach in the online learning environment is expected to have a positive impact on learners and teachers.

As for future work, we plan to develop an automated system for analysing the cognitive presence, social presence, and teaching presence of asynchronous discussion forums based on the CoI framework. The current study shows a lack of research in the Indonesian language and in blended learning contexts. Hence, our future work will explore the discussion forum transcripts of blended learning in higher education, with Indonesian as the language context. Online collaborative learning, especially content analysis, is currently developing in Indonesia (Junus et al., 2019, 2021; Purwandari et al., 2022; Tjhin et al., 2017). The Indonesian government has also played a role by developing SPADA (spada.kemdikbud.go.id) as the national MOOCs platform. The role of asynchronous discussion has become significant and relevant in Indonesia because of the varying state of its IT infrastructure: the asynchronous approach enables learners who do not have adequate infrastructure to study anytime (Afify, 2019). Moreover, the study of automated content analysis of asynchronous discussion forums using the CoI framework has not yet begun in Indonesia.

Initial studies were conducted to prepare our Indonesian-language datasets from STEM courses: linear algebra (Junus et al., 2019) and computer-aided instruction classes (Junus et al., 2021). These previous studies used manual content analysis to identify the social, cognitive, and teaching presences in discussion transcripts. The unit of analysis was the message. Three experienced coders were involved in the coding process based on the Community of Inquiry coding protocol (Shea et al., 2010). The coding results of the coders were compared to obtain Fleiss's kappa as the interrater reliability coefficient. The content analysis results reported levels of agreement ranging from moderate to almost perfect. Our future study will use the labelled datasets from these initial studies to develop an automated system that categorises the cognitive, social, and teaching presences of Indonesian-language discussion transcripts with machine learning and dictionary-based approaches.
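
For illustration, the interrater-reliability step described above can be computed as in the sketch below; the ratings matrix is invented, with 0/1/2 standing in for CoI-based categories.

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([  # rows = messages, columns = the three coders
    [0, 0, 0],
    [1, 1, 2],
    [2, 2, 2],
    [0, 1, 0],
    [1, 1, 1],
])

table, _ = aggregate_raters(ratings)  # per-message counts for each category
print(f"Fleiss's kappa = {fleiss_kappa(table):.2f}")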

Our research methodology consists of three main phases: dataset preparation, model development, and implementation. Dataset preparation is the first phase in developing the automated analysis system for the Community of Inquiry; its purpose is to obtain a reliable and valid dataset. This phase consists of four stages: planning, data collection, human-coding analysis, and dataset preparation. After obtaining the dataset, the next phase is model development, which aims to obtain the best model for classifying discussion messages into the Community of Inquiry categories using a machine-learning approach. It has two main stages: training and evaluation. Next, the best model obtained in the previous phase is implemented in an automated analysis system. This automated system should be integrated where the online discussion takes place, in this case the Learning Management System (LMS). This methodology serves as our guideline for developing the automated analysis of Community of Inquiry elements in online discussion forums. Figure 6 below shows our future research methodology.

Fig. 6

Research methodology of future work. Source: Developed by the authors using online design tool Miro (https://miro.com/)

6 Conclusions

This article investigates the current state of research on analysing discussion transcripts automatically. The systematic literature review was conducted using the PRISMA framework and snowballing on five electronic academic and citation databases: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, and ERIC. The years of publication range from January 2016 to October 2021. Based on our eligibility criteria, 54 out of 414 articles were relevant to the automated approach to analysing discussion transcripts of online learning. Four aspects (i.e., the measurement, learning context, methods, and theoretical frameworks used in these studies) were analysed in depth. This study found eight measurement dimensions: cognitive, social, relevance/importance, summary, pattern, behaviour, topic, and learning resources. Social, relevance, and cognitive were the top three measurements in the last six years. The CoI was the most widely used framework for analysing online learning discussion transcripts. This shows that recent studies focus on helping teachers understand learning processes in discussion forums. Seventy-two percent of the studies investigated data from fully online learning environments, especially MOOCs. Also, 78% of the studies used English discussion transcripts as the dataset, and 59% used single-disciplinary datasets, with STEM as the most-used knowledge area. In terms of method, machine learning was the most widely used: 93% of all studies used machine learning to analyse discussion transcripts. The experiments show promising results in precision and accuracy, although multidisciplinary studies show unsatisfactory results. Therefore, this research topic still has considerable room for improvement, especially in the reliability and generalisability of cross-domain contexts. Furthermore, the future direction of this research area should be related to the contexts of learning, such as language, learning environment, and subject.