Introduction

As chatbot technology is widely being adopted in various business fields (Grudin & Jacques, 2019; Li et al., 2020), more massive open online course (MOOC) providers are also starting to embrace chatbots for their promptness in generating responses and their potential to provide quality answers to students’ questions. Multiple studies in the area of online learning have found natural language processing (NLP) based chatbots—software programs based on NLP that converse with users (Dale, 2016; Rubin et al., 2010; Shawar & Atwell, 2007; Wambsganss et al., 2020)—to be effective in enhancing students’ social presence and promoting knowledge gains (e.g., Huang et al., 2019). Other researchers report that chatbots also encourage learner participation in online courses (e.g., Song et al., 2019). Furthermore, several researchers have engaged in studies to develop educational chatbots equipped to take on the tutor’s role to offset the limited teaching presence in online courses (e.g., Wang & Petrina, 2013; Winkler et al., 2020). Overall, these studies highlight the promising potential of chatbots in facilitating meaningful learning experiences for students taking online courses.

However, little research has focused on aspects of precise language usage and constructional practices reflecting in-depth student–chatbot interactions in real-world online course contexts. Early human–computer interaction (HCI) studies regarded computers and software as social actors, and many previous studies reported that users interacted with chatbots based on the constructs and frameworks traditionally used for human–human interactions (Morkes et al., 1999; Nass & Moon, 2000; Nass et al., 1994, 1999; Reeves & Nass, 1996). However, more recent chatbot studies explore why some people feel ambivalent about or bewildered while interacting with chatbots (e.g., Ciechanowski et al., 2019) and pinpoint the importance of understanding the characteristics of the human–nonhuman interaction process, which could be disparate from human–human interactions. Considering earlier research reports indicating that individuals’ perceptions of technology usage are sometimes disparate from their actual uses (Collopy, 1996; DeLone & McLean, 1992; Straub et al., 1995; Trice & Treacy, 1988; Yang & Yoo, 2004), it is necessary to investigate how people interact with chatbots based not only on their self-reported perceptions but also on the user-generated log data which afford the examination of actual usage. This approach is also aligned with the effort in learning analytics that seeks more data from various sources to overcome the ambiguity of log data (Gauthier et al., 2015; Horn et al., 2016).

Another crucial aspect that warrants attention regarding HCI studies in MOOCs is the more challenging experiences of non-native English users engaging with new technology compared to their native English classmates. A recent study by Han and Lee (2022) showed that although both non-native and native English users expressed similar levels of enjoyment and provided equivalent ratings of chatbot quality, the former group reported significantly higher levels of perceived barriers when using a newly adopted FAQ chatbot in MOOCs. According to the researchers, these perceived barriers detracted from an inclusive learning experience. This finding indicates that other variables are at play for non-native English users using technology such as chatbots in MOOC spaces; however, there have been no robust academic inquiries concerning these variables. Given that the majority of MOOC participants are non-native English users enrolled in courses provided in English (Cho & Byun, 2017; Engle et al., 2015; Reilly et al., 2016), there is an obvious need for an in-depth examination of this group’s lived learning experiences within the context of new technology such as chatbots.

This study aims to address this research gap by focusing on the interaction behaviors between students and chatbots based on real-world uses (accessed via chatbot log data) and students’ self-reported responses. Adopting the multimodal computer-mediated communication (CMC) approach as a research guiding tool, we first examined students’ interactions with an FAQ chatbot focusing on the participants’ native language markers categorized by a binary classification—non-native English users vs. native English users. By analyzing 42 students’ (non-native English users n = 27 and native English users n = 15) precise language use and constructional practices in the student–chatbot interactions, this study explored the commonalities and differences between the two groups. Further, this study investigated possible reasons for the differences in the student–chatbot interactions from the perspective of positioning theory and interactional sociolinguistics. More importantly, this study sought chatbot response design strategies to mitigate potential barriers for non-native English users enrolled in MOOCs.

The research questions (RQs) guiding this study were:

  1. How do non-native English-speaking MOOC participants interact with a newly adopted FAQ chatbot?

     1-1. What aspects of the interactions were similar between the non-native and native English users?

     1-2. What aspects of the interactions with the chatbot were different for non-native English users compared to their native counterparts?

  2. Why do non-native English-speaking MOOC participants perceive their chatbot interactions differently from their native English-speaking peers?

  3. What can we learn from the findings regarding FAQ chatbot response design to inform future design and help address possible barriers for non-native English users in MOOCs?

Theoretical Frameworks

Multimodal Computer-Mediated Communication and Conversation Analysis

To understand multimodal CMC comprehensively, it is necessary to first comprehend computer-mediated discourse analysis (CMDA), one of the paradigms employed to research online user behaviors within systems mediated by computer programs. According to Herring (2004, p. 2), what differentiates CMDA from other approaches is the “analysis of logs of verbal interactions (characters, words, utterances, messages, exchanges, threads, archives, etc.),” which focuses on textual computer-mediated communication. This approach interprets online user behaviors based on observations related to language and its uses from a linguistic perspective. From a methodological standpoint, this approach reflects previous studies in spoken and written languages, such as conversation analysis, interactional sociolinguistics, pragmatics, text analysis, and critical analysis (Herring, 2004). Herring (2019) recently extended her conceptualization of CMDA with multimodal CMC to include non-textual communications such as emojis, image memes, avatars, and robots. With this approach, CMDA researchers are equipped with the tools required to analyze multimodal and convergent computer-mediated communications within the same CMDA paradigm: any computer-mediated discourse, regardless of the kind of mediating technology, can be analyzed by structure (e.g., orthography, sentence structure), pragmatic meaning (e.g., meanings of words, larger functional meaning units such as macrosegments), interactional properties (e.g., topic development, negotiating manner), and the underpinning social behaviors (e.g., play, group membership) on which CMDA primarily focuses (Herring, 2019).

In online learning studies, CMDA has been repeatedly used in analyzing the relationship between online discussions and students’ learning performance. In most cases, the research focused on understanding online interactions by examining forum discussions, a feature of online courses (Joksimovic et al., 2014; Kovanović et al., 2016; Yoo & Kim, 2014; Zhu et al., 2019). These studies analyzed human–human interactions through system-generated log data of online platforms such as learning management systems (LMS), from a linguistic perspective. Additionally, researchers have taken advantage of CMDA to analyze human–computer interactions based on chatbot log data. For example, Wang and Petrina (2013) used CMDA to analyze the chatbot log data—text only—between language learners and the chatbot ‘Lucy’ to ensure that design changes to the chatbot improved language acquisition among learners. As a result, the authors identified learning activity patterns in the logs that matched authentic learning and provided multiple suggestions for improving a language learning chatbot’s performance. However, thus far, there has been little research that includes non-text-based content in analysis using the multimodal CMC approach despite chatbot technology advances in producing a wider range of multimodal content (e.g., symbols such as the smiley face, images, and emojis) besides text.

To address this research gap, we examined students’ interactions with an FAQ chatbot based on the chatbot log data and participant self-reports using the multimodal CMC approach. Adopting this approach of analysis allowed us to benefit from three aspects: (a) focusing primarily on the phenomenon of interest before selecting a specific discourse analysis paradigm inductively (Herring, 2004), (b) enabling systematic comparisons across the data we examined (including non-textual forms), and (c) referencing students’ thoughts behind the discourse shown on record. After identifying our major interest in this study as non-native English users’ interactions with the FAQ chatbot, we realized that the examination of this phenomenon required us to avoid any preformulated theoretical or conceptual categories. Therefore, we kept an open mind and showed a willingness to be guided by the phenomenon. In this way, we discovered our interest in the study participants’ attitudes toward the chatbot and patterns of interactions. This realization led us to select conversation analysis (Psathas, 1995) as an apt methodological technique for this study. We elaborate on our use of this technique under the “Analysis” subheading of the “Methods” section.

Although the analytical methods of multimodal CMC are fundamentally the same as those of CMDA, and the chatbot log data in this study were lacking in non-textual data, we decided to use the multimodal CMC approach to incorporate graphical responses (i.e., smiley/frowny face symbols and multiple punctuation marks expressing a specific sentiment) created by the study participants. This decision emerged from considering recent chatbot technology that has advanced to produce more diverse content modes (e.g., audio, video, Giphy) seamlessly. We anticipate the demand for analyzing non-textual interactions between humans and computers will inevitably increase over time, and multimodal CMC is equipped to fulfill these demands.

Interactional Sociolinguistics

Interactional sociolinguistics is concerned with the role of culture in shaping and interpreting interactions between people (Gumperz, 1982; Tannen, 1993). Rather than uncovering or predicting patterns, it focuses on the event(s)—a sequence of interactions within a social context. Interactional sociolinguists are interested in verbal genres, discourse styles, (mis)communications, framings in conversations, and analyzing the sociocultural meanings of conversational interactions. They focus on ‘contextualization cues’ (Gumperz, 1982) in verbal interactions, which sometimes are interpreted differently from their literal meaning. Gumperz (2005) explained that participants reason out contextualization cues during conversation by drawing conversational inferences; for example, fast turn-taking in conversational interactions may indicate a participant’s impatient or enthusiastic mental status. Therefore, each contextualization cue can entail more than one meaning in different cultures and/or contexts. Successful interactions depend on the mutual understanding of the intent of a contextualization cue. Investigating whether both conversational parties have reached a mutual agreement on a cue reveals crucial characteristics of the interaction.

Furthermore, people use conversational structures to engage in interactions and convey extra meaning via contextualization cues (Bennett, 2018). In other words, contextualization cues are used systematically to accomplish interactional goals in conversation, and Tannen (2005) termed the way people utilize them ‘conversational style.’ Tannen (2005) posits that conversational style is an inherent characteristic of how people talk because one speaks in a manner influenced by multiple contextualization cues. In human–chatbot perception studies, a growing body of researchers has investigated the relationship between chatbots’ conversational skills and users’ perceptions. For example, Schuetzler et al. (2020) reported that improving the conversational skills of a chatbot positively impacts user perceptions of the agent, and Lee et al. (2020) demonstrated the positive relationship between user perceptions and social cues integrated into a chatbot’s language in creating a desirable chatbot experience. Overall, these researchers’ findings suggest that a chatbot’s conversational style influences users’ perceptions of chatbots, thus impacting user behaviors.

Similar to other NLP-based chatbots, the chatbot in this study mimicked human conversations. As a result, this framework allowed us to identify ‘contextualization cues’ (Gumperz, 1982) and to examine how each cue’s communicative intent was received and perceived by both the student and the chatbot in the study. Strictly speaking, the information was received and “programmed to be perceived” by the chatbot; nevertheless, users viewed the chatbot as “perceiving” their cues in certain ways when treating the chatbot as a humanlike interlocutor. By detecting correctly and incorrectly perceived contextualization cues in the dataset, we examined why non-native English users perceived their interactions with the chatbot differently from their native English classmates. We also examined whether a study participant’s conversational style matched the chatbot’s based on the chatbot logs and self-reports.

Moreover, we paid special attention to the fact that some students did not perceive their interactions with the chatbot as human–human. In such cases, interactional sociolinguistics was less useful because this approach assumes interactions occur between human (or humanlike) entities. Therefore, we adopted positioning theory to investigate the internal processes that transpired when students interacted with the chatbot, focusing on their ‘positionings’ as described in the subsequent section.

Positioning Theory

Davies and Harré (1990) explored the concept of ‘positioning’ to capture the dynamic aspects of linguistic thinking while avoiding the limitations arising from the concept termed ‘role’ in sociolinguistic analysis. While ‘role’ represents static and formal elements in social psychology, Davies and Harré (1990) proposed ‘positioning’ and theorized their conceptualization as positioning theory, focusing on how people use words and discourse types when they position themselves and/or others and on the impact of such positioning on the ‘selves’ of individuals (Harré & Van Langenhove, 1999).

Harré and Van Langenhove (1999) conceptualized the self as divided into three main aspects: (a) an embodied self, (b) an autobiographical self, and (c) a social self. While the embodied self and autobiographical self depict how an individual perceives his/her identity, the social self concerns itself with situations and how an individual responds to others in specific contexts. They posit people change their positions due to the dynamic features of these selves, especially the social self. Therefore, they argue that examining conversational practices may reveal people’s multiple selfhoods and the dynamics of their interactions (Harré & Van Langenhove, 1999).

We employed positioning theory to examine the influence of non-native English users’ multiple selfhoods when interacting with the chatbot in this study. This theory mainly helped us analyze the students’ self-reports, focusing more on their internal processes and how their fluid identity (composed of three different selves) impacted their interactions with the chatbot, as shown in the chatbot log data. Subsequently, positioning theory helped us highlight the students’ lived experience-based perceptions from a holistic standpoint with interactional sociolinguistics.

Literature Review

Non-Native English Users’ MOOC Experiences

Thus far, only a few studies have homed in on non-native English users’ MOOC experiences. Recognizing the lack of empirical research on this population in MOOCs, Cho and Byun (2017) investigated non-native English users’ online learning experiences. They identified certain themes that existed in their perceptions about MOOCs, including (a) wonder and interest, (b) novel learning and teaching practices, (c) preference for video style, (d) useful learning strategies, (e) motivation to learn, and (f) need for face-to-face interaction. Since more non-native English users take MOOCs offered in English than native English users (Jordan, 2014), the authors argue that more attention needs to be paid to emerging bi/multilingual students in MOOCs.

Likewise, Duru et al. (2019) examined the MOOC experiences of students for whom English is a second language. They investigated the factors that predicted a student’s course completion and found that engagement in discussions at the end of the first week was one of the strongest predictive factors. Their research showed that non-native English users who were less active in engaging with the courses were less likely to complete MOOCs. This finding supports the research by Cho and Byun (2017) in identifying that non-native English users’ MOOC experiences are different from those of their native English peers. Furthermore, Duru et al. (2019) emphasized that MOOCs could be perceived as inequitable learning environments for non-native English users because of language barriers. Barber (2013) and Tahirsylaj et al. (2018) also revealed that varied linguistic backgrounds create barriers for non-native English speakers to collaborate, thus hindering them from participating in course communities. Regarding individual learning, Sanchez-Gordon and Luján-Mora (2015) pointed out that as non-native speakers read at slower speeds than native English users, they may encounter cognitive overload and other cognitive problems more frequently.

Besides language barriers, there are certain cultural barriers non-native speakers face when participating in MOOCs. Previous literature indicates that a mismatch between the learners’ culture and the culture presented in MOOC videos could be a barrier that impacts students’ learning performance (Bayeck & Choi, 2018). Instructional videos tend to be embedded with the culture within which the course provider resides. For instance, students from collectivist cultures might have different expectations of the course video than their peers from individualistic cultures. These differing perceptions, resulting from distinct cultures, can create gaps in a learner’s understanding of the video material, thus, ultimately impacting students’ satisfaction and performance in MOOCs. The absence of cultural ties and the geographic distance may also make it harder to develop and sustain a sense of community and participation within a MOOC (Colas et al., 2016), constituting another layer of cultural barriers non-native learners have to deal with.

It is important not only to gain an in-depth understanding of how native language factors influence non-native English users’ MOOC experiences but also to understand whether new technology adoption in general, including chatbots, creates or mitigates barriers for this population, an area with limited research. Therefore, this study examined how non-native English users experienced an FAQ chatbot differently compared to their native English user classmates. Considering the previous finding suggesting that this population experiences more challenges (Han & Lee, 2022), we also explored strategies to improve non-native English users’ experience with the chatbot to promote an inclusive learning environment in MOOCs.

Student Perceptions When Using Educational Chatbots

It is important to understand student perceptions of educational chatbots because chatbot usage is expanding and playing a more crucial role in students’ online learning experiences. To achieve this goal, researchers have investigated interactions between FAQ chatbots and students in online courses (Goel & Polepeddi, 2019; Sandoval, 2018), chatbots’ effectiveness as tutors (Aleven & Koedinger, 2002; Holstein et al., 2019), as well as community-facing chatbots’ functions when building a social presence in courses (Wang et al., 2020; Winkler et al., 2020). These studies have chronicled chatbots’ effectiveness in promoting learning outcomes, along with students’ overall positive perceptions of chatbots.

On the contrary, there are also negative perceptions regarding educational chatbots that prevent active adoption. Ji and Yuan (2022) reported students’ privacy concerns when providing personal information to a tutoring chatbot. The authors identified that this concern conflicts with the students’ other demands, particularly wanting a chatbot to provide more intelligent and personalized responses in a certain learning context. To address these conflicting goals, some researchers argued for the necessity of frameworks or conceptual models when using or building chatbots. For example, Vilaza and McCashin (2021) proposed an ethical framework emphasizing the importance of explicability and autonomy, the latter referring to a user’s ability to act and make choices independently. The researchers argue that chatbot features and outcomes should be transparent and understandable to all stakeholders. This framework resonates with a conceptual model proposed by Murtarelli et al. (2021), which addresses the issue of information asymmetry. They argued that all the user information collected by a designer or educator should also be available to the user. They highlighted that all the conversational environment features and rules should be clearly stated and understandable to users. Other than privacy-related concerns, some researchers also pointed out language barriers as one of the crucial issues that could negatively impact students’ perception of using an educational chatbot. They argued that chatbots could be intelligent helpers that enhance equity in learning when utilized to support students’ diverse linguistic needs (D’Silva et al., 2020; Gupta & Chen, 2022).

However, students’ perceptions of educational chatbots and the extent of their concerns remain largely unknown (Wang et al., 2021a). As students’ self-reports can be at variance with actual chatbot use, it is necessary to examine students’ archived log data and compare them with self-reports when validating real uses. In this way, researchers may identify students’ perceptions of chatbots and concerns to further the goal of addressing relevant pain points more precisely.

Methods

Research Design

A multiple-case study design was adopted to understand the interactions between students and the chatbot. As Bhattacharya (2017) explained, selected cases in a multiple-case study are “representative of the issue under investigation and information-rich sources” (p. 110). Therefore, researchers engaging in this type of study report on individual cases and cross-case findings. In this way, researchers gain analytical insights into their research topics by comparing and contrasting multiple cases. Following general case study strategies, this research design also encourages researchers to examine data in their natural context in an open-ended way, explicitly avoiding tunnel vision while making use of comparisons to describe and explain complex phenomena aligned with the study’s interest (Verschuren, 2003).

This multiple-case study compares two different interaction cases with the chatbot—non-native English users vs. native English users. Following the multiple-case study procedure by Yin (2017), this study comprised the stages of theory development, case selection, and design (selection) of data type. The theory development process of this study was based on knowledge gained from a previous study reporting significant differences in the experiences of students representing the two cases (Han & Lee, 2022). Next, we analyzed the two cases and noted the similarities and differences. We drew cross-case conclusions and developed implications for improved chatbot design in the final phase. Our study process, adapted from Yin (2017), is illustrated in Fig. 1.

Fig. 1

Multiple-case study process: adapted from Yin (2017). Note. The bolded headings indicate the terms utilized by Yin (2017). Each parenthesis describes our actions per phase/step. The dashed line represents the feedback loop we followed. As shown here, we iteratively examined each case and checked if we needed to change our study protocol

Before analyzing the data corresponding to the two different sources, we reconstructed the data to be holistic and descriptive by adding the participants’ open-ended responses and native language markers (native speaker or not) to their individual chatbot log data before categorizing them. The two cases were bounded in two MOOCs on health journalism and newsletter strategies for monetization purposes offered in 2021 by an online professional development center for journalists in the Southwestern United States. We selected the two courses to recruit study participants for two reasons. Firstly, these were typical courses provided by the research site in terms of the course duration (4 weeks), the number of course participants (2,000 on average), the course activities (instructional videos, reading materials, and forum discussion), and the number of unique countries represented by course participants for courses provided in English (140 on average). Secondly, these were the courses provided in English during our research period. As MOOCs often feature many inactive students, we shared voluntary study invitations only with active participants (those who interacted with the course materials more than once) identified by the LMS log. We asked them to follow a survey link and fill it out after interacting with the FAQ chatbot to answer the three questions provided. We also encouraged participants to ask their own questions regarding course logistics and we used these to gather additional data points about students’ unmet needs. According to the access IP address record, all study participants used the chatbot once, except for two people. Table 1 presents the course information and the number of recruited participants per course. Table 2 depicts the participants’ native language distribution.

Table 1 Course information and number of recruited participants per course
Table 2 Distribution of the study participants’ native languages

Data Sources

The data were collected from a user-testing-based survey and Dialogflow Essentials (ES) chatbot log data. The Qualtrics survey asked the MOOC students to respond to three questions after interacting with the Dialogflow ES-based chatbot, built on a natural language understanding platform and embedded on a course webpage (see Fig. 2). This chatbot mainly operated the research site’s FAQ webpage, using course-related logistics content as the training set. Additionally, it had limited capability to respond to casual conversations based on a built-in small-talk package (Dialogflow Essentials documentation, n.d.). This chatbot could converse in English only.

Fig. 2

Webpage embedded with the chatbot (left) and its interaction demo (right)
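For readers unfamiliar with how FAQ content becomes chatbot behavior, the sketch below shows one way a course-logistics entry could be registered as an intent through the Dialogflow ES Python client (google-cloud-dialogflow). It is a minimal illustration only: the intent name, training phrases, and answer text are invented for this example and are not the research site’s actual knowledge-base content.

```python
from google.cloud import dialogflow  # pip install google-cloud-dialogflow

def create_faq_intent(project_id: str, display_name: str,
                      training_phrases_parts: list[str], answer_texts: list[str]):
    """Register one FAQ entry as a Dialogflow ES intent with its training phrases."""
    intents_client = dialogflow.IntentsClient()
    parent = dialogflow.AgentsClient.agent_path(project_id)

    training_phrases = []
    for part_text in training_phrases_parts:
        part = dialogflow.Intent.TrainingPhrase.Part(text=part_text)
        training_phrases.append(dialogflow.Intent.TrainingPhrase(parts=[part]))

    text = dialogflow.Intent.Message.Text(text=answer_texts)
    message = dialogflow.Intent.Message(text=text)

    intent = dialogflow.Intent(
        display_name=display_name,
        training_phrases=training_phrases,
        messages=[message],
    )
    return intents_client.create_intent(request={"parent": parent, "intent": intent})

if __name__ == "__main__":
    # Hypothetical FAQ entry about certificates; "faq-demo-project" is a placeholder.
    create_faq_intent(
        "faq-demo-project",
        "course.certificate",
        ["How do I get my certificate?", "When will I receive my certificate?"],
        ["Certificates are issued automatically after you pass the final quiz."],
    )
```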

During the survey, the students were asked to find answers to three questions, about course activities, ways to receive certificates, and class meeting times, by interacting with the chatbot. After the students input the answers on the survey form, they were also asked to score their challenge levels (from one to five) and elaborate on the challenges they experienced while using the chatbot in the optional open-ended response field of the survey. The students’ open-ended responses describing their challenges were incorporated into the data analysis of this study. The students’ chatbot log data were matched with their Qualtrics survey data using IP addresses and timestamps as identifiers. Because responding to the open-ended survey question was optional, some chatbot logs had no corresponding survey responses.

Analysis

The raw data from the Dialogflow ES chatbot and Qualtrics platforms went through three phases before analysis: (a) data collection, (b) selection and cleaning, and (c) matching and coding. The JSON log files from the Dialogflow ES platform were converted into text files in the data collection phase, and all user interactions were sorted chronologically. In the selection and cleaning phase, all computer program codes were removed so that only text and visual signs remained. From the Qualtrics platform, the students’ self-reported native language information and their open-ended responses describing their interactions with the chatbot were collected, including each student’s response timestamp and IP address.
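As a concrete illustration of the collection and cleaning phases, the following Python sketch flattens exported interaction logs into a chronologically ordered plain-text transcript. The field names (timestamp, session, queryText, fulfillmentText) and the file layout are assumptions made for illustration only, not the exact export schema of the Dialogflow ES platform.

```python
import json
from pathlib import Path

def load_interactions(log_dir: str):
    """Read exported JSON interaction logs and keep only the textual turns."""
    turns = []
    for log_file in Path(log_dir).glob("*.json"):
        record = json.loads(log_file.read_text(encoding="utf-8"))
        turns.append({
            "timestamp": record.get("timestamp", ""),       # assumed field name
            "session": record.get("session", ""),           # assumed field name
            "user_text": record.get("queryText", ""),       # student's input
            "bot_text": record.get("fulfillmentText", ""),  # chatbot's response
        })
    # Sort all turns chronologically, as described in the data collection phase.
    return sorted(turns, key=lambda t: t["timestamp"])

def to_transcript(turns) -> str:
    """Render the cleaned turns as plain text (program code and markup already dropped)."""
    lines = []
    for t in turns:
        lines.append(f"[{t['timestamp']}] USER: {t['user_text']}")
        lines.append(f"[{t['timestamp']}] BOT:  {t['bot_text']}")
    return "\n".join(lines)

if __name__ == "__main__":
    transcript = to_transcript(load_interactions("dialogflow_logs"))  # hypothetical folder
    Path("chatbot_transcript.txt").write_text(transcript, encoding="utf-8")
```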

During the matching and coding phase, the Nginx access logs of the webpage embedded with the chatbot were utilized. The IP addresses and timestamps from the logs served as identifiers to match the chatbot log data with the Qualtrics survey responses. After the data were matched, all personal identifiers were removed from the dataset. The selected survey response data were added to the chatbot log text files using RQDA 0.2–8 (Huang, 2016), a computer-assisted qualitative data analysis program. In addition to archiving the students’ open-ended survey responses, RQDA 0.2–8 was also used to code (a) students’ attitudes toward the chatbot, (b) their interaction styles, and (c) native language markers on each student’s text log file. We used a binary system to code the native language markers: EN (native English user) and nonEN (non-native English user). In some cases, students’ native language and symbol uses were also coded as available. Regarding the coding process, we focused on the phenomenon of interest, student–chatbot interactions, emerging from the chatbot log data, utilizing conversation analysis (Psathas, 1995) guided by the CMDA paradigm (Herring, 2004).
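To make the matching step concrete, the sketch below pairs a chatbot session with the survey response whose IP address matches and whose submission time falls within a tolerance window. The record structure and the 30-minute window are illustrative assumptions, and the identifiers are dropped once matching is complete, mirroring our procedure.

```python
from datetime import datetime, timedelta

# Illustrative records; in practice these came from the Nginx access log and the
# Qualtrics response export, respectively.
chatbot_sessions = [
    {"ip": "203.0.113.7", "start": datetime(2021, 5, 3, 14, 2), "transcript": "..."},
]
survey_responses = [
    {"ip": "203.0.113.7", "submitted": datetime(2021, 5, 3, 14, 25),
     "native_language": "nonEN", "open_ended": "It seems not to give me response that I expect"},
]

def match_records(sessions, responses, window=timedelta(minutes=30)):
    """Pair chatbot sessions with survey responses sharing an IP within a time window."""
    matched = []
    for session in sessions:
        for response in responses:
            if (session["ip"] == response["ip"]
                    and abs(response["submitted"] - session["start"]) <= window):
                matched.append({
                    "transcript": session["transcript"],
                    "native_language": response["native_language"],  # EN / nonEN marker
                    "open_ended": response["open_ended"],
                })
                break
    # IP addresses and timestamps are intentionally not carried into the coded dataset.
    return matched

print(match_records(chatbot_sessions, survey_responses))
```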

We adopted the conversation analysis technique to improve systematic comparisons across the two cases in this study. We focused on turn-taking, sequences, and topic development in the interactions (Psathas, 1995). This action enabled us to identify students’ attitudes/manners toward the chatbot and their corresponding definitions. Following an iterative and parallel examination across the research team, we created a codebook of attitudes toward the chatbot (see Table 3) and a codebook of interaction manners (see Table 4).

Table 3 Codebook of attitudes toward the chatbot
Table 4 Codebook of interaction manners

We subsequently coded 32 text files containing each student’s chatbot log data with their native language markers and open-ended responses based on Tables 1 and 2. We coded ten text files without corresponding open-ended responses. The students were grouped as non-native English users (n = 27) and native English users (n = 15) so that each group could represent a separate case in this multiple-case study.

At least two researchers coded each student’s text file. We discussed the differences between researchers’ codes until we reached a unanimous agreement. While doing so, we went through several peer debriefings to enhance the trustworthiness of this qualitative study. We coded students’ attitudes toward the chatbot as either ‘human’ or ‘nonhuman’ when they consistently maintained their attitudes toward the chatbot for the entire interaction. On the other hand, we coded students whose interactions with the chatbot fell in both categories as “between.” To broaden our perspective and to analyze the data in an open-ended way, we also focused on each student’s response to the chatbot’s output in each turn-taking by investigating their manner of interaction in detail. By coding each student’s data on two aspects—overall attitude toward the chatbot and the interaction manner at each turn-taking—we delineated each student’s interaction with the chatbot to explore significant features of the two cases. Figure 3 shows an RQDA coding example using the codes in Tables 3 and 4.

Fig. 3

RQDA coding example. Note. This snapshot forms the first half of a coded file. The ‘ < 3_langauge_nonEN > ’ and ‘ < 4_open_It seems not to give me response that I expect > ’ codes indicate that the student, randomized id number 15, is a non-native English user (the first code) and left a comment (“It seems not to give …”) elaborating on his challenge in using the chatbot (the second code). Some numbers were added to the codes for the convenience of collaborative coding and communication between multiple coders. The ‘ < between > ’ code indicates this student’s ambivalent attitude toward the chatbot through switches between humanlike and nonhuman perception, observed during the entire interaction process with the chatbot. Table 4 explains the meanings of other codes used in this example

Results

While our research focus was on how non-native English-speaking students’ interactions were differentiated from their native peers, our open-ended research methodology allowed us to reach meaningful findings across the two cases regarding similarities as well. We found that students’ attitudes and interaction manners were similar between the two cases by and large, with a few exceptions. We also observed students’ unrealistic expectations of the chatbot fairly often in both cases. To seek possible explanations for the differences observed in this study, we employed positioning theory and the interactional sociolinguistics perspective within the study context. Lastly, based on the comparison and contrast between the two cases, we provide implications for future chatbot response design to reduce barriers for non-native English users in MOOCs.

Similarities in Students’ Mixed Attitudes and Interaction Manners

The analysis of the chatbot log data showed that the students in the study treated the chatbot in three different ways: humanlike being, nonhuman, and somewhat in-between human and nonhuman. When they treated the chatbot as a humanlike being, their interaction behaviors were characterized in two ways: (a) incorporating social norms evidenced in human conversations such as using conversation-starting/ending phrases, words expressing gratitude, and other customary responses to the chatbot’s prior outputs and (b) employing paraphrasing when conducting user-testing. Contrastingly, students who treated the chatbot as a nonhuman took either (a) a neutral stance throughout, not showing any sentiments during their interactions with the chatbot, or (b) a dominating stance focused on testing the chatbot’s performance, especially its limitations, by reporting the chatbot’s incorrect responses as errors during their interactions. Notably, however, 30.95% of the students (n = 13) treated the chatbot as an interlocutor on a spectrum between human and nonhuman. Some of these students initiated interactions with the chatbot without using typical social norms evidenced in human conversations but later rephrased the given user-testing questions. Others in this category began the conversation using social norms (by saying hello) but continued their interactions with keyword-based entries and ended their conversations by showing gratitude for the chatbot’s service.

After excluding those students with insufficient log data to determine their attitudes, the analysis comparing the attitudes of native and non-native English users toward the chatbot revealed that a similar ratio of students in both cases treated the chatbot as a humanlike being, in-between, or nonhuman (see Table 5). Therefore, we concluded there were no significant differences between the two cases regarding attitudes toward the chatbot.

Table 5 Number of various students’ attitudes observed between the two cases

Although there was no significant difference between the students’ attitudes toward the chatbot regardless of their native/non-native status, we witnessed some prevailing characteristics of human–chatbot interactions, distinct from regular human–human or human–nonhuman interactions. For example, although some students in the current study distinctly treated the chatbot as a humanlike being, a substantial number (35.71% of the total, n = 15) viewed the chatbot as a nonhuman object. Among these students, more than half (19.04% of the total, n = 8) adopted a dominating stance toward the chatbot, focused on detecting its errors. This finding aligns with previous studies about the so-called ‘uncanny valley effect’ (Mori, 1970). Mori predicted that a person would react negatively to a humanlike robot, even feeling revulsion toward it when the robot acted almost, but not quite, like a human. He called this negative sentiment toward a humanoid robot an uncanny valley effect and focused on how people responded to the defects of these nonhuman beings. He discovered people tend to be highly sensitive in detecting flaws when the uncanny valley effect occurs.

The data we gathered from the open-ended survey question imply that the students who felt this uncanny valley effect were mostly among those who treated the chatbot as a being somewhat between human and nonhuman (30.95% of the total, n = 13). For example, one student who showed this ambiguous attitude toward the chatbot stated, “I believe it’s extremely strange to have a bot trying to interact as if it was human. A bot is not a person and should not act or be treated like that. For example, if I call a bot name[s], it can’t be offended. As a bot has no gender, I could call it “b–– (vulgarity)” or “f––- (vulgarity)” and it should not feel uncomfortable… some people don’t like to “talk” to bots.” This student’s disgust toward chatbots also aligns with Luo et al. (2019), who reported that people usually perceive a chatbot as less knowledgeable and less empathetic than a human agent.

These findings seem to provide insights into how people tend to respond to an NLP-based chatbot, a kind of humanoid robot, when they perceive it as nonhuman. More importantly, however, this study’s findings indicate that several people perceive bots as humanlike beings or as being somewhat between human and nonhuman. This implies that human–chatbot interactions may be fundamentally distinct from human–human or human–nonhuman interactions.

We also observed that similar interaction manners emerged between the two cases. Among the 17 codes we utilized in the interaction manner coding process, more than half were used at a comparable proportion, with a few exceptions. In Table 6, the codes (A–D) represent the students’ use context when conducting user-testing assignments. The interaction manners between the two cases here were similar based on the rates of occurrences. However, we observed different occurrence rates per case in some interaction manner categories when the students were in ‘inquiry’ and ‘response’ user contexts. To name a few, we found that the rates for asking new questions related to courses in full sentence form and showing no sentiment about the chatbot’s previous outputs were similar between the two cases (Codes E and M). However, we noticed that nine non-native English users asked new questions irrelevant to courses in full sentence form, whereas only one native English user did the same (Code F). It also seemed that non-native English users rephrased their questions more when their prior non-course-related questions failed to produce desired responses (Code I). In later parts of the results section, we elaborate on how other differing aspects of interaction manners (Codes G, J, and Q) help us understand the non-native English users’ experience with the chatbot and produce implications for chatbot response design.

Table 6 Number of various students’ interaction manners observed between the two cases

Similarities in Students’ Unrealistic Expectations for the Chatbot

Based on their current courses, most students wanted to retrieve personalized information from the chatbot, such as their course progress, classmates’ nationalities, and course recommendations (e.g., what is my recent course? how many other students are taking this course with me?). In this respect, we found no difference between the two cases. However, these expectations were beyond the chatbot’s capability. It was mainly designed to respond to FAQs regarding course logistics based on the research site’s knowledge base accumulated over ten years of operation; the chatbot database was not connected to any user database.

Our observation of the students’ high expectations of the chatbot seems in keeping with what a considerable body of prior research suggests; people often set unrealistically high expectations of conversational agents (Luger & Sellen, 2016; Wang et al., 2021b; Zamora, 2017). Regardless of their native language, most of the students attempted to retrieve personalized information about the courses they were taking from the chatbot in this study. These attempts were made despite it being clearly described as an FAQ chatbot capable of answering questions about course logistics rather than a personal assistant chatbot connected to the LMS user database. Moreover, we witnessed that students who asked unanswerable questions ended their interactions with the chatbot rather quickly upon multiple conversation failures, which aligns with what Liao et al. (2018) and Zamora (2017) reported in their studies.

Similarities in Students’ Privacy and Security Concerns and their High Expectations for the Chatbot

Across the two cases, some students expressed privacy and security concerns via the open-ended response field. For example, when the chatbot asked for the user’s name as it initiated conversations (i.e., May I have your name?), one non-native English user responded, “I didn’t want to use my real name when I was talking to the bot.” Notably, no indications of this concern were recorded in her chatbot log data. Similarly, one native student responded, “…asking personal information should not be mandatory unless really necessary” in the survey; however, the chatbot log did not contain any errors during the interaction.

These students’ security concerns seem highly related to existing resistance to AI technologies. For instance, one native English user responded, “It was fine for limited use but not as helpful as having a real person answer more difficult questions or questions that have not been programmed for the chatbot…I have never met a chatbot I preferred over a person, and I don’t feel safe to use them most of the time.” This same student posed several personalized questions to the chatbot while expressing uncertainty about the chatbot’s performance and security (see Fig. 4). This observation seems to align with chatbot users’ conflicting demands, where they would like to have a highly personalized chatbot without providing it with personal information (Ji & Yuan, 2022). Although multiple attempts to retrieve responses failed (not surprising, considering the chatbot’s limitations), the student persisted in testing the chatbot to receive personalized answers, according to the chatbot log data. Similarly, one non-native English user responded, “…AI is not ready to answer all the possible questions not to mention you don’t have proper security measures…” The chatbot log revealed this student asked one personalized question in addition to the user-testing questions: “Why did my dashboard progress bar stay at 70% when I finished the course?”

Fig. 4

Example of asking personalized questions

Non-Native English Users’ Experiences Viewed from Positioning Theory: Increased Focus on Personal and Chatbot Identity

Davies and Harré (1990) state that people are variously positioned in different conversations. Therefore, investigating discursive practices can reveal multiple selfhoods. According to them, people understand and express their experiences through the categories available to them in the current discourse. Connecting this concept to the non-native English users’ experiences in this study, we posit that the students’ development of self-perception began with establishing categories that included non-native English users like themselves while excluding native English users. People use such categories in various conversational situations, including in MOOCs, and the positioning experiences that form daily casual interactions make them recognize themselves as “one of the dichotomous categories and not one of [the] others” (Davies & Harré, 1990, p. 47). This acknowledgment brings about a feeling of commitment and membership towards this group (for instance, a non-native English user). As a result, this self-embodiment of who they are as a group appears in their language uses, such as in this response from the survey result: “Since [I] am not a native English speaker, I found it a bit hard to communicate with a chatbot.”

While Davies and Harré (1990) emphasize a person’s diverse positionings in various situations, they also recognize that individuals possess continuous personal identities, important to their selfhood. They noted that the very same person experiences and displays what that person perceives in the continuity of a multiplicity of selves (Davies & Harré, 1990, p. 47), acknowledging the interactive nature of identity establishment between agency and environment. Thus, sometimes a person may “have been positioned” by another speaker (Davies & Harré, 1990, p. 48) to be self-reflective about their continuous personal identity alongside their fluid personal diversity in a conversation.

The current study found that non-native English users asked more questions about their own identities and the chatbot’s identity than their native English-speaking counterparts. For example, Fig. 5 provides a chronological snapshot of a non-native English user’s interaction with the chatbot. This user expressed extraordinary interest in the chatbot’s identity and capabilities and posed a question related to his native language. Five additional non-native English users also inquired about the chatbot’s identity during their interactions. Contrastingly, none of the native English users showed any interest (see Code J in Table 6). Connecting this finding with Davies and Harré’s (1990) account, we may infer that non-native English users may “have been positioned” to emphasize their continuous personal identity (e.g., nationality, native language, name) in addition to their contextual diversity in their conversation with the chatbot. Therefore, the chatbot’s responses could have implicitly contributed to (or enforced) the users’ positioning as non-native English users, without impacting native English users’ positioning in the same way.

Fig. 5

Example of a non-native English user–chatbot interaction: interest in the chatbot’s identity related to the student’s identity

Non-Native English Users’ Experiences Viewed from the Interactional Sociolinguistics Perspective: Misunderstood Contextualization Cues and Polite Conversational Style

We found a considerable number of contextualization cues indicating that non-native English users meant the opposite of their literal remarks; however, the chatbot understood them literally and produced an undesired response. Figure 6 depicts an example of such instances. In this example, the student’s intent behind the word ‘sure’ at the end of the interaction may be interpreted as sarcasm. The chatbot apparently deciphered the word literally and responded to the student accordingly. Although such misunderstandings occurred with native students as well, a large number of non-native English users perceived the chatbot’s inaccurate interpretations as the result of their lack of mastery of English, as evidenced by their open-ended responses (e.g., My English is not good enough; Chatbot didn’t understand my short English; It was hard to use it because my mother tongue is not English).

Fig. 6

Example of a misunderstood contextualization cue

Additionally, we observed a mismatch of conversational styles between non-native English users and the chatbot in some instances. Notable findings presented here comprise five non-native English user–chatbot interactions that we categorized under the polite conversational style due to the rich presence of words or phrases associated with politeness (e.g., please; excuse me; would you). Figure 7 portrays an example of a non-native English user’s polite manner in contrast to the chatbot’s neutral conversing style. Although the chatbot’s responses were error-free, other than the opening where it failed to recognize a misspelling, the student described the interaction as challenging because she felt the chatbot responded too much like a robot. Conversely, native students’ interactions with the chatbot showed no evidence of a polite conversational style; they displayed either a casual or neutral interaction style. Hence, we posit that such differences in conversational styles between students and chatbots could negatively influence students’ perceptions of chatbot interactions, at least for non-native English users.

Fig. 7

Example of a non-native English user’s polite conversational style

FAQ Chatbot Response Design Strategies

During our examination of previous RQs, we discovered the importance of identifying and addressing non-native English users’ positioning during interactions by designing appropriate chatbot responses. For example, one of the students communicated, “I think this chatbot is not prepare[d] for those who do not speak English as a mother tongue, which is a shame.” In this individual’s chatbot log data, we observed one instance wherein the chatbot provided an inadequate response attributed to its limitations rather than the student’s language use. Yet, the student expressed a negative sentiment and immediately attributed it to his identity as a member of the non-native English user category. Similarly, but more nuanced, another student responded, “I think it’s important that chatbot can answer in several languages or, at least, in English, Portuguese and Spanish.” This comment revealed his identity as a non-native English user and hinted at his dissatisfaction with the lack of options regarding the language of communication. The chatbot log data showed questions in his native language, Portuguese (e.g., quais são as atividades do curso [what are the course activities]), despite the course itself and all course activities (i.e., quiz, forum discussion) being in English. These examples indicate the students’ strong identification with their status as non-native English users. Further, they reveal how this identity influences students’ perceptions, with students attributing situations arising from aspects like the chatbot’s technical shortcomings to their language proficiency.

Given such findings, we see three implications for chatbot response design. Firstly, it is important to inform users, as early as possible, how a chatbot processes the information they input. According to Dialogflow ES documentation (n.d.), chatbots created using the natural language understanding platform are programmed to address specific intended conversational situations based on certain contexts. Powered by machine learning techniques, Dialogflow ES-based chatbots improve their performance over time by continuously accumulating training sets created by users; the same process applies to any NLP-based chatbot. This training process ultimately ensures that users’ mistakes in spelling and grammar do not hinder the chatbots’ ability to detect user intent (Boonstra, 2021). When these chatbots are first deployed, however, they are insufficiently trained because of the limited amount of training data. During this early implementation phase, chatbot errors occur equally for all users if the initial training set is based on real-world usage. The chatbot employed in this study is no different. This chatbot’s initial training set came from FAQ webpage content that the online course provider composed based on real students’ questions and answers gathered over multiple years of operation. In other words, chatbot errors were not the result of non-native English users’ possible lack of linguistic competence in English; nevertheless, users still perceived their limited English competence as a barrier. It is necessary to inform non-native English users that a chatbot is a nonhuman object incapable of identifying whether a user is a native speaker or not.
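To make this processing model concrete, the sketch below (a minimal illustration using the Dialogflow ES Python client, with placeholder project and session identifiers) sends one utterance to an agent and returns the matched intent’s name, the detection confidence, and the canned fulfillment text. It shows that the agent simply selects the best match among its trained intents; it has no notion of whether the user is a native English speaker.

```python
from google.cloud import dialogflow  # pip install google-cloud-dialogflow

def detect_intent(project_id: str, session_id: str, text: str, language_code: str = "en"):
    """Send one user utterance to a Dialogflow ES agent and return the matched intent."""
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path(project_id, session_id)

    text_input = dialogflow.TextInput(text=text, language_code=language_code)
    query_input = dialogflow.QueryInput(text=text_input)

    response = session_client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    result = response.query_result
    # A low confidence score typically routes the query to the fallback intent
    # rather than to a tailored FAQ answer.
    return (result.intent.display_name,
            result.intent_detection_confidence,
            result.fulfillment_text)

if __name__ == "__main__":
    # "faq-demo-project" and "test-session-1" are placeholder identifiers.
    print(detect_intent("faq-demo-project", "test-session-1",
                        "How do I get my certificate?"))
```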

Additionally, it is important to reveal to users that a chatbot tries to identify the most appropriate intent of any input from among the limited number of options it has been trained to respond to. It is also necessary to inform users that a chatbot, as a nonhuman self-learning software, must accumulate sufficient human-generated training sets before the errors it makes with humans decrease and its performance improves over time. This information regarding a chatbot’s nonhuman-object characteristics might help non-native English users position themselves as humans rather than as non-native English users when interacting with a chatbot. This shift in perspective, viewing chatbots as nonhuman beings, might help them view the chatbot’s incorrect responses to their queries as machine errors rather than a result of their linguistic shortcomings, which could contribute to lowering possible barriers when using chatbots.

Secondly, we observed chatbot limitations in detecting contextualization cues, which aggravated non-native English users’ interactions with the chatbot. Although this chatbot limitation affected native students as well, the misdetection of contextualization cues seemed to impact non-native English users more adversely because of positioning effects. Chatbot response design should prioritize the reduction of these misdetections. It would be ideal if chatbots could be trained using numerous training sets from the start; however, this is not always feasible. Instead, informing students of the availability of keyword-based question and answer formats could reduce the impact of misdetected contextualization cues while the chatbot continues to improve its performance by gaining training sets over time. Moreover, Table 6 revealed that ten non-native English users input keyword-based queries (e.g., “enrollment” instead of “how do I enroll in a course”) while only one native English user did the same (Code G); they also expressed more frustration, with remarks such as “bot fail” or “chatbot error,” when their keyword-based queries failed to produce the desired information (Code Q). Enhancing a chatbot’s keyword-based question and answer capability would also lower non-native English users’ cognitive effort in translating queries into English, potentially decreasing barriers to using chatbots.
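As a sketch of the keyword-based question-and-answer path recommended above, the snippet below checks a small keyword table before handing the raw text to free-text intent detection. The keyword table and answer strings are invented for illustration; they are not the research site’s actual FAQ content or the deployed chatbot’s mechanism.

```python
# Illustrative keyword-to-answer table; real FAQ content would come from the
# course provider's knowledge base.
KEYWORD_ANSWERS = {
    "certificate": "Certificates are issued automatically after you pass the final quiz.",
    "enrollment": "You can enroll from the course page at any time during the four weeks.",
    "schedule": "The course is asynchronous; there are no live class meeting times.",
}

def answer_query(user_text: str) -> str:
    """Try a keyword match first; otherwise defer to the NLP intent pipeline."""
    normalized = user_text.lower()
    for keyword, answer in KEYWORD_ANSWERS.items():
        if keyword in normalized:
            return answer
    # No keyword hit: the raw text would be passed to the trained intent detector
    # (e.g., the detect_intent() sketch shown earlier).
    return "Sorry, I couldn't find that. Try a single keyword such as 'certificate'."

print(answer_query("enrollment"))
print(answer_query("When do we meet for class?"))
```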

Lastly, training chatbots to match students’ conversational styles could positively impact students’ sentiments toward chatbots, thus, decreasing barriers to using them. The style mismatch examples in the chatbot log data and the non-native English users’ open-ended responses implying negative feelings towards chatbots in this study align with previous research, revealing that chatbots trained to match users’ conversational styles could enhance user experiences (e.g., Shumanov & Johnson, 2021). However, such changes would hinge on access to more information on user characteristics from other sources because chatbot log data from normal usage would be insufficient in enabling chatbots to match user styles appropriately, especially in the initial adoption phase. As this kind of data collection at scale sometimes raises ethical concerns (Richards & King, 2014; Richterich, 2018), producing chatbot responses following a user’s communication style should be pursued only when there is a clear social consensus on collecting and utilizing contextualized data comprising user-created information.

Discussion and Conclusion

Our findings provide detailed descriptions of the similarities and differences between non-native and native English users’ interactions with the chatbot. Centering on the non-native English users’ experiences, we suggest some chatbot response design strategies to reduce possible negative positioning effects and promote a more inclusive learning environment in MOOCs. By utilizing self-reports as the user contexts for the corresponding chatbot log data, we tried to offset the opacity of chatbot log data and had some success. For example, as we addressed earlier, we found that some students who treated the chatbot as a being somewhat between human and nonhuman based on their chatbot log data might have experienced the uncanny valley effect, according to their self-reports. Furthermore, some students left no indication of their privacy and security-related concerns in the chatbot log data; however, they expressed these concerns in their self-reports. This benefit is aligned with the findings of Horn et al. (2016), which elaborated on the useful insights gained by applying multiple data sources to understand student behaviors in technology-enhanced learning environments. We gained insights that may have been neglected without self-reports (e.g., no apparent errors in log data but students expressed concerns) or incorrectly assumed without log data (e.g., a student expressed an especially challenging experience but log data showed the chatbot was operating normally). By examining how students interacted with an FAQ chatbot based on their chatbot log data and self-reported perceptions, this study validated the participants’ real chatbot uses and connected them with their perceptions of the interaction. Observing privacy concerns raised by both non-native and native English users regardless of chatbot errors in our dataset, we also saw the need for attention to data privacy and ethics, as argued by Murtarelli et al. (2021) and Vilaza and McCashin (2021).

However, this study entails some limitations. Firstly, the small sample size limits the representativeness of our findings. Although we did not intend to establish generalizability with this exploratory case study, more participants would probably have provided richer qualitative data to analyze, making our findings more robust for developing inclusive educational chatbots. Secondly, the chatbot’s functionality was limited to answering FAQs based on the research site’s FAQ webpage content and style at the time of the study. This limited capability could have negatively influenced students’ sentiments, despite our explanation of the FAQ chatbot’s limitations during study participant recruitment. In future studies, we plan to replicate this study with a larger sample size and enhanced chatbot capabilities trained with more data sets from the current study and other sources.

Non-native English users make up the majority of MOOC users; however, less attention has been paid to their challenges. We hope the findings of this study can be useful to researchers and developers in designing better chatbot responses to support this group of students with better usability and thus, promote a more inclusive learning environment with new technologies in MOOCs.