1 Introduction

Learner corpora have several applications in language teaching, such as developing second language curricula [1], language level tests [2] or interactive exercises [3]. Based on their application, these corpora can be divided into corpora for delayed pedagogical use (DPU) and corpora for immediate pedagogical use (IPU) [4]. The difference is that DPU corpora are mostly used by researchers or teachers to create new learning resources, while IPU corpora, which often contain learners’ own recent texts, are more often used by the same learners to explore their typical language patterns and errors. An IPU corpus is therefore also a digital learning resource. This follows from the definition of a corpus as “a large, principled collection of naturally occurring examples of language stored electronically” [5], even though IPU corpora can be small and are often stored only on the teacher’s own computer. Regardless, both IPU and DPU corpora contain digitally-born or digitized texts that can be used for language learning.

The corpus as a learning resource is a central concept in data-driven learning (DDL), where the learner becomes a language researcher, creating their own language model through exploration [6]. This applies especially to IPU corpora. However, we must not diminish the value of DPU corpora. While learner corpora can be used as learning material on their own, they are also invaluable in creating learner grammars and language exercises with automatic feedback, and in improving learner language analysis tools [7]. Therefore, it makes sense to combine a DPU and an IPU corpus and benefit from both [4].

Since IPU corpora are used directly in the classroom, they can already be considered digital learning resources. Using the learner corpus as a DPU corpus and creating new exercises and learning resources from this data means that the corpus analysis environment could also contain a virtual learning environment. This would allow the system to host the learning resources and make them accessible to the learners, whether or not the resources are created from the analysis of their own texts.

In addition, the development of a classical DPU corpus can be slow. Taking the work of Hana et al. [8] as an example, we see that creating a corpus can take a long time: the texts are often written by hand, so they need to be transcribed and annotated with relevant metadata, and this process should be carried out by several researchers to ensure valid results. The same can be seen with the Estonian Interlanguage Corpus (EIC), which also contains written exam texts that have to be digitized and annotated with metadata [9]. Considering that IPU corpora are created relatively fast, since the texts are already digital, it could be possible to apply similar methods to DPU corpora and have the learners supply the corpus with their own digitally-born texts.

Our goal is to create a platform for the Estonian context that combines a language corpus and a virtual learning environment. The corpus can serve both IPU and DPU purposes, and the virtual learning environment can benefit from the corpus and also replenish the corpus with its results. In order to create such a multi-faceted system, it is imperative to know what learners, teachers and researchers expect from language learning and/or corpus platforms. Thus, we formulated two research questions:

  • RQ1: What do researchers, teachers, and language learners need from an online platform combining a virtual learning environment and text corpora?

  • RQ2: What are the opportunities and drawbacks of such a platform based on the results of testing the digital prototype?

Section 2 discusses the potential benefits and hindrances of a language corpus and a language learning environment. Section 3 describes the methods used to design a corpus-based language learning environment. These methods are applied in Sect. 4, which first answers RQ1, defining the users’ needs as a starting point. RQ2 is answered through an iterative participatory design process, in which scenarios and prototypes are tested with various stakeholders. Section 5 discusses the results of the study and compares our findings with previous research. Section 6 summarizes the conclusions and provides directions for future research.

2 Corpora and Virtual Learning Environments in Language Learning

Learner corpora and virtual learning environments often exist separately. However, there are examples of corpora that have ties to a learning environment, especially as a source of data. Write and Improve [10] uses Cambridge Learner Corpus data to provide automatic writing feedback for different tasks. In the Swedish platform Lärka [11], exercises are automatically created for the learner using data from the Korp corpus. The two can also be combined the other way: the learning environment itself can contribute to the corpus. An example of this is IWiLL [12], where the learners’ essays and teachers’ feedback are moved directly to the Taiwan Learners’ Corpus.

Both corpora and virtual learning environments positively affect the learning process. As described before, corpora can be used as a source for data-driven learning. Virtual learning environments also have an important role in learning. According to [13] and [14], learners are more motivated to learn in a virtual learning environment. They especially enjoy the diversity and the autonomy it offers. Al-Zahrani [15] has pointed out that learners often want to immerse themselves in a language by using subtitles when watching videos, listening to radio, having conversations with other students and translating texts which they do not understand.

On the other hand, [15] has also mentioned that the lack of use of such environments could stem from insufficient technical competence. The same has been found with corpora. Several authors [16, 17] point out that while corpora have potential as a learning resource, students need a significant amount of training to start using them, and awareness of existing corpora can be low. In addition, curriculum design might not allow for something new to be integrated into classroom activities, and teachers might need more support from others in order to implement new technology [18]. Furthermore, [19] and [20] have found that students also need different support structures when learning virtually. They often miss the social interaction and require immediate teacher feedback.

From these findings we can derive that corpora and virtual learning environments are indeed beneficial to learning, but there are still several drawbacks that need to be handled in order to fully implement them in classroom activities. To help solve the problems at hand, we turned to the users.

3 Methods

Creating a corpus-based virtual learning environment requires an interdisciplinary research approach combining language technology, educational technology, and human-computer interaction. Therefore, we approached the problem with the design-based research methodology, where new research knowledge is created through the design process [21]. Furthermore, we applied the improved Double Diamond model described by Santos Ordóñez et al. [22] to create an iterative process. The model alternates between divergence and convergence: the divergent phases allow us to collect as much data as we can, and the convergent phases allow us to define the key knowledge gained from it. The iterative aspect of this model allowed us to revisit previous results and the design artifacts based on them in order to further improve and validate what had been learned so far.

Since users’ needs and their knowledge were vital to us, we also applied different participatory design methods in our process. In participatory design, the designer works with the users to fit the design to their existing knowledge [23]. Initially, we conducted a series of interviews to better understand users’ needs and the context in which the platform is going to be used. Since the environment was created for the Estonian context, we chose to interview Estonian language researchers, Estonian language teachers, foreign students, as well as adults and high-school students who either spoke Estonian natively or had Russian as their mother tongue. The latter were chosen to include the biggest ethnic minority group in Estonia. Based on these interviews, we created personas of archetypical users. We then wrote scenarios, which were discussed with the stakeholders in participatory design sessions. The prototyping involved both low-fidelity paper prototypes and a high-fidelity click-through prototype created in Figma, which were tested with the users. The way we adapted our process to the aforementioned Double Diamond model is shown in Fig. 1.

Fig. 1. Double Diamond model applied to the design of a corpus-based virtual learning environment (adapted from [22]).

The design process took place from spring 2020 to spring 2021. We involved 23 interviewees, 11 participants in scenario-based design sessions, 10 users for testing paper prototypes and 5 for testing digital prototypes. In total, 49 people participated in the design study. Every interview and design session was recorded, and notes were taken on missing features, important aspects and discovered problems. We used inductive qualitative analysis to draw conclusions from the data collected in the design sessions and interviews.

4 Results

In order to answer RQ1, a set of personas was developed based on the insights from the interviews. We discovered that language researchers were similar to the language teachers, since they also taught Estonian at different proficiency levels, but they had more knowledge of corpora than the latter. Hence, two different personas were derived: an Estonian language teacher persona, whose goal is to find printable worksheets and listening exercises and to see their students’ progress, and a researcher persona, whose goal is to analyze language use and create new learning resources.

We could also differentiate between three language learner personas. Two of them were independent learners: one was a self-paced, more intrinsically motivated learner, and the other a competitive learner who found gamified solutions and comparing their learning progress with others very motivating. The third language learner persona was a high-school student whose mother tongue was Russian, who was not very highly motivated to learn Estonian and found teacher feedback and learning in a classroom highly important. The rest of the interviewees formed an additional persona, whose goal was to check their texts for language errors, using either dictionaries or automatic correction tools. This last persona, mostly native Estonian speakers, does not use the platform for language learning, but rather for practical reasons such as correcting their writing. This persona also covers lifelong learners.

Based on the personas, their goals and problems, we created and tested scenarios, paper prototypes and digital prototypes, each containing more detail than the last and enhanced after testing sessions. We wrote 13 scenarios. The first 5 were in Estonian for teachers and researchers to assess: 1) Language teacher creates a study group in a new virtual learning environment; 2) Language teacher shares exercises in the virtual learning environment; 3) Language teacher analyzes learners’ results; 4) Language researcher analyzes the corpus material; 5) Language researcher creates a new corpus-based exercise. The next 5 were written for learners in both English and Estonian, so that language learners at every proficiency level could participate: 6) Student joins an online study group in a new language learning environment; 7) A foreign student starts to learn Estonian on his own; 8) A foreign student is learning Estonian regularly; 9) Student submits an exercise several times; 10) Student wants to produce a text with proper use of Estonian. The last 3 scenarios, written in Estonian, were meant for Estonian natives: 11) Estonian happens upon an environment with Estonian text correction; 12) Estonian uses the text corrector to analyze her writing; 13) Estonian uses the environment regularly to improve her language use. Discussing these scenarios in participatory design sessions helped us prioritize the offered features and allowed us to start creating prototypes.

The final prototype was named ELLE (Estonian Language Learning, Teaching & Research Environment). By integrating a corpus with a virtual learning environment, it offers valuable features for language researchers, teachers and learners of any kind. The user needs identified in the design sessions are summarized by persona in Fig. 2.

Fig. 2. User needs in a corpus-based virtual learning environment.

Based on the results of the design sessions, we could derive two primary user profiles: one for teachers and researchers and the other for language learners. Since Estonians do not usually want to log in to the environment, there was no need to create a separate user interface for them ‒ everything they need can be accessed without logging in.

The ELLE platform allows analyzing corpus texts with various analysis tools. To make this analysis easier for users without a corpus analysis background, we offered the option to use a text corrector (Fig. 3) that includes some of the analysis features, especially the ones mentioned by both learners and Estonians. In addition to text analysis, teachers and researchers can create courses and new interactive exercises and share them with their own students or with the general public. Independent learners can also learn outside of a language course with public exercises or browse the collection of links to find texts or podcasts in the target language.

To motivate the learners, the environment offers several support structures such as digital badges. Learners can earn badges for uploading and publishing their writings, completing interactive exercises or being an active learner in general. For support, they can ask for help privately from their teacher, post to a forum in a language course or send messages to fellow students.

Fig. 3. Text corrector error analysis screen from the prototype of ELLE.

Since the researchers and teachers have common needs, they share the same user interface. This was further validated after testing the scenarios. It was found that while language teachers had not had contact with corpora before, they would also like to use some of the offered corpus analysis features. For example, they would use the corpora to find authentic text examples or compare their students’ writings to similar corpus material. Therefore, it made sense to provide the same interface and options for both of these users.

Teachers and researchers can use ELLE to create new learning resources, analyze the corpus, create courses for their students and add comments to the exercises to help their students along. Not every teacher expressed the need to create new resources; they often preferred to find existing exercises on the site, based on their topic and rated quality. If, however, they were to create an exercise, they might want to share it with their coworkers, keep it private or share it publicly with everyone. Opinions on that matter varied. Furthermore, some teachers also mentioned the need to follow other teachers’ profiles and talk with them to get additional support, either in applying new techniques in class or in using the materials or exercises those teachers had created.

Insights into using corpus analysis tools on the site were gained mostly from the researchers. They most valued the error and readability analysis, but also occasionally wanted to create word lists, use keyword analysis as a tool for comparison or see concordances. As a drawback, however, they did not see themselves leading their students to the corpus and its analysis tools, for fear that the students would get confused or learn from incorrect contexts, since the data comes mostly from a learner corpus. In addition, since the researchers themselves were also language teachers at different proficiency levels, they focused more on what the site offers for teachers and gave less detailed feedback on the functionality of the corpus analysis tools.
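To illustrate two of the tools mentioned above, the following minimal Python sketch builds a word frequency list and a keyword-in-context concordance over a small set of texts. It is not ELLE's implementation: the function names (`tokenize`, `word_list`, `concordance`) and the sample sentences are hypothetical, and the tokenizer is a simple regular expression. Because Estonian is highly inflected, a real tool would most likely match on lemmas rather than exact word forms.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware, so letters such as õ, ä, ö, ü match.
    return re.findall(r"\w+", text.lower())

def word_list(texts, top=5):
    # Frequency list: the `top` most common word forms across all texts.
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return counts.most_common(top)

def concordance(texts, keyword, width=2):
    # Keyword-in-context: each exact match with up to `width` tokens on either side.
    keyword = keyword.lower()
    hits = []
    for text in texts:
        toks = tokenize(text)
        for i, tok in enumerate(toks):
            if tok == keyword:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits

# Hypothetical sample sentences, not taken from any corpus.
sample_texts = [
    "Ma õpin eesti keelt ülikoolis.",
    "Eesti keel on ilus keel.",
]

for left, kw, right in concordance(sample_texts, "keel"):
    print(f"{left:>12} [{kw}] {right}")
```

Note that the exact-match search finds "keel" but not the inflected form "keelt", which is the kind of gap that lemma-based search would close.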

All of ELLE’s users were also offered a chance to find additional virtual learning or corpus websites from the collection of useful links. The teachers had mentioned that some learning resources are difficult to find, that there are many sites and that they cannot remember them all. It was also pointed out that since the ELLE website contains a learner language corpus, teachers might want to use an Estonian native corpus for finding good example sentences. Estonian natives wanted to find links to translators and dictionaries or to a short collection of Estonian grammar or formatting rules. The foreign students wanted to find texts to read in Estonian, radio shows and podcasts to listen to, and other Estonian language learning environments and apps to explore. Thus, the collection of links is quite extensive and leads users to various other websites, which does not contribute directly to the improvement of the learner language corpus. However, if our goal is to improve Estonian language learning and teaching, it is an important aspect to consider.

For the language learners, whether they were independent learners or not, it was important to join language courses and to find public courses. They also mentioned the need to have conversations with other learners via a forum or private messages so that they could feel part of a group and get help with their learning. The learners appreciated the ability to add comments to their exercises to ask the teacher for help, but also wanted to turn to their coursemates in case their teacher did not respond fast enough.

In addition to doing exercises as part of ELLE’s courses, some students wanted to do additional exercises or find new learning resources on their own. The most popular exercise types for learners were multiple choice, flashcards, fill in the blanks and listening exercises. Longer writing tasks did not appeal to the students, although an independent learner at a higher language proficiency level noted that he would occasionally also like to write essays or other kinds of texts to polish his knowledge. Most other students preferred shorter tasks. It is worth noting, however, that the learners wanted to have conversations with others (their coursemates, teachers or even Estonian natives), meaning that they prefer to write in a more organic manner, not just for the sake of an exercise. Therefore, when a language learning environment is combined with a corpus, it should also take into account that learners are not always essay writers and that the data in this corpus or any kind of database should reflect that.

To make ELLE even more practical for the learners, we also offered the chance to assess their own writings, should they want to write in Estonian in any aspect of their daily lives. That was a favorable feature for Estonians as well. Learners as well as Estonians most appreciated the possibility to analyze their text for various errors. They noted that it is not enough to just highlight the mistaken word or phrase; they also want to see why the system recognizes it as a mistake and to have some example sentences with the correct use that would help them along. Some of the test users, especially learners, said that they would allow all manner of texts to be imported directly to the corpus from this text corrector. Estonians were the most reserved, since they would use it to correct important emails and would not want private correspondence to end up in a public corpus. They added, however, that if the text was not sensitive and making it public was straightforward, they would sometimes allow some of their writings to move to the corpus.

While error analysis was the highlight of the text corrector, the users said that they would sometimes use some of its other features as well. The learners said that they would check the proficiency level of their text and find out how to improve the text to reach a higher proficiency level. They also occasionally wanted to check their text style, to make it more suitable for the writing situation they had in mind. Readability analysis, also offered as part of the text corrector, was not that important for most learners. However, some of the Estonians and some learners would use it to improve their text even more, for example to simplify their sentences or replace repeated words.

ELLE also offers several incentives for learners to publish their writings to the corpus or to continue doing exercises. Competitive learner types were more motivated by the digital badges they would receive for their activity on the site. They also wanted to compare their results to those of their friends, coursemates or other similar learners. Competitive learners would also opt to receive daily notifications to do exercises and keep up their streak. Most learners, however, are not interested in this and would come to ELLE and do exercises when they feel like it. All learners were interested in seeing their progress on their dashboard, which tells them what their proficiency level is, what kind of errors they make most and how they can improve.

Analyzing the user needs and feedback to the scenarios and prototypes helped us to design the prototype for an online platform which combines a learner language corpus with a virtual learning environment. This is, however, a snapshot of the process, since design is never final and further improvements based on the recommendations still need to be accounted for.

5 Discussion

The article describes the prototype of ELLE (Estonian Language Learning, Teaching & Research Environment). To create and test this prototype, we formulated two research questions. RQ1 helped us find the users’ needs to start off the design process. RQ2 allowed us to evaluate the final prototype at the end of the current design process. In this discussion, we would like to point out how our findings from RQ1 and RQ2 correspond to the existing research.

From user testing we found, similar to [13] and [14], that learners appreciate the autonomy that a virtual learning environment offers. They also want to find exercises to test themselves, which is especially the case with independent learners. In order to support learner autonomy, ELLE offers the option to evaluate one’s text with the text corrector, find exercises with automatic feedback and discover new language learning resources from the collection of useful links.

Language learners are hence also more motivated to learn (e.g. [13,14,15]) using a virtual learning environment. Test users found that interactive exercises enhance in-class learning. There were also users who appreciated gamified elements, especially digital badges and the possibility to compare their own results with others. However, it was noted that doing exercises is not enough. It was also important for learners to practice speech, and they found that they often cannot do so in online language learning environments. The design of ELLE has not currently tackled the problem of adding means to practice speech and communicate with peers. It was left out for now because the corpus behind ELLE is a written text corpus and not a speech corpus. However, in the future it is worth considering the possibility of integrating speaking exercises into the platform and linking their data directly to a speech corpus.

During the design sessions it was also found that learners (e.g. [19, 20]) as well as teachers [18] need additional support structures. For learners we offered the option to chat with their coursemates in a forum, send private messages or add comments to their teacher in a shared exercise. Offering the chance to ask other learners for help also caters to their need for more autonomy, and might be less stressful. Furthermore, allowing learners to help each other reduces the teachers’ workload. The teachers themselves can find support from their colleagues using the site, either by looking up public exercises instead of creating one from scratch, or by asking for additional support with private messages. The teachers pointed out, however, that most of the support structures still exist offline and face-to-face meetings are more helpful than asking for help online.

In addition, some learners and teachers are not using a virtual learning environment or a corpus due to the lack of technical knowledge (e.g. [16, 17]). The researchers and teachers we interviewed pointed out that they have not used corpora much and they often do not use virtual learning environments either. They still use available digital resources occasionally, but most of the exercises are on paper or based on oral communication. Corpora have been a tool for some researchers, but only to find example sentences for the exercises they are creating. We hope that ELLE makes corpora more accessible and easier to use. In order to increase usability, we simplified the terms used, offered immediate explanations with popups and planned short videos explaining the use of the platform. Some further limitations remain: changing teaching and learning habits so that corpora are included more in classroom activities, and promoting awareness of different virtual learning environments and Estonian corpora. Those issues could be tackled with offline activities, such as workshops in schools and universities.

6 Conclusions and Future Work

While the resulting prototype depicts an online platform bringing together teachers, learners and researchers in one corpus-based virtual learning environment, it is still a work in progress. We still need to hold additional design sessions with language researchers who work with corpora on a daily basis, such as lexicographers. This would allow us to focus specifically on corpus analysis tools and to investigate how the data from the use of a learning environment can provide input for language technology research.

The prototype of ELLE was designed for a desktop platform, since the users said that when working with texts, they use their computers more often. However, learners pointed out that they use their mobile devices when learning languages online and that these devices are always with them. Therefore, we also need to consider which features can be transferred to a mobile platform. We can assume that text writing and text correction will be uncomfortable on a small device. However, most of the exercises and learning resources would be easily accessible. We also need to think about how much of the corpus will be accessible on a mobile device, or whether exercise results moving back to the corpus would be the only connection with the mobile version.

While ELLE has been designed with the Estonian learner corpus in mind, the same design principles can be transferred to the development of other similar corpus-based virtual learning environments. More work on ELLE’s design and development still lies ahead. Combining a language corpus with a virtual learning environment is a promising approach which opens up new opportunities both for language learning and for language technology research.