
1 Introduction

1.1 Motivation

The best way to learn a language is to practice with real people and immerse yourself in a foreign-language environment. However, learners do not always have the opportunity or the budget to do so: finding a foreign conversation partner or going abroad both cost money. A learning companion that practices English with the learner every day and tells the learner whether he or she is speaking correctly would therefore be a great help to anyone who wants to practice on his or her own.

1.2 Literature Review

A learning companion is a type of Intelligent Tutoring System [1]. It simulates the role of a classmate or partner, helping students gain more ideas than they would by studying alone. When machine learning is used to build a learning companion, several details still need to be worked out [2]. When designing a learning companion, we should keep the following aspects in mind: cognition, motivation, sociability, attitudes, and ethics [3].

Using voice assistants in education is still at an early stage. Students can use them as a Q&A station, a timer, and so on. We also believe they are well suited to special education because of the voice interface [4]. For knowledge queries, emotional support, and recreation, the voice assistant platform can serve well as a learning companion [5].

2 Design & Implementation

2.1 Voice Assistant

The voice assistant has several features that closely match those of human beings; Table 1 gives the comparison.

Table 1. Voice assistant vs. Human beings

Since the voice assistant can basically listen and talk, we try to turn it into a conversation practice partner that never gets tired. First, let us examine how the voice assistant works (Fig. 1).

Fig. 1. How the voice assistant works

When the user approaches the voice assistant, he wakes up the device with a wake word such as "Alexa" or "OK Google", and then says the skill invocation name to enter a specific skill. After the device captures what the user says, the interaction model maps the sentence to a matching intent and extracts the keywords (also called slots or entities). The application logic, usually written by the developer, determines what data to fetch and executes the corresponding condition-based operation. After the operation, the data is sent back through the interaction model and converted into a suitable format (voice, text, image, video, etc.) for the user.
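As a rough illustration of this flow, the following is a simplified Python simulation, not the real Alexa or Google platform code; the intent names, patterns, and replies are assumptions for illustration only. It shows how an utterance is mapped to an intent and its slots, and how the application logic then builds the reply.

import re

# Hypothetical interaction model: patterns per intent. Order matters here, so
# the specific StopIntent is checked before the catch-all AnswerIntent.
INTERACTION_MODEL = {
    "StopIntent":   [r"stop", r"quit", r"exit"],   # end the practice session
    "AnswerIntent": [r"(?P<sentence>.+)"],         # the sentence the user repeats
}

def match_intent(utterance: str):
    """Map what the user said to an intent name and its slot values."""
    for intent, patterns in INTERACTION_MODEL.items():
        for pattern in patterns:
            match = re.fullmatch(pattern, utterance, flags=re.IGNORECASE)
            if match:
                return intent, match.groupdict()
    return "FallbackIntent", {}

def handle_request(utterance: str) -> str:
    """Application logic: run the branch for the matched intent and build the reply."""
    intent, slots = match_intent(utterance)
    if intent == "StopIntent":
        return "Goodbye, see you at the next practice!"
    if intent == "AnswerIntent":
        return "You said: " + slots["sentence"]
    return "Sorry, I did not catch that."

print(handle_request("stop"))           # -> Goodbye, see you at the next practice!
print(handle_request("I like apples"))  # -> You said: I like apples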

Based on this infrastructure, the researcher designs an oral practice language learning companion: whenever the user wants to practice, he can wake up the machine and hold a conversation practice session with a partner that never gets tired. The system design is shown in the diagram below (Fig. 2).

Fig. 2. How the oral practice language learning companion works

Let us go into the details of the interaction model and the application logic. In the interaction model, we need to clarify what the user says and match it to the right intent to determine the next step. In this skill, we design five custom intents in the interaction model (Table 2).

Table 2. Intents in Read after Jessica

The core operation logic lies in the handler for AnswerIntent, which we will discuss later. Let us first review the interaction flow between the user and the voice assistant (Fig. 3).

Fig. 3. The interaction flow between user & voice assistant

In our design, the user needs to repeat exactly what the voice assistant says, which forces the user to say the sentence very carefully. Sometimes, however, problems arise from the voice recognition system or the microphone hardware, so there is still plenty of room for improvement in this design. Now let us look at the details of AnswerIntent (Fig. 4).

Fig. 4. The logic operation in AnswerIntent

When the voice assistant receives the practice sentence from the user, the voice recognition system converts it to text, and the system determines which intent it belongs to. When the request goes to AnswerIntent, it first removes all symbols and converts the text to lower case; the purpose of this step is to prepare the text for comparison. The system then compares the sentence the user spoke with the sentence in the database to see whether they are the same. If they match, the user gets a positive response and knows he is speaking completely correctly. Otherwise, the user gets a negative response, and on devices with a screen we also display what the system heard, so the user can see how far it is from the right answer. This added visual feedback is useful for the language learner.
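A minimal Python sketch of this comparison step is given below, assuming the target sentences are kept in a simple in-memory list; the skill's actual storage back end and SDK wiring are not shown in the paper, so those parts are placeholders.

import string

PRACTICE_SENTENCES = ["I would like a cup of coffee."]  # hypothetical database

def normalize(sentence: str) -> str:
    """Remove all punctuation symbols and convert to lower case for comparison."""
    no_symbols = sentence.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_symbols.lower().split())

def check_answer(recognized_text: str, target_sentence: str) -> str:
    """Compare what the user said with the target sentence and build the feedback."""
    if normalize(recognized_text) == normalize(target_sentence):
        return "Great, you said it exactly right!"
    # On devices with a screen, also show what the system heard so the learner
    # can see how far it is from the right answer.
    return ("Not quite. I heard: '" + recognized_text +
            "'. The sentence was: '" + target_sentence + "'.")

print(check_answer("i would like a cup of coffee", PRACTICE_SENTENCES[0]))  # positive response
print(check_answer("I would like a cup of tea", PRACTICE_SENTENCES[0]))     # negative response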

3 Results and Discussion

The researcher published this very early version of "Read after Jessica" to the Amazon Alexa skill store.

Here are the results for the first month (Table 3):

Table 3. User testing results

We let the users interact with the voice assistant, taking five sentences as one unit. Regardless of the users' English level, out of every five sentences they got only about one to three exactly right. Based on the user responses shown on the screen, we list the common difference factors below (Table 4):

Table 4. The common differences from users

4 Conclusion

Although this is only the very first version of "Read after Jessica", we found the voice platform to be a good way to simulate an oral practice learning companion. You can use it at any time you want to practice on your own. It behaves like a very strict teacher: you only get encouragement when you are 100% right. But it can show what you just said on the screen, so you can check your speech and do oral practice repeatedly and continuously.

Some types of errors have nothing to do with the users, so the voice assistant platform itself can still be improved, and guidance can be added for users who say almost exactly the same sentence.
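For example, one possible way to detect that a user's sentence is "almost the same" as the target, which is not part of the current system, would be a similarity check such as Python's standard difflib; the 0.8 threshold below is an arbitrary illustration value.

from difflib import SequenceMatcher

def almost_the_same(recognized: str, target: str, threshold: float = 0.8) -> bool:
    """Return True when the two sentences are similar but not identical."""
    a, b = recognized.lower(), target.lower()
    ratio = SequenceMatcher(None, a, b).ratio()
    return a != b and ratio >= threshold

print(almost_the_same("i would like a cup of tea",
                      "i would like a cup of coffee"))  # True: close, but not exact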