1 Introduction

Mobile learning is an extension of e-learning that is enabled through portable, wireless mobile devices and provides an anywhere, anytime learning experience at the learner’s convenience (Kumar & Sharma, 2020; Kumar, Goundar & Chand, 2019; Traxler & Crompton, 2015). Mobile phones offer lower cost and greater flexibility; hence, educational materials are more widely available to students (Aloqaily et al., 2019; Benali & Ally, 2020). Mobile learning has several advantages: (i) learning can take place anywhere, anytime, without geographical constraints, (ii) students can learn in a self-paced way, i.e., at their own pace, (iii) learning materials can be delivered according to the needs and circumstances of the learners, and (iv) it supports collaborative learning (Aubusson et al., 2009; Dashtestani, 2016; Goundar and Kumar, 2021; Mehdipour & Zerehkafi, 2013).

Mobile language learning (MLL) is a subset of mobile learning that provides a mobile-assisted language learning environment to enhance reading and writing skills, vocabulary learning, and sentence-making ability (Hwang & Fu, 2019; Shadiev et al., 2017). Many studies report on the development of mobile language learning applications, but no review paper provides a comprehensive understanding of the development process. In response, a systematic literature review was conducted to develop a body of knowledge that assists researchers working in the field. The objective was to consolidate information on (i) requirements elicitation, (ii) design and implementation, and (iii) evaluation processes. Systematic literature reviews follow established guidelines to conduct the review and present its findings in a fair, reliable, and unbiased manner (Harris et al., 2014). This study used the guidelines proposed by Kitchenham & Charters (2007). In total, sixty-two articles were retrieved from seven different databases, and forty-seven articles were selected after assessment against the inclusion and exclusion criteria. The selected papers were analyzed using the following research questions: (i) What is the current state of the literature? (ii) What are the characteristics of MLL applications? (iii) How are requirements gathered for MLL applications? (iv) How are MLL applications designed and implemented? and (v) How are MLL applications evaluated?

The main contributions of this paper are (i) assessing the studies to create a knowledge base on the development of MLL applications and (ii) consolidating the findings to give future research directions in the field. This paper is structured as follows: the background section reviews the relevant literature on MLL applications; the methodology section explains how the research was planned and executed; the results sections analyze the data collected; and the discussion section presents the findings, recommendations for further research, and threats to the validity of the results. Finally, the conclusion summarizes the study and outlines future work.

2 Background

2.1 Mobile language learning

Mobile language learning (MLL) applications take advantage of features of mobile learning such as spontaneity, portability, interactivity, and accessibility to provide a mobile-assisted language learning environment (Aloqaily et al., 2019; Benali & Ally, 2020). The rapid development of MLL applications has brought a paradigm shift from teacher-centered learning to a more portable, real-time language learning environment (Shadiev et al., 2017; Hwang & Fu, 2019). MLL supports many areas of language learning, such as vocabulary, comprehension, speaking, listening, and writing skills (Elaish et al., 2019). Many second language learners carry a pocket dictionary or personal vocabulary book to assist them in learning a foreign language (Crow & Parsons, 2015; Schiefelbein et al., 2019). This prompted the research community to explore portable wireless mobile devices for language learning, which led to the emergence of mobile language learning. Learners can use mobile devices as educational tools for self-directed learning to improve their language skills (Ohkawa et al., 2018; Zhou et al., 2017). Tommerdahl et al. (2022) examined the efficacy of commercially available foreign language learning apps. They concluded that few studies examine app efficacy, that English was the most commonly taught language, and that vocabulary was the most commonly tested area. Although commercial apps were found to support foreign language learning successfully, the methods of the included studies varied in ways that made direct comparison difficult. Elaish et al. (2019) concluded that teaching English to foreign students via mobile devices, including through voice recognition and interpretation systems, readily improved their English language skills.

2.2 Prior work

Few reviews of mobile language learning exist in the literature. Hwang & Fu (2019) reviewed research on mobile language learning apps from 2007 to 2016, identifying research methods, research challenges, the types of languages and learners, and learning outcomes. The results showed that English as a foreign/second language was the most prevalent target language, while a few studies on native language acquisition had also been conducted. Whereas early studies mostly concentrated on strengthening learners’ individual language skills, in the last five years of the review period researchers began to explore the challenge of addressing multiple language skills in authentic learning contexts. Elaish et al. (2019) undertook a thorough review of the literature on mobile English language learning in order to start an evidence-based debate about its use in English language education. They identified the publication rate, research domains, and language learning issues. Shadiev et al. (2017) reviewed the literature on mobile language learning in authentic contexts from 2007 to 2016. The goal was to identify the latest publication trends, as well as the research topics, technologies employed, methodologies, and current challenges.

Existing reviews mainly focus on understanding mobile language learning in general, examining publication trends, the technologies employed, learning outcomes, and so on. This paper instead consolidates information specifically on the development of MLL applications, such as the requirements elicitation, implementation, and evaluation processes, to assist researchers working in this field.

3 Research method

This study was carried out following the guidelines of Kitchenham & Charters (2007). The systematic literature review process was divided into three stages: planning, conducting, and reporting. The planning stage involved developing the review protocol, the conducting stage involved selecting and reviewing the studies, and the reporting stage involved writing up the review and sharing the findings with the research community. Figure 1 depicts the systematic literature review process adopted in this paper.

Fig. 1 Systematic literature review process

Planning - a review protocol was established, including identifying data sources, search strategy, and inclusion/exclusion criteria.

3.1 Information sources

The study used the following digital libraries, which publish high-quality studies in the relevant field.

3.2 Search strategy

A search string should cover as much ground as possible while remaining reasonable in size (Schardt et al., 2007). The search strings were derived from the research questions identified earlier and combined two key terms to capture results from the databases: (1) mobile language learning as the field of study and (2) development as the specified criterion. Table 1 shows the search strings used.
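The two key-term groups can be combined into a single Boolean query, as sketched below. This is a minimal Python illustration only: the synonym lists are hypothetical placeholders (the exact strings are those in Table 1), and each database's query syntax may differ.

```python
# Hypothetical synonym lists; the strings actually used are those in Table 1.
FIELD_TERMS = ["mobile language learning", "mobile-assisted language learning", "MALL"]
CRITERIA_TERMS = ["development", "design", "implementation"]


def or_group(terms: list[str]) -> str:
    """Join synonyms with OR, quoting multi-word phrases so they match exactly."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_search_string(field: list[str], criteria: list[str]) -> str:
    """AND the field-of-study group with the development-criteria group."""
    return f"{or_group(field)} AND {or_group(criteria)}"


print(build_search_string(FIELD_TERMS, CRITERIA_TERMS))
```

Grouping synonyms with OR inside each key term, and joining the groups with AND, widens recall within each concept while keeping every hit relevant to both concepts.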

Table 1 Search String

3.3 Inclusion criteria/exclusion criteria

Inclusion and exclusion criteria were established to evaluate the retrieved articles. Only articles that met the following criteria were accepted.

The inclusion criteria were as follows:

IC1. The article reports on mobile language learning application development.

IC2. The article is written entirely in English.

The exclusion criteria were as follows:

EC1. The article cannot be used to satisfactorily answer the research questions.

Conducting - after the planning stage, the actual review process started, involving study selection, data extraction, and synthesis.

3.4 Study selection

The search strings were executed on the selected databases, and sixty-two articles were retrieved. After the papers were assessed against the inclusion and exclusion criteria, forty-seven articles were selected and fifteen were eliminated. Papers were removed for the following reasons: the article was listed in multiple databases, the article was not written entirely in English, or the article could not be used to sufficiently answer the research questions. Figure 2 illustrates the study selection process.
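The screening steps can be sketched as a small pipeline that removes duplicates, applies IC2, and applies the relevance criteria, while tallying why each record was dropped. This is a hypothetical Python illustration: the `Record` fields and sample titles are invented, and relevance (IC1/EC1) is a manual reviewer judgment represented here only as a boolean flag.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    language: str   # language the full text is written in
    relevant: bool  # reviewer's manual judgment for IC1 and EC1


def screen(records: list[Record]) -> tuple[list[Record], Counter]:
    """Remove duplicates, then apply IC2 (English) and IC1/EC1 (relevance); tally reasons."""
    seen: set[str] = set()
    included: list[Record] = []
    reasons: Counter = Counter()
    for rec in records:
        key = " ".join(rec.title.lower().split())  # normalize case and whitespace
        if key in seen:
            reasons["duplicate"] += 1
        elif rec.language.lower() != "english":
            reasons["not in English"] += 1
        elif not rec.relevant:
            reasons["cannot answer RQs"] += 1
        else:
            included.append(rec)
        seen.add(key)
    return included, reasons


# Hypothetical records: the same paper indexed twice, a non-English paper, an off-topic paper.
records = [
    Record("A vocabulary app", "English", True),
    Record("A  Vocabulary App", "English", True),
    Record("Una app de gramática", "Spanish", True),
    Record("A generic e-learning study", "English", False),
]
included, reasons = screen(records)
# Every retrieved record is either selected or eliminated (here, 62 = 47 + 15).
assert len(included) + sum(reasons.values()) == len(records)
```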

3.5 Data extraction and synthesis

The data from the selected studies were compiled. The following data were extracted from each primary study:

  • Year of publication.

  • Publication type.

  • MLL applications.

  • Categories.

  • Learning strategies.

  • Requirements elicitation process.

  • Implementation details.

  • Evaluation details.

The extracted data were placed on a shared drive so that they were easily accessible to all the authors. The authors read the papers and extracted the data, and an independent researcher verified the extracted data. The intercoder reliability was greater than 93%, and any disagreement was resolved by mutual consensus. The extracted data were summarized and grouped into tables. Finally, the data were analyzed; the results are presented in the following sections.
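The paper does not state which intercoder reliability statistic produced the 93% figure. Simple percent agreement is the most direct reading; Cohen's kappa is a common chance-corrected alternative. The Python sketch below computes both; the category codes are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    """Proportion of items that both coders labeled identically."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Expected chance agreement from each coder's marginal label frequencies.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:  # both coders used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical category codes for five papers.
author = ["vocabulary", "grammar", "vocabulary", "speaking", "vocabulary"]
verifier = ["vocabulary", "grammar", "speaking", "speaking", "vocabulary"]
print(percent_agreement(author, verifier))  # 0.8
print(round(cohens_kappa(author, verifier), 4))  # 0.6875
```

Kappa is lower than raw agreement here because some agreement on a dominant label such as "vocabulary" would be expected by chance alone.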

Fig. 2 Study selection process

4 Reporting

This section presents the results of the data analysis. The discussion section summarizes the results obtained and provides directions for further research.

5 Current state of literature

In total, 47 research papers were selected from the literature. This was fewer than expected, but sufficient to derive substantial knowledge on the development of MLL applications. There was no restriction on the year of publication. The first paper was published in 2007, and the studies were published during the years in which mobile phones started gaining prominence. The overall publication trend suggests that more papers are likely to be published in the near future, given the increasing rate at which mobile learning is being introduced in the education sector.

Peer-reviewed publications, namely journal and conference papers, were used in this research. Although mobile learning applications are also developed outside academia, peer-reviewed evidence is widely accepted in the research community. Of the selected papers, 23% were journal papers and 77% were conference papers. The larger number of conference papers may be because conference papers are a major publication outlet in computer science: they are more timely, whereas journal papers may take years to be published. The papers are distributed across seven different databases. Table 2 lists the number of papers per database.

Table 2 Number of papers included per database

6 MLL categories

MLL applications were classified into six categories: (i) vocabulary, (ii) reading and writing, (iii) speaking and listening, (iv) pronunciation, (v) grammar, and (vi) conversation. Seven papers spanned two or more categories; these papers were included in each relevant category and also listed under a mixed category. Figure 3 shows the distribution of papers across the six categories, and Table 3 lists the papers in each category.

Fig. 3 Different categories of MLL applications

Vocabulary - this category comprises vocabulary applications. Vocabulary is essential in language learning, as the meanings of new words are often emphasized (Alqahtani, 2015; Turnbull et al., 2017). A vocabulary application offers a quick, easy, and engaging way to learn vocabulary. It combines dictionaries with adaptive learning features that allow language learners to master new words and their meanings.
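The reviewed papers do not specify how the adaptive features work. One widely used approach is the Leitner system, a simple form of spaced repetition, sketched below in Python as an illustration rather than as the method of any reviewed app. The number of boxes and the 2**k review interval are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class LeitnerDeck:
    """Minimal Leitner-box scheduler: words in lower boxes are reviewed more often."""
    num_boxes: int = 3
    boxes: dict[str, int] = field(default_factory=dict)  # word -> box index

    def add(self, word: str) -> None:
        self.boxes.setdefault(word, 0)  # new words start in box 0

    def record(self, word: str, correct: bool) -> None:
        """Promote a word one box on a correct answer; send it back to box 0 on a miss."""
        if correct:
            self.boxes[word] = min(self.boxes[word] + 1, self.num_boxes - 1)
        else:
            self.boxes[word] = 0

    def due(self, session: int) -> list[str]:
        """Words due in a session: box k is reviewed every 2**k sessions."""
        return sorted(w for w, b in self.boxes.items() if session % (2 ** b) == 0)


deck = LeitnerDeck()
for word in ["casa", "perro", "gato"]:
    deck.add(word)
deck.record("casa", True)
deck.record("casa", True)
deck.record("perro", True)
deck.record("gato", False)
print(deck.due(1))  # ['gato']
print(deck.due(4))  # ['casa', 'gato', 'perro']
```

Words the learner keeps missing stay in box 0 and reappear every session, while mastered words are shown less often.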

Reading and writing – includes reading and writing applications. The applications developed under this category involved learning to read and write a foreign language. The reading applications were developed to read characters, words, and sentences. The writing applications revolved around writing letters, words, and sentences.

Speaking and listening – includes speaking and listening applications. Listening and speaking are interconnected in foreign language learning: listening can serve as the basis for speaking, thus contributing to the development of the language learning process (Walter et al., 2017; Yahuarcani et al., 2019). The applications in this category involved mobile-based television programs, visual storytelling, common spoken terms, character speaking, presentations, and interactive scenario-based speech.

Pronunciation - includes pronunciation and dictionary lookup applications. Pronunciation plays a key role when learning a new or second language. When learners pronounce words correctly, they gain the confidence and motivation to excel (Ohkawa et al., 2018; Osipova et al., 2016). Pronunciation applications use speech recognition systems to help ensure that learners pronounce words and characters precisely and accurately. The application teaches the correct pronunciation and directs the learner to repeat it; when a word is mispronounced, the app alerts the learner.
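The repeat-and-alert loop can be sketched as follows, assuming an external speech recognition system has already turned the learner's attempt into a text transcript (that step is not shown). This Python sketch compares text with `difflib.SequenceMatcher`, so it only approximates pronunciation; real systems may score phonemes instead. The 0.8 threshold is an illustrative assumption.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace before comparing."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def check_pronunciation(target: str, transcript: str, threshold: float = 0.8) -> tuple[bool, float]:
    """Compare a recognizer transcript with the target word or phrase.

    `transcript` is assumed to come from an external speech recognition system;
    the 0.8 threshold is an illustrative value, not one reported in the studies.
    """
    score = SequenceMatcher(None, normalize(target), normalize(transcript)).ratio()
    return score >= threshold, score


ok, score = check_pronunciation("Bonjour", "bonjour!")
print(ok, score)  # True 1.0
ok, score = check_pronunciation("bonjour", "banana")
if not ok:
    print("Mispronounced: please listen and repeat.")
```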

Conversation – includes applications that encourage learners to converse while learning a new language. Learners may acquire new languages by conversing (Pham et al., 2018). The applications in this category involved text chat systems, Q & A systems, and social networking sites.

Grammar - includes grammar learning applications. Grammar is also considered a vital element in learning a new language. Language skills such as listening, reading, writing, and speaking cannot be enriched unless grammar is mastered (Fronza & Gallo, 2016). Grammar applications are designed for reading, reviewing, practicing alphabets and words, word formations, listening, proofreading, and writing.

Mixed category – includes papers that span two or more categories. These papers implemented speaking/writing or vocabulary/grammar in the MLL application.

Table 3 Categories of MLL Applications

7 MLL application type

Three major categories were derived to account for the different types of MLL applications. Some MLL applications fell into more than one category; to resolve this, the primary purpose of each application was studied and the application was categorized accordingly. Table 4 lists the papers that used each learning strategy.

Game-based learning - carries pedagogical value, particularly in foreign language teaching and learning (Lukianenko, 2014). This category included word games, alphabet games, scenario games, definition games, and diacritic games. Word games involve discovering, forming, and altering words to learn a language. Alphabet games deal with the alphabets of different languages: a learner goes through a series of lessons on the alphabet to learn the language. In scenario games, a learner goes through different plots or settings and is taught a specific language, usually how to communicate in various situations, which builds communication skills. Definition games teach the meanings of words and concepts in different languages. Diacritic games teach letters that carry signs or symbols written above or below them, which are pronounced differently in various languages. All these applications make use of gamification techniques. Gamification is the application of game-related elements to non-game contexts (Roccetti et al., 2016).
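The core check in a word-formation game can be sketched in Python as below: a guess scores only if it is a known word and can be built from the available letter tiles. The tiles, word list, and one-point-per-letter scoring rule are hypothetical, chosen only to illustrate the mechanic.

```python
from collections import Counter


def can_form(word: str, letters: str) -> bool:
    """True if `word` can be built from `letters`, using each tile at most as often as supplied."""
    need, have = Counter(word.lower()), Counter(letters.lower())
    return all(have[ch] >= n for ch, n in need.items())


def score_guess(guess: str, letters: str, vocabulary: set[str]) -> int:
    """Award one point per letter for a valid word formed from the tiles (hypothetical rule)."""
    if guess.lower() in vocabulary and can_form(guess, letters):
        return len(guess)
    return 0


tiles = "ptsoa"
words = {"stop", "pots", "spot", "taps", "post"}
print(score_guess("spot", tiles, words))   # 4
print(score_guess("toast", tiles, words))  # 0: needs two t's and is not in the list
```

Points for valid words are one simple instance of the game-related elements that gamification adds to language practice.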

Entertainment - plays a pivotal role in modern life. It provides users with a wide range of opportunities for amusement, enlightenment, fun, and pleasure. This category consisted of storytelling, music, instant messaging (IM), social network, and TV program applications, which were pooled under the label “entertainment”. Storytelling apps use interactive art, words, and images to narrate a story or scenario while teaching language skills. Music apps are predominantly used by children to introduce and practice a new language; they teach the alphabet and common words through singing and music. IM apps allow students to communicate while learning a new language. IM is real-time and resembles face-to-face communication (Turnbull et al., 2017). Social networking is a Web 2.0 technology that has transformed language learning (Harrison & Thomas, 2009). It allows new language learners to stay connected with teachers and other students. TV program apps function like a standard television, but broadcast language learning programs on smartphones.

Quiz - this category comprises flashcard, multiple-choice, fill-in-the-blanks, and Q & A system applications. Flashcards are small note cards used to test and improve memory; in language learning, flashcard-based apps test learners on the lessons learned. These apps hold two types of virtual flashcards: question cards and answer cards. A question is first posed to the learner, and the answer card is shown when the learner navigates to it. Multiple-choice applications present questions with a set of answer choices. A learner reads the question and selects an answer; upon submission, the application indicates whether the answer is correct. If it is incorrect, a sample answer is shown for review. Fill-in-the-blanks apps display sentences in the target language containing one or more blanks, often with a list of words that may be used to fill in the missing word(s). A Q & A system app has a library of questions: the learner begins with a lesson and, at the end of the lesson, is tested on the learning outcomes. Questions in a Q & A system may be multiple-choice, matching, fill-in-the-blanks, short-answer, etc. After attempting the questions, the learner can review their answers.
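The marking logic behind the multiple-choice and fill-in-the-blanks formats described above can be sketched as two small Python classes. The class names, the "___" blank marker, and the sample Spanish items are hypothetical; the sketch marks answers, shows the correct answer on a miss, and ignores case and surrounding spaces in typed responses.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoice:
    prompt: str
    choices: list[str]
    answer_index: int

    def check(self, selected: int) -> tuple[bool, str]:
        """Mark a selection and return feedback that reveals the correct answer on a miss."""
        if selected == self.answer_index:
            return True, "Correct!"
        return False, f"Incorrect. Correct answer: {self.choices[self.answer_index]}"


@dataclass
class FillInTheBlanks:
    template: str       # blanks are marked with "___"
    answers: list[str]  # expected word for each blank, in order

    def __post_init__(self) -> None:
        if self.template.count("___") != len(self.answers):
            raise ValueError("need exactly one answer per blank")

    def check(self, responses: list[str]) -> list[bool]:
        """Mark each blank, ignoring case and surrounding whitespace."""
        if len(responses) != len(self.answers):
            raise ValueError("need exactly one response per blank")
        return [r.strip().lower() == a.lower() for r, a in zip(responses, self.answers)]


mcq = MultipleChoice("'Gato' means:", ["dog", "cat", "bird"], answer_index=1)
print(mcq.check(0))  # (False, 'Incorrect. Correct answer: cat')
blank = FillInTheBlanks("Yo ___ estudiante.", ["soy"])
print(blank.check([" Soy "]))  # [True]
```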

Table 4 Learning strategies

8 Target audience

Adults were the primary target audience of MLL applications, and 83% of the targeted adults were students. Language learning took place mostly in universities due to the greater availability of resources, the presence of students from different languages and cultures, increased ownership of mobile devices, and grants invested in motivating language learning. The second most targeted audience was children. Of these, 88% were students, as language learning mainly occurred in primary, elementary, and high schools. According to Chang and Hwang (2019), children learn faster than adults, mainly because a child’s brain is more receptive and flexible than an adult’s. Thus, learning a new or second language is easier and more enjoyable for children. The other targeted children were at home or in rural areas. The applications developed for children at home provided after-school activities to broaden and enhance learning. Children in rural areas were targeted due to a lack of learning resources and limited services. The apps developed for children used gamification techniques and features.

The third most common target was a mixed audience, which caters to all types of users, from children to adults and from second language learners to refugees. This audience uses language learning applications in a ubiquitous environment, anywhere and anytime. The next ranked target audiences were foreigners and teachers. Foreigners use language learning applications when visiting foreign countries with different cultures and languages, enabling them to communicate effectively with locals during their stay. Foreign language apps also provide dictionary lookup features. Teachers use language learning applications for teaching in schools and in distance learning. In schools, apps are used to teach new languages to students, providing a quick and practical way to deliver the syllabus and meet the intended learning outcomes. In distance learning, teachers use the application to update the syllabus and obtain consistent feedback on student performance.

The next ranked target audiences were refugees, second language learners, and English language learners. Refugees are individuals who are forced to leave their own countries due to unforeseen circumstances. They settle in other countries, usually ones where the culture and language are entirely new (MacFarlane et al., 2008). Refugees use language learning applications to reduce the language barrier so that they can integrate and build their livelihoods. Second language learners are individuals who wish to learn languages other than their native language; for them, language learning apps are a convenient and effective learning tool. English is a widely used international language (Patel & Jain, 2008), so many people are motivated to learn it. Individuals who are not fluent and proficient in English use language learning applications to learn it. Furthermore, working professionals are often required to learn English and other languages to excel in the workplace. For effective and flexible learning, they engage in mobile-based language learning, which allows them to enhance their language skills anywhere and anytime.

9 Requirements Elicitation

Requirements elicitation is the process of gathering requirements. Requirements can be gathered from various sources. For MLL applications, requirements were gathered from literature reviews, preliminary studies, motivational factors, and technological advancements. In some instances, two or three different methods were used to gather requirements. Figure 4 provides statistics on different methods used to gather requirements. The different sources are explained below:

Fig. 4 Distribution of papers on requirements elicitation

The literature review was the most popular method used to gather requirements. Through literature reviews on MLL applications, the authors were able to identify and assess prior work carried out in the field. The majority of researchers focused on identifying research gaps (e.g., Lehman et al., 2020; Berns et al., 2016; Zhang & Zou, 2020), while others concentrated on building on existing research to develop optimized MLL applications (e.g., Nazare et al., 2017; Segaran et al., 2014). Preliminary studies were also used to gather requirements from users. Chang et al. (2018), Metafas & Politi (2017), and Tsuei & Huang (2018) used questionnaires to gather preliminary data for app development. Walter et al. (2017) and Chang et al. (2013) conducted interviews. Turnbull et al. (2017), Tsuei & Huang (2018), Kingsley et al. (2016), Thi Hien et al. (2018), and Fronza & Gallo (2016) conducted surveys to collect data. Chang et al. (2013) and Hassan et al. (2019) used observation to gather pre-development data.

Motivational factors covered instances where data were gathered by examining existing MLL applications. Rankin & Edwards (2017), Osipova et al. (2016), and Khalil et al. (2020) are among the authors who collected data from existing applications or systems. Marciano et al. (2015) collected data from a previous project, which was evaluated to identify faults and gaps. Palomo-Duarte et al. (2016) studied a prototype as a motivational factor to gather data for developing a mobile language learning app. Technological advancement refers to new technologies that enhance the teaching and learning process and build learners’ interest (Halili, 2019). Park et al. (2011) and Ninan et al. (2019) gathered data on technological advancements and used it effectively in MLL applications; based on this data, the authors developed mobile language learning apps.

Overall, four different methods (literature review, preliminary studies, motivational factors, and technological advancements) have been used to gather requirements. A notable observation was the lack of a requirements catalog in the domain that could assist MLL developers, which warrants further work to establish one. Such a catalog would have several advantages: (i) it would save time and cost in gathering requirements, (ii) it would provide up-to-date requirements, and (iii) it would provide an opportunity to extend requirements.
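No such catalog exists yet, so the structure below is purely a hypothetical Python sketch of what one entry and a lookup might look like. The field names, ID scheme, and sample requirements are invented; only the four elicitation sources come from this review. Recording the source of each requirement keeps entries traceable, and looking them up by category supports reuse.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """The four elicitation sources identified in this review."""
    LITERATURE_REVIEW = "literature review"
    PRELIMINARY_STUDY = "preliminary study"
    MOTIVATIONAL_FACTOR = "motivational factor"
    TECHNOLOGICAL_ADVANCEMENT = "technological advancement"


@dataclass(frozen=True)
class Requirement:
    req_id: str
    statement: str
    category: str               # e.g. one of the six MLL categories
    sources: frozenset[Source]  # where the requirement was elicited


class RequirementsCatalog:
    """Hypothetical reusable catalog keyed by requirement ID."""

    def __init__(self) -> None:
        self._items: dict[str, Requirement] = {}

    def add(self, req: Requirement) -> None:
        """Add a requirement; a repeated ID is rejected so entries stay traceable."""
        if req.req_id in self._items:
            raise KeyError(f"duplicate requirement ID: {req.req_id}")
        self._items[req.req_id] = req

    def by_category(self, category: str) -> list[Requirement]:
        """Return the requirements for one category, ordered by ID."""
        return sorted((r for r in self._items.values() if r.category == category),
                      key=lambda r: r.req_id)


catalog = RequirementsCatalog()
catalog.add(Requirement("VOC-02", "Let learners replay the audio for a word.", "vocabulary",
                        frozenset({Source.PRELIMINARY_STUDY})))
catalog.add(Requirement("VOC-01", "Show the meaning of each new word.", "vocabulary",
                        frozenset({Source.LITERATURE_REVIEW, Source.MOTIVATIONAL_FACTOR})))
print([r.req_id for r in catalog.by_category("vocabulary")])  # ['VOC-01', 'VOC-02']
```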

10 Design and implementation

Different software technologies used in the development of MLL applications were analysed. A total of nine categories of technologies were identified. The categories included app development technology, speech technology, algorithms and programming technology, database technology, gamification technology, cloud technology, prototyping technology, client/server architectural technology, and image recognition and video technology. Table 5 shows the different technologies used for MLL application development.

Table 5 MLL application development technology

Many different technologies have been used to develop MLL applications, yet there is a lack of evaluation criteria to assist in selecting appropriate software development tools. None of the studies cited the literature to justify why a particular software tool was selected. Selecting the wrong software tool can adversely affect the time and cost of MLL projects. Therefore, a framework is needed to assist developers in selecting appropriate software tools.

11 Testing

Testing is conducted to check whether MLL applications meet the specified requirements. Two types of testing were used: usability testing and functional testing. Functional testing was the most commonly used evaluation method and was used to assess how well learning outcomes and goals were achieved. Data gathering methods included questionnaires, interviews, observation, feedback, and surveys, and some studies adopted more than one technique. Figure 5 provides statistics on the functional testing methods.

The methods are described below:

Questionnaires - a list of questions was designed to understand the functionality of MLL applications. The questions were based on the functionality of the developed MLL applications.

Interviews - interviews consisted of semi-structured and structured questions. In most cases, focus group interviews were conducted with potential users.

Observation – the developers observed potential users using the applications and judged whether the applications provided the desired functionality.

Feedback – online feedback or feedback through email was requested for the MLL applications. Feedback was used to look for potential flaws in the application.

Survey – an online survey was conducted with potential users of the application. This was the least commonly used functional testing method.
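Because some studies combined several data gathering methods, per-method counts like those in Figure 5 are multi-label tallies: each study counts once for every method it used, so per-method shares can sum to more than 100%. A minimal Python sketch with hypothetical study IDs:

```python
from collections import Counter


def tally_methods(studies: dict[str, set[str]]) -> Counter:
    """Count the studies using each method; a multi-method study counts once per method."""
    return Counter(method for methods in studies.values() for method in methods)


def shares(studies: dict[str, set[str]]) -> dict[str, float]:
    """Share of studies using each method (shares can sum to more than 1)."""
    counts = tally_methods(studies)
    return {m: c / len(studies) for m, c in counts.items()}


# Hypothetical study IDs and methods.
studies = {
    "S1": {"questionnaire", "interview"},
    "S2": {"questionnaire"},
    "S3": {"questionnaire", "observation", "feedback"},
    "S4": {"survey"},
}
print(tally_methods(studies).most_common(1))  # [('questionnaire', 3)]
print(sum(shares(studies).values()))  # 1.75
```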

Evaluation of MLL applications needs further exploration. None of the analyzed studies applied an evaluation method explicitly designed for MLL applications; in other words, the evaluation of MLL applications lacks standardization.

Fig. 5 Functional testing methods

In a few studies, usability testing was also applied to ensure that users could use the applications with ease. Most of these authors evaluated the MLL applications against the usability heuristics proposed by Nielsen & Molich (1990) in small controlled experiments. The sample size needed to test MLL applications has received little attention; only a few studies provided references to justify their sample size.
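One frequently cited way to reason about usability test sample size is the problem-discovery model associated with Nielsen and Landauer, in which n users are expected to find a proportion 1 - (1 - p)^n of the usability problems, where p is the probability that a single user uncovers a given problem. The Python sketch below uses p = 0.31, the average reported in that work. This value is an assumption here and may not hold for MLL applications, which is one reason a controlled experiment would help.

```python
def proportion_found(n_users: int, p: float = 0.31) -> float:
    """Expected share of usability problems found by n users: 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n_users


def users_needed(target: float, p: float = 0.31) -> int:
    """Smallest number of users whose expected coverage reaches `target`."""
    if not (0 < target < 1 and 0 < p < 1):
        raise ValueError("target and p must lie strictly between 0 and 1")
    n = 1
    while proportion_found(n, p) < target:
        n += 1
    return n


for n in (1, 3, 5):
    print(n, round(proportion_found(n), 2))
print(users_needed(0.85))  # 6
```

Under these assumptions, five users find about 84% of problems. A lower per-user detection rate would require noticeably more participants.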

12 Discussion

12.1 Findings

The main findings of the study are as follows;

Six categories of MLL applications have been developed: vocabulary, reading and writing, speaking and listening, pronunciation, grammar, and conversation. Amongst these, vocabulary learning is the most common. Different learning strategies, such as game-based learning, entertainment, and quizzes, have been used to enhance the learning process. A variety of languages have been used in MLL applications, with English the most dominant. The target audience of MLL applications consists mainly of adults, who are mostly university students, and children in elementary, primary, and high schools. Foreigners who use MLL applications when visiting countries with different cultures and languages also form a target audience.

Requirements for new MLL applications were gathered from literature reviews, preliminary studies, motivational factors, and technological advancements. While the literature was the main source for eliciting requirements, efforts were also made to gather requirements from preliminary studies and similar applications. There is no standard set of requirements available, and the analyzed studies reinvented the wheel when gathering requirements. Further work is warranted to define a requirements catalog that would assist in the development of MLL applications.

Different software tools have been used for developing MLL applications, mainly app development tools. Because MLL applications are developed using different technologies, comparing them is also challenging. Further research is needed to validate these software tools and measure their effectiveness in developing MLL applications. Very few papers discussed the deployment model.

Testing methods used for MLL applications are similar to those used for other mobile learning applications. Functional testing was used to verify that applications met their intended goals, using instruments such as questionnaires, interviews, observation, feedback, and surveys; some studies adopted a mixed-methods approach. The sample size needed to test MLL applications has received little attention, and only a few studies provided references to justify their sample size. Usability testing was used to determine the ease of use of MLL applications, and most applications were validated against Nielsen’s heuristics. This warrants further study to develop a customized set of heuristics for MLL applications.

12.2 Recommendations

The findings have implications for subsequent research. Future work can strengthen the field in the following ways:

  • There is no standard set of requirements for developing MLL applications. Different methods have been used to gather requirements for MLL applications. Further research is needed to establish a requirements catalog that would assist developers working in the field.

  • The lack of standardization in developing MLL applications can cause problems with uniformity, reusability, development cost, and reliability. A standardized approach to developing MLL applications, in the form of a model or framework, is needed.

  • Testing of MLL applications has been done differently in different studies, which makes comparing the results of different MLL applications difficult. For a more effective evaluation process, a standardized evaluation method should be developed for MLL applications.

  • A controlled experiment is needed to validate the number of participants required for testing MLL applications. Validating the sample size will help ensure the proper use of resources when testing MLL applications.

  • Automated usability testing methods have been employed in several domains. Further study is required to develop an automated testing method for MLL applications. This would also save time and effort for the developers.

12.3 Limitations

To ensure the reliability of the results, the systematic literature review was planned and executed based on the guidelines proposed by Kitchenham and Charters (2007). Since the review strictly conformed to these guidelines, replications should not yield significantly different results. A limitation of the study is that journal and conference papers were retrieved from selected venues, which may have omitted some articles published in standalone journals and conferences. Thus, the results should be qualified as applying to studies published in major journals and conferences.

13 Conclusion

This study presents the results of a systematic literature review on the development of MLL applications, which provides a consolidated body of knowledge on the topic. The review process was designed, executed, and analyzed according to the identified research questions. In total, 47 papers were selected and analyzed. The results provide insight into (i) the current state of the literature, (ii) requirements elicitation, (iii) implementation, and (iv) evaluation processes. The study’s findings provide useful information on MLL development and recommendations for subsequent research. Future work could extend this systematic literature review through yearly updates that not only repeat the review but also adapt each iteration according to lessons learned from previous iterations.