Introduction

Multilingual language development is characterized by extreme heterogeneity such that two children growing up with the same languages can vary vastly in their language learning experiences, and ultimately, proficiency in each language. This heterogeneity in language learning, exposure, and use can complicate language assessment services. Speech-language pathologists, professionals who are responsible for conducting evidence-based language assessment, often feel underprepared to work with children from multilingual backgrounds [1]. In fact, children from multilingual backgrounds are often under- and/or over-diagnosed with language impairment [2]. Underdiagnosis may occur when practitioners adopt the “wait and see” approach, waiting to evaluate until the child becomes more fluent in English. Overdiagnosis may occur when practitioners assess solely in English, not taking into account the child’s skills in the other language(s). Another major barrier to providing culturally and linguistically responsive assessment services is the lack of appropriate assessment tools for multilingual children resulting in the incorrect use of monolingual norms to make diagnostic decisions for multilingual children [3]. An additional complicating factor includes the lack of trained bilingual speech-language pathologists, with only about 8% of speech-language pathologists in the USA reporting they are bilingual [4]. This results in a client–clinician mismatch [5]. Together, these issues are in fact a public health crisis given that there are 12 million multilingual children in the USA [6].

Despite these barriers, researchers have devoted a significant amount of time and effort to developing assessment tools and procedures to improve assessment practices for multilingual children. Recommendations made over the years include testing all languages instead of assessing only the primary society language [7, 8], using conceptual scoring to take into account knowledge across all languages [9, 10], developing and adapting assessments specifically designed for multilingual children [11], and creating alternative assessments that move beyond assessing static language knowledge [7, 12]. Recently, a converging evidence approach was presented as a best practice for assessment of multilingual children [13••]. Specifically, in order to make a clinical decision regarding whether or not a multilingual child has a language impairment, the clinician must gather and synthesize information from four areas: parent and/or teacher concern ratings, language samples in all the child’s languages, standardized assessments appropriate for the child’s cultural and linguistic background, and measures of learning potential (i.e., dynamic assessment). Thus, clinical decisions must be made with sufficient evidence across multiple areas of assessment. In addition to the converging evidence framework, there has been a call for clinicians to approach language assessment of multilingual children within the framework of “disorder within diversity,” where the language skills of multilingual children are compared to the language skills of other multilingual children with similar linguistic experiences/backgrounds instead of to those of monolingual children [14].

The goal of the present paper is to review the most recent updates in the field of speech-language pathology regarding culturally and linguistically responsive assessment approaches for multilingual children. Specifically, research studies conducted within the past 5 years were reviewed to assess the most recent updates, recommendations, and advances in the field of speech-language pathology. The paper focuses on the diagnosis of developmental language disorder, which is defined as a neurodevelopmental disorder characterized by persistent difficulties in the child’s ability to comprehend and use language that is not associated with a biomedical condition [15]. In multilingual children, these comprehension and production difficulties are present in all the child’s languages.

Method

Articles for the present review were identified and selected following the framework outlined by Arksey and O’Malley [16] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [17]. Arksey and O’Malley [16] outlined a five-stage process for conducting a scoping review: identify the research question, identify relevant studies, select the relevant literature, chart the data, and summarize and report the results. They also included an optional sixth step, consultation with stakeholders. In the present review, a keyword search was conducted in the following databases: (1) EBSCO ERIC, (2) MedLine, (3) ProQuest, and (4) ScienceDirect. Keywords included multilingual, bilingual, developmental language disorder/language disorder, and assessment. Empirical studies that met all the following criteria were included in the present review: (1) focused on multilingual children with developmental language disorder, birth to 18 years of age, (2) focused on language assessment considerations, (3) published in a peer-reviewed journal within the past 5 years (2018–2022), and (4) published in the English language. Review articles, book chapters, conference abstracts, and articles written in languages other than English were excluded from the present review.

The initial step in the review process included a screening of the titles and the abstracts of all articles identified through the keyword search in each database listed above. The full text of the articles deemed relevant to the topic of this review paper was retrieved for further examination. The first author conducted the initial screening.

Results

The first author conducted the initial screening of the articles identified through the keyword search. A total of 73 articles across the four databases were identified. Of those, 10 were duplicates and thus removed, resulting in 63 relevant articles. After full-text examination, 40 articles were excluded because they did not meet the aforementioned inclusion criteria. This resulted in a total of 23 articles that met eligibility criteria. As an inter-rater reliability check, a research assistant reviewed all 73 initially identified articles to ensure that all articles selected for review met the inclusion criteria outlined above. The check resulted in 98% agreement. Ambiguities were discussed, and agreement reached 100% after discussion. The 23 articles selected for the review all presented primary data. Those articles were reviewed by the first author, who identified overarching themes based on the assessment approach(es) discussed in each article. Eight themes related to assessment approaches of multilingual children were identified and are outlined below. Both authors reached 100% agreement on all themes identified (Fig. 1).

Fig. 1 Article selection process
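The counts reported for the selection process can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inclusion decisions passed to `percent_agreement` are invented for demonstration rather than taken from the review. Only the counts 73, 10, and 40 come from the text.

```python
def selection_flow(identified, duplicates, excluded_full_text):
    """Return the article counts at each stage of the selection process."""
    screened = identified - duplicates          # after removing duplicates
    included = screened - excluded_full_text    # after full-text examination
    return {"identified": identified, "screened": screened, "included": included}


def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Counts reported in the Results section.
flow = selection_flow(identified=73, duplicates=10, excluded_full_text=40)
print(flow)

# Hypothetical include/exclude decisions for four articles (not real data).
print(percent_agreement([True, False, True, True], [True, False, False, True]))
```

Simple percent agreement does not correct for chance agreement; it is used here only because it is the statistic the review reports.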

Consider All Languages

Two studies addressed screening approaches of multilingual children [18•, 19••]. These studies found that, in general, multilingual children are screened at least 3 months later than monolingual children [18•]. Directly screening both languages is the recommended best practice [19••]. Screening multilingual toddlers in only one of their languages resulted in many false positives, unnecessarily increasing assessment referrals [19••]. It was also found that when toddlers were administered a screening tool in the major society language while parent report was used to screen for concerns in the native language, this resulted in high specificity but low sensitivity. Thus, many children who would benefit from a full assessment were not identified. Together, these studies indicate that direct screening/assessment of all languages by professionals appears to be best practice, while parent report should be used as supplemental information.
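Sensitivity and specificity, the two measures of screening accuracy referred to above, can be illustrated with a short sketch. The counts below are hypothetical and are not taken from the cited studies; they are chosen only to show what a "high specificity but low sensitivity" pattern looks like numerically.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of children with the disorder whom the screener flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of children without the disorder whom the screener passes."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening outcome for 100 toddlers, 20 of whom have a disorder.
true_pos, false_neg = 8, 12    # 12 children with a disorder are missed
true_neg, false_pos = 76, 4    # 4 typically developing children are flagged

print(f"sensitivity = {sensitivity(true_pos, false_neg):.2f}")
print(f"specificity = {specificity(true_neg, false_pos):.2f}")
```

In this invented example, most typically developing children pass the screener, yet more than half of the children who would benefit from a full assessment are not identified.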

Best Language Scoring Method

One recent assessment has been developed for bilingual English–Spanish speaking children between the ages of 4 and 6 years. The Bilingual English Spanish Assessment (BESA) [11] uses the best language scoring method to make a clinical decision regarding developmental language disorder. That is, both languages are assessed, but the best score from each subtest (phonology, semantics, and morphosyntax) is used for the final language index score. The BESA has been shown to have excellent sensitivity and specificity for Spanish–English bilingual children between the ages of 4 and 6 years using the best language scoring method. One study [20••] presented an extended version of the BESA, the Middle Extension (BESA-ME) [21••], designed for school-age children between 7 and 11 years old. The results revealed that the best language scoring method was valid for older children as well. Another study [21••] extended the use of the BESA to Spanish–English bilingual children who speak African American English dialect and found that the BESA is a valid assessment tool for bilingual dialect speakers. Together, these studies illustrate the importance of using assessment tools designed specifically for multilingual children while taking into account knowledge across all the child’s languages.
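The selection step of best language scoring can be sketched as follows. This is a minimal illustration, not the BESA’s published scoring procedure: the function name and the scores are hypothetical, and how the selected subtest scores are combined into the final language index score is not shown because the text does not describe it.

```python
SUBTESTS = ("phonology", "semantics", "morphosyntax")


def best_language_scores(scores_by_language):
    """For each subtest, keep the higher score and the language it came from."""
    best = {}
    for subtest in SUBTESTS:
        language, score = max(
            ((lang, scores[subtest]) for lang, scores in scores_by_language.items()),
            key=lambda pair: pair[1],
        )
        best[subtest] = (language, score)
    return best


# Hypothetical subtest scores for one child (not real data).
child = {
    "Spanish": {"phonology": 95, "semantics": 88, "morphosyntax": 102},
    "English": {"phonology": 90, "semantics": 97, "morphosyntax": 85},
}

for subtest, (language, score) in best_language_scores(child).items():
    print(f"{subtest}: {score} ({language})")
```

The point of the sketch is that the child’s strongest performance may come from a different language on each subtest, so scoring either language alone would underestimate the child’s skills.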

Specific Language Skills

Two specific areas of multilingual language development were addressed in the articles selected for this review: code switching and morphosyntax. Code switching is a bilingual phenomenon where speakers alternate between their languages, either within the same sentence or between sentences, and it is typical of multilingual language development. One study [22•] found that Spanish–English school-age children with developmental language disorder engaged in the same type and number of code switching behaviors as their neurotypical peers. Thus, the authors conclude that analysis of code switching behavior should not be used as part of the assessment process to rule in or rule out developmental language disorder. Instead, clinicians should allow children to code switch during language assessment, as this is a typical feature of bilingual language use.

A number of studies evaluated the utility of morphosyntax measures for assessment of multilingual children [23•, 24, 25, 26•, 27•, 28•]. This is not surprising given that children with developmental language disorder have difficulties learning and using morphology and syntax [29•, 30, 31]. The articles focused on grammatical features of specific languages such as Spanish [23•, 26•], Turkish [28•], and Welsh [27•]. One study [28•] showed that children with developmental language disorder demonstrate morphosyntactic differences even when learning a less morphologically rich language. Overall, the articles confirmed the need to assess morphosyntax extensively but urged clinicians to assess a variety of morphological structures in each language [25] because the extent of morphosyntactic difficulties varies depending on the typology of the child’s language.

Language Sampling

Use of language samples as a bias-free assessment tool has been recommended for multilingual children [32]. One study [33•] found a positive correlation between standardized assessment scores and language sample measures in bilingual Spanish–English school-age children. However, they demonstrated that each measure provides unique information and the utility of language samples varied by age. Specifically, the use of wordless picture books for story retelling appears to be more suitable for younger children between 5 and 8 years old than for older school-age children. Thus, language sample measures should be used when assessing multilingual children, but they should be used in conjunction with other tools.

In another study [34••], several changes were proposed to the traditional language sampling measures. The rationale for the proposed changes was that many speech-language pathologists avoid language sampling because of the time required to collect, transcribe, and analyze the sample. To determine the feasibility of alternate, shortened language sampling procedures, parents were asked to present a book to their child as they typically would and then ask their child to retell the story. Parents were then asked to report back the longest utterances they heard their child produce. Two measures were then calculated: the length of the longest utterance produced by the child and the mean length, in words, of the three longest utterances (i.e., the total number of words in the three longest utterances divided by three). Results revealed that these alternative, less time-consuming measures appear to provide reliable information about children’s language skills, as they correlated significantly with traditional language sample measures such as number of different words and mean length of utterance. Together, these two studies demonstrate the diagnostic utility of language sample measures when assessing multilingual children; however, these measures should be used in conjunction with other tools, and clinicians should be trained to elicit and analyze language samples according to procedures suitable for their clinical setting.
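The two shortened measures described above can be computed as in the following sketch. It assumes that words are counted by splitting on whitespace, which is a simplification introduced here; the original study may have applied different word-counting conventions. The function names and example utterances are hypothetical.

```python
def utterance_length(utterance):
    """Length of an utterance in words (assumption: whitespace-separated tokens)."""
    return len(utterance.split())


def longest_utterance_measures(reported_utterances, top_n=3):
    """Return the longest utterance length and the mean length of the top_n longest."""
    lengths = sorted((utterance_length(u) for u in reported_utterances), reverse=True)
    if not lengths:
        raise ValueError("at least one utterance is required")
    top = lengths[:top_n]  # uses fewer than top_n if fewer were reported
    return {"longest": top[0], "mean_of_longest": sum(top) / len(top)}


# Hypothetical parent-reported utterances from a story retell (not real data).
reported = [
    "the dog ran away",
    "he found the frog in the jar",
    "el niño buscó la rana",
    "frog gone",
]

print(longest_utterance_measures(reported))
```

Utterances in either language are treated the same way here, which is consistent with the recommendation to allow children to use all their languages during assessment.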

Dynamic Assessment

Unlike standardized assessments, which assess static knowledge, dynamic assessment assesses learning potential. Thus, dynamic assessment is thought to be less biased because it does not depend on past experiences and opportunities but instead provides information about the child’s ability to learn new information. It was pointed out in one study [35••] that despite the high success rate of dynamic assessment across multiple cultures and languages, dynamic assessment is not commonly used in clinical practice. Accordingly, these authors recommended using a standardized dynamic assessment procedure to allow for ease of administration and scoring. In their study [35••], a standardized approach was used to conduct a dynamic assessment focusing on story retelling. Over 3 days, they conducted a pretest of children’s narrative retell ability (day 1), explicitly taught narrative retell skills (day 2), and conducted a posttest of narrative retell abilities (day 3). Their assessment approach resulted in high sensitivity and specificity for school-age multilingual children. The high classification accuracy, paired with the results of previous studies [36], supports the use of dynamic assessment when assessing the language skills of multilingual children. Similar results were obtained by another study [37], which further demonstrated cross-linguistic benefits: children made gains in their storytelling abilities across all their languages irrespective of the language of the teaching session. In another study [38], the utility of dynamic assessment was assessed using an inferential word learning task where children were required to infer the meaning of novel words based on surrounding text/context. They demonstrated high diagnostic accuracy using this task to identify multilingual children with developmental language disorder. Together, all three studies confirmed the utility of dynamic assessment.

Alternative Assessments

Several alternative measures have been proposed to reduce bias when assessing multilingual children. In the articles selected for the present review, these alternative measures include non-word repetition and statistical word learning. Non-word repetition tasks have been studied for over two decades and have been suggested as an appropriate assessment tool to reduce bias in assessment. In a non-word repetition task, children are asked to repeat syllable sequences increasing in length and complexity. The syllable strings are novel words that resemble the phonology of one language or are crosslinguistic in nature such that the phonology represents multiple languages. The question often asked in studies of non-word repetition with multilingual children concerns the most appropriate method for structuring the novel words. In two recent studies [39•, 40•], it was found that overall, non-word repetition tasks are effective in differentiating children with and without developmental language disorder. Furthermore, monolingual children and bilingual children performed similarly on non-word repetition tasks, further confirming that non-word repetition tasks are less biased assessment tools [39•]. However, clinicians are cautioned against using a single language-specific non-word task even if all languages spoken by multilingual children contain similar phonology [40•]. Thus, multilingual children should be assessed using words that are representative of all their languages.

Statistical word learning has also been suggested as a less biased assessment tool for multilingual children. During statistical word learning, children are exposed to a stream of input requiring them to track transitional probability to identify word boundaries. One study [41] found that both multilingual and monolingual children with developmental language disorder experienced difficulty learning words during a statistical word learning task. Furthermore, statistical word learning ability was a strong predictor of the severity of developmental language disorder, such that children with poorest performance on the statistical word learning task had lowest language skills. This study demonstrated that statistical word learning tasks may be used in conjunction with other measures (i.e., converging evidence) to aid in diagnosing language impairment in multilingual children.
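The notion of transitional probability used in statistical word learning tasks can be made concrete with a small sketch. The transitional probability from syllable x to syllable y is the number of times x is immediately followed by y divided by the number of times x is followed by any syllable. The three made-up "words" and their presentation order below are illustrative assumptions and are not the stimuli from the cited study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Map each adjacent pair (x, y) to count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only in positions where another syllable follows it.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical three-syllable "words" concatenated into a continuous stream.
WORDS = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
ORDER = "ABCACBAB"
stream = [syllable for label in ORDER for syllable in WORDS[label]]

tp = transitional_probabilities(stream)
print("within word  bi -> da:", tp[("bi", "da")])
print("across words ku -> pa:", tp[("ku", "pa")])
```

In this toy stream, syllable pairs inside a word always co-occur, whereas pairs that span a word boundary are less predictable. That drop in transitional probability is the cue children are assumed to track when identifying word boundaries.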

Technology

In cases where there are no professionals who speak the child’s languages, the use of speech recognition technology has been proposed to aid in screening children’s language skills. Specifically, dual-language automatic speech recognition (ASR) has been examined for use with multilingual children [42••]. ASR is the speech recognition technology behind tools such as Google Assistant and Amazon Alexa, and it can recognize a vast number of languages. In the study [42••], it was proposed that ASR can allow speech-language pathologists to screen the language skills of children whose languages they themselves do not speak. These authors specifically state that ASR would not currently be the best approach for a comprehensive assessment but could serve as a starting point for screening. In their study, they successfully used the Google Cloud non-streaming REST speech-to-text API to transcribe bilingual English–Spanish school-age children’s responses. Language scores based on human transcription were compared with scores based on ASR transcription, with favorable outcomes: the ASR measure yielded the same sensitivity as the human coding but lower specificity. Therefore, assessment tools specifically programmed to use ASR may be a helpful method to screen the language skills of multilingual children.

Screening tablet applications also show promise as a reliable tool. In one study [43•], the Receptive Vocabulary Screener (RVS) for German-Polish and German-Turkish speaking children was reliably used to screen receptive language skills. The benefit of both the tablet application and the ASR is that the examiner is not required to speak the child’s languages.

Local Norms

Measures of English language skills may provide important information about a multilingual child’s language skills, but it is important not to rely solely on such measures. One study [44] found that the use of a comprehensive monolingual test battery may provide valuable information about multilingual children’s language skills, especially for those who speak languages for which formal assessments are not available. However, it should be pointed out that this study specifically focused on English Language Learners who required support in acquiring English. Another study [23•] showed that Spanish–English bilingual children exposed to English at least 40% of the time achieved high scores on an English assessment of morphosyntax. In both studies, there was variability in performance, indicating that although English measures may provide important information, it is crucial to assess both languages. In fact, assessments developed for monolingual children should be adapted for multilingual children by developing multilingual/local norms [45••].
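The effect of the reference group on score interpretation can be illustrated with a simple standard-score calculation. The sketch below is hypothetical throughout: the raw scores, the two norming samples, and the function name are invented for illustration, and no diagnostic cutoff is implied.

```python
from statistics import mean, stdev


def z_score(raw_score, norm_sample):
    """Standardize a raw score against the mean and SD of a norming sample."""
    return (raw_score - mean(norm_sample)) / stdev(norm_sample)


# Hypothetical raw scores on the same test from two norming samples (not real data).
monolingual_norms = [50, 55, 60, 65, 70]
local_multilingual_norms = [35, 40, 45, 50, 55]

child_raw_score = 45
print(f"vs. monolingual norms: z = {z_score(child_raw_score, monolingual_norms):.2f}")
print(f"vs. local norms:       z = {z_score(child_raw_score, local_multilingual_norms):.2f}")
```

With these invented numbers, the same raw score falls well below the mean of the monolingual sample but exactly at the mean of the local multilingual sample, which is the kind of discrepancy that motivates developing local norms.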

Discussion

The goal of the present paper was to review and synthesize the most recent updates and recommendations for the assessment of multilingual children. A synthesis of research findings from articles published in the past 5 years aligns with the converging evidence framework recently outlined in the literature [13••]. That is, no measure is sufficient in isolation; clinicians must thoroughly assess and monitor progress by synthesizing information from multiple assessment tools, across all of a child’s languages. Research in the past 5 years has shown diagnostic utility for a number of tasks/tools. A large proportion of the articles (26%) focused on morphosyntax, rightfully so, as morphosyntactic difficulties are a hallmark characteristic of developmental language disorder. The use of language sampling and dynamic assessments has consistently shown high diagnostic accuracy. The use of technology such as speech recognition has the potential to aid the assessment process, especially when the clinician does not speak all the child’s languages. Lastly, the review of recent publications reiterates the need to assess all languages and urges the use of assessment tools developed specifically for multilingual children.

Despite all the advances, many studies continue to compare multilingual to monolingual children. It is necessary to move away from such a viewpoint and adopt the view proposed by Oetting [14] to look at “disorder within diversity.” It is necessary to develop assessment tools with multilingual children for multilingual children, as these children have unique linguistic experiences that should not be compared to monolingual experiences. This approach calls for the development of local norms such that assessments used with multilingual children are based on the language characteristics of the multilingual community in which the child resides and from which the child receives linguistic input.

Future Directions

Many unanswered questions remain. Arguably, the most important next step is to ensure that the clinicians responsible for assessing children’s language skills, speech-language pathologists, are properly trained in how to approach multilingual assessment. Thus, changes in training are urgently needed. Graduate speech-language pathology programs must provide future clinicians with opportunities to learn, across the entire curriculum, current best practices that are culturally and linguistically responsive and provide students with opportunities to interact and work with children who come from multilingual backgrounds.

Based on the recent articles, it appears that clinicians would be more likely to use culturally and linguistically responsive assessment tools such as dynamic assessment and non-word repetition tasks if standardized protocols were available. Thus, research on standardized dynamic assessment protocols is urgently needed as these tasks show excellent diagnostic accuracy. Language sampling is another method with excellent diagnostic accuracy given the rich linguistic information that can be obtained from this single task. Clinicians have significant time constraints; thus, further research is necessary on how to maximize the use of language sampling. Initiatives such as those proposed by Guiberson [34••] are initial steps that must be capitalized on, and standard protocols should be considered to give clinicians clear guidelines on specific measures that can be obtained efficiently from language samples.

Direct clinician assessment appears to result in best diagnostic outcomes. However, parent report appears to provide valuable information. As noted in the review of recent literature, further research is necessary to delineate the accuracy of parent report. Specifically, it is necessary to assess parents’ ability to rate the child’s language skills in the native versus non-native language(s) and to determine factors that may moderate this relationship such as parents’ own language proficiency and socioeconomic background.

There have also been great advances in the use of technology such as automatic speech recognition programs and tablet applications. Studies assessing these technologies are sparse, but such technologies have the potential to aid in the initial screening process. Imagine the use of such technologies in every pediatrician’s office and preschool to identify children who are at risk as early as possible. This would allow children to receive early intervention services, resulting in improved quality of life.

Limitations

The present review is limited to the literature within the past 5 years. Readers are directed to several review articles [2, 7, 46] spanning a little over a decade focusing on assessment recommendations for multilingual children. We also focused solely on language assessment, but assessment of speech sound production is also impacted by the number of languages a child speaks. Readers are directed to a recent tutorial focusing on this topic [47]. In the present review, we focused on multilingual children, that is, children who speak two or more languages. Language assessment of children who speak more than one dialect is characterized by similar complications and calls for similar assessment recommendations. Readers are directed to several recent articles [48, 49, 50] focusing on assessment of children who speak more than one dialect.

Conclusion

Based on literature within the past 5 years, language assessment of multilingual children should be approached using information from a variety of sources, that is, using converging evidence [13••]. Clinicians should understand that multilingualism is heterogeneous and no two multilingual children have exactly the same linguistic experiences. Therefore, assessment must be individualized. Assessment should take into account all the child’s languages, as well as extensive background information on language acquisition history, use, and exposure, both past and present. Clinicians should choose assessment tools that have been developed using local multilingual norms. Clinicians should also supplement norm-referenced standardized assessments with additional tools. Promising assessments include language sampling and dynamic assessment along with specific tasks such as non-word repetition. Above all, clinicians should move away from comparing multilingual children to monolingual children and assessing disorder versus differences. We must move toward the “disorder within diversity” framework [14], as this calls for comparing multilingual children to their multilingual peers. In summary, as the most updated literature stands at the moment, culturally and linguistically responsive assessment should use converging evidence and assessment tools developed using local norms.