Introduction: The Death of General EAP?

In some settings, prospects for the future of general EAP (English for Academic Purposes) programming appear gloomy. At many English-medium universities, international students unable to present a required score on one of several standardized tests of English Language Proficiency (ELP; see Sect. 2.1) must complete pre-enrollment language support courses, often EAP courses of a general nature, meant to prepare students for study across a broad range of academic disciplines. English as an Additional Language (EAL) students seeking university entrance can forgo both the (often substantial) expense and the time required to complete an EAP program by instead demonstrating ELP through a requisite score on one of a number of internationally recognized assessments. Student demand for these gatekeeping assessments has spawned a billion-dollar industry in testing and test preparation (Cavanagh 2015), and has given rise to criticism that an overemphasis on test preparation is undermining the “real business of learning the language” (Gan 2009, p. 25). While the verbs “IELTS-out” and “TOEFL-out” may not yet be in the Oxford Dictionary, many EAP instructors will immediately recognize the process they describe: withdrawing from an EAP course (often already in progress) by obtaining a standardized test score that facilitates direct admission into a university program. In informal discussions, more than one international student has indicated that the private tutoring and multiple attempts required to finally achieve an IELTS band of 6.5 for direct entry still cost a fraction of the price of even one semester of EAP, and took far less time (Personal communications, 2017). Given the costs associated with studying abroad, it is certainly understandable that international students would want to avoid both extending their study programs and incurring additional course costs. However, students’ “IELTS-out” strategy has left some EAP instructors demoralized and with questions surrounding professional identity (Personal communications, 2017). The authors have come across various institutional strategies to counter the “IELTS-out” phenomenon, including offering standardized test preparation workshops alongside—or in some cases even in place of—EAP curricula; closing admissions loopholes through the creation of a clause disallowing use of any other proof of ELP after enrollment in EAP; and intensifying marketing efforts to convince students that EAP is well worth the extra time and expense. In circumstances such as these, where EAP is positioned as a post-secondary admission gatekeeper alongside, or even in competition with, other measures of demonstrating ELP, it is easy to be pessimistic about EAP’s future.

A growing emphasis on discipline-specificity for postsecondary English language preparation seems also to present a challenge to the future of general EAP. Murray (2016, pp. 89–90) contrasts what he calls “generic EAP” with the virtues of an “academic literacies” approach. Among Murray’s characterizations of generic EAP are a “grounding in generic study skills” and a program “out of kilter with the notion of academic literacy as something tied to a particular domain of application”. So out of kilter, in fact, that Murray argues “it is likely that many will have to unlearn some of what they have absorbed in those programmes if they are to meet the requirements of their [post-EAP program] disciplines”.

Advocates of academic literacies for tertiary preparation assert the inadequacy of a “study skills approach” associated with general EAP, which is said to treat literacy, and in particular writing, as “an individual cognitive skill where the formal features of writing are learnt and easily applied across different contexts”, with typical emphases being “a focus on sentence structure, creating a paragraph and punctuation” (Sheridan 2011, p. 130). Murray labels this a “one-size-fits-all view of academic literacy” that “tends to dislocate those skills from particular disciplinary contexts” (2016, p. 85). Further unsettling to EAP professionals is the claim that such approaches are “generally constructed within discourses of deficit and remediation” (Henderson and Hirst 2007, p. 26).

If a student can “IELTS-out” of general EAP to gain university admission, and if, as its critics claim, general EAP by definition represents content disembodied from the academic disciplines it purports to serve, the future of such programs seems very much in doubt. It is within this context that the current study was undertaken: to investigate the overlap between student success in academic programs and the two indicators of ELP most higher education institutions accept—standardized language tests (such as IELTS Academic and TOEFL) and EAP courses. If EAP courses are interchangeable with standardized test results as determinants of required proficiency in the language of instruction, or if, as Murray suggests, generic EAP is ill-suited to preparing EAL students for academic programs, we would expect to see this reflected in the capacity of these ELP indicators to predict future student success.

Background

Defining ELP for University Admission

Whether driven by post-secondary institutions’ desire to increase internationalization, to broaden participation, or to receive the revenue generated through international students’ additional fees, energetic goals to further increase enrolment of international students abound in many countries. The government of Canada, for example, set a goal of doubling the number of international students at its post-secondary institutions in a 10-year period (Macgregor and Folinazzo 2017). Though BANA nations (Britain, Australasia, North America) are traditionally seen as study destinations for English-medium postsecondary education, an increasing number of other countries are competing for international students by offering degrees with English as a medium of instruction (Lee 2015; Macaro et al. 2018). This has increased institutions’ recognition of the need both to support the language learning needs of international students and to quantify their ELP, even though ELP is typically only vaguely defined, with definitions varying widely even within the same institution (Murray and Hicks 2016). A bewildering number of standardized tests offer to provide such quantification, each with its own distinctive slant on what constitutes ELP. The developers of the TOEFL, for example, acknowledge the challenges of defining ELP for the purposes of assessment encountered throughout that test’s evolution (Chapelle et al. 2008; Jamieson et al. 2008). Another widely used standardized assessment, the PTE, “measures English language proficiency for communication in tertiary level academic settings” (Zheng and De Jong 2011, p. 3), yet Zheng and De Jong’s discussion of the PTE’s construct validity makes no attempt to explicitly define ELP. Murray’s (2016, p. 70) characterization of typical attempts at defining the ELP construct as “rather vague” and “catch-all” rings true.

As well as a lack of clarity around what constitutes ELP, further complication is introduced by the process through which institutions determine which standardized tests to accept for admission and set particular cut scores on those tests. Uneven at best (Tweedie and Chu 2019), test selection and cut score identification are often done simply by referring to the scores used by competitor institutions, or to comparison tables, their questionable usefulness notwithstanding (Taylor 2004).

The institutional murkiness of what exactly defines language levels needed for post-secondary success is further compounded where completion of a pre-enrollment, gatekeeping EAP course can be used as proof of ELP. The emphases of these courses vary widely between institutions, and unlike degree courses whose transferability is specified in credit transfer agreements between universities, such EAP courses are in many cases non-transferable. Thus, attempting to pin down what graduates of a particular gatekeeping course can actually do in terms of language, or linking their ELP to particular standardized test scores, remains problematic.

Determining equivalency among assessments of ELP is also thorny. Assessments are scored on different scales, the minimum cut scores on assessments required for admission vary across institutions, and universities typically do not make explicit the criteria being considered when using a particular test. One large Canadian university, for example, requires an IELTS score of Band 6.5 or a PTE score of 61 for undergraduate admission (University of Alberta 2018), yet a band 6.5 on IELTS is equated by the test developers to a range of 58–64 points on the PTE Academic (Pearson Education 2017).

Treating assessments as equivalent when they have been created using different frameworks, or even different target constructs, invites misuse of the results (AERA, APA, and NCME 2014; Kane 2013). When test developers make differing claims for their assessments, it follows that the uses of results should also differ.

Operationalizing Academic Success: Why Grades Matter

Defining academic success engenders challenges similar to those encountered when attempting to define ELP, and multiple definitions have been put forward, drawing on a wide variety of indicators. Alongside measures like Grade Point Average (GPA), proposed indicators of student success have included scores on entrance exams; credit hours taken consecutively; time to degree completion; perceptions of institutional quality; willingness to re-enroll; post-graduation achievement; social integration into campus life; appreciation for diversity; and adherence to democratic values, among others (e.g., see Kuh et al. 2006). York et al. (2015), building upon the extensive literature review of Kuh and colleagues, propose that the definition of student success be a multidimensional one “inclusive of academic achievement, attainment of learning objectives, acquisition of desired skills and competencies, satisfaction, persistence, and postcollege performance” (p. 5). Of these multiple dimensions, we have limited our consideration for the purpose of this study to GPA. Critics are quick to object to using grades as the only measure of academic success, given the breadth of experiences which constitute students’ post-secondary paths. We affirm the value of multidimensional means of benchmarking student achievement, but maintain that grades are an important measure for a number of reasons.

First, rightly or wrongly, grades are used by a wide variety of stakeholders as a measure of student abilities. Universities themselves consider grades when making decisions on admission, for both undergraduate and graduate program entrance. Course GPA is often a central criterion for getting one’s preferred study major. Organizations providing scholarship funding utilize grades as an important selection mechanism, and post-graduation, employers may factor university GPA into hiring decisions. It follows that the combined effect of the above gatekeepers would result in the central stakeholder, the students themselves, placing a high value on grades. York et al. acknowledge that their “constructivist method” of reviewing the success construct limits the inclusion of student and parent voices (2015, p. 9). We expect that including student and parent voices in defining academic success would strengthen the case for considering grades.

Further, some have argued that achievement of course learning objectives represents a more accurate depiction of academic success, since grades are only substitute measurements of actual learning. For this reason, York et al. (2015) argue for a separation between grades, attainment of learning objectives and achievement of skills and competencies when conceptualizing student achievement. In our view though, it follows that achieving the learning objectives of a course or program, and gaining the requisite skills and competencies, should lead to higher grades. We readily admit that actual knowledge (as opposed to assessed knowledge) is exceptionally difficult to quantify, and that, as York and colleagues assert, a student’s GPA is only a “proxy” measurement for what may have actually been learnt (p. 7). Such philosophical considerations notwithstanding, we are not optimistic that students, their parents, scholarship committees, university admission policy-makers or employers will, in the near future, opt for wholesale adoption of (more difficult to measure) actual learning over the more measurable, but admittedly proxy, learning that is reflected in grades.

Finally, GPA is by far the most widely available measure of student performance to which researchers have access, and its near-universal use makes it conveniently comparable across institutions and contexts.

Context of the Study

This enquiry took place at a large, research-intensive Canadian university where English is the medium of instruction. All applicants to the institution must demonstrate ELP for direct admission, which, for international students, can be done by presenting a prescribed score on one of eight international standardized assessments: TOEFL iBT, TOEFL CBT, TOEFL PBT, IELTS Academic, CAEL, MELAB, PTE, or CAE. A ninth option is available: applicants who do not meet the requisite cut scores on one of these assessments may opt to enroll in the institution’s EAP program, which provides pre-enrollment instruction in general academic English. Students are placed in one of three levels by means of an in-house placement instrument, and complete three semester-length courses at each level: academic writing and grammar, reading comprehension and proficiency, and listening comprehension and oral proficiency. Learners attaining a grade of at least 70% in the program’s third level are considered to have satisfied the ELP requirement and are then admitted to the university.

Given that the institution accepts nine different means of demonstrating ELP for admission (the eight standardized tests and completion of the EAP program), it stands to reason that the comparability among these measures should be subjected to scrutiny, as should any differential performance of the instruments in predicting student success. To the best of our knowledge, no previous study has investigated the comparability of all eight of these assessments with respect to their predictive capacity for student achievement, or compared their performance with EAP course results. Since many of the eight assessments are used for admission to English-medium universities around the world, we anticipate that the findings will have implications for a very large number of ELP test-users internationally.

Methods

This quantitative research study considered anonymized data for 1918 EAL students at a Canadian university, ranging from Fall semester 2010 to Fall semester 2016, as provided by the institution’s student services office. Participants (49.9% female, 49.8% male, 0.3% no response) constituted a multinational, multilingual, and multidisciplinary sample, with a total of 107 different nationalities and 19 different academic programs represented. Each student had completed: (i) at least one of the eight ELP tests officially accepted for entry into the university, and/or (ii) the EAP program at the institution. Finally, data included GPA for students’ first semester of study at the institution, final semester, or both. Cumulative GPA, however, was not available in the records provided.

Pearson correlation coefficients (r) were used to estimate the capacity of the different ELP tests and EAP courses to predict student success in academic programs (both in aggregate and for each program). Coefficients of determination (r²) were used where it was more informative to discuss the proportion of variance in GPA that appeared to be shared with a predictor variable (a specific ELP test or EAP course).
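For reference, the two statistics are related as follows: for paired observations of a predictor score x (an ELP test or EAP course result) and a GPA y,

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}}, \qquad r^{2} \in [0, 1],

with the coefficient of determination r² interpretable as the proportion of variance in GPA shared with the predictor.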

Despite the large number of participants, once the data were broken into subgroups (e.g., students presenting CAEL results who completed EAP Reading), sample sizes often became quite small, or even zero. While there are no definitive guidelines on an acceptable sample size for a Pearson correlation, the authors followed David’s (1938) long-held and oft-cited recommendation of a minimum of 25. As such, only results for which the sample size was 25 or greater are presented and discussed.
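To make the procedure concrete, the following is a minimal sketch (not the authors’ actual analysis script) of the correlation-by-subgroup approach described above, written in Python and assuming a pandas DataFrame with hypothetical column names: indicator (the ELP test or EAP course), score (the result on that indicator), and gpa (first- or final-semester GPA).

import pandas as pd
from scipy.stats import pearsonr

MIN_N = 25  # minimum subgroup size, following David (1938)

def correlate_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson r, p-value, and r-squared for each ELP indicator against GPA."""
    rows = []
    for indicator, group in df.dropna(subset=["score", "gpa"]).groupby("indicator"):
        n = len(group)
        if n < MIN_N:  # subgroups below the minimum are not reported
            continue
        r, p = pearsonr(group["score"], group["gpa"])
        rows.append({"indicator": indicator, "n": n, "r": r, "p": p, "r2": r ** 2})
    return pd.DataFrame(rows)

Each row of the resulting table corresponds to one ELP indicator, mirroring the structure of the correlations reported in Table 2.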

Results

Table 1 reports the descriptive statistics for academic program GPA, ELP tests, and EAP results. In the seven years of data provided, not one EAL student reported a CAE score as evidence of ELP, only six presented PTE scores, and none of these particular students had first or final semester GPA on record. In addition, very few students had computer-based (CBT) or paper-based (PBT) TOEFL outcomes (n = 16 and 20, respectively), which might be expected given the near-complete transition to the Internet-based TOEFL (iBT) over the past decade. As none of these instruments had a sample size of 25 or greater, they were omitted from further analyses.

Table 1 Descriptive statistics for academic program GPA, ELP scores, and EAP course results

Pearson correlations estimating the predictive capacity of each ELP test and EAP course, for first and final semester GPA, are reported in Table 2.

Table 2 Correlation between ELP indicator results and students’ first and final semester GPA

The CAEL failed to significantly predict student performance in either the first or final semester of study. While the non-significant outcome for the CAEL could potentially be attributed to the relatively small sample size (n = 31), the non-significant and/or weak predictive capacity found for the IELTS and TOEFL iBT results cannot. The IELTS Academic did not significantly predict first semester performance (r = .054, p = .495, n = 163) and did so only weakly for the final semester (r = .199, p = .011, n = 162). The TOEFL iBT was the only test to significantly predict GPA in both first (r = .246, p < .001, n = 341) and final (r = .218, p < .001, n = 340) semesters, though it did so weakly.

While none of the EAP course results demonstrated a strong association with academic program GPA, the results overall would seem to demonstrate better overlap than the standardized test results. EAP Reading outcomes showed a significant but weak association with first semester performance (r = .246, p = .005, n = 127) and a moderate association with final semester performance (r = .386, p < .001, n = 126). EAP Writing results did not significantly predict first semester grades (r = .144, p = .107, n = 126) but significantly, though weakly, predicted final semester grades (r = .280, p = .002, n = 125). EAP Listening and Speaking course results, meanwhile, showed a significant but weak correlation with first-semester GPA (r = .205, p = .021, n = 126), and a significant, moderate relationship with final semester success (r = .326, p < .001, n = 125).

Discussion

One limitation of the study is that, despite the large dataset (n = 1918), once broken down by predictor (a specific ELP test or EAP course result), many of the resulting cells had problematically small sample sizes (n < 25). To address this, only those outcomes with samples of at least 25 participants were analyzed. Another possible limitation is the use of GPA as the index of student success. It has long been noted that, as a score out of 4.00, the relatively limited range of the measure likely contributes to (at least somewhat) muted correlation coefficients and, therefore, potential underestimation of the overlap between predictors and actual student success. While noting this potential limitation, GPA is by far the most widely available measure of student performance to which researchers have access. It is also nearly universal as an estimate of student success and, as a result, conveniently comparable across institutions and contexts.

Another potential limitation of the study, or more specifically of the measures involved, is that any predictor, whether a test or an EAP course, will always be somewhat limited in its capacity to predict student success. No single instrument or process can, for example, address all of the skills, knowledge, attitudes, and behaviours which influence student performance in academic programs. However, it is equally important to remember that this is, at the very heart of the matter, what institutions are doing with these tests and courses. They administer the tests to determine who is ready to succeed in academic programs now and who needs further instruction in language skills before they are likely to thrive. Students in the latter category are assigned to EAP courses specifically intended to build the skills and knowledge required to succeed in academic programs. To this end, then, we should expect to see considerable (though certainly not absolute) overlap between variance in these test results and course outcomes, and variance in academic success. The skills, knowledge, and attitudes which determine success in future academic studies should be substantially addressed by the tests accepted and the programs on offer, or there is little point to their use.

As a final note, it is worth pointing out the surprising number of ELP instruments still officially accepted by the institution which were rarely, if ever, actually presented by incoming students. The CAE, for example, had no data whatsoever: no students used this test to demonstrate ELP within the seven-year range of the data. Similarly, a total of six students presented a PTE result, 16 a computer-based TOEFL result, and 20 a paper-based TOEFL result in order to gain entry to the university. These numbers are too low to allow any judgment of how well these instruments inform admission decisions, and the institution may therefore wish to consider removing them from its accepted list. Evaluating the usefulness of any piece of evidence, including how well a test score contributes to beneficial (and high-stakes) decisions about incoming students, requires data, and at present the institution simply does not have enough to know whether the evidence provided by these instruments contributes (or would contribute) to decisions that benefit incoming students and the institution. Reliance on equivalency research from the test developers themselves, without updated, institution-specific evidence of equivalencies or contextualized information on what a particular test score means, is not, in our view, sufficient justification for continued acceptance of a given measure as demonstration of ELP for admission.

As seen in the results section, the findings indicate little overlap between the skills and abilities measured by the standardized tests the institution accepts as evidence of ELP and those which determine GPA in academic programs. CAEL scores, for example, did not predict GPA in students’ first or final semester of study. IELTS Academic results demonstrated significant but weak overlap with the final semester of study only (r = .199, p = .011), and TOEFL iBT scores showed a significant but weak association with both first- (r = .246, p < .001) and final-semester (r = .218, p < .001) GPA. Put in terms of the coefficient of determination (r²), this result indicates that only approximately 4% of the variance in student success (in the final semester of study) would seem to be influenced by the competencies measured by IELTS Academic. This is troubling, as the instrument is, after all, designed specifically for the purpose of determining who is capable of success in higher education, heavily and continuously researched and developed, and widely used across the globe. Other studies, however, have typically reported similarly problematic results for IELTS. Investigations finding a strong (e.g., Bellingham 1993; Harsch et al. 2017; Hill et al. 1999) or even moderate (e.g., Al-Malki 2014; Woodrow 2006; Yen and Kuzma 2009) relationship between IELTS outcomes and academic success are in the minority. Far more common are outcomes suggesting the relationship between IELTS and academic success is weak (Denham and Oner 1992; Feast 2002; Kerstjens and Nery 2000) or not significant (Bayliss and Ingram 2006; Cotton and Conrow 1998; Dooey and Oliver 2002; Ingram and Bayliss 2007).
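To make the IELTS arithmetic above explicit, the coefficient of determination is simply the square of the reported correlation: r² = (.199)² ≈ .040, that is, roughly 4% of the variance in final-semester GPA is shared with IELTS Academic scores.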

While TOEFL iBT scores did significantly overlap with both first- and final-semester GPA, the coefficients of determination indicate that the skills (and other factors) influencing performance on the test account for only some 4 to 6% of the variance in GPA (r² = .061 and .048, respectively). Here, too, we find that, despite the instrument’s design for assessing academic English competency, its continuous and considerable research and development, and its widespread international use, it is typically found to be a weak predictor of academic success. Cho and Bridgeman (2012), for example, also found a 3% overlap between TOEFL iBT scores and the GPAs of 2594 university students. Wongtrirat (2010) found a very similar estimate of 3.5% shared variance between TOEFL iBT and GPA in a meta-analysis of 22 studies conducted between 1987 and 2009.

Overall, the best predictors of EAL student success in academic programs would appear to be the EAP courses. While EAP Writing predicted student success similarly to the IELTS and TOEFL iBT results—not significantly for first semester GPA (r = .144, p = .107) and significantly but weakly for final semester (r = .280, p = .002)—the EAP Reading and EAP Listening and Speaking courses were the only indicators which: (i) significantly predicted both first and final semester GPAs, and (ii) did so with moderate strength for at least one semester. EAP Reading course results showed an association with first semester success that was significant though weak (r = .246, p = .005) and with final semester success that was moderate (r = .386, p < .001). Similarly, Listening and Speaking significantly predicted both semesters’ GPAs, doing so weakly for the first semester (r = .205, p = .021) and moderately for the final (r = .326, p < .001). Coefficients of determination suggest the EAP courses overlap with some 2 to 6% of the variance in first semester performance, and 8 to 15% for the final semester. While these percentages may not seem especially strong, they are considerably higher than the best overlap (3 to 6%) found with IELTS and TOEFL results, not only in this study but typically elsewhere as well.
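For transparency, the EAP ranges cited above follow from squaring the reported correlations: for first-semester GPA, (.144)² ≈ .02, (.205)² ≈ .04, and (.246)² ≈ .06; for final-semester GPA, (.280)² ≈ .08, (.326)² ≈ .11, and (.386)² ≈ .15.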

The results also challenge the claim that generic EAP programs, given their limited direct connection to the literacies of particular disciplines, are of little utility for student success. In this study, EAP courses, despite their lack of discipline-specific content, substantially outstripped every standardized ELP test in their capacity to predict student achievement.

Conclusion

The findings of this study underscore unsettling questions about institutional practices for benchmarking the English language proficiency of prospective students. The results highlight the need for tertiary institutions to regularly evaluate which measures of ELP are accepted for admission, and to provide justification for their use. In the case of the university considered in this study, seven years had passed with one accepted measure (the CAE) never used as proof of ELP and another (the PTE) presented by only six applicants. On what basis, then, can the institution consider these assessments “equivalent” to the other measures of ELP accepted for admission? This raises the question of how and why certain tests are accepted in perpetuity as proof of ELP without institution-specific evidence that they are fit for purpose. We assert that an assessment purporting to benchmark ELP needs to be justified for its use in a given context, not simply accepted on the claims of the test developers or because it is accepted by competitor institutions. Future research may seek to make explicit what is now a largely opaque process: the means by which institutional policymakers arrive at specific benchmarks of ELP. Consultations with ELP instructors, though not widely utilized, represent a valuable resource for identifying what various benchmarks actually mean, and admission policy would benefit from such practice-informed discussions.

The data here may also warrant consideration by students. Certainly, opting to “IELTS-out” of EAP courses by means of an international assessment of ELP may translate into significantly shorter program length and therefore reduced financial costs. The findings of this study, however, indicate that an “IELTS-out” strategy may not necessarily translate into higher grades.

At the beginning of this article we sounded a note of pessimism about the future of generic EAP, given the many competing means by which students can demonstrate ELP for university admission. The findings from this study of student achievement, however, suggest a more cautious approach when predicting the end of general EAP.