1 Introduction: Purpose and the Testing Context of the CET-4

China has the largest number of English language learners in the world. It is estimated that more than 400 million Chinese learn English as a foreign language across the nation (Wei and Su 2012). Around one-tenth of these learners are university students, and College English is a compulsory subject for all freshmen and sophomores in China (Cheng and Curtis 2010). Chinese university students take a variety of high-stakes international and national English tests for different purposes and needs. To study overseas in an English-speaking country, they take TOEFL, IELTS, and the Pearson Test of English. To be competitive in job markets, they take TOEIC and Cambridge English Qualifications: Business. Apart from these international English tests, the most important large-scale standardized English test for Chinese university students at the national level is the College English Test (CET). The CET has two levels: level 4 (known as CET-4) and level 6 (known as CET-6), both of which are administered by the Ministry of Education in China (CMoE). While both tests examine the English proficiency of Chinese non-English major university students, CET-4 is compulsory, whereas CET-6 is not (Yang and Weir 1998). Its compulsory nature gives the CET-4 high-stakes status, with around 10 million test takers annually (Zheng and Cheng 2008). CET-4 is developed by the National College English Testing Committee (NCETC) and is held twice a year, in June and December. At present, the test is open only to enrolled undergraduates, and registration must be completed through the test takers’ universities.

The CET-4 was first launched in 1987, following the implementation of the first College English Curriculum (CMoE 1985, 1986), and its initial development aimed to achieve two goals. The first goal was to provide an objective assessment and evaluation of a university student’s overall English proficiency. The second goal was to direct and unify College English teaching nationwide (Ma 2014).

Throughout its 31 years of administration and development, the CET-4 has undergone two waves of major reforms in terms of content, format, and scoring system (Jin 2017). The first wave of reform was implemented throughout the 1990s and involved three noticeable changes. The initial test items in the CET-4 were predominantly multiple-choice questions, which accounted for as much as 85% of the test items; only 15% involved essay writing to test English writing skills. Although a high percentage of multiple-choice questions has potential advantages in terms of objectivity in testing and efficiency in scoring a large-scale test, and such questions are appropriate for testing receptive skills, such as listening and reading, and the receptive knowledge of vocabulary and grammar, this format has apparent drawbacks for evaluating productive skills (Jin and Wu 2017). To address this issue, the first change to the CET-4 was the inclusion of a variety of formats, including dictation of short English phrases and sentences, short answer questions, and translation from English to Chinese. The second major change concerned the scoring, with the revised scoring system emphasizing writing. A threshold was set for the writing subset, and failing to reach it resulted in a penalty on the total score of the CET-4. If a student scored zero for writing, he/she had to retake the whole test, no matter what level of performance he/she had achieved on the remainder of the test. The third major change was the introduction of a spoken test in 1999, known as the CET Spoken English Test (CET-SET), to assess students’ English-speaking skills (Zhang 2005). However, the CET-SET was not open to all candidates, as test takers had to achieve an overall score of 80 out of 100 points in the test to be eligible for the CET-SET. The CET-SET had three grade levels: A, B, and C. Grades lower than C did not produce a report, as they indicated an insufficient level of English-speaking skills.

The second wave of reform of the CET-4 was launched in 2006 to meet the needs of the transition from the 1999 version to the 2007 version of the National College English Teaching Curriculum and Syllabus and Requirements (CMoE 1999, 2007), as part of the large-scale project Higher Education Undergraduate Level Teaching Quality and Teaching Reform by the CMoE (2006). This wave of reform had two major purposes: (1) to meet “the pressing social need for college and university graduates with a stronger communicative competence in English” (Jin and Yang 2006, p. 21), and (2) to “maximize its positive backwash effect on teaching and beneficial impact on society” (p. 34). The reform was implemented by the NCETC, which constructed new tasks to replace some old ones in order to test contextualized English use rather than context-free knowledge of the English language (NCETC 2006).

The reform included four major changes: (1) increasing the proportion of testing listening comprehension from 20 to 35%; (2) replacing the multiple-choice style of assessing English vocabulary and structure in single sentences with a contextualized cloze test; (3) adding fast reading to assess learners’ skimming and scanning abilities in reading longer English texts; and (4) replacing translation from English to Chinese with translation from Chinese to English. The structure of the reformed CET-4, including subsets/skills, contents, formats, proportions, and time distribution, is displayed in Table 4.1.

Table 4.1 The structure of the reformed CET-4

Apart from the revision of the test format, the scoring system has also been changed dramatically. Under the previous scoring system, test takers received only a certificate indicating whether they had achieved a pass (60) or distinction (85) out of 100 points. The new scoring system is norm-referenced; hence, the scores show how an individual test taker has performed relative to the whole group. The maximum achievable score is 710 points, and test takers receive a report of the total score and the scores of the subsets of listening (maximum achievable score is 248.5), reading (maximum achievable score is 248.5), writing (maximum achievable score is 106.5), and translation (maximum achievable score is 106.5). The new scoring system has also removed the “pass” designation to prevent universities from using a CET-4 pass as a compulsory requirement for graduation and to prevent employers from using it as a mandatory selection criterion for job interviews.
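As a quick arithmetic check on the reported score structure, the four subset maxima can be summed and each expressed as a share of the total. The following is only a minimal sketch: the dictionary and function names are hypothetical, and the percentage shares are derived purely from the maxima quoted above, not taken from any official NCETC weighting table.

```python
from fractions import Fraction

# Maximum achievable subset scores as quoted in the text.
# The names below are illustrative, not official NCETC terminology.
SUBSET_MAXIMA = {
    "listening": Fraction("248.5"),
    "reading": Fraction("248.5"),
    "writing": Fraction("106.5"),
    "translation": Fraction("106.5"),
}


def subset_shares(maxima):
    """Return the total maximum score and each subset's share of it in percent."""
    total = sum(maxima.values())
    shares = {name: float(value / total * 100) for name, value in maxima.items()}
    return total, shares


total, shares = subset_shares(SUBSET_MAXIMA)
print(f"total maximum score: {float(total)}")
for name, share in shares.items():
    print(f"{name}: {share}% of the total")
```

Under these figures, the subset maxima add up to the stated 710 points, with listening and reading each accounting for 35% of the total and writing and translation for 15% each.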

These changes were aimed at reducing the pressure on test takers and English teachers, in the hope that the test would induce positive washback effects on College English learning and teaching (Jin 2010). Since the second reform of the CET-4, a few studies have therefore examined whether the test has achieved its goal of stimulating positive effects. This review covers only the key studies on the washback of the reformed CET-4 (after 2006). The findings of the key studies are summarized, and the problems in terms of negative washback are outlined, in the next section.

2 Testing Problems Encountered

In the language testing literature, washback is defined as “the effect of testing on teaching and learning” (Hughes 2003, p. 1). A test is able to produce both positive and negative washback on teaching and learning (Bachman and Palmer 2010). A search of the literature identified nine key studies, whose results showed a mixture of positive and negative washback of the CET-4 on Chinese students’ English learning and the instruction of College English by teachers. This review excluded the studies which only focused on investigating the washback of one of the subsets tested in the CET-4 (e.g., the listening subset: Hou and Wang [2008], Shi [2010], Wang [2010]; the speaking subset: Zhuo [2017]). The results of the nine key studies can be broadly categorized as the washback of the CET-4 on learning and on teaching. For each category, both positive and negative effects were observed and will be discussed in turn. A detailed summary of the nine key studies, including types of research, participants and data collection methods, and key findings, is presented in Appendix 4.1.

Concerning the washback effect on English learning, the reformed CET-4 produced three major positive effects. First, the CET-4 not only led students to put much time and effort into preparing for it, but also motivated them to make extra efforts in learning English (i.e., Li et al. 2012; Shao 2006; Sun 2016). The surveys in these three of the nine key studies reported that the majority of students felt the positive side of having to sit for the CET-4 was that they were motivated to learn English. Second, apart from putting much effort into English learning, some students also reported that the CET-4 made them more aware of the goals of English learning and kept them focused on those goals (i.e., Li et al. 2012). Third, the CET-4 encouraged students to use cognitive strategies and test management strategies in the test (i.e., Xiao 2014).

There were also three prominent negative effects of the CET-4 on English learning. To start with, five of the nine key studies showed that, when preparing for the test, students adopted English learning strategies oriented toward passing the test rather than strategies that enhance language use competence and develop English communicative abilities (i.e., Ren 2011; Sun 2016; Xiao 2014; Xie and Andrews 2012; Zhan and Andrews 2014). These test preparation strategies included rote memorization of linguistic forms, such as memorizing English–Chinese bilingual vocabulary lists; the use of grammar-translation methods to learn English and to prepare for the test; and the development of test-wise skills through mock or past CET-4 papers, by means such as analyzing test papers, rehearsing test-taking strategies, practicing sample test papers intensively, and memorizing model English essays. Students also used past tests and sample CET-4 papers as their main sources for learning English. Furthermore, students put much effort into, and gave much weight to, practicing reading and listening, the two major skills tested in the CET-4. By contrast, they neglected writing and speaking, because writing accounted for only a small proportion of the test and speaking was not compulsory (i.e., Li et al. 2012). Last, apart from the negative effects on cognition, the CET-4 also produced psychological effects detrimental to students, as the test aroused pressure and anxiety in their English learning (i.e., Li et al. 2012).

In terms of the washback on English teaching and teachers, three major positive effects and three major negative effects were identified. In terms of positive washback on teaching, firstly, greater importance was attached to English teaching due to the compulsory nature of the CET-4 (i.e., Gu 2007). Secondly, the CET-4 promoted the implementation of the new version of the National College English Teaching Curriculum and Syllabus and Requirements in College English instruction (i.e., Gu 2007). Thirdly, English teaching shifted from grammar drilling to developing communicative competence in English, and teachers tended to integrate the four skills in their instruction (i.e., Chen 2007).

The first major negative impact of the CET-4 on English teaching was its narrowing of the content of instruction (i.e., Gu 2007; Ren 2011; Shao 2006). The teaching strategies, teaching materials (e.g., sample test papers were used exclusively in teaching), and teaching activities (e.g., teachers practiced only the skills tested in the CET-4 and ignored those not tested, such as spoken English) tended to focus exclusively on the test as the CET-4 approached. As a result, teachers were unwilling to develop students’ communicative competence, and the teaching lacked creativity. In the period before the CET-4, the normal schedule of English teaching was always disrupted, and teachers found it difficult to follow the syllabus and to complete the contents of the English textbooks.

Furthermore, the CET-4 also negatively affected the content of classroom assessment (i.e., Ren 2011). The assessment tasks were predominantly designed to resemble those in the CET-4, resulting in a lack of formative assessment of English learning. Last but not least, the CET-4 had a negative influence on teachers and teaching (Chen 2007; Gu 2007; Ren 2011). Students’ results on the CET-4 were used heavily to benchmark the quality of English teaching and were given much weight in English teachers’ promotion, which created considerable pressure for teachers.

3 Solution/Resolution of the Problems

Some possible solutions are proposed to mitigate the negative washback produced by the CET-4. (1) The quality assurance bodies of universities should use a wider range of indicators to gauge the quality of College English teaching and learning rather than relying solely on the outcomes of the CET-4. (2) The CET-SET should be integrated as a compulsory subset for all test takers so that English-speaking skills are no longer devalued in College English teaching and learning. (3) The weights of some subsets of the test should be modified. In particular, the writing subset should be given more weight, and the proportion of the Chinese to English translation section should be reduced to discourage rote memorization of vocabulary lists. (4) The NCETC should consider developing a test format that integrates the assessment of the four skills, so that communicative competence can be emphasized in English learning and teaching.

4 Insights Gained

To mitigate the negative effects generated by the CET-4, the National College English Testing Committee (NCETC) may need to consider the following aspects. Most importantly, given the importance of communicative competence in English learning, the testing of learners’ speaking and listening proficiency cannot simply be ignored. While the current CET-SET is optional, how to properly integrate it into the CET-4 and make it compulsory remains a critical issue for the NCETC to resolve so that speaking and listening skills are appropriately assessed. Such a change may direct the College English curriculum to place much more emphasis on training students’ communicative competence rather than focusing predominantly on reading and writing per se. Secondly, since the current use of CET-4 results is closely linked with benchmarking teaching quality, the contents and formats of teaching tend to be greatly affected by the contents and format of the CET-4. Hence, the National Educational Examinations Authority should consider making a policy to regulate how the test results should be used in universities, to minimize the negative influence of the CET-4 on normal English teaching.

5 Conclusion: Implications for Test Users

This review indicates that there has been a lack of nationwide research on the washback of the reformed CET-4 since Gu (2007). Considering the high-stakes status of the test, it is suggested that the NCETC conduct large-scale studies on the washback of the CET-4 so that the intended use, formats, and contents of the test may be modified to address negative effects. The NCETC should also make the interpretation of the test scores transparent, possibly by including qualitative and descriptive references as to what a test taker with a particular range of scores and/or sub-scores “can do” in English and/or in the subskills of English (Jin 2011).

This review also has important implications for stakeholders of the CET-4, including government, employers, and universities. It is risky to rely solely on CET-4 scores as a means of gauging English learning and teaching. Students and teachers should adopt strategies that build English competence in the long run, which may in turn help students achieve a desirable level of performance in the CET-4.