
Introduction

In the twenty-first century, the use of information and communications technology (ICT) has emerged as a new trend in language education (Matsumura and Hann 2004). The Mother Tongue Languages Review Committee Report released by the Singapore Ministry of Education in 2010 highlighted the use of ICT to facilitate self-directed learning, as students with diverse Chinese language proficiency levels can initiate learning activities at their own level and develop personal knowledge and skills in a technology-enhanced instructional environment (Benson 2007; Sinclair 2000; Smeets and Mooij 2001; Warschauer 2000). Learners who enjoy a high degree of autonomy are more likely than their less autonomous peers to put effort into learning and exploiting language knowledge, which in turn contributes to language development (Little 2002).

The need to promote self-directed learning according to different learners' needs and characteristics is clear in the Singaporean context. Over the past decades, the proportion of primary students from predominantly English-speaking homes has risen from 36% in 1994 to 59% in 2010 (Mother Tongue Languages Review Committee Report 2010). This trend has led to a situation in which an increasing number of Singaporean students learn Chinese as a second language (L2) and enter the classroom with diverse proficiency levels. These students need individualized instruction and support in language use during the writing process, which is, however, not feasible in a class of more than 20 students.

While students need individualized feedback, responding to student papers can be a challenge for teachers. Providing individual feedback on student essays can be time-consuming, particularly for teachers who have a large number of students or who assign frequent writing tasks. As teachers in Singapore typically teach several classes of about 30 students each, the amount of work to be graded often limits the number of writing assignments teachers can offer to students. Moreover, providing accurate and informative feedback on the language use of student writing requires a degree of linguistic proficiency and knowledge that varies across teachers, as some of them developed their expertise without being explicitly taught grammar (Johnson 2009; Johnston and Goettsch 2000). Furthermore, although instant corrective feedback on linguistic errors has been found beneficial to L2 learners (Ellis 2001; Ellis et al. 2006; Lyster 2004; Russell and Spada 2006), it is often not feasible to supply individualized feedback in class because students have different levels of language proficiency.

To support the teaching and learning of Chinese writing for students of different levels of language proficiency, the Singapore Centre for Chinese Language (Footnote 1) has developed a prototype automated essay marking system. The system aims to detect linguistic errors in Chinese writing and to provide corrective feedback on language use, including Chinese characters, lexical collocations, and grammar. The target users are higher primary Chinese language learners in Singapore. The prototype integrates a user interface and a linguistic analysis module to perform the automated essay marking process. An essay submitted through the user interface is marked by the linguistic analysis module and returned to the user instantly with error markings and corrective feedback on language use. The system has great potential to enhance the teaching and learning of Chinese writing by providing students with individualized feedback and reducing teachers' workload in marking and correcting linguistic errors. It also scaffolds students' writing process by providing prompt feedback at the linguistic level to encourage autonomous revision.

The system relies on information from a training corpus plus three databases (lexicon, grammar, and lexical collocation) to achieve automated detection and correction of linguistic errors. For Chinese characters, the system circles incorrect characters and displays the correct ones. For lexical collocation, the system flags incorrect collocations and lists the common collocates of a given word, based on the corpus, so that users gain a better understanding of how the word is used. At the sentence level, the system underlines an ungrammatical sentence and provides metalinguistic feedback (Footnote 2), namely a simple, short explanation of a rule along with an example sentence.
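To make the three feedback types concrete, the following sketch shows one possible way of representing them as structured records in Python. This is a minimal illustration under stated assumptions: the class and field names are hypothetical and do not describe the system's actual implementation.

# Hypothetical sketch of the three feedback types described above.
# Class and field names are illustrative, not the system's actual design.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CharacterFeedback:
    position: int                  # index of the incorrect character in the essay
    wrong: str                     # the character as written
    correct: str                   # the suggested correct character

@dataclass
class CollocationFeedback:
    span: Tuple[int, int]          # start/end offsets of the flagged collocation
    word: str                      # the word whose collocate is questionable
    common_collocates: List[str]   # corpus-derived collocates shown to the user

@dataclass
class GrammarFeedback:
    sentence: str                  # the ungrammatical sentence, underlined in the UI
    rule: str                      # short metalinguistic explanation of the rule
    example: str                   # a well-formed example sentence illustrating the rule

@dataclass
class MarkedEssay:
    text: str
    characters: List[CharacterFeedback] = field(default_factory=list)
    collocations: List[CollocationFeedback] = field(default_factory=list)
    grammar: List[GrammarFeedback] = field(default_factory=list)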

The marking accuracy and speed of the prototype system have been evaluated using randomly selected, authentic narrative essays from Primary 3 to Primary 6 students from various schools in Singapore. According to the results, the system achieved an accuracy rate of around 80% in detecting linguistic errors in intermediate-level essays on common topics, while the accuracy rate for high- and low-level essays was around 70%. Detection errors include misses (errors not recognized by the system) and false positives (correct usages identified as errors), with misses occurring more often than false positives. As for processing speed, the system generally takes less than one minute to process a Chinese essay of 500 characters, and processing time increases with the length of the sentences in the text.
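Although the chapter does not state how the accuracy figure was computed, detection performance of this kind is commonly decomposed into misses and false positives. The short Python example below illustrates one such decomposition; all counts are invented for demonstration and are not the evaluation data reported here.

# Illustrative only: decomposing error detection into misses and false positives.
# The counts are invented; they do not reproduce the reported evaluation.
annotated_errors = 50        # errors marked by human annotators in an essay
correctly_flagged = 40       # of those, also flagged by the system (hits)
false_positives = 6          # correct usages the system wrongly flagged

misses = annotated_errors - correctly_flagged                          # 10
recall = correctly_flagged / annotated_errors                          # 0.80
precision = correctly_flagged / (correctly_flagged + false_positives)  # ~0.87
print(f"misses={misses}, recall={recall:.2f}, precision={precision:.2f}")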

The present study aimed to investigate students' perceptions of the automated essay marking system developed by the Singapore Centre for Chinese Language. Students' feedback was collected as part of the effort to evaluate the prototype system and to improve it further, in order to ensure that the intent of the system was fulfilled and that all the features laid out in the development phase could be successfully implemented in Singapore's primary schools. A questionnaire and individual interviews were conducted to identify the specific factors that might influence the perceived effectiveness of and satisfaction with the system. This helped the researchers identify the strengths and weaknesses of the system from the perspective of target users, and students' feedback was integrated into the continuous improvement of the system.

The following literature review first summarizes the most common automated writing evaluation (AWE) programs, which are developed for English learners, and then examines studies on learners’ perceptions regarding the effectiveness of AWE software.

Literature Review

Writing ability is directly related to language proficiency as good writing must conform to the conventions of grammar and usage of the target language (Frodesen and Holten 2003). For L2 learners, especially those with low language proficiency, writing often appears as a daunting task as they lack vocabulary and grammar to produce the required work. The process approach to writing instruction views writing as a process rather than a product and emphasizes revision and feedback as essential aspects of the process (Weigle 2002; Flower and Hayes 1981).

A major challenge to learners with low Chinese proficiency is that their limited knowledge of vocabulary and grammar hinders the development of writing skills (Cumming and Riazi 2000). In addition, these learners need substantial support to improve their linguistic accuracy in writing as they often make errors in language use and yet have difficulties in recognizing and correcting the errors (Bitchener et al. 2005; Ferris 2002, 2006). The best way to prevent error fossilization is to receive feedback from an instructor, revise based on the feedback, and then repeat the whole process as often as possible (Hyland 2003). Students need to consciously pay attention to the errors they have made in order to recognize the gaps between the correct form and their own usage (Schmidt 1990, 1993; Sheen 2010).

Automated Writing Evaluation

The need for students to receive writing practice and feedback at their own level, and for teachers to work more efficiently, has raised the importance of computer-assisted AWE. Research has revealed that computers have the capacity to function as an effective cognitive tool (Attali 2004). AWE is a computer technology that aims not only to evaluate written work but also to offer essay feedback (Shermis and Barrera 2002; Shermis and Burstein 2003). AWE systems are developed to assist teachers in low-stakes classroom assessment as well as graders in large-scale, high-stakes assessment. Moreover, the systems help students review, redraft, and improve their texts easily on a word processor before the work is submitted to the teacher or published online for peers. While students work autonomously, teachers can focus on reviewing students' final drafts and providing instructor feedback (Warschauer and Grimes 2008). In other words, AWE tools are designed to support student learning and to supplement teachers and graders rather than to replace them.

The most widely used AWE programs have been developed mainly for the English language, including Project Essay Grader (PEG), Intelligent Essay Assessor (IEA), E-rater with its instructional application Criterion, and IntelliMetric with its instructional application My Access! Table 11.1 summarizes these four AWE programs.

Table 11.1 Summary of widely used AWE programs

These AWE programs are typically able to provide holistic scoring and diagnostic feedback, which require topic-specific training, as the systems evaluate a new essay by comparing its linguistic features to the benchmark set by the training corpus. It is worth mentioning that the use of AWE programs has moved from summative toward more formative assessment (Shermis and Burstein 2003). Criterion and My Access! are two instruction-oriented AWE programs that support process writing and formative assessment by allowing students to save and revise drafts based on the feedback and scoring received from the computer and/or the teacher. This reflects a paradigmatic shift from teacher-centered assessment toward learner-centered evaluation. While the product approach views writing assessment as a summative practice, the process approach views it as a formative practice (Weigle 2002). Process-oriented AWE applications guide students through essay drafting and revision before they submit the final version, a level of iteration that would otherwise be difficult to achieve inside a classroom.

Research has demonstrated that AWE could facilitate essay revision and that revision based on corrective feedback is beneficial to L2 learning. Attali (2004) found that the use of Criterion led to significant improvement in student writing during the five revisions in terms of the total holistic score (from 3.7 to 4.2 on a six-point scale) as well as the scores in organization, style, grammar, and mechanics. Students were able to significantly reduce the error rates by improving ungrammatical sentences, incorrect words, and mechanical errors that had been identified by the system. Organization and coherence of the revised essays were also enhanced by adding discourse elements. Sheen (2007) found that corrective feedback on language use resulted in improved accuracy in immediate writing tests compared to no correction. Furthermore, compared to direct correction, metalinguistic feedback had a greater positive effect on intermediate learners’ performance in delayed writing tests. This suggests that supply of comments or information related to the well-formedness of sentences without explicit correction could be beneficial to L2 acquisition as the indirect approach engages learners in a deeper cognitive processing (Lyster and Ranta 1997). The results in Sheen (2007) also revealed a significantly positive correlation between students’ improvement in writing accuracy and their language analytic ability.

Overall, L2 studies have shown that students need immediate feedback to support their writing processes (Hyland and Hyland 2006). Also, corrective feedback facilitates the acquisition of linguistic features and helps to improve the overall quality of essays (Bitchener et al. 2005; Ellis et al. 2008; Ferris 2003, 2004, 2006; Lyster and Ranta 1997). AWE programs have great potential to enhance learning and teaching processes by pointing out the weak aspects of student writing as early as possible. In light of this research, the present marking system focused on detecting linguistic errors and providing instant, individualized corrective feedback to help improve students' linguistic accuracy in writing. The system served as an aid, rather than a replacement, for human evaluation: it took care of the language aspects and allowed the teacher to focus on other important aspects of an essay, such as content, organization, and style.

Learners’ Perceptions of Automated Writing Evaluation

While many studies have reported the effectiveness of AWE software in improving L2 learners' essay quality and L2 accuracy (e.g., Bitchener et al. 2005; Ellis et al. 2008; Ferris 2003, 2004, 2006; Lyster and Ranta 1997), few have focused on learners' perceptions of AWE use. Learners' perceptions can affect their use of AWE software and eventually their learning outcomes (Dörnyei 2001; Gardner 1972; Wigfield and Wentzel 2007). If learners are not motivated to use the tools, very little learning will take place in the long run. It has been suggested that learners' perceptions of the possible benefits of technological tools, such as accessibility and enhancement of learning, can increase their motivation (Beauvois and Eledge 1996; Gilbert 2001; Warschauer 1996a, b). Furthermore, successful implementation of AWE in classroom settings depends on factors beyond high reliability or agreement between system and human evaluation. Technology can only be effective if learners' needs are met in various learning contexts. This highlights the importance of investigating AWE effectiveness from learners' perspectives.

Grimes and Warschauer (2010) investigated learners' use of the AWE program My Access! in middle schools through interviews, surveys, and classroom observations. Immediate feedback was generally perceived as the most valuable benefit of the AWE software and was of greatest use in correcting errors in mechanics (spelling, punctuation, and grammar). Learners corrected errors in response to automated feedback but made little revision to content and organization, probably because they were unable to view their own writing critically. Furthermore, the usefulness of AWE was perceived differently by learners at various proficiency levels. Intermediate-level learners benefited most from the AWE software, as they were still trying to master the mechanics of writing and yet had sufficient knowledge to understand system feedback. In contrast, learners who lacked the necessary language and computer skills could not make effective use of the AWE software. Importantly, whether AWE encourages revision is affected by complex factors: students revised more when writing for authentic audiences, when they were aware of both meaning and surface revisions, and when they were given ample time for revision.

Chen and Cheng (2008) explored students' perceptions of using My Access! as a pedagogical tool in three college writing classes. The AWE program was perceived as only slightly, or even not at all, helpful for writing improvement, largely due to the limitations of the software design and the way it was implemented in class. Most students did not trust computer-generated scores because of discrepancies between automated scores and instructor/peer assessment results. The AWE program favored lengthiness and formulaic writing styles, thus failing to assess the content and coherence of the writing and also restricting the expression of ideas. Similar to the findings of Grimes and Warschauer (2010), automated feedback was perceived as helpful in reducing language use problems, as it allowed immediate identification of errors in L2 writing. However, My Access! was not helpful in improving essay content and organization because it was unable to provide concrete and specific comments on the global aspects of writing.

In addition to the limitations in software design, the way AWE was implemented significantly influenced its perceived effectiveness. Students' perceptions were more positive if the program was used as a self-evaluation tool followed by teacher assessment. In this case, automated feedback was used to assist students in improving their writing at the drafting stage rather than to assess them. This might promote students' self-confidence and encourage reflection on writing before final submission to the teacher. Moreover, teacher support was essential for effective implementation of AWE. Teacher feedback could complement automated feedback, and sufficient guidance was necessary as students learned to use the program. The implementation of AWE must also take students' proficiency level into consideration. Chen and Cheng (2008) noted that automated feedback seemed most helpful to learners who needed assistance with the formal aspects of writing.

Although research on students’ perceptions of AWE is limited, it has been shown that the effectiveness of AWE depends upon the interactions among multiple factors, including learner characteristics as well as program design and implementation. Also, there has been a shift of focus from the assessment function of AWE to the assistance function, particularly when the program was used in classroom settings. AWE is not intended to replace teachers but to support them. As Grimes and Warschauer (2010) state,

[The] benefits [of AWE] require sensible teachers who integrate AWE into a broader writing program emphasizing authentic communication, and who can help students recognize and compensate for the limitations of software that appears more intelligent at first than on deeper inspection. (p. 34)

Because AWE is a relatively new approach to writing instruction, evidence of its effectiveness remains inconclusive and needs to be evaluated from the perspective of different learner groups in different contexts. As the literature review reveals, studies on AWE programs have mostly focused on English learners at the college and middle school levels. Clearly, further studies that investigate the use of AWE programs by younger learners and learners of other languages are warranted. The present study attempted to contribute to the literature on learners' perceptions of AWE by examining L2 Chinese in the Singapore primary school context, an under-investigated area.

Method

Participants

Participants were two classes of fifth-grade students (mean age = 11 years) at two primary schools in Singapore. All the students in these schools followed the regular stream of Chinese language education, in which Chinese was taught as a single subject from the first grade while all the other subjects were taught in English. The two classes differed in their level of Chinese language proficiency: Class A was at the low-intermediate level and took the normal Chinese lessons, whereas Class B was at the high-intermediate level and took the higher Chinese lessons (Footnote 3). There were 28 students in Class A and 25 in Class B. According to the teachers' reports, these students mainly spoke English at school, and they used either English or a combination of English and their mother tongue at home (Footnote 4). All the students in Class B were ethnic Chinese, whereas Class A had 5 non-ethnic Chinese students who had little exposure to the Chinese language outside class.

The Automated Essay Marking System

Given the benefits of AWE software in facilitating essay revision, as reviewed above, the present study implemented an automated essay marking system targeted at L2 Chinese learners. The system is composed of a linguistic analysis module and a user interface. Frontline school teachers were involved in the development and implementation of the system to ensure its practical applicability.

Linguistic Analysis Module

The system architecture comprises two major components: a user interface and a linguistic analysis module. The linguistic analysis module adopts a corpus-driven approach to error detection and requires training on an annotated corpus of Chinese texts. The analyzer uses machine learning to exploit information from three databases (lexicon, collocation, and grammar) in order to achieve error detection and correction. The lexicon database contains the Chinese lexical items drawn from the training corpus. The grammar database includes syntactic rules as well as grammatical and ungrammatical sentence patterns, which enable the system to detect grammar errors. Language use problems are identified based on the corpus-induced grammars plus probabilistic parsing. Unlike a system that employs a broad-coverage grammar aiming to describe any well-formed structure in Chinese (Baldwin et al. 2004), the linguistic analyzer is trained on a corpus of written Chinese that higher primary students in Singapore commonly encounter, and it thus aims to capture common correct and incorrect usage in Singapore students' writing. The collocation database supports error detection in three common types of collocations: verb-noun (V-N), adjective-noun (A-N), and classifier-noun (CL-N). In addition to the information from the three databases, lexical co-occurrence frequencies extracted from the training corpus are incorporated into system processing.
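As an illustration of how corpus-derived co-occurrence information can support collocation checking, the Python sketch below flags a verb-noun pair as a suspected error when its frequency in the training data falls below a threshold and suggests common collocates instead. The counts, threshold, and function names are hypothetical; the actual module combines such signals with probabilistic parsing and the lexicon and grammar databases.

# Hypothetical, simplified sketch of corpus-driven V-N collocation checking.
# Real counts would come from the training corpus; these are invented.
from collections import Counter

vn_counts = Counter({
    ("打", "篮球"): 120,   # "play basketball" - a common, well-formed pairing
    ("吃", "药"): 60,      # "take medicine"
    ("开", "会"): 45,      # "hold a meeting"
})

MIN_COUNT = 5  # below this, the pair is treated as a suspected collocation error

def check_vn_collocation(verb, noun):
    """Return (is_suspect, verbs that commonly collocate with the noun)."""
    if vn_counts[(verb, noun)] >= MIN_COUNT:
        return False, []
    # Suggest verbs that commonly co-occur with this noun, most frequent first,
    # so the learner sees typical usage of the word.
    suggestions = sorted(
        (v for (v, n) in vn_counts if n == noun),
        key=lambda v: vn_counts[(v, noun)],
        reverse=True,
    )
    return True, suggestions

# "喝药" ("drink medicine") is unattested in the toy data, so it is flagged
# and the common collocate "吃" is offered instead.
print(check_vn_collocation("喝", "药"))   # (True, ['吃'])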

The training corpus is specialized to cover the language use of the target users rather than to exhaust the linguistic rules of Chinese. The annotated corpus contains 3,000,000 Chinese character tokens, including authentic student essays as well as texts from Chinese textbooks, student newspapers, and storybooks. The student essays represent various levels of language proficiency and cover common writing topics in order to maximize the coverage of language use in different text types. Textbook material, student newspapers, and storybooks are included as part of the training data because these are the main sources of Singapore students' linguistic knowledge and are common sources of training data for error detection and correction systems (De Felice and Pulman 2008). The corpus is dynamic in that it can be further expanded and continuously updated with new data.

User Interface

The marking system provides a student interface and a teacher interface with functions to facilitate an array of user tasks.

Through the student interface, users can submit an essay by typing, uploading, or copying and pasting it. After automated marking, the essay is returned to the user with error markings and corrective feedback on language use. Alternatively, the user can submit the essay to the teacher for comments after revising it according to the system feedback. While a marked essay is displayed in its original form, the user can select the markup option to view the error markings and corrections in Chinese characters, lexical collocation, and grammar together or separately. The user can also place the cursor on the text to view teacher comments, which appear as pop-ups. For revision, the user can edit the original essay in one window while viewing the marked version in another, which provides a convenient way of referring to system and/or teacher feedback.

The student interface also facilitates data storage, tracking, and retrieval. Users can search and access all the comments and feedback on their own writing. They can also view the linguistic errors they have made in order of frequency and retrieve a cumulative listing of all the instances where these errors appear in their essays. This annotated and searchable database of linguistic errors gives users an opportunity to notice patterns in their own errors that may persist across several essays spanning a long period of time. In other words, the configuration supports a systematic display of student errors that makes it easy to see which specific error types occur frequently, which is usually not feasible with traditional pen-and-paper corrections, where red marks represent merely a raw accumulation of errors.
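The cumulative, frequency-ordered error listing described above can be illustrated with a small Python sketch that groups a student's logged errors by type and retrieves all instances of one type. The record format and example entries are hypothetical, not the system's actual data model.

# Hypothetical sketch of a per-student error log and its frequency-ordered
# retrieval; field names and entries are illustrative only.
from collections import Counter

error_log = [
    {"essay_id": 1, "type": "collocation", "text": "喝药"},
    {"essay_id": 1, "type": "character",   "text": "在 → 再"},
    {"essay_id": 2, "type": "collocation", "text": "开书"},
    {"essay_id": 3, "type": "grammar",     "text": "他看电视在客厅。"},
    {"essay_id": 3, "type": "collocation", "text": "喝药"},
]

def errors_by_frequency(log):
    """Error types ordered from most to least frequent across all essays."""
    return Counter(entry["type"] for entry in log).most_common()

def instances_of(log, error_type):
    """All logged instances of one error type, so recurring patterns are visible."""
    return [e for e in log if e["type"] == error_type]

print(errors_by_frequency(error_log))        # [('collocation', 3), ('character', 1), ('grammar', 1)]
print(instances_of(error_log, "collocation"))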

The teacher interface supports single or batch upload of text files. Teachers can manually change system feedback before the texts are sent to students. Teachers can also mark a portion of a text with a comment, either by typing it or by choosing a stored comment from a drop-down menu linked to a comment database. The feedback can be either displayed or hidden when the texts are returned to students. In the teacher interface, a variety of reports can be generated, including a consolidation of student errors as well as tracked records of individual students and/or a whole class. Teachers can access student data at any time for formative assessment and easily monitor student progress online. The error compilation feature allows the teacher to generate a list of linguistic errors made by students, ordered by frequency or error type, for further instruction. This feature informs the teacher about students' learning difficulties and provides a holistic view of students' performance.

Procedure

Prior to the commencement of the study, the teachers of the participating classes attended a one-hour training session with the researchers, which explained the functions as well as the capabilities and limitations of the essay marking system. This prepared them to implement the system in class and to solve problems for students. Afterward, the students received training on how to use the marking system, including typing, submitting, and revising/editing their essays in the system. During the training, the students were encouraged to try the system as much as possible and to ask questions, which were immediately answered by the teacher or the researchers. The training continued until the students were able to use the system independently. Meanwhile, the students were informed that the main purpose of the marking system was to help them improve language accuracy in their drafts and that a score and comments on the content would be given by the teacher after the final version was submitted.

After the training, the students were required to write a 500-word essay within 50 minutes and submit it to the system for marking. The writing requirements were the same as those for their regular assignments. Once the essay was submitted, the students received feedback within a few minutes, including markings of language errors and correction suggestions. The essay was submitted to the teacher once the revisions were completed. Throughout the process, the teacher responded to student inquiries and provided individual assistance with the use of the system when necessary. After the writing activity, a questionnaire survey was administered, followed by individual interviews that focused on students' perceptions of the effectiveness and design of the marking system as well as their attitudes toward using the system to complete a writing assignment.

Data Collection and Analysis

The data were obtained through a questionnaire with a 4-point Likert scale ranging from 1 (“strongly disagree”) to 4 (“strongly agree”), followed by individual interviews by the researchers. The questionnaire and interviews were administered in both Chinese and English depending on the student’s preference. The questionnaire contained ten Likert scale questions (as shown in Tables 11.2, 11.3, 11.4 and 11.5) and one open-ended question requesting free comments on the marking system (i.e., “What do you think about the essay marking system? Please provide any comments you have about the system.”). The Likert scale questions asked about the perceived effectiveness, overall satisfaction, attitude toward using the system, as well as reactions toward system feedback. In total, 53 students (28 at the low-intermediate level of Chinese proficiency and 25 at the high-intermediate level) responded to the questionnaire.
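The chapter reports results as percentages of students agreeing with each item. One straightforward way to compute such figures from 4-point Likert responses is to count ratings of 3 ("agree") or 4 ("strongly agree"), as in the hypothetical Python example below; the responses shown are invented, not the study's data.

# Illustrative only: percentage agreement on a 4-point Likert item.
# The ratings below are invented and do not reproduce the study's data.
ratings = [4, 3, 3, 2, 4, 3, 1, 3, 4, 3]   # hypothetical responses to one item

def percent_agreement(scores):
    """Share of ratings that are 'agree' (3) or 'strongly agree' (4)."""
    return 100 * sum(1 for s in scores if s >= 3) / len(scores)

print(f"{percent_agreement(ratings):.0f}% agreement")   # 80% agreement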

Table 11.2 Perceived effectiveness of using the essay marking system
Table 11.3 Attitude toward using the essay marking system
Table 11.4 Reactions toward system feedback
Table 11.5 Overall satisfaction with the essay marking system

To gather more in-depth insights into the questionnaire survey results, face-to-face interviews were conducted with eight students from each class. The interviewees were randomly selected from among the students who volunteered. Each interview lasted approximately 20 minutes. The interviewees were asked to talk about their experience with the essay marking system, their opinions regarding the benefits and limitations of the system, and their willingness to use the system. The interviews were structured around the following questions: Overall, how do you feel about the essay marking system? How do you think the system can help you with your writing? What do you think are the strengths of the system? What do you think are the drawbacks of the system? Are you willing to use the system at school and at home? Why or why not? The interview results were used to further illustrate the questionnaire findings.

Results and Discussion

Perceived Effectiveness of Using the Essay Marking System

According to the questionnaire survey, the essay marking system was generally perceived as an effective tool. As shown in Table 11.2, more than 80% of the students from Class A (low-intermediate proficiency level, LI level henceforth) and Class B (high-intermediate proficiency level, HI level henceforth) agreed that the system was helpful for their writing improvement and for Chinese learning. Around 10% of the students from the two classes had negative reactions to the marking system. Some of these students only wrote a few sentences due to their slow typing speed and/or low motivation for writing and thus had little experience with system feedback. Some of them had difficulty understanding the metalinguistic feedback at the sentence level, namely a short explanation of a rule along with an example (as illustrated below; Footnote 5), and were unable to correct the errors marked by the system.

Rule:

地点放在动作前面。‘Location is placed before the action.’

Example:

他在客厅看电视。‘He watched TV in the living room.’

Moreover, since the students only had the opportunity to make a single round of revisions before submitting their work to the teacher, some noted that the lack of immediate feedback on their hypothesized corrections was a barrier to learning. The implementation of the automated marking system needed to take into consideration the trial and error nature of L2 learning. In short, the perceived effectiveness of technology use depended on learner readiness, including typing skills and language proficiency, as well as the way the program was implemented in class.

Attitude Toward Using the Essay Marking System

As shown in Table 11.3, the majority of students liked using the system to improve their language use in writing. They thought the system was convenient, fast, and easy to use, allowing them to receive timely feedback. A few students mentioned that they became less worried about the language aspect of writing because they had a chance to correct errors before submitting their work to the teacher. Furthermore, Class B (HI level) had a more positive attitude toward using the marking system than Class A (LI level). About 25% of the students from Class A disliked automated feedback, probably for two reasons. First, the metalinguistic feedback (i.e., an explanation of a rule plus an example) might not be comprehensible to students who had not reached a certain level of Chinese language proficiency. Several students from Class A (LI level) indicated in the interviews that direct correction would be more useful, as they had difficulty correcting errors based on the metalinguistic suggestions. Second, the students' attitude might have been negatively affected by a limitation of the marking system itself. When a sentence contained errors, the system underlined the whole sentence and provided the relevant structure(s)/rule(s); it was, however, unable to specify the exact location of errors within an ungrammatical sentence. Thus, the students had to first understand the rule(s) and then apply that knowledge to identify the errors and rewrite the sentence. The error correction task became more demanding when a sentence involved multiple grammatical errors, which was common in the writing of lower-proficiency students. The delay in access to the target form might have had a negative impact on students' learning attitude and might have canceled out the potential cognitive benefit of metalinguistic feedback (Chandler 2003).

As for the incentive to use the marking system in class and after school, more than 80% of the students from Class B (HI level) responded positively (see Table 11.3), probably because the system was easy to use and the prompt feedback was perceived as helpful in enhancing language accuracy, as discussed above. Unlike Class B (HI level), only 60% of the students from Class A (LI level) were willing to use the system after school. The interview data revealed three factors that might affect students' motivation to use the system. First, students might prefer handwriting over typing because they were not used to typing Chinese on a computer, and typing sometimes interfered with their writing. Second, students might regard the system as a support tool for completing writing assignments in class but not as a tool for self-learning outside class, which might, in part, be due to the fact that the system was only used for in-class writing in this study. In addition, students might prefer to have assistance from the teacher and be relatively less motivated to engage in Chinese learning activities outside class.

Reactions Toward System Feedback

As indicated in Table 11.4, more than 80% of the students from both classes agreed that error markings could help them notice language errors in their writing. As the students might lack the ability to identify their own errors, an overt and clear indication of errors could draw their attention to new features of the L2 (Schmidt 1990, 1993). The students became aware of the gap between their usage and the target form. However, it was found that sometimes system detection was inaccurate and thus might lead to confusion. Solving the problem would require an increase in the detection accuracy of the prototype system.

The students from Class A (LI level) and Class B (HI level) had different opinions regarding the ease of understanding system feedback. There was an overall positive response from Class B (HI level), where 92% of the students thought the feedback was comprehensible and could help them correct language errors immediately. In contrast, 32% of the students from Class A (LI level) did not think the feedback was easy to understand. As discussed above, students might need to reach a certain level of Chinese proficiency in order to benefit from the metalinguistic feedback, as the error correction process required awareness of grammatical rules as well as the ability to apply those rules to construct new sentences. Some students indicated that the system suggestions might be easier to understand if presented with an English translation and/or more explanation, suggesting that the automated feedback was insufficient or incomprehensible to them.

Overall Satisfaction with the Essay Marking System

The questionnaire survey showed high satisfaction with the ease of use and smooth operation of the marking system (see Table 11.5). As confirmed by the interviews, the students generally agreed that they could submit their work and receive feedback in just a few steps and that the system interface was intuitive, with clearly labeled buttons. According to Davis (1989), the ease of use of a technology influences learners' intention to use it and hence their learning effectiveness. When learners perceive a technology as easy to use, they are more likely to accept it and find it useful for improving their performance, especially those with weak learning motivation (Huang et al. 2011; Liu et al. 2010). Smooth operation is also important for a computational system, as bugs or defects will decrease its effectiveness (Holland et al. 1995).

While the system was perceived as easy to use and smooth in operation, the students were much less satisfied with its functions, as indicated in Table 11.5. About 39% and 28% of the students from Class A (LI level) and Class B (HI level), respectively, disagreed that the system had complete functions. They pointed out the need for additional learning aids, including access to an online dictionary, model essays, written comments on essay content, and even games. In addition, the interface design was not sufficiently attractive or motivating to them. These findings raise important issues for the development and implementation of an essay marking system. For one thing, students must be aware of the specific purpose of using a technology in learning. In this case, the marking system was designed to address language errors, and thus giving comments on essay content was beyond its scope. Students should have a clear understanding of what the technology is capable of doing before using it. For another, the development of the marking system should be grounded in the basic nature of learning. It has been suggested that computational aids for language learning should not only be easy to use but also be intrinsically motivating and enhance the user experience (Nokelainen 2006; Norman 2002). From this point of view, the marking system needed to be further improved in order to create an enjoyable, supportive, and aesthetically pleasing environment for writers.

General Discussion

This study investigated fifth-grade students' perceptions of the automated essay marking system developed by the Singapore Centre for Chinese Language. The purpose was to gather user feedback for further improvement of the system. The study added to the limited literature on the use of an automated system as a tool for formative feedback on the language aspects of writing. The questionnaire survey and interview results indicated that the marking system was generally perceived as effective and helpful in improving language accuracy, with the advantages of being easy to use and prompt in providing feedback. The timely feedback could promote noticing of L2 forms and reduce anxiety about making errors during the writing process.

Some may argue that error correction does not necessarily lead to proper use of the target form in the future and that it may result in a negative attitude toward writing (e.g., Truscott 1996). However, L2 development is a gradual process of trial and error, and corrective feedback provides opportunities for learners to notice the target form and to test linguistic hypotheses. As demonstrated in Ferris and Roberts (2001), error marking helped learners self-edit their texts compared with receiving no feedback. Moreover, corrective feedback need not discourage or demotivate learners if it is used as a formative tool. Similar to Chen and Cheng (2008) and Grimes and Warschauer (2010), the present study showed that immediate feedback was considered valuable, especially when utilized as an assistive tool that helped learners improve their work at the drafting stage of writing. The provision of automated feedback was based on a non-judgmental and process-oriented approach that could reduce writing anxiety caused by the fear of teachers' negative feedback and the lack of linguistic knowledge (Leki 1999).

While there was a generally positive perception of the automated marking system, the students' attitudes might be affected by several factors, including their Chinese proficiency level, cognitive ability, and the presentation of metalinguistic feedback. First of all, it is important to note that the study only included non-advanced, young learners of Chinese. Thus, the findings might not be generalizable to other proficiency and/or age groups. In fact, those who have mastered the formal system of the target language might benefit more from content-focused feedback than from form-focused feedback in an AWE learning environment (Chen and Cheng 2008). Moreover, metalinguistic feedback was perceived as less useful among the low-intermediate students than among the high-intermediate students. This finding is in line with the argument that direct correction works best with elementary learners, as it offers immediate access to the target form (Ferris 2006). A lack of sufficient and comprehensible information for resolving errors might lead to confusion (Ferris and Roberts 2001; Leki 1991). In contrast, metalinguistic feedback has been found effective mostly for improving the L2 accuracy of advanced or post-intermediate learners with high language analytic ability (e.g., Bitchener and Knoch 2010b; Bitchener et al. 2005; Sheen 2007). Given that most of the previous studies were conducted with adult university learners, it is possible that metalinguistic feedback imposes different cognitive demands on primary-level learners whose cognitive abilities are still developing. Further research is thus required to understand how cognitive factors influence the effects of corrective feedback. In addition, while the metalinguistic feedback was provided in Chinese on the automated marking system, some students might prefer an English translation, as evidenced in the interview data.

The students’ perceptions of the marking system might also be influenced by the implementation and the limitations of the system. The necessity of providing different types of support to suit learners’ needs raises critical issues in the implementation of such a system. First, the use of the technology needs to take into account learner characteristics and learning goals. The students in this study were fifth-graders who had learned to compose essays since the third grade. Most of them still needed assistance in the formal aspects of writing. Also, they were aware that reducing language errors in their writing could help them achieve a better score on school tests. Before the essay marking system was employed in class, the students were informed that the major purpose of using the system was to facilitate their revising process and that the teacher would give comments and assign a score after the final version had been submitted. In this case, the students understood that they were not to be assessed by a machine, and the use of the marking system met their learning goals to some extent. The system might have been perceived differently if implemented with other learner groups or in other learning contexts.

Furthermore, teacher support could compensate for the limitations of the automated marking system and thus increase its effectiveness. While the students worked individually with the system, the teacher was available to answer their questions. This might be particularly important for lower-proficiency students who had difficulty understanding the automated feedback. As the marking system was only able to underline an ungrammatical sentence and provide metalinguistic explanations of rules, the teacher could help pinpoint errors and clarify a grammar point that could not be explained clearly in generic feedback. Also, the automated responses might not match the students' level and were not targeted at the specific ideas they wanted to convey. Therefore, teacher input was necessary to address individual writing problems and to alleviate confusion. In other words, the marking system served as a supplement to teachers rather than a replacement for them (Ware 2005; Warschauer and Ware 2006).

Another limitation of the marking system was its inability to give specific and individualized comments on essays, as some students pointed out in this study. While the system took care of language accuracy, the teacher could attend to the content and offer focused advice regarding the strengths and weaknesses of each student's essay. According to the teachers participating in this study, before using the marking system they used to spend a great deal of time and effort providing corrective feedback on students' errors, and it was impossible to do so in class due to time constraints and classroom management concerns. Thus, the students did not receive feedback until a few days after submitting their work. Such delayed feedback might not be of much use, as the students had already lost the motivation to reflect on and improve their writing (cf. Evans et al. 2011; Hartshorn et al. 2010). The integration of machine and human feedback allowed for the delivery of timely feedback on linguistic form during the writing process and freed the teacher to respond to meaning construction and the individual learning needs of students.

Conclusion

This study explored fifth-grade students' perceptions of an automated essay marking system that provided corrective feedback on lexical and grammatical errors in Chinese. The system was not intended to eliminate human elements from the essay marking process but to allow teachers to devote more attention to the content and other important aspects of student writing by reducing the time and effort they spend on error marking and correction. While this study is the first to investigate the implementation of an automated Chinese writing system in Singapore's primary schools, it has limitations that can be addressed in future research.

First, the system design was preliminary, and further improvement in the accuracy of error detection and the use of metalanguage is necessary. Any changes in system features or processes might lead to different user perceptions. Second, the effectiveness of the marking system was investigated only through students' perceptions after a single writing session. Further research could examine the short-term and long-term impact of automated feedback on students' writing improvement by comparing their essays before and after revision. It would also be valuable to conduct longitudinal studies that track changes in students' awareness of and attitudes toward self-editing through automated feedback. In addition, it remains unclear how the system's effectiveness would be influenced by different pedagogical designs, learning contexts, and individual teacher and student factors. A fuller understanding of these variables and their interactions will help guide the implementation and maximize the benefits of similar technologies.