
1 Introduction

1.1 Literature Review

This paper reports on an essay-writing study using a computer system to generate automated, visual feedback on academic essays. Students upload their essay draft to the system, which then offers automated feedback in several forms: highlighting elements of essay structure (in line with the assessed elements identified in Appendix 1), highlighting key concepts, showing the dispersion of key words and sentences throughout the essay, and summarising the essay back to the student for their own reflection. This is achieved through linguistic analysis of the essay text, using key phrase extraction and extractive summarisation, the output of which is fed through a web application that displays the feedback. Thus the system can offer feedback on a single essay and does not require a ‘bank’ of essays. We should emphasise at the outset that the purpose of our project was to demonstrate proof of concept rather than to produce a final system ready for commercial exploitation. Nevertheless, our findings demonstrate the potential value of automated feedback in students’ essay writing.
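To make the linguistic analysis pipeline above concrete, the following is a minimal sketch of frequency-based extractive summarisation, one standard way of selecting summary sentences. It is an illustrative simplification written for this paper, with an assumed tokenisation and scoring scheme, and is not the engine used in our system.

    # Minimal sketch of frequency-based extractive summarisation.
    # Illustrative simplification only, not the system's actual analysis engine.
    import re
    from collections import Counter

    def extractive_summary(text, n_sentences=2):
        # Naive sentence splitting on ., ! and ? (an assumption for illustration).
        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
        freq = Counter(re.findall(r'[a-z]+', text.lower()))

        # Score each sentence by the summed frequency of its words,
        # normalised by length to avoid favouring long sentences.
        def score(sentence):
            tokens = re.findall(r'[a-z]+', sentence.lower())
            return sum(freq[t] for t in tokens) / max(len(tokens), 1)

        top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
        # Return the selected sentences in their original order.
        return [s for s in sentences if s in top]

    essay = ("Risk perception varies between individuals. Perception of risk is shaped "
             "by experience. People often misjudge rare risks. Media coverage amplifies "
             "perceived risk.")
    print(extractive_summary(essay))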

Within this paper we focus specifically on one of the visual representations: rainbow diagrams. Based on the concept of a reverse rainbow, “nodes” within the essay are identified from the sentences, with the nodes from the introduction being coloured violet, and the nodes from the conclusion being red. This produces a linked representation of how the argument presented in the essay develops and builds the key points (related to key elements of “good” quality and structure of an academic essay – see Appendices 1 and 2): outlining the route the essay will take in the introduction, defining key terms and identifying the key points to be raised; backing this up with evidence in the main body of the essay; and finishing with a discussion to bring the argument together. The resulting diagrammatic representation for a “good” essay should therefore have red and violet nodes closely linked at the core of the diagram, with other coloured nodes tightly clustered around and with many links to other nodes.
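The “reverse rainbow” colouring can be pictured as a simple mapping from a sentence’s relative position in the essay to a hue running from violet (introduction) to red (conclusion). The sketch below shows one way such a mapping could be implemented; the specific hue values and the function name node_colour are assumptions made for illustration, not details of our implementation.

    # Illustrative "reverse rainbow" colouring: position 0.0 (introduction) maps
    # to violet and position 1.0 (conclusion) maps to red. Hue values are assumed.
    import colorsys

    def node_colour(position):
        """position: a float in [0, 1], the sentence's relative place in the essay."""
        violet_hue = 0.75                     # approximate hue of violet
        hue = violet_hue * (1.0 - position)   # violet -> blue -> green -> yellow -> red
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        return '#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255))

    # First, middle and last sentences of an essay.
    print(node_colour(0.0), node_colour(0.5), node_colour(1.0))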

It has been well documented in the literature that visual representations can be powerful as a form of feedback to support meaningful, self-reflective discourse (Ifenthaler 2011), and also that rainbow diagrams produced from “good”, “medium” and prize-winning essays can be correctly identified as such (Whitelock et al. 2015). This paper goes one step further: to link the rainbow diagram structure to the actual marks awarded. Thus, the rainbow diagrams incorporate a “learning to learn” function, designed to guide users to reflect on what a “good” essay might look like, and how their own work may meet such requirements or need further attention.

From our analysis of rainbow diagrams and the marks awarded to essays, we will conclude that, to a certain degree, the quality of an academic essay can be ascertained from this visual representation. This is immensely significant, as rainbow diagrams could be used as one tool to offer students both at-a-glance and detailed feedback on where the structure of their essay may need further work, without the concerns about plagiarism that arise from showing students “model essays”. This could equally support teachers in improving their students’ academic writing. We begin by outlining the key principles of feedback practice, as highlighted in the research literature, before moving on to consider automated feedback, which is particularly relevant to the current study.

Feedback.

The system developed for this study is designed to offer formative feedback during the drafting phase of essay writing, in contrast to the common practice of receiving feedback only on submitted work. Despite this unique feature of the system, it is important to review the purpose of feedback in general, which underpins the technical system. Chickering and Gamson (1987) listed “gives prompt feedback” as the fourth of seven principles of good practice for undergraduate education; the third principle identified is “encourages active learning”. From this perspective, enabling students to take ownership of and reflect on their work, through provision of feedback at the point when they are engaging with the topic and task, could have a significant positive impact on students’ final submissions and their understanding of topics.

Butler and Winne (1995) defined feedback as “information with which a learner can confirm, add to, overwrite, tune, or restructure information in memory, whether that information is domain knowledge, metacognitive knowledge, beliefs about self and tasks, or cognitive tactics and strategies (Alexander et al. 1991)” (p. 275). Thus the nature of feedback can be very diverse, but its purpose, and learners’ perception of it, must be to enable learners to learn from the task they have just done (or are doing) and to implement that learning in the task that follows. From this, Butler and Winne concluded that students who are better able to make use of feedback can more easily bridge the gap between expectations, or goals, and performance.

Evans (2013) built on this notion of the student actively interpreting and implementing suggestions of feedback, in stating:

Considerable emphasis is placed on the value of a social-constructivist assessment process model, focusing on the student as an active agent in the feedback process, working to acquire knowledge of standards, being able to compare those standards to one’s own work, and taking action to close the gap between the two (Sadler 1989). (p. 102)

Also raising the importance of students as active agents in their interpretation of feedback, Hattie and Timperley (2007) concluded that “feedback is conceptualized as information provided by an agent (e.g., teacher, peer, book, parent, self, experience) regarding aspects of one’s performance or understanding” (p. 81). This therefore relates to what feedback is, but Hattie and Timperley went on to explain what it must do in order to be useful:

Effective feedback must answer three major questions asked by a teacher and/or by a student: Where am I going? (What are the goals?), How am I going? (What progress is being made toward the goal?), and Where to next? (What activities need to be undertaken to make better progress?) These questions correspond to notions of feed up, feed back, and feed forward. (p. 86)

Thus feedback must look at what has been done, but use this to provide guidance on what should be done next – feed forward – on how to improve current work and so reduce the gap between desired and actual performance. Any feedback that supports a student in understanding what needs to be done and how to do it, and that motivates them to see this as worthwhile, would be very powerful indeed.

Working along similar lines, Price et al. (2011) commented that, unlike a traditional understanding of feedback, feed forward has potential significance beyond the immediate learning context. For this significance to be realised however, a student must engage with and integrate the feedback within their ongoing learning processes. This often involves iterative cycles of writing, feedback, and more writing.

Gibbs and Simpson (2004) also commented that feedback must be offered in a timely fashion, so that “it is received by the students while it still matters to them and in time for them to pay attention to further learning or receive further assistance” (p. 18). The features of the technical system being developed in the current study, including the rainbow diagrams, would fit this requirement, since it is an automated, content-free system available to students at the time that they choose to engage with the essay-writing task. Thus, the onus is again on students to prepare work for review, and then to seek feedback on that work, and to implement their interpretations of that feedback.

Price et al. (2011) raised the dilemma, often felt by tutors, of the appropriate level of feedback to offer students:

“Doing students’ work” will ultimately never help the student develop self-evaluative skills, but staff comments on a draft outline may develop the student’s appreciation of what the assessment criteria really mean, and what “quality” looks like. What staff feel “allowed” to do behaviourally depends on what they believe they are helping their students to achieve conceptually. (p. 891, emphasis in original)

The rainbow diagrams offered in the current study provide a means to highlight key points of structure and progression of argument within students’ essays – identifying “what ‘quality’ looks like” in Price et al.’s terms – without having to pinpoint exactly how students should word their essays. This visual representation serves to show quickly where essay structure may need tightening, as well as where it is already good – conveying the underlying concept of what makes a good essay and identifying how concepts are evidenced and developed in the essay – without spoon-feeding content or raising fears of plagiarism.

Having addressed the research on feedback, it is now appropriate to turn more directly to the literature on automated feedback.

Automated Feedback.

There has been widespread enthusiasm for the use of technologies in education, and the role of these in supporting students to take ownership of their learning. Steffens (2006), for instance, stated that “the extent to which learners are capable of regulating their own learning greatly enhances their learning outcomes” (p. 353). He also concluded that “In parallel to the rising interest in self-regulation and self-regulated learning, the rapid development of the Information and Communication Technologies (ICT) has made it possible to develop highly sophisticated Technology-Enhanced Learning Environments (TELEs)” (p. 353).

Greene and Azevedo (2010) were similarly enthusiastic about the potential of computer-based learning environments (CBLEs) to support students’ learning, but wary that they also place a high skill demand on users:

CBLEs are a boon to those who are able to self-regulate their learning, but for learners who lack these skills, CBLEs can present an overwhelming array of information representations and navigational choices that can deplete working memory, negatively influence motivation, and lead to negative emotions, all of which can hinder learning (D’Mello et al. 2007; Moos and Azevedo 2006). (p. 208)

This cautionary note reminds us of the potential of such technologies, but, as with the need to offer instruction and guidance before feedback, students need to be given the necessary opportunities to realise how any tool – technological or otherwise – can be used to support and stretch their learning potential. Otherwise it is likely at best to be ignored, and at worst to reduce performance and waste time through overload and misunderstanding.

Banyard et al. (2006) highlight another potential pitfall of using technologies to support learning, in that “enhanced technologies provided enhanced opportunities for plagiarism” (p. 484). This is particularly the case where use of technology provides access to a wealth of existing literature on the topic of study; for students to make their own meaningful and cohesive argument around an issue, they must understand the issue rather than merely copy someone else’s argument. This reinforces the reasoning behind not offering model essays whilst students work on their assignments, which was one of our concerns as we were devising the technical system; instead, students are given feedback on their essay structure and development of argument without the temptation of material that can simply be copied and pasted.

The opposite, hoped-for outcome of giving students the opportunity to explore and realise for themselves what they can do with technologies can be summed up in Crabtree and Roberts’ (2003) concept of “wow moments”. As Banyard et al. (2006) explained, “Wow moments come from what can be achieved through the technology rather than a sense of wonder at the technology itself” (p. 487). Any technology must therefore be supportive and intuitive regarding how to do tasks, yet transparent enough to allow user-driven engagement with and realisation of the task activity, demonstrating and facilitating access to resources as required.

Also on the subject of what support automated systems can offer to students, Alden Rivers et al. (2014) produced a review covering some of the existing technical systems that provide automated feedback on essays for summative assessment, including E-rater, Intellimetric, and Pearson’s KAT (see also Ifenthaler and Pirnay-Dummer 2014). As Alden Rivers et al. identified, however, systems such as these focus on assessment rather than on formative feedback, which is where the system described in the current study presents something unique.

The system that is the subject of this paper aims to assist higher education students to understand where there might be weaknesses in their draft essays, before they submit their work, by exploiting automatic natural-language-processing analysis techniques. A particular challenge has been to design the system to give meaningful, informative, and helpful advice for action. The rainbow diagrams are based on graph theory and are used to identify key sentences within the draft essay. A substantial amount of work has therefore been invested in making the diagrams transparent in terms of how the represented details depict the qualities of a good essay – through the use of different colours, and through how interlinked or dispersed the nodes are. Understanding these patterns has the potential to help students improve their essays across subject domains.

Taking all of these points forward, we consider the benefits of offering students a content-free visual representation of the structure and integration of their essays. We take seriously concerns over practices that involve peer review and offering model essays: that some students may hold points back from initial drafts in fear that others might copy them, and that other students may do better in revised versions by borrowing points from the work they review. On this basis, in working toward implementing the technical system under development, we have deliberately avoided the use of model essays. This also has the advantage that the system could be used regardless of the essay topic.

2 Research Questions and Hypothesis

Our study addressed the following research questions. First, can the structure of an essay (i.e., introduction, conclusion) and its quality (i.e., coherence, flow of argument) be represented visually in a way that can identify areas of improvement? Second, can such representations be indicative of marks awarded? This leads to the following hypothesis:

  1. A rainbow diagram representation of a written essay can be used to predict whether the essay would achieve a high, medium or low mark. The predicted marks will be positively correlated with those awarded against a formal marking scheme.

3 Method

3.1 Participants

Fifty participants were recruited from a subject panel maintained by colleagues in the Department of Psychology consisting of people who were interested in participating in online psychology experiments. Some were current or former students of the Open University, but others were just members of the public with an interest in psychological research. The participants consisted of eight men and 42 women who were aged between 18 and 80 with a mean age of 43.1 years (SD = 12.1 years).

3.2 Procedure

Each participant was asked to write two essays, and in each case they were allowed two weeks for the task. The first task was: “Write an essay on human perception of risk”. The second task was: “Write an essay on memory problems in old age”. Participants who produced both essays were rewarded with an honorarium of £40 in Amazon vouchers. In the event, all 50 participants produced Essay 1, but only 45 participants produced Essay 2.

Two of the authors, academic staff with considerable experience in teaching and assessment, marked the submitted essays using an agreed marking scheme, which is shown in Appendix 1. If the difference between the two markers’ total marks was 20 percentage points or less, the essay was assigned the average of the two marks. Discrepancies of more than 20 percentage points were resolved by discussion between the markers.
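The reconciliation rule can be expressed as a short sketch; the 20-point threshold and the averaging follow the procedure described above, while the function name and example marks are hypothetical.

    # Sketch of the double-marking reconciliation rule described above.
    def reconcile_marks(mark_a, mark_b, threshold=20.0):
        """Average the two markers' marks if they differ by at most `threshold`
        percentage points; otherwise flag the essay for discussion."""
        if abs(mark_a - mark_b) <= threshold:
            return (mark_a + mark_b) / 2.0
        return None  # discrepancy to be resolved by discussion between the markers

    print(reconcile_marks(62.0, 70.0))   # 66.0
    print(reconcile_marks(45.0, 70.0))   # None -> resolve by discussion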

Rainbow Diagrams.

Rainbow diagrams follow the conventions of graph theory, which has been used in a variety of disciplinary contexts (see Newman 2008). A graph consists of a set of nodes or vertices and a set of links or “edges” connecting them. Formally, a graph can be represented by an adjacency matrix in which the cells represent the connections between all pairs of nodes.
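As a toy illustration of this representation (not drawn from any of the study essays), a small four-node graph and its adjacency matrix can be written as follows.

    # Toy example: a graph of four nodes stored as a symmetric adjacency matrix.
    # Cell (i, j) holds the weight of the edge between nodes i and j (0 = no edge).
    import numpy as np

    adjacency = np.array([
        [0.0, 0.4, 0.0, 0.1],
        [0.4, 0.0, 0.3, 0.0],
        [0.0, 0.3, 0.0, 0.5],
        [0.1, 0.0, 0.5, 0.0],
    ])

    # List the edges implied by the matrix (upper triangle only, to avoid duplicates).
    edges = [(i, j, float(adjacency[i, j]))
             for i in range(len(adjacency))
             for j in range(i + 1, len(adjacency))
             if adjacency[i, j] > 0]
    print(edges)   # [(0, 1, 0.4), (0, 3, 0.1), (1, 2, 0.3), (2, 3, 0.5)]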

Our linguistic analysis engine automatically removes from an essay any titles, tables of contents, headings, captions, abstracts, appendices and references; none of this is done manually. Each of the remaining sentences is then compared with every other sentence to derive the cosine similarity for all pairs of sentences: a multidimensional vector is constructed for each sentence, recording the number of times each word appears in it, and the similarity between two sentences is defined as the cosine of the angle between their vectors.
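A minimal sketch of this pairwise comparison is given below, using simple word-count vectors; the tokenisation is a naive assumption made purely for illustration and is not our engine’s tokeniser.

    # Cosine similarity between two sentences, using word-count ("bag of words") vectors.
    import math
    import re
    from collections import Counter

    def cosine_similarity(sentence_a, sentence_b):
        # Build a word-count vector for each sentence (naive tokenisation).
        vec_a = Counter(re.findall(r'[a-z]+', sentence_a.lower()))
        vec_b = Counter(re.findall(r'[a-z]+', sentence_b.lower()))
        dot = sum(vec_a[w] * vec_b[w] for w in set(vec_a) & set(vec_b))
        norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
        norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)

    print(cosine_similarity("Memory declines with age.", "Age affects memory."))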

The sentences are then represented as nodes in a graph, and values of cosine similarity greater than zero are used to label the corresponding edges in the graph. A web application uses the output of this linguistic analysis to generate various visual representations, including rainbow diagrams. Nodes from the introduction are coloured violet, and nodes from the conclusion are coloured red. As mentioned earlier, the resulting representation for a “good” essay should have red and violet nodes closely linked at the core of the diagram, with other coloured nodes tightly clustered around and with many links to other nodes.
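The sketch below shows how such a graph could be assembled from a precomputed similarity matrix using the networkx library, with node colours assigned according to position in the essay. The helper build_sentence_graph and the colouring function passed into it are illustrative assumptions, not our production code.

    # Sketch: build a sentence graph from a precomputed cosine-similarity matrix.
    # Only pairs with similarity greater than zero are linked; node colours run
    # from violet (introduction) to red (conclusion). Illustrative only.
    import networkx as nx

    def build_sentence_graph(similarities, colour_for_position):
        n = len(similarities)
        graph = nx.Graph()
        for i in range(n):
            position = i / (n - 1) if n > 1 else 0.0
            graph.add_node(i, colour=colour_for_position(position))
        for i in range(n):
            for j in range(i + 1, n):
                if similarities[i][j] > 0:
                    graph.add_edge(i, j, weight=similarities[i][j])
        return graph

    # Example with a toy 3 x 3 similarity matrix and a trivial two-colour scheme.
    toy = [[0, 0.2, 0.0],
           [0.2, 0, 0.5],
           [0.0, 0.5, 0]]
    g = build_sentence_graph(toy, lambda p: 'violet' if p < 0.5 else 'red')
    print(g.nodes(data=True), g.edges(data=True))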

We used our system to generate a rainbow diagram for each of the 95 essays produced by the participants. Without reference to the marks awarded, the rainbow diagrams were then rated as high-, medium- or low-scoring by two of the authors, according to how central the red nodes were (conclusion), how close they were to violet nodes (introduction), and how tightly clustered and interlinked the nodes were. Any differences between raters were resolved through discussion. (For detailed criteria, see Appendix 2, and for examples of high-ranking and low-ranking rainbow diagrams, see Fig. 1).
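Although the rating itself was carried out by hand against the criteria in Appendix 2, the qualities the raters looked for have natural graph-theoretic analogues. The sketch below illustrates hypothetical proxies for them using networkx; it is not the rating procedure used in the study.

    # Hypothetical graph-theoretic proxies for the qualities the raters looked for.
    # Illustrative measures only, not the rating procedure used in the study.
    import networkx as nx

    def diagram_summary(graph, intro_nodes, conclusion_nodes):
        centrality = nx.closeness_centrality(graph)
        # How central the conclusion (red) nodes are in the graph.
        conclusion_centrality = sum(centrality[n] for n in conclusion_nodes) / len(conclusion_nodes)
        # How close conclusion nodes are to introduction (violet) nodes, on average.
        distances = [nx.shortest_path_length(graph, c, i)
                     for c in conclusion_nodes for i in intro_nodes
                     if nx.has_path(graph, c, i)]
        mean_distance = sum(distances) / len(distances) if distances else float('inf')
        # How tightly clustered and interlinked the nodes are overall.
        clustering = nx.average_clustering(graph)
        return {'conclusion_centrality': conclusion_centrality,
                'intro_conclusion_distance': mean_distance,
                'clustering': clustering}

    g = nx.path_graph(6)   # toy stand-in for a six-sentence essay graph
    print(diagram_summary(g, intro_nodes=[0, 1], conclusion_nodes=[4, 5]))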

Fig. 1. Examples of a high-ranking rainbow diagram (left-hand panel) and a low-ranking rainbow diagram (right-hand panel) (Color figure online)

4 Results

The marks awarded for the 50 examples of Essay 1 varied between 27.0 and 87.5 with an overall mean of 56.84 (SD = 15.03). Of the rainbow diagrams for the 50 essays, 6 were rated as high, 17 as medium and 27 as low. The mean marks that were awarded to these three groups of essays were 67.25 (SD = 24.20), 56.29 (SD = 12.54) and 54.87 (SD = 13.67), respectively. The marks awarded for the 45 examples of Essay 2 varied between 28.5 and 83.0 with an overall mean of 54.50 (SD = 15.93). Of the rainbow diagrams for the 45 essays, 7 were rated as high, 10 as medium and 28 as low. The mean marks that were awarded to these three groups of essays were 65.36 (SD = 13.77), 54.70 (SD = 14.07) and 51.71 (SD = 16.34), respectively.

A multivariate analysis of covariance was carried out on the marks awarded to the 45 students who had submitted both essays. This used the marks awarded to Essay 1 and Essay 2 as dependent variables and the ratings given to the rainbow diagrams for Essay 1 and Essay 2 as a varying covariate. The covariate showed a highly significant linear relationship with the marks, F(1, 43) = 8.55, p = .005, partial η2 = .166. In other words, the rainbow diagram ratings explained 16.6% of the between-subjects variation in marks, which would be regarded as a large effect (i.e., an effect of theoretical and practical importance) on the basis of Cohen’s (1988, pp. 280–287) benchmarks of effect size. This confirms our Hypothesis.
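For readers who wish to see the underlying idea in code, the sketch below is a deliberately simplified, hypothetical illustration: an ordinary least-squares regression of marks on diagram ratings for a single essay, using invented data and the statsmodels library. It is not the multivariate analysis of covariance reported above.

    # Simplified, hypothetical illustration only: an ordinary least-squares regression
    # of marks on rainbow-diagram ratings (0 = low, 1 = medium, 2 = high) for one essay.
    # The data are invented; the reported analysis was a multivariate analysis of
    # covariance across both essays.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.DataFrame({
        'rating': [0, 0, 0, 1, 1, 1, 2, 2, 2],
        'mark':   [48, 55, 52, 56, 60, 58, 65, 70, 68],
    })
    model = smf.ols('mark ~ rating', data=data).fit()
    print(model.params['rating'])   # slope: expected mark gain per rating step
    print(model.rsquared)           # proportion of variance explained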

An anonymous reviewer pointed out that the difference between the marks awarded to essays rated as high and medium appeared to be larger than the difference between the marks awarded to essays rated as medium and low. To check this, a second multivariate analysis of covariance was carried out that included both the linear and the quadratic components of the relationship between the ratings and the marks. As before, the linear relationship between the ratings and the marks was large and highly significant, F(1, 42) = 8.44, p = .006, partial η2 = .167. In contrast, the quadratic relationship between the ratings and the marks was small and nonsignificant, F(1, 42) = 0.41, p = .53, partial η2 = .010.

In other words, the association between the ratings of the rainbow diagrams and the marks awarded against a formal marking scheme was essentially linear, despite appearances to the contrary. The unstandardised regression coefficient between the ratings and the marks (which is based on the full range of marks and not simply on the mean marks from the three categories of rainbow diagrams) was 9.15. From this we can conclude that essays whose rainbow diagrams were rated as high would be expected to receive 9.15 percentage points more than essays whose diagrams were rated as medium, and 18.30 (i.e., 9.15 × 2) percentage points more than essays whose diagrams were rated as low.
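In equation form, if the ratings are coded 0 (low), 1 (medium) and 2 (high) – a coding assumed here purely for illustration – the fitted linear relationship implies

    \widehat{\text{mark}} = b_0 + 9.15\,r, \qquad r \in \{0, 1, 2\},

so the expected differences are 9.15 × 1 = 9.15 percentage points between adjacent rating categories and 9.15 × 2 = 18.30 percentage points between high and low.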

5 Discussion

This paper has described a study exploring the value of providing visual, computer-generated representations of students’ essays. The visual representations were in the form of “rainbow diagrams”, offering an overview of the development and also the integration of the essay argument. We used essays that had been marked according to set criteria, and generated a rainbow diagram for each essay to depict visually how closely related the points raised in the introduction and conclusion were, and how interlinked the other points raised during the course of the essay were.

The diagrams were rated as high-, medium- or low-scoring, and these ratings were analysed against the marks the essays were actually awarded. We found a significant relationship between the ratings of the diagrams and the marks awarded. We can therefore conclude that rainbow diagrams can illustrate the quality and integrity of an academic essay, offering students an immediate level of feedback on where the structure of their essay and the flow of their argument are effective and where they might need further work.

The most obvious limitation of this study is that it was carried out using a modest sample of just 50 participants recruited from a subject panel. They were asked to carry out an artificial task rather than genuine assignments for academic credit. Even so, they exhibited motivation and engagement with their tasks, and their marks demonstrated a wide range of ability. Moreover, because the relationship between the marks that were awarded for their essays and the ratings that were assigned to the rainbow diagrams constituted a large effect, the research design had sufficient power to detect that effect even with a modest sample.

It could be argued that a further limitation of the current study is that, while it suggests the potential of the rainbow diagrams and automated feedback system to support students in writing their essays, and offers an additional tool to teachers in supporting their students’ academic writing, it has not tested whether this potential can be achieved in practice. For this, a further study would be needed to address the effect of rainbow diagram feedback and feed forward on academic essay writing and performance. For such an implementation, providing guidance to students and teachers on how to interpret the rainbow diagrams would also be essential.

6 Conclusions and Implications

These results hold great significance as a means of automatically representing students’ essays back to them, to indicate how well their essay is structured and how integrated and progressive their argument is. We conclude that having an accessible, always-ready online system offering students feedback on their work in progress, at a time when students are ready to engage with the task, is an invaluable resource for students and teachers. As the system is content-free, it could be made easily available for students studying a wide range of subjects and topics, with the potential to benefit students and teachers across institutions and subjects.

Feedback is considered a central part of academic courses, and has an important role to play in raising students’ expectations of their own capabilities. To achieve this, however, it has been widely reported that feedback must be prompt and encourage active learning (Butler and Winne 1995; Chickering and Gamson 1987; Evans 2013). Through the feedback process, therefore, students must be enabled to see what they have done well, where there is room for improvement, and importantly how they can work to improve their performance in the future (Hattie and Timperley 2007). This latter issue has brought the concept of “feed forward” (Hattie and Timperley 2007; Price et al. 2011), in addition to feedback, into the debate. Thus students need to be given guidance on task requirements before they commence assignments, but they also need ongoing guidance on how they can improve their work – which rainbow diagrams could offer.

There is great potential for educational technologies to support a large variety of tasks, including the writing of essays; one such technology is, of course, the system developed for the current study. As the literature relates, however, it is critical that any resource, technological or otherwise, be transparent and intuitive in its purpose, so that students can concentrate on the learning task and not on how to use the technology (Greene and Azevedo 2010). This is when “wow moments” (Crabtree and Roberts 2003) can be facilitated: when students find the learning task much easier, more efficient, or better in some other way because of how they can do the task using the technology – what they can do with the technology, rather than just what the technology can do (Banyard et al. 2006).

The rainbow diagram feature of the current system therefore offers a potential way of both feeding back and feeding forward, in a way that is easily understood from the visual representation. Students would need some guidance on how to interpret the diagrams, and to understand the significance of the colouring and structure, but with a little input this form of essay representation could be widely applied to academic writing on any topic. We have shown that the structure of rainbow diagrams can be used to predict the level of mark awarded for an essay, which could be a very significant tool for students as they draft and revise their essays. By being content-free the provision of rainbow diagrams is also free of concerns about plagiarism, a critical issue in modern academic practice with widespread access to existing material.

7 Compliance with Ethical Standards

This project was approved by The Open University’s Human Research Ethics Committee. Participants who completed both essay-writing tasks were rewarded with a £40 Amazon voucher, of which they were informed before agreeing to take part in the study.