
1 Introduction: Evaluation in a Postdigital Context

In their paper on ‘ecological teaching evaluation’, Fawns et al. (2020) argued that ‘datafied’ market-driven evaluation practices privilege summative judgements of quality over the formative development of teachers and teaching. In this chapter, we consider how online postgraduate educators might move towards those authors’ ecological view, proposing ‘thick descriptions’ as a promising approach to understanding not only the quality of already-run courses, but also how to improve future educational designs and practices in relation to particular purposes and values. Our main focus is on the evaluation of courses, although we recognise that in higher education (HE), online course evaluation is itself an aspect of programme, curriculum and institutional evaluation. Course evaluation is also inextricably linked with the evaluation of teaching, though contemporary evaluation often uses proxies for teaching (Fawns et al. 2020) and the teacher’s actual work is likely to be invisible (Hayes 2019). Our aim is to counter this marginalisation of teaching, beginning with a question about why that marginalisation happens.

1.1 Whose Purposes Are Prioritised in Evaluation?

Like assessment of students (see Hounsell 2021, this book), the evaluation of teaching and courses can be formative (for learning and development) or summative (for accreditation, ranking, continuation) or a mix of both. Beyond those broad purposes, there are potentially many others (see, for example, the ‘evaluation utilisation terms’ in Onyura 2020). Moreover, different interests in the quality of online postgraduate courses and teaching might be categorised as pedagogical, aspirational or commercial. Although varying evaluation techniques can support a wide range of purposes, it is important not to lose sight of whose interests are predominantly served by a particular method and what it is able to reveal (Biesta 2009). For us, this is crucial to the focus of this book in situating online postgraduate education within its wider context.

Those who have the power to commission evaluations are likely to prioritise their own purposes and needs, which may not be the same as those of other stakeholders. Stakeholders in HE include, among others: university management and administration, governments, funding bodies, employers, commercial organisations, parents and partners. Our concern is that the interests of two key stakeholders in the evaluation of postgraduate online courses—teachers and students—are not currently prioritised. Rather than involving them in evaluation that empowers and improves teaching and learning (Fetterman et al. 2010), other more powerful stakeholders prioritise economic, informational and accountability needs.

For example, ‘accountability’ has been a watchword in higher education since the 1980s, considered by many writers to be evidence of a loss of trust in the sector and a move towards management control. While few would deny the importance of accountability in its vernacular sense of being responsible, what we have been seeing is its more technical use: ‘the duty to present verifiable accounts’ (Lorenz 2012: 617). Harvey and Williams (2010) have pointed out that accountability does not tend to lead to improvement; indeed, a quarter of a century ago, Trow (1996) showed how accountability can lead to what we would now call ‘gaming’ the system. We have lost the sense of to whom we might be accountable and why. Accountability may simply be shaped by and restricted to the needs of the commissioning stakeholder. There is now a dearth of important pedagogical insights in the HE evaluation data available to us. Instead, we see the prioritisation of other values such as retention rates, showing a disproportionate emphasis on the needs of stakeholders other than students and their teachers.

Formal course evaluation data are generally collected centrally by HE institutions. Recently, the main sources of evaluative information about teaching have been standardised satisfaction surveys and output measures, such as grades, retention and future salary (Biesta 2009; Fawns et al. 2020). This approach suits aspirational and commercial interests, highlighting supposedly ‘excellent’ components of education. These discrete elements become aggregated for league tables—ranked lists of groups, individuals and institutions—which are now influential in all aspects of society (Esposito and Stark 2019). They are also often used in comparison studies of educational methods, technologies, or student demographics. In such studies, they are employed to make claims about the ‘effectiveness’ of courses, as well as to further market institutions and programmes. Although it can be gratifying and even useful to know that a league table positions one’s university in the top 100 globally, that does not say much about why the course one is teaching is regarded so highly. Perhaps the only clue is a number: an averaged percentage of ‘satisfaction’ awarded to the course by students. Such claims are ‘thin’ descriptions of practice that lack the detail or nuance that is critical for course development and support for teaching.

Our chapter calls for the interests of teachers and students not to be subordinated to those of other stakeholders. This entails producing evaluative information about our postgraduate online courses beyond the measurement of discrete components and reductive compiling of the results. It centralises the pedagogical and formative value of evaluation in a culture of trust. Later, we will propose thick description as a move towards an ‘ecological’ perspective on teaching evaluation (Fawns et al. 2020) that counteracts the tendency to view educational variables in isolation as items to be used in creating ranked lists of ‘excellence’ or pitting one educational approach against another.

1.2 How Has Datafication Affected Course Evaluation?

The university sector’s adoption of digital technology to support not only teaching but also its evaluation has offered access to hitherto unimaginable data, enthusiastically deployed by university administration and management. Yet that potential coincides with a period when evaluation in higher education has apparently been increasingly driven by both market positioning (Gourlay and Stevenson 2017) and compliance with university and national governance (Erickson et al. 2020). ‘Datafied’ approaches serve these concerns well, opening up evaluation to the interests of powerful and wealthy commercial organisations aiming to sell applications that will shape our understanding of teaching and learning (Williamson 2017). Williamson’s exploration of the emergence of the new discipline of education data science brings out its underpinning assumptions:

This psycho-informatic approach treats mental life and learning as if they could be known mathematically and computationally, and, having been made measurable, as if they could then be enhanced or optimized. (Williamson 2017: 106)

Simple numeric measures may lend the sheen of science to evaluation processes (Hanson 1993), particularly when bolstered by seemingly objective and systematic uses of technology to perform complex, opaque mathematics on proxies of quality. Yet they reduce our appreciation of the interplay between educational elements, thus diminishing our ability to apply results to new contexts (Biesta and van Braak 2020). They may obscure not only the educational practices involved, but also the context and theoretical understandings underpinning those practices (McLaughlin and Mitra 2001). These complex points can be illustrated through an example, where the appeal of simple numbers has tempted people to seek clear, quantifiable conclusions about online learning.

In an article written in March 2020, Jonathan Zimmerman made a plea to use the sudden shift to online teaching brought about by the Covid-19 pandemic, supported by masses of available online data, as a ‘Great Online-Learning Experiment’ that could settle a contested debate once and for all:

at institutions that have moved to online-only for the rest of the semester, we should be able to measure how much students learn in that medium compared to the face-to-face instruction they received earlier. (Zimmerman 2020, emphasis added)

We believe that this example reflects a view of education that leads to unwarranted expectations of what data can tell us. At least five questionable assumptions underpin Zimmerman’s request:

1. The modality for delivering education is responsible for the educational outcomes of learners.

2. The modality can be isolated as a variable for scientific study.

3. The pandemic provides an ideal setting for a controlled experiment on the virtues of classroom vs. online learning.

4. The indicator of merit in the evaluation (the ‘evaluand’) is a summative outcome of how much is learned.

5. It is possible to measure this evaluand.

In March 2020, when Zimmerman wrote this piece, buildings in schools and universities were closing in many countries and ‘solutions’ to this crisis had to be found. Potential solutions (e.g. a particular software system or platform) may have been brought in without sufficient understanding of how that technology shapes and is shaped by the setting in which it is introduced (Enriquez 2009; Fawns 2019). The technology should not itself be seen as fully responsible for any outcomes, positive or otherwise, though it will certainly have some influence.

There have already been many comparison studies between online and on-campus learning, in the main concluding that there is no significant difference, with various implications erroneously drawn (Lockee et al. 2001). Importantly, a lack of significant difference between outcome measures should not be interpreted as ‘there is no difference’ between the two categories being compared. Rather, it is an inconclusive result derived from an invalid assumption: that a modality can be isolated as a ‘variable’ for scientific study, with learning seen as a dependent variable. This fails to take into account other variables that together affect teaching and learning, which would include student characteristics and circumstances, pedagogic activities and many other factors. Zimmerman is repeating this error from the now widely-discredited media comparison studies.

In the particular context that Zimmerman wanted to exploit, there were even more variables than usual affecting what was happening in classrooms and online. Williamson wrote, in a blog post critiquing Zimmerman’s idea:

Treating a pandemic as an experiment in online learning reduces human suffering, fear and uncertainty to mere ‘noise’ to be controlled in the laboratory, as if there is a statistical method for controlling for such exceptional contextual variables. (Williamson 2020)

Related to this, the pandemic has also brought many forms of inequality to light which cannot now be ignored (Czerniewicz et al. 2020). At a simple illustrative level, ‘online learning’ will be experienced differently by students with laptops and by students having to share a single mobile phone with parents and siblings—and some do not even have access to that.

It is also important to consider what would be evaluated here. Broadly, it is ‘online learning’ vs. classroom learning, but Zimmerman’s specific focus is on ‘what is learned’. Soon after Zimmerman’s article appeared, Hodges et al. (2020) suggested that the approaches that emerged in the Covid-19 pandemic should be named ‘emergency remote teaching’ rather than online learning, and evaluated accordingly with a focus on the context, input and process as well as the product (following Stufflebeam and Zhang 2017). We also suggest that ‘products’ might include potential harms alongside potential benefits, and these harms would not show up in quantified measurements of learning (see Stone et al. 2021, this book; Bussey 2021, this book, for powerful examples of such potential harms).

The final assumption we have identified from ‘Zimmerman’s experiment’ is that this evaluand—the summative learning from two different modalities—can actually be measured. This view of learning is particularly associated with a cognitivist paradigm of education, focused on memory and retrieval (see Baker et al. 2019 for an overview of contemporary paradigms of education). This paradigm puts a strong emphasis on testing, and results of tests are likely to be regarded as a proxy for learning. Thus, in our judgement, Zimmerman’s proposal amounts to testing what students can remember during a period of education in a crisis, and attributing to technology any differences from the quantity of things they could previously remember. The resulting data would then supposedly show whether classroom or online learning is ‘better’. We disagree with this suggestion and the assumptions on which it is based, but our inquiry has encouraged us to further explore the notion of proxies for learning and teaching.

1.3 What Counts as Learning and Teaching in Contemporary Course Evaluation?

Counting or measuring learning is far from straightforward. We might be able to count retention of basic facts, but only by ignoring the myriad purposes and values of education. We could use grades from in-course assessments, assuming that these assessments (which are part of the courses being evaluated) will generate ‘accurate’ measures of learning. The late philosopher Gilbert Ryle critiqued preoccupation with retention as a ‘very thin and partial notion of teaching and learning’, involving ‘the forcible insertion into the pupil’s memory of strings of officially approved propositions’ (Ryle 2009a: 467). Measurement is further complicated by different understandings of the concept of learning (Hodkinson et al. 2008). As other chapters of this book show, there are many different forms of learning (Boyd 2021; Hounsell 2021; Lee 2021; Jones 2021; Marley et al. 2021), and many factors that influence learning beyond the methods or modality of a course. Indeed, the purposes and challenges of education can be obscured by the very emphasis on learning. Biesta has argued that ‘learning’ is a term that ‘denotes processes and activities but is open—if not empty—with regard to content and direction’ (Biesta 2009: 39).

Measuring the quality of teaching is equally problematic. Institutions tend to adopt similar approaches to evaluating teaching or courses, regardless of the context. For example, in online postgraduate taught (PGT) education, context, needs, methods and outcomes are importantly different from other forms of education (Aitken 2020), yet very similar approaches are taken to evaluation. Governments, in particular, are keen to standardise measurement of teaching quality. In the UK, for instance, the demand for accountability gave rise to Teaching Quality Assessment (TQA) in the 1990s, followed by several other organisations and initiatives, most recently the Teaching Excellence Framework (TEF) in 2017. Many scholars have critiqued these initiatives (Gourlay and Stevenson 2017), highlighting problems with evaluating only what is easily measurable, and contesting the implicit notions of ‘excellence’ and underpinning ideologies. Our own main concern is that such initiatives might reduce important qualities of teaching to meaningless or questionable claims. Statements such as ‘this is excellent teaching’ or ‘students learn more with online learning’ are thin and empty without accompanying details about contexts, contents, roles, and mechanisms.

A complex situation such as student learning affected by a change of approach during a pandemic is certainly worth evaluating. Rather than trying to discount, simplify or control the complexity, it would be better to incorporate it into our thinking about ‘developmental evaluation’ (Patton 2010). We offer a perspective that recognises this.

2 An Ecological Perspective on Higher Education

Fawns et al. (2020) propose that an ecological perspective would help to capture the complexity of the activity and contextual factors that make up educational programmes. An ecological perspective is a way of understanding how students are connected to all of the various elements contributing to their learning: tasks, ideas, tools, objects, environments and people, as well as their own previous and current experiences. We should explore how this assemblage functions in concert, rather than attempting to simplify it to suit available but limited measurement processes. A more holistic approach to evaluation will avoid the problems described above of standardised, fragmented education, thin and partial descriptions, and inappropriate proxies for learning and/or teaching. The ecological perspective provides a way of seeing past the instrumental views that still beset accounts of learning environments, particularly in relation to the use of technology in education (Damşa et al. 2019).

This alternative perspective helps us see that it is neither methods (e.g. lectures vs. problem-based learning) nor modality (e.g. online, hybrid, or on-campus) that mainly determines quality, or even outcomes (see also Onyura et al. 2016). Quality is determined by the situated activity, and the interaction and interrelation of teachers and students, within the context and infrastructure of the institution. It emerges from the particular designs, scaffolds and supports that help students make use of available methods, resources and affordances; it can be found in spaces where students couple their cultural and academic backgrounds with the conventions and culture of the course. The quality of any particular element is then understood as part of a context-dependent set of relationships with the others. This view is radically at odds with the claims that technology is either a neutral tool for achieving particular pedagogic goals (instrumentalism), or the main determinant of what will happen (determinism) (Hamilton and Friesen 2013). It also steers us away from those competitive but inappropriate comparisons between online and on-campus classrooms.

2.1 Developing Quality and Evaluation in an Ecological Way

To adopt an ecological perspective, we need approaches to evaluation that allow shared understanding and purpose, both of the course itself and the evaluation of it. We need to know the context of the evaluation, and underlying assumptions—pedagogic, institutional, disciplinary and social—affecting working practices and material conditions in that context. The manifestation of quality (the evaluand) emerging from the activities, interactions and relationships within the context is likely to depend on a number of factors, which we attempt to tease out below, starting with evaluative approaches we can see in other chapters of this book.

A key issue is that students’ experiences of their courses can change over time. Boyd (2021, this book) gives the example of an activity done in one course that students return to and build on in the following year. The value of each activity looks different when considered across both courses. Lee’s (2021) chapter of this book shows how students do not always become aware of their underlying motivation to learn until they have had certain transformative experiences. Both of these chapters show the importance of taking development into account: there will be different results from the same evaluand at different times. Additional complexity arises when courses are viewed in the light of a full programme and beyond. In Marley et al. (2021, this book), Jeremy Moeller, an online PGT graduate, describes the value of his programme manifesting over a number of years:

‘I completed the programme in 2016, but I reflect on it often. Some of the benefits of the programme were only apparent to me years later.’ ([PAGE])

Moeller explains that familiarising himself with online methods, environments, forms of interaction, and the programme culture took time. Early evaluations (‘I was not comfortable at all with the weekly discussion sessions… they felt slightly loose and unstructured to me’) gave way to later ones, where he had come to appreciate the different approach taken and how it related to his online context. Later still, he came to understand how certain educational principles could have value across modalities, and he began to emulate aspects of the approach taken by his online programme in his own on-campus teaching.

These examples indicate that the timing of an evaluation will affect the information it gives us. What can seem like a negative ‘outcome’—discomfort with discussions—may simply indicate a stage of development. The information can still be useful in considering any steps that need to be taken to support a student, scaffold an activity, or signpost what is happening. But the (informed) judgement may be that no remediation is necessary at all at this stage. There are several implications from this situation: teachers need to be able to tolerate negative responses as part of their own development as well as their students’; teachers need some autonomy in deciding when to take remediating action; the timing of an evaluation should be appropriate to the purpose of the evaluation, which is also likely to be relevant to the purpose of the course. And we particularly want to highlight how these examples show that evaluation can be useful in different ways, especially for development. For detailed analysis of usefulness, see Onyura (2020) on evaluation utilisation and Patton (2010) on developmental evaluation.

The student view in the above examples is therefore crucial, but must be seen in the light of timing, context and purpose. Before starting a course or programme, students may not be in a position to predict, or even conceive of, the potential benefits that they will derive by the end (Aitken et al. 2019). Konnerup et al. (2019) argue for designing in opportunities for ‘springboards for development’, where students and teachers jointly develop new ways of doing things. This approach gives intention to something that happens anyway: students inevitably contribute to the design of a course, even when it is prescriptive. Students ‘complete’ designs by reinterpreting them, and by co-configuring their learning environments (Goodyear and Carvalho 2019; Fawns et al. forthcoming).

Teachers and universities are not in control of the student’s ecology: each student has their own, although there are clear areas of overlap and interdependence. Peters and Romero (2019) looked at the strategies online HE students use to configure their own learning ecologies across a formal/informal learning continuum. The result is a balance of control, wrought through design, policy, practice and subversion. Students subverting the teacher’s intentions, and the course’s expected learning outcomes, may not be a problem. Students are not always compliant, for a number of reasons, including many good ones. This is recognised as an inevitable aspect of design for learning, and the associated need for teachers to redesign as they go (Goodyear and Dimitriadis 2013). Indeed, if students did not learn things other than prescribed learning outcomes, then the attainment of graduate attributes or professional values would be impossible (Boud and Soler 2016). Evaluation that allows for expression of such elements is likely to entail reflection and dialogue and less likely to contain only ‘measurable’ features.

It is clear, then, that ecological evaluation cannot be entirely reduced to numbers, and may require qualitative and dialogic approaches to achieve its purposes. This is a significant challenge, since evaluation often aims to convey information about educational quality simply and concisely to a range of stakeholders, to facilitate easy comparison across courses, teachers, or institutions. Even if we do not wish to rank and compare, we often still need to convey the results of evaluation clearly and concisely. An ecological perspective implies that effective evaluation needs appropriately ‘thick’ descriptions of what has been going on, while still taking into account the practicalities of existing systems and practices.

3 The Case for Thick Descriptions in Evaluations of Postgraduate Online Education

Standardised questionnaires and measurements of outputs generate descriptions that are useful for ranking and marketing but are too ‘thin’ for developing teaching practice. While teachers need to be aware of any findings of such measures, they also need a sense of the context and other variables to avoid misinterpreting, overemphasising, or misattributing results to discrete elements of teaching such as modality. Crucially, educators need to be able to see how to make improvements, not only to a particular aspect of teaching or course design but to the whole system that embeds it, and to be able to contribute to dialogues about such systems.

3.1 Thin and Thick Descriptions

We refer again to the work of philosopher Gilbert Ryle to propose thick descriptions as a way of supporting shared meaning. They provide a more contextualised explanation of a given indicator in terms of both intention and cultural practice:

…thick description is a many-layered sandwich, of which only the bottom slice is catered for by that thinnest description. (Ryle 2009b: 497)

Thickness is not just about adding layers of data, however; such layers may make the description richer, while not fully accounting for what is actually going on. The layers of thick description must also convey something of intention, prior knowledge, and conventions within a culture. As Freeman notes:

thick description designates both the discrete data available for interpretation and a strategy to interpret and represent that data. (Freeman 2014: 828)

Thus, thick description must help us understand quality in relation to the lens through which it is viewed. A key feature of thick descriptions is that they have:

success-versus-failure conditions additional to and quite different from… [their thin counterparts] (Ryle 2009b: 498)

In explaining differences between thin and thick description, Ryle contrasts two boys: one has an involuntary twitch; the other is winking conspiratorially. The thinnest description is that each boy is contracting an eyelid. Yet, as Ryle points out, there is a huge difference—the twitch has no intentional meaning, but there are many layers of possible meaning behind the wink (e.g. the boy could be parodying another boy’s clumsy attempt to wink; he might be rehearsing such a parody). Only with sufficient information and its interpretation can we appropriately understand the wink in relation to its purpose, and the situation where it is enacted, to get behind the surface meaning of an ambiguous indicator.

Thick descriptions have been employed in a research context, most notably by Clifford Geertz, an anthropologist who borrowed the term from Ryle and applied it to culture and ethnography (Geertz 1973). In research, thick descriptions help us to make sense of complex phenomena and dynamic contexts by providing a framework for interpreting the researcher’s understanding. In education, thick descriptions might be recognised as a form of evaluative argument (Ory 2000) involving pre-interpreted, theorised explanations of the purpose, rationale, situated activity and success (or otherwise) of the activity relating to a course.

The notion of thin and thick description might help unpack the ostensibly objective kind of evaluation prized by endeavours like the UK Teaching Excellence Framework (TEF). In a stringent critique of the use of the TEF, Tomlinson and colleagues noted that:

Qualities that do not align to the logic of the competitive market ordering of HE or that cannot be expressed in quantities disappear[,] are marginalised and become devalued. (Tomlinson et al. 2018: 10)

The results are then ordered in a way that ranks universities for quality—entailing a thin and, in this case, very biased view of what quality means for universities. By further labelling the results of ranking as indicators of ‘teaching excellence’, the creators of the TEF elide thicker descriptions of how the quality of education actually manifests. The thinner the description, the greater the likelihood of such misrepresentation (or, at best, ambiguity). Tomlinson et al. (2018) do not use the language of thick description, but refer to the Bourdieusian term ‘symbolic violence’ to indicate the sleight of hand that our use of the thin/thick distinction also uncovers. Ranking universities in order of teaching excellence creates a thin, market-driven description that obscures many different and competing understandings of what teaching is actually about. To counteract this, teachers and course designers may wish to undertake their own evaluation to supplement standardised, institution-wide processes.

3.2 Examples of Thick Descriptions in Educational Literature

In the three examples below, we have found evidence of attention to meaning, interpretation, culture and context in evaluation. They show how theory and values can be used to interpret and even elicit shared understandings. These examples are from published papers, and though the term ‘thick description’ is not used in them, each incorporates its features—that is, each indicates what teachers and students were actually doing or attempting to do within a specific context and how its success or failure might be interpreted. The examples cover online and campus-based work, as well as undergraduate and postgraduate levels. They have been selected for specific points we want to make for the postgraduate online context, which we draw together in the following section.

3.2.1 Example 1: Applying a Theoretical Lens to Interpret One’s Own Practice (On-Campus Masters)

We have argued that most common forms of formal course evaluation do not provide a sufficiently full picture to inform future teaching in specific contexts. For this, teachers need to create their own descriptions, reconciling new information with the emerging overall picture through the application of a theoretical or praxis-based lens. Consider this extract from John van der Kamp’s largely negative account, published with colleagues, of his own teaching on embodied cognitive science:

One of us (John) coordinates and teaches a course that addresses motor skill learning… As the teacher, John defines (or confirms) the intended learning outcomes, chooses course content, teaching and learning activities, and assessment methods… John uses a compendium of classical and current scientific papers and book chapters, the contents of which are assessed in a written exam. The course is organized in lectures and tutorials… by and large, John does the talking, the students listen and make notes (hopefully)… During the tutorials, students are meant to do the talking and thinking, but John often finds himself interrupting discussions to correct—in his view—misapprehensions of theory and methods or to further explicate. In short, despite good intentions, John’s teaching is largely prescriptive… As a teacher, John makes all the choices without consulting prospective students, though he does consider the suggestions made by students in the previous year’s course evaluations. By and large, students have no say in course content, it is enforced upon them and they have to adapt to it (cf. Freire 2008). This being said, students… show up in high numbers, except when exams are approaching. Also, students do value the course and teaching highly, giving ratings of quality of course content, lectures, and tutorial of approximately 4.5 on a 5-point scale. (van der Kamp et al. 2019: 3)

Despite positive ratings, John does not simply accept the results of his course surveys. Instead, he worries that they may reflect the implicit adoption, by himself and his students, of a transmission model of education. More precisely, he uses Freire’s (2008) concept of ‘banking education’, in which teachers ‘deposit’ knowledge into the students. The authors recognise a tension between the way John teaches and his beliefs—informed by his area of expertise—about how people learn:

…the assumptions underlying John’s teaching—as presumably that of many colleagues—deeply conflict with the assumptions underpinning his science. Even though he emphatically tries to show students that radical embodied cognitive science deserves careful consideration, John does so by regulating the way in which they encounter it. John merely deposits it upon the students. (van der Kamp et al. 2019: 3–4)

As negative as this account of John’s teaching is, the purpose is to help John and others think through the relationship between his philosophy of teaching and learning and his teaching practice. Further, John criticises not only his own practice but also the structures where it sits. He brings together several sources of information, including standardised evaluation forms, but considers them through the critical lens of his own philosophy, which he has developed through thinking, talking, reading and writing about embodied cognition and related ideas. This is an important point—it is not feasible to construct an ecological evaluation without being clear on what one believes education to be and what is important within that. Armed with this clarity, teachers can then analyse their own practice through that theoretical lens, as John has done. They can compare their beliefs with their actions, and with the structures that support and constrain educational practices. John’s example shows that values and philosophy are important in underpinning thick descriptions that can run counter to available data and surface conceptions. It is interesting to note the tension between this thick description and the thin description of the student ratings. This also highlights a limitation of this thick description, from an ecological point of view: student voices are not considered, beyond student surveys which are largely dismissed, though they do emerge later when the authors describe John’s practice following this evaluation. John’s discomfort with the mismatch between his values and his practice has led him to make changes in a way that highlights the developmental benefits of a holistic approach to evaluation.

3.2.2 Example 2: Attention to Timing, Context and Student Perspectives (3rd Year Undergraduate Online)

Muir et al. (2019) set out to counter the limitations of traditional end-of-semester questionnaires by combining weekly surveys and repeated interviews (eight for each student participant) across a semester. Their longitudinal approach to evaluation elicits ‘rich’ descriptions of the complexity of student engagement over time in relation to the practices of their teachers and the conditions in which they learn. They show that engagement with a course is not fixed or stable, as suggested by satisfaction ratings, but fluctuates over time, in relation not only to the instruction within the course, but also to factors outside it (e.g. personal circumstances).

The authors provide a detailed account of one participant, Angela, highlighting the depth of information and insight that this process generated. The account contains thick as well as rich description; it embodies the contextual approach we have endorsed, including the intention of meaningful interpretation. Further, their description shows that Angela’s impression of different elements of the programme was related not just to what kind of element it was, but also to its particular qualities. For example:

Engagement was boosted by ‘catchy, interactive’ and practical learning activities, while heavy, theory-based reading was ‘hard’. (Muir et al. 2019: 270)

One particular week, readings were a key theme:

‘long, laborious readings’ dominated her study schedule but were disengaging, particularly if written in technical ‘jargon’. Obversely, one assigned reading that helped her see ‘the big picture’ was ‘fantastic’, prompting interactivity with the textbook itself: she described ‘highlights [. . .] and Post-It notes everywhere because it just really consolidated what I knew’. (Muir et al. 2019: 271)

This shows the problem with assigning fixed characteristics to a particular technology, resource, method or modality—as many writers do, including Zimmerman (2020) in calling for the ‘Great Online-Learning Experiment’. Angela’s experience shows the interrelation of factors: the timing of her encounter with materials, the way they are presented, their relevance to set learning tasks, and the interactions with teachers and peers that support her engagement with that resource.

The context around this example of thick description is important. We learn about Angela’s combination of part-time study and part-time work and her adult children who live at home, as well as how she identifies her approach to learning. Angela is not representative of all students, and her account alone is insufficient. However, she is an important ‘local voice’ whose insights can help us understand some of the design parameters and teaching considerations in online and, indeed, all kinds of education. Where each tick on a standard questionnaire is supposed to be representative of a student in such a complex set of circumstances, a thick description can tell us something about what that student is trying to do in their context. Example 2 does this by heavily featuring a particular student voice and context, and examining engagement over the duration of a course. However, it lacks the clear theoretical lens of example 1, and is primarily focused on workload, with limited interrogation of the concept of engagement or of the educational purposes in play. Thus, teachers can use this description to think about the balance of tasks and student workload, but may need additional information to inform the ways in which their designs and practices can support students to engage with and complete those tasks.

3.2.3 Example 3: Combining Academic and Professional Meaning-Making (Postgraduate Online)

Turning our attention now to the even more complex world of the part-time professional online postgraduate student, we feature a paper by Aitken (2020) that has been influenced by the ideas of ecological and holistic evaluation (Fawns et al. 2020). As noted by Fawns and colleagues, teachers on Aitken’s programme—an online MSc in Clinical Education—were already aware that their programme was satisfactory to students: they had scored 100% for overall student satisfaction in a Postgraduate Taught Experience Survey. This was reassuring but incomplete:

The PTES score does little to help us understand the extent to which satisfaction is derived from overcoming such challenges, or from meeting less demanding expectations. (Fawns et al. 2020: 5)

Aitken (2020) uses dialogues with students and staff to evaluate the perceived impact of postgraduate online education. She considers students’ actions and interactions through technology and through their material contexts, extending to both academic and clinical settings. Aitken makes her theoretical influences and methods explicit, and provides considerable detail through her use of activity theory as a framework for analysis. This allows her to generate thick descriptions through focusing on what people are doing in a specific context—with its own conventions, forms of mediation, and division of labour. In the context of part-time online professional postgraduates, the division of labour is very different from that of full-time, campus-based undergraduate school-leavers. An excerpt from the paper shows how the author takes pains to bring the professional and academic elements together:

There was a clear focus on helping students’ professional development, not merely delivering academic knowledge, with a sense of encouraging students to think creatively and question more. Consideration was shown by staff in choosing mediating artefacts that would more clearly encourage criticality in students. In this way, an outcome in the programme system has the potential to become a tool or object in the student’s professional system… (Aitken 2020: 7)

Aitken identifies rich themes in the students’ experiences of learning and the associated implications for teaching and course design. Her thick description also considers the goals and study conditions of a particular online postgraduate context. Aitken’s themes indicate the dynamic relationship between study and clinical practice, with influences that go beyond online exchanges, extending from effects on professional identity and individual practices to the expansion of students’ networks. This then gives Aitken a valuable focal point for development.

These three examples have provided insights into how these particular courses were enacted: through teaching according to espoused or actual principles, through learning over a period of time in complex conditions, and through careful course design in relation to a professional curriculum. Remillard (2005) used the notion of the ‘enacted curriculum’ in a review of mathematics curricula not only to differentiate the intended curriculum from what actually happens, but also to highlight the agency of teachers (and others) in realising curricular intentions. Like the notion of thick description, the short expression ‘enacting the curriculum’ points to the context, purpose and interpretation of the activities and interactions involved, helping us understand how we might determine the success or failure conditions.

4 Features of Thick Descriptions in Evaluation

We are proposing thick description as a way of articulating the complexity underlying the manifestation of quality in educational courses. Adopting thick descriptions for course evaluation facilitates integrated understanding of how individual beliefs, course structure, purposes, intentions, activities, resources, and agents interact to influence course quality. The three examples above show that each thick description is context-dependent, so we cannot be prescriptive about how to ‘enact the curriculum’, nor should we be. However, we recommend that thick descriptions for course evaluation include the following features:

1. Explicit articulation of how the curriculum is being enacted. This will include action and interaction by teachers, students or other agents. It should also articulate any findings concerning what the enactment of the curriculum means to these agents.

2. Examination of the value of both the planned curriculum and its enactment. This could show how the human agents interact with curriculum materials previously prepared, and any potential differences between intentions and enactment.

3. Exploration of potential value for future development of teachers and courses. For example, the evaluation might support taking forward something that worked, dropping or adapting something that didn’t, or exploiting an unanticipated outcome.

4. Meaningful involvement of students in the evaluation process. Student voices might be heard partly through evaluation surveys. However, conversations with students about their experiences and understandings of them are bound to yield richer insights.

5. Accounts of the physical and/or virtual environments and social structure of a course. This might include a rationale for these aspects of design (Goodyear and Carvalho 2014). Additionally, there can/should be evaluative inquiry into how agents interact with these structural elements. There may be instances where there was little or no intentional design, but that still warrants reflection: what is part of the design, what is emergent, and what can that tell us?

Our own examples of thick descriptions have been drawn from research literature, for practical and ethical reasons. However, we believe such descriptions can be developed for scholarly teaching, if they are not indeed already present through dialogues and informal feedback. Our five suggestions above indicate that teachers attempting to ‘thicken’ their available evaluation might explicitly consider the significance of any evaluative information and their interpretations of it. Significance and interpretation are more important than adding layers of description (Freeman 2014). In other words, we need a contextualised interpretation of our data and how different elements relate to each other, translating this holistic idea into language that is meaningful to others (Geertz 1973). Using a theoretical lens is one way of doing this. Another is to articulate a set of values (i.e. what is important to you in your teaching and why), and then use this to consider the data.

This latter point might help us approach a considerable obstacle to ecological evaluation: the limited extent to which educators feel able to give open and honest accounts, especially in a risk-averse, market-driven economy. The availability of values-driven thick descriptions can be useful where it is necessary to explain the context behind unsatisfactory metrics, or where a clear plan for informed change is called for. As Onyura (2020) points out, there can be tension between the use of evaluation to justify prior actions and argue for resources, and to generate new knowledge (about a course, one’s students, one’s teaching).

A related consideration is the extent to which it matters whether the values and rationales espoused in our thick descriptions are true accounts of prior intentions, or post-hoc rationalisations. Discrepancies between what was designed and what transpired highlight the limited extent to which one can design the actions or outcomes of students or, indeed, teachers. Moreover, Example 3 brings into question the desirability of such control, particularly in the context of online postgraduate education, where significant value is found in emergent connections between disciplines, settings and cultures (see also the chapter by Marley et al. 2021, this book). Such understandings would be of value to many teachers, managers and students. Thus, thick descriptions within evaluation can have a formative aspect, not only for the evaluator, but also for those who have access to these descriptions, creating additional opportunities for reflection on practice.

Onyura notes that learning from evaluation is aided by ‘explicit clarification and examination of the theoretical underpinnings’ (2020: 4). This knowledge need not be restricted to the teachers on a course: it can be distributed across teaching networks through dialogue and dissemination, thus contributing to the quality of teaching beyond the evaluated programme. We might even argue that teachers should receive credit not only for how well their teaching went, but also for how well they have evaluated that teaching, including generating rich understandings of how to improve practice in the future. In such a system, honesty, even about one’s failings as a teacher, emerges as a positive attribute, as we saw in Example 1. Clearly, this will work best within a framework of trust between educators and managers, and between students and teachers. Still, by giving evaluation a formative focus, local stakeholders and their trusted colleagues can benefit from evaluation information even where it is not ‘politically acceptable or actionable’ (Onyura 2020: 4). Indeed, we argue that without a culture that allows such openness, the development of teachers will be stunted. We propose that even modest moves towards ecological evaluation, enacted through the generation and sharing of thick descriptions, can be a starting point for repairing the damage to trust that a focus on accountability to market forces has done to the university sector.

5 Conclusion

Thin descriptions of higher education teaching, supported by datafied approaches, are important to governments, administrators, managers and marketing specialists, and those who aim to promote the excellence of competing universities. University teachers and their students, however, need thicker descriptions of practice to enable them to understand and develop their joint endeavour to achieve pedagogical intentions in an atmosphere of trust. They can thicken existing thin descriptions by interpreting them in relation to reflections on practice and pedagogical theory, and to dialogues and interpretations that take account of all the components involved, along with the purposes and intentions of the evaluation itself. An ecological perspective on evaluation of postgraduate online teaching can take into consideration the interrelations between the different topics covered in this book, including the practices and labour of teaching (Aitken and Hayes 2021) and assessment and feedback (Hounsell 2021), and the diverse contexts of students (Boyd 2021; Lee 2021; Marley et al. 2021; Stone et al. 2021), institutions (Fawns et al. 2021), and teachers (Bussey 2021; Buchanan 2021). Academic educators and researchers will need time and a conducive atmosphere to put these ideas into practice.