Introduction

The way teachers conceive of assessment purposes and practices is an important issue in education. Changes in assessment practices have proved to be a crucial instrument to promote deeper changes in teaching and learning practices (Barnes et al. 2000; Harlen 2005; Towndrow et al. 2010; Vandeyar and Killen 2003; Volante and Fazio 2007; Remesal 2006). However, the success of innovations may be either supported or resisted by the pre-existing beliefs of teachers who are expected to implement those changes (Fives and Buehl 2012). In this sense, we need to develop instruments that might be used by teacher educators, both in pre-service and in-service programs, to make teachers’ conceptions of assessment explicit, thus accessible to reflection and change (DeLuca et al. 2013; Stiggins 1999), in order to develop assessment literacy (DeLuca and Klinger 2010). Teachers’ conceptions of assessment have been an object of inquiry for over a decade within an intensive research program across different cultural contexts, among in-service teachers of primary or secondary schools (e.g. Delandshere and Jones 1999; Philippou and Christou 1997; Xu and Liu 2009; Brown 2004a; Remesal 2011; Coll and Remesal 2009), and also pre-service student-teachers (e.g. Ogan-Bekiroglu 2009; Wang et al. 2009; Brown and Remesal 2012).

In this study, we wanted to extend research to teachers in contexts other than formal schooling. Foreign language teaching is, traditionally, a subject which transcends the formal, compulsory educational system. Hence, our target population turned out to be foreign language teachers. Using a newly developed questionnaire (Remesal and Brown 2012), we expected, in accordance with ecological rationality (Rieskamp and Reimer 2007), that conceptions of assessment would be influenced by teachers’ working and policy context. First of all, we present the theoretical background of the inventory and the study. After that, we describe the instrument validation procedures with an international sample of 493 teachers of Spanish as a foreign language (SFL/ELE) working both in formal schooling and extra-schooling contexts (compulsory school, extra-school teaching, adult education and in-company training). We conclude with a discussion of the implications of this study.

Theoretical background

The literature on teachers’ conceptions of assessment has identified the tension between two main different purposes and uses of assessment in the compulsory educational system; that is, improvement or formative versus evaluative or summative (Coll and Remesal 2009). Several researchers refer to the same phenomena using different terms; for instance: testing versus assessment culture (Wolf et al. 1991), formative versus summative assessment (Black and Wiliam 1998), improvement versus accountability (Brown 2004a; Brown et al. 2011b) or educational regulation versus societal control (Perrenoud 2001; Remesal 2011). Although these terms are not necessarily 100 % synonymous, researchers agree that within educational systems, there are conflicts and tensions between agents, participants and stakeholders concerning the nature, purpose and effects of assessment.

Although the literature often presents these two poles as alternative or mutually exclusive views, we defend the dialectical nature of both options and the inherent necessity of both phenomena: the regulation of teaching and learning processes, on the one side, and the external control of and by various educational agents, on the other (Remesal 2011; Brown 2004a). However, there might be other factors that need to be considered. There is evidence of different interpretations of what ‘improving teaching and learning by means of assessment’ might mean in diverse cultural settings, as a result of different school traditions and legislation. For example, in societies characterised strongly by public examinations (e.g. Hong Kong, China or Egypt), accountability is strongly correlated with improvement (e.g. Brown et al. 2011b; Brown et al. 2009; Gebril and Brown 2014). In other words, this systemic tension does have, in addition, a sociocultural component which affects assessment practices (Remesal 2007). Results of prior studies coincide in showing that conceptions of assessment tending towards formative practices are more frequently and strongly endorsed in primary schooling (e.g. Brown and Michaelides 2011; Brown et al. 2011a). In contrast, teachers in secondary education are more likely to regard assessment as an instrument of societal control or accountability rather than for improving teaching and learning processes in comparison with primary teachers (Remesal 2011).

Gaps identified in current research: our research question

Research to date has mainly focused on formal, systemic school settings at different compulsory levels (Liu 2008), so we lack information on whether teachers’ conceptions might follow similar patterns in other educational contexts. Only recent research tells us about differences between university lecturers and their students in the ways they conceive of assessment (Fletcher et al. 2012).

Hence, in this study, we looked at teachers working in different non-compulsory adult education teaching contexts, such as after-school language academies or in-company language courses, versus teachers in compulsory schooling or formal higher education, trying to find an answer to the question: Do teachers’ conceptions about assessment in non-compulsory/non-formal adult education teaching contexts differ from those working in the compulsory school system? This research question is grounded on a twofold argument. First, there is the effect on teachers’ conceptions of systemic policies and practices of assessment in compulsory education (Rasmussen and Friche 2011; Skedsmo 2010), whereas non-compulsory education is characterised by highly variegated policies and diversity of systems (e.g. private language schools, universities, corporate work-places, etc.). Second, basic schooling takes place during periods of childhood and adolescence. In contrast, extra-school and non-compulsory adult education contexts usually involve autonomous adults, with noticeable differences in terms of cognitive capacities, motivation, and self-regulation abilities (Gill 2011; Smoke 2013).

Theoretical model grounding the study

The theoretical model grounding the questionnaire rests on the definition of conceptions as the organised subjective sum of individual beliefs, which, in turn, are understood as assumptions about objects and phenomena that people take as true (often without intellectual contrast) (Green 1971; Pajares 1992). Beliefs composing an individual’s conception of any particular topic differ from each other in being primary or central versus secondary or peripheral. Primary central beliefs are psychologically strong and resistant to change, whereas secondary and peripheral beliefs are more easily challenged. Furthermore, beliefs are arranged in clusters that may be held quite independently from each other. Green’s model is particularly suitable for understanding the sometimes apparent inconsistency between beliefs and behaviour: while we express or espouse a certain belief, our current behaviour might be driven or enacted by another belief or set of beliefs which remain unspoken in the background.

The model was developed in earlier qualitative research and refers to two essential multi-faceted dimensions that shape how assessment is understood and evaluated (Remesal 2006; Remesal 2011). The first dimension refers to the focus of assessment: that is, how assessment separately may affect (a) teaching, (b) learning, (c) the certification of learning and (d) the accountability of teaching. The second simultaneous dimension shaping teacher conceptions of assessment has to do with the overall control purpose of assessment: that is, assessment is either an instrument for the regulation of educational processes or an instrument for societal control. In the first option, assessment is a tool for reflection about teaching practices in the classroom, and/or as a tool for the improvement of students’ learning. In the second option, assessment is limited to grading purposes and practices and is used to exercise control over teaching and learning; here, teachers are unlikely to show positive attitudes towards assessment as a tool for change.

In contrast with previous models (Brown 2004a), this model suggests that the beliefs teachers hold on the effects of assessment on teaching are indeed quite often separated from beliefs on how assessment affects learning. In other words, since teachers may conceive of teaching and learning as separated processes, some teachers might have a coherent conception of assessment, with aligned beliefs on assessment affecting both teaching and learning, whereas other teachers might as well have incoherent conceptions of assessment, with confronting beliefs on how assessment affects teaching and learning. More importantly, rather than relying solely on an inter-factor correlation between accountability and improvement to determine the relationship of improvement to accountability, this model takes a bifactor approach in which each item (whether it be about improved teaching or improved student learning) is also jointly predicted by either an evaluative or improvement-oriented perspective. This approach makes more explicit the effect of the external aspects of assessment on the internal educational functions.

As a matter of fact, the results of the original study from which the theoretical model was built (Remesal 2006) show that a coherent conception of assessment tending towards formative practices is more frequent in primary education, whereas teachers in secondary education are more likely to present incoherent conceptions, closer to regarding assessment as an instrument of societal control. However, this is not a rule of thumb, and the complexity of teachers’ conceptions of assessment must be considered in relation with other aspects, such as previous education for teaching in primary and secondary school (which usually differs in every country), participation in professional development programs, teaching experiences and so forth (Liu 2008; Remesal 2011).

Research method and design

This study, approved by the corresponding institutional ethical board, used a cross-sectional survey convenience sample of the target population of teachers of foreign language (Spanish in this case), with causal-correlational analysis of self-reported responses to a structured inventory.

The questionnaire

We designed a 40-item self-report questionnaire with a positively packed agreement rating scale (i.e. two negative and four positive options), which has been found to be appropriate in conditions of social desirability (Brown 2004b). The questionnaire was presented to the members of an international online professional community (i.e. ComunidadTodoele, http://www.todoele.net/) of teachers of Spanish as a foreign language (SFL). We had access to the members of this professional community in exchange for previous participation in an online seminar organised by the administrators of the community. Invitations to the community members were sent three times in a 3-month period. Teachers participated voluntarily and without payment. They had open online access to the questionnaire from their own computers and took about 30 min to complete it. Results of the study would be shared with the community once published, and the researchers’ further participation in future seminars was also arranged in compensation for participation in the study. The questionnaire is available on request from the authors.

The items were aligned with the theoretical model that grounds the study, which was briefly presented in the previous section (Remesal 2006). Although the teacher conceptions of assessment inventory (Brown 2001–2003) already exists, a study with Spanish university students (Brown and Remesal 2012) demonstrated that the four-factor intercorrelated structure did not fit well with Spanish participants. Likewise, studies in Cyprus (Brown and Michaelides 2011) and Egypt (Gebril and Brown 2014) showed that the original statistical model developed in New Zealand did not apply outside that low-stakes assessment context. These supported the decision to test a different theoretical framework and a new instrument.

Each item was classified as belonging to one of four aspects on which assessment focuses (i.e. learning, teaching, accountability and certifying) as well as one of two purposes of assessment (i.e. formative regulation or societal control). The learning factor had to do with the motivational, feedback and regulatory effects assessment has on students, while the teaching factor had to do with the integration of assessment within curriculum and pedagogy. The accountability factor related to the use of assessment to evaluate and examine students, while certifying had to do with the use of assessments to establish standards and award certificates. Formative regulation speaks to the role assessment plays in informing and guiding teachers and learners to more effective learning outcomes, while societal control has to do with the role assessment plays in controlling teachers and students. Within each aspect of assessment, half the items were designed to reflect the formative role and half the controlling role of assessment.

Participants

Altogether, 7500 teachers were invited to respond to the questionnaire. The response rate was about 7 %; hence, the sample of 493 respondents has a margin of error of 4.72 %. The respondents were Spanish-as-foreign-language teachers from all over the world, working in different teaching contexts; that is, either in compulsory secondary school or in extra-school adult education. Basic demographic characteristics of the sample are presented in Table 1. The questionnaire was answered mainly by women (79 %), a share slightly above the average percentage of female teachers in European countries in 2012 (Footnote 1) and in the UK specifically (DoE 2011); by middle-aged teachers (47 %); and by those with more than 10 years of teaching experience (48 %). Slightly more than half of the sample (55 %) taught adults in different contexts (either private language academies or in-company training), while the rest taught children or adolescents (either in compulsory school, from primary to secondary, or in private language academies as an extra after-school subject).
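The response rate and margin of error can be checked with a few lines of arithmetic. The sketch below assumes a 95 % confidence level (z = 1.96), maximum variability (p = 0.5) and simple random sampling, none of which is stated in the text; the function names are illustrative. Under these assumptions the result need not reproduce the reported 4.72 %, which may rest on a different confidence level or effective sample size.

```python
import math


def response_rate(respondents, invited):
    """Share of invited teachers who answered the questionnaire."""
    return respondents / invited


def margin_of_error(n, z=1.96, p=0.5, population=None):
    """Margin of error for a proportion under simple random sampling.

    z, p and the optional finite population correction are assumptions
    of this sketch, not values taken from the study.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction for sampling without replacement.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


rate = response_rate(493, 7500)
moe_plain = margin_of_error(493)
moe_fpc = margin_of_error(493, population=7500)

print(f"response rate: {rate:.1%}")
print(f"margin of error (no correction): {moe_plain:.2%}")
print(f"margin of error (finite population correction): {moe_fpc:.2%}")
```

The finite population correction only matters when the sample is a noticeable fraction of the population, as it is here (493 of 7500).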

Table 1 Demographic characteristics of sample

In this study, teaching experience, practice context and initial training were considered initial variables of comparison. The majority of the participants (60 %) had their initial training in the broad field of languages (e.g. translation, philology, linguistics); only one fourth in education (26 %) (e.g. pedagogy, didactics, educational psychology), whereas just 2 % of the sample had initial training in both fields. Finally, 12 % of the respondents could be considered ‘intruders’ in the profession since their initial training was apart from either language or education (e.g. graduates in History, Economics, or Engineering, etc.).

Analysis

Instead of resorting to exploratory factor analysis, the intended bifactor model was tested in confirmatory factor analysis. Bifactor models propose that each response is conditioned by two causes, while also modelling the effect of random error. While most bifactor models have a general common factor and independent group factors (Weekers 2009), the conceptual model proposed here consists of two dimensions (i.e. the focus of and the control purpose of assessment), each of which has two or more groups. This means that each item is modelled as being caused by both a focus and a control purpose factor (Fig. 1).

Fig. 1 Bifactorial measurement model of 493 Spanish as a foreign language (SFL) teachers’ responses to inventory

In line with current practice (Fan and Sivo 2007; Hu and Bentler 1999; Marsh et al. 2004), a multi-criteria approach for acceptable model fit was adopted; models were not rejected if gamma hat and comparative fit index (CFI) were ≥ 0.90, root mean square error of approximation (RMSEA) and standardised root mean residual (SRMR) were ≤ 0.08 and the χ²/df ratio was statistically non-significant (p > 0.05). All analyses were carried out in AMOS (IBM 2011) using Pearson product moment correlations.
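The multi-criteria decision rule can be written as a small check. This is only a sketch: it takes the cut-offs as gamma hat and CFI ≥ 0.90, RMSEA and SRMR ≤ 0.08, and a non-significant χ²/df ratio (p > 0.05). The class and function names are hypothetical, and the index values in the example are invented for illustration, not results from this study.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    gamma_hat: float
    cfi: float
    rmsea: float
    srmr: float
    chi2_df_p: float  # p-value attached to the chi-square/df ratio


def check_fit(fit):
    """Return each criterion with a flag saying whether it is met."""
    return {
        "gamma hat >= 0.90": fit.gamma_hat >= 0.90,
        "CFI >= 0.90": fit.cfi >= 0.90,
        "RMSEA <= 0.08": fit.rmsea <= 0.08,
        "SRMR <= 0.08": fit.srmr <= 0.08,
        "chi2/df non-significant (p > 0.05)": fit.chi2_df_p > 0.05,
    }


def is_rejected(fit):
    """A model is rejected as soon as any single criterion fails."""
    return not all(check_fit(fit).values())


# Hypothetical index values, for illustration only.
example = FitIndices(gamma_hat=0.95, cfi=0.93, rmsea=0.05, srmr=0.04, chi2_df_p=0.20)
for criterion, met in check_fit(example).items():
    print(f"{criterion}: {'met' if met else 'not met'}")
print("rejected:", is_rejected(example))
```

Reporting each criterion separately, rather than a single verdict, makes it easier to see which index drives a rejection.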

To determine the effect of teacher demographic characteristics upon conceptions of assessment, a factor mean score was created by bundling all the items predicted by the factor. Factor means were created instead of sums because of the different number of items in each factor. Multivariate analyses of variance (MANOVA) for the effect of key demographic variables on the eight interactive factor scores were conducted. Additionally, where MANOVA found statistically significant differences in mean scores for a factor, multi-group confirmatory factor analysis was carried out to determine whether the measurement model was statistically equivalent for that demographic variable. A nested, sequential approach first determines whether the model is configurally equivalent, then whether the regression weights are statistically equivalent, before testing the equivalence of item intercepts (Cheung and Rensvold 2002). Differences in the CFI greater than 0.01 indicate that parameters are not equivalent. Lack of equivalence in the measurement model further reinforces the conclusion that the samples are drawn from different populations (Wu et al. 2007).
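The reason for using factor means rather than sums can be illustrated with a short sketch. The item labels and the item-to-factor assignment below are hypothetical, since the inventory's actual item keys are not given here; responses are assumed to be coded 1 to 6 on the six-point agreement scale.

```python
from statistics import mean

# Hypothetical item-to-factor assignment (the real item keys are not given here).
FACTOR_ITEMS = {
    "learning-FR": ["q1", "q2", "q3"],
    "teaching-SC": ["q4", "q5"],
}


def factor_means(responses, factor_items):
    """Average one respondent's item ratings within each factor."""
    return {
        factor: mean(responses[item] for item in items)
        for factor, items in factor_items.items()
    }


def factor_sums(responses, factor_items):
    """Sum of item ratings per factor, shown only for comparison."""
    return {
        factor: sum(responses[item] for item in items)
        for factor, items in factor_items.items()
    }


# One made-up respondent on the 1-6 agreement scale.
respondent = {"q1": 5, "q2": 6, "q3": 4, "q4": 2, "q5": 3}

scores = factor_means(respondent, FACTOR_ITEMS)
totals = factor_sums(respondent, FACTOR_ITEMS)
print(scores)
print(totals)
```

Means stay on the original 1 to 6 scale regardless of how many items a factor has, so factors of different length remain directly comparable; sums do not.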

Results

Confirmatory factor analysis in which each item is predicted by both a focus and a purpose of assessment had acceptable fit (χ² = 1040.39, df = 486, [χ²/df = 2.141, p = 0.14]; CFI = 0.81; gamma hat = 0.94; RMSEA = 0.048, 90 % CI = 0.044–0.052; SRMR = 0.052). While not every path was statistically significant, the combined weight of paths produced small to large percentages of variance explained (min 2 % to max 47 %; M = 25 %, SD = 12.5 %) (see Table 2).
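The χ²/df ratio and the RMSEA reported above can be recomputed from χ², df and the sample size. This is a minimal sketch, assuming the common point-estimate formula RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))); the software used may apply a slightly different variant. Gamma hat is left out because it depends on the number of analysed items, which is not stated here.

```python
import math

# Values reported in the text.
chi2 = 1040.39
df = 486
n = 493  # sample size

# Normed chi-square.
ratio = chi2 / df

# RMSEA point estimate (assumed formula, see the note above the code).
rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

print(f"chi2/df = {ratio:.3f}")  # chi2/df = 2.141
print(f"RMSEA   = {rmsea:.3f}")  # RMSEA   = 0.048
```

Both values agree with the reported figures to three decimals.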

Table 2 Item statistics by predictor factors

The mean scores for each of the factors (Table 3) showed the learning and accountability focus and the formative regulation purpose all received the highest means. However, inspection of the interaction means revealed a somewhat more instructive result. The formative regulation focus on learning and teaching had the highest means, while the societal control focus on the same two applications had the lowest means. All four formative regulation applications had higher means than the societal control applications, clearly indicating that the SFL teachers endorsed using assessment to guide pedagogy much more than for societal control reasons. It is noteworthy that the lowest mean (teaching SC) indicates that the conventional use of student assessment to evaluate the quality of teaching was more-or-less rejected.

Table 3 Mean scores for each factor

Multivariate analysis of variance (MANOVA) of the effect of key demographic variables on conceptions used eight interactive factor mean scores (i.e. learning-FR, teaching-FR, accountability-FR, certifying-FR, learning-SC, teaching-SC, accountability-SC and certifying-SC). The predictor variables used were main effects for initial training and learner population (i.e. either children-adolescents or adults) and interaction between the two predictors. Statistically significant differences were found only for the main effect of working context (Wilks’ λ = 0.96; F(8, 478) = 2.71, p = 0.006). Inspection of univariate differences for the eight factor scores showed that statistically significant differences existed for only three factors (i.e. accountability-SC, p = 0.006, d = 0.26; certifying-SC, p = 0.031, d = 0.20; and learning-SC, p < 0.001, d = 0.38). In all three cases, teachers in adult contexts gave lower scores, though the differences were small to moderate.
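The effect sizes (d) express the difference between two group means in standard deviation units. The sketch below assumes Cohen's d with a pooled standard deviation; the text does not say which variant was computed, and the scores are made up for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up factor mean scores for two groups of teachers.
school_context = [3.4, 3.0, 3.8, 2.9, 3.5]
adult_context = [3.0, 2.8, 3.3, 2.6, 3.1]

d = cohens_d(school_context, adult_context)
print(f"d = {d:.2f}")
```

A positive d here means the first group (school context) scored higher, which matches the direction of the differences reported above.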

Multi-group confirmatory factor analysis found that constraining the measurement weights to be the same for the two groups resulted in a CFI difference of 0.011, above the 0.01 threshold. This supports the conclusion that the two groups responded to the inventory in different ways and that teachers of Spanish in these two different contexts are samples from two different populations.

Discussion and conclusion

In this study, we wanted to learn about conceptions of assessment of teachers working in non-formal adult educational contexts versus those teaching in the formal and systemic school context, which led us to focus on teachers of Spanish as a foreign language. The results of this study show that, as proposed by the original study (Remesal 2006), teacher beliefs about assessment are organised in a complex model in which each individual evaluates the focus of assessment effects according to whether it has formative-regulatory or societal control uses. On the whole, this sample of SFL teachers conceived of assessment primarily in terms of its regulating effects on teaching and learning and hence was strongly positive towards formative assessment purposes. In other words, they agreed mostly that assessment exists to serve the improvement of learning and also teaching. In addition, we identified that those teachers working in non-compulsory adult contexts were less favourable towards the societal control purposes. In contrast, teachers in the formal compulsory schooling context showed a greater acknowledgement of assessment as a societal control tool by means of certifying achievement and showing accountability of teaching. This result might be explained by different elements that we address next.

First, teachers in formal schooling might have a greater awareness of their responsibility towards society and their own pupils or students through having to care for their basic development towards adulthood and responsible citizenship (Green et al. 2007; Johnson et al. 2008; Pope et al. 2009). In contrast, teachers in non-compulsory settings may regard this societal control by means of assessment as distant from their professional role (Vella 1994). Particularly in the field of foreign languages, in which the participants in this study work, teaching focuses less upon basic knowledge or competences and more upon individual opportunities for better employment or better accomplishment of the personal motives and goals that guide the decision to enrol in a foreign language course (Salido 2006; Zúñiga 2009).

Secondly, we have to keep in mind the great distance between children or adolescents as learners in a compulsory context, and adult learners in a free context of self-development. The relationship of power between children or adolescents and the teachers shows a greater gap than that between the adult instructor and the adult learner freely choosing to engage in a learning experience (Lawler 2003; Pratt 1992). The adult learner has clearer motives for learning and has usually already developed strong learning strategies. In addition, the adult learner makes a certain economic investment in language courses, which in turn contributes to raised expectations for positive results and, hence, their expectation of greater formative interventions from the teacher (Knowles et al. 2012). Furthermore, a wider sympathy and empathy can develop between the adult teacher and the adult learner as equals in life phase (Nesbit 1998; Pratt 1992).

These results are coherent with the theoretical model which declares the twofold functional nature of assessment in the basic school system (Remesal 2011), which by default imposes both formative and non-formative (i.e. certification and accountability) purposes upon assessment. Within schooling contexts, teachers have to accept that assessment results are also used to evaluate students, monitor teacher effectiveness and judge school quality. In contrast, it appears that teachers working outside the basic school system might focus more on the classroom context and thus feel a lesser attachment or commitment to evaluative purposes.

It is also noteworthy that the strongly positive view towards formative regulation uses of assessment for teaching and learning is consistent with studies of compulsory education teachers in New Zealand (Brown 2004a; Brown and Michaelides 2011), Queensland (Brown et al. 2011a) and Hong Kong (Brown et al. 2009). Endorsing assessment as a mechanism for improved teaching and student learning does appear to be a universal attitude among societies in which teacher professionalism is an important facet of education. Like the previous study of pre-service teachers (Brown and Remesal 2012), this study found the least endorsement for the use of assessment to evaluate schools.

The current bifactor model is quite different from the multidimensional and hierarchical model of the widely used Teachers’ Conceptions of Assessment inventory (Brown 2001–2003). Future research should investigate whether the models can be reconciled into a common framework. Further analysis of the non-equivalent aspects of the measurement model across working contexts is needed to see what further insights can be gained. Nonetheless, this study provides internal validation evidence for the questionnaire and advances our understanding of the contingencies in teachers’ belief systems about assessment. Last but not least, it is remarkable that the literature on adult education mostly disregards the issue of assessment (Richey 1992; Knowles et al. 2012). Even recent proposals on adult second language learning do not include the topic (Smoke 2013); only contributions in the field of formal higher education and technical training tackle it (Arend 2009). It is hence time to include assessment in the research program of non-compulsory adult learning contexts. Our study is a first step in this direction, since we identified differences in teachers’ conceptions that might be attributable to the educational context.

However, this study has limitations as well. Above all, there is an important limitation in the sample of participants; the huge variability of the sample, with no control over the particular institutional and cultural context, is certainly something to be addressed in future studies. In our study, we valued the voluntary participation and the diversity of ideas over sample control, but we recognise the methodological issue. Also, to continue this line of study, it will be necessary to focus on particular variables, like the sort of professional preparation that the teachers of foreign languages for adults receive on entry to teaching practice. The composition of our convenience sample did not allow us to make any deeper analysis into this; however, we found that there are three main backgrounds: linguistic, pedagogical and non-linguistic/non-pedagogical. It is our hypothesis that these different training backgrounds should generate different results on conceptions and practices of assessment. Finally, to make a comprehensive exploration of this new field, it would be necessary to pay attention to the other side of the road: the adult learners and their conceptions of assessment in relation to their own learning experience.