1 Introduction

Teacher education in many countries has recently come under strong public criticism. However, little research exists in this field to date. There are numerous publications with a normative-conceptual orientation but only a few empirical studies, and these are mostly small-scale studies or analyses of policy documents (Blömeke, 2004; Cochran-Smith & Zeichner, 2005; Houston, 1990; Schaefers, 2002; Schlee, 1992; Sikula, Buttery & Guyton, 1996). Even in the field covered by most of the existing studies, the training of mathematics teachers, research deficits have to be noted: the research is often short term, non-cumulative, and conducted within the researchers’ own training institution (Adler et al., 2005; Krainer, Goffree & Berger, 1999; Lerman, 2001). Probably most important, teacher-education research lacks a common theoretical basis, which prevents a convincing development of instruments and makes it difficult to connect the studies to each other. This conclusion was drawn most recently in the comprehensive AERA volume “Studying teacher education” (Clift & Brady, 2005; Floden & Meniketti, 2005; Grossman, 2005; Wilson & Youngs, 2005; Zeichner & Conklin, 2005).

The following paper takes up this problem and models how to measure effective teacher education in the context of the state of research in this field in order to overcome existing deficits in teacher-education research. The model represents the theoretical framework of an international-comparative study of mathematics teacher education in six countries called “Mathematics Teaching for the Twenty-first Century (MT21)”. Kubow and Fossum (2007) currently regard teacher professionalism as the most important issue of comparative research. Actually testing the professional competence of future teachers and capturing opportunities to learn in teacher education beyond distal indicators like certification or majors are two completely new approaches in teacher-education research. They require a careful, theory-driven research procedure. We extend the literature review covered by the AERA volume mentioned above and specifically include the European, in particular the German-speaking, literature. In addition, we include publications which specifically deal with mathematics teacher education.

We start our review with a conceptualization of the central criterion of effective teacher education, the professional competence of future teachers (Sect. 2). Then, individual, institutional, and systemic factors are described that may influence the acquisition of this competence during teacher education (Sect. 3). In this sense, we reverse the perspective taken by Cochran-Smith and Zeichner (2005). Whereas they mainly take an educational-sociological perspective by focusing on characteristics of teacher education and looking for their effects, we take an educational-psychological perspective by focusing on the professional competence of teachers and asking what influences it. In Sect. 4, we discuss challenges connected to the measurement of teacher-education outcomes.

2 Professional competence of future teachers

What teachers need to act successfully during their professional life can be called “professional competence”. Weinert (2001) divides competence in general into

  • cognitive abilities and skills—in terms of teachers’ professional knowledge—to solve certain problems; this knowledge is innate or learnable (see Sect. 2.1),

  • the motivational, volitional and social willingness and skills to apply solutions successfully and responsibly in variable situations—in terms of teachers’ professional beliefs (see Sect. 2.3).

This means by definition that, on the one hand, competence is characterized by applicability; on the other hand, competence is characterized by an intertwined structure of many different components (Hartig, Klieme & Leutner, 2008). One purpose of our paper is to lay out this basic idea of competence with respect to teachers.

The problems and situations to be dealt with by teachers are set by constitutive features of the teaching profession (Bromme, 1992, p. 73ff, 1997; Dann, 2000). Which features are in fact constitutive can be inferred from existing standards in various national teacher education systems (KMK, 2004; NBPTS, 2003; NCTM, 1991). Considering eastern as well as western countries, a spectrum of expectations can be identified: besides the core tasks of instruction and diagnosing student achievement, which are present everywhere, teachers may, depending on the country, have to nurture students’ social and moral development, counsel parents and participate in school development. Thus, teacher education is required to prepare future teachers for very different tasks, with different countries setting different foci.

To ensure that it is possible to apply the knowledge acquired during teacher education to variable problems and situations (i.e. that it is not “idle”), this knowledge has to meet specific criteria which are in turn a challenge for measurement purposes (see Sect. 2.2). Moreover, based on Weinert’s definition several non-cognitive components are part of teachers’ professional competence as well: beliefs (see Sect. 2.3) and personal features like extraversion or stability (see Sect. 2.4). It would be risky to leave them out in a measurement if we cannot rule out the possibility that teacher performance is influenced by them. From the point of measuring development during teacher education, it is finally important to think about gradation of professional competence (see Sect. 2.5).

2.1 Cognitive components of professional competence

Different authors have described teachers’ professional knowledge differently. Overall, the models share three dimensions (see e.g. An, Kulm & Wu, 2004; Baumert & Kunter, 2006; Bromme, 1992; Ferrini-Mundy et al. 2006; Shulman, 1985):

  1. content knowledge,

  2. pedagogical content knowledge, and

  3. general pedagogical knowledge.

In their everyday work, teachers have to combine these three dimensions in a way that fits the specifics of the classroom situation.

There exists no consensus about how to conceptualize these three dimensions (Hopmann & Riquarts, 1995). Moreover, the distinction of content knowledge, pedagogical content knowledge and general pedagogical knowledge is a heuristic to identify important facets of teacher knowledge rather than a way to mark exact boundaries between them. At some point, the dimensions unavoidably flow into each other.

The levels of content knowledge, pedagogical content knowledge and general pedagogical knowledge have proven to influence teacher performance significantly. However, if each of the facets is analyzed separately with regard to its relation to teacher performance, the effects are vague or contradictory. Thus, the single dimensions of teacher knowledge seem to have only limited predictive power. Evidence for this phenomenon exists especially with regard to content knowledge in mathematics (Begle, 1979; Hill, Rowan & Ball, 2004; Monk, 1994). While a lack of knowledge has a significantly negative influence on teacher performance, beyond a certain threshold performance does not improve any further. This can be interpreted as a “ceiling effect” (Monk & King, 1994). Cognitive structuring of contents from a higher level of teacher knowledge possibly hinders their simplification and their presentation from a student’s point of view (Ball & Bass, 2000; Ball, Lubienski & Mewborn, 2001). Askew (1999) notes this effect for British elementary teachers as well.

Correspondingly vague and contradictory effects become apparent with regard to general pedagogical knowledge if it is taken as the only criterion to predict teacher performance (Ashton & Crocker, 1987; Grossman, 1990). In contrast, pedagogical content knowledge seems to be a vital element of teacher performance: the linking of content knowledge and knowledge about its instruction shows consistently positive correlations with student achievement (Baumert, 2006; Brown, Smith & Stein, 1996; Cohen & Hill, 1997; Wiley & Yoon, 1995). However, an important question is to what extent somebody can have pedagogical content knowledge without having acquired content knowledge and pedagogical knowledge.

From an international-comparative perspective, several empirical studies have shown significant differences across countries in the content knowledge of practicing teachers as well as in their pedagogical content knowledge. The study of Ma (1999), which compared Chinese and US-American primary teachers, describes a lower level of mathematical knowledge of US teachers compared to their Chinese counterparts. This difference influences the ability of teachers to analyze students’ errors or to develop conceptual understanding. The study of An, Kulm, and Wu (2004), carried out with secondary teachers, showed the role of beliefs in this process: Chinese teachers emphasize the development of procedural and conceptual student knowledge through reliance on more traditional teaching practices, in contrast to US-American teachers, who emphasize a variety of activities to promote creativity in attempting to develop conceptual knowledge. These differences have significant influences on the teaching approaches, and they are in accordance with the call for an emphasis on pedagogical content knowledge in teacher education that can be found in the literature (Park, 2005).

The six-country study MT21 of future lower-secondary mathematics teachers reveals significant differences across the countries as well (Schmidt et al., 2007). Korea and Taiwan scored between one-fourth and a full standard deviation above the mean of each of the four other countries on the mathematics scales. Germany was typically in the middle of the international distribution while Mexico was below the international mean. Bulgaria and the US scored from the middle of the distribution to almost one standard deviation below the six-country mean. Regarding pedagogical content knowledge, countries differ in their strengths and weaknesses. On the curriculum test, the United States performed best alongside Taiwan, while on the test focused on student reasoning Taiwanese and Korean future teachers performed about a quarter to one standard deviation above the mean of each of the other four countries. Germany and the US were in the middle of the six-country distribution. These results support the necessity to distinguish between content knowledge and pedagogical content knowledge.

2.2 The problem of “idle knowledge”

The link between cognitive components of professional competence and teacher performance touches upon the problem of “idle knowledge”. By definition, the conceptualization of “competence” as a latent construct that is assumed to underlie teacher performance requires a measurement to capture those types of knowledge that are in fact closely linked to performance. In knowledge research that examines different types of knowledge, at least two major research traditions can be distinguished:

  1. the psychological approach (Anderson & Krathwohl, 2001),

  2. the sociological approach (Polanyi, 1985).

Both traditions share a distinction of knowledge types. The former distinguishes between declarative, procedural and meta-cognitive knowledge. The latter distinguishes between “knowing that”, “knowing how”, and “knowing why”. Procedural knowledge, or “knowing how”, refers to a type of knowledge that is especially relevant to action. If one has only declarative knowledge, or “knowing that”, one will have problems with its application in practice (Ausubel, 1968; Krapp & Heiland, 1986; Stark & Mandl, 2000). Procedural knowledge is situated knowledge and it is organized sequentially in the form of “cognitive schemes”. Regarding teachers, this means that they perceive and carry out classroom actions stepwise according to typical instructional sequences experienced earlier (Aebli, 1983; Putnam & Borko, 2000).

It has to be pointed out that this distinction between types of knowledge is also a heuristic, developed in order to be able to identify and measure knowledge facets in detail. By nature, the types flow into each other as well, because declarative knowledge is transformed into procedural knowledge through experience. It is a special feature of the teaching profession that declarative knowledge of several areas (content knowledge, pedagogical content knowledge, and general pedagogical knowledge) needs to be combined and restructured in order to become procedural knowledge. Presumably, there has to be a minimum knowledge level in all three areas to reach a high level of performance.

Empirical evidence for this exists especially with regard to teachers’ ability to diagnose student achievement, which, according to Weinert, Schrader and Helmke (1990), plays a key role in teacher performance. A teacher can accurately evaluate student features only if psychological-diagnostic knowledge, knowledge about cognitive demands of a specific learning field, pedagogical knowledge about typical procedures and mistakes of students in this field as well as instructional knowledge about benefits and problems of different teaching methods exist and are effectively linked to each other (Helmke, Hosenfeld & Schrader, 2004; Schrader, 2001). However, research documents that teachers have huge problems with diagnosing student achievement accurately (Baumert et al., 2001; Demaray & Elliot, 1998; Feinberg & Shapiro, 2003; Schrader, 1989; Spinath, 2005). This may be an indicator of how complex the structure of their professional knowledge is.

2.3 Beliefs as components of professional competence

Teachers’ beliefs are crucial to the perception of classroom situations and to decisions about how to act (Leder, Pehkonen & Törner, 2002; Leinhardt & Greeno, 1986). If beliefs are operationalized specifically to both the content taught and the challenges a specific classroom situation presents, empirical evidence exists for a link between teacher beliefs and student achievement (Bromme, 1994, p. 77; 2005). Beliefs have a vital function with regard to orientation as well as to action (Grigutsch, Raatz & Törner, 1998). Therefore, they connect knowledge and action. In this sense, they are also an indicator of the type of instruction teachers will use in their future teaching (Brown & Rose, 1995; Nespor, 1987; Short & Short, 1989).

These findings require that beliefs be measured in a study of future teachers’ professional competence. As with knowledge, we can distinguish between different types of teacher beliefs (Calderhead, 1996; Cooney et al., 1998; Ernest, 1991):

  • epistemological beliefs about the nature of the underlying academic discipline (Hofer & Pintrich, 2002),

  • beliefs about teaching and learning in a subject (Thompson, 1992),

  • pedagogical beliefs about the social context of schools, about teacher education and the process of professional development.

In view of this distinction, it has to be pointed out again that both the distinction between the three types of beliefs and their delimitation from knowledge, in particular from pedagogical content knowledge and general pedagogical knowledge, have a mainly heuristic function and cannot be strictly maintained (Bromme, 1994, p. 78).

Epistemological beliefs about the nature of the underlying academic discipline are highly content-bound. Concerning mathematics teachers, we have to refer to the nature of mathematics. Grigutsch, Raatz and Törner (1998) categorize teachers’ beliefs mainly by four aspects: mathematics can be understood as a science which mainly consists of problem solving processes (“process”), as a science which is relevant for society and life (“application”), as an exact, formal and logical science (“formalism”) or as a collection of rules and formulae (“scheme”). In MT21 we were able to replicate this finding across countries (Schmidt et al., 2007) and within countries (Blömeke et al., 2008a).

With regard to beliefs on teaching and learning mathematics, research exists on a constructivist versus a transmission perspective (Peterson, Fennema, Carpenter & Loef, 1989). Results of MT21 show that in the groups of future teachers these perspectives are of a more antagonistic character in Western countries than in Asian countries (Schmidt et al., 2007; regarding Germany as one of the western countries participating in MT21 see also Müller, Felbrich & Blömeke, 2008). Staub and Stern (2002) applied Peterson’s instrument to teachers and linked their beliefs to student achievement. The study reveals that a constructivist perspective is related to higher student achievement as far as complex problem solving abilities are tested. At the same time, these teachers achieve a comparable level of students’ algorithmic abilities compared to teachers with a transmission perspective.

In addition to research about these broader beliefs on teaching and learning mathematics, comparative studies focusing on Chinese and US-American teachers describe different beliefs concerning the curricular structure of mathematics teaching and the source of success: whereas Chinese teachers favor a more holistic view of central mathematical ideas, US-American teachers tend to regard mathematics as separated into smaller pieces (Ma, 1999). Chinese teachers view students’ efforts as the reason for success, in contrast to American teachers, who believe in mathematical talent as the main determinant of student achievement (Stevenson et al., 1990).

Only very general findings exist about the acquisition of beliefs during teacher education. According to these, its mode of action can be specified as follows: future teachers enter teacher education with detailed beliefs about school and instruction, and alterations to these beliefs made during their training persist in only a few cases (“Konstanzer Wanne”; Dann, Müller-Fohrbrodt, & Cloetta, 1981). Beliefs can function as filters; thus, the information that is incorporated is mainly information that fits into the existing system of beliefs (Kane, Sandretto, & Heath, 2002; Pajares, 1992; Richardson & Placier, 2001). However, MT21 yields some evidence that beliefs may change according to instructional efforts during teacher education and that the newly acquired beliefs remain even under pressure during the future teachers’ induction, provided that the instructional efforts are delivered consistently across the different components of teacher education (Blömeke et al., 2008b; Felbrich, Müller & Blömeke, 2008).

2.4 The role of personal features

Psychological research shows that, in addition to cognitive abilities and beliefs, personal features like extraversion, stability or agency take on an important role with regard to the prediction of teachers’ professional competence—at least in the long run. During the last 10–15 years, the so-called “big-five” model (McCrae et al., 2000) has internationally become the most common model to measure these personal features also with teachers. Other studies use the agency and communion scales developed by Spence, Helmreich and Stapp (1974). The core result of this research is that a minimum level of features like extraversion, stability or agency has to be fulfilled to enable lasting professional success of teachers while features like neuroticism or communion should not be too strongly marked (Blömeke, in press; Lipowsky, 2003; Mayr & Neuweg, 2006). The perception of a high burden leads to the risk of an early burnout (Schaarschmidt & Fischer, 2001), and appears together with a negative development of student achievement (Helmke, Hosenfeld & Schrader, 2002).

Since teacher-education students are adults and enter teacher education with rather stable personal features, it is probably almost impossible for teacher education to achieve significant changes in this field. Nevertheless, personal features may be an important mediating factor when teacher-education outcomes are to be measured. They therefore have to be taken into account.

To sum up, professional competence of future teachers can be modeled as a complex hypothetical construct that underlies teacher performance. It consists of professional knowledge in several dimensions, professional beliefs in several dimensions and personal characteristics in several dimensions (see Fig. 1). How these dimensions precisely are related to each other is generally unknown and an important topic of future research.

Fig. 1 Model of teacher-education outcomes

2.5 Development and gradation of professional competence of future teachers

There is a lack of theoretically and psychometrically valid models of professional competence that include steps of development and also distinguish between levels of competence. Existing models are based on phenomenological analyses or empirical-qualitative studies with small sample sizes (Dreyfus & Dreyfus, 1986; Fuller & Brown, 1975; Neuweg, 1999; Terhart et al., 1994). The most extensive results about competence development come from expertise research (Berliner, 1988; Bromme, 1997; Leinhardt & Greeno, 1986; cf. as compendium Ropo, 2004). According to this research, the key feature of professional development is an increase in the linkage of several knowledge domains, as was mentioned above. The stronger this linkage is, the more the perception of teaching situations changes: situations are increasingly interpreted with regard to possible actions (Berliner, 2001; Calderhead, 1984). Some evidence exists that this process of professional development needs at least 10 years from the beginning of teacher education and that this process has to include several years of practical experience (Terhart, 1996, p. 452).

Regarding the gradation of levels of competence, a first attempt to create such a model was made in MT21 (Blömeke et al., 2008c). In this effort, two perspectives were combined: a theoretical one and an empirical one. Based on an already theory-driven process of item development, the features of all items were analyzed in even more detail and classified according to important characteristics like cognitive demands (e.g. single-step vs. multi-step problem solving; Embretson, 2002), number of knowledge dimensions required (e.g. one dimension vs. several dimensions) or knowledge level (e.g. school mathematics vs. university mathematics). In a regression analysis, we tried to predict the level of difficulty of every item (with respect to methodological details see Hartig, 2007; Harsch & Schröder, 2007). This procedure can be regarded as a way of construct validation (Borsboom, Mellenbergh & Van Heerden, 2004). However, in this exclusively theory-driven step we were not always successful. In cases where our prediction did not match the empirical difficulty of the MT21 items, we analyzed the items again. In almost all cases, we discovered hidden item characteristics that explained the difference between an item’s conceptual classification and its empirical difficulty. The items were re-classified and a new regression analysis was carried out. Our final model is able to explain about 69% of the variance in item difficulties. We found four levels of professional competence (Blömeke et al., 2008c):

  • On level A, the mathematics knowledge is limited to school mathematics. Problems to be solved should not be too cognitively complex, either. In contrast, the number of knowledge dimensions to be applied can be high. Usually this means that a teacher on this level shows sufficient content knowledge and pedagogical content knowledge to solve simple instructional problems.

  • On level B, these problems can require several cognitive steps. Still the knowledge is limited to school mathematics.

  • The necessary level of knowledge changes on level C. Teachers can solve problems which require knowledge of university mathematics and pedagogical content knowledge.

  • On the highest level D, the tackled problems additionally can be very complex from a cognitive point of view.
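The regression step described before the list of levels can be illustrated with a minimal sketch. It assumes an ordinary least-squares model with three binary item features; the feature coding, the difficulty values and the residual threshold are hypothetical and are not MT21 data, and the actual MT21 specification may differ.

```python
import numpy as np

# Hypothetical item features (NOT MT21 data); each row is one test item.
# Columns: multi_step (1 = several cognitive steps),
#          multi_dim (1 = several knowledge dimensions),
#          university_level (1 = university mathematics required).
features = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
], dtype=float)

# Hypothetical empirical item difficulties (e.g. on a logit scale).
difficulty = np.array([-1.2, -0.9, -0.3, -0.2, 0.4, 0.5, 1.1, 1.6])


def fit_difficulty_model(X, y):
    """Regress item difficulty on item features with ordinary least squares."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot  # share of difficulty variance explained
    return coef, predicted, r_squared


coef, predicted, r_squared = fit_difficulty_model(features, difficulty)

# Items whose empirical difficulty deviates clearly from the prediction are
# candidates for a second look at possibly hidden item characteristics.
RESIDUAL_THRESHOLD = 0.12  # hypothetical cut-off, chosen for illustration only
residuals = difficulty - predicted
flagged = np.flatnonzero(np.abs(residuals) > RESIDUAL_THRESHOLD)

print("coefficients (intercept, step, dim, level):", np.round(coef, 3))
print("R^2:", round(float(r_squared), 3))
print("items to re-inspect:", flagged.tolist())
```

In such a setup, the flagged items would be re-classified and the regression re-run, mirroring the iterative procedure described above.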

3 Characteristics of teacher education influencing professional competence

After having modeled professional competence as the criterion for effective teacher education and in this sense as the dependent variable of empirical studies on teacher-education outcomes, the question arises which factors may influence the development of professional competence. Potentially influential factors can be divided up into

  • individual characteristics of future teachers (Sect. 3.1),

  • institutional characteristics of teacher education (Sect. 3.2), and

  • systemic characteristics of a country (Sect. 3.3).

3.1 Individual characteristics of future teachers

It is known from instructional research that student achievement at school is influenced by prior knowledge and motivation as well as by different learning strategies and time invested into learning (Brophy, 1999; Helmke, 2004). Such individual characteristics may also play a role in teacher education if one looks at findings about the development of learning motivation across age groups (Stuhlmann, 2005). These influences on the acquisition of professional competence may also be cumulative because empirical studies on mathematics education show that high performance appears together with a positive self-concept and positive emotional attitude (Goetz et al., 2004; Möller & Köller, 2004). We do not have empirical findings about this with regard to teacher education. However, we cannot rule out the possibility that these kinds of individual characteristics matter.

From a cognitive point of view, prior knowledge has to be considered as a possibly influential characteristic. In this respect, a popular assumption discussed in the media is that future teachers represent a group of negatively selected high-school graduates. George Bernard Shaw’s quotation “He who can, does. He who cannot, teaches.” (1903, in “Maxims for Revolutionists”) can be taken as a dictum, which was used even by Shulman in his famous opening speech at the AERA-congress in 1985. But we do not have much evidence on this position. In contrast, results of recent studies would lead to the opposite conclusion (Abele, Neunzert & Tobies, 2004; Blömeke, in press; Curdes et al., 2002; Zumwalt & Craig, 2005).

Slightly more evidence exists on how the motives of future teachers entering teacher education influence the development of professional competence. A longitudinal survey of future teachers in Switzerland (Oser & Oelkers, 2001) shows that students who intended to become teachers right from the beginning received better results in their classes, especially in general pedagogy, compared with students who were unsure about which occupation to pursue (Brühwiler, 2001).

3.2 Institutional characteristics influencing professional competence

With respect to characteristics of teacher education programs, which may influence the development of teachers’ professional competence, one can think about a wide array of features: selectivity, program content, teaching methods, characteristics of teacher educators, accountability, location of teacher education, climate and so on. It is difficult to make evidence-based choices out of this array in order to decide about what to measure because only very few studies exist about the relationship of specific opportunities to learn (OTL) in teacher education and teacher-education outcomes. Most studies use only very rough proxies in order to measure OTL. The purpose of the following section is therefore to try to generate a reasonable model of only a few teacher education characteristics.

Before beginning this, it has to be pointed out that we focus on pre-service teacher education. Lifelong in-service training is left out even though it is of growing importance. Recently, Schwille and Dembélé (2007) examined teacher education in developing countries as well as in industrialized countries all over the world. They argue convincingly that the whole spectrum of teacher learning must be considered if one is to formulate policy recommendations or to design programs for teacher education and professional development. In fact, it seems as if the main difference between developing and industrialized countries is that the former almost exclusively focus on pre-service education whereas the latter increasingly consider in-service education important as well. However, in empirical research one certainly has to limit the complexity of a study. Since we limit ourselves to pre-service education, further research reviews should focus on lifelong training.

3.2.1 Structure, selectivity and content of teacher education

The European Union and the OECD have been intensifying their efforts to conduct comparisons of teacher-education programs. The education network “Eurydice” collected data about the labor market of teachers (Eurydice, 2002b, 2003), and teacher education programs (Eurydice, 2002a), and conclusions for education policy were drawn (Eurydice, 2004). The OECD initiative “Attracting, developing and retaining effective teachers” provides comparable data for about 25 countries, of which, however, only 7 were not from the European continent. The final report emphasizes the variation of teacher-education programs (OECD, 2004b, p. 90ff) but it seems to be possible to distinguish between two types of programs: an integrated teacher education in which one or two subjects, the subjects’ pedagogy and general pedagogy are studied at the same time, and a two-step consecutive teacher-education program in which the subjects’ pedagogy and general pedagogy follow a subject-specific bachelor’s degree. Neither EU nor OECD reveals any evidence about the effects of this difference.

In contrast, selectivity of teacher education is a feature about whose effects we do have some evidence, both on an aggregated cross-country level and within countries. A study carried out by ETS (Wang et al., 2003) looked at mathematics teacher education in seven countries (Australia, England, Hong Kong, Japan, the Netherlands, Singapore, and South Korea). The authors noticed a huge variety with respect to the point at which future teachers are selected and to the density of this process during teacher education. Overall, Asian countries seem to apply more, and more rigorous, selection criteria than English-speaking countries (ibid., p. 39). In MT21, we estimated the variance explained through selectivity using the approach of hierarchical linear modeling (Blömeke et al., 2008c). It turned out that in institutions with a higher GPA future teachers gain more professional knowledge. Against this background, selectivity is an aspect that has to be considered in measurements of teacher education effects.
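To make the idea of variance explained through selectivity concrete, a generic two-level random-intercept model can be written down. This is only a sketch under stated assumptions: the symbols, the use of institutional mean GPA as the sole level-2 predictor and the variance-reduction index are illustrative, and the model actually estimated in MT21 may have been specified differently.

```latex
% Level 1: knowledge score of future teacher i in institution j
\begin{align}
  Y_{ij} &= \beta_{0j} + r_{ij}, & r_{ij} &\sim N(0, \sigma^2) \\
% Level 2: institutional mean depends on selectivity (assumed: mean GPA)
  \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{GPA}_j + u_{0j}, & u_{0j} &\sim N(0, \tau_{00})
\end{align}
% Share of between-institution variance accounted for by selectivity,
% comparing the model without (null) and with the GPA predictor
\begin{equation}
  R^2_{\text{between}} = \frac{\tau_{00}^{\text{null}} - \tau_{00}^{\text{GPA}}}{\tau_{00}^{\text{null}}}
\end{equation}
```

Under this reading, a positive \gamma_{01} corresponds to the finding that future teachers in institutions with a higher GPA gain more professional knowledge.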

Findings about the effects of content in teacher education on professional competence are inconsistent (Blömeke, 2004; Cochran-Smith & Zeichner, 2005; Wilson, Floden & Ferrini-Mundy, 2001). A consistent positive link between opportunities to learn in the future teachers’ subjects or in general pedagogy and their professional knowledge and beliefs in these dimensions cannot be identified. This does not necessarily mean that content features can be left out in studies on teacher education. It merely points to the need to work on more sophisticated measures of OTL than those currently available. The present studies mainly rely on superficial indicators like degrees, majors, examination results or the number of classes taken, which can only insufficiently describe the kind of education a future teacher has experienced. Regardless of how common it is to use these measures as indicators (see e.g. Akiba, LeTendre & Scribner, 2007; Goldhaber & Brewer, 2000; Monk & King, 1994), this approach runs a high risk of washing out any kind of relationship between opportunities to learn in teacher education and its outcomes because there is unfortunately nothing in teacher education “that share(s) a relatively common meaning across various cultural contexts” (Akiba, LeTendre & Scribner, 2007). An example of this is the difference in the meaning of opportunities to learn “general pedagogy”. In comparison to a broad central European understanding, the understanding in English-speaking countries is rather narrow since it is mainly operationalized as classes in teaching methods or classroom management (Hopmann & Riquarts, 1995). This methodological weakness results in a disturbing inconsistency of study results because differences due to cultural shaping overlay differences between programs. In addition, because of the inconsistent findings almost any inference can be drawn: teacher education may or may not matter, personality may or may not matter and so on (see e.g. Abell Foundation 2001a, b vs. Darling-Hammond & Youngs, 2002). Therefore, there is a need to develop less aggregated measures which capture the content of teacher education in a low-inference way.

Precisely this was done in the six-country study, MT21. Future teachers were asked whether they encountered certain mathematics, mathematics pedagogy and general pedagogy topics in one or more of the courses they took as a part of their teacher-education program. What was listed represented the types of topics that can be studied in mathematics teacher education. The second set of questions asked the future teachers to rate the extent to which they had had the opportunity in their teacher preparation program to study various topics or to be engaged in specific activities in mathematics, mathematics pedagogy and general pedagogy. Results of this study are presented by Schmidt et al. in this journal issue. It turns out that these measures in fact are able to capture the OTL more precisely and that they measure the intended differences between countries. In addition, it becomes obvious that the opportunities to learn are related to the future teachers’ knowledge.

3.2.2 Linkage of theory and practice, teaching methods and teacher-educator features

A recurring discourse in teacher education is about the linkage of theory and practice. We have some evidence that the acquisition of professional competence is influenced by the way this linkage is accomplished. A German–Swiss comparative study of future teachers shows that fewer experiences of school practice during the university study lead to ideas about teaching with less theoretical and empirical foundation (Czerwenka & Nölle, 2000). Correspondingly, graduates who have been taught in theory-based classes about instructional issues show a more theory-based performance in their classroom practice (Niggli, 2004). However, it seems that not only the amount of practical experience is important but also its sequence. If the teaching load is too high at an early stage of their professional development, the pressure to act induces students to question the usefulness of scientific theories and to adopt traditional performance routines unreflectively (Jäger & Milbach, 1994; Oser & Oelkers, 2001).

In view of this result, the teaching methods used in teacher education may play an influential role as well (Grossman, 2005). Mayr (2003) shows that knowledge acquired during teacher education can better be used if it was developed in practice-related class arrangements. Kotzschmar (2004) illustrates that classes based on exploratory and independent learning result in higher teacher knowledge.

In turn, it becomes obvious that teacher educators may also be an element of opportunities to learn in teacher education. In comparison with instruction in school, this could be especially true since there are usually fewer detailed guidelines in teacher education (Zaslavsky & Leikin, 2004). Many decisions about what to teach and how to teach are up to the teacher educators themselves.

To sum up, it can be stated that no coherent empirical findings about the effects of teacher-education programs on teacher-education outcomes exist. However, it seems to be possible to identify core features of teacher education that influence professional competence: selectivity, content of teacher education (if measured on a less aggregated level than up to now), linkage of theory and practice, teaching methods, and characteristics of teacher educators.

3.3 Systemic features of teacher education

Just as in the school system, it can be expected that there are cultural features which may, conveyed by institutional features of teacher education and individual features of future teachers, explain variance in the professional competence of future teachers. Such systemic features could be, for example, the working conditions of teachers, their social reputation or general socio-cultural features of a society.

In most countries, teachers at elementary schools are generalists, that is, they teach various school subjects and have to take over the role of form teachers. This role implies a wide array of duties, including communicating with parents, writing recommendations, and organizing the class schedule. In secondary school, in contrast, teachers are mainly specialists. This means they teach only one or at most two subjects (Mullis et al., 2004, p. 240). It is plausible that this difference influences the structure of teacher-education programs. The same applies to the educational ends of the school system. In some countries, students are supposed to acquire a broad general education, which results in a fixed curriculum until the end of compulsory schooling (OECD, 2004a, p. 364). By contrast, other countries believe in an increasing specialization and individualization of schooling, in which students can choose courses already in elementary school, increasingly more so at the lower-secondary level, and all of them at the upper-secondary level. These differences probably result in differences in program features of teacher education as well.

In addition to these still relatively proximal school-related features, the question arises whether far more distal cultural features may influence teacher education and future teachers’ professional competence as well. From the literature about cultural features, three indicators seem to be well-founded measures of the complex construct “culture” (Hofstede, 2001; Inglehart, 1997; Triandis, 1995): the socio-economic level of a country, its level of democratization, and the level of individualism. These three features reflect both the historical origins of a society and its current developments. First analyses suggest that there is in fact a relationship to teacher education. The more an individualistic attitude dominates in a country, the lower the prestige of teachers and the lower their income (Blömeke, 2005, 2006). With increasing socio-economic development, the idea of general education at school (so-called “Allgemeinbildung”) seems to be abandoned. Instead, specialization and individualization are stressed (ibid.). Therefore, even if this increases the complexity of a teacher-education study even further, a comprehensive study has to be aware that there may be hidden cultural characteristics behind institutional or individual features of teacher education.

To sum up, we have described the state of research on the acquisition of professional competence in teacher education. Our review results in a model of potentially influential factors on the individual, institutional, and systemic levels, all of which have to be captured (see Fig. 2). This model can provide a sound basis for further empirical studies on the effectiveness of teacher education.

Fig. 2 Multi-level model of factors influencing teacher-education outcomes (FT = future teachers)

4 Challenges of measuring the effectiveness of teacher education

The final part of our paper deals with specific problems and challenges that empirical studies have to face if they try to measure effects of teacher education in the way documented above. These challenges are knowledge characteristics that require a specific item format (see the section about idle knowledge) combined with restricted assessment time, problems of construct validation, sampling issues, the intertwining of teacher-education features, the hierarchical structure of the data, and the benefits and limits of international comparisons.

4.1 Item format and time restrictions

As shown, professional competence is a complex construct, which manifests itself as a result of the situated nature of its knowledge and belief dimensions as well as their intertwining. In order to consider these characteristics and to avoid the measurement of idle knowledge, it is important to use a special item format: teaching situations which can only be handled by using and linking several knowledge and belief dimensions (Blömeke, Felbrich & Müller, 2008). Distractors or codings of open-ended answers would have to mirror levels of competence. Apart from the difficulty of developing such items and distractors or coding rubrics (on this, see the paper about future teachers’ competence to plan a lesson by Blömeke et al. in this journal issue), it is important to point out that they are hard to realize in a paper-and-pencil format. Here, the use of computer-based assessment methods could accelerate progress. In view of the current level of technology and its availability, however, it remains questionable whether such an assessment would work on a large scale across countries.

Moreover, even with situated items an evaluation of professional competence remains limited in large-scale assessments. Testing university knowledge in three different areas (content knowledge, pedagogical content knowledge, and general pedagogical knowledge) is much more time-consuming than testing student achievement. This problem would not be as pressing if only some of the participants needed to answer the complete item pool. However, there are limits to a rotating multi-matrix design in teacher education research since in many countries there is only a very small target population per institution (Schmidt et al., 2007). Therefore, the number of test forms cannot be increased without limit.
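
The trade-off behind this limit can be made concrete with a small sketch. It assumes a simple cyclic rotation in which the item pool is split into blocks and each test form carries a few neighboring blocks. The function names, block counts and the institution size of 12 are hypothetical and are not taken from the MT21 design.

```python
from collections import Counter


def rotated_forms(n_blocks: int, blocks_per_form: int) -> list[list[int]]:
    """Build one form per block; form f holds blocks f, f+1, ... (wrapping around).

    Neighboring forms share blocks, which is what links the forms to each other.
    """
    if not 1 <= blocks_per_form <= n_blocks:
        raise ValueError("blocks_per_form must be between 1 and n_blocks")
    return [
        [(form + offset) % n_blocks for offset in range(blocks_per_form)]
        for form in range(n_blocks)
    ]


def responses_per_block(n_respondents: int, forms: list[list[int]]) -> Counter:
    """Hand out forms in turn and count how many respondents see each block."""
    counts: Counter = Counter()
    for respondent in range(n_respondents):
        counts.update(forms[respondent % len(forms)])
    return counts


# Hypothetical institution with 12 future teachers, two blocks per form.
few_forms = rotated_forms(n_blocks=6, blocks_per_form=2)
many_forms = rotated_forms(n_blocks=12, blocks_per_form=2)

few_counts = responses_per_block(12, few_forms)
many_counts = responses_per_block(12, many_forms)

print("6 blocks :", sorted(few_counts.values()))
print("12 blocks:", sorted(many_counts.values()))
```

With the same 12 respondents, doubling the number of blocks (and thus forms) halves the number of responses collected per block, which is why small target populations per institution cap the number of test forms.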

In many respects, a multi-method approach is therefore advisable. Some of the problems mentioned can be solved if large-scale assessments are accompanied by qualitative studies. These make it possible to inquire into the nature of future teachers’ competence in more depth (see as an example Schwarz, Kaiser & Buchholtz, 2008 and Schwarz et al. in this journal issue).

4.2 Construct validation

Usually, it has to remain the subject of other studies to validate externally the findings about professional competence as a function of teacher education, for example by testing groups other than future teachers with the same instrument in order to estimate its discriminant validity, by following future teachers into their profession in order to estimate whether the structure of their competence develops in the expected way, or by testing practicing teachers with the same instrument, observing their performance and assessing the achievement of their students. Student achievement, for example, usually cannot be used directly as a criterion for the effectiveness of teacher education, even if this is its ultimate function, because this would involve the examination of a large number of student and school features, which is technically difficult to achieve in one study. The number of variables related to teacher education and professional competence already presents a challenge.

However, if one has a well-founded model for a study (see e.g. Figs. 1, 2) which enables researchers to develop items in a theory-driven way and which claims clear relationships between the model’s variables, it is possible to derive a set of strong hypotheses. If they were supported by empirical data, this would be an important indicator of construct validity (Borsboom, Mellenbergh & Van Heerden, 2004). Multidimensional Rasch measurement allows the testing of item component models as well as faceted designs (see e.g. Blömeke et al., 2008d; Krauss et al., in press). Of course, it is necessary to break down every component documented in Figs. 1 and 2 once more in order to be able to make use of the full potential of IRT models.
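
As a reminder of what such a model looks like, the dichotomous multidimensional Rasch model can be sketched in its generic textbook form. This is not necessarily the exact specification of the studies cited above, and the assignment of items to dimensions through the scoring vector $\mathbf{a}_i$ is an assumption made for illustration.

```latex
% Probability that person p solves item i
P(X_{pi} = 1 \mid \boldsymbol{\theta}_p)
  = \frac{\exp\left(\mathbf{a}_i^{\top}\boldsymbol{\theta}_p - \delta_i\right)}
         {1 + \exp\left(\mathbf{a}_i^{\top}\boldsymbol{\theta}_p - \delta_i\right)}

% \boldsymbol{\theta}_p : vector of person abilities, one entry per dimension
%                         (e.g. content knowledge, pedagogical content knowledge)
% \mathbf{a}_i          : fixed 0/1 scoring vector stating which dimension(s) item i loads on
% \delta_i              : difficulty of item i
```

If every item loads on exactly one dimension, competing hypotheses about the structure of professional competence correspond to different choices of the vectors $\mathbf{a}_i$, and the fit of the resulting models can be compared.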

Within the general problem of validation, the evaluation of beliefs appears to be especially problematic. In large-scale assessments it has to be done by self-reports. This means relying on a common rationale among the test persons (in addition to problems of social desirability etc.). Since this is controversial even within one country, the international reliability has to be considered rather carefully given the cultural differences in what it means to be a teacher. On top of this, beliefs of future teachers are of a strongly hypothetical nature since there may be a lack of sufficient experience in the role of teachers.

4.3 Sampling issues

Another challenge of teacher-education research is the sampling procedure. In some countries, there is the quite pragmatic problem of how to estimate the size of the target population and how to approach the future teachers. This applies especially to central European countries like Austria, Denmark or Germany where higher education is organized in an individualized form and “classes” do not exist.

In addition, the preconditions with which future teachers begin their teacher-education program need to be known in order to estimate the true effects of the programs. Thus, a sole assessment at the end of teacher education would not be sufficient. Consequently, the workload for the measurement of teacher-education outcomes doubles. However, how would one define the “beginning” of teacher education? Is the general BA program in a consecutive model part of teacher education? Moreover, if one decides that it is, how would one identify those students who will become teachers? In MT21, solutions were developed for every one of these problems, in a lengthy process that needed a lot of communication between the countries (Schmidt et al., 2007).

4.4 Hierarchical data structure

It has to be pointed out that causes, conditions and effects of teacher education are, just like in educational research in general, closely intertwined. In view of the hierarchical structure of the data, it seems essential to carry out a multi-level analysis (Ditton, 1998; Raudenbush & Bryk, 2002) in order to estimate both the influence of individual features on the acquired professional competence and the influence of institutional conditions. Experience from such analyses confirms the potential of this approach. Ecological fallacy (using data from units at a higher level to draw inferences regarding units at a lower level) as well as atomistic fallacy (drawing inferences regarding units at a higher level based on data collected for units at a lower level) can thereby be avoided. However, it needs to be kept in mind that single contextual features (e.g. the composition of the future-teacher body) represent pooled individual features, and it needs to be pointed out that, compared with school research, the capacity of inferential statistics is limited due to a significantly smaller test population and fewer training institutions (Schmidt et al., 2007; Blömeke et al., 2008c).
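
The ecological fallacy can be illustrated with a deliberately artificial example. The three institutions, the variable names (an OTL score and a knowledge score) and all values below are invented for this sketch and are not MT21 data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: institution -> list of (OTL score, knowledge score) per future teacher.
data = {
    "A": [(1, 12), (2, 11), (3, 10)],
    "B": [(4, 22), (5, 21), (6, 20)],
    "C": [(7, 32), (8, 31), (9, 30)],
}

# Correlation between institution means (the aggregated, higher level).
mean_otl = [sum(o for o, _ in rows) / len(rows) for rows in data.values()]
mean_score = [sum(s for _, s in rows) / len(rows) for rows in data.values()]
between = pearson(mean_otl, mean_score)

# Correlation among individuals inside each institution (the lower level).
within = {
    name: pearson([o for o, _ in rows], [s for _, s in rows])
    for name, rows in data.items()
}

print("between institutions:", between)
print("within institutions :", within)
```

In this constructed case the institution means are perfectly positively correlated while the relationship among individuals inside every institution is perfectly negative, so an inference from the aggregated level to the individual level would be wrong. A multi-level analysis keeps the two levels apart.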

4.5 Benefits and limits of international comparisons

Researchers are embedded in their own culture so that they are often not able to recognize matters of culture. This is particularly the case for teacher education, given the unique way in which it implicates many different levels of education and stands at the intersection of education and other socio-economic and political arenas (Blömeke & Paine, in press). Therefore, this kind of research is a challenge in itself. In international comparisons, additional problems of language and meaning become important. They are far more demanding to resolve than a “simple” translation of instruments or responses. Many terms from native languages cannot be translated because adequate English terms are missing. Conversely, a translation from English into another language can fail because appropriate terms are not at the translator’s disposal in that language. In the field of education, this problem arises often: the very German term “Bildung” does not have a counterpart in English; conversely, the different meanings of sex and gender do not have a counterpart in German.

Differences in the structure of teacher education make collecting comparable data even more complicated, and different meanings of the constructs under study complicate the interpretation of the results. Features of teacher education usually do not share a common meaning across countries. On the other hand, it is precisely this phenomenon that represents one of the values added in comparison with nationally bounded research. The variety of manifestations makes hidden national characteristics visible. Even one of the certainly simplest constructs in the field of teacher education, a “mathematics major”, has quite different meanings in East Asia, Continental Europe and English-speaking countries, not to mention constructs like “general pedagogy”, “curriculum”, “didactics”, “mathematics education” or “schooling”.

Another value added by international comparisons is the joint work of experts from many different fields. Teacher education research on a large-scale basis is a relatively new but especially difficult area that requires a lot of expertise in order to be carried out appropriately. Content knowledge, pedagogical content knowledge, pedagogical knowledge, epistemological beliefs, pedagogical beliefs, self-efficacy or whatever construct is to be measured; sampling issues, test design, data analyses—it is far more probable to find experts in all fields necessary across countries than within one country.

Finally, it has to be pointed out that international comparisons provide an implicit benchmark. We know that some countries do much better in studies like TIMSS or PISA than others. This suggests that their teacher education may include more effective features than the systems of countries that do relatively badly. Therefore, if one carefully samples the countries participating in a cross-country study, the comparisons are quite meaningful.

5 Conclusions

The measurement of teacher-education outcomes is challenging. Professional competence is a complex construct, and its development depends on many context factors, among others on characteristics of teacher education. Much research is needed to clarify the importance of single program characteristics. Longitudinal and cross-sectional studies within countries as well as across countries can lead to meaningful insights into the impact of individual and institutional characteristics on the nature and the development of professional competence. A precondition for this is an appropriate model, which makes it possible to carry out the research in a theory-driven way. Providing such a model was one of the purposes of the present paper, in addition to summarizing the state of research and pointing out the challenges connected to teacher-education research.

The investigation of teacher-education programs in different countries and the discovery that it is possible to organize things differently are of special relevance. They shed new light on fundamental cultural concepts behind teacher education which are usually taken for granted. In this sense, MT21 as well as the recently released IEA study on teacher education (TEDS-M; Tatto et al., 2008) are of high importance for research. As teacher-education research has become more important during the past years, we have a good chance of quickly eliminating the most serious research deficits in the field.