In this article, the research team describes how they adapted the Fennema-Sherman Mathematics Attitude Scales (FSMAS) to assess teachers’ attitudes toward mathematics. The research team also discusses the rigorous procedures used to validate this measure’s use with lower-primary teachers (with a version of the FSMAS the team calls the FSMAS-T) and the corresponding results of these analyses. Finally, the team provides an example of a study that used the FSMAS-T to examine longitudinal changes in teachers’ mathematical attitudes after the teachers participated in an elementary mathematics specialist professional development program. While little research has examined longitudinal change in teachers’ mathematical attitudes, the results of this study suggest that researchers and program evaluators may use the FSMAS-T to examine the impact of professional development on such changes (or stability).

The current study is situated within the educational context of one state and a number of districts in the USA. During the pilot study, the research team focused on answering the research question: Can the FSMAS be adapted for use with lower-primary teachers such that the adaptation forms valid and reliable scales for measuring teacher attitudes toward learning mathematics? Following that, the research team examined the research question: Does using the FSMAS-T with lower-primary teachers result in the same factor structure as the original FSMAS? Confirmatory factor analyses were conducted with data from lower-primary teachers to examine the psychometric qualities (e.g., dimensionality and reliability) of the adapted measure. Finally, the research team explored the following question: Does the FSMAS-T exhibit measurement invariance when used with lower-primary teachers? Measurement invariance analyses were conducted using data from a similar lower-primary teacher sample to cross-validate that the psychometric properties obtained from the previous step could be generalized across groups.

Literature review

A decade ago, Goldin (2002) expressed the concern that research in mathematics education was focused mainly on cognition and less on affect. Now, researchers recognize the importance of affective factors in explaining individual differences in the learning of mathematics (Philipp 2007). Affective factors influence one’s decisions about how many mathematics courses one is willing to take, the amount of effort one is willing to exert in learning mathematics, and the way one approaches learning mathematical content (Reyes 1984).

Although the affective domain has been defined in various ways in the literature, McLeod (1992) proposed that attitudes, beliefs, and emotions were the major components of this domain in mathematics education. In the present study, the research team focuses on one of these components—attitudes. Specifically, the team examines lower-primary teachers’ attitudes toward learning mathematics. Neale (1969) defined attitudes toward mathematics as “a liking or disliking of mathematics, a tendency to engage in or avoid mathematical activities, a belief that one is good or bad at mathematics, and a belief that mathematics is useful or useless” (p. 632). This definition refers to one’s attitudes as a learner of mathematics and, although relatively old, captures the potential components of an individual’s attitudes toward learning mathematics, such as confidence, motivation, and anxiety.

Within the affective domain, the distinction between attitudes, beliefs, and emotions remains unclear, and attitudes seem to overlap with beliefs, emotions or feelings, and values (Leder and Grootenboer 2005). For instance, mathematics anxiety has been characterized as fear (a “hot” emotion) or dislike (an attitude), and it may even include aspects that are cognitive (e.g., worry) (Ma 1999; McLeod 1992). In fact, McLeod (1992) conceptualized these components as a continuum:

…we can think of beliefs, attitudes and emotions as representing increased levels of affective involvement, decreased levels of cognitive involvement, increasing levels of intensity of response, and decreasing levels of response stability (McLeod 1992, p. 579).

The research team acknowledges that individuals’ confidence, motivation, and anxiety toward mathematics also may constitute other components of the affective domain. Therefore, in this article, mathematical attitudes refer to a broad construct of individuals’ attitudes toward mathematics (Ma and Kishor 1997).

An individual’s mathematical attitudes play an important role in the learning of mathematics. Many researchers have investigated students’ attitudes toward learning mathematics and have consistently found that students’ mathematical attitudes are related to their mathematics achievement and self-efficacy in mathematics (e.g., Cooper and Robinson 1991; Jackson 2015; Ma 1999; Muis and Foy 2010; Sherman and Christian 1999). Furthermore, Lipnevich et al. (2011) found that students’ mathematical attitudes predicted their mathematics achievement independent of their innate mathematics ability. Individuals who are anxious about mathematics tend to perform more poorly than their actual abilities would allow, because the anxiety and fears they experience when doing mathematics may prevent them from utilizing the mathematics knowledge they possess (Ashcraft and Kirk 2001). Negative feelings, such as worries and fears, can compromise the thinking and reasoning process (Beilock 2008) and can undermine self-confidence (Sherman and Christian 1999).

Less research has focused on teachers’ attitudes toward learning mathematics, particularly practicing, in-service teachers’ mathematical attitudes and how those attitudes relate to their teaching practices and students’ learning (McAnallen 2010). In one study, Wilkins (2010) investigated elementary school teachers’ attitudes toward different subjects, and mathematics was consistently ranked as teachers’ least favorite subject to teach, particularly among lower-primary teachers. Moreover, a large percentage of in-service teachers have reported anxiety toward mathematics (Haycock 2001), and teachers with high levels of mathematical anxiety tend to demonstrate low confidence in teaching elementary mathematics (Bursal and Paznokas 2006; Jackson 2015; Sherman and Christian 1999; Swars et al. 2006).

Teachers’ mathematical anxiety can also be “contagious” to students’ attitudes toward mathematics (Beilock et al. 2010; Stipek et al. 2001). Beilock et al. (2010) examined the relations between elementary female teachers’ mathematical anxiety and their students’ mathematical attitudes and mathematics achievement. Teachers’ mathematical anxiety was measured using the short Mathematics Anxiety Rating Scale (Alexander and Martray 1989), in which teachers responded to questions about how anxious different situations would make them feel (e.g., “studying for a math test”). The findings showed that at the beginning of the school year, there was no connection between female teachers’ mathematical anxiety and their students’ mathematics achievement. However, by the end of the school year, the more anxious the teachers were toward mathematics, the more likely girls (but not boys) in their classrooms were to believe the commonly held stereotype that “boys are good at math, and girls are good at reading.” Moreover, girls who endorsed this stereotype had lower mathematics achievement than girls who did not and lower achievement than boys overall.

Elementary school mathematics experiences are often the first causes of negative mathematical attitudes, which in turn undermine students’ confidence in their mathematical abilities and lead to mathematics avoidance later in school (Harper and Daane 1998). Thus, teachers’ negative attitudes toward their own learning of mathematics can exert a significant impact on their students’ current, as well as future, mathematics learning and achievement. In addition, with about 85 % of early mathematics teachers being female (Goldring et al. 2013), it is important to understand how to reduce teachers’ mathematical anxiety and foster positive mathematical attitudes through professional development.

In-service teachers’ attitudes toward learning mathematics influence not only the attitudes and achievement of their students but also their own instructional practices (e.g., Karp 1991; Quinn 1997; Richardson 1996; Stipek et al. 2001; Thompson 1992; Wilkins 2008). For example, Wilkins (2008) asked teachers to report their enjoyment and liking of mathematics, their perceptions of the importance of mathematics, their feelings of success with mathematics, and their enjoyment and liking of teaching mathematics. Findings revealed that teachers with more positive attitudes toward learning and teaching mathematics tended to have a stronger belief in the effectiveness of inquiry-based instruction (e.g., having students work in cooperative learning groups) and to use such instruction more frequently in their teaching. Inquiry-based instruction is often found to be positively related to student mathematics achievement (e.g., Hill et al. 2008).

Other studies also have explored relationships between teachers’ mathematical attitudes and instructional practices. Karp (1991) found that teachers with negative attitudes toward mathematics more often used teacher-centered approaches and did not engage students actively in learning mathematics. Lee (2005) surveyed 200 Indiana kindergarten teachers and found that teachers’ attitudes toward teaching mathematics were important predictors of the presence of developmentally appropriate teaching practices in the classroom. Stipek et al. (2001) conducted in-depth research on teachers of students aged 9–12 that involved the use of teacher surveys and multiple classroom observations. Stipek et al. found significant correlations between teacher attitudes and classroom practices as well as correlations between teachers’ and students’ self-confidence and enjoyment of mathematics. Therefore, teachers need to develop positive attitudes toward mathematics learning and teaching as well as their own feelings of efficacy as a mathematics learner and teacher (McLeod 1992; Pajares 1992).

With the increasing recognition of the importance of teachers’ mathematical attitudes on their instructional practices and students’ learning, it is crucial to explore ways to improve teachers’ mathematical attitudes, as well as to evaluate the effectiveness of those approaches. Teachers need to position themselves as life-long learners in order to become more effective in teaching, and the life-long learning process is an ongoing part of their professionalism (Graven 2004). Fennema and Franke (1992) proposed that instructional practice mediated the relationship between teacher characteristics (e.g., teachers’ mathematical attitudes) and student learning. Thus, investigations of teachers’ attitudes toward their own learning of mathematics will help develop a better understanding of teachers’ learning and teaching, as well as potential learning opportunities that exist for students (Jong et al. 2012). Additionally, within the context of a teacher professional development program such as the one in which this study is situated, examining changes in teachers’ attitudes toward learning mathematics is one potential indicator of program effectiveness.

To assess teachers’ mathematical attitudes properly, sound measures are necessary. However, to the authors’ knowledge, no reliable and validated measure of teachers’ mathematical attitudes has been widely and consistently used in the literature. Several surveys exist that measure attitudes in mathematics, such as the Fennema-Sherman Mathematics Attitudes Scales (Fennema and Sherman 1976a), the Attitudes Toward Mathematics Inventory (Tapia and Marsh 2004), and the Mathematics Anxiety Rating Scale (Richardson and Suinn 1972). However, all three instruments were developed for use with students.

When planning the professional development program for lower-primary teachers in 2008, the research team was unable to find any measure of in-service teachers’ attitudes toward learning mathematics that had been widely and consistently used in existing research. Since then, Welder et al. (2011) developed the Mathematics Experiences and Conceptions Surveys (MECS) to assess teachers’ attitudes, beliefs, and dispositions toward the teaching and learning of mathematics. The MECS was developed and has been mainly used among preservice teachers (e.g., Hodges and Jong 2012; Welder and Jong 2012). The MECS includes both items assessing preservice teachers’ attitudes toward teaching mathematics (e.g., “I look forward to teaching mathematics”) and items pertaining to preservice teachers’ attitudes toward learning mathematics (e.g., “I enjoy solving mathematics problems”). Even given the increased attention to teacher mathematical attitudes in the past decade, there still is no measure for in-service teachers’ mathematical attitudes that is in wide use for research.

When measuring the impact of professional development on teacher participants, one often wants to measure changes in attitudes. Individuals’ mathematical attitudes are shaped by their past experiences with mathematics, such as interactions with mathematics teachers in elementary school, teaching practices they experienced when they were in school, and experiences in past mathematics classes (e.g., Harper and Daane 1998; Jackson 2015; McAnallen 2010). Thus, frameworks and well-established measures focused on students’ attitudes toward mathematics might be informative in examining teachers’ mathematical attitudes.

The Fennema-Sherman Mathematics Attitudes Scales (Fennema and Sherman 1976a) are among the most popular instruments used in studies of students’ attitudes toward mathematics. The FSMAS has been widely used by researchers in English-speaking countries besides the USA, including Canada (e.g., Chouinard et al. 2008) and Australia (e.g., Forgasz et al. 1999; Norton and Rennie 1998; Rowe 1993). The FSMAS has also been translated into many languages, such as French (Vezeau et al. 1998), Chinese (Sachs and Leung 2007), and Arabic (Alkhateeb 2004). It has been used in international comparisons, such as in the Trends in International Mathematics and Science Study (TIMSS; Mullis et al. 2008).

Fennema-Sherman Mathematics Attitudes Scales

The Fennema-Sherman Mathematics Attitudes Scales (Fennema and Sherman 1976a) have been extensively used to assess students’ attitudes toward learning mathematics. They were originally developed to capture gender-related differences in mathematical attitudes among high school students (Fennema and Sherman 1976a). The FSMAS encompasses nine scales, each of which measures domain-specific attitudes that are believed to be related to the learning of mathematics: Attitude Toward Success in Mathematics, Mathematics as a Male Domain, Mother (perception of mother’s attitudes toward one as a learner of mathematics), Father (perception of father’s attitudes toward one as a learner of mathematics), Teacher (perception of teacher’s attitudes toward one as a learner of mathematics), Confidence in Learning Mathematics, Mathematics Anxiety, Effectance Motivation in Mathematics, and Mathematics Usefulness (Fennema and Sherman 1976a). Each scale has 12 items, and responses are measured using a 5-point Likert scale (from “strongly disagree” to “strongly agree”).
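In practice, scoring such scales involves reverse-coding the negatively worded items so that higher scores consistently indicate more positive attitudes, and then averaging (or summing) the item responses within each scale. Below is a minimal Python sketch of this scoring step; the column names and the particular set of reverse-coded items are hypothetical placeholders, not the actual FSMAS item keys.

```python
import pandas as pd

# Hypothetical item columns for one 12-item scale; responses are coded
# 1 ("strongly disagree") through 5 ("strongly agree").
ITEMS = [f"conf_{i}" for i in range(1, 13)]
# Hypothetical set of negatively worded items; consult the actual scale.
NEGATIVE_ITEMS = ["conf_5", "conf_7", "conf_11"]

def score_scale(responses: pd.DataFrame) -> pd.Series:
    """Return each respondent's mean scale score after reverse-coding."""
    coded = responses[ITEMS].copy()
    # Reverse-code negatively worded items (1<->5, 2<->4) so that higher
    # values always reflect a more positive attitude.
    coded[NEGATIVE_ITEMS] = 6 - coded[NEGATIVE_ITEMS]
    return coded.mean(axis=1)
```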

The nine scales can be used altogether or separately, and a review of the literature suggests that the FSMAS has been used widely and flexibly in research. Some researchers have used an individual scale in their research, while other researchers have used sets of two or more scales. For instance, Lim and Chapman (2013) and Dew et al. (1983) only used the Mathematics Anxiety scale in their studies, whereas Norton and Rennie (1998) used five scales (i.e., Mathematics Anxiety, Confidence in Learning Mathematics, Effectance Motivation, Attitude toward Success in Mathematics, and Mathematics as a Male Domain) of the FSMAS to examine students’ mathematical attitudes in single-sex and coeducational schools. Other researchers have adapted one or more scales. For example, Betz (1978) adapted the Anxiety scale for use among college students, and Bai et al. (2009) made further revisions based on Betz’s (1978) adapted version in their study. In other studies, researchers have selected items from the FSMAS instead of using a full scale. For instance, Pearn et al. (1996) selected items from the FSMAS and several other instruments to develop a multidimensional questionnaire to examine the mathematical attitudes of secondary students from schools in Australia with high percentages of students with non-English speaking backgrounds. Throughout these various uses, the FSMAS has been used primarily among student populations, particularly among secondary and college students, in many countries, and it has been rarely used among teachers.

Since the publication of the FSMAS, many researchers have re-examined its use among diverse populations. For instance, Forgasz et al. (1999) examined the Mathematics as a Male Domain scale among ninth-grade students from Australia, Sweden, and the USA and found that several items in the scale might no longer be valid and revisions were needed to ensure its usability in measuring the original construct. Lim and Chapman (2013) examined the factor structure of the Mathematics Anxiety scale using data from students attending pretertiary institutions in Singapore (see Lim and Chapman 2013 for information on pretertiary institutions). Based on exploratory and confirmatory factor analyses, three items were removed and a two-factor model was retained (positively worded items formed one factor, “FS-EASE,” and negatively worded items made up the other factor, “FS-ANX”). Melanchon et al. (1994) investigated the measurement integrity of the FSMAS using data provided by public elementary school teachers in the USA, and the results supported the validity of the FSMAS for use with public elementary school teachers.

Shortened versions of the FSMAS also have been developed and used. For example, Mulhern and Rae (1998) developed a short form using principal component analysis, and Sachs and Leung (2007) developed a shortened version using Bhargava and Ishizuka’s (1981) BI-method (a method that selects variables by using the trace information of the correlation or covariance matrix). However, the research team decided to use all items from a few scales instead of using five to six items from all the shortened scales. By using all items from the most teacher-relevant scales, the team hoped to capture a more in-depth picture of teachers’ attitudes toward mathematics learning with respect to the scales chosen.

In the current study, the research team originally chose four scales from the FSMAS: (1) the Confidence in Learning Mathematics Scale, which measures individuals’ confidence in their abilities to learn and to perform well in mathematics; (2) the Mathematics Anxiety Scale, which measures individuals’ “feelings of anxiety, dread, nervousness, and associated bodily symptoms related to doing mathematics” (Fennema and Sherman 1976a, p. 326); (3) the Effectance Motivation Scale in Mathematics, which measures whether individuals enjoy and seek challenges regarding mathematics; and (4) the Mathematics Usefulness Scale, which measures individuals’ beliefs about the usefulness of mathematics currently and in relationship to their future education and vocation. The research team chose these scales to align both with the goals of the teacher professional development program and the research literature about teacher factors that may be related to student outcomes. Teacher anxiety, in particular, has been shown to negatively impact student outcomes, particularly for female students (e.g., Beilock et al. 2010). The professional development program specifically sought to decrease teachers’ anxiety toward mathematics while increasing teachers’ motivation and confidence related to mathematics teaching and learning and helping teachers see mathematics as useful. Thus, these factors were well aligned to measure the impact of the professional development program on these desired dimensions.

Since the research team was focused on adult teachers learning mathematics and needed the survey to be as short as possible (given the other data collection demands on teachers’ time), the team decided not to include the scales about Mother, Father, and Teacher. The items on these three scales are worded in ways that clearly apply to students living at home with parents, whose success in mathematics would be heavily influenced by their perception of parent and teacher expectations. Because Forgasz et al. (1999) found that the Mathematics as a Male Domain scale needed to be updated with more modern language and situations, the research team decided not to use this scale. Finally, while the Attitude Toward Success in Mathematics scale would potentially have been interesting, some of the items would not pertain to teachers taking graduate courses in mathematics (e.g., “It would be really great to win a prize in mathematics”). The professional development goals included increasing teachers’ mathematical knowledge for teaching, confidence, and motivation while decreasing their anxiety toward mathematics. The professional development leaders also expressed an interest in measuring whether teachers grow in their recognition of the usefulness of mathematics, even though this was not one of the program goals. Seeing mathematics as useful is one indicator of a positive attitude toward mathematics that teachers could be conveying to students. Thus, the scales chosen represent the best match to the project goals and leaders’ interests and have since been selected for use in research on preservice elementary teachers’ attitudes/beliefs (Tsao 2014).

To prepare for the pilot study, the research team first reviewed each item from the chosen scales and made edits as needed to ensure the items were applicable to present-day teachers, rather than students. The research team rewrote items phrased in the future tense into the present tense. For example, the word “will” was excluded from the original item “I will use mathematics in many ways as an adult.” The research team also reworded some items to make them culturally appropriate and easier to understand. For example, “I have a knack of flubbing up math” was changed to “I have a tendency to mess up math.” One item from the Mathematics Usefulness Scale was eliminated (“I expect to have little use for mathematics when I get out of school”), because it is relevant for high school students but not for teachers in general. The revised instrument was presented to a team of leading experts in the fields of mathematics education, early childhood education, mathematics, and statistics. The experts provided feedback on the relevance of items to lower-primary teachers, the extent to which lower-primary teachers might interpret the wording of the items in consistent ways, the alignment of the items to the overall goals of the professional development program, and the extent to which items might capture the intended change in teacher attitudes after participating in the professional development program.

Next, the research team randomized the order of the items and shared an initial draft of the instrument with a highly experienced primary teacher leader. The teacher leader completed the draft survey and offered suggestions for further revisions. Based on her feedback, the research team rewrote the item “Math has been my worst subject” to read “Math was my worst subject” and switched the order of some items. After making these changes, the instrument was ready to be piloted with a larger group of teachers.

Pilot study

The research team first conducted a pilot study to evaluate the FSMAS-T. Sixty-five teachers completed the FSMAS-T, including 43 elementary teacher leaders, two preservice teachers, and 20 middle-school teachers. The pilot teachers were involved in one of two different professional development programs, one held in a Southern state and one held in a Midwestern state. Each program was separately led by two of the experts who had helped review the instrument. Even though the research team did not collect teacher-level ethnicity data, the professional development leaders reported that the participants were nearly all Caucasian; thus, the research team knew they would not have enough teachers to look at subgroups by ethnicity. The 20 middle-school teachers were 90 % female. The elementary teacher leaders and preservice teachers were all female. Table 1 presents the characteristics for each of the samples involved in this study, including the percentage of male teachers in the sample, years of teaching experience, and teaching assignment at the time of the study.

Table 1 Summary of samples

For each predefined scale, the research team conducted reliability analyses on the pilot data to identify potentially problematic items. Based on the coefficient alphas, the Confidence in Learning Mathematics (Confidence) (α = 0.906), Mathematics Anxiety (Anxiety) (α = 0.924), Effectance Motivation in Mathematics (Effectance Motivation) (α = 0.921), and Mathematics Usefulness (Usefulness) (α = 0.858) scales had acceptable estimated reliability for the teachers. For each scale, the research team also examined items with low item-total correlations. Two items from the Confidence scale and one item from each of the other scales had item-total correlations less than 0.4. However, none of these items were edited or removed from the final instrument, because the items measured different, but important, aspects of their respective constructs.
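As an illustration of these reliability checks, coefficient alpha and corrected item-total correlations can be computed with a few lines of Python. The sketch below assumes a DataFrame of item responses (one column per item, one row per teacher); it is a generic implementation, not the team’s actual analysis script, and the file name in the usage comment is hypothetical.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlate each item with the sum of the remaining items."""
    total = items.sum(axis=1)
    return pd.Series(
        {col: items[col].corr(total - items[col]) for col in items.columns}
    )

# Usage (hypothetical data): flag items below the 0.4 threshold used here.
# pilot = pd.read_csv("pilot_confidence_items.csv")
# print(cronbach_alpha(pilot))
# print(corrected_item_total(pilot)[lambda r: r < 0.4])
```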

Descriptive statistics also were obtained for each item. Responses for six of the items on the Usefulness scale had low variability. Specifically, most of the pilot participants responded positively to these items (“agree” or “strongly agree”), indicating that most of the teachers perceived mathematics as useful. The research team wanted to be able to detect potential positive changes in teachers’ mathematical attitudes after participating in professional development programs and thus needed scales with room for growth. Additionally, movement on this scale was not deemed an important objective of the larger research study around program impact. No program goals were related to teachers finding mathematics more useful, but improved attitudes, including decreased anxiety and increased confidence and motivation, were explicit goals of the professional development. Therefore, the Usefulness scale was eliminated from the final survey (see Appendix for the survey; items are arranged by scale). More advanced analyses, such as confirmatory factor analysis, were not conducted because of the small sample size. The teachers also had the opportunity to provide qualitative feedback about the instrument and any item(s) they found difficult to understand and/or answer. However, none of the written responses indicated issues with the individual items that needed to be addressed.

Instrument validation

Results of the pilot study suggest that the FSMAS can be adapted for use with teachers. However, due to the small sample size, only basic descriptive and reliability analyses were conducted. To further validate the use of the FSMAS-T with teachers, a rigorous validation procedure was performed. The validation procedure for the FSMAS-T involved two steps. First, a series of confirmatory factor analyses were conducted with data from a sample primarily comprised of lower-primary teachers (the confirmatory factor analysis sample) to examine the factor structure of the FSMAS-T. Confirmatory factor analysis is a theory-driven statistical procedure that is used to verify the factor structure of a set of observed variables (e.g., measurement items). Next, to ensure that the factor structure obtained from the first step was not sample-specific, measurement invariance analyses were conducted using data from a similar sample that primarily consisted of lower-primary teachers (the cross-validation sample) to examine whether the psychometric properties of the FSMAS-T obtained from the confirmatory factor analysis sample could be generalized to this group. Measurement invariance (i.e., measurement equivalence) analysis concerns the extent to which the psychometric properties of the observed indicators can be generalized across groups or over time/condition.

Confirmatory factor analysis sample

The confirmatory factor analysis sample came from a larger National Science Foundation-funded Math Science Partnership study examining the effectiveness of a professional development program. A total of 225 teachers completed the FSMAS-T, and they were followed for several years to examine longitudinal changes in their mathematical attitudes. Participating districts included four larger core partner districts and 28 smaller districts across Nebraska, a Midwestern state in the USA. About 85 % of the teachers were teaching kindergarten to third grade at the time of the study. About 3 % of the teachers were teaching upper-primary grades (fourth to sixth grade) or multiple grades, and 4 % of the teachers had other teaching assignments. Almost 98 % of the participants were female teachers, and they were primarily Caucasian. The teachers had an average of 12 years of teaching experience, and all of the teachers had a bachelor’s degree or higher.

In this study, teachers’ baseline data (the first time they completed the FSMAS-T) were used to conduct confirmatory factor analyses, because these data are more representative of teacher attitudes toward mathematics prior to participating in professional development. The baseline data also provide a basis from which to measure change.

Cross-validation sample

The cross-validation sample is composed of 171 teachers who came from most of the same districts and had characteristics similar to those of the confirmatory factor analysis sample. The original Math Science Partnership research study included teachers participating in the professional development during 2009–2011. As the program transitioned to an institutionalization phase, additional groups of teachers participated in the professional development program in 2012–2014. These teachers also were followed for several years, and they completed the FSMAS-T multiple times. In this study, only teachers’ baseline data were used in the measurement invariance analyses. At baseline, they had an average of 9 years of teaching experience. About 84 % of the teachers were teaching lower-primary grades (kindergarten to third grade); 9 % of the participants were upper-primary teachers (fourth to sixth grade) or teaching multiple grades; and 3 % of them were English language learner teachers, special education teachers, or math coaches. In addition, only 4 % of the teachers were male. Thus, similar to the confirmatory factor analysis (CFA) sample, the cross-validation sample primarily was composed of female Caucasians.

Results

Confirmatory factor analysis

Robust maximum likelihood estimation in Mplus v. 7.11 (Muthén and Muthén 1998–2013) was used in confirmatory factor analyses to examine the psychometric qualities of the FSMAS-T. The research team did not perform an exploratory factor analysis. One main goal of exploratory factor analysis is to reduce a set of variables to a smaller number of variables while retaining as much of the original variance as possible. In this case, the use of exploratory factor analysis is more pragmatic than theoretical (Brown 2006). Based on Fennema and Sherman’s (1976a) findings, each scale in the FSMAS measured a single construct. The research team did not intend to reduce the number of variables (i.e., items), but to test whether each scale would measure the same construct as the original FSMAS for a different population. Thus, as a theory-driven analytical approach, confirmatory factor analysis is appropriate for this purpose.
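To make the approach concrete, the following is a minimal sketch of a one-factor CFA for a single scale in Python using the open-source semopy package, which accepts lavaan-style model syntax. This is not the team’s actual Mplus setup: the item names and data file are hypothetical, and the sketch uses semopy’s default maximum likelihood objective rather than the robust estimator reported here.

```python
import pandas as pd
import semopy

# Hypothetical item names for the 12-item Confidence scale.
items = " + ".join(f"conf_{i}" for i in range(1, 13))
# Lavaan-style syntax: one latent factor measured by all twelve items.
desc = f"Confidence =~ {items}"

data = pd.read_csv("fsmas_t_baseline.csv")  # hypothetical baseline data file

model = semopy.Model(desc)
model.fit(data)

print(semopy.calc_stats(model))  # chi-square, CFI, TLI, RMSEA, and more
print(model.inspect())           # factor loadings and residual variances
```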

First, the research team conducted a confirmatory factor analysis using the 225 teachers’ baseline data to examine the construct of each scale. Consistent with the original FSMAS, the research team proposed that each scale is unidimensional, which means that each item is constrained to measure only one construct. Thus, to test the unidimensionality of the three factors—namely Confidence, Effectance Motivation, and Anxiety—the team conducted a confirmatory factor analysis for each scale separately. Table 2 presents the model fit statistics of all the CFA models examined. Figure 1 presents the final measurement model.

Table 2 Model fit indices of the confirmatory factor analyses
Fig. 1 The final measurement model of the FSMAS-T after removing three problematic items. The model fit of each factor was examined separately in the analysis, and the standardized and unstandardized model estimates can be found in Table 3

The research team used several model fit indices to evaluate the model fit, including the obtained model chi-square value (χ²), the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR). Although there are other fit indices available, the research team only used the ones listed above based on Brown’s (2006) recommendation and findings from a study by Jackson et al. (2009). Brown (2006) selected these indices on the basis of their popularity in the literature, as well as their favorable performance in simulation studies. After examining almost 200 studies, Jackson et al. (2009) also found that chi-square values, CFI, RMSEA, and TLI are the most commonly used fit indices. In general, using these different types of model fit indices in conjunction with one another provides stronger evidence for model fit than solely relying on one model fit index (Brown 2006).

For each fit index, the research team used the criteria reported by Brown (2006) to evaluate model fit. For example, the research team used the obtained model χ² to assess model fit at an absolute level. A significant χ² test (p < 0.05) indicates that the exact model fit is not good, and nonsignificance is desirable. However, because χ² is susceptible to sample size, the research team also used other model fit indices, as is customary. For CFI and TLI, values greater than 0.95 indicate good fit, and values greater than 0.9 indicate acceptable model fit. For RMSEA, values smaller than 0.06 are desirable for good fit, values of 0.06 to 0.08 indicate acceptable model fit, and values of 0.08 to 0.10 indicate mediocre fit. The 90 % confidence interval of RMSEA is also reported. SRMR values of 0.08 or smaller indicate good model fit. Together, these criteria helped the research team evaluate the fit of each CFA model. In addition to the various fit indices, the research team examined standardized factor loadings and squared multiple correlations (R²) of the items to investigate effect size. A factor loading represents the correlation between the item and the latent factor; R² represents the amount of variance in individuals’ responses to the item that can be attributed to the latent factor. The research team also examined normalized residuals for diagnosing local misfit to respecify the model when needed.
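For reference, common formulations of three of these indices are given below, where χ²_M and df_M refer to the hypothesized model, χ²_B and df_B to the baseline (independence) model, and N to the sample size. These are the standard textbook forms; software packages differ slightly (e.g., some use N rather than N − 1 in the RMSEA denominator).

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\, 0)}{df_M\,(N - 1)}}
\qquad
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\, 0)}{\max(\chi^2_B - df_B,\, 0)}
\qquad
\mathrm{TLI} = \frac{\chi^2_B / df_B - \chi^2_M / df_M}{\chi^2_B / df_B - 1}
```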

For the Confidence scale, the model fit was not acceptable, χ²(54) = 190.246 (p < 0.001), CFI = 0.896, TLI = 0.873, RMSEA = 0.106 (90 % CI = 0.090~0.122), SRMR = 0.057. The research team inspected the normalized residual matrix and discovered that two items were more correlated than the model indicated (i.e., item 5: “I don’t think I could do advanced mathematics”; item 16: “I am sure I could do advanced work in mathematics”). The extra correlation between the two items might be due to the common word “advanced,” which lower-primary teachers may interpret as referring to courses like calculus, content they may see as irrelevant to teaching lower-primary mathematics. To retain as many items as possible, the research team examined model fit while removing only one item at a time. Removing item 5 did not result in desirable model fit, especially according to RMSEA, χ²(44) = 144.969 (p < 0.001), CFI = 0.912, TLI = 0.890, RMSEA = 0.101 (90 % CI = 0.083~0.119), SRMR = 0.054. Removing item 16 led to acceptable model fit according to CFI, TLI, and SRMR, but not χ² or RMSEA, χ²(44) = 119.510 (p < 0.001), CFI = 0.934, TLI = 0.918, RMSEA = 0.087 (90 % CI = 0.069~0.106), SRMR = 0.049. Thus, the research team decided to remove both items containing the word “advanced,” which resulted in a better model fit, χ²(35) = 89.609 (p < 0.001), CFI = 0.945, TLI = 0.929, RMSEA = 0.083 (90 % CI = 0.062~0.105), SRMR = 0.046. All the items had significant standardized factor loadings ranging from 0.496 to 0.870 and R² values that ranged from 0.246 to 0.757.

Next, composite reliability (i.e., coefficient Omega) was calculated. Composite reliability is conceptually similar to the internal consistency coefficient—coefficient alpha. However, the calculation of coefficient alpha makes the assumption that all items load on a single underlying construct and all items represent that construct equally well (i.e., equal factor loadings). In confirmatory factor analysis, indicators (i.e., items) are allowed to have heterogeneous correlations with the underlying factor (i.e., heterogeneous factor loadings). Thus, composite reliability provides a more precise estimate of reliability than coefficient alpha (Geldhof et al. 2014). In the formula for calculating coefficient Omega, the numerator is the squared sum of factor loadings (i.e., the true score variance), and the denominator represents the true score variance plus all residual variances. The Confidence scale obtained a composite reliability coefficient of 0.935.
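In formula form, with λᵢ denoting the factor loadings and θᵢᵢ the residual variances of the k items on a scale, the standard expression for coefficient omega is:

```latex
\omega = \frac{\left( \sum_{i=1}^{k} \lambda_i \right)^{2}}
              {\left( \sum_{i=1}^{k} \lambda_i \right)^{2} + \sum_{i=1}^{k} \theta_{ii}}
```

When all loadings are equal (the tau-equivalence assumption underlying coefficient alpha), omega reduces to alpha, which is why omega is the more general estimate.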

For the Effectance Motivation scale, the one-factor model fit was acceptable according to most fit indices, χ²(54) = 123.513 (p < 0.001), CFI = 0.941, TLI = 0.928, RMSEA = 0.076 (90 % CI = 0.058~0.093), SRMR = 0.041. When inspecting the normalized residual matrix, the research team found that item 1 “I like math puzzles” might be problematic. The normalized residual matrix indicated this item should be less correlated with other items than what the model estimated, suggesting that this item may not measure the same construct as the other items. The research team also found this item to be problematic in the pilot study and hypothesized that different people might have different understandings regarding what a math puzzle is. While there is another item that uses the phrase “math puzzle” (i.e., item 26: “Math puzzles are boring”), that item did fit in the model and seemed to get more at teachers’ interests, whereas the former (i.e., “I like math puzzles”) may imply some actions—that a teacher would seek out math puzzles due to liking them.

Because the research team wanted to retain as many items as possible to better measure the construct, the team first removed only one item, “I like math puzzles,” which resulted in improved model fit, χ²(44) = 80.719 (p < 0.001), CFI = 0.964, TLI = 0.955, RMSEA = 0.061 (90 % CI = 0.039~0.082), SRMR = 0.036. All the items had significant standardized factor loadings ranging from 0.388 to 0.878, and R² values ranged from 0.151 to 0.771. Removing both items 1 and 26, which contain the phrase “math puzzles,” resulted in slightly worse model fit than the model with only item 1 removed, χ²(35) = 67.870 (p < 0.001), CFI = 0.964, TLI = 0.953, RMSEA = 0.065 (90 % CI = 0.041~0.087), SRMR = 0.037. This evidence, along with the research team’s desire to retain as many items as possible, led the team to decide not to remove item 26, “Math puzzles are boring.” Furthermore, this item fit the initial model, as indicated by the normalized residual matrix discussed previously. The composite reliability coefficient for the 11-item Effectance Motivation scale was 0.918.

For the Anxiety scale, the model exhibited an acceptable fit, χ²(54) = 130.842 (p < 0.001), CFI = 0.943, TLI = 0.931, RMSEA = 0.080 (90 % CI = 0.062~0.097), SRMR = 0.038. All of the items had significant factor loadings that ranged from 0.540 to 0.903 and significant R² values that ranged from 0.292 to 0.815. The Anxiety scale had a composite reliability coefficient of 0.942. Thus, in the final models, the research team eliminated two items from the Confidence scale, removed one item from the Effectance Motivation scale, and retained all items from the Anxiety scale. Table 3 provides the estimates and standard errors for the item factor loadings from the final standardized solutions for all three factors.

Table 3 Model estimates of the confidence, effectance motivation, and anxiety factors in the final models after removing problematic items

Finally, a three-factor model was examined. In this model, the Confidence, Effectance Motivation, and Anxiety factors were allowed to be correlated. The model obtained an acceptable fit, χ²(492) = 896.771 (p < 0.001), CFI = 0.911, TLI = 0.905, RMSEA = 0.060 (90 % CI = 0.054~0.067), SRMR = 0.054. The three factors were highly correlated, with correlation coefficients ranging from 0.837 to 0.962. Thus, a one-factor model with all 33 remaining items was examined to investigate whether there was actually only one factor instead of three factors. This model did not have an acceptable fit, χ²(496) = 1227.542 (p < 0.001), CFI = 0.840, TLI = 0.829, RMSEA = 0.081 (90 % CI = 0.075~0.087), SRMR = 0.131. Therefore, the research team recommends retaining the three scales instead of combining the three scales into one scale. However, high correlations among the three scales should be expected, which is consistent with previous findings (Fennema and Sherman 1976b; Reyes 1984).

Measurement invariance analysis

To ensure the final factor structures obtained from the previous analyses are not specific to the studied sample, measurement invariance analyses were conducted using data from the confirmatory factor analysis sample and the cross-validation sample. Achieving measurement invariance means the same construct is measured in the same way between the two studied samples.

Testing measurement invariance involves a sequence of steps. First, the research team used a “configural” invariance model as the baseline model against which all later models were tested. In the configural model, all factor loadings, intercepts, and residuals were estimated freely for the two samples. Thus, the research team treated this model as the “best” model with the most parameters. In subsequent steps, the research team used deviance tests (i.e., likelihood-ratio tests) to compare nested models in which additional modeling constraints were placed on factor loadings, intercepts, and residuals. The research team used robust maximum likelihood estimation and adjusted the deviance tests accordingly to account for the scaling factor.
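Because robust maximum likelihood is used, the raw difference between two nested models’ chi-square values is not itself chi-square distributed; a standard way to make the adjustment described above is the Satorra-Bentler scaled chi-square difference procedure. The sketch below is a minimal Python implementation of that procedure, assuming the robust chi-square values, scaling correction factors, and degrees of freedom have been read off the software output; the numbers in the usage comment are hypothetical.

```python
from scipy import stats

def scaled_chisq_diff(t0, c0, df0, t1, c1, df1):
    """Satorra-Bentler scaled chi-square difference test for nested models.

    t0, c0, df0: robust chi-square, scaling correction factor, and degrees
                 of freedom for the nested (more constrained) model.
    t1, c1, df1: the same quantities for the comparison (less constrained) model.
    """
    df_diff = df0 - df1
    # Scaling correction for the difference test.
    cd = (df0 * c0 - df1 * c1) / df_diff
    # Multiplying each robust chi-square by its correction factor recovers
    # the uncorrected ML chi-square; the scaled difference divides by cd.
    trd = (t0 * c0 - t1 * c1) / cd
    p_value = stats.chi2.sf(trd, df_diff)
    return trd, df_diff, p_value

# Hypothetical usage: metric (constrained) model vs. configural model.
# trd, df, p = scaled_chisq_diff(t0=250.3, c0=1.12, df0=63,
#                                t1=235.1, c1=1.10, df1=54)
```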

In the second step, the research team tested “metric” invariance to determine whether the two samples had equal factor loadings. For each item, factor loadings were constrained to be the same across the two samples. Specifically, loadings for the same item were constrained to be equal between the CFA sample and the cross-validation sample, while each item was still allowed to have its own loading. This model was nested within the configural model. If the metric invariance model fits statistically as well as the configural model, the loadings can be considered equal across groups. In this study, metric invariance was achieved for all three factors: Confidence, χ² difference (9) = 14.5, p = 0.23; Effectance Motivation, χ² difference (10) = 18.7, p = 0.08; and Anxiety, χ² difference (11) = 12.8, p = 0.39. Thus, the two samples had equal factor loadings for each common item.

In the third step, the research team tested “scalar” invariance to determine whether the two samples had equal item intercepts for the same item. In this model, intercepts for the same item were constrained to be equal across the samples, as were the factor loadings. The scalar invariance model was nested within the metric invariance model. The deviance test showed that full scalar invariance was achieved for all three factors: Confidence, χ² difference (9) = 8.0, p = 0.55; Effectance Motivation, χ² difference (10) = 6.7, p = 0.75; and Anxiety, χ² difference (11) = 13.7, p = 0.24. Thus, in addition to equal factor loadings, the two samples had equal intercepts for each common item.

The final step assessed “residual” invariance, in which residual variances for the same item were constrained to be equal across the two samples. The residual variance invariance model was nested within the scalar invariance model. For each item, the factor loadings and intercepts continued to be constrained to be equal across the samples. The deviance test showed that full residual variance invariance was achieved for all the factors: Confidence, χ² difference (10) = 23.7, p = 0.30; Effectance Motivation, χ² difference (11) = 17.9, p = 0.30; and Anxiety, χ² difference (12) = 15.6, p = 0.60. Thus, in addition to equal factor loadings and intercepts, the two samples had equal residual variances for each common item. To conclude, achieving measurement invariance suggests that the final constructs of the FSMAS-T were not specific to the confirmatory factor analysis sample. By cross-validating the constructs of the FSMAS-T with a similar (but separate) sample of lower-primary teachers, the research team gained confidence in the usability and reliability of the measure.

Using the FSMAS-T to evaluate program effect

The research team used the FSMAS-T to measure the impact of a research-based, intensive professional development program, Primarily Math, in changing teachers’ mathematical attitudes. Primarily Math is a six-course (18 credit hours), 13-month elementary mathematics specialist (EMS) program aimed at strengthening kindergarten to third grade teachers’ pedagogical and mathematical content knowledge and at improving instruction to the extent that it creates measurable gains in the mathematics achievement of kindergarten to third grade students across the state. Of the six courses, three are focused on increasing teachers’ mathematical knowledge for teaching, while the other three are focused on improving teachers’ knowledge in pedagogy and child development (refer to http://scimath.unl.edu/primarilymath/ and Kutaka et al. [under review] for detailed information regarding the sequence, content, and sample assignments of the six courses).

The Primarily Math leadership team designed the professional development program to help teachers acquire more knowledge in mathematical content, mathematics teaching, pedagogy, and child development. Through this program, it was expected that teachers would become more confident in their abilities to learn mathematics, more motivated to learn more mathematics, and less anxious toward learning mathematics. In Kutaka et al. (under review), the research team reported findings of the impact of Primarily Math on teachers’ mathematical attitudes measured by the FSMAS-T; the teachers under investigation were a subset of teachers from the CFA sample. The findings showed that teachers who participated in Primarily Math had significant increases in their confidence and effectance motivation and a significant decrease in their anxiety from pretest (the testing occasion prior to starting Primarily Math coursework) to posttest (the testing occasion immediately after the completion of Primarily Math). In addition, Primarily Math teachers showed larger increases in their confidence and effectance motivation, as well as a larger decrease in their anxiety toward learning mathematics relative to the comparison teachers whose scores remained statistically unchanged across the research study. Detailed results and discussion of the findings were described in Kutaka et al. (under review).

The Primarily Math research team also used the FSMAS-T to assess changes in teachers’ attitudes toward learning mathematics among a group of 39 in-service teachers who participated in the Primarily Math professional development program during 2014–2015. These 39 teachers were not part of the CFA sample or the cross-validation sample, but they all came from one of the same core partner districts in which teachers in the CFA sample and the cross-validation sample taught. This district decided to use Title I professional development money to fund 39 teachers to complete Primarily Math. There was an open application process, which followed the same procedures as those used to select the Primarily Math participants in the original research studies: the district chose 39 teachers (1 male) from over 120 who applied. The teachers had an average of 7 years of teaching experience. About 87 % of the teachers were teaching lower-primary grades (kindergarten to third grade); 8 % of them were teaching multiple grades or upper-primary grades (fourth to sixth grade), and 5 % of the teachers were special education teachers, English language learner teachers, or math coaches.

Throughout their participation in Primarily Math, teachers completed the FSMAS-T during four different measurement occasions. The first measurement occasion (pretest) was completed at the beginning of the summer of 2014, prior to starting any coursework. The second measurement occasion (midtest 1) was taken at the end of the summer of 2014, after completing two mathematics content courses. During the 2014–2015 academic year, teachers completed one pedagogy course each semester. The third measurement occasion (midtest 2) was completed at the beginning of the summer of 2015. During the summer of 2015, teachers completed the last two courses of Primarily Math, including one mathematics content course and one pedagogy course. The fourth measurement occasion (posttest) was taken in summer 2015, after the completion of all Primarily Math courses.

It was hypothesized that teachers would become more confident and motivated and less anxious toward mathematics after learning more about mathematics, mathematics teaching, and child development through Primarily Math. Mean confidence, effectance motivation, and anxiety scores were calculated. The research team fit a separate repeated measures ANOVA model for each of the three outcomes, in which the confidence, effectance motivation, or anxiety score was included as the dependent variable, and measurement occasion was specified as the predictor. For teacher-reported confidence, the effect of measurement occasion was significant, F(3, 108) = 6.42, p < 0.001. Post hoc comparisons showed teachers’ confidence gradually increased; when compared to pretest scores, teachers reported more confidence at midtest 1 (t = 2.03, p = 0.045), midtest 2 (t = 3.38, p = 0.001), and posttest (t = 4.07, p < 0.001). Teacher-reported effectance motivation also changed significantly across measurement occasions, F(3, 108) = 3.70, p = 0.014. Post hoc comparisons indicated that compared to pretest scores, teachers reported higher levels of effectance motivation at midtest 1 (t = 2.67, p = 0.001), midtest 2 (t = 3.05, p = 0.003), and posttest (t = 1.98, p = 0.050). Finally, teacher anxiety also changed across time, F(3, 108) = 4.78, p = 0.004. Teacher anxiety gradually decreased over time; when compared to pretest scores, teachers reported lower levels of anxiety at midtest 2 (t = 2.40, p = 0.018) and posttest (t = 3.73, p < 0.001). Figure 2 presents teachers’ confidence, effectance motivation, and anxiety at each of the four measurement occasions. Because there was not a comparison group for this set of participants, the research team cannot attribute these changes to participation in Primarily Math. Nevertheless, the findings reported by Kutaka et al. (under review), in conjunction with the results described above, provide evidence that the FSMAS-T is a valuable tool for evaluating the effectiveness of professional development in changing lower-primary teachers’ attitudes toward learning mathematics.
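For readers who wish to replicate this kind of analysis, the sketch below fits one such repeated measures ANOVA with statsmodels in Python; the long-format data file and column names are hypothetical stand-ins for the study’s actual data.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per teacher per occasion, with
# columns "teacher", "occasion" (pretest, midtest1, midtest2, posttest),
# and a mean scale score such as "confidence".
df = pd.read_csv("primarily_math_attitudes_long.csv")

# Repeated measures ANOVA with measurement occasion as the
# within-subjects factor (one such model per outcome).
result = AnovaRM(df, depvar="confidence", subject="teacher",
                 within=["occasion"]).fit()
print(result)  # F statistic and p value for the occasion effect

# Post hoc paired comparison of one occasion against pretest.
wide = df.pivot(index="teacher", columns="occasion", values="confidence")
print(stats.ttest_rel(wide["posttest"], wide["pretest"]))
```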

Fig. 2 Changes of Primarily Math teachers’ attitudes toward learning mathematics

Conclusions

The goal of this study was to determine if the research team could use the Confidence, Effectance Motivation, and Anxiety scales from the FSMAS (Fennema and Sherman 1976a) to assess lower-primary teachers’ attitudes toward learning mathematics, and if so, to use the measure to assess the impact of a professional development program. After using the pilot study results to choose which revised FSMAS scales to use, the research team was able to establish the instrument’s reliability for use with teachers. Confirmatory factor analyses were conducted to investigate the factor structure of the revised measure. The research team found that each scale measured one latent construct after deleting a few problematic items. Measurement invariance analyses further validated the psychometric properties of the revised instrument.

The current study adds to the research on teachers’ attitudes toward learning mathematics. Previous research suggests that teachers’ mathematical attitudes influence students’ attitudes toward mathematics, students’ mathematics achievement, and teachers’ instructional practices in the classroom (Beilock et al. 2010; Karp 1991; Stipek et al. 2001; Wilkins 2008). Thus, reliable and validated assessments of teachers’ mathematical attitudes are needed. The FSMAS-T is useful for research investigating lower-primary teachers’ attitudes toward learning mathematics, including research to study whether professional development programs have a direct impact on teachers’ attitudes toward learning mathematics. Results from this instrument may increase researchers’ and educators’ understanding of teachers’ mathematical attitudes, particularly with respect to their relationship with and influence on students’ attitudes and mathematics achievement, as well as their impact on teachers’ instructional practices in the classroom.

In this study, teacher samples came from a Midwestern state in the USA, and all but a few of the teachers were female. In addition, many of the teachers in the samples were enrolled in voluntary professional development. These characteristics aligned with the research team’s target population, because the research team intended to use the FSMAS-T with lower-primary teachers participating in professional development, and worldwide, most countries have more female lower-primary teachers than male (The World Bank 2015). However, these features could limit the generalizability of the findings. Thus, while the research team was able to validate the FSMAS-T with the heavily female population of Midwest primary teachers in the USA, further research is necessary to validate the use of the instrument with male primary teachers, primary teachers in other areas of the USA, and primary teachers in other countries. The status of teaching as a profession varies widely across countries and cultural contexts. Teacher credential requirements also vary widely by country (e.g., Tatto 2013), which can further result in wide variation in teachers’ attitudes toward mathematics. Because the FSMAS has been used with students from many cultures (e.g., Alkhateeb 2004; Forgasz et al. 1999; Norton and Rennie 1998; Sachs and Leung 2007; Vezeau et al. 1998), it may also be appropriate to use the FSMAS-T with teachers from other cultures. In addition, the FSMAS-T may be applicable to preservice teachers and/or in-service mathematics teachers who teach upper-elementary grades, middle school, or secondary school. Additional research is needed to examine the use of the FSMAS-T among these populations. Once the usability of the FSMAS-T is established, the FSMAS-T can be used to compare teachers’ mathematical attitudes across grade levels and across cultures.