Gender is one of the most salient features we use to categorize and process social stimuli. Children begin learning gender stereotypes before they reach preschool age, and their stereotype knowledge becomes increasingly refined across childhood (Huston, 1983; Ruble & Martin, 1998). Gender role flexibility can be defined as the transcendence of traditionally gender-typed traits, roles, and behaviors in the judgment of others and the self. For a child to be truly flexible, he/she must understand cultural norms and weigh personal likes and dislikes above cultural prescriptions for gender-typed behavior (Katz, 1996). Developmental differences can be identified in how faithfully children adhere to gender stereotypes in their judgments of others (Martin, 1989; Sigelman, Carr, & Begley, 1986; Smetana, 1986; Stoddart & Turiel, 1985) and in their own preferences (Edelbrock & Sugawara, 1978; Emmerich & Shepard, 1982; Nelson & Keith, 1990; Turner & Gervai, 1995). The limitation of much of this literature is the lack of longitudinal data and/or inadequate measures of gender role flexibility. The purpose of this article is to address these limitations by presenting a 2-year longitudinal study of early adolescents’ gender role flexibility that used a multidimensional assessment of gender role flexibility.

Multidimensionality of Gender Typing

Research on adults (e.g., Spence, 1993; Spence & Buckner, 2000; Twenge, 1999) suggests that gender typing is best viewed in a multidimensional context. Several developmental researchers have also maintained that the target, domain, and form of stereotyping need to be distinguished (Biernat, 1991b; Bigler, 1997; Eccles, 1987; Huston, 1985; Liben & Bigler, 2002). A useful framework for discussing the multiple factors in the development of gender typing was developed by Huston (1983) and expanded by Ruble and Martin (1998) to include Concepts or Beliefs, Identity or Self-Perception, Preferences, and Behavioral Enacting. Based on their concepts of gender typing, I consider gender role flexibility to consist of three constructs: Attitudes, Self-Perceptions/Preferences, and Behaviors. The construct of Attitudes consists of individuals’ concepts about the general appropriateness of various gender-typed behaviors for men and women in general. Self-Perceptions/Preferences encompasses individuals’ gender-typed self-descriptions and preferences, such as personality traits and activities. Finally, the construct of Behaviors includes the gender role traditionality of behavior in experimental and natural settings.

Consistent with this framework, researchers have found that attitudes toward others and self-perceptions or preferences are not necessarily related either for children (e.g., Katz & Ksansnak, 1994; Signorella, Bigler, & Liben, 1993) or adults (e.g., Spence & Buckner, 2000; Twenge, 1999). When the flexibility of children’s gender-typed behaviors is considered, the relations among the three constructs (attitude, self-perception, and behavioral flexibility) become more complex. For example, are attitudes or self-perceptions better predictors of behavioral choices? Based on the literature on attitude–behavior consistency, one would have to respond that it depends on the context (Taylor, Peplau, & Sears, 1994). High attitude–behavior consistency would be expected when the attitude is strong, stable over time, relevant to the behavior and salient or accessible. The first three conditions for high attitude–behavior consistency seem to be met for gender role attitudes and behavior. The salience of gender role flexibility attitudes when making personal activity choices, however, is probably not high. Individuals are more likely to consider their personal preferences or self-perceptions when choosing their free time activities. Unfortunately, there are few studies of gender role flexibility that include a behavioral measure. If one is included, it is usually assessed as preschoolers’ toy choices during free play (e.g., Moller & Serbin, 1996) or children’s behavior in an experimental situation (e.g., Katz & Walsh, 1991). Because children’s behavior outside an experimental setting has not been fully explored, it is difficult to determine the relations between attitudes and behavior. One goal of the present study was to develop a measure of early adolescents’ free-time gender-typed behavior to address the nature of the attitude–behavior relationship.

Gender role flexibility has been assessed through domains such as activities, socially prescribed roles/behaviors, occupations, physical appearance, and personality traits (Signorella et al., 1993). Although most researchers have measured flexibility in terms of only one domain, several investigators have begun to assess flexibility in multiple domains (Biernat, 1991a, 1991b; Boldizar, 1991; Katz & Boswell, 1986; Katz & Ksansnak, 1994; Liben & Bigler, 2002; Serbin, Powlishta, & Gulko, 1993; Zuckerman & Sayre, 1982). For example, in a study with fourth, fifth, and sixth grade children, Spence and Hall (1996) found that scores on the children’s version of the Personal Attributes Questionnaire (Spence & Helmreich, 1978, as cited in Spence & Hall, 1996) and the children’s version of the Bem Sex Role Inventory (Boldizar, 1991) were not correlated with children’s activity preferences. In addition, the correlations between occupational and activity preferences were low and not significant, which supports a multidimensional view.

Liben and Bigler (2002) developed a measure of gender role flexibility called the Children’s Occupations, Activities, and Traits Scale (COAT), which was used in the present study. In addition to including the three domains of occupations, activities, and traits, the COAT provides for the assessment of both self-perceptions/preferences and attitudes toward others. Analysis of their standardization samples revealed an interesting pattern of intercorrelations among the subscales suggestive of independence among the domains of gender role flexibility (Liben & Bigler, 2002). Correlations among the three domains of attitudes towards others ranged from moderate to strong, whereas correlations among the children’s self-perceptions were weaker or non-significant. There was moderate consistency in self-perceptions, however, as both boys and girls seemed to indicate preference for masculine and feminine items across domains. Correlations between attitudes and self-perceptions were either weak or non-significant. Taken together, these patterns of correlations support the contention that masculinity–femininity is not a global construct but is context specific.

Theories of Gender Role Flexibility Development

Researchers have developed theories that specifically address the developmental trajectory of gender role flexibility in adolescence. Kohlberg’s (1966) cognitive developmental theory states that adolescents become increasingly flexible as they mature (Katz & Ksansnak, 1994; Signorella, Frieze, & Hershey, 1996) and, as they do so, resist rigid application of cultural gender stereotypes. As children are able to reason and consider multiple perspectives in formal operations, cognition usually becomes more flexible.

Because rebelling against social conventions is an important means of achieving independence during adolescence, Katz (1979) predicted that adolescents would be less likely to adhere to gender role stereotypes and would become more flexible, especially when their peers are modeling and supporting this flexibility. In a study of 8- to 18-year-olds, Katz and Ksansnak (1994) found that tolerance for others, or attitude flexibility, was highest in the late adolescent age group, and girls were consistently more flexible than boys, which supports the cognitive developmental theory and Katz’s (1979) model of adolescence as a time of increased flexibility. Of course, these data are cross-sectional, which limits the conclusions that may be drawn. Longitudinal data collected during the early adolescent period by Galambos, Almeida, and Petersen (1990) also showed that girls’ attitude flexibility increased with age.

Other researchers have postulated that gender role flexibility decreases in early adolescence as a function of biological and social changes, such as puberty and dating (Alfieri, Ruble, & Higgins, 1996; Eccles, 1987; Feiring, 1999; Galambos et al., 1990; Hill & Lynch, 1983; Rebecca, Hefner, & Oleshansky, 1976). Hill and Lynch (1983) proposed the gender role intensification hypothesis which suggests that the onset of puberty stimulates an over-identification with stereotypes as adolescents begin to consider their adult gender roles. Most of the evidence they cited, however, was behavioral (achievement behaviors and friendship quality, for example), and they did not address changes in attitudes or self-perceptions. Research that included measures of gender role flexibility, however, has provided some empirical support for increased rigidity in early adolescence (e.g., Plumb & Cowan, 1984; Streitmatter, 1985) when gender roles are intensified as a result of biological changes and social environment transitions. For example, Alfieri et al. (1996) found that adolescents who had just entered middle school had significantly more flexible gender attitudes regarding personality traits than did those who had been in middle school for a year. These findings strongly suggest that changes in adolescents’ social environment are related to changes in flexibility, which led Alfieri et al. to speculate that inconsistent results in the adolescent flexibility literature may be the result of a lack of consideration of changes in school context.

The present longitudinal study of middle school children was designed to test the gender role intensification hypothesis that gender role flexibility decreases in early adolescence. Although several researchers have found some support for the hypothesis, replication is warranted with the addition of a multi-dimensional measure of gender role flexibility. It was hypothesized that gender role flexibility would decline across the middle school years. No specific predictions were made regarding which component of flexibility would follow this trajectory, as previous researchers have often included only a single measure of gender role flexibility. Gender differences in flexibility were also expected to emerge, such that girls would show more flexibility than boys in all three domains of stereotyping. Consistent with the multidimensional nature of gender typing, measures of attitude flexibility and self-perception flexibility were not expected to be highly correlated. The relations between attitudes and actual behavior were also of interest, and it was expected that the flexibility of activity choices would decrease throughout middle school and that self-perception flexibility would be more strongly related to actual activity choices than would attitude flexibility.

Materials and Methods

Participants

One hundred thirty-six sixth-grade students (61 girls and 75 boys, with a mean age = 12.8 years at the start of the school year) enrolled in two sixth through eighth grade middle schools in northern Georgia served as participants in the present study. Over 90% of the sample was retained (58 girls and 66 boys) when the students started seventh grade. The racial composition of the sample was predominantly White, and students had a mainly lower–middle to upper–middle class socio-economic status. Permission of the school district and parental consent were obtained prior to the start of the study. Students were paid $20 each year in return for their participation.

Children were asked to complete the activity diaries monthly as part of a larger research project. Gender role flexibility was assessed in the fall and spring of sixth and seventh grade along with other measures not included in the present study.

Materials

Early adolescents’ gender role flexibility was measured in terms of attitude flexibility, self-perception flexibility, and behavioral flexibility. Liben and Bigler’s (2002) Children’s Occupation, Activity, and Trait Scale (COAT) was used to assess attitudes and self-perceptions and activity diaries were used to measure behavioral flexibility. The COAT has high internal and test–retest reliability as well as strong external validity (Liben & Bigler, 2002). Reliability was also high for both the COAT-Attitude Measure (AM) and -Preference Measure (PM) with alphas that range between 0.67 and 0.87. In the present study, reliability was also high; alphas ranged between 0.80 and 0.90.

Attitude flexibility

The Children’s Occupation, Activity and Trait-Attitude Measure (COAT-AM) contains items that address children’s gender role attitudes toward others in the three domains listed above. For each item, children were asked, “Who should...?” and provided with the choices “only boys/men,” “only girls/women,” and “both boys/men and girls/women” in the original measure. Two additional scale points of “mostly boys/men” and “mostly girls/women” were added to the COAT-AM for the present study to avoid biasing children toward the “both” option. Selection of the “both boys/men and girls/women” option indicates flexibility in gender role attitude for that particular item.

Each of the three domain subscales (occupations, activities, and traits) included ten masculine items (such as “school principal” and “plumber” for occupations; “fixing bicycles” and “flying model airplanes” for activities; “confident” and “misbehave” for traits), ten feminine items (such as “librarian” and “housecleaner” for occupations; “vacuuming the house” and “doing gymnastics” for activities; “loving” and “complaining” for traits), and five neutral items (such as “artist” for occupations; “going to the beach” for activities; “creative” for traits). Items were selected to vary in status and desirability, but to be equal in the degree of stereotypicality (Liben & Bigler, 2002). Given this variation in desirability, the additional rating option of “neither boys nor girls” was provided in the trait subscale for those traits that children consider undesirable for either gender. Flexibility scores were calculated separately for gender appropriate (hereafter referred to as “same-gender”) occupations, activities, and traits and for gender inappropriate (hereafter referred to as “other-gender”) occupations, activities, and traits.

Self-perception flexibility

The Children’s Occupation, Activity and Trait-Personal Measure (COAT-PM) consists mostly of the same items as the COAT-AM with slightly different questions and scoring procedures in order to allow for the measurement of attitudes toward others and toward the self in the same domains with very similar items (Bigler, 1997; Liben & Bigler, 2002).

The occupation subscale asks children “How much would you like to be a...?” Items included masculine jobs (“lawyer,” “construction worker”), feminine jobs (“nurse,” “supermarket check-out clerk”), and neutral jobs (“writer”). Children rated their responses on a four-point scale (1 = not at all to 4 = very much). In the activity subscale, children rated “How often do you...?” on a four-point scale (1 = never to 4 = very often). Items included masculine activities (“build forts,” “fix cars”), feminine activities (“practice cheerleading,” “wash clothes”), and neutral activities (“ride a bicycle”). The trait subscale had children complete the sentence “This [trait] is...” on a four-point scale (1 = not at all like me to 4 = very much like me). Items included masculine traits (“logical,” “aggressive”), feminine traits (“affectionate,” “dependent”), and neutral traits (“appreciative (thankful)”). For all three subscales, responses to same-gender items and responses to other-gender items were each averaged to provide mean preference scores.

Behavioral flexibility

A structured diary was given to children to assess the kinds of activities they did in their free time outside of school. Activity choices outside the school context are important because they are unstructured and should be more reflective of children’s actual preferences than school activities.

The children were instructed to think about everything they did after school on the previous day. The diaries contained numbered blank spaces for the children to fill in each activity they did the previous day. For each activity, the children were asked to provide the following information: with whom they did the activity, the duration of the activity, and their enjoyment of the activity. To facilitate the scoring of the diaries, the children were provided with a word list from which to choose their responses to each question. In addition, they were given a suggestion list for activities and instructed that the list was intended to help them think about what they did after school. They did not have to write down any of the activities provided. Eight activity slots were provided in the diaries, but children were instructed to add more slots as needed.

The activity diaries provided three indices of after school activities. First, three independent coders classified each activity as same-gender, other-gender, or gender-neutral for the child who reported it. Activities such as mowing the lawn and playing baseball were considered same-gender for boys and other-gender for girls. Talking on the phone and shopping, for example, were classified as same-gender for girls and other-gender for boys, and doing homework and watching TV, for example, were classified as gender-neutral activities. Inter-rater reliability was assessed for 25% of the diaries; all raters agreed on the gender stereotyping of over 95% of the activities. Second, the length of the activity, in minutes, was measured as an indirect assessment of interest in the activity. Third, how much the child enjoyed the activity was measured.
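The agreement figure reported above can be illustrated with a minimal sketch. It assumes that agreement was computed as the share of activities on which all three coders assigned the same category; the article does not state the exact formula, and the function name and the sample codes below are hypothetical.

```python
def full_agreement_rate(codes_by_rater):
    """Share of activities on which every rater assigned the same code.

    codes_by_rater: one list of codes per rater, aligned by activity.
    Assumption: simple percent agreement, not a chance-corrected index.
    """
    per_activity = list(zip(*codes_by_rater))
    agreed = sum(1 for codes in per_activity if len(set(codes)) == 1)
    return agreed / len(per_activity)


# Hypothetical codes for four diary activities from three coders.
rater_a = ["same", "neutral", "other", "neutral"]
rater_b = ["same", "neutral", "other", "same"]
rater_c = ["same", "neutral", "other", "neutral"]

print(full_agreement_rate([rater_a, rater_b, rater_c]))
```

Percent agreement does not correct for chance, so it will tend to be higher than an index such as kappa for the same codes.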

Procedure

Data collection began in the fall of sixth grade, the children’s first year in the middle school, and continued to the spring of seventh grade. Before each questionnaire session, the children were asked if they would like to answer some questions about themselves and others. The gender role flexibility measures (COAT-AM and COAT-PM) were administered in mixed-gender groups of 20 during the school day in the fall and spring of each school year. Both male and female experimenters were present. The order of the activity, occupation, and trait subscales within the AM and PM measures was counterbalanced. The children completed a different order at each of the four administration sessions. Completion of both COAT scales took approximately 20 min.
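One way to picture the counterbalancing is sketched below. Three subscales can be arranged in six orders, so four distinct orders can be drawn for the four sessions. The article does not describe how orders were assigned to children or sessions; the random draw, the function name, and the seed are assumptions made for illustration only.

```python
from itertools import permutations
import random

SUBSCALES = ("activities", "occupations", "traits")
ALL_ORDERS = list(permutations(SUBSCALES))  # the 6 possible subscale orders


def session_orders(n_sessions=4, seed=None):
    """Draw a distinct subscale order for each session (hypothetical scheme)."""
    rng = random.Random(seed)
    return rng.sample(ALL_ORDERS, n_sessions)


for session, order in enumerate(session_orders(seed=1), start=1):
    print(session, order)
```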

Activity diary information was collected monthly in conjunction with diary data for an unrelated study. On the selected day of the month, children were instructed to complete the diary during a questionnaire administration session.

Results

To evaluate change in gender role flexibility, each of the three gender role flexibility measures was submitted to a repeated measures analysis of variance (ANOVA). Each measure of flexibility is discussed in turn. For both of the COAT measures (AM and PM), six separate scores were calculated for the same-gender, or gender appropriate, items and other-gender, or gender inappropriate, items of the Activities, Occupations, and Traits subscales.

Attitude Flexibility (COAT-AM Scale)

The three subscales of the COAT-AM measure (Activities, Occupations, and Traits) were scored by calculating the percentage of “both boys and girls” responses for the same-gender and other-gender items. The descriptive statistics for these subscales across all four time points are listed in Table 1. Because the data for the COAT-AM scale were proportions, all participants’ scores were transformed by the arcsine function prior to the analysis of variance in order to normalize the distribution and to eliminate the correlation between the mean and standard deviation (Snedecor & Cochran, 1967).
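The scoring and transformation steps described above can be sketched as follows. The article says only that scores were transformed "by the arcsine function"; the sketch assumes the commonly used arcsine square-root form, and the response labels and function names are hypothetical.

```python
import math


def flexibility_proportion(responses):
    """Proportion of items answered with the flexible 'both' option."""
    both = sum(1 for r in responses if r == "both")
    return both / len(responses)


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1].

    Assumption: this variant is used because it is the usual
    variance-stabilizing transform for proportions.
    """
    return math.asin(math.sqrt(p))


# Hypothetical responses of one child to four same-gender items.
responses = ["both", "only boys", "both", "mostly girls"]
p = flexibility_proportion(responses)
print(p, arcsine_sqrt(p))
```

The transform stretches values near 0 and 1, which is where the variance of a raw proportion shrinks and becomes tied to the mean.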

Table 1 Descriptive statistics for the attitude flexibility measure (COAT-AM) at all four time points (non-arcsined data).

The transformed data were submitted to a 2 (gender) by 4 (time) by 3 (domain) by 2 (gender typing of item) ANOVA with repeated measures on the last three factors. All significant main effects and interactions are listed in Table 2.

Table 2 Summary of significant main effects and interactions for the COAT-AM 4-way ANOVA.

The main effect of time revealed a linear increase in gender role flexibility from the beginning of sixth grade through the end of seventh grade, which is contrary to expectations. Although there was no main effect of gender, there was a significant gender by gender typing interaction which revealed a markedly different pattern of gender role attitudes for boys and girls. Girls were more flexible about other-gender items than about same-gender items, whereas boys were more flexible about same-gender items than about other-gender items.

The other notable interaction was the significant domain by gender by gender typing interaction (see Fig. 1). Post-hoc comparisons revealed that the girls had significantly more flexible attitudes regarding other-gender activities than about same-gender activities, t(60) = 4.50, p < 0.001, but did not significantly differ in their other-gender and same-gender attitudes for traits and occupations. Boys, on the other hand, had more flexible attitudes about same-gender than about other-gender activities, t(71) = 3.96, p < 0.001, and occupations, t(71) = 6.24, p < 0.001, but were more flexible regarding other-gender than about same-gender traits, t(71) = 2.93, p < 0.01. The only significant between-gender difference was in attitudes about other-gender occupations. Girls had more flexible attitudes about other-gender occupations than boys did, t(131) = 3.70, p < 0.001 (girls’ M = 0.67, boys’ M = 0.51). It should be noted that this pattern of findings supports previous findings of greater flexibility in the masculine domain than in the feminine domain.

Fig. 1 Gender by domain by gender typing interaction for the attitude flexibility measure (COAT-AM).

Self-Perception Flexibility (COAT-PM Scale)

The three subscales of the COAT-PM measure (Activities, Occupations, and Traits) were scored by averaging the participants’ responses for the same-gender and other-gender items, thus yielding two scores for each subscale. The descriptive statistics for the self-flexibility measure at all four time points are listed in Table 3.
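A minimal sketch of this scoring is shown below. It assumes that masculine items count as same-gender for boys and feminine items as same-gender for girls, in line with the item descriptions in the Materials section; the data layout, function name, and ratings are hypothetical.

```python
from statistics import mean


def preference_scores(ratings, child_gender):
    """Mean same-gender and other-gender ratings for one subscale.

    ratings: dict with 'masculine' and 'feminine' lists of 1-4 ratings.
    child_gender: 'boy' or 'girl'.
    """
    same_key = "masculine" if child_gender == "boy" else "feminine"
    other_key = "feminine" if child_gender == "boy" else "masculine"
    return {
        "same_gender": mean(ratings[same_key]),
        "other_gender": mean(ratings[other_key]),
    }


# Hypothetical activity ratings for one boy.
ratings = {"masculine": [4, 3, 2], "feminine": [1, 2, 3]}
print(preference_scores(ratings, "boy"))
```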

Table 3 Descriptive statistics for the self-perception flexibility measure (COAT-PM) at all four time points.

The mean scores were submitted to a 2 (gender) by 4 (time) by 3 (domain) by 2 (gender typing of item) ANOVA with repeated measures on the last three factors. All significant main effects and interactions are listed in Table 4.

Table 4 Significant main effects and interactions for the COAT-PM 4-way ANOVA.

The two-way interaction of domain by gender typing yielded significant and interpretable results. The children consistently preferred same-gender items to other-gender items, but this difference was more pronounced for activities (same M = 2.59, other M = 1.76) and occupations (same M = 2.07, other M = 1.58) than for traits (same M = 3.13, other M = 2.89).

The domain by gender by gender typing interaction provided further clarification of these relations (see Fig. 2). Both boys and girls significantly preferred same-gender to other-gender activities, boys: t(76) = 12.65, p < 0.001; girls: t(60) = 13.24, p < 0.001, and same-gender to other-gender occupations, boys: t(76) = 13.62, p < 0.001; girls: t(60) = 6.34, p < 0.001. For traits, however, girls rated same-gender personality traits as more self-descriptive than other-gender traits, t(60) = 10.23, p < 0.001, whereas boys were equally likely to choose same-gender and other-gender traits in describing themselves.

Fig. 2 Gender by domain by gender typing interaction for the self-perception flexibility measure (COAT-PM).

There were two significant interactions with the time variable, which indicate some developmental change. The time by gender by gender typing interaction (see Fig. 3) suggests that girls’ preference for same-gender items slightly increased over time, whereas boys’ preferences for same-gender items remained fairly stable. Preference for other-gender items, however, slightly decreased for girls, but slightly increased for boys. None of these comparisons was statistically significant, however, which makes it difficult to reach any firm conclusions about the nature of this interaction. The time by domain by gender by gender typing interaction was also suggestive of this weak pattern of gender role intensification for girls, but the small effect size makes developmental conclusions rather tentative.

Fig. 3 Time by gender by gender typing interaction for the self-perception flexibility measure (COAT-PM).

Behavioral Flexibility (Diary Data)

Diaries from 3 to 4 months were aggregated to derive scores for the fall and spring of sixth grade and seventh grade. The number of same-gender, other-gender and neutral activities at each time point were divided by the total number of activities in order to calculate the proportion of same-gender, other-gender, and gender neutral activities. The descriptive statistics for these proportions at all four time points are listed in Table 5. As with the COAT-AM data, the diary proportion data were transformed with the arcsine function before being submitted to a 2 (gender) by 4 (time) by 3 (gender typing) ANOVA with repeated measures on the last two factors.
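The proportion calculation described above amounts to dividing each category count by the total number of activities reported at that time point. The sketch below illustrates this for one child; the category labels, function name, and diary entries are hypothetical.

```python
from collections import Counter

CATEGORIES = ("same", "other", "neutral")


def activity_proportions(coded_activities):
    """Proportion of a child's activities falling in each gender-typing category."""
    counts = Counter(coded_activities)
    total = len(coded_activities)
    return {category: counts[category] / total for category in CATEGORIES}


# Hypothetical coded activities aggregated over one time point.
coded = ["neutral", "neutral", "same", "other", "neutral"]
print(activity_proportions(coded))
```

Because the three proportions for a child sum to 1, they are not independent of one another, which is worth keeping in mind when reading an analysis that treats gender typing as a three-level factor.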

Table 5 Descriptive statistics for the behavioral flexibility measure (Diary) at all four time points.

A main effect of gender typing was found, F(2, 230) = 202.926, p < 0.001, η² = 0.638, power = 1.0, where children reported participating more in neutral activities (M = 0.71) than in either same-gender (M = 0.32) or other-gender (M = 0.10) activities. The time by gender interaction was also significant, F(3, 345) = 2.928, p = 0.038, η² = 0.025, power = 0.665, but the effect size was so small as to suggest that this interaction was not meaningful.
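The effect sizes above can be related to the F statistics with a short worked calculation. The article does not say which variant of eta squared is reported; the sketch assumes partial eta squared, computed as F × df_effect / (F × df_effect + df_error), because the reported values are numerically consistent with that formula. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and degrees of freedom as reported for the diary proportions.
reported = {
    "gender typing main effect": (202.926, 2, 230),
    "time by gender interaction": (2.928, 3, 345),
}

for label, (f_value, df_effect, df_error) in reported.items():
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Under this assumption the two calculations round to 0.638 and 0.025, matching the reported values.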

In addition to analyzing the proportion of same-gender, other-gender, and gender neutral activities, the amount of time spent doing the activities and the degree to which the children enjoyed the activities were submitted to ANOVA. The amount of time was measured in minutes and averaged across the diaries in each of the four times of measurement. The liking of the activities was measured on a four-point Likert scale and averaged like the time variable. Separate 2 (gender) by 3 (gender typing) ANOVAs with repeated measures on the last factor were calculated for each of the four times of measurement because there were too many missing data for three-way ANOVAs to be computed. Between 9% and 43% of participants at each time point left the liking and time questions blank, so only seven participants had the complete data necessary for a three-way ANOVA.

For the amount of time spent doing the activities, a significant main effect of gender typing was found only for the fall of sixth grade data, F(2, 114) = 8.8, p < 0.001, η² = 0.134, power = 0.968. Children spent significantly more time in same-gender (M = 76.37) and neutral activities (M = 77.46) than in other-gender activities (M = 51.67). In the spring of seventh grade, there was a gender by gender typing interaction, F(2, 92) = 7.50, p = 0.002, η² = 0.14, power = 0.913. Girls spent more time in other-gender activities (M = 103.83) than in either same-gender (M = 66.31) or neutral activities (M = 65.58) whereas boys spent more time in same-gender (M = 78.26) and neutral activities (M = 71.53) than other-gender activities (M = 52.85).

As for the enjoyment of the activities, a main effect of gender typing was found in the fall of sixth grade and the fall and spring of seventh grade. In the fall of sixth grade, F(2, 126) = 9.07, p < 0.001, η² = 0.13, power = 0.97, children liked same-gender activities (M = 3.38) more than other-gender (M = 3.04) and neutral activities (M = 2.94). In the fall of seventh grade, F(2, 96) = 11.51, p < 0.001, η² = 0.19, power = 0.99, same-gender (M = 3.46) and other-gender activities (M = 3.28) were liked more than neutral activities (M = 2.88). Similarly, in the spring of seventh grade, F(2, 86) = 16.64, p < 0.001, η² = 0.279, power = 1.0, same-gender (M = 3.42) and other-gender activities (M = 3.47) were liked more than neutral activities (M = 2.84). These results, taken together, provide only weak support to the contention that middle school children are more interested in same-gender activities than other-gender and neutral activities.

Correlations

To address the second objective of this study, the multi-dimensionality of gender role flexibility was examined by correlating the attitude, self-perception, and behavioral measures of flexibility. It was predicted that correlations within components of flexibility would be higher than correlations between components. The data generally support this prediction. Very few of the COAT-AM and PM scales were correlated at any of the four time points. The few correlations that were significant were modest (range of r: 0.18 to 0.38). Within attitude and self-perception flexibility, the domains were inter-correlated. The activity, occupation, and trait subscales were significantly correlated for both the COAT-AM (range of significant rs: 0.26 to 0.83) and PM scales (range of significant rs: 0.22 to 0.55) at all times of measurement.

Almost none of the correlations between the COAT-AM (attitude flexibility) scale and the measures of behavioral flexibility were significant (range of significant rs: −0.31 to 0.24), which is consistent with the hypotheses. There were also few unexpected significant correlations between the self-perception flexibility and behavioral flexibility measures (range of significant rs: −0.20 to 0.22).

To examine further how attitudes and self-perceptions predict gender-typed behavior, separate hierarchical multiple regression analyses for same-gender and other-gender activities at each time point were conducted. For each analysis, gender was entered on the first step, self-flexibility measures for that time point were entered on the second step, and attitude flexibility measures were entered on the third step. The time 1 same-gender activities model yielded significant results, F(13, 111) = 1.96, p < 0.05, R² = 0.19; lower preferences for other-gender occupations and being female predicted same-gender activity choices. The time 1 other-gender activities model was also significant, F(1, 123) = 5.04, p < 0.05, R² = 0.04, but gender was the only significant predictor such that being female predicted more other-gender activity choices. Consistent with the correlational data, attitude gender role flexibility did not significantly predict activities. None of the other models was significant, which suggests that neither gender role attitudes nor self-perceptions played a substantive role in predicting children’s gender-typed behaviors.
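The stepwise entry of predictor blocks can be sketched as below. This is an illustration under stated assumptions, not the analysis reported here: the data are simulated, each block holds a single predictor (the study entered several subscale scores per block), and all variable and function names are hypothetical. The point is only that R² is recomputed as each block is added, so the change in R² shows what the new block contributes beyond the earlier ones.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R^2 of an ordinary least squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated data (not the study's data): one predictor per block.
rng = random.Random(0)
n = 120
gender = [rng.choice([0.0, 1.0]) for _ in range(n)]
self_pref = [rng.gauss(0, 1) for _ in range(n)]
attitude = [rng.gauss(0, 1) for _ in range(n)]
behavior = [0.3 * g + 0.2 * s + rng.gauss(0, 1) for g, s in zip(gender, self_pref)]

steps = [
    ("step 1: gender", [gender]),
    ("step 2: + self-perception", [gender, self_pref]),
    ("step 3: + attitude", [gender, self_pref, attitude]),
]

r2_by_step = []
for label, columns in steps:
    r2 = r_squared(columns, behavior)
    change = r2 - r2_by_step[-1] if r2_by_step else r2
    r2_by_step.append(r2)
    print(label, "R2 =", round(r2, 3), "change =", round(change, 3))
```

Because each step nests the previous one, R² cannot decrease from step to step; whether a given increase is statistically meaningful is a separate question that an F test on the change would address.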

Discussion

The first objective of this study was to test the gender role intensification hypothesis using a multi-dimensional assessment of gender role flexibility. The prediction of decreased flexibility during early adolescence was weakly supported, and the results for each component of flexibility are discussed separately in light of gender role intensification and cognitive developmental theory.

The first component, attitude flexibility, followed a different developmental trajectory than expected. The significant main effect of time revealed a linear increase in flexibility from the fall of sixth grade to the spring of seventh grade. Although these results were not predicted, they are consistent with cognitive developmental theory and with the results of Liben and Bigler’s (2002) recent longitudinal study. As adolescents’ ability to consider multiple perspectives develops, they become more accepting of alternative models of behavior. Following this logic, it makes sense that children’s attitudes toward others became more flexible.

Change over time in self-perception flexibility was much more complex, but did not provide strong support for the hypothesis of decreased flexibility in early adolescence. The expected time by gender typing interaction was not significant. This is consistent with Liben and Bigler’s (2002) results on the COAT scale; they did not find any significant change over time in preferences. Unlike Liben and Bigler’s (2002) results, the three- and four-way interactions were significant, which suggests that girls became somewhat less flexible over time. Although the effect sizes of these interactions were small, girls’ preferences for same-gender items seemed to slightly increase whereas their endorsement of other-gender items remained relatively stable. Boys, on the other hand, did not exhibit the same pattern in their activity preferences and seemed to exhibit more stability than increased or decreased flexibility.

Of particular interest in these findings is the difference between the girls’ and boys’ preferences for other-gender activities. Some researchers have suggested that gender role intensification in early adolescence is more evident in girls because there is more pressure for them to conform to the adult feminine role (Balk, 1995; Richards & Larson, 1989). It is socially acceptable for young girls to be “tomboys,” but once they enter adolescence it is no longer appropriate for them to act more masculine than feminine (Hyde, Rosenberg, & Behrman, 1977). Boys are not subject to greater pressure to conform to stereotypes in adolescence because it is rarely considered appropriate for boys to act more feminine than masculine (Martin, 1990), so the pressure to conform is constant. Although this rationale makes sense, the data do not strongly support it. As the above research would predict, boys in this sample showed consistent preferences for same-gender items, but the girls showed only a slight trend towards gender-role intensification.

The behavioral flexibility measure did not yield significant change over time. The types of after-school activities in which the children engaged did not significantly change from the fall of sixth grade to the spring of seventh grade, although there was a trend for boys to become more stereotyped over time. These results could simply mean that the activities early adolescents did in their free time did not change. It is also possible that the diary measure used in this study may not have been sensitive enough to detect change. For example, children reported that they watched television the previous day, but not necessarily which programs they watched. If children had been required to report more detailed information, gender coding of the activities could have been more fine-tuned.

It may not have been possible to obtain more details with this retrospective method, however, so alternative methods of obtaining activity information are suggested for future research. For example, children could be telephoned periodically and asked what they are currently doing. This technique would allow the researcher to ask for specific information about each activity. To ensure the representativeness of the activities, however, many activities would need to be sampled, and the cost-effectiveness of this data collection technique becomes questionable. An equally expensive, though potentially worthwhile, technique would be to provide children with a beeper and ask them to record their current activity whenever the beeper went off (Csikszentmihalyi & Larson, 1984). Different methods of sampling early adolescents’ free-time behaviors could yield different results, so it is worthwhile to explore new data collection techniques.

Analysis of the variables regarding how much time was spent doing each activity and how much the children liked each activity suggested that they were more interested in same-gender activities than in neutral and other-gender activities. In the spring of seventh grade, there was an interesting gender by gender typing interaction in terms of how much time was spent doing the activities. Girls spent more time in other-gender activities than in either same-gender or neutral activities, whereas boys spent more time in same-gender and neutral activities than in other-gender activities. One of the most frequently reported other-gender activities for girls was sports participation. It was common for these girls to report a sport practice, such as basketball, as one of their activities, an activity that usually lasted for at least 2 hours. To elaborate further on our coding scheme, when participants reported “sports” or “sport practice” as an activity, our default code was masculine. If a specific sport was reported, we were able to be more accurate in our coding. For example, “softball” was coded as feminine whereas “basketball” was coded as masculine. Because quite a few of the girls in the sample participated in basketball or simply reported “sports practice,” these activities were coded as masculine or other-gender. These circumstances could explain why girls spent more time in other-gender activities. These data could also provide more support for the cognitive developmental theory of increased, rather than decreased, flexibility in adolescence and are clearly inconsistent with the predictions of gender role intensification.

It is interesting that attitude, self-perception, and behavioral flexibility all exhibited different patterns of change over the first 2 years of middle school. For attitude flexibility, both boys and girls linearly increased in the flexibility of their attitudes. For self-perception flexibility, only girls tended to become more stereotyped in their preferences, but even this tendency was only evident in the sixth-grade data. Behaviorally, there was no significant change over time, although girls expressed more interest in other-gender activities in seventh grade. These data provide support for the contention of Signorella et al. (1993) that the results of a given study depend on the nature of the question being asked and on the operational definitions of the construct. This raises another question of whether these three flexibility measures are components of a global construct of gender role flexibility or whether they reflect independent constructs.

The second objective of this study was to try to answer this question regarding the possible independence of gender role attitudes, self-perceptions, and behaviors and to explore the multidimensionality of gender role flexibility. All three measures of flexibility were inter-correlated as another test of the independence of the components. As predicted, correlations were stronger and more consistent within components than across components of flexibility. The few correlations that were significant across components were modest (around 0.2), which suggests that the relations could be due to chance alone. Regression analyses also revealed that attitude and self-perception flexibility did not significantly predict behavioral flexibility. Perhaps the connections between attitudes, self-perceptions, and behaviors would have been stronger if the behaviors measured matched the items on the COAT-AM. The free-response format of the diaries used in the present study prevented such a comparison, but it is an important point for future researchers to consider.

For the COAT measures, all three domains of flexibility were correlated within the AM and PM subscales. So, the argument could be made that any one domain within attitude and self-perception flexibility is an adequate measure of flexibility (Liben & Bigler, 2002). Closer examination of the correlation matrices revealed that relations between same-gender and other-gender items were stronger within each domain than across domains. These differences in the correlation coefficients were not statistically significant, however, so it is difficult to draw any conclusions regarding the magnitude of the relations. The present study makes a contribution to the gender role flexibility literature by illustrating the multidimensionality of the construct in several ways. First, the separate ANOVAs for attitude, self-perception, and behavioral flexibility revealed distinctly different patterns of effects and interactions. Consistent with Kohlberg (1966) and Katz (1979), attitude flexibility was found to increase linearly over time; girls became more flexible about other-gender than same-gender items and boys became more flexible about same-gender than about other-gender items. It was also found that attitudes regarding traits were consistently more flexible than those regarding activities and occupations.

Support for the gender role intensification hypothesis (Hill & Lynch, 1983) was weak in the self-perception and behavioral flexibility data. Self-perception flexibility did not change overall, but girls’ preferences for same-gender items slightly increased as their other-gender preferences remained fairly stable. This slight gender role intensification evident in girls’ self-perceptions might be a function of temporary changes in their context, such as adjusting to the new school environment. Boys, on the other hand, remained fairly stable in their self-perception flexibility. Finally, behavioral flexibility did not change over time, nor were there any significant gender differences.

The divergent pattern of results for the boys and girls suggests that they may be experiencing different transitions at the same time. For example, gender differences in pubertal status, a variable not measured in the present study, could explain why girls exhibited slightly more gender role intensification than boys did. The biological changes of puberty may heighten early adolescents’ awareness of their gender identity and lead to more adherence to stereotypes. Future researchers should include a measure of pubertal timing, as the effects of physical maturation on flexibility have yet to be determined.

In conclusion, gender role flexibility is not a global construct captured by a simple assessment of personality traits. Attitudes seem to be quite distinct from both self-perceptions and behaviors, whereas self-perceptions and behaviors exhibited similar patterns of change or no change at all. The method of measurement (attitudes vs. preferences vs. behavior) clearly affects results, and researchers need to consider this before drawing conclusions about the development of gender role flexibility in early adolescence.