Introduction

The Global Gender Gap Index shows that Japan's gender gap is the worst among all advanced nations: Japan ranked 121st out of 153 countries in 2020 (Crotti et al. 2020). It ranks last among the G7 countries and second to last among the 38 OECD countries, ahead of only Türkiye. Although Japan enacted the Equal Employment Opportunity Law in 1985, which called for all appropriate measures to eliminate discrimination against women in all professional fields, progress has been slow over the decades (Estévez-Abe 2013; Yamada 2013). In the most recent national election, the House of Councillors election in July 2022, 35 women were elected to the 125 contested seats. This was the largest number in history but still only slightly over one-quarter of the total.

Japan's Ministry of Education, Culture, Sports, Science and Technology (MEXT) has been concerned about this situation and has tried to remedy it through school education. MEXT's curriculum guidelines set teaching objectives for gender equality education. For example, social studies teachers teach fundamental human rights in elementary schools and the Equal Employment Opportunity Law in high school civics classes on contemporary society. See Ogawa (2018) and Seki and Saito (2019) for various practice reports from Japanese schools.

However, gender equality education has stagnated in Japan (Henninger 2019). One reason for the slow progress is the difficulty of assessing the effectiveness of gender education. How can we assess changes after a gender education lesson? Standard examinations can measure how much students have learned, but not how much their views of gender differences have changed. Questionnaires are also unsuitable because the answers are easily distorted by social desirability bias: "good students" know how they should answer gender equality questionnaires. After a gender education class, teachers often have students write essays on what they learned from the lesson. Again, students can write whatever they think is expected in their essays, irrespective of their genuine attitudes toward women.

Monin and Miller (2001) hypothesized that people would be more likely to show a prejudiced attitude in an actual choice than in a conventional questionnaire, especially after having answered "correctly" in a questionnaire. They had participants choose a suitable applicant in a hypothetical job-recruitment scenario and found that participants who had first answered the questionnaires impartially favored a white man over a black woman more than those in the control condition did. This was a clever way to probe participants' hidden attitudes, but it is unsuitable for children in school settings.

In the long run, changes may also show up in students' daily activities at school, and teachers can collect such anecdotes to assess the possible effects of gender education. However, teachers' observations can easily be distorted by subjective interpretation. Moreover, even if such observations could provide a genuine evaluation of gender education, it would take a long time before any impact could be determined.

Social psychologists investigating discrimination, prejudice, and stereotypes have used the IAT (Implicit Association Test; Greenwald et al. 1998) to assess participants' implicit attitudes. Mori et al. (2008) converted the IAT into a paper-and-pencil version, the FUMIE test, which can be used more easily in school settings without computers.

Akita and Mori (2022) assessed the effect of gender education by administering the FUMIE test before and after a 45-min gender equity lesson to 92 sixth graders (46 boys and 46 girls, 11–12 years old) in an elementary school. They used the FUMIE test with the target word "woman" to assess the sixth graders' implicit image of women. The results showed that the boys' implicit image of "woman" improved after the lesson.

The present paper is a follow-up report of Akita and Mori (2022) three years later. The original cohort has since graduated from elementary school and entered junior high school, where they were joined by students from other elementary schools in the same district. Accordingly, we could assess the implicit images of "woman" held by the same students as in the Akita and Mori (2022) cohort and compare them with those of other students who had not received the gender equity lesson in elementary school.

Methods

Participants

One hundred and forty-seven ninth graders (72 boys and 75 girls; 14–15 years old) at Hirosaki University Junior High School participated in the present study. At this junior high school, about half of the incoming students come from Hirosaki University Elementary School and the other half from other elementary schools in Hirosaki City. These students are then assigned to classes (35–37 students each) in a balanced way. This arrangement created an ideal situation for the present study without any pre-experimental effort. Of the 147 participating students, 85 (42 boys and 43 girls) belonged to the cohort of Akita and Mori (2022) (henceforth, Same Cohort); the remaining 62 (30 boys and 32 girls) served as controls (Control). In addition, we assessed a sample of 42 ninth graders (22 boys and 20 girls) from another junior high school in the city for a post hoc comparison with the original sample (Post Hoc Control).

FUMIE test administration procedures

We administered in class the same FUMIE test with the target word "woman" to assess the implicit image of women, following the standard procedure described in Uchida and Mori (2018b). The students performed the positive and negative tasks on three lines each, with 20 s allowed per line (see the Appendix of Akita and Mori 2022 for a FUMIE test sheet). The test took approximately five minutes, including the instructions (see Akita and Mori 2022 for a more detailed explanation of the FUMIE test).

The assessment procedure was neither physically nor mentally intrusive: it was a simple performance test in which students marked printed words with either a circle or a cross using a pencil. Because we administered it to all the students together in the classroom, it would not have been practical to conduct a standard individual informed-consent procedure. Instead, we first fully informed the participants and then instructed them that only those who consented to participate should submit their test sheets. In addition, we asked the participants to indicate only their gender, without writing their names or other information, so that the assessment remained anonymous. This alternative procedure had been approved in advance by an IRB-equivalent organization of the participating schools. No participants withheld their test sheets.

Results

Calculation of IAQ100 scores

For both the Same Cohort and Control students, we administered the FUMIE test without asking them to write their names on the test sheets. However, we asked them to indicate their gender and whether or not they had attended Hirosaki University Elementary School. In the following analyses, we therefore compared the data group-wise, with two independent variables: gender (boys vs. girls) and group (Same Cohort vs. Control).

As in our previous study, we counted the total number of words marked in 60 s (20 s for each of three lines) in the positive and negative tasks ($W_{\text{P}}$ and $W_{\text{N}}$, respectively). We then converted these counts into the Implicit Association Quotient (IAQ100) using the following formula:

$$\text{IAQ}_{100} = 100 \times \frac{W_{\text{P}} - W_{\text{N}}}{W_{\text{P}} + W_{\text{N}}}$$

The IAQ100 index represents the difference between the numbers of words marked under the two conditions per 100 words marked in total. It is positive when the examinee has a positive implicit image of the target word and negative when the image is negative. In the previous study, individual students' IAQ100 scores ranged from about −13.5 to +24.46 (Akita and Mori 2022).
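As a minimal sketch of this scoring step, the Python function below sums per-line counts (three 20-s lines per task) into $W_{\text{P}}$ and $W_{\text{N}}$ and applies the IAQ100 formula. The function name `iaq100` and the sample counts are illustrative assumptions, not part of the published FUMIE materials.

```python
from typing import Sequence


def iaq100(positive_counts: Sequence[int], negative_counts: Sequence[int]) -> float:
    """Compute IAQ100 from per-line FUMIE word counts (hypothetical helper).

    positive_counts / negative_counts hold the number of words marked on each
    20-s line of the positive and negative tasks (three lines each here).
    """
    w_p = sum(positive_counts)  # W_P: words marked in 60 s, positive task
    w_n = sum(negative_counts)  # W_N: words marked in 60 s, negative task
    total = w_p + w_n
    if total == 0:
        raise ValueError("no words were marked in either task")
    # Difference between the two tasks per 100 words marked in total
    return 100 * (w_p - w_n) / total


# Hypothetical counts for one student (illustrative, not study data)
print(iaq100([19, 18, 18], [15, 15, 15]))  # → 10.0
```

Here $W_{\text{P}} = 55$ and $W_{\text{N}} = 45$, so this hypothetical student marked 10 more words per 100 in the positive task, giving an IAQ100 of 10.0.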

After calculating the IAQ100 scores, we removed outliers deviating more than two standard deviations from the mean from the following analyses. There were eight outliers, four each in the boys' and girls' data. (Because the FUMIE test measures task performance within 20 s, an accidental delay in a participant's performance may substantially bias the statistical analyses. The test developers therefore recommend removing outliers that fall beyond ±2 SD; cf. Uchida and Mori 2018a.)
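The ±2 SD exclusion rule can be sketched in Python as follows. The sketch assumes the sample standard deviation (`statistics.stdev`), a single pass, and that the rule is applied within each gender group, since outliers are reported separately for boys and girls; the exact convention used in the study is not stated, and the scores are hypothetical.

```python
import statistics
from typing import List


def remove_outliers(scores: List[float], n_sd: float = 2.0) -> List[float]:
    """Keep only scores within n_sd sample standard deviations of the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator); an assumption
    # Single pass: scores are compared against the mean/SD of the full group
    return [s for s in scores if abs(s - mean) <= n_sd * sd]


# Hypothetical IAQ100 scores for one group; 40 mimics a disrupted test sheet
boys = [5, 6, 4, 5, 7, 3, 6, 5, 4, 40]
print(remove_outliers(boys))  # → [5, 6, 4, 5, 7, 3, 6, 5, 4]
```

With these toy scores the mean is 8.5 and the sample SD is about 11.1, so only the score of 40 (31.5 above the mean) exceeds the 2 SD cutoff of about 22.3.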

Comparison of the Akita and Mori (2022) data and the follow-up data

Figure 1 shows the average IAQ100 scores of boys and girls in the Same Cohort and Control groups, along with those from the Akita and Mori (2022) study. Akita and Mori (2022) reported that the sixth-grade boys' implicit image of "woman" improved considerably after the lesson from an almost neutral level before it. The same boys maintained their improved image three years later. Similarly, the girls maintained their positive image of women over the three years.

Fig. 1 Average IAQ100 scores for "woman" among boys and girls at three assessment periods (vertical bars show standard errors)

Because the boys' and girls' data differed considerably, we performed a one-way between-participant ANOVA separately for boys and girls on the pre-test and post-test data from Akita and Mori (2022) and the Same Cohort data obtained three years later. For boys, only the pre-test IAQ100 score was significantly lower than the post-test and follow-up scores (F(2, 120) = 5.07, Cohen's f = 0.2907, p < 0.01). In contrast, the girls' data showed no significant differences among the three test periods (F(2, 125) = 2.61, Cohen's f = 0.2043, 0.05 < p < 0.10).
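For readers who want to see what such an analysis computes, a one-way between-participant ANOVA and Cohen's f can be obtained from group scores with standard-library Python. This is a sketch with toy data, not the authors' analysis script; the function name `one_way_anova` is hypothetical, and a p-value would additionally require the F distribution (for example, `scipy.stats.f.sf`). Treating the three test periods as independent groups mirrors the between-participant design described above.

```python
from typing import Sequence, Tuple


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int, float]:
    """Return (F, df_between, df_within, Cohen's f) for independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Cohen's f = sqrt(eta^2 / (1 - eta^2)) = sqrt(SS_between / SS_within)
    cohens_f = (ss_between / ss_within) ** 0.5
    return f_stat, df_between, df_within, cohens_f


# Toy scores for three test periods (pre, post, follow-up); not study data
pre, post, follow_up = [1, 2, 3], [4, 5, 6], [4, 5, 6]
print(one_way_anova(pre, post, follow_up))
```

For these toy scores, SS_between = 18 and SS_within = 6, so F(2, 6) = 9.0 and Cohen's f = √3 ≈ 1.73.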

Comparison of the same cohort data with the control and the post hoc control

The follow-up data comprised the Same Cohort students, who had participated in the previous study (Akita and Mori 2022), and the Control students, who had not. We named the latter "Control" because they were expected to show the baseline level of the implicit image of women among ninth-grade students. We therefore expected the IAQ100 scores of the Control boys to be lower than those of the Same Cohort boys.

However, the Control students showed a pattern similar to that of the Same Cohort students (see the rightmost bars in Fig. 1). The Control boys had almost the same implicit image of women as the Same Cohort boys. A two-way between-participant ANOVA revealed only a significant main effect of gender (F(1, 135) = 5.25, Cohen's f = 0.1972, p < 0.05), with no significant difference between the Same Cohort and Control students (F(1, 135) = 0.54, Cohen's f = 0.0633, n.s.).
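The effect sizes reported in this section appear to follow the standard conversion from an F ratio to Cohen's f, $f = \sqrt{F \cdot df_{\text{effect}} / df_{\text{error}}}$, which in factorial designs corresponds to partial $\eta^2$. The short Python check below is a sketch under that assumption; it reproduces the reported values to within about 0.0001, but the formula used in the original analysis is not stated in the text, and the helper name `cohens_f_from_F` is hypothetical.

```python
import math


def cohens_f_from_F(f_stat: float, df_effect: int, df_error: int) -> float:
    """Cohen's f from an F ratio: sqrt(F * df_effect / df_error).

    Equivalent to sqrt(eta^2 / (1 - eta^2)), using partial eta^2 for factorial designs.
    """
    return math.sqrt(f_stat * df_effect / df_error)


# (label, F, df_effect, df_error, reported f) as given in the Results
reported = [
    ("Boys, test period", 5.07, 2, 120, 0.2907),
    ("Girls, test period", 2.61, 2, 125, 0.2043),
    ("Gender (two-way)", 5.25, 1, 135, 0.1972),
    ("Group (two-way)", 0.54, 1, 135, 0.0633),
]
for label, f_stat, df1, df2, f_rep in reported:
    print(f"{label}: computed f = {cohens_f_from_F(f_stat, df1, df2):.4f}, reported {f_rep}")
```

Because the conversion needs only the published F ratio and degrees of freedom, it offers a quick consistency check on reported effect sizes without access to the raw scores.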

One possible reason why the Control students in the follow-up study showed IAQ100 scores similar to those of the Same Cohort students is that they had studied alongside the Same Cohort students, who held impartial views of women. These unbiased views might have positively influenced their male classmates. If so, the Control students might not be appropriate as a control group. We therefore assessed another sample of 42 ninth-grade students (22 boys and 20 girls) from another junior high school as an additional control condition (Post Hoc Control). The Post Hoc Control boys showed a lower IAQ100 score for "woman" (4.20) than the Same Cohort boys (6.28), but the difference was not statistically significant (F(1, 59) = 1.30, Cohen's f = 0.1496, p = 0.2591).

Discussion

Why did the control boys show similar results without gender lessons?

How should we interpret these results? First, we found that the boys who had participated in the gender equity lesson retained their improved image of women three years later. The sixth-grade boys, who had a neutral image of women before the gender education (mean IAQ100 = 0.90), improved their view of women considerably after the lesson (mean IAQ100 = 5.12). The Same Cohort boys showed an even higher score (mean IAQ100 = 5.62) in the follow-up assessment three years later; this is the primary finding of the present follow-up study.

However, this finding requires a caveat. Two other samples of boys of the same age showed IAQ100 scores in a range similar to that of the boys who had participated in the gender equity lesson. Consequently, we cannot attribute the elevated IAQ100 scores of the target boys solely to the gender education they received in elementary school three years earlier.

It is worth noting that the FUMIE test is a reliable measure, so the present data can reasonably be taken to reflect the examinees' actual implicit attitudes. First, Mori et al. (2008) thoroughly examined the reliability of the test. Second, previous studies using the FUMIE test (e.g., Kurita and Kusumi 2009; Sakai and Koike 2011; Uchida and Mori 2018b) have repeatedly demonstrated its reliability for various research purposes. Moreover, the present data consistently showed, without any irregularities, that girls had more positive images of women than boys did.

Here, we offer a somewhat optimistic interpretation: the two Control groups of boys had acquired an appropriate image of women by the age of 14–15 through school education and daily activities, reaching a level similar to that of the Same Cohort boys. As stated in the Introduction, MEXT has been concerned about gender inequality and has tried to address it through school education. Although the effects of gender education have not been adequately assessed, they appear to emerge at the subconscious level in the present study. It may take several decades before the impact of gender education becomes visible in areas such as the gender ratio of Diet members. Still, at least at the unconscious level among children, gender equality appears to be steadily progressing in Japan.

Limitations and future perspectives of the present follow-up study

Although we obtained positive results suggesting that the effect of the gender lesson persisted in the boy cohort three years later, we also found similar attitudes toward women among the other boys. In the discussion above, we interpreted both results positively: the effect of the gender education from three years earlier was still present in the boys, and the other boys had acquired similar images of women through other activities. However, this conclusion might be too optimistic.

Moreover, to our knowledge, this is the only study assessing the effect of gender education in Japan, because properly assessing such effects is intrinsically challenging. We attempted to tackle this difficulty with a new assessment procedure, the FUMIE test, and found some positive signs of gender education. As we stated at the beginning of this paper, Japan's gender gap is still the worst among all advanced nations. The Global Gender Gap Index is calculated from visible data on various social activities, such as the gender ratio of parliamentary representatives. Gender differences will narrow in the long run as people's minds gradually change. However, such changes are invisible and not easily observed. We believe that our studies have detected such invisible changes in Japanese children.

In conclusion, the gender equality education from our previous study still showed its effects in the follow-up assessment three years later using the same implicit association test. The new assessment procedure also revealed possible hidden progress in rectifying gender inequality in Japan. However, our samples were small and limited to three schools. Therefore, a large-scale survey using implicit measures such as the FUMIE test is needed to assess current attitudes toward gender differences among Japanese children and adolescents.