
1 Introduction and Background

Computational thinking education is an important component of the digitalization of society and the economy, as discussed in a recent study organized by the European Commission [2]. The European Commission encourages this focus by making computational thinking education a priority for improving digital skills and competences as part of the digital transformation. The study highlights that the integration of computational thinking into school subjects is a new field, which poses many challenges that need to be assessed in educational practice.

Evaluation and measurement of results are important in educational practice when implementing different teaching methods and tools. In the absence of a single unified definition of computational thinking, researchers working in this area apply different definitions [19, 25, 26]. As a result, there is also no single tool for assessing computational thinking. Various methods and tools have been developed for this purpose, such as Dr. Scratch, Bebras tasks, Zoombinis, the CTt, and the CTS.

Román-González categorized the tools for assessing computational thinking into the following groups: diagnostic tools, summative tools, formative-iterative tools, data-mining tools, skill transfer tools, perceptions-attitudes scales, and vocabulary assessments [20].

Dr. Scratch, a formative-iterative tool, automatically analyzes Scratch programming projects and can also be used to develop computational thinking [27, 28]. Dr. Scratch analyzes code based on the following computational thinking concepts: abstraction and problem decomposition, logical thinking, synchronization, parallelism, algorithmic notions of flow control, user interactivity, and data representation [16].

Bebras tasks are another computational thinking assessment tool, classified as a skill transfer tool [4, 5, 14, 20]. The Bebras challenge tasks have been related to the concept of analytical thinking [14], but there is not yet a representative set of Bebras tasks validated as an assessment instrument for computational thinking.

Zoombinis [1] is an award-winning educational game from the nineties that has been rebuilt for modern platforms. It is used not only for learning computational thinking but, in recent years, also for computational thinking assessment as a data-mining assessment tool. In Zoombinis, all the players’ actions are logged and then analyzed for learning or assessment purposes. The concepts assessed in Zoombinis are problem decomposition, pattern recognition, abstraction, and algorithm design [21].

Further approaches to computational thinking assessment appear in recent research, e.g., the CT-cube, a framework for the design, realization, analysis, and assessment of computational thinking activities [17], and data-driven approaches based on students’ artefacts [6].

The CTt (Computational Thinking Test) was developed and validated by Román-González et al. [19]. The CTt consists of 28 questions divided into 7 groups: basic directions and sequences; repeat times; repeat until; simple conditional; complex conditional; while conditional; and simple functions [19]. The test targets middle school children (mainly 12–14-year-olds, but it can be used from the 5th to the 10th grade). The computational concepts used in the test are aligned with the CSTA (Computer Science Teachers Association) Computer Science Standards for the 7th and 8th grades [3]. Guggemos [9] mentions the main computational thinking concepts that the CTt covers: abstraction, decomposition, algorithms, and debugging. The CTt has some advantages, such as the ability to be administered to large groups in pre-test scenarios, allowing early detection of students with high abilities (or special needs) for programming tasks, and the ability to collect quantitative information before evaluating the effectiveness of curricula designed to foster computational thinking [19].

Another validated tool for assessing computational thinking independently of programming is the CTS (Computational Thinking Scales) [13]. Based on the ISTE (International Society for Technology in Education) framework, this test identifies the following components of computational thinking: creativity, algorithmic thinking, critical thinking, problem solving, and collaboration skills [12]. A specific feature of this test is that it is a self-assessment instrument; in this respect, it is completely different from the CTt, which is a knowledge assessment test.

Computational thinking is broader than programming. As stated in the European Commission’s Staff Working Document accompanying the Digital Education Action Plan 2021–2027 (DEAP) [7], “Computational thinking, programming and coding are often used interchangeably in education settings, but they are distinct activities”. It is therefore important to pay attention to computational thinking assessment tools that assess more than programming or algorithmic skills. Commonly used validated tests for assessing computational thinking that are independent of programming languages are the CTt and the CTS [10, 15, 18, 24]. The other tools mentioned above are associated with specific programming or gaming platforms (e.g. Dr. Scratch, Zoombinis). Bebras tasks are tool-independent but have not yet been validated as a set of tasks for the assessment of computational thinking. For this reason, the tool-independent tests (CTt and CTS) were chosen for this study. Both the CTt and the CTS are presented to students in the form of tests (a set of questions/statements with answer options), while allowing computational thinking to be assessed from different points of view.

Due to the complexity of computational thinking, it is also relevant to assess it in a multifaceted way, using more than one tool or method [22, 28]. According to the aforementioned classification [20], the CTt falls into the diagnostic tools group and the CTS into the perceptions-attitudes scales group. Guggemos et al. [10] also mention that a system of different assessments is essential because various computational thinking profiles can be identified using multifunctional methods. In their research on the assessment of computational thinking, Guggemos et al. [10] likewise analyze these two tests, the CTt and the CTS. However, they do not consider the tools used to teach computational thinking, nor do they differentiate students’ computational thinking test scores by gender.

The aim of this study is to investigate the relationship between the CTt and CTS tests in relation to students’ gender and the tools used to develop computational thinking.

We pose the following research questions:

  • RQ1: Is there a relationship between students’ CTt and CTS (including its subconstructs) scores?

  • RQ2: How are students’ CTt and CTS test results associated with the learning tool used to develop CT, and how do they differ between gender groups?

The paper is structured as follows. First, we present the learning methods and tools and describe the respondents, instruments, and data analysis methods. Next, we present the results of the study according to the research questions. Finally, we discuss our findings, describe limitations, and provide directions for future research.

2 Methods

2.1 Learning Tools and Methodology

Students were taught computational thinking using Scratch and “Minecraft: Education Edition”. The teaching tools themselves were not part of the study; the computational thinking knowledge was acquired during regular computer science lessons using the above-mentioned tools. Each class was familiar with both tools, but the main tool for grade 8 was the Scratch platform, while grade 9 learned in the “Minecraft: Education Edition” environment. On the Scratch platform, the students completed “open” tasks using computer science concepts such as loops and conditionals: they wrote programs that draw different shapes and developed a project with self-created characters and environment, implementing a scenario of interaction between the characters. On the “Minecraft: Education Edition” platform, students learnt from pre-designed lessons based on the CSTA and ISTE guidelines [3, 12]. They first completed the tasks from the five block-programming fundamentals lessons provided and were then introduced to the basics of Python programming (also completing the tasks from five lessons). One lesson in “Minecraft: Education Edition” required one or two academic lessons to complete all the activities. On average, both grades had 12–14 lessons using these tools. As all tasks were completed individually and there were no team tasks during this learning period, the cooperativity subconstruct was removed from the CTS test in this study.

2.2 Respondents

In total, 49 students (51% female and 49% male), studying in school grades 8 and 9 (aged 14–16), took part in the survey. There were 24 students of the 8th grade (49%), learning computational thinking with Scratch as the dominant tool, and 25 students of the 9th grade (51%), learning with “Minecraft: Education Edition” as the primary tool.

All respondents were informed of the purpose of the study and gave their voluntary consent to participate.

2.3 Instruments

In this study, besides questions on basic demographic information, the following two instruments were used.

CTt.

A validated instrument consisting of 28 questions [19]. The CTt is claimed to be unidimensional [10], although it addresses 7 cognitive operations (4 items per cognitive operation, arranged in order of increasing difficulty): basic directions and sequences, loops “repeat times”, loops “repeat until”, simple conditional (if), complex conditional (if/else), while conditional, and simple functions. For each question, four answer options are offered, with only one correct. Each item is scored 1 (correct) or 0 (incorrect).

CTS.

A validated computational thinking assessment scale, originally consisting of creativity, algorithmic thinking, critical thinking, problem solving and cooperativity subconstructs, rated on a 5-point Likert scale [13]. In our study, we included all subconstructs of this scale except for cooperativity, as mentioned before.

2.4 Data Analysis

For the analysis of the collected data, quantitative methods were used. Data normality for the whole sample was checked with the Kolmogorov-Smirnov and Shapiro-Wilk tests. The CTt scores were not normally distributed. For this reason, and because the analysis involved relatively small subgroups, we used distribution-free non-parametric measures:

  • To compare differences between two independent samples, the Mann–Whitney U test was used, and \(\upeta^{2}\) was used as an effect size measure.

  • To test the monotonic relationships between pairs of variables, Spearman’s rank correlations were used.

We computed the scores of the tests and their subscales as the sum of the corresponding item scores.
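As an illustration, the non-parametric analyses described above can be reproduced with standard statistical libraries. The following is a minimal sketch, not the actual SPSS workflow used in this study; the data file and column names (ctt_total, cts_total, group) are hypothetical, and the \(\upeta^{2}\) convention \(Z^{2}/(N-1)\) is an assumption that reproduces the reported values.

```python
# Minimal sketch of the non-parametric analyses (hypothetical column names,
# not the SPSS workflow actually used in the study).
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("ct_scores.csv")  # hypothetical file: one row per student

# Normality checks for the whole sample
print(stats.shapiro(df["ctt_total"]))                       # Shapiro-Wilk
print(stats.kstest(stats.zscore(df["ctt_total"]), "norm"))  # Kolmogorov-Smirnov

# Spearman's rank correlation between CTt and CTS total scores
rho, p = stats.spearmanr(df["ctt_total"], df["cts_total"])
print(f"Spearman rho = {rho:.3f}, p = {p:.3f}")

# Mann-Whitney U test between the two learning-tool groups
scratch = df.loc[df["group"] == "Scratch", "ctt_total"]
minecraft = df.loc[df["group"] == "Minecraft", "ctt_total"]
u, p = stats.mannwhitneyu(scratch, minecraft, alternative="two-sided")

# Effect size: normal approximation of Z (without tie correction) and the
# assumed convention eta^2 = Z^2 / (N - 1)
n1, n2 = len(scratch), len(minecraft)
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
eta_sq = z ** 2 / (n1 + n2 - 1)
print(f"U = {u:.1f}, Z = {z:.2f}, p = {p:.3f}, eta^2 = {eta_sq:.2f}")
```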

The reliability of the CTS psychometric scale subconstructs was examined using Cronbach’s Alpha. After evaluating subscale reliability, item 4 of the problem solving subconstruct was dropped to improve subscale reliability. Cronbach’s Alphas for the scale subconstructs were satisfactory (≥0.7): 0.701 for creativity (8 items), 0.765 for algorithmic thinking (6 items), 0.725 for critical thinking (5 items), and 0.703 for problem solving (5 items).
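Cronbach’s Alpha for each subconstruct can be computed directly from the item scores; a minimal sketch under the same assumptions (Python, hypothetical column names) is given below.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's Alpha for a set of items (one column per item, one row per respondent)."""
    k = items.shape[1]                                 # number of items in the subscale
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of the summed subscale score
    return k / (k - 1) * (1 - item_variances / total_variance)

# Example (hypothetical column names for the algorithmic thinking subscale):
# alpha = cronbach_alpha(df[["alg1", "alg2", "alg3", "alg4", "alg5", "alg6"]])
```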

The significance level was set to α = 0.05.

For the statistical analysis, IBM SPSS Statistics 28 software package and MS Excel were used.

3 Results

3.1 An Association of Students’ CTt and CTS Results

In the whole group of students, the CTt scores ranged from 9 to 27 with a mean of 21.2, while the CTS scores ranged from 57 to 107 with a mean of 77.4. The descriptive statistics for the results of both tests, overall and by subscale, are presented in Table 1.

Table 1. Descriptive statistics for test scores (N = 49).

Spearman’s rank correlations for the 49 students were calculated between the CTt scores and the CTS overall scores as well as its subconstruct scores (Table 2).

Table 2. Spearman’s rank correlations between CTS and its subconstructs and CTt scores.

A significant (at the 0.05 level) relationship was found between the CTt and algorithmic thinking scores (\(\uprho = 0.307\), p = 0.032). However, there was no significant association between the CTt scores and the CTS overall results (\(\uprho = 0.174\), p = 0.232). Further analysis of the differences between student groups is presented in the following section.

3.2 CT Assessment Scores in Learning Tool and Gender Groups

In order to examine the relationships between the CTt scores and the CTS scores (including its subconstructs) in the groups studying with different learning tools (“Minecraft: Education Edition”, Scratch) and in the male and female student subgroups, Spearman’s rank correlations were computed (Table 3). Hereafter, we use Minecraft as a shortened form of “Minecraft: Education Edition”.

Table 3. Spearman’s rank correlations between CTS and its subconstructs and CTt scores for Minecraft and Scratch learning groups and gender.

We found a significant monotonic relationship between the CTt and CTS scores in the subgroup of male students (\(\uprho = 0.445\), p = 0.029). However, there were no correlations between the scores in the other subgroups (female students or the subgroups based on the learning tool). The strongest significant monotonic relationship was found between the CTS algorithmic thinking and CTt scores in the group of boys (\(\uprho = 0.576\), p = 0.003). In the group of students learning with Minecraft as the primary tool, this relationship was also significant, but weaker (\(\uprho = 0.414\), p = 0.040).

The differences in CTt scores between the groups studied are presented graphically in Fig. 1.

Fig. 1. CTt scores for male and female students in the Scratch and Minecraft groups.

The mean ranks of the scores for the different test constructs and subconstructs in the groups studied, together with the results of the Mann-Whitney U tests, are presented in Table 4 (light grey shading marks significance at the 0.05 level, dark grey at the 0.01 level).

Table 4. Differences between groups (Scratch, Minecraft, male and female): Mann-Whitney U tests’ results.

The Mann-Whitney U test confirmed a significant difference in CTt scores between the group learning with Scratch as the primary tool (mean rank 20.44) and the group learning with Minecraft (mean rank 29.38): Z = –2.20, p = 0.028. An effect size of \(\upeta^{2} = 0.10\) denotes that 10% of the variance in ranks was accounted for by the CT learning tool used (Scratch or Minecraft).
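Assuming the common convention \(\upeta^{2} = Z^{2}/(N - 1)\) for the Mann-Whitney effect size (the exact formula applied by the software is not stated here), the reported value can be reproduced from the standardized statistic:

\[
\upeta^{2} = \frac{Z^{2}}{N - 1} = \frac{(-2.20)^{2}}{49 - 1} \approx 0.10.
\]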

Significantly higher scores in critical thinking were observed in the Scratch group compared to the Minecraft group (Z = –2.86, p = 0.004, \(\upeta^{2} = 0.17\)). The interquartile ranges of the critical thinking scores, including the subgroups of male and female students, are presented graphically in Fig. 2.

Fig. 2. Critical thinking (CTS) scores for male and female students in the Scratch and Minecraft groups.

Significant differences between the Scratch and Minecraft groups were also found in the scores for the CTt cognitive operations “simple if” (Z = –2.68, p = 0.007, \(\upeta^{2} = 0.15\)) and “while loop” (Z = –2.22, p = 0.027, \(\upeta^{2} = 0.10\)).

When studying the differences between the groups of boys and girls, a significant difference was found in the algorithmic thinking scores: Z = –2.34, p = 0.020, \(\upeta^{2} = 0.12\), with a mean rank of 20.36 for female students and 29.83 for male students.

4 Discussion and Conclusion

In this study, we examined the association between the results obtained with the two computational thinking assessment instruments and the differences in scores across groups defined by the learning tool used to develop computational thinking and by gender.

4.1 Relationship Between Students’ CTt and CTS Scores

Regarding the first research question, no significant relationship was found between the CTt and CTS overall scores. This finding is in line with the recent study by Guggemos et al. [10] and can be explained by the different nature of the instruments. However, it is interesting to note that, when the results of both tests were analyzed separately by gender, the test scores correlated with each other in the group of male students.

Analyzing the results for the individual subconstructs of the CTS test, we also see a monotonic positive relationship between the CTt scores and the CTS algorithmic thinking scores, which supports the results of the study by Guggemos et al. [10]. Thus, in response to the first research question, we can say that the tests do not correlate at the overall level, but that the correlation is influenced by the students’ gender; to fully validate this statement, research with a larger group of students would be required.

4.2 Differences in the CTt and CTS Tests’ Results in Students’ Groups by Learning Tool and Gender

In response to the second research question, which asks how the test results are associated with the teaching tools, we can see that students who studied using “Minecraft: Education Edition” had significantly better CTt results than the students who studied computational thinking in Scratch. These results could be explained by the fact that both “Minecraft: Education Edition” and the CTt are based on the CSTA teaching standards [3]. The pre-designed lessons include the same elements as the CTt test (loops, conditional statements, etc.). This is further confirmed by the results for the separate CTt cognitive operations, such as the “while loop” and the “simple if” conditional. However, students who had studied in the Scratch environment had better results on the critical thinking subconstruct of the CTS test. Critical thinking is defined as “the use of cognitive skills or strategies that increase the possibility of the desired behaviors” [11]. In the context of this definition, the results obtained can be explained by the fact that, unlike with the pre-prepared lessons used in “Minecraft: Education Edition”, in the Scratch environment students had to create their own projects and find custom solutions to achieve the desired outcome. As we can see, different tools develop different computational thinking skills during the teaching process. This should be considered when teaching computational thinking, so that the most versatile computational thinking skills can be developed and assessed with the widest possible range of assessment tools. There is also a need for more research in this area on which tools best develop which computational thinking skills.

In terms of gender, a significant difference was observed in the algorithmic thinking subconstruct of the CTS test: boys showed higher scores in algorithmic thinking than girls. On the one hand, this finding reflects the existing stereotype of computer science and engineering as more male-oriented fields [15, 23]. On the other hand, the findings of Ma et al. [15] show that, before the intervention, the CTS algorithmic thinking scores were slightly higher in the girls’ group, whereas after the intervention the scores became nearly identical, with a non-significant difference in favour of the boys. As Groher et al. [8] mention, “diversity among the students calls for diversity among the teaching and learning materials”, which could be one of the reasons for the different test results.

4.3 Limitations and Future Research Directions

The main limitation of this study was the relatively small sample of students, obtained by convenience sampling. Also, the slightly different age groups of the students (8th and 9th grade) might have had some influence on the results. Nevertheless, the results were in line with the findings of other related studies and provided interesting insights for further research with larger samples and other methods.

In addition to the test results, one observation was made during the course of the study that suggests a way to improve the assessment. When taking the test, students used their hand or the computer mouse to trace through the picture next to each test question on the screen in order to find the correct answer. However, this process was not logged anywhere, and we only saw the selected answer as the test result. Yet, in order to assess computational thinking, we should assess the process of thinking itself; such an approach can be seen in the Zoombinis game [21]. It is possible that a student’s thinking process was partly correct, e.g. right at the beginning with a slight mistake at the end, resulting in a wrong answer, but this is not reflected in the score of a diagnostic test. Logging the solution process could also help to eliminate cases where a student clicked the right answer by chance. In future research, we will focus on how to better assess the thinking process behind the task solution, and not just the final result.