Introduction

Over the past three decades, there has been substantial growth in the amount of research on models and modelling in science education (Chiu & Lin, 2022). International organizations and institutions have outlined the importance of scientific models and modelling practices in learning and teaching science (Ministry of Education, 2022; National Research Council, 2012; Nielsen & Nielsen, 2021). A newer direction of research in this field has focused on developing a competence-oriented perspective on modelling, specifically "modelling competence" (Upmeier zu Belzen et al., 2019). Given the limited research in this area, there is a continuing need for an operational framework of modelling competence that identifies the components that could guide teaching and assessment, specifically process-oriented assessment (Göhner et al., 2022; Nicolaou & Constantinou, 2014; Nielsen & Nielsen, 2021). In addition, while the issue of how to assess modellers' understanding of models and modelling practices has been discussed for three decades, more assessments still need to be developed and validated for researchers to better understand modelling competence (Chiu & Lin, 2019; Nicolaou & Constantinou, 2014).

Because of the complexity of supporting students' engagement in modelling, attention is being paid to understanding teachers' modelling competence. Knowing about aspects of the modelling competence of both pre-service teachers (PSTs) and in-service teachers (ISTs) could inform the further development of teacher education programmes (Göhner & Krell, 2022; Göhner et al., 2022). Much prior research focuses on teachers' understanding of models and modelling (Danusso et al., 2010; Justi & Gilbert, 2002a, 2002b), rather than on teachers' capacities to engage in modelling practices and develop high-quality modelling products (Chiu & Lin, 2019; Göhner et al., 2022). This difficulty is deepened by the ongoing controversy over the relationship between the components of modelling competence, namely meta-modelling knowledge, modelling practices, and modelling products (Cheng & Lin, 2015; Göhner et al., 2022; Schwarz & White, 2005; Sins et al., 2009). One proposed account of how these components interact is that a teacher's meta-modelling knowledge influences their classroom modelling practices, which in turn determine the quality of the resulting modelling products (Vo et al., 2015). The discrepancies between broad theoretical frameworks and their classroom applications underscore the need for refinements in teacher education, and empirical evidence is needed to gauge how modelling competence plays out in real classrooms. Given the evolving nature of modelling techniques and methodologies, teacher education is not static, and continuous professional development of both PSTs and ISTs is warranted.

The present study aims to outline a framework for conceptualizing the theoretical positions of the three modelling competence components: meta-modelling knowledge, modelling practice, and the modelling product. This framework can then serve as a theoretical foundation for developing and implementing assessments of teachers' modelling competence. Finally, the empirical analysis provides evidence for evaluating the framework and for operationalising and refining its theoretical position.

In this study, modelling competence is divided into three components: meta-modelling knowledge, modelling practice, and the modelling product. Two research questions are addressed:

  1. What is the relationship between the three components of modelling competence?

  2. What are the similarities and differences between PSTs' and ISTs' modelling competence components?

Framework of modelling competence in science education

The term competence is described as "domain-specific cognitive dispositions that are required to successfully cope with certain situations or tasks, and that are acquired by learning processes" (Koeppen et al., 2008, p. 62). With a focus on models and modelling in science education, modelling competence is adapted from Weinert's (2001) definition of competence, referring to successful mastery that reflects sufficient knowledge and practice in a particular domain. Consequently, modelling competence reflects a person's potential to combine knowledge about models and modelling with modelling practice to produce a model that meets cognitive needs in specific content areas.

Different frameworks of modelling competence (FMC) have been developed for science education purposes, to explore "how models are used, why they are used, and what their strengths and limitations are, in order to appreciate how science works and the dynamic nature of knowledge that science produces" (Schwarz et al., 2009, pp. 634–635). The components of modelling competence remain contested. Gilbert and Justi (2016) proposed an approach that emphasizes the use of models and modelling to enhance learners' understanding and competence; by allowing students to develop their own models, it encourages active participation and establishes a creative learning environment. Nicolaou and Constantinou (2014) divided modelling competence into two broad categories, namely modelling practices and meta-knowledge. Their FMC supports the claim that students' modelling competence can emerge from active engagement in specific modelling practices and is shaped by meta-knowledge about models and modelling. Other studies have promoted additional components, such as the modelling product (Chiu & Lin, 2019; Göhner et al., 2022; Namdar & Shen, 2015) and subject-specific knowledge (Nielsen & Nielsen, 2021). These various frameworks for how learners understand models and modelling, and the rationale for integrating models and modelling into teaching, have been discussed and refined.

Based on the existing FMCs, the present study outlines a framework describing the theoretical position of three components: meta-modelling knowledge, modelling practice, and the modelling product. As shown in Fig. 1, modelling products can be erroneous, partially correct, or correct. Meta-modelling knowledge includes four dimensions: knowledge about the modelling process, the nature of models, types of models, and the purpose of models. Twelve different modelling practices are distributed across the stages of the modelling process (e.g., comparing the experimental data, generating an assumption, creating and identifying new elements; see Fig. 1). Since previous FMCs share many similarities, the aim of this study is not to construct a new framework but to outline one that describes how knowledge, practice, and products are connected. At the same time, this framework presents the structure of modelling competence in full and guides empirical research in evaluating and analysing how modellers apply their knowledge and practices in the process of developing a modelling product. The following paragraphs describe each element and its composition in more detail.

Fig. 1 The framework of modelling competence

Meta-modelling knowledge is defined as knowledge about models and modelling, concerning "how models are used, why they are used, and what their strengths and limitations are" (Schwarz et al., 2009). Modellers with meta-modelling knowledge can understand the nature of science and reflect on their ability to use and develop scientific models, appreciating how science works and the dynamic nature of its knowledge (Abd‑El‑Khalick et al., 2004; Schwarz & White, 2005). This component is usually used to assess learners' knowledge about models and modelling and was classified into the following four dimensions.

  • Nature of models refers to the understanding that a model is a representation of a process or phenomenon with a specific purpose, and that models are changeable (Lee et al., 2017).

  • Types of models generally include six types: the concrete model, verbal model, visual model, mathematical model, gestural model and mixed model (Boulter & Buckley, 2000).

  • Purpose of models includes two categories: basic functions (describing, visualizing and explaining) and advanced functions (standard reference, reasoning, problem solving, communicating, predicting, simulating and generating new ideas) (Lin, 2014).

  • Modelling process refers to the sequence through which models are generated, evaluated, and modified in modelling-based learning or teaching to achieve the educational goal (Khan, 2007).

Modelling practices can be defined as the cognitive, discursive and social activities that take place in science classrooms, that are related to the modelling process, and that are aimed at developing epistemic understanding of science concepts and appreciation of the nature of science (Fretz et al., 2002; Jimenez-Liso et al., 2021; Ke & Schwarz, 2019). The science education literature addresses modelling practice through several theoretical frameworks that use overlapping terms to describe its activities, phases, or processes (Göhner et al., 2022). A number of core modelling practices take place during modelling processes; Louca and Zacharia (2012) found consensus in the literature that the modelling process involves a stimulus and four discrete steps. Modelling practices are incorporated in the model-based teaching and learning environment, which relates to the theoretical framework used to describe modelling processes and to operationalise their assessment (Göhner et al., 2022). Since this study aims to explore how participants apply their modelling practices between the experimental world and the modelling world, modelling practices/activities were identified in each discrete step based on research by Göhner and Krell (2022) and Krell et al. (2019) in relation to the BB experiment. This form of scientific modelling aims to gain understanding of a complex real-world system by using relevant modelling activities; it differs from most modelling frameworks in that it explores the experimental and physical world rather than mental model development.

The modelling product has been defined as follows: "the main outcome of any modelling process is the development of a tangible, visible, and communicable artefact that demonstrates the modeller's understanding and that can be evaluated by specific criteria for its quality" (Göhner et al., 2022). Modelling products can be used to externalise and express learners' thoughts (mental models) and help them visualise and examine components of their theory (conceptual models and scientific models) (Gentner & Stevens, 2014; Göhner et al., 2022; Greca & Moreira, 2000; Hamza et al., 2008; Nicolaou & Constantinou, 2014). Regarding evaluation, the science education literature treats modelling products as a more content-related approach focusing on the integration of specific components (Baumfalk et al., 2019; Göhner et al., 2022). The modelling products in this study were classified as correct, partially correct, or erroneous models. The quality of a modelling product depends on the degree to which it correctly and fully represents the characteristics of the phenomenon, provides a mechanism that accounts for how the phenomenon operates, and can be used to formulate predictions about observable aspects of the phenomenon (Pluta et al., 2011).

Methods

Participants

The participants in this study comprised two groups selected via convenience sampling. These participants were chosen because they shared two characteristics: (1) all of them were enrolled in the same teacher education programme at the same university, a chemistry education teacher education programme (CE-TE); and (2) they had never experienced formal modelling-based training. The study included 38 pre-service teachers (PSTs) studying at a comprehensive public university in the south of China, and 38 in-service teachers (ISTs), all employed at urban schools, from primary to senior high school, as science or chemistry teachers. Table 1 summarizes the participants' background information.

Table 1 Background information of pre-service teachers and in-service teachers

Participants in the PST group, comprising seven males and thirty-one females, were first-year Master's students majoring in chemistry education, enrolled in the second semester of the CE-TE programme. Most were 21–23 years of age; the predominance of females is a potential limitation and is addressed in the Limitations section. 92% of the PSTs majored in chemistry for their bachelor's degree, while only three pursued non-chemistry majors, such as business and accounting. However, all of them passed the national master's entrance exam, which indicated that all PSTs had the knowledge and skills required for the teacher education programme.

In the context of this study, ISTs (n = 33) refers to specialist science teachers who teach in primary or middle schools, covering grades K1–K9. In China, science is treated as an independent subject with dedicated teachers responsible for its instruction. The teachers in this study who work in high schools (n = 6) all teach chemistry, having obtained master's degrees in chemistry education. All of the ISTs graduated from the CE-TE programme, and their total teaching experience and science teaching experience ranged from one to three years. Araujo et al. (2016) refer to teachers with up to three years of experience as "rookies". Thus, the selected samples of PSTs and ISTs allow a comparison of whether teaching experience is an influential factor in modelling competence.

Instruments

Two instruments were developed to measure the three components of modelling competence. A Likert-scale questionnaire was used to evaluate meta-modelling knowledge, and a Black Box (BB) modelling task was applied to assess modelling practices and modelling products. Two different techniques, the think-aloud method and drawing, were used to collect data on modelling practices and products respectively.

Meta-modelling knowledge questionnaire

The questionnaire comprised two parts. In the first part, the participants were asked to provide background information including a personal identifier, age, gender, bachelor's degree major, and total years of teaching experience. The second part consisted of 20 items collecting information about the four meta-modelling knowledge dimensions: the nature of models (4 items), the purpose of models (6 items), the types of models (6 items) and the process of modelling (4 items). Each item had five options (strongly disagree, disagree, not sure, agree, and strongly agree), assigned scores of 1 to 5 respectively. Since the adjective labels on the scale were paired with a meaningful numerical interpretation, it is appropriate to treat the rating-scale data as interval rather than ordinal data (Taber, 2018).

Before the final questionnaire was generated, the original 35-item questionnaire underwent testing to ensure sufficient validity and reliability. The design of each item in each dimension was based on previous studies (Grosslight et al., 1991; Lee, 2018; Lin, 2014; Treagust et al., 2002). A pilot study was implemented to test the appropriateness of the instruments and ensure content validity. The pilot study involved nine participants, including two PSTs in year two of the CE-TE programme, two PhD students in science education, two ISTs with one year of experience teaching science in a primary school, and one lecturer with a PhD in science education who is a member of the science education academic staff in the CE-TE programme. This helped strengthen the match between the items and their relevant dimensions and simplify the wording and expression of the questionnaire. Confirmatory factor analysis (CFA) was conducted to evaluate whether the proposed four-factor structure fit the modellers' answers (n = 76). A four-factor model was generated with acceptable goodness-of-fit indices (χ2/df = 1.340, GFI = 0.955, CFI = 0.916, NNFI = 0.932, and RMSEA = 0.0067). Finally, Cronbach's alpha for each dimension ranged from 0.739 to 0.931, indicating that the scale had strong internal consistency (McCrae et al., 2011). These indicators showed that the questionnaire could be regarded as reliable.
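To illustrate the internal-consistency check, the minimal Python sketch below computes Cronbach's alpha for one questionnaire dimension; the 76 × 4 score matrix is a randomly generated placeholder, not the study's data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the dimension
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical placeholder: 76 respondents x 4 "nature of models" items,
# each scored 1-5 on the Likert scale described above.
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(76, 4))
print(f"alpha = {cronbach_alpha(scores):.3f}")
```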

Black Box modelling task

To determine whether participants could use modelling practices to generate a modelling product for problem-solving in a specific situation, they were asked to work through one modelling task with three rounds of experimental data, namely the BB (see Fig. 2). This task was revised from previous studies by Göhner et al. (2022) and Krell et al. (2019). The three data sets required participants to generate different models. The given data comprised the input of water poured into a black box and the observed output of water in each round. The second and third rounds of data provided opportunities to evaluate and modify the models that participants had created. Participants were asked to reason about why the output of water changed in each round, generate hypotheses about what was happening, make predictions, and eventually attempt to infer the hidden mechanics of the box. Participants were instructed to think aloud while modelling the Black Box to give insight into their modelling processes. The model they drew from the third-round data set was considered the indicator of the quality of their modelling product.

Fig. 2 The Black Box modelling task

Data analysis

Both quantitative and qualitative analyses were used in this study. Descriptive statistics for the meta-modelling knowledge rating-scale questionnaire were used to calculate the mean and standard deviation of each item. Since the Kolmogorov–Smirnov test indicated the data did not follow a normal distribution (D = 0.1, p = 0.02 < 0.05), Spearman's correlation coefficient was examined among the three components for the PST and IST groups respectively, to observe whether there were any correlations among them. Wilcoxon rank-sum tests were then carried out to examine differences between the two groups in meta-modelling knowledge, practice and product, and the effect size r was calculated. The interpretation commonly found in the published literature (Göhner et al., 2022; Mulder et al., 2016) is: 0.10 ≤ r < 0.30 (small effect), 0.30 ≤ r < 0.50 (moderate effect) and r ≥ 0.50 (large effect).
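As a minimal sketch of this pipeline (with randomly generated placeholder scores, not the study's data), the SciPy calls below run the normality check, the Spearman correlation, and the Wilcoxon rank-sum test; the effect size is computed as r = |Z|/√N, a common convention behind the thresholds above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical component scores for the two groups.
pst_knowledge = rng.normal(3.8, 0.5, 38)
ist_knowledge = rng.normal(4.1, 0.5, 38)
pst_practice = rng.integers(1, 5, 38)   # rubric levels 1-4
pst_product = rng.integers(1, 5, 38)

# Kolmogorov-Smirnov test against a fitted normal distribution.
pooled = np.concatenate([pst_knowledge, ist_knowledge])
D, p = stats.kstest(pooled, "norm", args=(pooled.mean(), pooled.std(ddof=1)))
print(f"KS: D = {D:.2f}, p = {p:.3f}")

# Spearman's correlation between practice and product within one group.
rs, p_rs = stats.spearmanr(pst_practice, pst_product)
print(f"Spearman: rs = {rs:.2f}, p = {p_rs:.3f}")

# Wilcoxon rank-sum test between groups, with effect size r = |Z| / sqrt(N).
Z, p_Z = stats.ranksums(pst_knowledge, ist_knowledge)
r = abs(Z) / np.sqrt(len(pst_knowledge) + len(ist_knowledge))
print(f"Rank-sum: Z = {Z:.2f}, p = {p_Z:.3f}, r = {r:.2f}")
```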

An additional step was to quantify the qualitative data. Quantifying qualitative data involves converting subjective, descriptive information into objective data, such as coded data and rating-scale data (Krippendorff, 2018; Miles & Huberman, 1994). To evaluate participants' modelling practices, cognitive processes such as solving problems or thinking abstractly were uncovered by asking participants to think aloud (Ericsson & Simon, 1998). After training, the participants were asked individually to describe aloud the process they followed to solve the BB modelling task. Their speech was recorded with electronic devices and then transcribed for coding. A deductive approach was applied, since a theoretical conception of the process of scientific modelling in a BB modelling task is available in the studies by Krell et al. (2019); this was supplemented by an inductive refinement of the modelling practices. A rubric of modelling practices established by previous studies (Cheng et al., 2021; Khan & Krell, 2019) and the revised Bloom's taxonomy (Jensen et al., 2014) was used. As shown in Table 2, four levels were created to assess respondents' modelling practices. Because participants' responses were highly circular and iterative when they outlined a complex system, it was difficult to give a score for each sub-practice; the scoring rubric therefore captured the cognitive level demonstrated across the interactions as respondents performed the BB task.

Table 2 The description and examples of a scoring system for assessing modelling practice

This study employed the draw-a-picture research technique to assess the quality of the participants' modelling products. Drawings in this study included diagrams, graphs, images, or other visual representations combined with text made by the participants; producing pure text or essays was not considered drawing. According to the criteria for good scientific models by Namdar and Shen (2015) and Cheng et al. (2021), modelling products were assessed for quality based on the degree to which, and the way in which, the model reflected two features: (1) correctness and completeness of symbolic representations (such as the accuracy of the instrument and the proficiency of explanatory texts used in a model); and (2) how well the whole modelling product coherently reflected the underlying mechanism of the BB. The scoring rubric used four levels to cover all aspects of the participants' drawings, as shown in Table 3.

Table 3 A scoring rubric of modelling products

Two independent coders were trained to code all qualitative data. Inter-coder agreement (Cohen's kappa) reached κ = 0.75 and 0.83 for modelling practices and modelling products respectively, values regarded as substantial and almost perfect agreement (Landis & Koch, 1977). To resolve differences between the two coders, the coders and the first author then discussed each discrepancy until agreement was reached.
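Agreement statistics of this kind can be reproduced with scikit-learn's cohen_kappa_score; the two codings below are hypothetical examples, not the study's transcripts.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical rubric levels (1-4) assigned by two independent coders
# to the same ten think-aloud transcripts.
coder_a = [1, 2, 2, 3, 4, 1, 3, 3, 2, 4]
coder_b = [1, 2, 3, 3, 4, 1, 3, 2, 2, 4]

print(f"Cohen's kappa = {cohen_kappa_score(coder_a, coder_b):.2f}")
```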

Results

Table 4 shows the results for Spearman's correlation coefficient and significance in different sample settings. Significant correlations were found between practice and product (rs = 0.62, p < 0.01) for the entire sample, and the same pattern occurred for the PST group (rs = 0.78, p < 0.01). The IST group showed a similar pattern of correlation between practice and product, though slightly weaker (rs = 0.42, p < 0.05). However, the results showed non-significant correlations between knowledge and the other two components, ranging from rs = -0.16 to rs = 0.22 at the p > 0.05 level. This suggested that a high level of modelling practice was associated with high-quality modelling products, and vice versa, whereas the level of knowledge did not appear to affect modelling practice or the quality of the product.

Table 4 Results of Spearman’s correlation coefficient and significance in different sample settings

Table 5 reveals a significant difference in meta-modelling knowledge (Z = -2.38, p = 0.017) with a moderate effect size (r = 0.39) between the two groups, but no statistically significant difference in modelling practice or product. This suggested that ISTs had more advanced meta-modelling knowledge than PSTs but that both groups performed at an equivalent level in modelling practice and modelling product.

Table 5 Wilcoxon rank-sum test for the two groups’ three components of modelling competence

A more in-depth analysis of the modelling practice and product data was then conducted. Regarding modelling practices, 12 codes (practices) were identified empirically from the sample's responses. The frequencies of each code in both groups are presented in Table 6; the numbers indicate how many times a single code appeared across all responses in the group, and the percentages indicate the ratio of a single code to all codes in that group.

Table 6 Frequencies of codes for modelling practice in both groups
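As an illustration of the tallying behind Table 6, the pandas sketch below counts code occurrences per group and converts them to within-group percentages; the coded segments and column names are hypothetical.

```python
import pandas as pd

# Hypothetical coded segments: one row per occurrence of a practice code.
segments = pd.DataFrame({
    "group": ["PST", "PST", "PST", "IST", "IST", "IST", "IST"],
    "code": ["comparing data", "explaining and justifying the model",
             "comparing data", "comparing data", "generating an assumption",
             "comparing data", "using technologies"],
})

counts = pd.crosstab(segments["code"], segments["group"])
percentages = counts / counts.sum() * 100  # share of all codes in each group
print(counts)
print(percentages.round(1))
```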

In general, both groups covered all codes of modelling practice, and the frequency of each code had similar percentages, indicating that both groups performed equally well in modelling practice when solving a complex problem, consistent with the quantitative results. The most frequent code for both groups was comparing data (23% for PSTs and 31% for ISTs). Explaining and justifying the model also had a high percentage in both groups (19% for PSTs and 20% for ISTs). It is worth noting that this practice often occurred together with creating and identifying new elements. This illustrated that even though these modellers had experienced little or no modelling training, they were able to interpret the constructed models or elements. For the code using technologies, most modellers created a model with physical structures, including pipes, switches, and beakers; only a few (n = 3) used technology, such as sensors controlled by programming systems. Taken together, these results suggested that these novice modellers used a variety of modelling practices to solve a complex system, and the frequencies of these practices were nearly identical for both groups.

Figure 3 shows that PSTs' modelling practices seemed to fall into two groups, the very naïve and the relatively competent (55% at level 3 and level 4 in total), whereas most ISTs were relatively naïve. 43% of PSTs' modelling practices were only at level 1, with a low level of cognition (13 out of 33). At this level, modellers in both groups were usually able to answer the first round of the BB experiment by drawing the required model and clearly explaining how the water flowed through the specific structure, but they had no idea how to answer the second round. A higher percentage of PSTs (31%, n = 10) than ISTs (14%, n = 5) reached level 4, which involves higher-level aspects of modelling practice, synthesizing different modelling aspects to solve a problem, such as evaluating and reconstructing models. Modellers at this level usually started working on the third round of the experiment; as they considered the more complex conditions arising in the input and output data, they iterated through comparing, evaluating, testing, and revising.

Fig. 3 The percentage of each level of modelling practice in both groups

Modelling products

Drawings were used as the basis for assessing participants' modelling products. In this approach, the two groups of teachers created drawings of the scientific phenomenon (the BB), using iconic language to represent system behaviours and visualize the modelling process according to each round of experimental data. The final drawing was used to examine the accuracy and quality of the modelling product. Figure 4 illustrates the percentage of each scoring level of participants' drawings of the BB structure in both groups.

Fig. 4 The percentage of each scoring level of participants' drawings of the BB structure in both groups

Across both groups, relatively few participants were at levels 1 and 4. 21% of PSTs were at level 1 (7 out of 33), while fewer than 3% of ISTs were at this level (1 out of 35). A relatively small percentage of modellers provided a fully correct model (level 4) applicable to the given experimental data. The majority of in-service teachers (60%, n = 21) developed a partial product of medium quality (level 3); they were able to create a model that basically satisfied the given situation, but a few elements and structures were unclear. Although 42% (n = 14) of the PSTs reached level 3, the highest percentage among the four levels, this was still fewer than the ISTs. At level 2, 31% of ISTs (n = 11) and 24% of PSTs (n = 8) created a partial product of low quality: these models appropriately reflected the observed phenomenon, but their mechanisms could not account for it.

It was instructive to analyse the correct model products and identify the constraints of the partially correct ones. A total of six modellers created correct model products (PST = 4, IST = 2). Of these, four modellers (PST = 3 and IST = 1) created a physical model product to present the structure of the BB. These models included physical elements such as pipes, valves, bulkheads, pulleys, and containers of different volumes. Although these products were not consistent with the sample model (presented in the methodology section), they correctly conformed to the experimental data. The other two modellers (PST = 1 and IST = 1) drew a model using technological elements, including sensors and computer-controlled systems. Figures 5 and 6 present correct examples of a physical model and a technology-supported model respectively.

Fig. 5 The correct drawing of a physical model

Fig. 6 The correct drawing of a technology-supported model

Figure 5 displays a correct drawing of a physical model, given by a PST (a and b represent containers; c, e and f valves; the dotted line is the connection line of the fixed pulley). He illustrated the mechanism of the BB as follows: the volume of both containers (a) and (b) was 200 ml, and container (c) was 600 ml. When there were 400 ml of liquid in the system, the gear drove the fixed pulley to close switch (e). When there were 1400 ml of liquid in the system, the gear drove the fixed pulley to open (f) but close the load-bearing valve; when the volume reached 600 ml, the valve opened. When there were 2400 ml of liquid in the system, valve (f) opened. From these detailed descriptions, he succeeded in identifying the relationships between the variables underlying the experimental data and physically connected them with different elements (e.g., switch, gear, and valve). This showed that the modeller had high-level modelling practice and robust scientific knowledge. The other correct models showed similar results.

Physically creating a model of the BB appeared challenging for PSTs and ISTs alike. One interesting correct model, drawn by a PST, used a computer-supported system to control all elements rather than complex physical structures (see Fig. 6; note: A, B, and C represent containers; a and b represent pipelines; k1–k4 represent electric valves). In this product, four switches were controlled by a computer, which operated different switches according to the amount of water intake; the water then flowed along different paths and finally into the measuring cylinder. Although technology supported the implementation of the system, creating it still required design thinking as well as relevant physical understanding.

Discussion

To address the first research question, this study investigated the relationships among the three components of modelling competence in the PST and IST groups. In line with the recent assumption of Göhner et al. (2022), the results further supported the view that modellers with stronger modelling practices develop higher-quality modelling products. In contrast to the assumption that meta-modelling knowledge guides practices (Cheng & Lin, 2015; Schwarz et al., 2009), the analysis of think-aloud data found that modellers did not express much related meta-modelling knowledge. What mattered more in this modelling environment was that modellers applied iterative modelling practices to solve the BB modelling task: a product resulting from more modelling activities tended to have more elements and greater explanatory power.

Meta-modelling knowledge, modelling practices and modelling products make distinct contributions to the process of problem-solving. Although meta-modelling knowledge did not affect modelling practices or modelling products here, it offers a fundamental theoretical framework for comprehending the interactions between different system components and informing modelling activities (Ke & Schwarz, 2016; Schwarz & White, 2005). Modelling practices, in turn, are the processes through which modelling products are developed and enhanced, which explains their positive correlation with modelling products. Modelling products are thus the outcomes of modelling practices, and their quality reflects the quality of those practices.

Comparing the two groups' responses to the meta-modelling knowledge items showed that ISTs had a higher level of knowledge than PSTs. This finding may indicate that ISTs who had never experienced modelling-based training could advance their meta-modelling knowledge by reflecting on their daily teaching experience. Few studies have examined differences in meta-modelling knowledge between PSTs and ISTs. The present results align with the studies of Justi and Gilbert (2002a, 2002b, 2003), which indicated that experienced teachers were aware of the value of models in learning science. However, Van Driel and Verloop (2002) investigated modelling knowledge in groups of experienced teachers with different science backgrounds and found that teachers' subject and teaching experience had no relationship with their responses towards models and modelling in science. This inconsistency may be due to different national curricula and contexts (Gogolin & Krüger, 2018). Informal teaching activities in ISTs' daily school environment, such as learning from teaching resources and peer learning, may have helped them engage with various model and modelling scenarios.

Although the study found that meta-modelling knowledge appeared to be related to teachers' actual teaching experience (with significant differences between the two groups), there was no difference between the groups in modelling practices. Looking further, the levels of modelling practice in the PST group divided into two main clusters, naïve and relatively competent, whereas ISTs mainly performed lower-level practices. Consistent with previous studies, teachers with different amounts of science teaching experience did not differ significantly in their modelling performance unless they were experts (Hogan et al., 2003; Kang et al., 2018; Van Driel & Verloop, 2002). In actual teaching, ISTs' modelling practices are mainly reflected in their instructional sequences, that is, how teachers apply modelling pedagogy to support students in constructing scientific knowledge (Schnotz & Bannert, 2003; Tytler & Hubber, 2016; Van Driel & Verloop, 1999; Xue & Sun, 2022). In practice, they rarely have the opportunity to experience authentic modelling scenarios (Gilbert, 2004; Prins et al., 2009, 2011). Therefore, support and professional learning are a priority for all teachers, not just beginners, and authentic modelling environments are needed to promote teachers' modelling practices (Graham et al., 2020; Hamza et al., 2008; Stammen et al., 2018).

An innovative aspect of the study is the analysis of the modelling product, which has received little attention in previous related studies, as noted by Chiu and Lin (2019) and Göhner et al. (2022). The results revealed that both groups developed only a very limited number of correct models, though more partially correct ones. The literature suggests modelling products serve as indicators for evaluating students' meta-modelling knowledge and modelling practices (Cheng et al., 2021; Göhner et al., 2022; Krell et al., 2019; Schwarz et al., 2007). The present results indicate that modellers with sophisticated meta-modelling knowledge do not necessarily create high-quality modelling products. The reason might be that a correct modelling product requires not only high-level modelling competence but also scientific knowledge and other skills, such as hands-on skills, technical literacy and drawing (Göhner et al., 2022). In the BB modelling task, scientific knowledge of water pressure and siphoning would influence how participants developed their modelling products from the observed input and output water data. In addition, some participants designed technology-based elements in their drawings (sensors and artificial intelligence) to describe the inside structure of the BB rather than the physical structure most participants created.

Limitations

One of the main limitations of this study concerns sampling. Participants were not randomly assigned to groups (Ledford, 2018); the researcher was only permitted to gather data at the selected university, specifically in the chemistry cohort. The sample consisted predominantly of females, which may limit the generalizability of the results to male or mixed-gender populations; future studies should consider gender diversity to ensure that the findings are valid across genders. Regarding the ISTs, a few were from high schools and most came from grades K1–K9; likely differences in teacher training, pedagogy and scientific skills may influence the generalisability of the results. Moreover, selecting only ISTs with 1–3 years of experience may limit the generalizability of the findings to in-service teachers with more or less teaching experience, and this should be recognized when interpreting and applying the results to other populations of teachers. Future studies could include teachers with a wider range of experience. The results would also have been stronger with random sampling, which would ensure that the sample represented the science teacher education programme as a whole and give a more comprehensive picture of how ISTs across different educational levels perform in modelling competence.

Another limitation concerns the Black Box approach. As a modelling task for assessing modelling processes, the black box is rather abstract and complex and may have limited participants' engagement (Leden et al., 2020). Although the BB task developed here was simplified from the original, some instruction was still needed before modellers started the task, to ensure that all participants could fully understand it.

Finally, using the proposed scoring system for the overall analysis of modelling practice, the researcher could monitor and assess overall performance. However, this scoring system is acknowledged to be a somewhat coarse way to describe the entire modelling process; it cannot be claimed that a modeller with level 4 advanced modelling skills is competent in all of the modelling activities in this research. In applying the defined scoring criteria for modelling practices, researchers should therefore evaluate which modelling activities modellers use and how they integrate them (independently or additively).

Conclusions and implication

Modelling competence has been characterised in different studies, which indicate that modellers with sufficient knowledge of models and modelling conduct various modelling practices to generate modelling products in response to specific questions, both theoretically and in authentic scenarios (Weinert, 2001). The present research proposed a framework of modelling competence with three components (meta-modelling knowledge, modelling practice and modelling products) and investigated the relationships between these components in two groups (PSTs and ISTs). This aimed to enrich the development of theory and practice in modelling competence, as well as to fill research gaps in the assessment of modelling competence in science teacher education.

In response to the first research question, this study showed that meta-modelling knowledge had no relationship with either modelling practices or modelling products, but modelling practices were positively correlated with modelling products. This implies that during problem-solving, meta-modelling knowledge did not play a measurable role in guiding modelling practice or generating high-quality modelling products; higher-level modelling practice was instead reflected in the analysis of correct model products. Regarding the second research question, both groups showed a high level of meta-modelling knowledge, with ISTs higher than PSTs. Neither group performed well in modelling practices and products, and there was no difference between the two groups. A high-quality modelling product requires the modeller not only to apply high-level modelling practices but also more practical abilities, which may be closely related to STEAM disciplines.

The findings are instructive in that in-service teachers' experience of teaching in schools helped improve their meta-modelling knowledge, but their modelling practices and products were the same as those of PSTs, which is informative for future IST professional development. The results also contribute to filling the gap in the literature on modelling competence development in pre-service and in-service professional development (Göhner et al., 2022; Nielsen & Nielsen, 2021). Another contribution is the development and validation of modelling competence assessment tools and scoring rubrics for process-oriented rather than product-oriented assessment. The BB modelling task is often used in science education to assess students' scientific literacy (Göhner & Krell, 2022; Göhner et al., 2022; Krell et al., 2019). Previous research has used physical black boxes to assess students' modelling practices, which allows students to experience real-life scenarios but is not suitable for large-scale testing (it is usually reserved for case studies). This study adapted the dynamic task into a static one while retaining the characteristics of a modelling task: a description of the stem, followed by different rounds of experimental data guiding participants to generate a model, evaluate it and modify it. Modelling practice was analysed through a think-aloud approach, and the modelling products were assessed in the form of drawings. The findings suggest that the task and scoring rubrics developed here can be used to assess modelling practices and products. The assessment went through a pilot study and formal experimental testing, which strengthened the reliability and validity of the research instrument.

This study equips the global science education community with a methodical framework for modelling competence that can be employed in diverse educational contexts, amplifying its relevance to the primary and middle school levels. For example, the research model can serve as an assessment tool for evaluating primary and middle school students' abilities in scenario-based modelling tasks, providing educators with insights into areas of strength and needed improvement. It could also serve as a theoretical foundation for curriculum design aimed at fostering modelling competence, ensuring students not only grasp the concept but also apply it proficiently in varied scientific contexts. The research outcomes also highlight a universal area in need of improvement in science teacher education, regardless of teaching experience: while theoretical knowledge is essential, its translation into practical application remains challenging. This distinction is vital for shaping the curriculum in teacher training programmes, ensuring that educators are not just theoretically equipped but also adept at practical application. By clarifying these disparities and their implications, this study paves the way for targeted professional development programmes, preparing teachers to facilitate the intricacies of modelling competence for their students, especially in the primary and middle years.

Future studies could continue to develop the theoretical framework of the three components and incorporate it into different educational levels, disciplines and modelling tasks. Responding to the latest call by Schwarz et al. (2022) to support and assess the practices, further studies should explore and assess individual modelling activities, identifying which activities influence overall modelling practice and what impact they have on the final products. Finally, modelling practices had a positive relationship with modelling products, while the qualitative data indicated that the modeller's scientific knowledge and engineering skills affect both practices and products; more evidence is needed to support this view.