
1 The Assessment of Science Competency in Primary School

In recent years, science learning has been described within the construct of scientific literacy, which has been conceptualized with various facets, distinguishing a component of conceptual understanding from a component of procedural understanding (Bybee 1997). While there are theoretically grounded and empirically validated models of science competency for secondary schools, corresponding efforts are rare within the growing body of research on competency development in primary schools (e.g., Walpuski et al. 2011). Hence, in our project we aimed to model the development of science competency in primary school along the two dimensions of scientific reasoning (e.g., Koerber et al. 2017, in this volume) and conceptual understanding. The latter is the focus of this chapter.

To derive a theoretically plausible and empirically testable competency model of conceptual understanding in primary school science, it appears suitable to draw on the findings of conceptual change research in developmental psychology and science education. This research has revealed that students bring a wide range of individual, content-specific ideas and conceptions to the science class; these have the potential to hinder or foster formal science learning. Aside from the nature of students’ naïve conceptions, conceptual change research has also explored the pathways along which these evolve (e.g., Schneider and Hardy 2013). In this context, we pursued three main goals: (a) modeling primary school students’ conceptual understanding in the content areas of floating and sinking (FS) and evaporation and condensation (EC) with paper-pencil tests in order to empirically validate a competency model using large groups of students, (b) investigating the development of conceptual understanding over the course of primary school, and (c) examining the relation between students’ conceptual understanding and scientific reasoning.

2 Modeling Conceptual Understanding in Primary School Science

2.1 Model Specification and Item Construction

We first hypothesized a competency model with three hierarchical levels of increasing understanding: At the naïve level, students hold scientifically inadequate conceptions which, through processes of restructuring or enrichment, may develop into intermediate conceptions. These contain partly correct conceptualizations and are applicable in a broader range of situations than are naïve conceptions. At the scientifically advanced level, finally, students hold conceptions in accordance with scientifically accepted views (Hardy et al. 2010). Within this framework, we designed a construct map as the foundation for item development (Wilson 2005). For each content area, this map contained detailed descriptions of possible student conceptions at each level of understanding. These conceptions were extracted from conceptual change research (e.g., Hsin and Wu 2011; Tytler 2000).
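To make the structure of such a construct map concrete, the following sketch organizes one facet as a simple data structure. The level labels follow the model described above; the sample conceptions are paraphrased illustrations, not the project's actual entries.

```python
# Minimal sketch of a construct map for one facet (evaporation).
# Level labels follow the three-level competency model; the listed
# conceptions are paraphrased illustrations, not the project's entries.
construct_map = {
    "evaporation": {
        "naive": [
            "the water simply disappears",
            "the sun drinks up the water",  # anthropomorphic interpretation
        ],
        "intermediate": [
            "the water goes somewhere else (change of location only)",
        ],
        "advanced": [
            "the water turns into invisible water vapour in the air",
        ],
    }
}
```

In item development, each entry at each level would then be translated into one or more response alternatives for the corresponding item stems.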

Translating the conceptions identified in conceptual change research into a paper-pencil instrument suitable for testing groups of primary school students posed a considerable challenge for item development. Specifically, the test instrument had to incorporate and represent different levels of conceptual understanding without inducing artificial response tendencies and preferences. Using the construct map, we designed items with mainly closed response formats; the response alternatives represented varying levels of conceptual understanding. Response formats were forced-choice (select the better of two alternatives), multiple-choice (select the best of three to six alternatives), or multiple-select (judge three to six alternatives consecutively as true or false). In addition, a few items with open and graphical response formats were constructed (Kleickmann et al. 2010).

For all items, the stems consisted of descriptions of physical phenomena relevant to the two content areas. Those phenomena that could be presented in a classroom were demonstrated during administration of the test (see Fig. 2.1). After presentation of a specific phenomenon, students had to select or, in the rare case of open response formats, produce an explanation for that phenomenon. For multiple-select items, students could select several explanations simultaneously (see Fig. 2.1). To minimize the impact of reading ability on students’ performance, descriptions of phenomena and response alternatives were read aloud. Students in participating classes proceeded simultaneously through the test within 90 min. The majority of items represented explanations at the naïve level, owing to the wealth of naïve conceptions identified by previous research. In general, primary school students were considered to demonstrate proficient conceptual understanding by dismissing naïve explanations and endorsing intermediate or scientifically advanced explanations.
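The rationale of dismissing naïve alternatives while endorsing more advanced ones can be illustrated with a small sketch; the function and the dichotomous scoring rule are our own simplification, not the published coding scheme.

```python
def score_multiple_select(responses: dict, levels: dict) -> bool:
    """One plausible dichotomous scoring rule for a multiple-select item
    (a sketch, not the project's actual coding scheme): credit the item
    only if the student rejects every naive alternative and endorses at
    least one intermediate or scientifically advanced alternative.

    responses -- alternative id -> True (endorsed) / False (rejected)
    levels    -- alternative id -> 'naive' | 'intermediate' | 'advanced'
    """
    no_naive = all(not responses[a] for a, lv in levels.items() if lv == "naive")
    some_correct = any(
        responses[a] for a, lv in levels.items() if lv in ("intermediate", "advanced")
    )
    return no_naive and some_correct

# Hypothetical condensation item with three alternatives:
levels = {"a": "naive", "b": "intermediate", "c": "advanced"}
hit = score_multiple_select({"a": False, "b": True, "c": True}, levels)
miss = score_multiple_select({"a": True, "b": True, "c": False}, levels)
```

Under this rule, `hit` is credited (naïve alternative rejected, advanced ones endorsed) while `miss` is not, because the naïve alternative was endorsed.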

Fig. 2.1 Sample item: condensation

2.2 Conceptual Understanding: Dimensions and Levels

To examine the dimensionality of our test instrument, we fitted one-parameter logistic item response models of varying dimensionality to the data of a cross-sectional study with 1820 second, third, and fourth graders, using ACER ConQuest 2.0 (Wu et al. 2005). A likelihood ratio test of relative model fit demonstrated that a model featuring the two content areas as separate dimensions fitted the data better than a unidimensional model (Δχ2(2) = 246.83, p < .001, ΔAIC = 242.88, ΔBIC = 231.71; Pollmeier 2015; Pollmeier et al. in prep.). This finding supported the notion that competency in certain content areas might develop separately from that in other content domains. Thus, further analyses of the cross-sectional data were performed separately for each content area. The two-dimensionality established for the cross-sectional data set was consistent with the results of preliminary studies (Pollmeier et al. 2011).
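For transparency, the following sketch shows how the differences in AIC and BIC follow arithmetically from the deviance difference and the number of additional parameters. The function is ours; the small deviations from the reported values presumably reflect rounding of the reported Δχ².

```python
import math

def information_criteria_delta(delta_chi2: float, delta_params: int, n: int):
    """Given the deviance difference between a nested unidimensional and a
    two-dimensional model, the difference in the number of estimated
    parameters, and the sample size, return the corresponding differences
    in AIC and BIC. Since AIC = deviance + 2k and BIC = deviance + k*ln(n),
    the deltas for nested models reduce to the expressions below.
    """
    delta_aic = delta_chi2 - 2 * delta_params
    delta_bic = delta_chi2 - delta_params * math.log(n)
    return delta_aic, delta_bic

# Values from the cross-sectional study (N = 1820, two extra parameters):
d_aic, d_bic = information_criteria_delta(246.83, 2, 1820)
```

This yields ΔAIC ≈ 242.83 and ΔBIC ≈ 231.82, close to the reported 242.88 and 231.71; all three criteria favor the two-dimensional model.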

To clarify the influence on students’ performance of the hypothesized levels of understanding and the sub-facets of the content areas defined in the construct map, we devised explanatory item response models using the R package lme4 (De Boeck and Wilson 2004; De Boeck et al. 2011). These models explored the impact of specific person and item characteristics on students’ responses (Pollmeier et al. 2013). In particular, the analyses revealed differential proficiency among subgroups of students with regard to levels of understanding in the two content areas. We found an overall gender effect for the content area of FS, with boys outperforming girls. Furthermore, girls exhibited specific weaknesses on items on density and displacement, compared to items on buoyancy. Relative to boys, they also preferred explanations at the intermediate level, whereas they neglected explanations based on scientifically advanced conceptions. In contrast, for the content area of EC, overall performance did not differ between girls and boys. Yet girls also displayed a relative preference for items featuring intermediate conceptions in the content area of EC, although they did not neglect scientifically advanced conceptions (Pollmeier et al. 2013).

To explain additional variance in item difficulties, we explored the relevance of characteristics derived from the classification of item stems. Again, we employed explanatory item response models, both for the cross-sectional data and for one further experimental study, in which certain contextual features were varied systematically across item stems (Pollmeier 2015). For the content area of FS we identified congruence as an important explanatory characteristic for items associated with the concepts of density and displacement: With these items, students had to compare two objects and decide which was made of the denser material or displaced more water. Items featuring congruent objects—that is, the denser object was also the heavier object, or the heavier object displaced more water—were easier to solve than incongruent items—that is, where the denser object was the lighter object or the lighter object displaced more water.
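The logic of such explanatory item response models can be sketched as follows: item difficulty is decomposed into a weighted sum of item-property effects, as in a linear logistic test model. The function and the numeric effect sizes below are hypothetical illustrations, not estimates from the study.

```python
import math

def response_probability(theta: float, effects: dict, properties: dict) -> float:
    """Rasch-type probability of a correct response when item difficulty
    is decomposed into item-property effects (hypothetical sketch of an
    explanatory item response model).

    theta      -- person ability on the logit scale
    effects    -- difficulty effect per item property (hypothetical values)
    properties -- 0/1 indicators: which properties the item has
    """
    difficulty = sum(properties[k] * effects[k] for k in effects)
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Hypothetical effects: density items carry a base difficulty, and
# incongruence (denser object is the lighter one) adds further difficulty.
effects = {"density": 0.4, "incongruent": 0.9}
p_congruent = response_probability(0.0, effects, {"density": 1, "incongruent": 0})
p_incongruent = response_probability(0.0, effects, {"density": 1, "incongruent": 1})
```

For a student of average ability, the sketch reproduces the qualitative pattern reported above: the solution probability for the incongruent item is markedly lower than for the congruent one.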

For the content area of EC, we obtained no single explanatory characteristic of central importance. However, we found that the specific content used in items accounted for a large portion of the variance in item difficulties. The most difficult content for the facet of evaporation was a naïve conception: the anthropomorphic interpretation of the physical phenomena to be explained. Items with scientifically advanced content—that is, with the correct explanation for phenomena of evaporation in age-appropriate language—were easier to solve than these anthropomorphic items and than items proposing a mere change of matter as the explanation for evaporation phenomena. Items conveying a false description of the change of matter, a change of location, or non-conservation of matter as explanations for evaporation phenomena were comparatively easy to solve. For the facet of condensation, items featuring a faulty description of a cause, a change of location, a change of matter, and a scientifically advanced explanation for condensation phenomena constituted an order of decreasing difficulty of content (Pollmeier 2015).

2.3 Validity

To assess the convergent and discriminant validity of our instrument, we conducted a validation study with four third-grade classes (FS: N = 41; EC: N = 32). For each content area, we presented two classes with 13 item stems, both as paper-pencil items with closed response formats and as interview items with open response formats (Pollmeier et al. 2011). Students were randomly assigned to an order of presentation of the two item formats. Additionally, reading ability (Lenhard and Schneider 2006) and cognitive ability (Weiß 2006) were measured. We found substantial correlations between the two modes of assessment for each content area, but also systematic differences in the responses: For interview items, students produced a wider range of answers at the naïve and intermediate levels, and fewer answers at the scientifically advanced level, than for paper-pencil items. As expected, producing explanations was more demanding than merely endorsing correct response alternatives. Apart from that, knowledge in the content area of FS, as assessed with the interviews, appeared to be more fragmented and context-dependent than corresponding knowledge in the content area of EC, a discrepancy not evident in the paper-pencil items.

Moreover, for the content area of EC, performance on paper-pencil items was independent of reading ability and cognitive ability. This finding supports the claim that our instrument measured a form of science competency that is more specific than those general abilities. The substantial relation found between the test of cognitive ability and performance on the paper-pencil items for the content area of FS was probably induced by the similarity between items covering the facet of density and items assessing cognitive ability. The impact of socio-economic status on proficiency in the content area of FS was evident for both interview and paper-pencil items.

In sum, there was a systematic difference between responses to interview and paper-pencil items that can readily be explained by the discrepancy between free recall and recognition, and thus was not caused by a difference in the constructs assessed by the two item formats. In other words, the positive associations between responses to interview and paper-pencil items indicate that our test instrument captured a form of science competency that plausibly parallels the conceptual understanding found in classic conceptual change research.

3 The Development of Conceptual Understanding in Primary School Science

Analyses of the cross-sectional data set (see Sect. 2.2 above) by means of explanatory item response models also yielded insights into the differences in average conceptual understanding between grade levels: Third and fourth graders outperformed second graders in terms of conceptual understanding, and the analyses further revealed specific strengths of third and fourth graders. Within the content area of FS, third and fourth graders performed particularly well on items covering the facets of density and displacement and on items featuring scientifically advanced conceptions. In the content area of EC, third and fourth graders displayed a specific strength on items concerned with the facet of evaporation.

A longitudinal study with a total of 1578 students in 75 classes from primary schools in two German federal states (Baden-Wuerttemberg and North Rhine-Westphalia) concluded our project. Students completed our tests on science competency at the end of third grade and at the end of fourth grade. For the preliminary analyses, we used 48 individual items from 23 anchor item stems, out of the total of 27 stems used in this study.

For both content areas, on average, the solution rates of anchor items increased in fourth grade and were accompanied by relatively large standard deviations (see Table 2.1, item statistics), a first hint that our instrument captured a reasonable range of individual differences. Also, the number of correctly solved items reveals that students on average solved more items at posttest than at pretest (see Table 2.1, person statistics). Relative to the number of anchor items assigned to each content area, this implies a smaller gain in conceptual understanding for the content area of EC.

Table 2.1 Descriptive statistics for the longitudinal study

In sum, our preliminary explorations suggest that we succeeded in assessing naturally occurring growth in conceptual understanding in this longitudinal study. In future analyses based on all items, we will examine whether the small growth in the content area of EC is attributable to the general difficulty of this content or rather to deficiencies in the amount and quality of formal instruction. Furthermore, we will investigate students’ performance with regard to the various characteristics of items and item stems (e.g., the assigned level of conceptual understanding). Finally, future analyses will focus on investigating the conjoint development of conceptual understanding and scientific reasoning.

4 Conceptual Understanding and Scientific Reasoning

The relation between conceptual understanding and scientific reasoning was also examined with the cross-sectional study data (for detailed analyses of primary school students’ competency in scientific reasoning, see Koerber et al. 2017, in this volume). After calibrating our tests with simple Rasch models, we retrieved weighted likelihood estimates of person ability for subsequent analyses. Multilevel analyses revealed substantial associations between scientific reasoning and conceptual understanding in both content areas that were not readily explained by relevant covariates such as fluid intelligence, reading ability, interest in science, socioeconomic status, and immigrant status. Furthermore, in the content area of FS, the predictive effect of scientific reasoning on conceptual knowledge slightly increased with grade, even after controlling for fluid ability. These findings hint at the possibility that proficient scientific reasoning facilitates the acquisition of conceptual understanding. Specifically, a command of the processes of scientific reasoning could enhance the evaluation of evidence with respect to existing conceptions, which could take the role of hypotheses or even theories. This could also account for the possibly cumulative effect of proficient scientific reasoning on conceptual understanding, suggested by its interaction with the content area of FS in the cross-sectional data. Future analyses of the longitudinal data will yield deeper insights into this issue.