Theoretical Background

That scientific observation is a basic skill in scientific research and an initial starting point for studying scientific phenomena (Johnston 2009) is, in a sense, obvious: many epoch-making biological discoveries, such as the Darwinian theory of evolution, are founded upon observations (Tomkins and Tunnicliffe 2007). Nevertheless, observation is often overshadowed by experiments, on which the paradigm of scientific research depends and which are therefore also the main focus of science education.

For most people, the process of observing seems so trivial that it need not be learned, and it is not seen as a serious, independent scientific research method (Tomkins and Tunnicliffe 2001). Even worse, observation is regarded as ‘just looking’, even though “seeing is only one of the senses that help us in recognition of our world, the most basic one without a shadow of doubt; yet other senses should be taken into consideration” (Yurumezoglu 2006, p. 45). That it is also possible to observe smells, sounds or structures (Johnston 2009) is often overlooked.

Most people believe that they know what ‘observation’ and ‘perception’ mean, because they do both all the time in order to make sense of the world (Tomkins and Tunnicliffe 2001). However, differentiating the two clearly and defining what it actually means to observe in a scientific way is much more difficult.

Observation competency is a basic skill and a prerequisite for the successful use of other scientific methods. This basic character may be the reason that it has been recognized as an important skill in early education and primary science in recent years (Harlen 2000; de Bóo 2006; Johnston 2009), and has become an integral part of primary science curricula in many countries (Johnston 2009), for example Norway, Finland, the UK and Germany. As a basic scientific skill, observation is more than ‘just looking’. It must be seen as an independent research method that can be systematically learned. When observing the variety of Galapagos finches, Darwin did not at first use a scientific method. However, his ‘just looking’ became scientific observation when he asked questions, generated hypotheses and interpreted the data on the basis of those hypotheses.

Scientific observation, therefore, includes asking questions, generating hypotheses, interpreting data and coming to a new scientific conclusion.

As children from about 4 years old are able to generate questions and hypotheses independently (Sodian and Thoermer 2006), fostering their observation competency becomes possible in preschool years. Since ‘guided play’ activities are more effective than ‘normal’ school lessons in preschool learning (Singer et al. 2006), these activities can be employed as a method of improving observation competency for children in this age group.

To afford the best possible individual promotion for each child, it is necessary to ensure that the children are working within their zone of proximal development (Vygotsky 1978). Therefore, an individual assessment of each child’s competency level is required.

It was necessary, first of all, to define clearly what scientific observation means, and then to develop and evaluate a model that describes this competency (Kohlhauf et al. 2010, in press). In order to foster this competency as individually as possible, it is also important to understand whether or not observation competency is influenced by any preconditions of the learner, for example, domain-specific interest, previous knowledge and language skills.

Experiments Versus Observation

It is important to distinguish clearly between biological observations and experiments: observation is a data collection method, whereas experiments tend to produce new data (Martin and Wawrinowski 2006). To experiment means to intervene in biological processes, whereas to observe means to stay out of them.

You do not need a man-made situation for observations; it is possible to observe under natural conditions as well. However, the more an observer tries to influence the situation in order to attend to more details, the closer he or she gets to the realm of an experiment. Observation is considered to be the “touchstone of objectivity in science” (Martin 1972, p. 112). Refraining from intervention and manipulation constitutes the specifically objective character of an observation and provides both its distinction from and its legitimacy alongside other research methods.

Perception, Unsystematic Observation and Systematic Observation

The observation process has to be distinguished from perception. To perceive means to absorb all impressions which come to the fore in a passive and incidental way (Bortz and Döring 1995).

You can only perceive some details of the richness of the world around you at any one time; therefore any perception is selection (Harlen and Symington 1985; Hodson 1986; Johnston 2009). Because selection on the level of perception is unconscious, we decided to name the lowest level of observation competency ‘incidental observation’ based upon the concept of ‘incidental learning’ developed by Oerter (1996).

‘Incidental observation’ can therefore not be categorised as scientific, but is of importance because it is a precondition of scientific observation.

In contrast, observing in a scientific way means approaching the world in an active, goal-oriented process of investigating, analysing and merging (Tomkins and Tunnicliffe 2001). This accords with Darwin’s experience that observation by itself is worthless if it is not combined with interpretation. Incidental or unsystematic observations alone would not have led Darwin to his theory of evolution; rather, it was his questions, his hypotheses and his interpretations that made his observations scientific, systematic observations.

Pre-scientific or unsystematic observation has a subjective, spontaneous character and is not planned in advance (Hodson 1986). It is unfocused, with poor separation between observation and inference (Schuster and Leland 2008). However, because hypotheses are generated with this method and the underlying intent is to solve a problem, it cannot be classified as wholly unscientific.

An unsystematic observation passes into a systematic one when it is well planned and actively seeks information with the aim of testing a hypothesis (Tomkins and Tunnicliffe 2001). It runs under standardized conditions, with specific details in focus. That means that a subject of interest has to be selected from the richness of stimuli and reduced to its basic characteristics. These characteristics then have to be classified and assembled into a comprehensible unit (Wilkinson 1995).

We therefore acted on the assumption in our study that there are three levels of observation competency: incidental, unsystematic and systematic observation.

Observation Competency: More than ‘Just Looking’

Observation has been described as a complex activity (Oguz and Yurumezoglu 2007) and as more than ‘just looking’ because it involves cognitive activity (Hodson 1986; Millar 1994). Based on the literature from biology, psychology and biology education, we proposed in this study that scientific observation, as Darwin practised it, can be divided into five different dimensions: describing, asking, assuming, testing and interpreting.

Describing

Describing the observed object or process is an essential dimension of observation competency because it not only reflects the observer’s level of examination and comprehension but also documents the observation.

The most common method of description is through verbalisation. Vygotsky (1986) established the explicit connection between speech and the development of mental concepts.

He argued that the relationship between language and thought develops in early childhood through the child’s interactions with more capable people.

Additionally, the formulation of ideas or discoveries through talking and private speech activates prior knowledge and helps to create new concepts and further ideas (Tomkins and Tunnicliffe 2001).

Most human responses to stimuli, be they verbal, musical or visual, reflect some level of comprehension. A nonverbal description, for example, a drawing, can also be seen as an additional form of documentation.

However, language-based documentation is also an indispensable instrument in science, for example when comparing results. Language acts as an essential element of communication and has, above all, an attention-directing function: the focus of another person can be directed to specific details by speech (Hodson 1986).

Assuming, Asking and Testing as Parts of the Dimension ‘Scientific Reasoning’

In order to generate a research question, it is necessary to develop the skill of divergent thinking, that is, to seek actively for information (Oguz and Yurumezoglu 2007; Tomkins and Tunnicliffe 2001). At the beginning of any study there is often the curiosity to find out something new about subjects or their interrelations. However, asking questions in a way that makes them investigable through observation is not as easy as it seems and requires some practice (Naguib 2006).

It is often in unsystematic observation that some questions arise. For any systematic observation, it is necessary to clearly define the question to be investigated before the observation process is started; only by doing so can one ensure that no relevant information is lost and that a conscious focusing on and selecting of details—which are relevant to answering the question—are possible (Naguib 2006).

The skill of divergent thinking is also needed for generating possible hypotheses, which should be defined in advance to encourage objectivity during the process (Harms et al. 2004).

However, in order to think divergently and to seek actively for ideas and approaches, it is first necessary to differentiate between hypothesis and evidence.

This ability, according to the latest findings in psychology (Sodian and Thoermer 2006), develops at approximately 4 years of age and provides the basis of a Theory of Mind.

As children of about 4 years old can reflect upon different points of view (Sodian and Thoermer 2006) and can understand that there are many different hypotheses which can be checked by observation, we can assume that it is possible to foster observation competency in preschool.

Interpreting

Last but not least, it is important that any observations are interpreted (Schuster and Leland 2008). Findings should be systematically evaluated and evidence-based. The aim of an interpretation is to relate the data to the initial hypotheses and to decide whether the proposed hypothesis can be accepted or rejected (Naguib 2006). It is therefore very important that observations and interpretations are not intertwined. As for Darwin’s work, it was his rigorous interpretation of the collected data that led him to his evolutionary theory.

These three dimensions were empirically confirmed in the study by Kohlhauf et al. (in press), in which the model (see Table 1) was verified. The model proposes three dimensions (‘Describing’, ‘Scientific reasoning’ and ‘Interpreting’) with three levels for each dimension (‘Incidental observation’, ‘Unsystematic observation’ and ‘Systematic observation’). The model in Table 1 formed the basis for assessing and fostering the individual observation skills of preschool children in this study.

Table 1 Empirical model on observation competency (Kohlhauf et al. in press)

Factors Affecting Observation Competency

To assess the abilities of each child in the best possible way, it is important to know to what extent previous knowledge, language skills and domain-specific interest have an impact on observation competency.

Previous Knowledge

In the science education literature of the past decades, it has been assumed that domain-specific previous knowledge has a strong influence on domain-specific achievement, for example in Ausubel’s assimilation theory of meaningful reception learning (Ausubel et al. 1978).

Whereas several studies reported that students’ previous science grades affected their final examination performance (e.g., Gooding et al. 1990; Blurton 1985; Hegarty-Hazel and Prosser 1991a, b), Johnson and Lawson (1998) found that reasoning ability, not prior knowledge, accounted for a significant amount of the variance in students’ biology final examination grades, irrespective of whether they were enrolled in expository or inquiry classes.

In an earlier study, Lawson (1983) found that beliefs and prior knowledge relating to the subject of study were the best predictors of students’ performance on multiple-choice items. Given these conflicting findings, it is of interest to understand to what extent previous knowledge affects observation competency.

Language Skills

Improved language ability also appears to be strongly associated with higher academic performance in school-aged children according to several studies (e.g., Aram et al. 1984; Kastner et al. 2001). Language skills are furthermore seen as one of the most important preschool skills. Preschool years are critical to the development of children’s literacy skills which, in turn, play an important role in their acquisition of reading ability (Pullen and Justice 2003).

Verbalisation is therefore assumed to be an integral part of observation competency, specifically of the ability to describe details, ask questions, generate hypotheses and document findings. To what extent language skills have an impact on observation competency was investigated in this study.

Domain-Specific Interest

The role of interest in learning and development has also been researched in many studies (e.g., Schiefele 1991; Renninger et al. 1992; Denissen et al. 2007), in which interest levels were consistently good predictors of further achievement.

Tunnicliffe and Litson (2002) confirmed that situational interest in the object or process a child is observing is indispensable for doing well; and it was also proposed that “the use of motivating scientific phenomena or objects has helped children to make close observations” (Johnston 2009, p. 3). Therefore, the influence of situational interest on observation competency seems to be obvious.

However, whether and to what extent domain-specific interest (in this study, interest in the three animals and in nature and animal observation in general) has an impact on observation skills had not been researched before and was therefore part of the research purpose of this project.

Based on the above theoretical background, the research reported in this paper aimed to answer the following research questions:

  1. Do language skills have an impact on observation skills and, if so, to what extent?

  2. Does previous knowledge have an impact on observation skills and, if so, to what extent?

  3. Does domain-specific interest have an impact on observation skills and, if so, to what extent?

Methodology

Participants

The participants (N = 110) in this research were aged between 4 and 29 years and comprised children (n = 26) attending a kindergarten, pupils (n = 27) at a primary school, pupils (n = 28) at a secondary school and university students (n = 29) studying biology education; all institutions were located in a rural area close to Munich, Germany. All the children and pupils came from middle-class families.

The youngest children were encouraged with the help of a puppet to get familiar with the observation situation and the researcher.

Ethics

Permission was given by parents of children under 18 years of age prior to the research being undertaken.

At the start of the research, the nature of the activity was explained to the children and their teachers as well as to the pupils and students who participated in the study. All participants or their legal guardians (parents) were asked to give their permission to be videotaped while taking the observation test. The videos were used only for this research study and would be deleted afterwards.

Methods and Data Analysis

First, the participants’ domain-specific interest in and previous knowledge of three animals (mice, fishes and slugs) were investigated using an ad hoc developed test and questionnaire (see Appendices 1 and 2). Additionally, the language skills of participants in the age group of 4–7 years were tested with a computer-based test called CITO (CITO Deutschland GmbH 2008).

Second, the participants’ observation competency was investigated with an ad hoc developed oral-practical test, in which the participants were asked to observe the three animals (see the example for one animal in “Appendix 3”). The collected data were then used to analyse the relationships between observation competency and the participants’ language skills, previous knowledge and domain-specific interest.

Data on previous knowledge about the animals used for observation were collected with an ad hoc developed open-ended assessment test consisting of 18 items, of which 5 were on slugs, 6 on mice and 7 on fishes (see “Appendix 1”). As we were working with young kindergarten children, we phrased the questions as simply as possible, using plain vocabulary without technical terms, a simple grammatical structure and the same structure for all questions. To reduce the probability of guessing, we avoided multiple-choice items. Each question could be answered with a single word, so that the answers could easily be judged as right or wrong. These answers were coded as 0 or 1 (0 = ‘wrong’, 1 = ‘right’) and analysed with SPSS 18 (SPSS Inc 2009). All three scales showed acceptable values for homogeneity.
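Scale homogeneity for such 0/1-coded items is commonly reported as Cronbach’s alpha. The following is a minimal sketch of that computation outside SPSS; the matrix shape and the example scores are purely illustrative assumptions, not the study’s data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants x n_items) matrix of 0/1 scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each single item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical example: 6 mouse items answered by 10 participants (0 = wrong, 1 = right)
scores = np.random.default_rng(0).integers(0, 2, size=(10, 6))
print(round(cronbach_alpha(scores), 2))
```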

The questionnaire on interest consisted of 6 items on a 4-point Likert scale (see “Appendix 2”). The participants were asked about their interest in each of the three test animals in particular, and about their interest in nature, animals and animal observations in general. The data were coded from 1 to 4 (1 = ‘not interested’, 2 = ‘slightly interested’, 3 = ‘quite interested’, 4 = ‘very interested’) and were also analysed using SPSS 18. Smileys were used as the response scale to make it as easy as possible for the preschoolers to answer; analyses of the scale showed acceptable values for homogeneity.

Because some of the preschoolers were unable to read or write, the questions in the test and questionnaire were asked orally and answers were written down for them by the researcher.

Language skills were analysed with a commercial instrument called CITO (CITO Deutschland GmbH 2008), which was originally developed to test the language skills of 4–7 year old children before school enrolment and is now officially used in some German federal states. The test consists of four components: passive vocabulary, cognitive terms, phonological awareness and text comprehension skills. The children were guided through the test and its 175 questions by a virtual clown called ‘Primo’ and had to answer the questions by choosing the right picture. Due to its length (in this study it took about 30–45 min depending on the child’s ability), the test might have been exhausting for some children, but it was fun for most of them and easy to handle for both the children and the researcher. The results were unambiguous and detailed for each component. The children’s answers to each of the 175 items were judged as right or wrong by the software, coded as 0 or 1 (0 = ‘wrong’, 1 = ‘right’) by the researcher and analysed with SPSS 18. Because this test targets only children aged 4–7 years, participants older than 7 years did not take it and, for the purposes of this study, were automatically given the full score of 175 points.

The test on observation competency was developed as an oral-practical test in the form of an interview (see the example for one animal in “Appendix 3”). The three living animals (a mouse, a fish and a slug) were presented one after another to the participant, who was asked to imagine that he/she was a biologist who had decided to observe the animal. As we aimed to discover whether the participant was able to develop research questions and hypotheses by himself/herself and to interpret the observation without any help, we tried to give only a few hints through our questions; this allowed us to analyse in which areas the participant needed help. The participants were asked to explain what they would like to find out and how they were going to do it, as well as to describe their actions throughout the observation period. If a participant had no ideas, the researcher assisted him/her through each stage of the process.

The participant was asked to describe the animal, to think about a possible research question, to generate a hypothesis and to test and interpret it afterwards. The test thus consisted of 15 items, 5 for each animal. The three animals were chosen because they are easy to handle and represent three different zoological categories. To motivate the participants and to create the feeling of a real situation, we decided to use living animals. The whole scenario with the three living animals took about 20 min per participant and was videotaped so that the answers given during the test could be coded and interpreted later. Using three different animals allowed us to check the homogeneity of the test, since we could observe whether a participant showed similar observation behaviour across the animals. Furthermore, 10% of the videos were additionally coded by an independent second rater to allow the calculation of interrater reliability. By analysing the test with a Rasch analysis and conducting a confirmatory factor analysis of the Rasch measurement values, our previously constructed model of observation competency (Kohlhauf et al. in press; see Table 1) could be empirically tested. Each person taking the test can therefore be classified according to this competency model.
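The coefficient used to quantify interrater reliability is not specified here; purely as an illustration of this step, Cohen’s kappa on the double-coded 10% of videos could be computed as follows (the competency-level codes shown are hypothetical).

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical competency-level codes (0-3) assigned by the two raters to the same videos
rater_1 = [2, 3, 1, 0, 2, 2, 3, 1, 1, 2]
rater_2 = [2, 3, 1, 1, 2, 2, 3, 1, 0, 2]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")  # agreement between raters, corrected for chance
```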

In order to understand how domain-specific interest, previous knowledge and language skills influence the model, all the data were analysed using Structural Equation Modeling (SEM) with Analysis of Moment Structures (AMOS) 18.0 (Arbuckle 2009), a program for visual SEM. With AMOS, a graphical model can be drawn using rectangles to represent observed variables, ellipses to represent unobserved (latent) variables, single-headed arrows to visualise paths and double-headed arrows for covariances. The program then estimates the model’s fit, the path coefficients and the covariances.

The structural model of this study was drawn with three latent (unobserved) variables, ‘language skills’, ‘previous knowledge’ and ‘domain-specific interest’, connected to one another by double-headed arrows (covariances), each of them pointing to the latent variable ‘observation competency’ via single-headed arrows (paths).

‘Language skills’ were measured with four scales, ‘vocabulary’, ‘terms’, ‘phonology’ and ‘text comprehension’ (4 rectangles); ‘previous knowledge’ was tested with 18 items across three scales, ‘knowledge on fishes’, ‘knowledge on mice’ and ‘knowledge on slugs’ (3 rectangles); and ‘domain-specific interest’ was represented by 6 individual items (6 rectangles).

The ‘observation competency’ itself was tested by items on the three scales ‘Describing’, ‘Scientific reasoning’ and ‘Interpreting’ (3 rectangles).
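The analysis itself was carried out in AMOS. As a rough, hedged sketch of the same measurement and structural model in open-source tooling, the specification below uses the Python package semopy with lavaan-style syntax; all indicator names and the data file are placeholders, and this is not the authors’ AMOS setup.

```python
import pandas as pd
import semopy

# lavaan-style description of the structural model (indicator names are placeholders)
MODEL_DESC = """
# measurement model: latent variables and their indicators
language    =~ vocabulary + terms + phonology + text_comprehension
knowledge   =~ know_fish + know_mice + know_slugs
interest    =~ int_1 + int_2 + int_3 + int_4 + int_5 + int_6
observation =~ describing + reasoning + interpreting

# structural model: paths from the three predictors to observation competency
observation ~ language + knowledge + interest

# covariances between the exogenous latent variables
language ~~ knowledge
language ~~ interest
knowledge ~~ interest
"""

data = pd.read_csv("observation_study.csv")  # hypothetical data file
model = semopy.Model(MODEL_DESC)
model.fit(data)                              # maximum-likelihood estimation
print(model.inspect())                       # loadings, regression weights, covariances
print(semopy.calc_stats(model).T)            # chi-square, df, RMSEA, CFI, etc.
```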

Missing data (missing completely at random) in the data set were imputed using the EM algorithm (Dempster et al. 1977) in SPSS before conducting the SEM analysis.
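SPSS performs this imputation internally. Purely to illustrate the underlying idea, a minimal EM imputation for a numeric data matrix under a multivariate normal model might look like the sketch below; it is not the SPSS routine, and the function name and defaults are our own.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """Fill np.nan entries of X (cases x variables), assuming multivariate normality."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)              # initialise with column means
    X_filled = np.where(missing, mu, X)
    sigma = np.cov(X_filled, rowvar=False)
    for _ in range(n_iter):
        extra_cov = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            coef = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            # E-step: conditional mean of the missing values given the observed ones
            X_filled[i, m] = mu[m] + coef @ (X_filled[i, o] - mu[o])
            # the conditional covariance enters the expected sufficient statistics
            extra_cov[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(m, o)].T
        mu_new = X_filled.mean(axis=0)
        centred = X_filled - mu_new
        sigma = (centred.T @ centred + extra_cov) / n   # M-step update
        converged = np.max(np.abs(mu_new - mu)) < tol
        mu = mu_new
        if converged:
            break
    return X_filled
```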

Results

The reliabilities of all four tests were good (Cronbach’s alpha values ≥ .7) in terms of Nunnally’s (1978) cut-off value for acceptable reliability, indicating that these four tests were reliable instruments (see Table 2).

Table 2 Test reliabilities

The model (see Fig. 1; χ2 = 169.70, df = 98, χ2/df = 1.73, p < .001, RMSEA = .08, SRMR = .07, CFI = .95), which was estimated with the ML method, showed a just satisfactory model fit according to the inferential criteria χ2/df ≤ 2 (Byrne 1989, p. 55) and RMSEA ≤ .08 (Browne and Cudeck 1993), the descriptive criterion SRMR ≤ .08 (Hu and Bentler 1999, p. 27), and the baseline comparison CFI ≥ .95 (Bentler 1990).
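As a small illustration of how the reported indices relate to the cited cutoffs (the values are taken from the text above; the checking helper itself is ours):

```python
# Reported fit indices for the full model versus the cited cutoff criteria
fit = {"chi2/df": 1.73, "RMSEA": 0.08, "SRMR": 0.07, "CFI": 0.95}
cutoffs = {
    "chi2/df": (2.00, "<="),  # Byrne (1989)
    "RMSEA":   (0.08, "<="),  # Browne and Cudeck (1993)
    "SRMR":    (0.08, "<="),  # Hu and Bentler (1999)
    "CFI":     (0.95, ">="),  # Bentler (1990)
}
for index, value in fit.items():
    bound, relation = cutoffs[index]
    met = value <= bound if relation == "<=" else value >= bound
    print(f"{index}: {value} (criterion {relation} {bound}) -> {'met' if met else 'not met'}")
```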

Fig. 1 Standardized solution for the structural model of observation competency

The model would have been narrowly rejected by the significant chi-square test, probably due to the violation of the multivariate normality assumption (see Table 3) according to Mardia’s (1970) test, for which the multivariate c.r. value should not exceed 1.96.

Table 3 Violated multivariate normality assumption

Therefore, a Bollen-Stine bootstrap analysis (Bollen and Stine 1992) was conducted; the result was not significant (p = .025), so that the model can be assumed to fit despite the significant chi-square test.

Figure 1 shows the completely standardized solution for the model. The standardized regression weights (see also Table 4) are shown on the single-headed arrows, the correlations between the latent variables (see also Table 5) on the double-headed arrows, and the numbers above the measured variables are squared multiple correlations.

Table 4 Regression weights (standardized and unstandardized values)
Table 5 Covariances and correlations

The negative regression weight between observation competency and domain-specific interest (λ = −.17, p < .01) and the two negative correlations, between language and domain-specific interest (β = −.20, NS) and between previous knowledge and domain-specific interest (β = −.31, p < .01), resulted from a cohort effect: these relationships were quite different for the pre- and elementary school children than for the secondary school (Gymnasium) pupils and the university students. One consequence of this might be to remove the latent variable ‘domain-specific interest’ from the final model, which clearly improves the model fit (χ2 = 44.40, df = 32, χ2/df = 1.39, p = .07, RMSEA = .06, SRMR = .04, CFI = .99), while the regression weights and correlations between the other variables remain constant. However, in order to understand the situation in the cohort of pre- and elementary school children, it is worth taking a closer look at this sub-sample (n = 53). Figure 2 shows the completely standardized solution for the model (χ2 = 120.22, df = 98, χ2/df = 1.23, p = .06, Bollen-Stine corrected value p = .59, RMSEA = .07, SRMR = .08, CFI = .95) in this sub-sample, including standardized regression weights, correlations and squared multiple correlations.

Fig. 2 Standardized solution for the structural model of observation competency in a subsample of pre- and elementary schoolers (n = 53)

The sample size of this sub-sample was small, so the results of this Structural Equation Modeling have to be interpreted carefully; in particular, the regression weights and correlations did not reach significance because of the small number of cases.

Nevertheless, it is worth looking at the tendencies. There was a correlation between domain-specific interest and previous knowledge (β = −.31, p < .01; see Table 6), but the regression weight of domain-specific interest on observation competency was negligibly low (λ = .05, NS; see Table 7). The regression weights of the other latent variables, ‘previous knowledge’ and ‘language’, showed values similar to those in the main sample (see Table 7).

Table 6 Covariances and correlations (n = 53)
Table 7 Regression weights (standardized and unstandardized values; n = 53)

Discussion of Findings

In this study, the Structural Equation Modeling analysis showed that previous knowledge was, as expected, the factor that clearly predicted observation competency (λ = .62, p < .001; see Table 4). This finding supports the idea that our mind is not a ‘tabula rasa’, because “we interpret the sense data that enters our consciousness in terms of prior knowledge, beliefs, expectations and previous experiences” (Hodson 1986, p. 19).

Even if an incidental observation may begin with simple, unprejudiced perceptions, theory-driven scientific observation is not a matter of a researcher going out into the world with no prior ideas in mind. We are all influenced by our previous experiences (Hodson 1986). These experiences and previous knowledge filter out or amplify particular ideas coming into our minds. The more we know about the object or process observed, the more research questions and hypotheses seem to be created in our thoughts. Whereas the finding that previous knowledge has a strong impact on observation skills appears relatively obvious, the impact of the other two influencing factors might appear unexpected.

The Structural Equation Modeling analysis also showed that domain-specific interest had no influence on observation competency at all: a person who is not interested in, perhaps even disgusted by, mice, slugs or fishes observes these animals as well as someone who is highly interested in animals and nature. This might be surprising at first glance. One reason might be the objective position of an observer. Since it is possible to reduce fear of and disgust for some animals (e.g., spiders) by exposure-based treatment (Smits et al. 2002), the same seems to hold for an objective scientific observer: one need not intervene in the process or be emotionally involved, and over time one learns that some things are worth observing, even without prior interest. The statement of Johnston that “children observe only what interests them” (Johnston 2009, p. 2) is therefore not supported by the findings of this study, although this may well be the case for situational interest (not investigated in this study).

We can assume that the observation situation in this study was relatively motivating for all the participants, because of its unusual, novel characteristics and the participants’ contact with living animals.

Personal domain-specific interest differed from person to person but did not influence individual observation skills.

A further study could investigate whether domain-specific interest increases after an engaging observation situation with interesting findings, and whether training children in observation competency contributes to the development of personal interest.

Language skills did not have as great an influence on observation competency as expected either.

With regression weights of .33 and .34 in the full sample and the pre- and elementary school sub-sample, respectively (see Tables 4, 7), language skills did have some impact, as expected: scientific observation is not possible without documentation, and consequently a form of verbal communication is needed. However, this impact is only moderate. The concern that measuring a child’s observation competency would merely be measuring his or her language skills or language development can therefore be rejected.

In fact, this may even legitimise the early training of children in observation skills. For young children, language skills are still developing and may be positively affected by training in observation competency, in which verbal communication is necessary. A further study would be needed to test this hypothesis.

The empirical model on observation competency should ideally be used by preschool teachers to assess the actual observation competency of children in kindergarten and preschool. This is necessary to promote each individual child’s competency level.

Therefore, based upon this model, the development of modules of guided play activities for fostering observation competency in preschool is in progress. Cooperation with preschool vocational training institutes is necessary in order to train preschool teachers in developing and using the guided play activities and, through them, to foster children’s observation competency. Teachers who are well trained in guided play activities for observation competency will be able to use these activities consistently and regularly in their teaching.

The developed modules will then be empirically evaluated during a long-term intervention in German kindergarten and preschool classes in 2011 with the help of the trained preschool teachers. A posttest analysis—to further understand the relationships between observation competency, domain-specific interest and language skills—is being planned for the beginning of 2012.

To conclude, due to the limited testing time available when working with preschoolers, we have not yet validated the knowledge test, the questionnaire on interest or the test on observation competency; this will be done in further studies. Nevertheless, if the competency model proves itself useful in practice, we would argue that it is valid.