
1 Introduction

The term information literacy describes the ability to recognize when information is needed and to identify, locate, and evaluate the information required to meet this need [1, 2]. Given that advances in information technology are constantly increasing the number of available information resources, information literacy can be considered a “basic skills set” [3].

As information literacy is relevant in nearly all circumstances [3], a clear working definition has to be provided first. We therefore limit our research to information literacy in higher education, and specifically in psychology. Our definition is based on the ACRL Psychology Information Literacy Standards [4], because this framework includes detailed performance indicators. It comprises four standards of information literacy:

  1. Determining the nature and amount of information needed; exemplary performance indicator: “understands basic research methods and scholarly communication patterns in psychology necessary to select relevant resources”;

  2. Accessing information effectively and efficiently; exemplary performance indicator: “selects the most appropriate sources for accessing the needed information”;

  3. Evaluating information and incorporating it into one’s knowledge system; exemplary performance indicator: “compares new information with prior knowledge to determine its value, contradictions, or other unique characteristics”;

  4. Using information effectively to accomplish a specific purpose; exemplary performance indicator: “applies new and prior information to the planning and creation of a particular project, paper, or presentation”.

We decided to focus on standards one to three, as our course was devoted primarily to the improvement of information seeking skills. Skills related to standard four are part of the curriculum of academic writing courses at German universities; consequently, they should not be part of our course. With regard to our curriculum, we expected the participants, inter alia, to be (better) able to understand scholarly communication patterns, to distinguish between different research methods (e.g. meta-analysis vs. empirical study), to select the most appropriate resource for their information need, and to evaluate the literature found.

There are indications that incoming students are not sufficiently information literate [5] and that students do not become information literate over the course of their studies [6]. To address this need, almost every university library in Germany provides information literacy courses that complement the academic writing courses offered by university departments. However, most of these courses are two-hour events covering only one facet of academic information literacy (e.g. the use of bibliographic databases). Due to the limited time, these courses offer few opportunities to practice information seeking, ask questions, or discuss search strategies. As there is no standardized test of information literacy for German-speaking populations, evaluation efforts are in most cases based exclusively on feedback provided by participants. Finally, most courses are not tailored to specific disciplines or fields (e.g. economics, social sciences). This is problematic, as scholarly communication patterns and information resources differ between disciplines; course content should therefore be adapted to the needs of psychology students [7].

The aim of this study is to develop a blended learning course that teaches information literacy to undergraduate psychology students and to evaluate it based on both objective performance indicators, using a control group design, and subjective feedback from the participants.

The main reason for choosing a blended learning approach was to allow participants to work on the online materials according to their individual schedules. This is particularly important as students are often pressed for time. However, traditional face-to-face teaching should also be included, as online learning alone seems to be associated with higher dropout rates [8]. Research indicates that blended learning can reduce dropout rates [9] and is more effective than purely face-to-face or purely online instruction [10]. Furthermore, face-to-face teaching facilitates discussions, which are important for a deeper understanding of the learning material [11]. Initially, the course will exist alongside the courses offered by university libraries; however, we hope that our materials will later be used by university libraries or faculty to offer courses tailored to psychology students.

2 Outline of the Course

2.1 Content

The content of the course was determined based on the psychology-specific information literacy standards provided by the ACRL [4] and on our own considerations. As the target group was undergraduate students, the content mainly covered basic information about

  • scholarly communication patterns in psychology and common publication types (e.g. empirical article, review article, edited books);

  • different information resources (inter alia bibliographic databases, internet resources) and their advantages and disadvantages;

  • appropriate use of these resources (e.g. understanding of the thesaurus and of Boolean operators; a brief illustration follows this list);

  • inclusion of resources provided by related disciplines (e.g. PubMed, ERIC) in case the topic is of interdisciplinary nature;

  • options for the acquisition of literature (e.g. use of electronic journal subscriptions, the local library catalogue or interlibrary loan);

  • criteria for selecting publications beyond their content, e.g. the Journal Impact Factor.
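
As a brief illustration of the Boolean logic addressed in this part of the course, the following generic query strings show how AND narrows and OR broadens a search. These examples are ours and do not reflect the exact syntax of any particular database (e.g. PsycINFO field codes or thesaurus terms).

```python
# Generic Boolean query examples; actual syntax differs between databases
# such as PsycINFO, PubMed, or ERIC, so these strings are purely illustrative.

# AND narrows a search: both concepts must occur in a record.
narrow_query = '"posttraumatic stress disorder" AND "risk factors"'

# OR broadens a search: synonyms or related terms are combined.
broad_query = '("posttraumatic stress disorder" OR "PTSD") AND "risk factors"'

# NOT excludes records, e.g. to filter out an unwanted publication type.
filtered_query = broad_query + ' NOT "case study"'
```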

2.2 Structure

As mentioned before, the course combined online and traditional face-to-face teaching. In total, there were three modules to be completed online and two face-to-face seminars. We expected that completing the online materials would take up to four hours; the two face-to-face seminars were designed to take 90 min each. The course was scheduled to be completed within two weeks, which seems a reasonable workload for college students.

The concept of the course envisaged that most of the knowledge should be imparted by the online materials, while the main purpose of the face-to-face seminars was to provide an opportunity to solve information problems under the guidance of the instructor, to ask questions, as well as to discuss the advantages and disadvantages of different search strategies and search tools. For this reason, participants had to complete certain online materials before attending the related face-to-face seminar.

Online modules 1 and 2 were related to the first face-to-face seminar. These modules dealt mainly with scholarly communication patterns, information sources (and their functions), and the acquisition of literature. A central element of the first seminar was the task of finding scientific literature on the question of how distractions affect driving performance. At the beginning of the seminar, the task was presented to the participants. It was then split into steps (determining the information need, finding search terms, conducting the search, selecting literature), and participants worked on the steps either individually or in small groups, with the instructor available in case questions arose. After each step, one participant (or one group, respectively) presented their results to the other participants, and the results were discussed. Online module 3 and the related second face-to-face seminar dealt with the evaluation of publications. The online module covered criteria for selecting publications beyond their content. During the second face-to-face seminar, the students were presented with several publications on a topic and asked to apply the criteria for evaluating and selecting publications from the online module.

All online modules were provided via the e-learning platform Moodle. Most of the content was presented as short passages of text enriched with illustrations or screenshots of the relevant computer programs; this content was delivered using Moodle lessons and pages. The materials also included several videos. At the end of every section, short quizzes allowed the participants to apply their knowledge immediately after learning.

3 Empirical Study

3.1 Instruments

In most cases, information literacy is assessed using knowledge tests consisting of multiple-choice items; at least two such tests are available commercially [12, 13]. These tests have been shown to provide a reliable, valid, and economical way of measuring information literacy. However, as information literacy is a complex ability, it is doubtful whether a knowledge test alone can assess it comprehensively. For instance, appropriate information seeking behavior is an essential part of being information literate [14], and assessing information seeking behavior requires observation [15] or self-reports [14].

Moreover, several authors argue that competencies should be assessed using real-life tasks rather than knowledge tests [16, 17].

For these reasons, we decided to use a multi-method approach consisting of two standardized instruments administered in a laboratory setting: a knowledge test and information search tasks.

The knowledge test consisted of 35 multiple-choice items and had previously been developed by our research group. When developing the items, we relied on Standards 1 to 3 of the aforementioned information literacy definition [4]. A sample item is:

Which differences exist between Internet search engines (e.g. Google Scholar) and bibliographic databases?

  (a) bibliographic databases usually have a thesaurus search

  (b) Boolean operators can only be used with bibliographic databases

  (c) the order of items on the results page is not affected by the number of clicks on each item

The test had been used in a previous study with a sample of N = 184 participants who completed it online. In that study, an internal consistency of Cronbach’s α = 0.49 was found, which was considered acceptable. Furthermore, Master’s-level students scored significantly higher than undergraduate students in their first and second year. These results can be considered an indication of the validity of the test.
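
For readers unfamiliar with the coefficient reported above, the following is a minimal sketch of how Cronbach’s α can be computed from a participants-by-items matrix of test scores. The data and function name are illustrative and not part of the original study.

```python
import numpy as np

def cronbachs_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) matrix of item scores."""
    k = responses.shape[1]                         # number of items
    item_vars = responses.var(axis=0, ddof=1)      # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of the sum scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative only: 5 participants answering 4 dichotomous items (1 = correct).
scores = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])
print(round(cronbachs_alpha(scores), 2))
```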

The information search tasks are based on a taxonomy of tasks from which task instances can be derived [18]. When reviewing the literature on information search tasks, we found that existing taxonomies are descriptive in nature and do not provide indications of the difficulty of a given task type (for an example, see [19]). Another problem is that these taxonomies had been developed to classify non-scholarly search tasks in electronic resources. There are several differences between academic and non-scholarly searches; the most important one might be the use of bibliographic databases (e.g. PsycINFO) instead of internet search engines (e.g. Google, Yahoo!).

For these reasons, we decided to develop a task taxonomy specifically for academic information search tasks in psychology. The taxonomy comprises three types of information search tasks that differ in difficulty; more precisely, the tasks differ in the abilities and competencies required to solve them. The taxonomy is designed such that the abilities required to solve tasks of type 1 are also required to solve tasks of type 2, while type 2 tasks require additional competencies, as do type 3 tasks. The taxonomy can be used to generate several tasks of the same structure and difficulty for assessing information literacy. For illustration purposes, a type 2 task (medium difficulty) is provided:

Are there meta-analyses published after 2005 investigating “risk factors” for the development of a “Posttraumatic stress disorder”? If possible, indicate two publications.

To solve this task, the participant has to understand the keyword search function of a bibliographic database as well as Boolean operators and the database’s filter functions, which are needed to find publications using a certain methodology (e.g. meta-analysis). Type 1 tasks are easier, as they require neither Boolean operators nor complex filter functions. Type 3 tasks are more difficult, as they additionally require the participant to identify appropriate search terms before conducting the search. To score the tasks, rubrics were created for the search task outcome (which publications were found) and for the procedure applied by the students when completing the tasks. According to the outcome rubric, scores were awarded depending on how closely the publications found matched the requirements stated in the task description (e.g. thematic focus, publication date). According to the procedure rubric, scores were awarded for working on the tasks in an efficient and information-literate way as defined by the information literacy standards [4]. For example, for a type 2 task, the maximum procedure score was awarded if the participant solved the task using bibliographic databases, combined two search terms with Boolean operators, and limited the results using the corresponding functions of the database.
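
To make the procedure rubric more concrete, the sketch below shows one hypothetical way of operationalizing the three criteria named above for a type 2 task. The function name and the equal point weights are our assumptions, not the study’s actual scoring scheme.

```python
# Hypothetical operationalization of the type 2 procedure rubric described
# above. The criteria come from the text; the point weights are illustrative.
def score_type2_procedure(used_bibliographic_db: bool,
                          combined_terms_with_boolean: bool,
                          limited_results_with_filters: bool) -> int:
    """Return a procedure score between 0 and 3 for a type 2 search task."""
    criteria = [used_bibliographic_db,
                combined_terms_with_boolean,
                limited_results_with_filters]
    return sum(criteria)  # one point per criterion that was met

# Example: a participant who searched PsycINFO with Boolean operators
# but did not use the year/methodology filters would receive 2 points.
print(score_type2_procedure(True, True, False))
```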

As the knowledge test and the information search tasks primarily capture the overall gain in skills, we additionally employed the “Inventory for the Evaluation of Blended Learning” (IEBL) [20]. This questionnaire evaluates blended learning courses based on subjective feedback from participants. It includes three scales referring to general aspects of the training:

  • “Overall benefit” (usefulness of training contents, e.g., for studying, 7 items),

  • “Didactic quality” (comprehensibility and clarity of the transmission of contents, e.g. using practical examples, 6 items), and

  • “Appropriateness” (adequacy of course contents, e.g., difficulty, with regard to individual preconditions, 5 items).

Furthermore, it includes five scales referring to specific aspects of blended learning:

  • “Acceptance of online learning” (appropriateness of online learning for the conveyed contents, 5 items),

  • “Lack of social exchange in online learning” (subjective lack of social exchanges while learning online, 3 items),

  • “Usability” (clarity of arrangement and ease of handling of the online materials, 7 items),

  • “Acceptance of face-to-face learning” (appropriateness of face-to-face learning for the conveyed contents, 5 items), and

  • “Lecturer” (quality of teaching, e.g., speaking clearly, 8 items).

In sum, the inventory consists of 46 statements rated on 7-point Likert scales (1 = “I do not agree”; 7 = “I agree”). A sample item from the scale “Acceptance of online learning” is “Training content is conveyed comprehensibly in the online modules of this training”.

For the scale “Appropriateness”, which assesses the adequacy of the course contents (e.g. their difficulty), a different labeling of the Likert scale is used (1 = “far too easy”; 4 = “appropriate”; 7 = “far too hard”). The internal consistencies (Cronbach’s α) of the scales range from 0.69 to 0.89 [20].

3.2 Method

Participants.

The sample consisted of N = 67 undergraduate psychology students who took the course; 34 were first-year and 33 were second-year students. The average age was 21.67 years (SD = 2.38). Participants had agreed to additionally take part in three data collection sessions, for which they were compensated. They were randomly assigned to one of two groups: group 1 (experimental group) consisted of n = 37 participants, group 2 (waiting control group) of n = 30.

Procedure.

The duration of the evaluation study was four weeks, while the actual course took only two weeks. Data collection 1 took place right at the start of the evaluation study. Subsequently, group 1 participated in the course, while group 2 served as a waiting control group. Two weeks later, when group 1 had completed the course, data collection 2 took place. After that, group 2 participated in the course. The final data collection 3 took place after all participants had completed the course.

The data collections took place in the computer lab of Trier University. Students were tested in groups of 15 to 22 participants under the supervision of two experimenters. They first completed three information search tasks (one of each type), followed by two questionnaires on epistemological beliefs and the information literacy knowledge test.

To complete the information search tasks, the participants could use all resources available on these computers (access to the internet, bibliographic databases, and the online library catalogue). The search tasks were presented in order of difficulty. Participants recorded the publications they found in input boxes provided by the testing software. After completing each task, they answered several questions about their search procedure; these data formed the basis for scoring the procedure. As they would be beyond the scope of this paper, the results concerning the epistemological beliefs questionnaires are not reported.

The IEBL was employed after course participation. As the two groups participated at different points in time, participants from group 1 completed the IEBL during data collection 2, while participants from group 2 completed the questionnaire during data collection 3.

Hypotheses and Research Questions.

With regard to the search tasks, we expected that tasks requiring more abilities would be more difficult: type 3 tasks should be the most difficult, followed by type 2 tasks, which, in turn, should be more difficult than type 1 tasks (Hypothesis 1).

As explained above, there were three variables designed to assess information literacy: the outcome and procedure scores of the search tasks, and the knowledge test. We expected to find significant correlations among these instruments. As the knowledge test had already been tried and tested in a different study, finding correlations between the test and the search tasks would corroborate the status of the search tasks as indicators of information literacy (Hypothesis 2).

Furthermore, we expected that participants would score higher on all instruments after participating in the course. Specifically, group 1 should outperform group 2 at data collection 2. At the final data collection, there should not be any difference between the groups (Hypothesis 3).

Additionally, we expected that the training would be evaluated positively by the participants, as they should benefit from the domain-specific design of the course and the time flexibility of the blended learning approach [21]. Therefore, the mean scores of all IEBL scales were expected to lie above the theoretical mean of 4, except for the scale “Appropriateness”, whose mean was expected to be close to 4, corresponding to a judgment of the training difficulty as “appropriate” (Hypothesis 4).

For exploratory purposes, we additionally examined whether the mean scores of the IEBL scales referring to specific aspects of blended learning differed from each other, as these results can be used to derive implications for the further development of the training.

3.3 Results

Before the course could be evaluated, the information search tasks were scored independently by two raters. The inter-rater reliability (correlation between the scores awarded by the two raters) ranged from r = 0.62 to r = 0.92; most correlations were above r = 0.70. Where the scores differed, the raters agreed on one solution, which was used for the analyses.
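
As an illustration of how such inter-rater correlations can be obtained, the following sketch computes a Pearson correlation between two raters’ scores; the ratings shown are hypothetical, not the study’s data.

```python
from scipy.stats import pearsonr

# Hypothetical outcome scores awarded by the two raters to the same tasks.
rater_1 = [3, 2, 4, 1, 5, 2, 3, 4]
rater_2 = [3, 2, 3, 1, 5, 2, 4, 4]

r, p = pearsonr(rater_1, rater_2)  # Pearson correlation as inter-rater reliability
print(f"inter-rater reliability r = {r:.2f} (p = {p:.3f})")
```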

The first hypothesis to be examined was whether the expected order of task difficulties could be verified empirically. Data from data collection 1 are presented in Table 1, which shows the percentage of the maximum score achieved for the different task types. As can be seen, tasks of type 3 were more difficult than tasks of type 2, which, in turn, were more difficult than type 1 tasks.

Table 1. Percentage of maximum score for search task outcome and procedure at data collection 1.

For the following analyses, the outcome and procedure scores were summed separately for each data collection, yielding two scores per data collection. These scores were rescaled to a range from 0 to 1 and are presented in Table 2. Before using these scores to evaluate the course, we checked whether the two groups differed before the course started. The analysis revealed no differences between the groups, neither on the outcome scores (t[65] = 1.34, ns) nor on the procedure scores (t[65] = 1.23, ns). Furthermore, the two groups did not differ in their performance on the knowledge test (t[65] = 0.78, ns).

Table 2. Mean scores (and standard deviations) for the outcome and procedure scores.

Scores on the information literacy knowledge test were also rescaled to a range from 0 to 1 and can be found in Table 3.

Table 3. Mean scores (and standard deviations) for the information literacy knowledge test.
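
The rescaling and the baseline group comparison described above can be sketched as follows; the raw scores, group sizes, and maximum attainable score are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical summed outcome scores at data collection 1 and an assumed
# maximum attainable score; rescaling divides by that maximum so that all
# scores fall between 0 and 1.
MAX_OUTCOME_SCORE = 12
group_1_raw = np.array([7, 9, 5, 8, 10, 6])
group_2_raw = np.array([6, 8, 5, 7, 9, 6])

group_1 = group_1_raw / MAX_OUTCOME_SCORE
group_2 = group_2_raw / MAX_OUTCOME_SCORE

# Independent-samples t-test to check for baseline differences between groups.
t, p = ttest_ind(group_1, group_2)
print(f"t = {t:.2f}, p = {p:.3f}")
```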

To examine the second hypothesis, correlations between the scores on the knowledge test and the two search task variables were computed using data from data collection 1. Only data from data collection 1 was analyzed, as performance at the later data collections largely reflects how much the participants benefited from the course and might therefore distort the results.

The outcome and procedure scores of the search tasks correlated significantly (r = 0.22, p < 0.05), even though the correlation was weak. Both scores also correlated significantly with performance on the knowledge test (outcome scores: r = 0.29, p < 0.01; procedure scores: r = 0.48, p < 0.01; both one-tailed).

To evaluate the course (Hypothesis 3), the three information literacy performance indicators were analyzed separately. For each indicator, a repeated-measures analysis of variance (ANOVA) was computed with time of data collection as a within-subjects factor, group membership as a between-subjects factor, and the respective performance indicator as the dependent variable.
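
For readers who wish to reproduce this type of analysis, the sketch below runs a mixed (between × within) ANOVA using the pingouin package on simulated long-format data, followed by a t-test comparing the groups at a single data collection. The package choice, data frame, and column names are our assumptions; the study’s actual software and data are not reported here.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from scipy.stats import ttest_ind

# Simulated long-format data: one row per participant and data collection,
# with the (rescaled) performance indicator in the column "score".
rng = np.random.default_rng(0)
records = []
for group in ("training", "waiting"):
    for i in range(20):  # 20 simulated participants per group
        for time in ("dc1", "dc2", "dc3"):
            # The simulated training group improves at dc2; both groups have improved by dc3.
            boost = 0.3 if (group == "training" and time != "dc1") or time == "dc3" else 0.0
            records.append({
                "participant": f"{group}_{i}",
                "group": group,
                "time": time,
                "score": 0.3 + 0.1 * rng.random() + boost,
            })
df = pd.DataFrame(records)

# Mixed ANOVA: "time" as within-subjects factor, "group" as between-subjects factor.
print(pg.mixed_anova(data=df, dv="score", within="time",
                     subject="participant", between="group"))

# Follow-up comparison of the two groups at a single data collection (e.g. dc2).
dc2 = df[df["time"] == "dc2"]
t, p = ttest_ind(dc2.loc[dc2["group"] == "training", "score"],
                 dc2.loc[dc2["group"] == "waiting", "score"])
print(f"dc2 group comparison: t = {t:.2f}, p = {p:.4f}")
```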

The performance on the knowledge test was analyzed first. The analysis revealed a significant main effect of the within-subjects factor (F[2, 130] = 216.53, p < 0.01) and a significant interaction of the two factors (F[2, 130] = 73.13, p < 0.01), as depicted in Fig. 1. To analyze group differences, a t-test was computed for every data collection. At data collection 1, there was no difference between the groups (t[65] = 0.78, ns). At data collection 2, a significant difference was found (t[65] = 10.47, p < 0.01), indicating that group 1 outperformed group 2. There was no significant difference at data collection 3 (t[65] = 0.37, ns).

Fig. 1. Mean scores (and standard errors) on the information literacy test. DC = data collection.

Next, the outcome scores of the search tasks were analyzed. The ANOVA revealed a significant main effect of the within-subjects factor (F[2, 130] = 45.77, p < 0.01) and a significant interaction of the two factors (F[2, 130] = 5.45, p < 0.01). To investigate the pattern in more detail, t-tests were calculated to compare the two groups at each data collection. There were no significant differences at data collections 1 and 3 (t[65] = 1.33 and t[65] = 0.32, respectively). However, the two groups differed at data collection 2 (t[65] = 3.32, p < 0.01). Once again, group 1 outperformed group 2, as can be seen in Fig. 2.

Fig. 2. Outcome scores (and standard errors) of the information search tasks.

Furthermore, the procedure scores of the search tasks were analyzed. The ANOVA revealed a significant main effect of the within-subjects factor (F[2, 130] = 148.46, p < 0.01) and a significant interaction of the two factors (F[2, 130] = 37.38, p < 0.01). Once again, t-tests were applied to analyze group differences. There was no significant difference at data collection 1 (t[65] = 1.23, ns), but there were significant differences at data collections 2 (t[65] = 9.21, p < 0.01) and 3 (t[65] = 3.21, p < 0.01), in that group 1 scored higher than group 2, as is displayed in Fig. 3.

Fig. 3. Procedure scores (and standard errors) of the information search tasks.

For the analysis of the IEBL questionnaire, the data from both groups (collected at data collections 2 and 3, respectively) were merged; the mean scores of the scales are displayed in Fig. 4. One-sample t-tests revealed that the mean scores of all scales were significantly above the theoretical mean of 4 (p < 0.01 for all scales; the alpha level was adjusted using a Bonferroni correction), except for the scale “Appropriateness”, which did not differ significantly from its theoretical mean.

Fig. 4. Mean scores (and standard errors) of the IEBL scales.
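
The one-sample t-tests with Bonferroni correction reported above can be sketched as follows; the ratings, the number of scales, and the scale names are hypothetical.

```python
from scipy.stats import ttest_1samp

# Hypothetical mean ratings per participant for two IEBL scales (1-7 Likert).
ratings = {
    "Overall benefit":  [5.4, 6.1, 5.9, 6.3, 5.0, 6.6, 5.7],
    "Didactic quality": [5.8, 6.4, 5.5, 6.0, 5.2, 6.1, 5.9],
}

THEORETICAL_MEAN = 4
N_TESTS = 8  # number of IEBL scales tested, used for the Bonferroni correction

for scale, values in ratings.items():
    t, p = ttest_1samp(values, popmean=THEORETICAL_MEAN)
    p_corrected = min(p * N_TESTS, 1.0)  # Bonferroni-adjusted p value
    print(f"{scale}: t = {t:.2f}, Bonferroni-corrected p = {p_corrected:.3f}")
```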

To examine whether the mean values of the five scales referring to blended learning differed from each other, an overall mean was computed (M = 5.54; SD = 0.56), and one-sample t-tests were used to examine whether each scale differed from this overall mean (the alpha level was again adjusted). The mean of the scale “Acceptance of online learning” was significantly above the overall mean (t[66] = 5.91, p < 0.01), whereas the mean of the scale “Acceptance of face-to-face learning” was significantly below it (t[66] = −4.77, p < 0.01). The mean scores of the other three scales did not differ significantly from the overall mean.

3.4 Discussion

All hypotheses were confirmed by the results. The first two hypotheses relate to the soundness of the evaluation instruments. First, the analysis of the search task difficulties showed that the expected order of task difficulties could be verified empirically. Tasks requiring more competencies were indeed harder for the participants to solve, which can be seen as an indication of the validity of the search tasks and the underlying task taxonomy. The second hypothesis, postulating significant correlations among all three information literacy performance indicators, was also upheld. The fact that the correlations are far from perfect leads us to conclude that the three instruments capture different facets of the information literacy construct. It is noteworthy that both the search task outcome and the procedure scores correlated significantly with the knowledge test: as the test had already been tried and tested in a different study, this supports the validity of the search tasks.

The third hypothesis was that participation in the course improves information literacy. As can be seen in Figs. 1, 2 and 3, participation in the course improved information literacy on all three performance indicators. Specifically, the hypothesis was that group 1 would outperform group 2 at data collection 2, because at that time group 1 had already taken the course while group 2 had not, and that no group differences should remain at data collection 3. This pattern was observed for the knowledge test scores and the outcome scores of the search tasks. The procedure scores showed the same pattern at data collections 1 and 2. At data collection 3, however, group 1 still performed better, even though both groups had taken the course by then. A more detailed analysis revealed that both groups improved, but group 1 improved more strongly. We found no substantive explanation for this result other than random variation due to the relatively small sample size; as mentioned above, group sizes were only n = 37 (group 1) and n = 30 (group 2). Although participants had been randomly assigned to the groups, it cannot be ruled out that some group 1 participants learned more due to individual differences. As this difference was observed on only one of the three performance indicators, we attribute it to random variation. The hypothesis can still be considered confirmed, as both groups improved by taking the course. It should also be noted that there were significant differences between the groups at data collection 2, when a treatment group was compared to a waiting control group. These group differences show that participants did not become more information literate without training, supporting the attribution of the improvements in information literacy to participation in the course.

Analyses of the IEBL scales show that the course was perceived positively by the participants; thus, the fourth hypothesis was also confirmed. The participants rated the training as beneficial for their studies and appreciated its didactic quality. They felt neither overwhelmed nor insufficiently challenged.

Finally, exploratory analyses revealed a comparatively high acceptance of online learning, whereas acceptance of face-to-face learning was comparatively low. Although the face-to-face seminars were intended to foster a deeper understanding of the learning material, e.g. through discussions, some students might have experienced them as redundant. It also seems plausible that students partly disliked the face-to-face seminars because they had to attend at a fixed time and lacked the time flexibility offered by the online materials.

To sum up, the results show that the information literacy course is effective and is perceived positively. This research thus adds to the field a blended learning course that is tailored to psychology students and has been rigorously evaluated. In the future, the course might be developed further by adding elements tailored to Master’s-level students; as it was designed for Bachelor students, the participants were mainly taught the essentials of seeking academic information. Furthermore, the comparatively low acceptance of face-to-face learning should be considered when developing similar courses. One possibility is to omit the face-to-face seminars; a further evaluation study might show whether this is equally effective. Another possibility is to enhance the personal involvement of the participants during the face-to-face seminars. For instance, in addition to discussions, the participants could be instructed to conduct literature searches on personally relevant topics, e.g. the topic of their thesis.