Introduction

The teaching of the responsible conduct of research (RCR) is now commonplace in many research institutions. This was not always the case, but it has been encouraged both by a perceived need [1] and by requirements to provide such training [2]. Surprisingly, little is currently known about the success of RCR education programs in achieving any specified outcomes.

The purposes of RCR education are not immediately apparent. In general, the goals for any kind of teaching can be divided into four broad categories: (1) knowledge; (2) skills; (3) attitudes; and (4) behaviors. Knowledge, skills, and attitudes correspond directly with the cognitive, psychomotor, and affective domains defined by Bloom [3], while the pedagogical hope is that the end result will be a change in behavior. Corresponding examples of learning objectives for RCR education could include: (1) being able to name the principles outlined in the Belmont Report; (2) being able to demonstrate moral reasoning skills; (3) demonstrating an attitude that discussions on the responsible conduct of research are worthwhile; and (4) giving credit where credit is due. Further, the wide range of credible outcomes might be short- or long-term, for individuals or a community, of greater or lesser importance to the responsible practice of science, and measurable or not. Unfortunately, the evidence that RCR education meets even a small part of the range of possible goals is scarce [4–7].

The best evidence for positive outcomes of RCR education is for the skill of moral reasoning [8, 9]. Even this effect is not always found [10], and it is worth considering that moral reasoning skills alone are insufficient to ensure responsible conduct. Furthermore, it is arguable that even some of the most egregious cases of research misconduct were committed by individuals who had the capacity for excellent moral reasoning, but lacked the attitude or will necessary to apply those skills. Similarly, insufficient knowledge of relevant facts or resources could mean that excellent moral reasoning will still result in a flawed outcome.

The present study was designed to assess the impact of a very short-term RCR education experience on selected outcomes that might be classified as examples of knowledge, skills, attitudes, and behavior. Importantly, this is not a study to assess whether all RCR courses are effective. Rather, it is an initial study of only one course. This is also not a study to create an ideal RCR course, nor is it designed to prove the effectiveness of RCR education. Instead, the goal was to take a first step in looking at the effectiveness of existing RCR education programs. This approach was taken for two purposes. First, many RCR education experiences are necessarily short because they must be inserted into research experiences that are intended to be only part-time and/or to last for a short period of time. Therefore, it is worth asking if a brief experience (<6 h) in RCR education has a positive impact on any of the intended outcomes. Second, it is hoped that such a study will provide a useful starting point for the design of future studies by identifying potential areas and approaches that might warrant consideration.

Methods

Subjects

Between July and August 2003, students at the University of California, San Diego (UCSD) School of Medicine were asked to complete a survey. Candidates for this study were medical students participating in an NIH-sponsored Summer Research Program. As part of this training program, students were required to attend a series of four training seminars, two of which were focused on the responsible conduct of research (RCR) and two of which dealt with policies and procedures for the use of animal and human subjects. The study population consisted of three groups, all of which participated in the NIH Summer Research Program between their first and second years of medical school: (1) Summer 2001; (2) Summer 2002; and (3) Summer 2003. To assess the possible influence of the RCR course on the outcome measures, surveys were administered both before and after completion of the two RCR seminars for the Summer 2003 group. This study was approved by the UCSD Institutional Review Board (#030783SX).

RCR course

The Summer RCR course consisted of four training sessions. Each session was conducted using a lecture/discussion format and was up to 1.5 h in duration. The focus for two of the sessions was institutional requirements for the review of research involving animal and human subjects, respectively. The remaining two sessions had a more general focus on research ethics and RCR. Handout materials and a PowerPoint slide presentation for the RCR lectures can be found at: http://www.ethics.ucsd.edu/effectiveness.

Survey instrument

Surveys were developed based on preliminary findings from other ongoing studies [11] and focus group discussions. Based on interviews of over 50 teachers of RCR courses, it was found that goals for instruction vary widely. However, using an iterative classification process, Kalichman and his colleagues found that these diverse goals could be broadly classified into one of four categories: knowledge, skills, attitudes, and behavior. Representative examples of those goals were selected and preliminary questions were developed to address each of these categories. An initial version of this survey was presented and refined through meetings with two consecutive focus groups of five and four students, respectively. All were student researchers and all but two were medical students from the target population. At the beginning of each focus group discussion, students were asked to complete a version of the survey containing all potential questions. After completion, the group engaged in a guided discussion to assess the clarity and appropriateness of the questions. Participants were encouraged to provide suggestions for improving the survey.

The final survey was designed to analyze the effect of the RCR training on knowledge, skills, attitudes, and behavior. To assess RCR knowledge, the survey included 12 multiple choice questions. The correct answers to these empirical questions were determined in advance by the investigators, and correct understanding of the questions was verified through the focus group discussions. To assess attitudes and behavior, seven questions asked for responses using a five-point Likert scale and two questions asked about conversations with colleagues regarding RCR. It was not assumed that these questions should have a “correct” answer, but it was of interest instead to know whether these attitudes or conversations differed in the pre- and post-testing of the students. Understanding of these questions had also been verified in the focus group discussions. To assess ethical reasoning skills, participants were asked to respond to a brief scenario with each student randomly assigned to one of three such scenarios. Again, it was not assumed that respondents should come up with the “correct” answer, but it was assumed that one measure of ethical decision-making skills is the extent to which answers were based on recognizing what interests are at stake (e.g., the interests of individuals or of the institution). When repeat surveys were distributed to students following the 2003 seminars, scenarios were selected so that no individual received the same scenario he/she had received with the pre-course survey. Survey questions are included as Appendix 1 and a copy of the complete survey is provided at: http://www.ethics.ucsd.edu/effectiveness.

Surveys were distributed with a cover letter and administered via mail, email, or in person. For the 2001 group of 34 students, contact information was not available for two students and three of the students were excluded because of prior participation in the focus group discussions; surveys were completed by 13 trainees, giving a response rate of 45% (13/29). For the 2002 group of 36 students, three students could not be contacted because of lack of contact information; surveys were completed by 14 trainees, giving a response rate of 42% (14/33). For the 2003 group, 23 of 23 students present (100% response rate) participated in the ‘pre-course’ survey and 15 of 16 students present (94% response rate) participated in the ‘post-course’ survey.
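The response rates reported above follow directly from the stated counts. As a minimal sketch (the function and variable names are illustrative and not part of the study), the arithmetic can be checked as follows:

```python
def response_rate(completed, eligible):
    """Return the response rate as a whole-number percentage."""
    return round(100 * completed / eligible)

# Counts taken from the text. Eligibility excludes students who could not be
# contacted or who had taken part in the focus group discussions.
eligible_2001 = 34 - 2 - 3   # 29 students
eligible_2002 = 36 - 3       # 33 students

rates = {
    "2001": response_rate(13, eligible_2001),
    "2002": response_rate(14, eligible_2002),
    "2003 pre": response_rate(23, 23),
    "2003 post": response_rate(15, 16),
}
print(rates)
```

The four completed-survey counts (13, 14, 23, and 15) sum to the 65 surveys analyzed.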

After collection, surveys were coded for entry into the database. The knowledge (factual) questions were scored against a key prepared by the survey authors. These scores, along with student responses to each question and responses to the seven Likert questions, were compiled in a database file. For scoring purposes, the investigators identified possible interests at stake for each scenario. To minimize the risk of bias, coded surveys were scored independently by the three authors. The dependent variable for scoring of ethical decision-making was the number of interests identified by the respondents.
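The scoring described above might be sketched as follows. This is an illustration only: the answer key, the responses, and the rater counts are hypothetical, and the text does not state how the three authors' independent scores were combined, so averaging across raters is an assumption made here for the sake of the example.

```python
def knowledge_score(responses, key):
    """Count the multiple-choice answers that match the key."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)

def mean_across_raters(counts):
    """Average the number of interests identified by each rater.

    Averaging is an assumption; the study does not say how the three
    independent scores were reconciled.
    """
    return sum(counts) / len(counts)

# Hypothetical key and responses for the 12 knowledge questions.
key = list("ABCDABCDABCD")
responses = list("ABCDABCAABCC")
score = knowledge_score(responses, key)

# Hypothetical counts of interests identified in one scenario response,
# as judged independently by three raters.
interest_score = mean_across_raters([2, 3, 1])

print(score, interest_score)
```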

Analysis

Statistical analysis was conducted on the entire cohort and by groups (2001, 2002, 2003 pre-course, 2003 post-course). For descriptive purposes, demographic data were presented with distribution-free measures (Table 1). Because parametric tests are typically more powerful than non-parametric tests, and because the data distributions did not deviate sufficiently from normality to overwhelm the robustness of the ANOVA and t-tests, statistical comparisons were made using these parametric tests (Tables 2, 3, and 4). Group differences in the mean scores for the knowledge questions were determined by ANOVA for the 2001, 2002 and post-course 2003 groups. Changes in the knowledge scores for the 2003 group taken before and after completion of the RCR course were assessed with a t-test. Statistical analyses were conducted using SAS version 8.2 (Cary, NC).
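The analyses themselves were run in SAS, and the code below is not the authors' code. As a rough sketch of the two test statistics named above, the following pure-Python functions compute a one-way ANOVA F statistic and a two-sample t statistic. The scores are hypothetical, and the use of an independent-samples t-test with pooled variance is an assumption, since the text does not state whether the Pre/Post comparison was paired. Converting the statistics to p-values requires the F and t distributions, which a statistics package would supply.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def pooled_t(a, b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t_stat = (mean(b) - mean(a)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return t_stat, df

# Hypothetical knowledge scores (number correct out of 12), not study data.
scores_2001 = [7, 8, 9]
scores_2002 = [8, 9, 10]
scores_2003_post = [9, 10, 11]
scores_2003_pre = [6, 7, 8]

f_stat, df_b, df_w = one_way_anova_f([scores_2001, scores_2002, scores_2003_post])
t_stat, df_t = pooled_t(scores_2003_pre, scores_2003_post)
print(f_stat, df_b, df_w)
print(t_stat, df_t)
```

With only two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is one way to sanity-check an implementation of either test.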

Table 1 Prior courses in research ethics and experience in research
Table 2 Performance on knowledge-based questions
Table 3 Student attitudes
Table 4 Number of conversations about research ethics

Results

In 2003, 65 surveys were completed by medical students participating in UCSD School of Medicine’s summer research program. Respondents were grouped according to the year in which they took the course: 2001, 2002, or 2003. Students in the 2003 group were asked to complete the survey both before (Pre) and after (Post) the summer research ethics course. At the time of distribution of the Pre surveys, 23 students were present and all completed the survey. At the conclusion of the course, 16 students were present, and 15 of them completed the survey. Only ten of the 15 had also been present on the first day of the course.

No significant differences were found among the three groups of respondents with respect to prior courses in research ethics or their research experience (Table 1). No more than three students in any of the groups reported having taken a previous course in research ethics. For all groups, the median research experience was no less than 2 years.

In a comparison of 2003 Pre and Post student scores (Table 2), performance on the knowledge questions improved significantly (p < 0.05). However, there were no significant differences among the groups of students who had completed the course in 2001, 2002, and 2003. When analyses were restricted to those knowledge questions that were covered in lecture and/or discussions, the difference was more dramatic (p < 0.005). For these same questions, there were no significant differences in the scores for the groups of students who had completed the course in 2001, 2002, and 2003. For survey questions not specifically covered in the course, there was no statistically significant difference between Pre and Post scores for the 2003 group; however, based on analysis of variance, scores were greater for students taking the course in 2001 and 2002 than in 2003 (p < 0.05).

Attitudes, as assessed by scoring of statements related to RCR, were not significantly different between the Pre and Post 2003 groups (Table 3). The one difference that approached statistical significance was in response to the statement “Formal training in the responsible conduct of research should be required of all researchers.” The tendency was for Post 2003 students to be more likely to agree with this statement than the Pre 2003 students (p = 0.075). In comparing responses among students who took the course in different years, some evidence for attitudinal differences was found. Specifically, one statistically significant change was in response to the statement “If you were concerned that someone senior to you was conducting experiments on cats that might as easily be conducted with frogs, then you would be willing to raise this issue with them.” Students who had most recently completed the course (2003) reported greater agreement with this statement than those who had taken the course in 2002 and 2001 (p = 0.022). Furthermore, in comparisons across students who had completed the course in different years, results approached statistical significance with 2003 students being less likely to agree with the statement that “Sloppy recordkeeping in research should be considered an example of research misconduct” (p = 0.085) and more likely to agree with the statement that “Formal training in the responsible conduct of research should be required of all researchers” (p = 0.068).

In a comparison of Pre and Post students for the 2003 course, the reported number of conversations about research ethics outside of class with either other medical students or other researchers was not significantly different (Table 4). However, in a comparison across years, students who took the course most recently reported more conversations with other medical students than did the respondents who took the course in 2002 and 2001 (p = 0.007). A similar pattern was found for discussions of research ethics with other researchers, but it was not statistically significant.

Skills in ethical decision making were assessed by scoring student responses to selected scenarios. The differences between Pre and Post scores for 2003 students approached statistical significance (p = 0.06), but no differences in performance were seen over time in comparisons among the 2001, 2002, and 2003 students (p = 0.56). It is noteworthy that students rarely identified even a small fraction of the possible ethical interests in these cases.

Discussion

The principal findings of the present study were: (1) a small, but statistically significant, improvement in scores on questions that tested knowledge of specific facts; (2) a borderline significant improvement in ethical decision-making skills; and (3) a borderline significant increase in agreement with the statement that responsible conduct of research courses should be required.

These results are not encouraging, but it is important to be clear that this study is not a referendum on either the effectiveness or the importance of RCR education. The course studied for this project was brief and limited in scope. Nonetheless, the course has been in place for nearly 10 years to meet NIH requirements for training in the responsible conduct of research. Precisely because it is inserted into a short-term (Summer) research experience, it has by necessity always been brief. Given the context in which these courses are taught, many courses are likely to be similar to this one.

The course studied was small both in duration and in enrollment. Neither of these factors is ideal given the goals for assessing effectiveness, but if the goal is to assess the effectiveness of existing RCR education programs, then it is essential that such studies be conducted. Despite more than 15 years of an NIH recommendation that RCR education should be provided for all trainees [2], not just those funded by NIH, this training is typically limited only to NIH trainees. In a recent study, only six of 50 RCR instructors reported that their courses were required for all trainees [11]. And when these courses are taught, they are often small. In another recent study, the median enrollment in a sampling of 11 RCR courses nationwide was just 20 students, and three of the 11 courses had fewer than 11 students enrolled [12]. In theory, a larger sample could be obtained by assessing multiple courses, but the results would be confounded because the courses are taught by different instructors, in very different environments, at different times, and in many different ways. The present study is a necessary first step before creating an instrument sufficient to detect effects despite the many sources of variation in a nationwide survey of numerous courses.

Of the four educational outcome variables studied (knowledge, skills, attitudes, and behaviors), improvement in student performance on knowledge-based RCR questions was the outcome most clearly shown by statistical analysis. When comparing the 2003 Pre and Post RCR course groups, we found a statistically significant increase in student performance on questions testing general RCR knowledge. When comparing the scores on the knowledge-based questions of 2003 Post students to the 2001 and 2002 groups (both of which had completed the course in previous years), there was no significant difference. A possible interpretation of this finding is that this course produced a lasting improvement in the area of general RCR knowledge, but it is also possible that these students performed better on the knowledge-based questions because of exposure to RCR issues in the course of their research program. In addition, it is worth noting the possibility that the 2001 and 2002 students performed similarly on the test not because of their initial training, but because the content of the questionnaire was reinforced by other education and training. However, the finding that RCR courses are effective in improving general student knowledge of RCR corresponds to student perceptions about RCR courses demonstrated in previous studies [12]. It has been shown that students in RCR courses perceive that of the four outcome variables, it is in the area of knowledge of RCR issues that the courses are most effective.

The statistically significant improvement found in performance on knowledge-based questions is especially impressive when considering some qualities of the RCR course studied. The course is very short, with two of four sessions devoted to the practical dimensions of conducting research with human and animal subjects. Although these are typically included in lists of RCR topics, the focus of these sessions was primarily on procedural issues rather than ethics per se. These were supplemented by two sessions taught by one of the authors (MK) that emphasized other general RCR topics through lecture and class discussion, with the majority of the teaching in small group format. Therefore, although the total course is 6 h in duration (1.5 h per session), the RCR lectures not dealing with research subjects consist of considerably less than 3 h.

This study was intentionally designed without any attempt to match the survey to the material actually covered in the course. That is, the questions were not intended to be a final exam for this particular course but instead a survey of general RCR knowledge. Few of the questions reflected primary teaching goals for the course. This may seem counter-intuitive, but the goal was not to demonstrate the effectiveness of this particular course, but to assess the positive outcomes, if any, of a short-term course experience. Unlike many other kinds of courses, it is hoped that students will be learning much of the material outside of the course. For example, it may be that if students become more aware of the ethical dimensions of the practice of research, then they are more likely to raise questions in the research environment about a wide range of topics, not just those that were covered in the course. The result is that even if a student was not exposed to key information about RCR in the course, she might initiate conversations about RCR and learn that information outside the course. If the course had been designed solely to match the questions on the survey, then improvements in knowledge would be less surprising. Thus, despite the small improvement in performance on the test of knowledge, it is impressive that the result reached statistical significance. Additionally, because one of the aims of the current study was to design a survey instrument that might be generalized to evaluate other courses, the questions on the survey were designed to be applicable to the field of RCR in general rather than only to the specific RCR course we surveyed. As a result, the survey instrument might be applied to other courses with different instructors, student populations, or pedagogical methods.

The effect of the RCR course on attitudes was examined using a Likert scale for student responses. When comparing 2003 Post to Pre students, no statistically significant differences were found in student attitudes as assessed by any of seven statements posed on the survey. However, responses to the statement “Formal training in the responsible conduct of research should be required of all researchers” approached statistical significance (p = 0.075) with a tendency for the Post group to agree with the statement more strongly than the 2003 Pre group. Although this result is of borderline significance, it suggests a trend toward attitudinal changes as an effect of RCR courses. This nominal effect is consistent with the view that RCR courses tend to be effective in increasing student knowledge, while they are perceived to be less effective in changing student attitudes [12].

Examination of the effects of the RCR course on student attitudes across the three different years revealed one statistically significant difference in student responses and one result that approached significance. Compared to the 2002 and 2001 groups, students in the 2003 Post group were in stronger agreement with the statement, “If you were concerned that someone senior to you was conducting experiments on cats that might as easily be conducted with frogs, then you would be willing to raise this issue with them” (p = 0.022). The difference observed might be attributable to the more recent completion of the RCR course. However, it is also plausible that the observed difference is due to unrelated, but inherent, differences among classes (2003 vs. 2002 vs. 2001).

Agreement with the statement “Formal training in the responsible conduct of research should be required of all researchers” was slightly greater, although not significantly so (p = 0.068), in the 2003 Post students compared with the 2002 and 2001 students. Of note, the scores for this question were inversely related to the amount of time since completion of the course. This potential relationship may reflect an increasing emphasis on clinical rather than on research experience in the second and third years of medical education.

Questions about the number of conversations that students in the course had had with other medical students and with other researchers regarding RCR in the past 3 months were designed to assess the effect of RCR courses on student behavior. The analysis revealed no significant difference in the number of conversations between the 2003 Pre RCR course students and the 2003 Post group. The 2003 Post students did, however, report significantly more conversations with other medical students (p = 0.007), but not with other researchers, than the 2002 and 2001 classes. This is most likely a reflection of the fact that the current students (2003) were working primarily in a research environment, while past students (2002 and 2001) were now working largely in clinical environments.

To measure the effectiveness of RCR courses on ethical reasoning, students were asked to explain why a briefly described course of action was or was not ethical. Responses were graded by how many ethical interests were identified. There was a borderline significant improvement in student scores between the 2003 Pre and Post groups (p = 0.06), but no difference across Post groups for 2003, 2002, and 2001. These findings are consistent with a possible improvement due to the training course and with persistence of that skill over time. As with improvements in knowledge, it is plausible that ethical decision-making remains high in more senior students not because of a long-lasting effect of the training but because of other aspects of the medical education program. In any case, it should be noted that student performance in analyzing the case studies was disappointingly low in all groups.

A limitation in the analysis of educational outcomes is the variable attendance of the students in the RCR course we studied. As a result of the demanding summer schedule, there was a high risk that individual students from the study population would miss one or both of the general RCR sessions. In fact, five students in the 2003 Post group had not attended the first meeting of the course. Similar patterns of attendance had occurred in earlier years of the course. Unfortunately, records of attendance were not available to be matched against the anonymous survey responses to determine which surveys reflected student responses after attending neither, one, or both RCR sessions. It is notable that, despite this source of increased variation, improvements in scores were still measurable.

The present study is relevant to the design and evaluation of individual RCR courses. It is reasonable to assume that effective instruction depends on clearly defined teaching objectives, a curriculum designed to meet those objectives, and measurable outcomes to determine whether they are met. This study provides a framework for thinking about what should be done in such courses. Depending on specific goals, different course approaches, durations, and assessment tools may be appropriate. If the goal is effective training, then matching these methods to the teaching objectives of a course is a necessary first step.

The outcome categories outlined in this project were knowledge, skills, attitudes, and behavior. For a focus on knowledge, it is hoped that trainees will learn certain information. This includes, for example, federal regulations governing financial conflicts of interest, principles of the Belmont Report, guidelines for authorship, historical examples of misconduct, or institutional resources for further information. Such material is readily covered in lectures or even Web-based tutorials. And assessment of outcomes is easily accomplished through multiple choice or “fill in the blank” exams.

A second focus for RCR courses is skills, particularly the skills of moral reasoning and ethical decision-making. Skills are most likely to be acquired through practice rather than merely reading or listening to others. This practical aspect means that instruction must include opportunities to struggle with the ethical dimensions of the practice of research. This is typically accomplished by discussing cases or specific problematic situations. Such discussion depends on course formats (classroom or Web-based) that challenge students with open-ended dilemmas and the expectation that they will articulate their own perspectives as well as listen to views of others. Evaluation of this outcome is considerably more difficult than for the teaching of new knowledge. A nominal approach is to recognize that the process itself is an important advance. In that case, mere participation in the discussion is a sufficient endpoint. However, if the goal is to recognize the quality of a response, then a measurable outcome is needed. The Defining Issues Test [9, 13] is one such approach, but it may not be practical or even appropriate for all instructors. An alternative that might be more value-neutral is to seek ways to assess the richness of a response (e.g., identification of ethical principles as a basis for action or inaction, or recognition and definition of the many interests that are at stake).

A focus on attitudes is very different from a focus on knowledge or skills. It is clearly possible through repetition or practice to learn new information or skills, but that does not guarantee a positive disposition. For RCR instruction, it is likely that what happens outside the classroom will have a greater impact on attitudes than anything the instructor might do. That said, any possible success in shifting attitudes will depend on the choice of material to teach knowledge and skills, the instructor’s passion and commitment, the place of the course in the curriculum, and the ability of the instructor to highlight the relationship of the course to other evidence of an institutional commitment to RCR. Measurement of changes in attitudes might be derived from either forced-choice questions (e.g., “Using a scale of 1–5, how would you rank the importance of RCR education?”) or open-ended questions (e.g., “How, if at all, has your attitude toward RCR been changed by this course?”).

Finally, many instructors may have the pedagogical hope that their courses will influence future behavior of their trainees. Such outcomes might include an absence of certain behaviors, such as research misconduct or other misbehaviors as described by Raymond De Vries and colleagues [14]. Conversely, RCR instructors might hope that trainees will not only avoid bad behavior, but will be models of the highest standards of responsible conduct in research. Such long-term goals are important, and it is reasonable to hope that they are more likely to occur with effective training to promote knowledge, skills, and attitudes. Unfortunately, reliable measurement of such long-term outcomes would be impractical, if not impossible, for assessing the effectiveness of an individual RCR course. On the other hand, at least one behavioral outcome may change in the short term, and is arguably a minimal expectation if other long-term behaviors are going to improve. Specifically, increased openness, discussion, and transparency are likely to follow if trainees seek out and initiate discussions about various dimensions of RCR. Such discussions normally occur rarely [5]. Pre- and post-course surveys could address perceptions of trainees about, for example, time spent on such discussions, topics covered, or the number and standing (e.g., faculty, students, or staff) of people involved in such discussions.

In conclusion, the findings of this study were that this short-term RCR course contributed to little or no improvement in knowledge, skills, attitudes, or behavior. This is not evidence for a lack of value of RCR training, but is consistent with the modest gains to be expected from very short-term training experiences. Nonetheless, these findings are encouraging for future assessments of changes that might be present with more substantial courses.