Almost half a century ago, the first departments of medical education were established in the United States. Medical education emerged as a field of research and as a domain deserving systematic study. If the numbers of departments, professorial positions, journals, or conferences are to be used as indicators of its maturity, the field is thriving. For instance, in the early eighties the field was covered by two journals; now there are no fewer than sixteen journals dealing with health professions education. Another example of the productivity of the field: the six best journals in medical education contributed more than 10,000 research articles to the literature between 1988 and 2010, whereas the six best journals in the broader domain of educational research, including educational psychology, contributed about 4,000.

But what about the quality of the science of medical education? Recently, a number of authors have criticized the state of the art in medical education research from the perspective of the research methodologies our field prefers (Albanese 2009; Davis and Ponnamperuma 2006; Eva 2009). The suggestion is that we should be more rigorous than we are: use fewer self-report instruments and more measures of actual performance, and experiment rather than observe. Rather than assessing the quality of our science by concentrating on its methods, the present contribution takes the content of medical education research as its point of departure. What are the topics and themes that have dominated the field over the last 20 years? And what do they say about researchers’ priorities?

There are a number of reasons to study the central themes of medical education. The first is sheer curiosity: it is simply interesting to see what has kept the field preoccupied all those years. What were the most popular topics, the most urgent issues? Which institutions and which scientists addressed these themes most often? Science is not an anonymous affair; it involves people and research departments that can be more or less successful in contributing to the research effort. The second reason is that we can ask whether these themes really reflected priorities based on the informational needs of the domain. Have we been working on the “right stuff” from the perspective of what the field needs? Third, would such an analysis allow us to formulate research priorities for the future? Are themes missing or insufficiently covered? Fourth, taking a historical perspective, which themes have emerged over the last 20 years and which have disappeared? Have some themes attracted consistent interest? Are there emerging themes? Are some areas of research effectively “dead”? What does this say about the state of the field? And fifth, do journals published on different continents emphasize different issues? Is there a distinctively “European” or “North-American” medical education?

To answer these questions, we analyzed the abstracts of the more than ten thousand research articles published by the six highest-impact medical education journals since 1988 (the scientific record becomes less accessible before that date). Using text-analysis software, we attempted to extract the issues that have most frequently kept medical educators busy during this period. In addition, we examined individual areas for change over time, with emphasis on possible differences between journals from different continents. Finally, we looked at the articles that have had the most impact on our community, as witnessed by the citations they received, at the most productive and influential researchers, and at the most productive institutions.

Method

Materials

The scientific database ISI Web of Knowledge℠ from Thomson Reuters was used to extract data for this study (http://apps.isiknowledge.com). The database can be searched on various fields, such as topic, title, author, editor, and publication name. We used the names of the six top-ranking medical education journals as the search criterion, viz. (1) Academic Medicine, (2) Advances in Health Sciences Education, (3) Medical Education, (4) Medical Teacher, (5) Advances in Physiology Education, and (6) Teaching and Learning in Medicine. These six journals have the highest impact factors in the field. The search was restricted to original research articles. In total, 10,168 such articles published in these outlets between 1988 and 2010 were found.

Procedure

All titles and abstracts were extracted and analyzed by means of content analysis, as described below. The results were then exported into SPSS for further statistical analysis. Subsequently, we identified journals that can be considered representative of either European or North American medical education, based on the countries of origin of the majority of contributions to these journals. For Europe we selected Medical Teacher and Medical Education (4,416 abstracts in our data); for North America we chose Academic Medicine (3,653 abstracts in our data). Other journals were excluded so that we could compare sets of abstracts of approximately equal size. Together, the two sets covered about 80% of all abstracts in our data. Within these sets, we investigated how research priorities developed between 1990 and 2009, using four 5-year intervals, and whether differences between the two sets of abstracts emerged.

In addition, the analysis function in ISI Web of Knowledge℠ was used to generate four additional tables: (1) a ranking of the ten institutions worldwide with the highest number of articles published in medical education since 1988, (2) a ranking of the ten most cited articles in medical education, (3) a ranking of the ten most productive researchers in the field, and (4) a ranking of the ten most cited authors.
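The rankings themselves came from the ISI analysis function. As a rough illustration of the counting rules described in the Results section (every listed author credited regardless of authorship position; only the first author’s affiliation counted for institutions), the following Python sketch ranks a list of exported records. The record layout, with hypothetical `authors` and `affiliations` fields, is an assumption and not the actual ISI export format.

```python
from collections import Counter

# Assumed record layout (hypothetical, not the ISI export format):
# {"authors": [names in byline order], "affiliations": [first author's first]}


def _rank(counts, n):
    """Sort by descending count, breaking ties alphabetically for a stable order."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def top_authors(records, n=10):
    """Credit every listed author once per article, whatever the authorship position."""
    counts = Counter()
    for record in records:
        counts.update(set(record["authors"]))  # set(): a duplicated name counts once
    return _rank(counts, n)


def top_institutions(records, n=10):
    """Credit only the first author's institutional affiliation."""
    counts = Counter(
        record["affiliations"][0] for record in records if record["affiliations"]
    )
    return _rank(counts, n)


records = [
    {"authors": ["Author A", "Author B"], "affiliations": ["Univ X", "Univ Y"]},
    {"authors": ["Author A"], "affiliations": ["Univ X"]},
    {"authors": ["Author C", "Author A", "Author B"], "affiliations": ["Univ Y", "Univ X"]},
]
print(top_authors(records, 2))
print(top_institutions(records))
```

Crediting all authorship positions is what makes supervisors of many PhD students rank highly; switching `top_authors` to count only `record["authors"][0]` would give a first-author ranking instead.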

Analysis

The 10,168 titles and abstracts were analyzed with the linguistic text-analysis technology of the SPSS Text Analysis for Surveys™ 2.1 software (SPSS 2006). The software uses linguistic technologies to extract and classify key concepts from textual responses, here the titles and abstracts. The content is analyzed as a set of phrases and sentences whose grammatical structure provides a context for the meaning of a response. The software codes and categorizes responses in a fraction of the time that manual coding requires. More importantly, it categorizes responses consistently and reliably: unlike human coders, it classifies the same concepts in the same categories every time.

The first step of the content analysis is to extract key terms from the responses. The software uses linguistic algorithms to identify relevant concepts, based on libraries that contain pre-coded definitions. The extraction therefore does not treat a response as a set of unrelated words; instead, it identifies key words, compound words, and patterns in the text.

In the next step, the software groups the extracted terms into categories that form the basis for further analyses. A category is a group of closely related concepts, objects, or opinions. The software applies three linguistic techniques that take the meanings of the extracted terms and their inter-relationships into account: (1) term derivation, (2) term inclusion, and (3) semantic networks. Because these techniques complement each other, all three were used to categorize the extracted terms. The term derivation technique forms categories by analyzing whether any of the terms are morphologically related; for instance, the terms “clinical decision-making” and “making clinical decisions” would be grouped in the same category.
The term inclusion technique creates categories by taking a term and finding other terms that include it. For instance, the terms “student assessment”, “approach to assessment”, “method of assessment”, and “written assessment” would be grouped under the root term “assessment”. “Assessment” thus forms a category that includes the root term itself and all word combinations with words before it, after it, or both. The semantic networks technique forms categories using a semantic/lexical network based on WordNet® (Miller 1995), a large lexical database of the English language developed at Princeton University. WordNet® groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept; these synsets form the basis of the categories.
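The term-inclusion idea can be approximated in a few lines. The Python sketch below is not the SPSS algorithm, whose internals are not described here; it assumes that inclusion simply means the root term occurs as a whole word or phrase inside a longer term, with any words before it, after it, or both. The function name `include_terms` is hypothetical.

```python
import re


def include_terms(root, terms):
    """Return, sorted, the terms that contain `root` as a whole word or phrase.

    Word boundaries keep e.g. "reassessment" and "assessments" out of the
    "assessment" category; matching ignores case.
    """
    pattern = re.compile(r"\b" + re.escape(root) + r"\b", re.IGNORECASE)
    return sorted(term for term in terms if pattern.search(term))


extracted = [
    "student assessment", "approach to assessment", "method of assessment",
    "written assessment", "assessments", "reassessment", "problem-based learning",
]
print(include_terms("assessment", extracted))
```

With the example terms from the text, the four phrases containing “assessment” are grouped together, while the plural “assessments” is left out; such morphological variants would presumably fall to term derivation instead.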

A total of 1,088 categories were extracted from the text. Most of these represented single-term entries or irrelevant concepts, which were removed from the analysis manually; qualifiers such as “no”, “not”, “good”, or “better”, for instance, were removed. In addition, similar concepts were manually grouped under a more generic, superordinate category name. For instance, the concepts “examination”, “multiple-choice questions”, “use of portfolio”, and “method of student assessment” were grouped under the category name “methods of assessment”. After this removal and restructuring, 29 meaningful concept categories remained. For each category a frequency count was generated, representing the number of articles that mentioned concepts in that category. Even if a concept appeared more than once in the title and/or abstract, the article was counted only once; the count thus reflects the number of articles mentioning the concept, not the total number of times it was mentioned. The categories were then ranked from highest to lowest frequency, so that the highest counts represent the concepts most prominent in the medical education literature.
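The counting rule above, one count per article however often a concept recurs, is a document frequency. A minimal Python sketch, assuming each article has already been reduced to a list of extracted concepts and that a hypothetical `CATEGORY_OF` mapping encodes the manual grouping; concepts absent from the mapping, such as the removed qualifiers, are ignored.

```python
from collections import Counter

# Hypothetical fragment of the manual grouping into superordinate categories.
CATEGORY_OF = {
    "examination": "methods of assessment",
    "multiple-choice questions": "methods of assessment",
    "use of portfolio": "methods of assessment",
    "method of student assessment": "methods of assessment",
    "problem-based learning": "problem-based learning",
}


def category_frequencies(articles, category_of=CATEGORY_OF):
    """Rank categories by the number of articles that mention them.

    `articles` holds one list of extracted concepts per article (title plus
    abstract). An article adds at most 1 to each category, however often
    its concepts recur; unmapped concepts are dropped.
    """
    counts = Counter()
    for concepts in articles:
        counts.update({category_of[c] for c in concepts if c in category_of})
    # Highest to lowest frequency; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


articles = [
    ["examination", "examination", "use of portfolio", "no"],
    ["problem-based learning"],
    ["multiple-choice questions", "better"],
]
print(category_frequencies(articles))
```

The first article mentions two assessment concepts, one of them twice, yet contributes only 1 to “methods of assessment”; the set comprehension is what enforces this.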

To examine potential differences over time, we conducted a 4 × 2 ANOVA (time interval × journal set) for each category, to assess changes over the four time intervals and differences between the two sets of journals.
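The ANOVA itself was presumably run in SPSS, into which the results were exported. The Python sketch below only prepares its 4 × 2 layout, computing for one theme the proportion of articles that mention it in each time interval × journal set cell. It assumes that each article is scored 1 if its title or abstract mentions the theme and 0 otherwise, and that the four intervals are 1990–1994, 1995–1999, 2000–2004, and 2005–2009; both are assumptions rather than details given in the text.

```python
from collections import defaultdict

# Journal sets as described in the Procedure section.
JOURNAL_SET = {
    "Medical Teacher": "Europe",
    "Medical Education": "Europe",
    "Academic Medicine": "North America",
}
# Assumed 5-year intervals: the four levels of the time factor.
INTERVALS = [(1990, 1994), (1995, 1999), (2000, 2004), (2005, 2009)]


def interval_of(year):
    """Map a publication year to its interval label, or None if outside 1990-2009."""
    for start, end in INTERVALS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def cell_proportions(articles):
    """Proportion of articles mentioning the theme in each (interval, journal set) cell.

    `articles` holds (year, journal, mentions_theme) tuples; other journals
    and years outside the intervals are skipped.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, journal, mentions_theme in articles:
        cell = (interval_of(year), JOURNAL_SET.get(journal))
        if None in cell:
            continue
        totals[cell] += 1
        hits[cell] += int(mentions_theme)
    return {cell: hits[cell] / totals[cell] for cell in totals}


sample = [
    (1991, "Medical Teacher", True),
    (1993, "Medical Education", False),
    (1996, "Academic Medicine", True),
    (1988, "Academic Medicine", True),  # before 1990: skipped
    (2007, "Teaching and Learning in Medicine", True),  # in neither set: skipped
]
print(cell_proportions(sample))
```

Proportions of this kind correspond to what Figure 1 plots; the per-article 0/1 scores, grouped by cell, would be the input to the two-way ANOVA.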

Results

The first four columns of Table 1 summarize the results of the content analysis. Columns five and six summarize changes in the frequency of occurrence of these themes over time and differences between the selected journals; where no comment is given, no systematic changes over time were observed. Figure 1 illustrates how one of the themes, clinical clerkships, changed in frequency of occurrence over time and how it was represented differently in the different journals.

Table 1 Results of the content analysis of the abstracts of the six highest impact journals in medical education 1988–2010
Fig. 1 Illustration of how the frequency of occurrence of the Clinical Clerkship theme changed over time and how the theme was represented in the different journals. The graphs indicate the mean proportion of occurrence of the theme for five-year time intervals between 1990 and 2009

Table 2 lists the ten universities that produced the highest number of publications in medical education between 1988 and 2010, as represented in the six focal journals. The third column shows the total number of articles from each of these institutions. Differences among these first ten are fairly large (Texas produced 6.81 articles per year in this period, Maastricht 14.04), but they may partly reflect the size of the academic staff involved in medical education research. In addition, the reader should be aware that the data do not show whether researchers from other institutions contributed as co-authors; only the first author’s institutional affiliation was counted.

Table 2 The ten universities with the highest number of publications in medical education 1988–2010

Table 3 contains the names of the ten researchers who have contributed the largest number of articles to the field of medical education. Here the differences are even larger: the most productive investigator, Cees van der Vleuten of Maastricht University, published almost five times as many papers as the researcher in tenth place. The reader should bear in mind, however, that no distinction was made between first, last, or co-authorship. Researchers who supervise many PhD and master’s students tend to produce more work as co-authors than researchers without such students. In addition, several authors on the list also publish extensively in journals other than the six included here.

Table 3 The ten authors who contributed most articles to the field of medical education 1988–2010

Table 4 shows the ten most cited articles in the field. Because citations were counted only in Social Sciences Citation Index journals in the Thomson Reuters database, they do not necessarily represent the full extent to which each of these articles is cited in the literature. For instance, according to the Google Scholar database, the Albanese and Mitchell paper was cited 1,647 times and the Hafferty paper 404 times. On average, Google Scholar seems to find 2.4 times as many citations as the (more restricted) Thomson Reuters database.

Table 4 The ten most cited articles in the field of medical education 1988–2010

Table 5 lists the ten researchers whose work was cited most in the literature. Again, because the data come from the ISI Web of Knowledge℠ database of Thomson Reuters, their real impact may be underestimated. For instance, according to Scholarometer (http://scholarometer.indiana.edu), Geoffrey Norman is cited almost 19,000 times, because of citations to two of his statistics books that do not appear in ISI Web of Knowledge℠. Interestingly, Tables 3 and 5 overlap only partially: the most productive researchers are not necessarily those with the most impact on the work of others.

Table 5 The ten researchers most cited in the field of medical education 1988–2010

Discussion

The purpose of the present study was to provide insight into the most common themes of research in medical education. A further aim was to identify the institutions and researchers most often involved in medical education research. To that end, a content analysis was carried out on 10,168 abstracts of articles published since 1988 in the six most influential medical education journals. In addition, the Thomson Reuters Web of Science database was searched for the names of contributing authors and institutions. Finally, the impact of these authors on the work of other researchers, as expressed by the citations their work receives, was studied. These results give rise to the following observations.

First, twenty-nine major themes of medical education research were identified, of which student assessment, clinical and communication skills, clinical clerkships, and problem-based learning (PBL) were the most prominent. Assessment of students in particular seems to be an overriding concern: twenty-six percent of the articles deal with this issue in one way or another. Reliability and validity of the measures employed are recurrent concerns. It is interesting to note that written and oral examinations attract little attention as objects of study. Possibly these were studied extensively in earlier decades and have no longer been considered problematic in the last 20 years. The same may apply to multiple-choice tests, a core issue in medical education in the sixties and seventies. Interestingly, newer methods such as self-assessment, expert judgment of performance, and portfolios now emerge as approaches deserving notice. Within the broader assessment category, most articles have been devoted to performance assessment, as witnessed by the frequency with which concepts such as clinical skills assessment, clinical competence assessment more broadly, and objective structured clinical examinations occur in the literature.

With regard to teaching methods, the focus is again on the clinical phase of medical education: the teaching of clinical and communication skills (17%) and the clinical clerkships (13%). PBL is a good third (7%). By contrast, lectures and computer-assisted instruction receive relatively little emphasis; these topics seem to have had their time. The fairly recent interest in topics such as professionalism in medicine, patient safety, scholarship in education, and the possible role of the humanities in medical education is already noticeable in the literature.

Second, US schools produce most of the research in medical education. This was to be expected, because the number of medical schools in the US is sizable and medical education research first developed in this country. In addition, US medical schools tend to be research-intensive, and this attitude tends to spill over to their departments of medical education. It is, therefore, surprising that two Canadian and two European universities appear among the ten most productive institutions. The department of medical education of Maastricht University, The Netherlands, is in fact the most productive of all. This picture is reinforced by the findings summarized in Table 3, where eight out of eleven researchers are either European or Canadian. Medical education research thus appears to be a more international endeavor than many other medical domains.

Third, PBL represents a considerably smaller domain of attention than assessment. However, its core articles are among the most cited in the field, as displayed in Table 4. Four of the ten most cited articles are about PBL, and another three appear among the twenty most cited. This perhaps reflects the enthusiasm and controversy that surround this approach to medical education. The most cited papers are all literature reviews rather than reports of original research. The field seems to rely quite heavily on such reviews: among the 25 most cited articles, only three present original research. The highest ranked of these, in twelfth position, compares the psychometric properties of checklists and global ratings in OSCE-format examinations (Regehr et al. 1998).

Fourth, the medical education literature shows an overwhelming emphasis on the direct preparation of students for professional practice. Almost 60% of the articles assessed deal, directly or indirectly, with the student as a professional. There is nothing wrong with this. Training to become a doctor is high-stakes education: if students are poorly prepared for professional practice, patients may suffer or even die. Issues of optimizing clinical competence, both in diagnosis and management and in clinical and interpersonal skills, are therefore vital to those involved in such education. However, it is somewhat disappointing that preclinical education is studied considerably less, even though in most countries it constitutes the larger part of medical training. Questions such as the following remain largely unresolved and deserve answers if medical education is to improve. Do the basic sciences need to precede the clinical sciences, or is integration possible from day one? Should teaching in medicine be multidisciplinary, or is a disciplinary approach essential for a deep understanding of health and disease? How much guidance through lecturing do students need in order to learn effectively? To what extent does a deep understanding of the basic sciences improve diagnostic reasoning? How can practicals in anatomy, physiology, or biochemistry be optimized for learning? What is the nature of the knowledge used in diagnostic expertise?

A fifth observation is that medical education research seems almost exclusively geared toward the individual student and his or her learning. In fact, “student” and “learning” were among the concepts with the highest frequency in the abstracts. Motivation, learning styles, academic achievement, reasoning skills, and the validity of differences between students on tests are important concepts in this respect. Medical education research seems thoroughly “psychologized.” This may come as no surprise if one considers the scientists who have dominated the field for the past 20 years. Almost half of the researchers listed in Tables 3 and 5 are psychologists, and psychologists tend to see the world through an individualizing lens. These investigators have brought good ideas and much rigor to the field, but may also have narrowed its view of reality. Other perspectives readily suggest questions that seem quite relevant to medical education. For instance: (1) A systems view. We know that medical students on average need more time to graduate than is nominally available (Schmidt et al. 2010). In Europe in particular, students’ tendency to postpone their studies is a major problem. We also know that examinations drive learning (Van der Vleuten 2000). Why, then, is nobody systematically experimenting with examination systems to optimize study duration? Organizing the curriculum as a series of sequential modules seems to improve learning considerably compared with teaching subjects in parallel (Jansen 2004). Why is this so? (2) A sociological view. Those admitted to medical school tend to come from upper-middle-class families. One may assume that, as graduates, these students have only limited experience with the kinds of life challenges faced by large segments of the population. Is this so? And does this limited experience affect their ability to serve those segments? (3) An economic perspective. Is it possible to train more doctors in a shorter time?
We know that there have been attempts to shorten medical education in several countries. What are the results of these attempts? Is training highly qualified professionals, in some cases for as long as 12 years, really what is needed? Is it possible to involve less extensively trained health professionals in medical practice? What would be the consequences for medical training? (4) An ecological perspective. To what extent do we need tertiary care hospitals for the initial training of doctors? As early as 1961 it was noted that these hospitals treat only the most ill patients, a group comprising less than 1% of the variation in disease patterns in the population (White et al. 1961). In addition, teaching hospitals vary considerably in their ability to provide students with an all-round experience during clerkships (Wimmers et al. 2006). Would it therefore not be more effective to conduct clerkships largely in primary-care contexts? It seems important to initiate more, and more conclusive, research on these issues, which bear on the quality of both medical education and medical practice.

A sixth observation is that most research conducted in medical education seems to be effectiveness-driven rather than discovery-driven: it studies the relative effectiveness of existing approaches rather than discovering new ones. One could argue that, since the sixties, five innovations have emerged from medical education research: (1) the content specificity phenomenon and the closely related finding that clinical reasoning is to a large extent knowledge-based; (2) problem-based learning; (3) a systematic approach to the training of professional skills; (4) the OSCE; and (5) the finding that global ratings of performance are more valid than detailed checklists. This seems a fairly limited harvest given the energy and manpower that have gone into producing 10,000 or more articles. Moreover, these innovations have fallen prey to the same effectiveness virus as most other topics. A case in point is problem-based learning. Most of the research effort in this domain went into studying the effectiveness of PBL relative to various kinds of conventional education (Albanese and Mitchell 1993; Colliver 2000; Dochy et al. 2003; Gijbels et al. 2005; Khoo 2003; Vernon and Blake 1993). By contrast, very little research has been dedicated to questions such as: What works in PBL? Why does it work? How can it be improved? These discovery-oriented questions remain largely unanswered. It seems that too much research is aimed at justifying ideas and too little at clarifying them.

Seventh, examining the changes in various areas over time yields a somewhat different perspective on the field. Clearly, some areas, such as clinical reasoning, continue to attract interest. Others, such as computer-assisted instruction, appear to have “had their day”. Observing these trends is easy; interpreting them is more difficult. A priori, it is not clear what should be viewed as desirable or undesirable. If, as described above, most research is effectiveness-driven, then one might well view as desirable a pattern in which a field emerges, becomes a focus of activity, and then disappears: the question has been raised and then answered, and from this perspective the systematic reviews are the final product. On the other hand, from a philosophy-of-science perspective, a field is progressive to the extent that new questions constantly arise. Kuhn’s “scientific revolutions” (Kuhn 1962) arise from a shift away from dying, regressive research paradigms that have run out of questions toward new paradigms that create new and interesting questions. Lakatos explicitly describes research programmes as “regressive” or “progressive” to the extent that they can continue to create new and more refined questions (Lakatos and Musgrave 1974). Given the view of many educational leaders that the field is underdeveloped in theory (Albert et al. 2006), one could regard a domain such as clinical reasoning, which has shown consistent and monotonic growth over two decades, as a prototype of a progressive research domain.

Eighth, in terms of emphasis, some differences between “North-American” and “European” medical education emerge. These differences seem to be largely due to politico-cultural distinctions between the New and the Old World. The position of minority students in higher education is one such issue. While US higher education wrestles with how to make access fair in terms of ethnicity, this issue has not yet become prominent in Europe, although it may as Europe becomes more diverse. The same applies to cultural competence. The differential emphasis on women’s health seems to suggest that this movement has been more successful abroad than on the mother continent.

Finally, although this review began with a deliberate focus on the content of medical education research rather than its methods, integrating these results with an examination of time trends in research approaches may well be informative. As indicated earlier, the dominance of effectiveness-driven inquiry may be reflected in the dominance of psychological and psychometric methods in this review (although psychology research is clearly not necessarily theory-free). But this may be changing, as themes suggested by disciplines other than psychology, a greater emphasis on theory testing, and more diverse research methods assume greater prominence in the field.