Introduction

Healthcare professionals have an ethical imperative to ensure patient safety while delivering optimal treatment. Yet developing the knowledge, skills, and attitudes of professionals through medical education has traditionally involved practicing on live patients. These two imperatives are difficult to reconcile, but simulation-based medical education (SBME) offers a way to do so [1,2,3]. SBME provides healthcare professionals with a controlled practice environment, using tools such as computer-based virtual reality simulators, high-fidelity manikins, simple task trainers, or actors posing as standardized patients [8], in which to learn and improve knowledge, clinical skills, and attitudes [4,5,6,7]. Several trials report that SBME is effective in enhancing medical procedures and technical skills (e.g., central venous catheter placement, cardiopulmonary resuscitation), as well as communication, teamwork, leadership, and decision-making [2, 9,10,11,12,13,14,15,16,17,18,19,20]. In Emergency Medicine (EM), SBME has evolved from the sparse use of low-fidelity manikins a decade ago to high-fidelity simulation fully integrated into numerous residency-training programs worldwide [21, 22].

The number of trials assessing SBME interventions has expanded rapidly [23, 24]. To be useful to research users, SBME, like any healthcare intervention, must be assessed in well-designed trials that are then fully and transparently reported. Many studies show that flaws in the design, conduct, and reporting of randomized controlled trials (RCTs) can bias their results [25,26,27,28,29,30]. Although poor reporting does not necessarily mean poor methods [31], adequate reporting allows readers to assess the strengths and weaknesses of studies and facilitates the replication of interventions in daily practice [32, 33].

To our knowledge, the methodological characteristics of SBME trials in the field of EM have not been assessed. In this study, we aimed to (1) assess the proportion of simulation trials among RCTs evaluating interventions in the field of EM, (2) describe and evaluate their methodological characteristics, and (3) assess whether the reports describe the interventions in enough detail to allow replication in practice.

Methods

We performed a review of reports of RCTs assessing an SBME intervention in the field of EM that were published over a 4-year period. We used the Cochrane Collaboration risk of bias tool [34] and the Medical Education Research Study Quality Instrument (MERSQI) [35] to assess the risk of bias and methodological quality of the included RCTs. We used reporting guidelines to evaluate the intervention descriptions. We report this review in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) statement [36] (Online Appendix 1).

Search strategy

We searched MEDLINE via PubMed for all reports of RCTs published from January 1, 2012 to December 31, 2015 in the 6 general and internal medicine journals (New England Journal of Medicine, The Lancet, Journal of the American Medical Association, Annals of Internal Medicine, British Medical Journal, and Archives of Internal Medicine) and the 10 EM journals (Resuscitation, Annals of Emergency Medicine, Emergencies, Injury, Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, Academic Emergency Medicine, Prehospital Emergency Care, European Journal of Emergency Medicine, Emergency Medicine Australasia, and World Journal of Emergency Surgery) with the highest impact factors according to the 2014 Web of Knowledge (search date: February 17, 2016). We applied no language restrictions. We decided to use a 4-year period and a single database (i.e., MEDLINE via PubMed), with the hypothesis that this period and database would give us a broad picture of simulation research in EM. The search strategy is reported in Online Appendix 2.

Study selection

We included all RCTs assessing SBME interventions, regardless of type, that were performed in an emergency department or evaluated an emergency situation (e.g., cardiopulmonary resuscitation). We excluded systematic reviews and meta-analyses, methodological publications, editorial-style reviews, research letters, secondary analyses, abstracts and posters, correspondence, and protocols.

Two reviewers (AC and JT) independently screened all retrieved references based on titles and abstracts, then examined the full text of relevant studies against the inclusion and exclusion criteria. Any disagreements were resolved by discussion with a third researcher (YY).

Data extraction

Two reviewers (AC and JT) extracted data in duplicate and independently using a standardized data extraction form. When assessments differed, the item was discussed until consensus was reached. If needed, a third reviewer (YY) was consulted.

General characteristics of RCTs assessing an SBME intervention

For each RCT, we assessed the following:

1. General characteristics: year of publication, name of journal, location of studies, publication time, number of centers involved, number of participants randomized and analyzed, type of participants involved (e.g., nurses, medical students), study design (parallel-arm or cross-over study), ethics committee approval, funding sources, and reporting of a registration number or availability of a study protocol. We extracted primary outcomes as reported in the article. If the primary outcome was unclear, we used the outcome stated in the sample size calculation. We classified the EM topic as (1) cardiopulmonary resuscitation (CPR), (2) airway management (outside a CPR situation), (3) triage, (4) surgical intervention (e.g., cricothyroidotomy), or (5) other.

2. Type of comparator (e.g., usual procedure or not) and the tested intervention. We defined the type of simulation as high- or low-fidelity: high-fidelity manikins are “those that provide physical findings, display vital signs, physiologically respond to interventions (via computer interface) and allow for procedures to be performed on them (e.g., bag mask ventilation, intubation, intravenous insertion)”, whereas low-fidelity manikins are “static mannequins that are otherwise limited in these capabilities” [37]. Simulation studies with cadavers were considered low-fidelity simulation.

Methodological quality of RCTs assessing an SBME intervention

Risk of bias assessment

The risk of bias within each RCT was evaluated by assessing the following key domains of the Cochrane Collaboration risk of bias tool [34]: selection bias (methods for random sequence generation and allocation concealment), performance bias (blinding of participants and personnel), detection bias (blinding of outcome assessors), attrition bias (incomplete outcome data), and reporting bias (selective outcome reporting). Each domain was rated as low, high, or unclear risk of bias according to the Cochrane handbook recommendations [38].
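For illustration only, the tallying of domain ratings that is summarized in Table 3 can be sketched as follows (a minimal Python sketch; the ratings shown are hypothetical and the domain names are our shorthand, not those of the extraction form):

```python
from collections import Counter

# Cochrane risk-of-bias domains assessed for each included RCT.
DOMAINS = ["sequence_generation", "allocation_concealment",
           "participant_blinding", "assessor_blinding",
           "incomplete_outcome_data", "selective_reporting"]

# Hypothetical ratings for two trials (one dict per included RCT).
trials = [
    {"sequence_generation": "low", "allocation_concealment": "unclear",
     "participant_blinding": "high", "assessor_blinding": "low",
     "incomplete_outcome_data": "low", "selective_reporting": "unclear"},
    {"sequence_generation": "unclear", "allocation_concealment": "unclear",
     "participant_blinding": "high", "assessor_blinding": "unclear",
     "incomplete_outcome_data": "low", "selective_reporting": "unclear"},
]

# Frequency of low/high/unclear ratings per domain (cf. Table 3).
for domain in DOMAINS:
    print(domain, dict(Counter(t[domain] for t in trials)))
```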

Methodological quality assessment

The methodological quality was appraised using the Medical Education Research Study Quality Instrument (MERSQI). The MERSQI is a 10-item scale developed to measure the methodological quality of trials assessing educational interventions by evaluating six domains [35]. The 10 items (total range 5–18 here because our study involved only RCTs) cover the following: study design (1–3 points), number of institutions studied (0.5–1.5 points), response rate (0.5–1.5 points), type of data (1 or 3 points), internal structure (0 or 1 point), content (0 or 1 point), relationship to other variables (0 or 1 point), appropriateness of the analysis (0 or 1 point), complexity of the analysis (1–2 points), and outcomes (1–3 points). A high score indicates high quality. Although there is no predefined cut-off between high and low quality, one study used a MERSQI score of ≥ 14.0 as a reference value for high quality.
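To make the scoring scheme concrete, the sketch below computes a total MERSQI score for one hypothetical trial; the anchor labels in the comments are simplified from the instrument, and the item values are illustrative only:

```python
# Hypothetical MERSQI scoring for a single-centre RCT with an
# objective outcome (item anchors simplified from the instrument).
mersqi_items = {
    "study_design": 3.0,                # RCT = 3 (range 1-3)
    "institutions": 0.5,                # 1 institution = 0.5 (range 0.5-1.5)
    "response_rate": 1.5,               # >= 75% = 1.5 (range 0.5-1.5)
    "type_of_data": 3.0,                # objective measurement = 3 (1 or 3)
    "internal_structure": 1.0,          # reported = 1 (0 or 1)
    "content": 1.0,                     # reported = 1 (0 or 1)
    "relation_to_other_variables": 0.0, # not reported = 0 (0 or 1)
    "appropriate_analysis": 1.0,        # appropriate = 1 (0 or 1)
    "complexity_of_analysis": 2.0,      # beyond descriptive = 2 (range 1-2)
    "outcomes": 1.5,                    # knowledge/skills = 1.5 (range 1-3)
}
total = sum(mersqi_items.values())      # maximum 18; minimum 5 for an RCT
print(f"MERSQI total: {total}/18")      # -> 14.5
```

With these illustrative values the trial would score 14.5/18, just above the ≥ 14.0 reference value mentioned above.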

Intervention description and replication

We assessed how key methodological components of the SBME interventions were reported, according to a modified checklist based on the “Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide” [33]: what (materials) (i.e., any physical or informational materials used in the intervention), what (procedures) (i.e., each of the procedures used), who provided (i.e., a description of the intervention providers), where [i.e., the type(s) of location(s)], when and how much (i.e., the number of times the intervention was delivered), how well (planned) (i.e., the planned intervention adherence), and how well delivered (actual) (i.e., the assessment of actual intervention adherence). See the complete descriptions in Table 1.

Table 1 Description of key methodological components of simulation-based medical education

If the authors correctly reported all key items, the intervention was considered reproducible. If items were missing from the intervention description, or were not described in sufficient detail for replication, the description was considered incomplete.
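A minimal sketch of this all-or-nothing completeness rule, assuming each modified TIDieR item is coded as reported (True) or not (False); the item keys are our shorthand for the checklist entries above:

```python
# Hypothetical coding of the modified TIDieR items for one trial report.
TIDIER_ITEMS = [
    "what_materials", "what_procedures", "who_provided",
    "where", "when_how_much", "how_well_planned", "how_well_actual",
]

def is_reproducible(report):
    """True only when every key TIDieR item is adequately reported."""
    return all(report.get(item, False) for item in TIDIER_ITEMS)

example = {
    "what_materials": False,   # materials not described
    "what_procedures": True,
    "who_provided": True,
    "where": True,
    "when_how_much": True,
    "how_well_planned": False,
    "how_well_actual": False,
}
print(is_reproducible(example))  # -> False: counted as incomplete
```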

Data synthesis and analysis

Quantitative data with a normal distribution are reported as mean and standard deviation (SD); otherwise they are reported as median and interquartile range [IQR]. Qualitative data are reported as numbers (%). For the risks of bias defined by the Cochrane Collaboration, we determined the frequency of each bias item. We report individual frequencies or scores for each item of the MERSQI. SAS 9.3 (SAS Institute, Inc., Cary, NC, USA) was used for all analyses.
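For example, the choice between the two summary formats can be expressed as follows (an illustrative Python sketch; the authors used SAS 9.3, and the Shapiro-Wilk test is our assumption for how normality might be checked):

```python
# Illustrative summary logic: mean (SD) when normality is not rejected,
# otherwise median [IQR].
import numpy as np
from scipy import stats

def summarize(values, alpha=0.05):
    x = np.asarray(values, dtype=float)
    stat, p = stats.shapiro(x)          # Shapiro-Wilk normality test
    if p >= alpha:                      # normality not rejected
        return f"mean {x.mean():.1f} (SD {x.std(ddof=1):.1f})"
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"median {med:.1f} [IQR {q1:.1f}; {q3:.1f}]"

# Hypothetical sample sizes from a handful of trials.
print(summarize([46, 30, 77, 41, 27, 72]))
```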

Results

Search results

Our search identified 1394 reports of RCTs; 270 (19%) were in the field of EM, and 68 of these (25%) assessed an SBME intervention (Fig. 1).

Fig. 1 Flow chart of the study

General characteristics of RCTs assessing an SBME intervention

The 68 reports of RCTs were published in 9 different journals (Table 2). All included study reports were published in EM journals, with 29 (43%) published in Resuscitation. About half of the studies were performed in Europe (n = 32, 47%), and 59 (87%) were monocentric. Cross-over trials accounted for 41% of our sample (n = 28). Most of the included studies had ethics committee approval (n = 61, 90%). Seven (10%) were registered in a public registry or had an available protocol. The median number of participants randomized and analyzed was 46 [IQR 30; 77] and 41 [IQR 27; 72], respectively. The most frequently studied populations were medical students (n = 16 studies, 24%), laypersons (n = 13, 19%), and Emergency Medical Service personnel (n = 10, 15%).

Table 2 General characteristics of the included studies (n = 68)

Cardiopulmonary resuscitation (CPR) (Online Appendix 3) Most of the RCTs (n = 55, 81%) studied CPR [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93]. Two-thirds (n = 36) focused on basic life support [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74], and the others focused on a specific aspect of CPR (e.g., ventilation, chest compression) or a specific population (e.g., children/neonates) [88,89,90,91,92,93]. All trials involved the use of simulators, but high-fidelity simulators were used in only seven studies (12%). In most cases, the comparator was the usual procedure (n = 44, 80%). CPR quality outcomes were reported for 41 trials (75%), but their definitions varied across the trials (e.g., chest compression rate, mean chest compression, correct compression depth rate, correct recoil rate, or correct hand position rate).

Airway management (Online Appendix 4) Ventilation or intubation was evaluated in five studies (7%) [94,95,96,97,98]. The aims of the trials were to test a new non-invasive mask (n = 1) [94], compare different airway management approaches (n = 1) [95], assess intubation of trauma patients (n = 2) [96, 97], or assess intubation in chemical, biological, radiological and nuclear situations (n = 1) [98]. All interventions involved low-fidelity simulators, and the comparator was the usual procedure. All five trials assessed the time to, or success of, ventilation.

Triage (Online Appendix 5) Three RCTs (4%) evaluated triage in mass casualty incidents [99,100,101], using computer simulation or actors. Triage accuracy was the outcome in all three studies.

Emergency surgery (Online Appendix 6) Two RCTs (3%) assessed the surgical airway on cadavers, comparing novel techniques with the usual cricothyroidotomy procedure [102, 103].

Others (Online Appendix 7) The remaining RCTs (n = 3, 4%) evaluated a Glasgow Coma Scale scoring aid [104], the use of a novel medication delivery system during in situ simulation sessions [105], and teleconsultation for EM service teams [106].

Risk of bias assessment

Random sequence generation and allocation concealment were classified at unclear or high risk of bias for 34% (n = 23) and 49% (n = 33) of trials, respectively (Table 3). Improper reporting or absence of participant blinding led to classifying 81% (n = 55) of the trials at unclear or high risk of bias. Blinding of outcome assessors was classified at high or unclear risk of bias for 33% (n = 22) of the RCTs, and the risk of attrition bias was classified as high or unclear for 25% (n = 17). Because most trials were unregistered (90%, n = 61), the risk of selective outcome reporting was unclear in most reports (n = 63, 93%).

Table 3 Bias assessment (according to the Cochrane Collaboration risk of bias tool [34]) of included studies (n = 68)

Methodological quality assessment

Most studies were monocentric (n = 59; 87%) (Table 4). The response rate was ≥ 75% in 56 studies (82%). An objective and measurable outcome (e.g., chest compression depth, success rate of intubation) was used in most of the included studies (n = 53; 78%). The internal structure and the content of the evaluation instrument were correctly reported for 67 (99%) and 64 (94%) studies, respectively. Authors reported the relationship to other variables for only 41 (60%) studies. The data analysis was appropriate in all studies. Knowledge or skills was the outcome used in almost all studies (n = 66; 97%). The mean (SD) MERSQI score was 13.4 (1.3)/18; the lowest score was 10 and the highest was 15.5.

Table 4 Ratings from measures of study educational quality

Intervention description and replication

Only three articles (4%) correctly reported all the items of the modified TIDieR checklist for intervention descriptions (Table 5). The most frequently reported items were the procedures (n = 61; 88%) and who provided the intervention (n = 59; 86%). The materials used were reported for only 28 trials (41%). About one-third (n = 26; 38%) reported where the intervention occurred. The “when” and “how much” items were completely reported for 48 studies (70%). Planned and actual adherence to the intervention were correctly reported in only 10 (15%) and 5 (7%) articles, respectively.

Table 5 Reporting of key items for reproducibility of the intervention

Discussion

We conducted a methodological review of published RCTs assessing a simulation-based intervention in EM. Simulation trials represented 25% of the EM RCTs in our sample, and the most frequent topic was CPR. Only half of the studies had a low risk of bias for allocation concealment, and only 10% were registered or had an available protocol. Despite these methodological difficulties, the methodological quality was fair, with a mean (SD) MERSQI score of 13.4 (1.3)/18. The intervention was completely described in only 4% of studies.

One result of our study is the importance of simulation in EM research. Almost 15 years ago, a report by the Commonwealth Fund Task Force emphasized that the quality of care patients receive may be determined, to some extent, by the quality of the medical education students and residents receive [107, 108]. However, as stressed by Stephenson et al. [109], “Medical education is not short of excellent ideas about how to improve courses and create the professionals needed by society. What is in much shorter supply is evidence about the effectiveness of such teaching (…)”. Another finding is that, within our sample, more than one quarter of the RCTs published in EM assessed educational rather than patient-centered interventions. To our knowledge, no such assessment has been performed in another medical specialty. Moreover, our analysis shows that the majority of the educational content in EM is CPR-based. This may indirectly suggest that simulation-based education is not used to its full potential for assessing other interventions such as decision-making, communication, and teamwork skills.

Several other studies have evaluated the internal validity of published articles in specific simulation situations [110,111,112]; they indicate that simulation research is poorly reported. Systematic reviews of SBME also quantitatively document missing elements in abstracts, study design, definitions of variables, and study limitations [113,114,115]. The MERSQI scores for our sample indicate that the educational quality of the studies appears fair. Bias related to the lack of blinding remains problematic when designing RCTs. Concealing study procedures from study participants is difficult, and probably impossible for some SBME interventions. However, alternative methods exist, such as blinding participants to the hypothesis. For example, Philippon et al. assessed whether the death of the manikin increased anxiety among learners as compared with a similar simulation-based course in which the manikin stayed alive [116]. Participants were blinded to the study’s objectives and were advised that they were participating in a study designed to assess emotions while managing life-threatening situations. Additionally, when blinding trial participants is not possible, outcome assessors can often still be blinded to limit the risk of bias in open RCTs. As developed by Kahan et al. [117], when blinding of outcome assessment is not possible, strategies exist to reduce the possibility of bias (e.g., modifying the outcome definition or the method of assessment).

With only 4% of the interventions fully reported, dissemination of the studies’ findings is at risk. Missing information about the interventions may limit the ability of other researchers and educators to replicate them. Clear and precise recommendations on how SBME interventions should be reported would help improve trial transparency, enable the dissemination of effective interventions, and allow ineffective ones to be discarded. A steering committee of 12 experts in simulation-based education and research recently developed specific reporting guidelines for simulation-based research [23]. These guidelines are an extension of the Consolidated Standards of Reporting Trials (CONSORT) statement [118]. The application and impact of these guidelines on the quality of reporting will be of interest. However, the experts focused only on key items of the CONSORT and not on the description of the intervention. Without a complete published description of the intervention, other researchers cannot replicate or build on research findings. The objective of the TIDieR checklist is to improve the reporting of interventions, make it easier for readers to use the information, reduce wasteful research, and increase the potential impact of research on health. Many editors endorse the CONSORT statement to improve the reporting of RCTs. However, completeness of reporting is only one aspect of methodological quality. To avoid research waste due to missing information on methods, authors, editors, and peer reviewers must pay closer attention to the reporting of the key elements needed for reproducibility [119]. Of note, biased and misreported studies contribute to an important waste in medical research, estimated at up to 85% of research investment each year [120,121,122,123].

Strengths and limitations

Our study has several limitations. First, we searched only one database (MEDLINE via PubMed), without searching ERIC or EMBASE. However, our search was exhaustive and performed according to Cochrane standards. Second, for the assessment of methodological quality, we could evaluate only the published reports; the authors might have omitted key information, or it may have been deleted during the publication process. Finally, our convenience sample of journals might have overestimated the overall quality, because we arbitrarily selected the journals with the highest impact factors, which may publish articles of higher methodological quality.

Conclusions

Trials assessing simulation accounted for one quarter of the published EM RCTs in our sample. Their quality often remains unclear, which requires great caution in drawing conclusions from their results. In our sample, authors frequently failed to correctly describe the blinding process, allocation concealment, and the key elements essential to ensure the reproducibility of the intervention. Guidelines for improving the reproducibility of simulation-based medical education research are needed to help improve the replication of interventions in daily practice.