Introduction

Small businesses are important to the economy because they comprise most firms in developed countries [1]. For example, in Canada 98% of all companies have fewer than 100 employees. One hundred or fewer employees will be the definition of small business used in this review. Small businesses employ approximately 48% of the total Canadian labor force in the private sector, and these employees may be at an increased risk of injury. Fatal injuries are consistently found to be elevated in small firms [2, 3], with some estimates showing the fatality rate to be four to 10 times higher than in medium to large firms [4]. For non-fatal injuries, the picture is more complex. Lost-time claim rates often show an inverse relation to firm size. However, some studies find that injuries that do not require time off work are less frequent in small firms than in large firms [5, 6].

Unique occupational health and safety (OHS) challenges are thought to arise in small businesses. For example, fewer resources may be devoted to safety compared to large firms, as small businesses need to cope with many business constraints to survive [4]. The economic precariousness of small firms may produce a climate where safety is less salient than firm viability and production efficiency [7]. Small businesses’ short life cycles mean that many of them are new and are not familiar with relevant safety regulations and practices [7]. It has also been noted that small firms are more likely than large firms to employ “vulnerable” workers, such as young workers and/or workers with low education [8]. Consequently, OHS interventions developed for larger firms appear less effective for small firms [9].

Little is known about the most effective OHS interventions for small businesses despite the cost and range of initiatives implemented in the workplace. A comprehensive search found no review that had systematically examined the effectiveness of interventions for reducing work injuries in small businesses. The purpose of this systematic review was to examine the existing quantitative literature on OHS interventions in small businesses. The research question addressed was:

“Do OHS interventions in small businesses have an effect on OHS outcomes?”

Methods

This review was part of a larger review that examined both the quantitative and qualitative literature on small firm OHS. According to the Cochrane Collaboration, a systematic review addresses a clearly formulated question using systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from the studies included in the review [10]. The methods we outline below include many of the guiding principles of the Cochrane Collaboration system [10]. Accordingly, we describe replicable, scientific, and transparent procedures intended to minimize bias at each stage of the review. This process included a predefined set of search terms previously used to search for relevant literature, predefined inclusion/exclusion criteria, predefined quality appraisal criteria, a predefined extraction strategy, evidence synthesis guidelines used in other reviews, detailed guides for reviewers at each stage providing operational definitions of terms, and use of a consensus process to develop appropriate tools and resolve any discrepancies between review team members.

A team of 15 researchers participated in the literature search phase of the systematic review. Some reviewers were identified based on their expertise in conducting epidemiologic or intervention studies. Others were recruited for their expertise in conducting qualitative research and meta-synthesis. Finally, some were recruited for their experience in conducting systematic reviews. Review team members had backgrounds in industrial hygiene, biostatistics, clinical psychology, sociology, epidemiology and biomechanics.

The steps of the systematic review process are listed below. The review team used a consensus process for each step of the review:

  • Formulate the research question and search terms

  • Hold a stakeholder workshop for feedback on the research question

  • Identify articles expected to be found in the literature search from all review team members

  • Contact international content experts to identify key articles

  • Formulate literature search strategy

  • Conduct literature search and pool articles with those submitted by experts

  • Review titles and abstracts: select studies for relevance based on predefined inclusion/exclusion screening criteria

  • Review full articles: select studies for relevance based on predefined inclusion/exclusion screening criteria

  • Divide full team into two subgroups: qualitative and quantitative

  • Conduct sub-team quality assessment and partial data extraction: assess quality of relevant quantitative articles with scoring on predefined methodological criteria

  • Conduct sub-team data extraction: extract data from all relevant articles to compile data for tables for synthesis

  • Conduct partial data extraction on all quantitative evidence meeting relevance requirements

  • Synthesize quantitative evidence

Definition of Terms

Several terms from the overall review question, as well as from the sub-questions, were defined and used to develop the literature search criteria, with a view to being as inclusive as possible.

Workplace

Workplaces were limited to those locations that employed teenagers (15 years or older) and/or adults. Military installations were excluded since findings would be difficult to generalize to other workplaces. Laboratory studies were also excluded.

Small Business

Small businesses were defined as those with 100 or fewer employees. The definition was based on consultation with stakeholders, the definitions reported by Industry Canada and the Ontario Workplace Safety and Insurance Board, and observations from an initial “scoping” review of the small business literature showing that the definition of “small” generally varied from three to 250.

Intervention

A planned systematic program or strategy aimed at reducing occupational health problems, including programs focusing on education to workplace staff and/or programs focusing on general organizational factors.

Outcomes of Interest

The outcomes of interest fell into four general categories:

Attitudes and Beliefs

Attitudes and beliefs refer to cognitive or psychological variables hypothesized in several theoretical models of preventive behavior to influence the likelihood of action. The theoretical models included Ajzen’s Theory of Planned Behaviour and Bandura’s Social Cognitive Theory. These theories include constructs such as perceived importance, confidence in engaging in safety practices, and perceived barriers. Another theoretical model, evident in at least one study [11], was Karasek’s Demand/Control model, with measures such as decision authority and social support.

Behaviors

These outcomes refer to specific actions related to safety. This domain included compliance with personal protective equipment such as the use of hard hats, gloves, safety glasses, safety (steel-toed) shoes, as well as hearing protection, fall protection, respiratory protection, housekeeping (cleaning workspace), safety behaviors, safety inspections, guards, fire protection and fire safety.

Health

The focus was on unintentional non-fatal and fatal injuries. For example, acute/traumatic injuries (e.g. cuts, burns, fractures) and musculoskeletal injuries (e.g. low-back pain) were included. Occupational health outcomes also within our scope included occupational illnesses (e.g. allergies, respiratory symptoms, skin disorders), and indirect markers of health such as work absence.

Workplace Exposures

These measures refer to exposure to potentially harmful chemical, physical or biological agents in the work environment. For example, this may include exposures to wood dust.

Literature Searches

Nine databases of peer-reviewed scientific literature were searched from their inception to February 2008: MEDLINE, EMBASE, CINAHL, PsycINFO, Sociological Abstracts, ASSIA (Applied Social Sciences Index and Abstracts), ABI (American Business Index) Inform, EconLit and Business Source Premier. Articles not already captured by the search strategy, such as those identified by content experts and from reference lists, were retrieved for review until July 2008. Articles not in English, Spanish, Italian, French, Portuguese or German were excluded at the title and abstract screening, because these were the languages in which team members were proficient. Search terms are listed in the related article by MacEachen et al. in this edition of the journal. The process of devising the search strategy consisted of reviewing terms used in articles known to be relevant to the review, consulting with practitioners, researchers and policy makers in the OHS community, and revision of the terms by librarians familiar with the search term systems of each database.

Relevance Appraisal

This review of OHS interventions in small businesses was part of a larger review of both the quantitative and qualitative literatures on small businesses.

The titles and abstracts of each article were screened by team members for inclusion. In cases where there was insufficient information from this screening, full text articles were retrieved. Articles were deemed relevant if they met the following inclusion criteria:

  • Reported in a peer-reviewed publication

  • Analytic focus of study was on small businesses

  • Study included at least one OHS outcome

  • Article included original quantitative empirical data

  • Study evaluated an intervention. The review guide defined an intervention as a planned systematic program or strategy or unplanned case study aimed at reducing occupational health problems (as defined above), including programs focusing on education to workplace staff and/or programs focusing on organizational factors (e.g. safety climate, safety management systems, employer/supervisor behavior). Health promotion and stress management interventions were only included if an OHS outcome measure outlined above (i.e. attitude/beliefs, behavior, health, workplace exposure) was included.

As each title and abstract review was conducted by only one reviewer, a quality control check was conducted by a second reviewer to check for the possibility of selection bias. Five per cent of the studies were randomly reviewed for quality control. A 70% agreement between the reviewers and the auditor was set as a target and it was reached. Each article that passed initial title and abstract screening criteria was then reviewed in full by two reviewers. A consensus method was used to resolve any disagreements between the two reviewers regarding study inclusion. A third reviewer made the final decision when agreement could not be reached. Relevant studies proceeded to quality assessment.
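The quality control check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the review team's actual procedure: the function names, the fixed random seed, and the use of simple percent agreement as the agreement statistic are assumptions; only the 5% audit fraction and the 70% target come from the text.

```python
import random

AUDIT_FRACTION = 0.05      # five per cent of screened records, as stated in the text
TARGET_AGREEMENT = 70.0    # target percent agreement, as stated in the text


def draw_audit_sample(article_ids, fraction=AUDIT_FRACTION, seed=0):
    """Randomly pick a fraction of screened articles for a second reviewer.

    The seed is a hypothetical choice that only makes the draw repeatable.
    """
    ids = list(article_ids)
    k = max(1, round(len(ids) * fraction))
    return random.Random(seed).sample(ids, k)


def percent_agreement(reviewer_decisions, auditor_decisions):
    """Percentage of audited articles on which both raters made the same
    include/exclude decision (assumed agreement statistic)."""
    if len(reviewer_decisions) != len(auditor_decisions):
        raise ValueError("decision lists must have the same length")
    if not reviewer_decisions:
        raise ValueError("no decisions to compare")
    matches = sum(r == a for r, a in zip(reviewer_decisions, auditor_decisions))
    return 100.0 * matches / len(reviewer_decisions)


def meets_target(agreement, target=TARGET_AGREEMENT):
    """True when the observed agreement reaches the predefined target."""
    return agreement >= target
```

Simple percent agreement does not correct for chance agreement; a chance-corrected statistic could be substituted without changing the structure of the check.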

Quality Assessment

The goal was to identify studies of sufficient quality to be included in the evidence synthesis. Each article was evaluated by two reviewers. For the quality appraisal and evidence synthesis phases, there was a quantitative team of four researchers who had published in the OHS field and two research assistants. Papers with engineering interventions all included an industrial hygienist as a reviewer.

An initial set of quality criteria for intervention studies was developed by the review team by culling criteria from previous systematic reviews (e.g. [12]) and a guide on OHS intervention research issues and design [13]. These criteria were individually rated by each member of the team with regard to importance for internal validity, and any discrepancies were discussed in group meetings.

Quality was assessed using 22 criteria on the design and objectives, recruitment level, intervention characteristics, intervention intensity, risk factors, confounders and analysis (Table 1). One item (#18) was specific to industrial hygiene studies that included exposure sampling. Each question was given a weight on a three-point scale ranging from 1 = “important” to 3 = “very important.”

Table 1 Quality assessment questions and weight (quantitative studies)

Sample size was not included as an explicit aspect of quality assessment because there were no clear criteria that would provide an empirically-supported threshold for what constituted a “sufficient” sample. This issue is especially complex because the number of firms (i.e. having a sufficient sample size at the firm level) is as important as, or more important than, the total number of employees across all firms (i.e. sample size at the individual level) [14].

The articles were rated based on methodological criteria developed by group consensus and piloted for consistency of rating scores. Quality rating categories of low, medium and high were established to identify which articles continued to data extraction.

A quality rating was calculated as the sum of the weights of all relevant criteria that were met, divided by the highest possible weighted score. Quality categories were:

  • high quality = 80–100% of the weighted criteria were met

  • medium quality = 50–79% of the weighted criteria were met

  • low quality = 0–49% of the weighted criteria were met

Studies of medium quality or higher proceeded to data extraction and evidence synthesis.
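The weighted scoring and the three quality bands can be illustrated with a short Python sketch. It is a hedged illustration rather than the review team's tool: the `Criterion` class, the use of `None` to mark a criterion that does not apply to a study (such as the item specific to exposure sampling), and the handling of fractional scores between bands are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    weight: int             # 1 = "important" up to 3 = "very important"
    met: Optional[bool]     # None = criterion not applicable to this study (assumption)


def quality_score(criteria):
    """Percentage of the weighted, applicable criteria that were met."""
    applicable = [c for c in criteria if c.met is not None]
    max_score = sum(c.weight for c in applicable)
    if max_score == 0:
        raise ValueError("no applicable criteria")
    met_score = sum(c.weight for c in applicable if c.met)
    return 100.0 * met_score / max_score


def quality_category(score):
    """Map a percentage score to the bands listed in the text.

    Assumption: a fractional score between bands (e.g. 79.5) falls
    into the lower band.
    """
    if score >= 80:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


def proceeds_to_extraction(score):
    """Medium and high quality studies continue to data extraction."""
    return quality_category(score) in ("high", "medium")


# Hypothetical study: weights 3 + 2 are met, weight 1 is not, one item does not apply.
example = [Criterion(3, True), Criterion(2, True), Criterion(1, False), Criterion(2, None)]
example_score = quality_score(example)
```

In this hypothetical example, 5 of 6 applicable weighted points are met, which places the study in the high quality band.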

Data Extraction

The purpose of data extraction was to obtain information relevant to the research question to begin aggregating and synthesizing the collective evidence. Two reviewers independently extracted data from included studies and met to reach consensus. Data were extracted on: year of study, jurisdiction, study design and sample characteristics, intervention characteristics (e.g. target of the intervention or interventions employed in the study such as hazard elimination), outcome domains measured, statistical analyses, covariates/confounders and study findings. When the study had multiple follow-up points, our comparison was always baseline versus the last follow-up point. During the data extraction process, reviewers on the systematic review team reconsidered the methodological quality rating scores for each study. Any quality rating changes at the data extraction stage were made with consensus from the entire review team.

Evidence Synthesis

Due to inconsistent reporting of the variance estimates of the outcome measures, populations and outcomes among the studies reviewed, the synthesis approach adapted from Slavin and others [12, 15] known as “best evidence synthesis” was used. The best evidence synthesis approach considers the quality and quantity of the articles and the consistency of the findings among the articles (Table 2). “Quality” refers to the methodological strength of the studies, as discussed above. “Quantity” refers to the number of studies that provided evidence on the same intervention. “Consistency” refers to the similarity of results observed across the studies on the same intervention. When studies reported that an intervention had a significant positive impact on even one outcome within a domain (e.g. three measures in the attitude/belief domain), they were classified as a positive effect.

Table 2 Best evidence synthesis guidelines

Since effect sizes could not be consistently calculated, the effects are presented as they were reported in the studies. A pre-specified algorithm was used to examine the overall level of evidence for intervention effects across the five studies (Table 2). In determining the level of evidence, when studies conducted analyses on multiple outcomes within a category, the study was classified as showing an effect on that outcome category if one of the measures showed a significant between-group difference. For example, if the effect of an intervention was assessed using three attitude/belief measures, and analyses showed that at least one of the measures showed a significant, positive effect, then the study was considered to have shown a positive outcome.
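The "at least one significant measure" rule described above can be written as a small Python sketch. The data structure and labels are hypothetical, and the sketch covers only this classification step; it does not encode the levels of evidence in Table 2. Adverse effects are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class MeasureResult:
    domain: str        # e.g. "attitudes/beliefs", "behavior", "health", "workplace exposure"
    significant: bool  # significant between-group difference reported
    direction: int     # +1 = improvement, 0 = no change, -1 = worsening (assumed coding)


def domain_effect(results, domain):
    """Classify a study's effect within one outcome domain.

    A study counts as showing a positive effect in a domain when at least
    one measure in that domain shows a significant, positive effect.
    """
    in_domain = [r for r in results if r.domain == domain]
    if not in_domain:
        return "not assessed"
    if any(r.significant and r.direction > 0 for r in in_domain):
        return "positive effect"
    return "no effect"


# Hypothetical study with three attitude/belief measures and one health measure.
study = [
    MeasureResult("attitudes/beliefs", True, +1),
    MeasureResult("attitudes/beliefs", False, +1),
    MeasureResult("attitudes/beliefs", False, 0),
    MeasureResult("health", False, +1),
]
```

One consequence of this rule is that a study measuring many outcomes in a domain has more chances to be classified as positive than a study measuring one, which is worth keeping in mind when reading the synthesis.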

The predefined evidence synthesis guidelines outlined in Table 2 were used to summarize the data in two different ways. First, studies were grouped by intervention types (e.g. training), and then the guidelines were used to determine the level of evidence for each type. Second, studies were grouped by type of OHS outcome (e.g. health outcome), with the guidelines being used to determine the level of evidence for each outcome type.

Results

Literature Search and Relevance Selection

A total number of 5,067 articles were identified in the literature search after different databases were merged, duplicate articles were removed, and any articles provided by content experts and identified by reviewers were included (Fig. 1).

Fig. 1 Flowchart of systematic review process

Quality Assessment and Study Descriptions

Five quantitative studies that met the relevance and quality criteria proceeded to full data extraction and evidence synthesis. These studies were assessed for methodological quality using 22 quality criteria (Table 3).

Table 3 Methodological quality assessment

High Quality Studies

Two studies were of high quality [16, 17]. The high quality studies were consistent in their methodological quality. Both met 17 of the 22 criteria. However, neither study reported checking for differences between remaining and drop-out participants after the intervention. Also, neither described “contamination” between groups. Contamination occurs when workers in a control group are exposed, at least somewhat, to the intervention.

Medium Quality Studies

Three studies were classified as medium quality [11, 18, 19]. These studies each had some strong methodological characteristics similar to the high quality studies: concurrent comparison (control) group(s); time-based comparisons (pre-post); follow-up length of three months or greater; description of the research question; description of the intervention; and optimization of the statistical analyses.

None of the medium quality studies met the criterion for evaluating differences across groups pre-intervention, while both of the high quality studies did. Also, the medium quality studies did not meet the criteria for consideration given to power, while one of the high quality studies did.

Low Quality Studies

Eighteen studies were classified as low quality [20–37]. The low quality studies often did not meet some or all of the criteria related to selection/sampling issues, measurement issues, and/or statistical issues. All high and medium quality studies used a comparison group, yet only 32% of low quality studies did.

Data Extraction

Descriptions of Intervention Categories

Table 4 shows the intervention categories and provides a detailed description of the five high and medium quality interventions. Some studies had multiple components in their intervention (i.e. combined a training component and an engineering control component). Each study assessed at least one intervention. When a study included more than one intervention, these were extracted and categorized according to the pre-specified system.

Table 4 Description of interventions in data synthesis (quantitative studies)

  • The intervention sub-component that was most commonly evaluated involved some type of training [11, 16, 17, 19]. There were different types of training. The most common objective was imparting safety information; other goals were interactive problem-solving and increasing motivation for safety

  • Two studies included interventions that implemented engineering controls [16, 18]; in one of the two studies, the engineering component was the only intervention applied [18]

  • Workplace safety audits were part of the intervention in two studies [11, 17]

Table 5 shows study and intervention details, including the research question, sample size, inclusion/exclusion criteria and final analyses described. Other features of the studies are described below.

Table 5 Study and intervention details

Countries of origin Three studies were conducted in the United States [16, 18, 19] and one each was done in Norway [11] and Denmark [17].

Size of businesses Two studies included businesses with a range of sizes up to 100 employees [11, 19]. Two included businesses with 50 or fewer employees [16, 18]. One study included businesses with five or fewer employees [17].

Business industry sector One study occurred in multiple sectors [19]. Two studies took place in the manufacturing sector [16, 18], one was in the agricultural sector [17] and one was in other services, such as repair and maintenance [11].

Study designs The studies that proceeded to data extraction were three randomized trials [16, 17, 19] and two quasi-experimental designs [11, 18] (see Table 4). One study had “open” employee samples in which employees entered and left over the assessment period [16]. A fixed population design, in which the same participants were followed over time, was used in the other four studies [11, 17–19].

Sample size The sample sizes in the studies varied greatly from three employees and one firm [18] to 721 employees in 226 firms [11]. Loss to follow-up details were lacking in three study descriptions [11, 17, 18]. When reported, the number of firms lost to follow-up varied from one of 48 (two per cent) [16] to 12 of 90 (13 per cent) [19].

Length of intervention In one study, the intervention length was not specified [18]. In four studies that specified a duration [11, 16, 17, 19], the time period varied substantially. Also, it was difficult to quantify durations for studies with multiple intervention components. The studies that included a training component tended to report the duration of training sessions. One study clearly specified that each of the four training sessions lasted six hours and occurred over the course of two years [11]. Another study [19] referred to a multiple day training session, but the exact duration was not provided.

Length of observation The length of observation, between baseline and the last follow-up, varied substantially from one month [18] to two years [11].

Age The age of employees was reported in three of the five studies [11, 16, 17]. Of these, the group mean age of employees varied from 32.5 [16] to 36.8 [17].

Gender Gender of employees was noted in four of the five studies. The percentage of women in an intervention or control group varied from <5% [11, 16] to 40.6% [17, 18].

Research question The detail in the research questions and objectives varied in both the high and medium quality studies. For example, in one high quality study, there was an explicit objective to obtain a specific percentage reduction of a workplace exposure [16]. Other studies did not provide that level of detail in their research objective. Only one study provided explicit hypotheses on what behavioral or attitudinal changes would be linked to changes in health outcomes [16].

Inclusion and exclusion criteria All five studies provided some inclusion criteria. Some of the criteria were based on firms’ listing in a particular business directory or association; this may have influenced the representativeness of the sample [11, 17]. Several studies had the number of employees as a specific criterion (e.g. five to 25 employees in the firm) [16, 18]. Another study used other inclusion criteria (e.g. listing in a directory of small businesses in a particular geographic area) to indirectly focus on small businesses [19].

Only two of the five studies described exclusion criteria [11, 17]. The exclusion criteria referred to possible cross-contamination of the intervention and within-firm changes that would adversely affect the intervention and/or the outcome measures.

Covariates and confounders All five studies assessed covariates and confounders (e.g. seasonal variation, job tasks). The variables considered varied substantially, with little consistency across studies. Four of the five studies also integrated some of these variables into their statistical analyses when assessing intervention effectiveness [11, 16–18]. Two studies provided information to establish whether there were between-group differences for covariates and confounders [11, 16].

Statistical analyses One study developed a multivariate statistical model based on the Poisson distribution. Two studies used univariate statistical tests such as the ANOVA [18, 19]. Two studies used a combination of univariate and multivariate models to assess between-group differences [11, 17].

Outcomes of interest The four types of outcomes extracted and categorized according to the pre-specified system were workplace exposures, behavior, attitudes/beliefs and health. Two studies included workplace exposure measures [16, 18]. Behavioral measures were included in three of the five studies [11, 16, 17]. Some measures focused on specific types of behavior, such as self-reported dust control behavior [16], while others examined a more general list of actions related to health and safety management behaviors [11].

Attitudes and beliefs were assessed in three of five studies. Two studies showed that perceived barriers to personal protective equipment or safety procedures in general were common [16, 19]. One study assessed confidence in engaging in safety practices and readiness to change [16]. Another study assessed whether the intervention changed employee decision authority and social support at work [11].

Health outcomes were assessed in three of the five studies. Two studies assessed rate of injury and frequency of illness [17, 19], and another measured musculoskeletal pain [11].

No study assessed all four types of outcomes outlined above. However, two studies assessed three out of the four outcomes [11, 16].

Evidence Synthesis

Table 6 presents a summary of the intervention effects as reported in the five studies. There were no negative or adverse effects on outcomes in any of the five studies. Therefore, positive effects or no effects are consistently reported for the interventions. With respect to the review question, a moderate level of evidence was found for the effect of injury prevention interventions, when looking across all outcome domains (i.e. behavioral, workplace exposure, attitudes/beliefs, and health).

Table 6 Intervention effects in quantitative studies

The evidence synthesis by intervention type, using the pre-specified levels of evidence outlined in Table 2, is provided below. In order to make some initial (albeit general) statements about intervention effectiveness, given only five studies, the evidence synthesis was applied by counting positive effects across all domains as meeting the criteria.

Engineering Plus Training, Safety Audit and Motivational Components

One high quality study that evaluated engineering controls along with training and motivational components (i.e. financial incentive as motivation) was identified [16]. The training consisted of an educational component with a series of one-day seminars and an interactive problem-solving component. No statistically significant effect on workplace exposure outcomes was found. However, this study did have a positive effect on attitude and belief measures.

  • With just a single high quality study available, there was limited evidence that this multi-component intervention had an effect on outcomes of interest.

Training Plus Safety Audit

Two studies, one each of high and medium quality, examined a training plus safety audit intervention in small firms. The training in the high quality study consisted of an educational component and an interactive problem-solving component [17]. The training in the medium quality study included only an educational component on health and safety management training [11]. The high quality study showed a positive effect on behavioral measures [17]. However, the study showed no effect on health-related outcomes. The medium quality study showed a positive effect on health and safety management behaviors and in the attitudes/beliefs outcome domain (e.g. social support). No significant effect was found in the health outcome domain.

  • Summarizing across outcome domains, the two studies provided limited evidence that training plus a safety audit has an effect on OHS-related outcomes.

Training Only

One medium quality study examined training only [19]. The training consisted of an educational component called REACH OUT, which was a train-the-trainer program. The study reported a positive effect on illness rate. The study also showed a positive effect on perceived access to personal protective equipment.

  • As there was a single medium quality study available, there was insufficient evidence to determine whether training alone had an effect on OHS-related outcomes.

Engineering Only

One medium quality study showed positive effects of engineering control on workplace exposure outcomes [18].

  • With just a single medium quality study available, there was insufficient evidence regarding the effect of this engineering control on OHS-related outcomes.

Table 7 presents another summary of the same intervention effects reported in the five studies, this time by the four domains of outcomes examined: workplace exposure, behavioral change, attitudes/beliefs and health. In the one high quality [16] and one medium quality study [18] that reported on workplace exposures, one showed a significant improvement associated with the intervention [18] and the other did not [16].

Table 7 Effects summary by type of outcome

  • These studies suggest that there was insufficient evidence that workplace exposures were influenced by the respective engineering interventions.

One high quality [17] and one medium quality [11] study examining behavioral change found positive effects on this outcome. Both interventions had training and safety audit components. However, one multi-component intervention study assessed safety behaviors and found no effect [16].

  • These studies suggest that there was partial evidence that safety-related behaviors were influenced by the respective interventions.

One high quality [16] and two medium quality [11, 19] studies assessed attitudes and beliefs, and all three showed that significant positive changes were produced on this outcome. All of these interventions had some kind of training component.

  • These studies suggest that there was moderate evidence that attitudes and beliefs were influenced by the respective interventions.

Of the one high quality [17] and two medium quality [11, 19] studies that assessed health outcomes (e.g. injury/illness rates), only one showed positive effects [19]. All of these interventions had a training component, and two also had a safety audit component [11, 17].

  • These studies suggest that there was insufficient evidence that health outcomes were influenced by the respective interventions.

Discussion

This systematic review used a pre-specified and explicit approach to answer the question: “Do OHS interventions in small businesses have an effect on OHS outcomes?” The literature on OHS interventions in small businesses was found to be heterogeneous in terms of the interventions implemented, study design, quality, and outcomes measured.

From an initial pool of 5,067 possible articles, 23 relevant studies were identified. Two were found to be of high quality, three of medium quality and the rest of low quality. Evidence was synthesized from the medium and high quality studies. Based on the evidence criteria for data synthesis, at least three high quality studies with consistent findings were needed to find “strong evidence” of an effect. With only two high quality studies, a moderate level of evidence was found for the effect of OHS interventions in small businesses in the aggregate (i.e. across environmental exposure, behavior, attitudes and beliefs and health). This finding means that a majority of high and medium quality studies found positive effects on OHS outcomes. In addition, no evidence was found that any intervention had a negative or deleterious effect on OHS outcomes.

This review included studies from several industries, including farming, car repair garages, a printing press and woodworking businesses, while one study looked at multiple industries. Although stratifying the interventions and effects by industry would have been desirable, having only five studies precluded this option.

Complexity of Intervention Designs

Four studies in the quantitative literature included multi-component interventions (i.e. all studies except one [18]). The authors drew attention to the need to address multiple issues such as engineering arrangements, staff training, social marketing and safety audits. An exception to the need for multi-component interventions might be particular engineering interventions that do not require changes in practices and behaviors in the workplace [18]. Some interventions focused on the specific needs of each business [16] and on a train-the-trainer approach rather than on offering prescriptive OHS suggestions [19]. One study was also sensitive to the cost feasibility of engineering interventions and offered solutions that were fiscally manageable [18]. Overall, these studies suggest that small business OHS changes and improvements require a focus on a series of interrelated factors.

The Relevance of Business Size Across Sectors

This group of small business intervention studies shows the need to consider business size when conducting interventions. An engineering intervention study emphasized the need for interventions that were affordable to small businesses [18]. A train-the-trainer intervention study across industry sectors found that the effect was greater in small businesses with more employees [19]. This differential effect suggests that very small businesses may have features that prevent them from fully realizing the benefits of a train-the-trainer model.

These small business intervention studies also drew attention to the impact of sample size on intervention design and the ability to detect effects. Detecting small or modest intervention effects requires a larger number of small businesses. While recruiting a large number of small firms is a challenge, some studies were successful in this regard [16]. Other studies drew attention to the complex interplay of numerous environmental and individual risk factors that lead to an accident [17]. Including many small businesses to achieve a sufficient sample size also increases the possible heterogeneity among businesses in terms of work processes and workers. Any study of small businesses therefore poses specific challenges to intervention evaluation. However, as noted in the results, the two high quality intervention studies [16, 17] show that it is possible to conduct well-designed intervention studies in small businesses.

Quality of the Scientific Literature

To advance the quality of evaluation of small business interventions, and to shift the level of quantitative evidence from moderate to strong, further research should include several methodological features shown in Table 1.

A common characteristic of many low quality studies was the lack of a concurrent control group. A control group is necessary to evaluate effectiveness. A time series of a single group provides some useful information, but because some threats to validity (e.g. history effects) can still occur with time series, this is not considered as strong a design as one with a control group. Where a study did have a control group, randomization of control and intervention groups was least likely to be included in low quality studies. One possible advantage of studying interventions in small businesses, rather than in large firms, is that the potential for control group contamination may be lower. Because small businesses are relatively independent, interventions can be carried out in physically distinct locations without concern that employees in the control group will be exposed to the intervention.

The intervention studies could also be improved by increasing the number of small businesses recruited. Only one high quality study provided an explicit sample size calculation (i.e. the number of small businesses) [16]. The feasibility of recruiting many small businesses was demonstrated in the study by Lazovich et al. [16].
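
As a rough illustration of what an explicit sample size calculation involves, the sketch below applies the standard normal-approximation formula for comparing the means of two equally sized groups. It is not the calculation reported in [16]. It assumes that each small business is the unit of analysis, that the outcome is summarized as one continuous score per business, and that the expected intervention effect is expressed as a standardized mean difference. The function name and default values are hypothetical.

```python
import math
from statistics import NormalDist


def businesses_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of businesses needed in EACH group.

    Assumptions (illustrative only): two-sided test, equal group sizes,
    one continuous outcome per business, and effect_size given as a
    standardized mean difference (difference in means / common SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)


# A moderate effect versus a small effect, under the assumptions above.
print(businesses_per_group(0.5))  # 63
print(businesses_per_group(0.2))  # 393
```

Under these assumptions, moving from a moderate to a small expected effect raises the required number of businesses per group roughly sixfold. Designs in which outcomes are measured on employees nested within businesses would require further adjustment for clustering, which this sketch does not attempt.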

Studies could be improved by measuring the outcome for at least four to 12 months post-intervention so that longer-term effectiveness and sustainability can be assessed. Also, covariates and confounders should be measured and adjusted for using multivariable statistical models. This is critical when using a non-randomized study design.

Issues in Conducting Systematic Reviews

This section discusses issues in conducting systematic reviews that emerged during this review. The identified issues provide opportunities to improve systematic review methodology.

Intervention Research

Small businesses and their safety partners are strongly encouraged to systematically evaluate any intervention that they implement. Guides are available to help with evaluation designs [13]. If funding and support are not provided for evaluations, then future review efforts on this topic will provide little additional guidance. Few studies were found to use similar outcome measures, making it a challenge to integrate findings. While some diversity in specific measures is inevitable, the study of wood dust control [16] provides a good example of the use of outcome measures from multiple domains (workplace exposure, behavior and attitude).

Intervention and Outcome Specification

Although a previously proposed categorization system for types of interventions and outcomes was identified [38], it was found to be too general. Developing meaningful categories to communicate findings will continue to be a challenge in systematic reviews of OHS. One notable issue is intervention specification. Intervention components were often combined with each other, which prevented the identification of component-specific effects. Four of the five studies that included training found positive effects across the different outcome domains. However, they could not be grouped together in a “training” category because they included additional intervention components.

Methodological and Statistical Improvements to the Literature

Statistical analyses help to ascertain whether differences between groups are simply due to chance. However, heterogeneity in the types of statistical procedures used made quality assessment and evidence synthesis in this review difficult.

Strengths, Limitations and Next Steps

Strengths

To be as comprehensive as possible, the peer-reviewed literature in several electronic databases and the reference lists of relevant studies were searched. External experts were also contacted to request potentially relevant published articles. These steps helped ensure that as much of the relevant literature as possible was found.

The review team also rotated the pairs of reviewers at each phase to improve the quality and independence of quality assessment and data extraction.

Limitations and Next Steps

There were several limitations in this review. One limitation was that the review included only peer-reviewed literature and not the “gray literature.” Another was that the inclusion criteria limited the focus to studies with clearly defined samples of 100 or fewer employees. The review was also limited to articles published in English, Spanish, Italian, French, Portuguese or German. It is possible that articles in other languages might have provided relevant evidence that could have contributed to answering the research question.

The following steps are recommended for future systematic reviews:

  • Review the observational literature that examines small business safety climate and OHS processes, especially studies that examine differences between small businesses

  • Include gray literature for intervention studies that have not been published in peer-reviewed journals, but could inform small business OHS

Conclusion

Even though there were few studies that adequately evaluated small business interventions, two types of OHS prevention activities with emerging evidence to support them are:

  • a combination of training and safety audits

  • a combination of engineering with training, safety audits and motivational components

Although stronger levels of evidence are required to make recommendations, these interventions most frequently prompted positive changes in safety-related attitudes and beliefs, and workplace parties should be aware of them.