Introduction

Acne vulgaris is a multifactorial inflammatory dermatologic condition. It is the most common presenting complaint in dermatologists’ offices and affects over 85 % of individuals in their lifetime [1, 4, 5, 49, 50]. The primary site of pathology is the pilosebaceous unit, which is most densely distributed in the skin of the face and torso [40]. Topical therapy is a mainstay in the treatment of acne, providing direct action at the site of the pathology. However, many topical medications are limited in their ability to bypass the hydrophobic barrier of the stratum corneum, requiring structural alterations and pairing with solvents and/or vehicles to enhance drug delivery and esthetic effects [11, 37].

Acne is often associated with significant negative impacts on quality of life [35]. This drives a continued need to improve the efficacy of available treatments, often resulting in the emergence of new delivery systems. Many new vehicles not only improve drug delivery and efficacy, but also offer cosmetic enhancements and inherent biological activity that could improve the natural disease course of acne [11, 28]. In addition, the efficacy of acne therapies is commonly determined in clinical trials that compare responses in an active drug group to those in a “placebo”- or “vehicle”-treated group, under the assumption that all vehicles are equal. Furthermore, the transient nature of acne lesions and the varying severity of acne in each patient render efficacy estimations questionable.

Benzoyl peroxide (BPO) is one of the most frequently used acne treatments. After decades of use, it retains its potent antimicrobial effect against Propionibacterium acnes without fostering resistant strains, even when used in combination with topical antibiotics [12–14, 22, 29]. While its exact mechanism is not entirely known, BPO is an oxidizing agent known for its antimicrobial and keratolytic activity. As with many topical medications, BPO must be formulated to enhance its penetration through the stratum corneum and permeation to its site of action, and because of its unique structure, vehicles can greatly affect its concentration and efficacy [18, 29]. Recent technological innovations have led to vehicles that deliver the active ingredient more efficiently and produce biological actions, including the absorption of excess sebum from the skin surface [17, 21]. For instance, a microsphere-based vehicle allows for sustained release of BPO, improving tolerability and decreasing concentration-dependent irritancy without compromising efficacy, even at lower concentrations [38, 48]. Such augmentation of pharmaceuticals to treat acne means that “vehicle”-treated groups should not be considered synonymous with “placebo”-treated groups [19]. Because vehicle treatment groups have been shown to exhibit large effects in the topical treatment of acne, and a true placebo should have no inherent activity against the disease process, “no treatment” groups may be needed, at a minimum, as comparators in topical acne trials [8].

This article evaluates randomized controlled trials that use a “vehicle” control group to assess the efficacy of topical BPO in the treatment of acne. BPO is unique in that approval of its use for acne in the United States did not require proof of efficacy in double-blind randomized controlled trials. By comparing active drug and vehicle treatment groups, we aim to provide a framework for analyzing acne clinical trial design and implementation and to draw conclusions about how therapeutic efficacy measurements can be improved. In addition, we hope to elucidate how physicians can use this information, together with efficacy results from randomized controlled trials, to optimize the therapeutic regimens they recommend to patients.

Methods

Data sources and search strategy

We searched Embase, PubMed, Cochrane Library, and ClinicalTrials.gov for randomized controlled trials published or registered through April 11, 2012. The search terms were “acne” and “acneiform eruptions” combined with “controlled trial”, “placebo”, and “randomized trial” combined with “benzoyl peroxide”. Articles in any language were evaluated, and results were translated into English as needed.

Selection and outcomes

Potentially eligible trials identified through database searching were first reviewed by title and available abstract by all authors independently, and the results were cross-checked for accuracy. Studies were excluded if the investigational treatment was not BPO, they were not clinical trials investigating efficacy, or acne was not the treatment indication. Trials were then screened based on study procedures and the quality of the clinical trial, and were excluded if there was no “vehicle” or “placebo” treatment group, BPO was not an investigational drug, or efficacy was not objectively measured with lesion counts.

Data abstraction

Data were independently abstracted by two authors for our primary outcome measures for each trial and the results were compared to ensure accuracy. The primary outcomes of interest included the percentage of responders in both the active treatment group and the vehicle group after the initial, blinded, vehicle-controlled treatment phase. We classified the outcome measures, recorded the length of time of the treatment period, and recorded pertinent inclusion and exclusion criteria. The number of patients treated in each study arm was documented along with drop-out rate, the percentage of BPO in the active formulation, vehicle characteristics, and schedule of administration. Data on study design were recorded.

Statistical analysis

All patients in the active BPO monotherapy and vehicle groups were pooled. The patients in each group achieving the outcome measures were added according to the intention-to-treat numbers and the weighted averages were tabulated. Outcomes for percent reduction in total lesion count, inflammatory lesion count, and non-inflammatory lesion count were determined and compared between the active treatment and vehicle groups. The average randomization fraction of patients allocated to active treatment versus vehicle treatment groups for all studies was also determined. For trials where patients were also treated with combination formulations or other active treatment preparations, all patients were grouped into the active treatment group.
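The pooling step described above can be illustrated with a minimal sketch. It assumes that each trial's outcome is weighted by its intention-to-treat group size and that the spread is reported as a weighted population standard deviation; the text does not specify the exact SD formula, so that choice is an assumption. The function name and the per-trial values are hypothetical and are not data from the included trials.

```python
import math


def weighted_mean_sd(values, weights):
    """Return the weighted mean and weighted (population) SD of per-trial values.

    values  -- per-trial outcomes, e.g. percent reduction in lesion count
    weights -- per-trial intention-to-treat group sizes (assumed weighting)
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


# Hypothetical example: three trials, NOT values from the included studies.
reductions = [40.0, 50.0, 30.0]   # percent reduction in total lesions per trial
itt_sizes = [100, 300, 100]       # intention-to-treat patients per trial
mean, sd = weighted_mean_sd(reductions, itt_sizes)
print(f"weighted mean = {mean:.1f} %, weighted SD = {sd:.1f}")
```

Weighting by group size lets larger trials contribute proportionally more to the pooled estimate than smaller ones.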

Results

Trial flow

Our search of PubMed, Embase, and ClinicalTrials.gov retrieved 103 potentially relevant studies. Ninety-one trials were excluded: 1 did not evaluate acne vulgaris, 24 were not randomized controlled trials, 21 did not include a BPO monotherapy arm, 37 did not use a placebo or vehicle group as a comparator, 1 did not use the same vehicle comparator that was in the BPO formulation, 2 did not evaluate the efficacy of BPO, and 5 were repeat studies. No trials with “no treatment” groups were found. Twelve studies met all inclusion and exclusion criteria and were included in the analysis.
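As a simple consistency check on the trial flow, the exclusion counts reported above can be tallied as follows. The counts come from the text; the variable names and category labels are illustrative.

```python
# Trial-flow tally using the counts reported in the text.
retrieved = 103
exclusions = {
    "did not evaluate acne vulgaris": 1,
    "not a randomized controlled trial": 24,
    "no BPO monotherapy arm": 21,
    "no placebo or vehicle comparator": 37,
    "vehicle comparator differed from BPO formulation vehicle": 1,
    "did not evaluate efficacy of BPO": 2,
    "repeat study": 5,
}

excluded = sum(exclusions.values())
included = retrieved - excluded
print(f"excluded = {excluded}, included = {included}")  # excluded = 91, included = 12
```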

Trial characteristics

In the 12 studies analyzed, 2818 patients were in benzoyl peroxide monotherapy treatment groups and 2004 were in vehicle treatment groups. Trial characteristics including the numbers of participants randomized to each study arm, the treatment regimen, the vehicle type, inclusion and exclusion criteria, and outcome measures are listed individually for each study in Table 1.

Table 1 Trial characteristics

Across studies, the mean age of included patients ranged from 12.9 to 31.1 years, with between 9.7 and 106.3 non-inflammatory lesions and between 14.6 and 60 inflammatory lesions. The average number of daily treatment applications was 1.5 and the average study duration was 10.4 weeks. The average randomization ratio of active treatment to vehicle groups was 4:1 (Table 2).

Table 2 Results

Outcomes

The weighted average percent reduction in total number of acne lesions was 44.3 % (SD = 9.2) and 27.8 % (SD = 21.0) for the active and vehicle treatment groups, respectively (Table 2). The average reduction in the mean number of non-inflammatory lesions was 41.5 % (SD = 9.4) in the active treatment group and 27.0 % (SD = 20.9) in the vehicle group. The percent decrease in inflammatory lesions was 52.1 % (SD = 10.4) in the benzoyl peroxide group and 34.7 % (SD = 22.7) in the vehicle group. The percentage of participants achieving treatment success, as designated by the study outcomes, was 28.6 % (SD = 17.3) and 15.2 % (SD = 9.5) in the active treatment and vehicle groups, respectively (Fig. 1).
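The absolute difference between the active and vehicle groups for each outcome follows directly from the weighted averages reported above. A minimal sketch of that arithmetic is shown here; the outcome values are taken from the text, while the dictionary layout and names are illustrative.

```python
# (active, vehicle) weighted averages in percent, as reported in the text.
outcomes = {
    "total lesions": (44.3, 27.8),
    "non-inflammatory lesions": (41.5, 27.0),
    "inflammatory lesions": (52.1, 34.7),
    "treatment success": (28.6, 15.2),
}

# Absolute difference in percentage points, rounded to one decimal place.
differences = {
    name: round(active - vehicle, 1)
    for name, (active, vehicle) in outcomes.items()
}

for name, diff in differences.items():
    print(f"{name}: {diff} percentage points")
# total lesions: 16.5 percentage points
# non-inflammatory lesions: 14.5 percentage points
# inflammatory lesions: 17.4 percentage points
# treatment success: 13.4 percentage points
```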

Fig. 1 The absolute difference in the percent efficacy between the benzoyl peroxide (BPO) and the vehicle treatment groups. a The individual distributions of the studies and b the overall averages are depicted. * P < 0.05, ** P < 0.01

Discussion

The percentages of “vehicle” or “placebo” responders in RCTs evaluating the efficacy of BPO in the treatment of acne are remarkably high, especially when compared to those of the active treatment responders. In addition, the differences between the active drug and placebo group have continued to decrease, especially in more recently conducted trials. Furthermore, the standard deviation of the weighted outcome averages is large. Together, these factors indicate that the accuracy of determined efficacy values may be compromised.

The lack of clear disparity between the treatment and vehicle responders could be secondary to a variety of factors inherent in clinical trials that contribute either to a “placebo effect” or to a false treatment response. These factors blur the line between the active drug response, vehicle response, placebo response, and the response expected from natural disease progression. Acne lesions are transient and the severity of an individual’s acne changes over time. Because it is not realistic to monitor these changes more frequently or during periods when patients are not enrolled in trials, there is a need to determine separate values for each group of responders to more accurately define the efficacy of therapeutics [26].

The placebo effect is an important concept within clinical trials and efficacy determinations, and much research has been done to unravel the components of the placebo effect [2, 9]. Factors within topical acne RCTs potentially reinforcing placebo or vehicle responses include therapeutic ritual from frequent dosing and the direct application of medication or vehicle to the site of pathology, the physical characteristics of the medication, more frequent office visits during trial periods, and increased attention to skin care during study periods [20, 30]. In addition, patient expectations contribute to observed effects with any prior experiences with RCT involvement, ineffective treatments, and provider relationships potentially acting to alter patient responses [27, 32, 43, 44]. However, other factors may also be improving disease course of acne.

Interestingly, the fraction of participants randomized to vehicle treatment groups in acne trials is very low; in this review, on average one participant was randomized to vehicle for every four randomized to active treatment. It is important to note that patients are informed of the study design and the probability of receiving active drug prior to consent [3]. This leads patients to expect that they will receive active drug, which may contribute to their response, their adherence to the skin care regimen, and the placebo response. To more accurately assess efficacy, the placebo effect needs to be separated from vehicle effects. Moreover, many acne studies only evaluate treatment effects on the face, a site that is often visually inspected and critiqued, which may encourage compliance with study protocols. This would increase improvement in both treatment groups, further emphasizing the need for “no treatment” groups to obtain distinct values of drug efficacy, vehicle efficacy, placebo effect, and natural disease course.

BPO has been a mainstay of acne treatment for over five decades. Interestingly, only 12 published RCTs evaluated BPO monotherapy versus vehicle. Furthermore, only one of these trials compared vehicle to a BPO monotherapy formulation that is commercially available. The remaining 11 trials utilized BPO monotherapy formulated with a vehicle only available in combination products. Given the wide variety of vehicles utilized in topical acne therapeutics and their biologic effects that potentially improve acne, the use of a “no treatment” group in RCTs is essential to determine true medication efficacies. It is also important to determine the efficacies of different drug monotherapies and their vehicles separately, because treatment with combinations of monotherapies can achieve results equivalent to those of combination drug formulations at a fraction of the cost [39].

There is great variability in the severity classifications used to determine patient eligibility in acne trials, and similar variability exists in the outcome measures captured. While the natural disease spectrum and course of acne are such that no consensus has been reached on a uniform acne severity classification, the design of RCTs determining medication efficacies should be standardized to allow for better comparison of different treatment efficacies [41, 42]. Furthermore, because outcome measures are determined by investigators, their assessments and lesion counts are subject to discrepancy and error. Careful double blinding, scrutiny of observed treatment responses, especially between the baseline and first follow-up visit, and the use of specialized imaging techniques to determine treatment responses could improve the accuracy of outcome measurements [34].

These findings must be interpreted in the context of our study design. The included RCTs are heterogeneous in design, and the BPO and/or vehicle formulations differ in efficacy and tolerability. Furthermore, there is not yet a unified acne severity classification system or set of efficacy outcome measures. In addition, many efficacy trials evaluating acne treatments are comparator studies. We intend for the analysis of these trials to illustrate a variety of important points for consideration when interpreting efficacy results and designing RCTs for topical acne treatments.

Importantly, our findings reveal the need for further investigation in multiple areas of RCT design and for separate efficacy testing of vehicles and drug formulations. Furthermore, determination of the typical acne disease course is needed to validate the responses seen in RCTs. Standardization of acne severity measures for study inclusion and continued scrutiny of the accuracy of outcome measures can allow for cross-comparison of different acne RCTs and improve our understanding of topical acne treatment efficacies. With continued testing of available acne treatments and a better understanding of responses, we can better guide patients.