Introduction

Vital dental bleaching is a technique that produces quick results and improves the patient’s appearance and self-esteem. A questionnaire-based study conducted in Poznan, Poland, reported that 85% of the patients who had undergone dental bleaching were satisfied with their final appearance [1]. Similar findings were observed in the city of Santiago, Chile, where the authors reported that most of the patients were highly satisfied after bleaching treatments [2].

There are two types of dentist-supervised dental bleaching: at-home and in-office bleaching protocols. Although at-home bleaching is the most frequently used technique, some patients prefer faster results, and for them in-office bleaching is the more suitable procedure [3, 4]. Like at-home bleaching, the in-office protocol produces satisfactory whitening results [5,6,7,8,9].

In-office bleaching systems employ high- or low-concentrate hydrogen peroxide (HP) that is sometimes activated with heat and/or light sources [10,11,12]. The rationale behind the use of light with in-office bleaching is to accelerate the bleaching process by increasing the temperature of HP [13, 14]. It is believed that such a temperature rise increases the rate of HP decomposition into free radicals, which oxidize complex organic molecules [13, 14]. This association is usually called “power” or “jump-start” bleaching. There are many types of light-activating sources, such as halogen lamps, lasers, light-emitting diodes (LEDs), metal halides, and plasma arc lamps (PACs) [5, 15,16,17,18,19].

The benefits of light activation have been questioned, as many randomized clinical trials (RCTs) have reported conflicting findings [16, 20,21,22,23]. A recent systematic review [24] comparing the efficacy of a control group (bleaching without light activation) versus the combined effect of light-activated bleaching systems showed that light activation does not seem to improve color change. However, in this systematic review, all types of light-activated systems were merged and not evaluated separately. Differences in the light activation protocols may play a role in the performance of light-activated bleaching.

A network meta-analysis allows for the comparison of various treatments using a single model. This methodology is particularly useful when a gold standard is unknown and there are no trials comparing the treatment options [25]. Such an approach combines the extracted data considering both the direct evidence (that is, the evidence that comes from head-to-head trials) and the indirect evidence (the evidence that comes from trials with a common comparator). For example, one trial comparing a halogen lamp with light-free bleaching and another trial comparing a PAC with light-free bleaching together provide indirect evidence for the comparison of a halogen lamp with a PAC. The application of this approach enables treatments to be ranked in terms of the probability of each treatment being the most effective for each outcome measure.
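The idea of indirect evidence through a common comparator can be illustrated with a simple adjusted indirect comparison. This is only a minimal frequentist sketch of the concept, not the Bayesian model used in this review; the function name and all numbers are hypothetical and were not taken from any included trial.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of A versus B through a common comparator C.

    d_ac, d_bc: mean differences of A vs C and of B vs C.
    se_ac, se_bc: standard errors of those mean differences.
    Returns (estimate, standard error, approximate 95% confidence interval).
    """
    d_ab = d_ac - d_bc                      # the common comparator cancels out
    se_ab = math.sqrt(se_ac**2 + se_bc**2)  # variances of independent trials add
    ci = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)
    return d_ab, se_ab, ci

# Hypothetical inputs: halogen lamp vs light-free, and PAC vs light-free.
est, se, ci = indirect_comparison(1.0, 0.3, 0.4, 0.4)
print(f"halogen lamp vs PAC: {est:.2f} (SE {se:.2f})")
```

Because the two variances add, an indirect estimate is always less precise than either of the direct estimates it is built from.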

Therefore, the purpose of this systematic review is to establish if there are evidence-based differences in the bleaching efficacy of various treatments: light-free and six types of light-activated bleaching protocols (halogen lamps, lasers, LED, LED/lasers, metal halides, and PACs) using high- or low-concentrate HP.

Materials and methods

Protocol and registration

This study protocol was registered at the International Prospective Register of Systematic Reviews (PROSPERO—CRD42017078743) and followed the recommendations of the Preferred Reporting Items for Systematic Reviews and network Meta-Analysis (PRISMA) statement for reporting [26].

Eligibility criteria

This systematic review and network meta-analysis was conducted to answer the following PICO question: “In the adult population, are there differences among seven protocols (light-free, halogen lamp, laser, LED, LED/laser, metal halide, and PAC) regarding color change?” We included parallel and split-mouth RCTs that had at least one group treated with in-office dental bleaching with light activation in adult patients. RCTs were excluded if they compared in-office dental bleaching with combined bleaching (jump-start in-office bleaching combined with at-home bleaching). No year or language restrictions were applied.

Information sources and search strategy

Electronic databases (MEDLINE via PubMed, Cochrane Library, Brazilian Library in Dentistry, Latin American and Caribbean Health Sciences Literature database (LILACS) and citation databases, Scopus, and Web of Science) were comprehensively searched (Table 1). The reference lists of each primary study were hand-searched for additional relevant publications. We also searched the related article links of each primary study without publication date or language restrictions.

Table 1 Electronic databases and search strategy, conducted initially on April 23, 2017 (updated on March 30, 2018)

The controlled vocabulary (MeSH terms) and the free keyword in the search strategy were defined based on the population (adult patients who underwent vital tooth bleaching) and intervention (light-activated in-office bleaching) aspects.

Additionally, the gray literature was searched: abstracts from the International Association for Dental Research annual conference and its regional divisions (1990–2018), the database of the System for Information on Gray Literature in Europe, and dissertations and theses from the ProQuest Dissertations and Theses full-text database and the Periódicos Capes Theses database.

To locate unpublished and ongoing trials related to the review question, the clinical trial registries were searched as well: Current Controlled Trials (www.controlled-trials.com), the International Clinical Trials Registry Platform (http://apps.who.int/trialsearch/), ClinicalTrials.gov (www.clinicaltrials.gov), Rebec (www.rebec.gov.br), and EU Clinical Trials Register (https://www.clinicaltrialsregister.eu).

Study selection and data collection process

Initially, the articles were selected by title and abstract according to the aforementioned search strategy. Articles that appeared in more than one database were considered only once. Full-text articles were also obtained when the title and abstract presented insufficient information for making a clear decision.

Subsequently, each eligible article received a study identification combining the first author's name and the year of publication. Relevant information about the study design, participants, treatment, and outcomes was extracted using customized extraction forms. Concerning color change, data from before bleaching and 1 week post-bleaching were extracted. As some studies did not report this period, the earliest post-bleaching data available, up to 1 month post-bleaching, were extracted depending on what the authors reported.

All processes cited were conducted independently by three authors (B.M.M., A.B., and T.P.M.). In case of any doubt, a fourth author was also consulted (A.R.).

Risk of Bias in individual studies

Quality assessments of the selected studies were carried out by three independent reviewers using the Cochrane Collaboration tool for assessing risk of bias (RoB) in RCTs [27]. The assessment instrument contains six items: sequence generation, allocation concealment, blinding of the outcome assessors, incomplete outcome data, selective outcome reporting, and other possible sources of bias. For this study, the first three items were considered to be key domains, and no other possible sources of bias were considered. Each domain was judged as having a low, high, or unclear risk of bias. Afterward, each study was classified as having a low RoB (if all key domains were deemed to have a low RoB), an unclear RoB (if one or more key domains were judged as having an unclear risk), or a high RoB (if at least one key domain was considered to have a high RoB). When a study was classified as unclear, its authors were contacted to obtain more information to allow for a definitive judgment. Quality assessments were conducted by the same three authors cited above, and any disagreements among the reviewers were resolved through discussion and, if needed, by consulting a fourth reviewer (A.R.).
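The study-level classification rule described above can be sketched as follows. The domain labels and the function name are hypothetical, and the rule that a high-risk key domain takes precedence over an unclear one is an assumption, since the text does not state how a study with both judgments was handled.

```python
# Key domains as defined in the text; the identifiers themselves are hypothetical.
KEY_DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "blinding_of_outcome_assessors",
)

def classify_study(judgments):
    """Return the study-level risk of bias: 'low', 'unclear', or 'high'.

    judgments maps each domain name to 'low', 'unclear', or 'high'.
    Only the key domains influence the study-level classification.
    """
    key = [judgments[d] for d in KEY_DOMAINS]
    if "high" in key:      # assumption: high takes precedence over unclear
        return "high"
    if "unclear" in key:
        return "unclear"
    return "low"

# Hypothetical study: high risk only in a non-key domain.
example = {
    "sequence_generation": "low",
    "allocation_concealment": "low",
    "blinding_of_outcome_assessors": "low",
    "incomplete_outcome_data": "high",
}
print(classify_study(example))
```

Under this rule, a high-risk judgment in a non-key domain such as incomplete outcome data does not change the study-level classification.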

Summary measures and statistical analysis

Only studies classified as having a low or unclear RoB were included in the meta-analysis. Independent analyses were performed for high- and low-concentrate bleaching gels. Products with HP concentrations higher than 25% were classified as high-concentrate products, and those with concentrations equal to or lower than 25% were considered to be low-concentrate products. Color change was measured in two units: ∆E* (CIE L*a*b* color system) and ∆SGU (shade guide units). The mean difference of deltas (treatment A versus treatment B) was used as the effect size measure to compare treatments.
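As a small illustration of the quantities defined above, the sketch below classifies a gel by concentration and computes a color difference. The text does not state which ∆E* formula the primary studies used; the Euclidean (CIE76) distance in L*a*b* space is assumed here, and the function names and color readings are hypothetical.

```python
import math

HIGH_CONCENTRATION_CUTOFF = 25.0  # percent HP; cutoff taken from the text

def concentration_class(hp_percent):
    """Classify a gel as 'high' (> 25% HP) or 'low' (<= 25% HP)."""
    return "high" if hp_percent > HIGH_CONCENTRATION_CUTOFF else "low"

def delta_e(before, after):
    """Color difference between two (L*, a*, b*) readings.

    Assumes the CIE76 definition: the Euclidean distance in L*a*b* space.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(after, before)))

def mean_difference_of_deltas(mean_delta_a, mean_delta_b):
    """Effect size: mean color change of treatment A minus that of treatment B."""
    return mean_delta_a - mean_delta_b

# Hypothetical readings for one tooth before and after bleaching.
print(concentration_class(35), delta_e((70.0, 2.0, 20.0), (73.0, 2.0, 16.0)))
```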

The mixed treatment comparison (MTC) methodology was chosen to carry out the network meta-analysis. This model is supported by the Markov Chain Monte Carlo (MCMC) hierarchy and is extremely versatile, allowing for the simultaneous comparison of all seven treatments and the incorporation of trials with three or more arms. The evidence of each possible pairwise comparison was evaluated exclusively from direct evidence (head-to-head trial), exclusively from indirect evidence (trials with a common comparator) or a combination of both depending on which evidence was available for each pair.

First, a traditional meta-analysis was performed for each pairwise comparison where evidence was available from two or more studies. Random-effects models with the DerSimonian and Laird variance estimator and the inverse variance method were used because high heterogeneity was expected among the studies. The I² statistic and Cochran's Q test were used to measure heterogeneity among studies.
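The pairwise step can be sketched with the standard DerSimonian and Laird formulas. This is a minimal stand-alone illustration rather than the R code used for the review; the function name and the input values are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian and Laird estimator.

    effects: per-study mean differences.
    variances: per-study variances of those mean differences.
    Returns the pooled estimate, its standard error, Cochran's Q,
    the between-study variance tau^2, and I^2 (in percent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}

# Hypothetical mean differences and variances from two trials.
result = dersimonian_laird([0.0, 2.0], [0.5, 0.5])
print(result)
```

When Q does not exceed its degrees of freedom, tau² is set to zero and the result coincides with the fixed-effect inverse variance estimate.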

Subsequently, four network meta-analyses (two concentrations and two measuring scales) were performed using light-free bleaching as the common comparator. Both fixed- and random-effects models with homogeneity of variances were fitted, and the one with the better performance according to the Deviance Information Criterion (DIC) was chosen to present the results. The consistency assumption between direct and indirect evidence (that is, whether the information from both sources of evidence is similar enough to be combined) was checked using the posterior plots and the Bayesian p values produced by the node-splitting method of Dias et al. 2010 [28]. In this approach, each pair of treatments in a closed loop (direct evidence connecting three or more pairs) has its MTC evidence split into its direct and indirect components, which are then compared; pairs in a closed loop always have both direct and indirect evidence, so for these pairs the MTC evidence is a combination of both. High Bayesian p values for these comparisons indicate no inconsistency. A p value equal to or greater than 0.1 was considered as the threshold for significance, as the same data were used in multiple comparisons. The results were displayed as point estimates, 95% CrI (credible intervals are the Bayesian analogue of frequentist confidence intervals), and surface under the cumulative ranking curve (SUCRA) probabilities (the probability of being the treatment with the highest color change).
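For readers unfamiliar with SUCRA, the sketch below shows how the value is commonly computed from a treatment's rank probabilities: it is the average of the cumulative probabilities over the first a − 1 ranks, where a is the number of treatments. The function name and the probabilities are hypothetical; in practice the rank probabilities come from the posterior samples of the network model.

```python
def sucra(rank_probabilities):
    """Surface under the cumulative ranking curve for one treatment.

    rank_probabilities[k] is the probability that the treatment holds
    rank k + 1, where rank 1 is the best (largest color change).
    Returns a value between 0 (always worst) and 1 (always best).
    """
    a = len(rank_probabilities)
    cumulative = 0.0
    total = 0.0
    # Sum the cumulative probabilities over the first a - 1 ranks only;
    # the cumulative probability at the last rank is always 1.
    for p in rank_probabilities[:-1]:
        cumulative += p
        total += cumulative
    return total / (a - 1)

# Hypothetical rank probabilities for one treatment among three.
print(sucra([0.5, 0.3, 0.2]))
```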

All analyses were implemented using the Meta and GeMTC packages of the R statistical software (https://cran.r-project.org).

Results

Study selection

The database screening returned a total of 6602 studies, which was reduced to 4906 following the removal of duplicates. After title screening, 136 studies remained, and this number was reduced to 28 following the careful examination of the abstracts or full text (Fig. 1).

Fig. 1 Flow diagram of study identification

Characteristics of the included studies

Study design and method of color evaluation

Descriptive characteristics of the 28 selected studies are presented in Table 2. In brief, the study design was balanced among the studies: 14 studies used a parallel design [5, 11, 16, 19, 22, 29,30,31,32,33,34,35,36,37, 38, 42, 45], and 14 studies used a split-mouth design [15, 17, 18, 20, 21, 38,39,40,41,42,43,44,45].

Table 2 Summary of the primary studies included in this systematic review

For color evaluation, 22 studies used a shade guide [5, 11, 15, 16, 18, 19, 21, 23, 30, 32,32,33,34,35,36,37,38, 40,41,42,43,44,45], and 15 studies used an objective instrument (spectrophotometer or colorimeter) for color assessment [11, 17, 19, 20, 22, 29, 32, 33, 35, 36, 38,39,40,41, 45]. Photography was also used in nine studies [5, 11, 18, 20, 31, 38, 39, 42, 44].

Age and gender of the patients in the primary randomized controlled trials

The ages of the patients ranged from 18 to 78 years old; eight studies did not report age ranges [17, 18, 30, 32, 38, 39, 41, 42]. The mean age of all participants included in the RCTs that reported this information was approximately 30 years, showing a predominance of young adults (Table 2). Females were predominant in all studies that reported this characteristic [11, 16, 17, 19, 21, 22, 35, 41, 43, 45].

Bleaching protocols

The concentration of HP varied from 6 to 38% (Table 2). Twenty-one studies used high-concentrate HP [5, 11, 15, 16, 18, 21,22,23, 29, 30, 32,33,34,35,36, 38, 39, 41, 42, 44, 45], and 12 studies used low-concentrate HP [17,18,19,20, 22, 31, 34, 35, 37, 40, 43, 45]. The application protocol for in-office bleaching was quite variable, although many studies used three 15-min applications during each clinical session [11, 17, 21, 29, 30, 32,33,34,35, 37,38,39,40, 44]. Variations in this protocol were observed, with one, two, and four applications per session for various periods of time. Most studies performed a single clinical session [11, 16,17,18,19,20,21, 23, 31, 32, 34, 36, 37, 39, 40, 42, 43], but two or three sessions with intervals between 7 and 14 days were also observed (Table 2).

Different types of light activation were used. Six studies used halogen lamps [5, 16, 33, 36, 42, 44], four used only a laser source [11, 32, 36, 41], seven used only LED [11, 15, 16, 23, 32, 33, 44], 13 used LED/Laser [5, 15, 21, 22, 29, 30, 33,34,35, 38, 39, 44, 45], eight used metal-halide light [16,17,18, 20, 31, 37, 40, 43], and two used PAC [11, 19] with various protocols. In some studies, light was applied for the same amount of time that the gel was applied; in other studies, light was applied for a few minutes with a specific time interval between applications (Table 2).

Assessment of the risk of Bias

The RoB of the eligible studies is presented in Fig. 2. Few full-text studies reported the method of randomization, allocation concealment, and whether the examiner was blinded during color assessment in shade guide units, so the studies were usually classified as having an unclear RoB. Four of the 28 studies were considered to be at high risk in the key domains of bias at the study level [17, 23, 31, 44], so they were not used in the meta-analysis. The study of Martin 2015 [34] was considered to have a high RoB in the incomplete outcome data domain, but this item was not considered to be a key domain.

Fig. 2 Summary of the risk of bias assessment, according to the Cochrane Collaboration tool

Evidence network

In this phase, six of the 24 studies eligible for meta-analysis were removed. The studies by Bortolatto 2016 [22], Kugel 2006 [18], Martin 2015 [34], and Martín 2015 [45] were removed because the authors compared a low-concentrate HP with a high-concentrate HP; the study by Posso Moreno 2010 [20] was removed because the data could not be extracted; and the study by Ward 2012 [43] was removed because it did not have a comparator group (Fig. 1). In summary, 18 studies were included in the meta-analysis of the color change outcome, with 13 of these having two arms [15, 19, 21, 29, 30, 32, 35, 37,38,39,40,41,42], two having three arms [5, 36], and three having four arms [11, 16, 33].

In two studies that did not report the standard deviation (SD) [33, 42], we imputed an SD that was based on the average of the coefficient of variation of the other studies that reported the same outcome [46]. More extreme imputations (such as a value corresponding to the lowest coefficient of variation of the primary studies and a value that was as high as the reported mean) were also evaluated, and no differences in the results reported herein could be detected.
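The imputation described above can be sketched as follows. The function name and all values are hypothetical and serve only to show the arithmetic; they are not data from the included studies.

```python
def impute_sd(missing_mean, reported, rule="average"):
    """Impute a missing SD from the coefficient of variation (CV = SD / mean).

    missing_mean: mean of the study that did not report an SD.
    reported: list of (mean, sd) pairs from studies reporting both.
    rule: 'average' uses the mean CV of the other studies; 'lowest' and
          'as_high_as_mean' correspond to the two extreme sensitivity
          scenarios mentioned in the text.
    """
    cvs = [sd / abs(mean) for mean, sd in reported if mean != 0]
    if rule == "average":
        cv = sum(cvs) / len(cvs)
    elif rule == "lowest":
        cv = min(cvs)
    elif rule == "as_high_as_mean":
        cv = 1.0  # SD equal to the reported mean
    else:
        raise ValueError(f"unknown rule: {rule}")
    return cv * abs(missing_mean)

# Hypothetical (mean, SD) pairs from studies that reported both values.
others = [(4.0, 2.0), (10.0, 3.0)]
print(impute_sd(5.0, others), impute_sd(5.0, others, rule="lowest"))
```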

Figure 3 shows the evidence network of light activation comparisons, where each node represents a treatment and the line thickness represents the number of studies included in the comparison. From the evidence network, it is possible to observe that some pairwise comparisons have no direct evidence that comes from head-to-head studies (a metal-halide light and laser, for example; Fig. 3b) and others have limited evidence as can be seen by the number of studies that compared protocols joined by straight lines.

Fig. 3 Network of eligible comparisons for color change: (a) ΔE for high-concentrate HP; (b) ΔSGU for high-concentrate HP; (c) ΔE for low-concentrate HP; (d) ΔSGU for low-concentrate HP (n = number of patients for the pairs)

For high-concentrate HP gel, six treatments were compared using color change in terms of ΔE* (Fig. 3a), totaling 21 pairs of comparisons and 641 patients. Seven treatments were compared using color change in terms of ΔSGU (Fig. 3b), totaling 30 pairs of comparisons and 835 patients. For low-concentrate HP products, three treatments were compared using color change in terms of ΔE* (Fig. 3c), totaling two pairs of comparisons and 78 patients, and four treatments were compared using color change in terms of ΔSGU (Fig. 3d), totaling four pairs of comparisons and 186 patients. Figure 3 also depicts the number of studies that contributed to the direct evidence. For instance, five studies compared LED/laser with light-free bleaching (Fig. 3a).

Network meta-evidence

Evidence of inconsistency was not found for the two high-concentrate HP networks. The smallest Bayesian p value found was 0.24, for the halogen lamp versus laser comparison for ΔSGU (Figs. 4 and 5). Inconsistency was not evaluated for the two low-concentrate HP networks because it was not necessary, owing to the absence of closed loops. Table 3 summarizes the results of the four network meta-analyses conducted. Positive values for the delta mean difference favor the column-defining treatment, and negative values favor the row-defining treatment. For example, the color change for ΔSGU with a laser is, on average, 0.74 smaller than with a halogen lamp (−0.74; 95% CrI −1.90 to 0.40; Table 3a), although the difference is not significant. No significant differences were found among the treatments in any network. Therefore, SUCRA analyses were not performed.
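The reading of a single cell of Table 3 can be sketched as below. Two assumptions are made: that a difference is regarded as significant only when its 95% CrI excludes zero, and that the laser is the column-defining treatment in the example taken from the text. The function name is hypothetical.

```python
def read_cell(md, lower, upper, column, row):
    """Interpret one cell: a delta mean difference and its 95% CrI."""
    if md > 0:
        favored = column      # positive values favor the column treatment
    elif md < 0:
        favored = row         # negative values favor the row treatment
    else:
        favored = None
    # The interval excludes zero only if both limits have the same sign.
    excludes_zero = lower > 0 or upper < 0
    return {"favored": favored, "interval_excludes_zero": excludes_zero}

# Values quoted in the text for laser versus halogen lamp (ΔSGU).
cell = read_cell(-0.74, -1.90, 0.40, column="laser", row="halogen lamp")
print(cell)
```

In this example the point estimate favors the halogen lamp, but the interval contains zero, which matches the statement that the difference is not significant.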

Fig. 4 Forest plot evaluating the inconsistency assumption between the direct and indirect evidence used in the network meta-analysis of color change in ∆E for high-concentrate HP with different kinds of light activation, based on the median of the mean difference (MD) (p < 0.05 indicates inconsistency of the pairs)

Fig. 5 Forest plot evaluating the inconsistency assumption between the direct and indirect evidence used in the network meta-analysis of color change in ∆SGU for high-concentrate HP with different kinds of light activation, based on the median of the mean difference (MD) (p < 0.05 indicates inconsistency of the pairs)

Table 3 Pairwise comparisons of the efficacy of the six light-activation protocols and light-free bleaching. Results are delta mean differences (95% CrI) between the column-defining treatment and the row-defining treatment. (A) Color change in ∆E (lower part) and ∆SGU (upper part) for high-concentrate HP. (B) Color change in ∆E (lower part) and ∆SGU (upper part) for low-concentrate HP

Discussion

Multiple types of light-activation devices are available on the market, and they vary significantly in terms of light spectrum, intensity, and power output. Although previous systematic reviews of the literature have already focused on this research question, they merged the outcomes of all types of light-activation devices to compare them against a control group of in-office bleaching without light activation [24, 47]. The combination of different kinds of studies in a meta-analysis has been one of the criticisms of this methodology, as such a process is based on subjective judgment, and researchers may have different opinions concerning the appropriateness of combining results. Additionally, clinicians are often interested in identifying the most effective treatment or in ranking the treatments among a range of clinically available alternatives, such as the type of light-activation device used in conjunction with in-office bleaching.

Recently, network meta-analyses have been presented as an extension of traditional meta-analysis, where multiple treatments can be compared using a single model. With such an approach, when direct evidence (head-to-head trials) and indirect evidence (a comparison of two treatments is made through a common comparator) are both available, they are combined in a single measure. This method has become increasingly common in the medical literature [48,49,50,51]; however, in the dental literature, few studies have used this methodology [52,53,54]. Indirect comparisons can increase the validity of comparisons obtained with direct comparisons [55] and may also provide valuable clinical information in the absence of direct comparative data [56].

Unlike the two previous traditional systematic reviews of the literature [24, 47], the present study evaluated the impact of the different types of light activation on bleaching efficacy through a Bayesian network approach. The analysis performed in this study confirmed the previous findings: there is no evidence that light activation offers better efficacy in terms of color change [16, 33, 36]. In addition, the results of this study showed that there is no evidence that any of the six types of light activation (halogen lamp, laser, LED, LED/laser, metal halide, and PAC) performs better than the others in terms of color change.

The rationale behind the lack of efficacy of light activation was discussed in an earlier publication [24]. From chemical theories, we know that heat and light sources can accelerate the decomposition of HP to form oxygen and perhydroxyl free radicals, but, as shown in the present systematic review, this does not necessarily translate into greater whitening efficacy in a clinical scenario. It is very likely that there are unknown rate-determining steps in the oxidizing mechanism of tooth whitening [24], which may play a more significant role in the color change.

For instance, the mean age of the participants included in the primary studies of this systematic review is under 30 years (Table 2). It has been demonstrated that each 1-year increase in the participant’s age is associated with a decrease of 0.69 in the final whitening degree measured as ΔE, suggesting that the whitening degree is negatively affected by the participant’s age [57]. In other words, as most of the RCTs evaluated color change in young patients, in whom color change occurs more easily, the conclusions of this systematic review may not extend to elderly patients. This is one of the limitations of the present and earlier systematic reviews of the literature [24, 58, 59]. The results described in these systematic reviews appraise and summarize the evidence of the primary studies but also carry over their limitations. Light-activated in-office bleaching may be effective in more challenging clinical scenarios, such as in elderly patients. Thus, RCTs with such population samples are encouraged.

In agreement with the systematic review of Maran et al. [24], we did not observe differences in color change with either high- or low-concentration bleaching gels. However, for low-concentration gels there are still few well-conducted RCTs, which reduces the precision of the color change outcome. Further well-conducted RCTs with lower HP concentrations and varied types of light sources are needed to increase the precision of the estimates presented herein or to change the current view on this aspect.

Although the main goal of tooth bleaching is to whiten teeth, tooth sensitivity is the main adverse effect of this type of cosmetic treatment. There is a widespread belief that light activation may lead to a higher risk and/or intensity of bleaching-induced tooth sensitivity. This finding was indeed demonstrated for low-concentrate products in an earlier systematic review [59], but the quality of the evidence was not high because the estimate was imprecise owing to the low number of included studies. Adding indirect information to this outcome through network meta-analysis may increase the reliability of this estimate. Bleaching-induced tooth sensitivity was not evaluated in this study, but it is under investigation through another network meta-analysis from our research groups.

Another important factor evaluated in this study that deserves attention from the research community is the RoB of the studies included in this review. Of the 28 studies, only nine were classified as being at low RoB in all domains. Additionally, the majority of the studies did not adequately report the key domains of randomization and allocation concealment. Randomization is the most important tool that only RCTs can employ. It provides groups that are comparable at baseline for both known and unknown features. However, randomization alone is not sufficient and may be compromised if the random sequence is not kept secret until implementation.

The process of protecting a random sequence is called allocation concealment [27, 60], and the adequate management of random sequence and allocation concealment keeps the study free of selection bias. In the newer version of the Cochrane Collaboration tool for assessing the RoB of RCTs (RoB 2.0 version), randomization and allocation concealment were merged in a single domain, as they both focus on preventing selection bias [61]. Future RCTs on this topic should pay more attention to these aspects during design and execution to increase the quality of the evidence produced in the dental field.

Finally, the limitations of this systematic review should be considered: (1) the analyses did not consider the differences in the protocols of each light-activation device, although it seems that for high-concentrate in-office gels, the use of light activation may offer no benefit for a young population, and (2) we cannot rule out the possibility that the limited number of studies comparing the protocols may have been the reason for the similar results presented herein. Future RCTs with low-concentrate gels are still required to increase the precision of the findings reported herein.

Conclusions

We did not observe superiority of any light-activation protocol for in-office bleaching, even when compared with a light-free protocol. Light activation, regardless of the type of device used for such a purpose, did not improve bleaching efficacy. The same findings were observed for high- and low-concentrate in-office bleaching gels, although there is still a limited number of published articles for each type of light.