Introduction

Over-the-counter (OTC) products appeared in the early 2000s and represent bleaching agents that are available in drugstores, supermarkets, and general stores [1, 2]. These products, encompassing strips, dentifrices, paint-on gels, mouthwashes, chewing gum, and varnishes, are self-applicable and commonly marketed without the need for dentist supervision [3]. They contain hydrogen or carbamide peroxide in their composition, although the concentration may vary according to the regulatory agency of each country [4]. While some regions, such as Europe, prohibit the commercialization of whitening products with hydrogen peroxide concentrations exceeding 0.1% without dentist supervision [5], OTC products are classified as “cosmetics” in other locations. This categorization facilitates their worldwide purchase through online sales without necessitating a prescription [4, 5].

Although several clinical studies of OTC products have demonstrated significant color changes, the delivery methods and application protocols vary widely [6,7,8,9,10]. This variety creates comparison possibilities that are unfeasible for conventional randomized clinical trials because of the need for numerous groups and samples [11, 12].

Network meta-analyses (NMA) produce simultaneous direct and indirect estimates and are applied in such cases to integrate several groups [13]. Moreover, systematic reviews allow data on possible adverse effects of tooth bleaching with OTC products to be surveyed. Because these products are not individualized, the occurrence and intensity of dentin hypersensitivity and gingival irritation are concerning [14, 15].

Considering the variety of OTC products available and their easy purchase and use, the primary objective of this systematic review was to map the global scientific literature to assess the color change efficacy from different OTC bleaching protocols. The secondary goal was to evaluate the adverse effects of the various techniques.

Methodology

Protocol registration

The protocol of this systematic review was reported according to the PRISMA-P (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols) guidelines [16] and registered in the PROSPERO database (http://www.crd.york.ac.uk/PROSPERO) under CRD42021276125. The systematic review was produced according to the JBI Manual for Evidence Synthesis [17] and reported following the PRISMA-NMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [18]. There were no deviations from the registered original protocol.

Research question and eligibility criteria

This systematic review aimed to answer the following guiding question based on the PICOS acronym (Population, Intervention, Comparator, Outcome, and Study design): "In adult patients undergoing vital tooth whitening, does the application of over-the-counter whitening protocols result in a superior effect on color change when compared to placebo or dentist-supervised protocols?" We included comparisons with dentist-supervised treatments to evaluate whether OTC treatment can reach the same color change as the gold standard treatments. We also included comparisons between OTC products and placebo treatments to analyze whether these products produce a real color change, as well as comparisons between different OTC bleaching protocols.

Inclusion criteria

  • Population: Adult individuals subjected to tooth bleaching in vital teeth;

  • Intervention: At least one group treated with OTC products composed of hydrogen peroxide (HP) or carbamide peroxide (CP), regardless of the concentration or application method;

  • Comparator: OTC products based on HP or CP, placebo (negative control), or conventional at-home and/or in-office tooth bleaching methods, regardless of the bleaching agent (positive control);

  • Outcomes: Color changes after tooth bleaching (∆Eab* / ΔSGU) as the primary outcome, and adverse effects, such as dentin sensitivity or gingival irritation, as secondary outcomes;

  • Study design: Parallel or split-mouth randomized clinical trials without restricting publication year or language.

Exclusion criteria

  • Studies with participants subjected to tetracycline staining;

  • Studies with participants undergoing orthodontic treatment;

  • Studies that did not clearly describe the used bleaching technique.

Information sources and search

The electronic searches were performed until December 2021 in the Cochrane Library, Embase, LILACS, MedLine (via PubMed), SciELO, Scopus, and Web of Science databases. Google Scholar and ProQuest were searched to partially capture the "grey literature" and reduce selection bias. An update was performed up to January 2023 in the MedLine (via PubMed) database. The search strategies were adapted to the syntax rules of each database (Table 1).

Table 1 Strategies for database search

Study selection

The results obtained in the primary databases were initially exported to the EndNote Web™ software (Thomson Reuters, Toronto, Canada) for cataloging and deduplication. The "grey literature" results were exported to Microsoft Word (Microsoft™, Ltd, Washington, USA) for manual removal of duplicates.

The results were exported to Rayyan QCRI software (Qatar Computing Research Institute, Doha, Qatar) [19] for study selection. Two reviewers performed all phases independently, and in case of disagreements, a third reviewer (LRP) was consulted for a final decision. Examiners were considered eligible for the subsequent phase only after reaching an agreement of Kappa ≥ 0.81, and this procedure was applied for all steps of the systematic review. In the first phase, the titles were read and those unrelated to the topic were excluded. In the second phase, the abstracts were evaluated with the initial application of the eligibility criteria. The titles that met the study objectives but did not have abstracts available were fully analyzed in the next phase. In the third phase, the full texts of eligible articles so far were read to verify whether they met the eligibility criteria. If the full texts were not found, a bibliographic request was made to a library database, and e-mails were sent to the corresponding author up to three times within 15 days to obtain the texts. Full texts published in languages other than English or Portuguese were translated for applying the eligibility criteria.
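The Kappa ≥ 0.81 threshold above refers to agreement between reviewers. As a minimal sketch, assuming the statistic was Cohen's kappa for two reviewers making include/exclude decisions (the text does not name the variant), it could be computed as follows. The function name and the screening decisions are hypothetical and are not data from the review.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters judging the same items (assumed variant)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items with identical decisions.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten titles (illustrative only).
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
kappa = cohen_kappa(reviewer_1, reviewer_2)
calibrated = kappa >= 0.81  # threshold quoted in the text
```

In this made-up example the reviewers disagree on one title out of ten, and the resulting kappa falls below the 0.81 threshold, so a further calibration round would be needed.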

Data collection process

A calibration exercise was performed before data extraction, in which the reviewers jointly extracted information from three eligible studies to ensure consistency. Next, the full texts of the selected articles were reviewed, and the following data were systematically extracted: (a) study identification (author, year, location, and funding sources); (b) sample characteristics (the number of participants, distribution by sex, and mean age); (c) bleaching protocol (OTC product and application method); (d) contact time between the bleaching agent and the tooth surface; (e) assessment methods for color changes (spectrophotometry / Vita scale) and sensitivity; (f) outcomes of color changes (ΔEab* / ΔSGU), tooth sensitivity, and gingival irritation; (g) follow-up time for post-bleaching color assessment. The corresponding author was contacted via e-mail in the case of incomplete or insufficient information.

The color change estimate could be extracted using two metrics. The first is the Commission Internationale de L’Eclairage (CIE) LAB coordinates system, an objective method to evaluate color change using a spectrophotometer. This system is based on the luminosity (L* coordinate) and the a* (red-green axis) and b* (yellow-blue axis) chromaticity coordinates. The result is calculated using the following formula: \(\Delta E_{ab}^{*}=\sqrt{\left(L_1-L_2\right)^2+\left(a_1-a_2\right)^2+\left(b_1-b_2\right)^2}\) [20], where the subscripts 1 and 2 denote the two color measurements being compared. The second is a subjective method based on the Vita Shade Guide (Vita Zahnfabrik, Sackingen, Germany). Initially, the shade units are ranked by their value, according to the manufacturer, and the operator can use a spectrophotometer or visually evaluate the initial and final color using the Shade Guide. The difference is shown as the ΔSGU [21].
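The CIELAB color difference described above is the Euclidean distance between two (L*, a*, b*) readings. The following is a minimal sketch of that calculation; the function name and the baseline and post-bleaching readings are hypothetical values chosen for illustration.

```python
import math


def delta_e_ab(lab_1, lab_2):
    """CIELAB color difference between two (L*, a*, b*) readings."""
    l1, a1, b1 = lab_1
    l2, a2, b2 = lab_2
    return math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)


# Hypothetical readings: the tooth becomes lighter (higher L*) and less yellow (lower b*).
baseline = (70.0, 2.0, 20.0)
after_bleaching = (73.0, 2.0, 16.0)
color_change = delta_e_ab(baseline, after_bleaching)  # sqrt(3^2 + 0^2 + 4^2) = 5.0
```

Because the differences are squared, the result is the same whichever reading is passed first.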

Only studies that reported mean values and respective standard deviations of ∆Eab* or ∆SGU were included in this NMA.

Risk of bias within individual studies

Two reviewers (MNO and MTCV) independently assessed the individual risk of bias in the eligible studies with the Risk of Bias Tool of the Cochrane Collaboration (version 2.0) (RoB2) for randomized clinical trials. This tool consists of five domains: bias from the randomization process, bias due to deviations from the intended interventions, bias from missing outcome data, bias from outcome measurements, and bias from the selection of the reported result. Each domain was assessed according to the algorithms proposed in the RoB2 manual and included signaling questions with "yes," "probably yes," "probably no," "no," or "no information" as potential answers. These answers provided the basis to judge the risk of bias at the domain level, which could be "high risk," "some concerns," or "low risk." The article had a "low risk" of bias if all domains had a low risk, "some concerns" if at least one domain showed some concerns, and a "high risk" of bias if at least one domain presented a high risk, or several domains showed some concerns. Reviewer disagreements were resolved by discussion and consultation with a third reviewer (LRP).
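The rule for combining domain-level judgements into an overall judgement can be sketched as below. This is an illustration under stated assumptions, not the RoB2 tool itself: the text does not quantify "several domains", so the `several_threshold` parameter (default 3) is a hypothetical choice, as are the function name and the label strings.

```python
VALID_JUDGEMENTS = {"low", "some concerns", "high"}


def overall_risk_of_bias(domain_judgements, several_threshold=3):
    """Combine domain-level judgements into one overall judgement.

    several_threshold is an assumed cut-off for "several domains with
    some concerns"; the source text does not give a number.
    """
    unknown = set(domain_judgements) - VALID_JUDGEMENTS
    if unknown:
        raise ValueError(f"unexpected judgement(s): {sorted(unknown)}")
    if "high" in domain_judgements:
        return "high"
    concerns = list(domain_judgements).count("some concerns")
    if concerns >= several_threshold:
        return "high"
    if concerns >= 1:
        return "some concerns"
    return "low"


# Hypothetical study with one domain raising some concerns.
example = ["low", "some concerns", "low", "low", "low"]
example_overall = overall_risk_of_bias(example)
```

Keeping the threshold as a parameter makes the reviewer's judgement call explicit instead of hiding it in the code.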

Data synthesis

Three review outcomes were quantitatively analyzed: ∆Eab*, ΔSGU, and tooth sensitivity. First, we performed pairwise comparisons with available head-to-head data. Treatments were grouped into common nodes based on each OTC bleaching product and respective use protocol, that is, according to the delivery method (e.g., strips, gel), the bleaching agent (HP or CP) and its concentration, and the duration of contact between the bleaching agent and the tooth structure. For example, in Kim et al., 2018 [10], the treatments were performed twice daily for 30 min each, for four weeks, totaling 28 h of contact between the bleaching agent and the teeth. Although classification decisions were arbitrary and may compromise the outcomes, the lack of grouping would merge several OTC products and contribute to network incoherence. Subsequently, a random-effects frequentist NMA compared multiple OTC bleaching protocols through common comparators by integrating direct and indirect estimates [13]. Transitivity was evaluated by comparing the distribution of important covariates across comparisons [22]: sex, age, and follow-up assessment time point for ∆Eab*; age and follow-up assessment time point for ∆SGU; and only follow-up assessment time point for tooth sensitivity [23]. It should be noted that all analyses were conducted using data from the second week of follow-up; the low number of comparisons precluded analyses using other time points. Random-effects models with the DerSimonian and Laird variance estimator [24] were preferred over fixed-effects models based on the deviance information criterion (DIC). Gingival irritation was narratively described due to the very low density in the network.
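The contact-time arithmetic and node grouping described above can be illustrated with a short sketch. The "7-13 h" and "≥ 14 h" bands follow the node labels used in this review; the "< 7 h" band, the exact cut points, the function names, and the product details in the example are assumptions made for illustration.

```python
def contact_hours(applications_per_day, minutes_per_application, days):
    """Total contact time between bleaching agent and teeth, in hours."""
    return applications_per_day * minutes_per_application * days / 60


def duration_band(hours):
    """Map total contact time to a duration band (cut points are assumed)."""
    if hours >= 14:
        return "≥ 14 h"
    if hours >= 7:
        return "7-13 h"
    return "< 7 h"  # band not named in the text; included for completeness


def node_label(concentration_pct, agent, delivery, hours):
    """Build a node label such as '6% HP strips (≥ 14 h)'."""
    return f"{concentration_pct:g}% {agent} {delivery} ({duration_band(hours)})"


# Protocol from the example in the text: twice daily, 30 min each, four weeks.
kim_hours = contact_hours(applications_per_day=2, minutes_per_application=30, days=28)
# Concentration, agent, and delivery method below are illustrative placeholders.
kim_node = node_label(6, "HP", "strips", kim_hours)
```

The protocol from the worked example yields 28 h of contact and therefore falls in the "≥ 14 h" band.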

The effect estimate was the mean difference (MD) instead of the standardized MD (SMD), because studies used comparable scales for the assessed outcomes and to avoid the influence of the standard deviation (SD) on SMD estimates. For the tooth sensitivity outcome, we used the risk difference (RD). The 95% confidence intervals (CI) were calculated for all estimates. Local incoherence was evaluated by comparing direct, indirect, and network estimates (0.05 significance level).
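For a single two-arm comparison, the MD and RD with their 95% CIs can be sketched as follows. This assumes large-sample (Wald) intervals with unpooled variances, which may differ from what the analysis software applies internally; the function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def mean_difference(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Mean difference (arm 1 minus arm 2) with a Wald 95% CI."""
    md = mean_1 - mean_2
    se = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return md, md - Z_95 * se, md + Z_95 * se


def risk_difference(events_1, n_1, events_2, n_2):
    """Risk difference (arm 1 minus arm 2) with a Wald 95% CI."""
    p_1, p_2 = events_1 / n_1, events_2 / n_2
    rd = p_1 - p_2
    se = math.sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    return rd, rd - Z_95 * se, rd + Z_95 * se


# Hypothetical arms (not data from the included trials).
md, md_low, md_high = mean_difference(5.0, 2.0, 25, 3.0, 2.0, 25)
rd, rd_low, rd_high = risk_difference(5, 20, 10, 20)
```

An interval that excludes zero corresponds to a statistically significant difference at the 0.05 level.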

League tables presented the outcomes, and each treatment was ordered from best to worst according to the ranking probabilities of treatment effects. All analyses were run in MetaInsight, version 5.1.2.

Geometry of the network

The geometry of the networks was explored using conventional measurements of number of nodes and edges as well as additional metrics of density (the ratio between real and possible edges) and number of strong edges (edges with more than one trial) [25]. Edges proportional to the number of arms in the corresponding pairwise meta-analysis represented direct comparisons among the various OTC bleaching protocols.
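The two geometry metrics defined above reduce to simple counts. The sketch below computes density as the ratio of observed to possible edges in an undirected network and counts strong edges from a mapping of edges to trial numbers; the function names and the example edge mapping are hypothetical.

```python
def network_density(n_nodes, n_edges):
    """Ratio of observed edges to all possible edges between n_nodes treatments."""
    possible_edges = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible_edges


def count_strong_edges(trials_per_edge):
    """Number of edges informed by more than one trial."""
    return sum(1 for n_trials in trials_per_edge.values() if n_trials > 1)


# Node and edge counts as reported for the three networks in the Results.
density_delta_e = network_density(n_nodes=5, n_edges=8)
density_delta_sgu = network_density(n_nodes=6, n_edges=8)
density_sensitivity = network_density(n_nodes=9, n_edges=11)

# Hypothetical edge-to-trial mapping in which every comparison has one trial.
example_edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1}
strong = count_strong_edges(example_edges)
```

With the reported counts, the densities round to 0.80, 0.53, and 0.31, matching the values given in the Results.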

Assessment of inconsistency

The presence of inconsistency in NMAs was assessed by examining the agreement between direct and indirect effect estimates. When applicable, the difference between head-to-head and indirect estimates was calculated, along with the respective 95% confidence interval.
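The comparison of direct and indirect estimates can be sketched as a difference with a 95% CI. This assumes the two estimates are independent, so that their variances add; it is an illustration of the idea, not necessarily the procedure implemented in the software used. The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value


def inconsistency(direct, se_direct, indirect, se_indirect):
    """Difference between direct and indirect estimates with a 95% CI.

    Assumes the two estimates are independent, so variances are summed.
    """
    diff = direct - indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    return diff, diff - Z_975 * se_diff, diff + Z_975 * se_diff


# Hypothetical direct and indirect mean differences for one comparison.
diff, diff_low, diff_high = inconsistency(3.0, 1.0, 2.0, 1.0)
consistent = diff_low <= 0.0 <= diff_high  # CI containing 0 suggests no inconsistency
```

In this made-up case the interval spans zero, so the direct and indirect evidence would be regarded as compatible.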

Certainty of evidence

The GRADE tool classified the certainty of evidence of treatment effect estimates for the network meta-analysis [26, 27]. First, the certainty of evidence evaluation of each direct comparison verified the risk of bias, inconsistency, indirectness, and publication bias. Indirect comparisons considered the first-order loop with the lowest certainty and evaluated intransitivity. Finally, concerns of imprecision or incoherence in the network meta-analysis caused certainty of evidence downgrading. The certainty of evidence could be high, moderate, low, or very low [26].

Results

Study selection

The first study selection phase yielded 18,564 results distributed in nine electronic databases, including the "grey literature." After removing duplicates, 12,614 results remained for analysis. A careful reading of titles and abstracts excluded 12,535 articles. Five of the 79 remaining studies were not found, and 74 were fully read, of which 37 were excluded. Appendix S1 describes the reasons for exclusions. The 37 remaining articles constituted the qualitative analysis, and ten composed the meta-analyses (Fig. 1).
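The counts reported above can be checked with simple arithmetic. The short sketch below only restates the figures from this paragraph; the variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
identified = 18_564
after_deduplication = 12_614
excluded_on_title_abstract = 12_535
not_retrieved = 5
excluded_on_full_text = 37

duplicates_removed = identified - after_deduplication
sought_for_retrieval = after_deduplication - excluded_on_title_abstract
read_in_full = sought_for_retrieval - not_retrieved
included = read_in_full - excluded_on_full_text
```

The derived values (79 reports sought, 74 read in full, 37 included) agree with the text.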

Fig. 1 Flowchart of the selection process according to PRISMA

Characteristics of eligible studies

The articles were published between 2001 and 2022 and performed in 12 countries, with 20 studies in America [28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47], 12 in Europe [1, 6, 7, 9, 48,49,50,51,52,53,54,55], and five in Asia [10, 56,57,58,59]. Two studies [9, 41] had a split-mouth design, and the others were parallel. Twenty-seven articles declared funding sources [10, 28, 30, 31, 33,34,35,36,37,38,39,40,41,42,43, 48,49,50,51,52,53,54,55,56,57].

The total sample included 1,932 participants, with 548 men and 1,177 women in studies reporting the sex of participants. Ages ranged from 15 to 79 years. Among the products used, 25 studies tested whitestrips (HP concentration ranging from 5.3% to 14%) [1, 10, 28,29,30, 33,34,35,36,37,38,39,40,41,42,43,44,45,46,47, 51, 52, 54, 56, 57], 13 studies evaluated paint-on products (HP concentration ranging from 3 to 9%) [7, 9, 10, 31, 32, 37, 38, 49, 50, 53, 56,57,58], three studies tested OTC bleaching gels (HP concentration ranging from 6 to 10%) [45,46,47], and three tested dentifrices (HP concentration ranging from 0.75% to 2.8%) [6, 55, 59]. In 14 studies, a placebo group (negative control) was present [10, 32, 33, 36, 38,39,40, 44, 45, 49, 55,56,57,58], and 12 studies [1, 9, 10, 28, 30, 35, 41, 43, 46, 47, 51, 54] had a positive control group using a dentist-supervised bleaching protocol: 10 with an at-home treatment [1, 10, 28, 30, 35, 41, 43, 46, 47, 54] and three with an in-office bleaching group [1, 9, 43]. More details about the bleaching protocols and the duration of the treatments are shown in Table 2. Tooth bleaching was assessed by spectrophotometry (ΔEab*) and the Vita scale (ΔSGU). The occurrence of tooth sensitivity and gingival irritation was assessed by self-perception with a categorical evaluation (yes or no), and intensities were examined with visual analog scales (VAS) (Table 3). Moreover, color change assessments had different intervals, from immediately after, one day, seven days, and several others up to 360 days. The most common evaluation interval was 14 days, reported in nine studies [29, 30, 32, 34, 49, 51, 56,57,58].

Table 2 Summary of the main characteristics of the eligible studies with information about the bleaching agents and color assessment
Table 3 Summary of the main characteristics of adverse effects reported in the eligible studies

Risk of individual bias in the studies

Only four studies had an overall low risk of bias [6, 46, 47, 55] (Fig. 2). Three studies showed an overall high risk of bias due to missing outcome data [49], deviations from the intended interventions [37], and outcome measurements [36, 49].

Fig. 2 Individual risk of bias of studies

Syntheses of results and meta-analyses

Color changes (∆Eab*)

Among the 13 studies that provided ∆Eab* values, three [27, 33, 57] were excluded because they did not report standard deviations, one [35] because it used a different method of color assessment (colorimeter), one [8] because it did not provide the HP concentration in the intervention group, four [28, 41, 42, 46] due to the lack of a common comparator group, and one [58] due to violating the transitivity assumption (the study sample exclusively comprised female individuals). Hence, three studies [10, 30, 40] with five treatments and eight pairwise comparisons were included, totaling 169 participants. Density was 0.8 and each edge was composed of only one trial; that is, there was no strong edge. Figure 3 demonstrates the network and Appendix S2 presents direct evidence findings.

Fig. 3 Network plot geometry for ∆Eab*. HP: hydrogen peroxide; CP: carbamide peroxide

Figure 4 shows estimates from NMA. ∆Eab* was significantly higher for 6% HP strips (≥ 14 h) (MD: 3.07; 95% CI: 0.63 – 5.50) compared to placebo two weeks after treatment. No other significant differences were observed. The differences among direct, indirect, and NMA evidence suggested no inconsistency (Appendix S3).

Fig. 4 League table with ∆Eab* results from NMA. Treatments were ordered from best to worst according to the ranking probabilities of treatment effects. Results with statistical significance are in bold. HP: hydrogen peroxide; CP: carbamide peroxide

Color changes (∆SGU)

Among the 17 studies that presented ∆SGU evaluation, two [1, 37] were excluded due to not providing the exact contact time between the bleaching agent and tooth surface, and one [45] showed the results as visual graphs, without specifying the mean and standard deviation values. Moreover, this NMA excluded ten studies [7, 9, 31, 41, 46,47,48,49,50, 53] due to the lack of a common comparator group and one [36] due to violating the transitivity assumption (the study only provided the color assessment at the third week). Hence, three studies [10, 32, 51] with six treatments and eight pairwise comparisons were included, totaling 239 participants. Figure 5 demonstrates the network, and Appendix S4 presents direct evidence findings. This geometry had a density of 0.53 and no strong edge.

Fig. 5 Network plot geometry for ∆SGU. HP: hydrogen peroxide; CP: carbamide peroxide

Figure 6 shows estimates from NMA. The ranking probability showed that the most effective treatment was at-home 10% CP (≥ 14 h), followed by 6% HP strips (≥ 14 h) and 3% HP strips (≥ 14 h). The at-home 10% CP (≥ 14 h) protocol had significantly higher ∆SGU than all other treatments, except for 6% HP strips (≥ 14 h) (MD: 0.41; 95% CI: -1.01 – 1.83). Moreover, ∆SGU was significantly higher for 6% HP strips (≥ 14 h) compared to 6% HP paint-on gel (7-13 h) (MD: 1.93; 95% CI: 0.27 – 3.59) and placebo (MD: 2.60; 95% CI: 1.01 – 4.19). The assessment of inconsistency was not possible due to the absence of mixed (both direct and indirect) evidence (Appendix S5).

Fig. 6 League table with ∆SGU results from NMA. Treatments were ordered from best to worst according to the ranking probabilities of treatment effects. Results with statistical significance are in bold. HP: hydrogen peroxide; CP: carbamide peroxide

Adverse effects

Among the 30 studies that evaluated tooth sensitivity, four [28, 30, 50, 51] employed desensitizing agents in at least one group (which could introduce bias in pooled estimates), three [42, 45, 52] were excluded due to providing only total sample estimates of tooth sensitivity occurrence (without specifying it according to comparison group), one [1] due to not providing the exact contact time between the bleaching agent and tooth surface, another [53] did not report the assessment method for tooth sensitivity, and another [45] showed the results as visual graphs, without specifying the occurrence of tooth sensitivity according to each group. Furthermore, this NMA excluded ten studies [7, 9, 34, 35, 38, 41, 47, 48, 51, 57] due to the lack of a common comparator group and five [33, 35, 40, 42, 43] due to violating the transitivity assumption. The tooth sensitivity NMA included five studies [28, 39, 45, 46, 56] with nine treatments and 11 pairwise comparisons, totaling 216 participants. Figure 7 demonstrates the network and Appendix S6 presents direct evidence findings. The geometry had very low density (0.31) and no strong edge.

Fig. 7 Network plot geometry for tooth sensitivity. HP: hydrogen peroxide; CP: carbamide peroxide; OTC: over-the-counter

Figure 8 shows estimates from NMA. The ranking probability showed that at-home 10% CP (7-13 h) exhibited the lowest risk of tooth sensitivity, followed by placebo and 6% HP paint-on gel (≥ 14 h). Placebo (RD: -0.21; 95% CI: -0.39 – -0.04) and 6% HP paint-on gel (≥ 14 h) (RD: -0.21; 95% CI: -0.42 – -0.01) had significantly lower risk of tooth sensitivity than 10% HP strips (≥ 14 h). The assessment of inconsistency was not possible due to the absence of mixed (both direct and indirect) evidence (Appendix S7).

Fig. 8 League table with tooth sensitivity results from NMA. Treatments were ordered from best to worst according to the ranking probabilities of treatment effects. Results with statistical significance are in bold. HP: hydrogen peroxide; CP: carbamide peroxide; OTC: over-the-counter

Certainty of evidence

Overall, this study analyzed 61 direct and indirect comparisons across three outcomes: color changes (∆Eab*), color changes (∆SGU), and tooth sensitivity. The certainty of evidence varied from very low to low. The main reasons for evidence downgrading were the risk of bias and imprecision (Appendix S8).

Discussion

The ∆Eab* analysis did not show statistical differences between OTC protocols and the positive control. Owing to meta-analysis limitations, only one supervised protocol could be compared (at-home 10%CP ≥ 14 h); the similar application time of the bleaching agent among all groups (≥ 14 h) and the similar HP concentrations may explain this result. However, the positive control in the ∆SGU analysis presented more color changes than all four evaluated OTC protocols. One analyzed study [10] used paint-on gel (3% ≥ 14 h) and reported participants with difficulties using the bleaching agent. That may explain the difference in results because OTC protocols are not personalized, and the absence of professional support may cause complications in product use.

The ∆Eab* analysis showed that two OTC protocols did not differ from the placebo group: 3%HP strips ≥ 14 h [10] and 3%HP paint-on gel ≥ 14 h [10]. That may be due to the low bleaching agent concentration, reinforced by the higher color change achieved with 6%HP strips ≥ 14 h in two studies [20, 30]. The ∆Eab* indices of one positive control also did not differ from the placebo treatment: at-home 10%CP ≥ 14 h [30]. The low product concentration in these cases, combined with short application times (30 min), potentially influenced the findings. Carbamide peroxide takes longer to react and release hydroxyl radicals [60]. Therefore, at-home bleaching with carbamide peroxide should use a tray to keep the bleaching agent in contact with the teeth for longer.

It is worth noting that although ∆Eab* and ∆SGU provide two different analyses, both methods generally evaluate and identify color changes [61]. Therefore, the findings should be complementary.

Gingival irritation is an adverse effect of bleaching procedures caused by the direct contact of hydrogen peroxide with the gingival mucosa. The bleaching agent is highly toxic to fibroblasts in the gingival tissue, reducing cell survival [14]. OTC products are not customized to the individual, potentially promoting contact between the bleaching agent and the gingival tissue and consequent irritation [4]. The data from our search did not allow a meta-analysis comparing gingival irritation between OTC products and positive controls. Further clinical studies should perform this comparison.

An important point to emphasize is that the application of products containing HP should always be supervised by a professional. Indiscriminate use of bleaching products can potentially cause oral lesions. Additionally, the lack of a personalized reservoir for at-home use of these products may result in the ingestion of HP [4], leading to irritation of the gastrointestinal tract, nausea, and vomiting [62].

Tooth sensitivity is the most common adverse effect of bleaching treatments. Although its biological mechanism has not been established, it might occur from the permeability of oxygen ions, which are cytotoxic in odontoblastic extensions close to pulp cells [63]. Considering that tooth permeability is higher in exposed dentin due to dentinal tubules, OTC products may promote the direct contact of bleaching agents with the exposed dentin in patients with gingival recessions or non-carious cervical lesions, which may lead to severe inflammatory reactions in pulp cells [15, 63]. Different standardizations in visual analog scales limited the sensitivity intensity assessment.

Despite our extensive search, none of the direct comparisons (∆Eab*, ∆SGU, or sensitivity) included more than one study comparing similar treatments. Together with the high number of indirect comparisons, this restricts the application of the findings to clinical conditions. Moreover, the study limitations excluded other delivery methods from the meta-analysis, such as dentifrices and mouthwashes. Thus, we suggest that further studies standardize the outcomes, providing means and standard deviations of ∆Eab* and/or ∆SGU in each group and assessing more bleaching products.

Another factor worth noting is that 25 eligible articles declared funding from private companies related to dental material production, and all presented satisfactory results for bleaching with OTC products. Bradley et al. [64] advocate that conflict reporting or interest statements should become more open, thus establishing reliability in study objectivity. More studies independent of company funding should be conducted.

The body of evidence in this review presents noteworthy limitations. First, only four studies showed a low risk of bias [6, 46, 47, 55]. The primary source of bias was the randomization process, a common problem in dental randomized clinical trials that significantly compromises the internal validity of studies. Second, the low number of studies in each comparison and the small sample sizes directly impacted estimate precision, contributing to the uncertainty of findings. Lastly, the low density (indicating low graph connectedness) and the lack of strong edges (meaning a low weight of evidence on each pair) impacted the certainty of NMA estimates. Nevertheless, these metrics indicate potential evidence gaps that should be addressed in further RCTs.

Conclusion

Over-the-counter products achieved satisfactory effects on tooth bleaching compared to the placebo, with little to no impact on dentin hypersensitivity and gingival irritation but with very uncertain evidence. Studies with lower risks of bias and larger samples are required to draw more definitive conclusions.