Introduction

Herbst appliance treatment (Tx) has been shown to be an effective approach in Class II:1 patients. While, in the early period of modern Herbst appliance Tx, the Herbst appliance was followed by removable appliances only in selected cases [1], it has routinely been followed by a phase of multibracket appliance (MBA) Tx since the mid-1980s [2].

Several investigations assessing possible factors that influence the effectiveness of this Tx approach have been published during the last decades [3,4,5,6,7,8,9,10]. However, all of these studies focused on very specific parameters (such as age, skeletal maturity, or growth pattern) and were therefore based on rather small patient samples. Thus, they constitute very narrow subgroup or selected-group analyses, and their results cannot be extrapolated to Class II:1 samples in general, even though the data are very valuable regarding the general scope and the possibilities of this Tx approach.

Therefore, the aim of the present investigation was to assess a large cohort of consecutive, unselected Class II:1 Herbst-MBA patients in order to obtain representative data on the efficiency and outcome quality of this Tx approach.

Class II:1 classification was performed according to Angle’s definition: the maxillary anterior teeth are protruded and the mandibular dentition is positioned posteriorly relative to the “normal” relationship, in which the mesiobuccal cusp of the maxillary first molar occludes with the mesiobuccal groove of the mandibular first molar.

Material and methods

Study population

After ethical approval (No. 80/14), the archive of the Department of Orthodontics at the University of Giessen, Germany, was screened for all Class II:1 patients in whom Herbst-MBA Tx had been started after the introduction of this Tx approach at the study center in 1986 and completed by 2014. This applied to 526 patients (53% females, 47% males) with a mean age of 14.4 years (range 9.8–44.4) at the start of Herbst-MBA Tx.

The Herbst appliance (Fig. 1) is a so-called fixed functional appliance used for mandibular advancement. It consists of attachments (bands or cast splints) in the lateral segments of both jaws, which are connected by a telescoping mechanism running from the upper posterior to the lower anterior region, resulting in mandibular “bite jumping.” As the appliance is worn 24 h/day, patient compliance is of minor concern. According to clinical and experimental studies, the skeletal and dental structures of both the upper and the lower jaw are affected [1, 2, 11]. During the last decades, the appliance has been shown to be effective in both Class II:1 and Class II:2 patients and to offer a respectable treatment alternative to surgical mandibular advancement in borderline cases [12].

Fig. 1

Herbst appliance: cast splints in the upper and lower jaws connected by telescoping mechanisms between the upper first molars and the lower first premolars. In addition, a lingual arch is placed between the lower lateral segments

Methods

The treatment charts and the respective data were available for all 526 patients. Study casts from before Tx (T0), after Herbst-MBA Tx (T1), and after at least 24 months of retention (T2) were evaluated using the Peer Assessment Rating (PAR) index [13] and the occlusal variables overjet, overbite, and sagittal molar and canine relationships. In addition, the Ahlgren scale [14] was applied to assess the post-retention results (T2).

The PAR ratings were performed by a calibrated and certified operator according to the respective guidelines [13] and using an original PAR ruler. The same investigator assessed all standard occlusal variables. Visual ratings of the sagittal molar and canine relationships were performed to the nearest 0.25 cusp widths (cw) and classified as Class I, II, or III. Linear measurements were made to the nearest 0.5 mm using a manual caliper. The ratings according to the Ahlgren scale were performed by two calibrated and experienced orthodontists according to the respective guidelines [14].

To assess observer reliability, all study models of patients 1–20 were evaluated twice, and Kendall’s tau correlation coefficients were calculated for the occlusal variables and the PAR index. The values ranged between 0.83 and 0.98, corresponding to high consistency [15]. For assessments according to the Ahlgren scale, a conformity rate of 79–93% can be assumed on the basis of previous investigations [16, 17].
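The intra-rater agreement described above can be illustrated with a short sketch. It assumes that each variable was measured twice on the same casts and that the tie-corrected variant, Kendall’s tau-b, was used; the text does not state which variant or software was applied, and the example values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(first, second):
    """Kendall's tau-b between two repeated ratings of the same casts.

    Tau-b corrects for ties, which are frequent when measurements are
    rounded (e.g. to 0.5 mm or 0.25 cusp widths).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two rating series of equal length >= 2")
    concordant = discordant = ties_first = ties_second = 0
    for (a1, b1), (a2, b2) in combinations(zip(first, second), 2):
        da, db = a1 - a2, b1 - b2
        if da == 0:
            ties_first += 1   # pair tied in the first rating
        if db == 0:
            ties_second += 1  # pair tied in the second rating
        if da != 0 and db != 0:
            if (da > 0) == (db > 0):
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(first) * (len(first) - 1) // 2
    denominator = sqrt((n_pairs - ties_first) * (n_pairs - ties_second))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one series is constant")
    return (concordant - discordant) / denominator


# Hypothetical overjet ratings (mm) of five casts, measured twice
first_rating = [7.0, 5.5, 9.0, 4.0, 6.5]
second_rating = [7.0, 5.0, 9.0, 4.5, 6.5]
print(kendall_tau_b(first_rating, second_rating))  # prints 1.0
```

The example returns 1.0 even though two casts differ by 0.5 mm between the ratings: tau measures agreement in the rank order of the casts, not absolute agreement of the measured values.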

The mean, standard deviation, minimum, maximum, and median values are given for all variables. For the changes during Tx (T1-T0) and during retention (T2-T1), an exploratory statistical analysis was performed. As the data were not normally distributed (Kolmogorov-Smirnov and chi-square tests), a non-parametric test (Wilcoxon signed-rank test) was used for data analysis. The level of significance was set at p < 0.05. In addition, Spearman’s rank correlation (rho) and the Kruskal-Wallis test were applied to assess possible correlations and associations.
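The paired comparisons between time points can be sketched as a two-sided Wilcoxon signed-rank test. This minimal version assumes the large-sample normal approximation, removal of zero differences, average ranks for tied absolute differences, a tie-corrected variance, and no continuity correction; statistical packages differ in these details, and the text does not specify which variant was used. The example data are hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped, tied absolute differences get average
    ranks, the variance is tie-corrected, and no continuity correction
    is applied. Returns (W+, z, p).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    # Paired differences after - before, excluding zero differences
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical overjet values (mm) before (T0) and after (T1) Tx
overjet_t0 = [7.0, 8.0, 6.0, 9.0, 7.5, 10.0, 6.5, 8.0]
overjet_t1 = [2.0, 2.5, 1.5, 3.0, 2.0, 2.0, 2.0, 2.5]
w_plus, z, p = wilcoxon_signed_rank(overjet_t0, overjet_t1)
print(f"W+ = {w_plus}, z = {z:.2f}, p = {p:.4f}")
```

With several hundred paired observations, as in the T0-T1 comparison, the normal approximation is generally adequate; exact null distributions matter mainly for very small samples.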

Results

Patient sample

While Tx was initiated in a total of 526 patients, it was discontinued prematurely in 18 patients (3.4%). Thus, the Tx data of 508 patients were evaluated, as were the follow-up (≥ 24 months) data of 240 patients (Fig. 2). Study casts were available in most cases: n = 492 (T0 and T1) and n = 232 (T2).
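The proportions in the patient flow follow directly from the reported counts. The short calculation below also makes explicit that a follow-up rate of 45.6% refers to all 526 patients in whom Tx was started, whereas relative to the 508 patients who finished Tx it would be 47.2%.

```python
started = 526       # patients in whom Herbst-MBA Tx was initiated
discontinued = 18   # premature discontinuations
follow_up = 240     # patients with >= 24 months of follow-up data

finished = started - discontinued
print(f"finished: {finished}")                                      # -> 508
print(f"discontinued: {discontinued / started:.1%}")                # -> 3.4%
print(f"follow-up, of all started: {follow_up / started:.1%}")      # -> 45.6%
print(f"follow-up, of those finished: {follow_up / finished:.1%}")  # -> 47.2%
```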

Fig. 2

Patient flow chart. The numbers and percentages of patients who started, discontinued, and finished Tx as well as of those who fulfilled a follow-up period of ≥ 2 years are given

The most frequent pre-Tx skeletal maturity stage [18, 19] was shortly after the peak of the pubertal growth spurt: MP3-G/C3-S4 (Table 1).

Table 1 General characteristics of the patient sample: pre-Tx age and skeletal maturity as well as the duration of the observation periods are given. The median (Med), mean value (Mean), standard deviation (SD), minimum (Min) and maximum (Max) values are given for age and observation period duration, while the distribution in percent is given for the skeletal maturity stages

Previous orthodontic Tx (mainly with removable appliances) had been performed in 39.4% of the patients, 25% of these at the study center and 75% elsewhere.

Treatment duration and retention

The mean Tx duration was 8.1 ± 1.79 months for the Herbst phase and 16.0 ± 7.4 months for the subsequent MBA phase, resulting in a total Tx duration (T0-T1) of 24.2 ± 7.8 months. The mean follow-up period (T1-T2) was 32.7 ± 15.9 months (Table 1). Retention was performed using bonded canine-to-canine or removable Hawley retainers or a combination of both. Most patients still wore the retainers at follow-up (Supplementary Table 1).

Occlusal variables (Table 2; Supplementary Table 2; Fig. 3a–f)

The mean overjet decreased from 7.0 ± 2.3 to 2.0 ± 0.9 mm during Tx. During the retention period, a slight increase of 0.7 ± 1.0 mm occurred. Overbite decreased from 4.0 ± 1.9 to 1.5 ± 0.9 mm during Tx and increased by 0.5 ± 1.1 mm during the retention period. All these changes were statistically significant (p < 0.001).

Table 2 Overjet, overbite, sagittal molar, and canine relationships (right/left) as well as PAR score at T0, T1, and T2. For each variable, the median (Med), mean value (Mean), standard deviation (SD), minimum (Min) and maximum (Max) values are given. cw: cusp widths

For the sagittal molar relationship (right and left), an overcorrection from 0.7 ± 0.4 cw Class II to −0.1 ± 0.3 cw Class III occurred during Tx, which settled to 0.0 ± 0.23 cw (Class I) during the retention period. The sagittal canine relationship improved from 0.7 ± 0.3 cw Class II to 0.1 ± 0.2 (right)/0.2 ± 0.2 (left) cw Class II during Tx (p < 0.001) and settled at 0.2 ± 0.2 cw Class II during the follow-up period (p = 0.002–0.044).

Thus, on average, the occlusal variables were normalized by Tx.

Outcome quality (Table 2, Supplementary Table 2, Fig. 3g, and Supplementary Fig. 1)

Before Tx, the mean PAR score was 32.4 ± 8.8 points; it decreased to 8.0 ± 4.5 points during Tx (p < 0.001). During the retention period, a relapse of 0.8 ± 5.3 points occurred (p = 0.015). This PAR score increase was 1.0 point lower in subjects still wearing a bonded lower retainer at T2 and 2.0 points lower in those still wearing bonded upper and lower retainers, compared with subjects without a bonded retainer (p = 0.148; Table 3).

Fig. 3

Boxplots showing the changes in (a) overjet, (b) overbite, (c–f) sagittal molar and canine relationships (right/left), and (g) PAR score during T1-T0 and T2-T1

Table 3 Final PAR score at T2 and changes of the total PAR score during retention (T2-T1) in subjects with no retainer (n = 42), a bonded lower retainer (n = 71), and bonded upper and lower retainers (n = 115) at T2

The outcome quality (PAR categories) after Tx differed only minimally from the results at follow-up (T2) and showed the following prevalences (T1/T2): 62/57% “greatly improved,” 36/40% “improved,” and 2/3% “worse/no different.” While no correlation was found between the PAR score reduction (T2) and pre-Tx skeletal maturity (r = 0.057), a slight correlation was seen between the PAR score reduction and pre-Tx malocclusion severity in terms of Class II molar relationship (r = 0.230).
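The correlation coefficients reported here are Spearman rank correlations. A minimal sketch, assuming rho is computed as the Pearson correlation of average ranks (which handles ties); no p-value is computed, and the example pairs of pre-Tx molar relationship and PAR score reduction are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical pre-Tx Class II molar relationship (cw) and PAR score reduction
molar_cw = [0.5, 0.75, 1.0, 0.5, 0.25, 1.0]
par_reduction = [22, 25, 30, 18, 20, 27]
print(round(spearman_rho(molar_cw, par_reduction), 3))
```

Ranking first makes the coefficient insensitive to the skewed, non-normal distributions noted in the Methods; only the ordering of the values enters the calculation.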

The categorization according to the Ahlgren scale revealed the following results at T2: 17% “excellent,” 35% “good,” 45% “acceptable,” and 3% “unsuccessful” occlusal outcomes. No group difference for pre-Tx skeletal maturity was found (p = 0.638), but a slight association seems to exist for pre-Tx malocclusion severity in terms of Class II molar relationship (p = 0.031).
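The group comparisons for the Ahlgren categories rely on the Kruskal-Wallis test. The sketch below computes the tie-corrected H statistic and its p-value from the chi-square distribution with k − 1 degrees of freedom (the usual large-sample approximation). The grouping shown and all example values are assumptions for illustration; the p-values reported in the text cannot be reproduced without the raw data.

```python
from collections import Counter
from math import erfc, exp, pi, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x)
    root = sqrt(x)
    density = exp(-x / 2) / sqrt(2 * pi)  # standard normal pdf at sqrt(x)
    total = erfc(root / sqrt(2))
    term = root
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += 2 * density * term
    return total


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H, degrees of freedom, and p-value."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("H is undefined when all values are identical")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


# Hypothetical pre-Tx Class II molar relationship (cw) per Ahlgren category at T2
excellent = [0.5, 0.5, 0.75]
good = [0.5, 0.75, 1.0, 0.75]
acceptable = [0.75, 1.0, 1.0, 1.25]
unsuccessful = [1.0, 1.25]
h, df, p = kruskal_wallis(excellent, good, acceptable, unsuccessful)
print(f"H = {h:.2f}, df = {df}, p = {p:.3f}")
```

The closed-form chi-square tail used here is exact for integer degrees of freedom, which keeps the sketch free of external dependencies; a statistics library would normally supply this function.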

Discussion

The present investigation is the first to assess a large, unselected cohort of consecutive Herbst-MBA patients in order to obtain representative data on the efficiency and outcome quality of this Tx approach. Such data seem particularly important because current systematic reviews and meta-analyses on the effectiveness and stability of fixed functional Class II Tx reveal corresponding gaps in the evidence [9, 10, 20].

Study population and methods

The investigation is based on the evaluation of all Class II:1 patients who underwent Herbst-MBA Tx at the study center over a period of 28 years, irrespective of Tx outcome. The patient sample was homogeneous in terms of the underlying malocclusion (Class II:1), but the pre-Tx (T0) severity varied (total PAR score 32.4 ± 8.8), as did the pre-Tx age (14.4 ± 3.4 years). While the Tx approach was similar in all patients, Tx had been performed by several practitioners using different types of straight-wire MBAs. These factors might have had a minor impact on Tx outcome, especially on Tx duration and on occlusal aspects such as rotation control or torque, but they do not conflict with the aim of the study, which was to obtain an overview of the Tx quality provided.

The same applies to the retention regimen, which was not uniform because the patient sample was collected over a period of almost 30 years. While the standard retention protocol consisted mainly of removable appliances (predominantly Hawley retainers) during the early years of Herbst appliance Tx, fixed retention in both jaws became established in later years. In between, combinations such as fixed retention in the lower jaw and removable retention in the upper jaw were considered appropriate. The same holds for additional night-time wear of an activator, which had been recommended in some patients. However, a randomized controlled trial comparing three different retention regimens found no relevant influence of the type of retention [21].

In 18 of the 526 patients, Tx was discontinued prematurely (10 because of relocation or loss to follow-up, 7 because MB Tx was not wanted, and 1 because of a lack of compliance during MB Tx). Unfortunately, in most of these cases, no study models were available to assess the Tx changes achieved.

As the aim was to obtain objective data on Tx outcome quality, the PAR index was applied. While this index has been shown to be valid and reliable [22, 23], it has also been criticized with regard to its interpretation [24] and its weighting system [25]. Therefore, a second index for outcome quality assessment, the Ahlgren scale [14], was used.

Results

Regarding the general Tx data, it is worth noting that premature discontinuation of Tx occurred in only 3.4% of the patients. This percentage is rather low compared with the literature, where values between 9 and 17% have been reported for Class II fixed functional Tx [26,27,28]. For the remaining patients, the mean Tx duration was 24.2 ± 7.8 months (median 22.8). Unfortunately, no data from a comparable cohort of unselected Class II patients treated with fixed functional and MBA appliances exist, but a recent meta-analysis of 22 studies [29] reports a slightly shorter mean duration (19.9 months) for fixed appliance Tx in general (Class I, II, or III; no differentiation between non-extraction and extraction protocols) without adjunctive functional appliances. In addition, that investigation considered neither the severity of the underlying malocclusion nor the Tx outcome.

Overjet, overbite, and the sagittal molar relationships were slightly overcorrected during active Tx and settled into normal Class I relationships during the follow-up period. For the canine relationships, a slight Class II relationship prevailed at T2. This is in concordance with the literature [30,31,32].

Outcome quality—active Tx

According to the PAR index, the mean post-Tx score in the present, fully unselected patient sample was 8.0 ± 4.5. Similar values of 6.2 to 8.0 have been reported by Al-Yami (n = 1583) [33], Birkeland et al. (n = 93) [34], and McGuiness et al. (n = 207) [35] for other, mainly unselected Class II:1 samples treated with diverse protocols (extraction, non-extraction). In terms of PAR categorization, 62% and 36% of the present results were “greatly improved” and “improved,” respectively, which is consistent with the findings of Birkeland et al. (63% “greatly improved,” 33% “improved”), whereas the investigation by Al-Yami revealed slightly less favorable results (46% “greatly improved” and 48% “improved”).

As most of these results are rather similar, the question arises whether the PAR index is a sensitive enough tool to detect minor but clinically relevant differences at all.

When specifically evaluating the cases (n = 10) categorized as “worse/no different” on the basis of PAR score reduction during active Tx, their mean pre-Tx PAR score was 4.6 points lower than that of the remaining sample. Thus, in terms of severity, these cases were below average. Nevertheless, as their mean post-Tx PAR score was 14.2 points higher than that of the rest of the sample, the categorization “worse/no different” can probably be attributed to a combination of poor response/growth and poor cooperation.
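The PAR categories are commonly summarized as follows: a reduction of less than 30% is “worse/no different,” a reduction of at least 30% is “improved,” and a reduction of at least 22 points is “greatly improved.” A minimal sketch, assuming these conventional thresholds; how pre-Tx scores above about 73 are handled (where 22 points is less than 30%) follows the ordering chosen here and is an assumption, as are the example scores.

```python
def par_category(pre_tx, post_tx):
    """Classify a case by PAR score change (conventional thresholds, assumed).

    < 30% reduction          -> "worse/no different"
    >= 30% and >= 22 points  -> "greatly improved"
    >= 30% otherwise         -> "improved"
    """
    if pre_tx <= 0:
        raise ValueError("pre-treatment PAR score must be positive")
    reduction = pre_tx - post_tx
    if reduction / pre_tx < 0.30:
        return "worse/no different"
    if reduction >= 22:
        return "greatly improved"
    return "improved"


# Hypothetical (pre-Tx, post-Tx) PAR scores
for pre, post in [(32, 8), (32, 20), (28, 22)]:
    print(pre, post, par_category(pre, post))
# 32 8 greatly improved
# 32 20 improved
# 28 22 worse/no different
```

The third case illustrates the pattern described above: a below-average starting score combined with a high post-Tx score yields a reduction of well under 30%.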

Outcome quality—follow-up

During the follow-up period, the PAR score increased slightly, by 0.8 ± 5.3 points. This is consistent with a minor shift in the PAR categorization, with slightly fewer patients categorized as “greatly improved” (62% at T1 vs. 57% at T2) and slightly more patients categorized as “improved” (36% vs. 40%) or “worse/no different” (2% vs. 3%). Similar PAR score increases of ≤ 1 point over follow-up periods of 2–3 years have been reported for patient samples in which mainly bonded retainers were used [36, 37].

A comparison of subjects wearing either a lower or both lower and upper bonded retainers with those not wearing any bonded retainer revealed 2.0–2.5 points smaller PAR score increases during retention. Although this difference was not statistically significant (p = 0.148), it appears clinically relevant. In the literature, the final PAR score is reported to be about 5 points lower in patients with bonded retainers still in place 5 years post-Tx than in those without retainers [33, 38].

When the second, subjective outcome quality assessment (the Ahlgren scale) is considered, it is interesting to note that the pooled percentage of “excellent” and “good” ratings (52%) is similar to that of the PAR category “greatly improved” (57%). The same is true for “acceptable” (45%) and the PAR category “improved” (40%). Unfortunately, no data for direct comparison are available in the literature.

Limitations

The fact that follow-up data were available for only 45.6% of the patients must certainly be considered a limitation. The same is true for the missing study models in some cases. Nevertheless, the T2 patient sample is still rather large. Furthermore, for consistency, it might have been beneficial if all patients had been treated by the same practitioner using the same type of MBA, or if a randomized clinical trial had been performed; however, given the large sample and the long period of record collection, such a study design is not realistic. The same applies to the desirable option of a comparable untreated control group.

Conclusion

In summary, Class II:1 Tx using Herbst-MBA is an efficient approach in orthodontic care. During an active Tx period of an average of 2 years, high-quality results can be obtained in the majority of patients.