A commentary on

Rezaei N, Bagheri Z, Golshah A.

Survival analysis of three types of maxillary and mandibular bonded orthodontic retainers: a retrospective cohort. BMC Oral Health 2022; 22: 159.

Figure 1. GRADE rating

Commentary

This study addressed an important clinical question in the retention phase of orthodontics. Its fairly long follow-up and the total number of participants are strengths. However, several points should be considered before drawing conclusions from it.

The study objective was to compare the survival rates of three types of bonded retainers. For this purpose, the authors included private clinic patients who had completed the orthodontic treatment phase and had bonded retainers of one specific type in both arches. They described the design as retrospective but, surprisingly, also stated that the patients were recalled and followed up at multiple time points. This suggests that the study design was ambidirectional rather than purely retrospective.

The investigated outcomes were not explicitly described. Judging from the results, they appear to be retainer failure (the primary outcome), survival rate, bleeding on probing and maximum pocket depth.

The sample size was calculated according to a previous study,1 which investigated the survival rate of two different types of retainer: 0.016 × 0.022 inch braided stainless steel wire and β-titanium wire. The description of the sample size calculation was unclear and the calculation cannot be replicated. Specifically, the authors did not mention which statistical test from the TrialSize R package they used, and they reported the patient ratio as if it were a p-value (patient ratio in the group to be p = 0.534). Furthermore, these retainers were different from those considered in the present study, and no hypothesis was stated to justify the sample size.
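To show what a replicable description would need to state, the following is a minimal sketch of one common approach, Schoenfeld's approximation for the number of failures required by a two-sided log-rank test comparing two groups. It is not the authors' calculation and does not use the TrialSize package; the hazard ratio, significance level, power and allocation proportion are illustrative assumptions, and a three-group comparison such as the one in this study would need further adjustment.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Schoenfeld approximation: failures needed for a two-sided log-rank test.

    hazard_ratio : assumed hazard ratio between the two retainer groups
    allocation   : assumed proportion of patients in the first group
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = allocation, 1 - allocation
    return (z_alpha + z_beta) ** 2 / (p1 * p2 * math.log(hazard_ratio) ** 2)


# Illustrative inputs only: hazard ratio 0.5, 5% significance, 80% power, 1:1 groups.
events = math.ceil(required_events(hazard_ratio=0.5))
print(f"Failures required: {events}")
```

The number of patients then follows from dividing the required failures by the expected probability of failure during follow-up, which is why the assumed effect size, test and allocation ratio all need to be reported.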

One orthodontist measured and assessed all the parameters, and no information on the method error was provided. Also, as the same orthodontist recorded the periodontal indices, they could not be blinded to the type of retainer, probably leading to performance bias.

The authors preferred multiple statistical tests to multivariable regression models, lowering the credibility of the conclusions. First, statistical tests are prone to Simpson's paradox2 because important confounding variables (such as age and sex) are ignored. Second, conducting tests on an unmatched sample introduces bias, which is inherent in retrospective studies. Third, statistical tests do not inform about the magnitude of an association, as they merely aid in rejecting or failing to reject the null hypothesis. Hence, the researchers relied unduly on p-values, which encourages the fallacious interpretation of non-statistically significant results as evidence of no association and leads to misjudged conclusions: 'their failure rate was not correlated with the age or sex of patients or the treatment duration'. All the issues above would have been mitigated had the authors performed matching at least by age and sex, for instance, using propensity scores, followed by a multivariable regression tailored to the outcome type, thereby achieving double robustness.3 The substantial imbalance in sex and age likely contributed to the non-statistically significant associations reported in the study, rendering the conclusions of no correlation potentially misleading. The authors applied one regression model without investigating the underlying assumptions to justify its use. The Cox regression model is based on the proportional hazards assumption. When this assumption does not hold, researchers must resort to more flexible models that do not rely on proportional hazards. Last but not least, the Kaplan-Meier plots were incorrect. All three orthodontic retainers should appear in one Kaplan-Meier plot to allow comparison and investigation of the proportional hazards assumption.

The authors excluded patients with changed or repaired retainers, but later, in the results, they reported the number of replaced retainers, which is confusing.

A table with the distribution of all baseline characteristics per retainer group is missing. Such a table would have elucidated potential differences in the baseline between the three groups.

Oral hygiene and pre-treatment malocclusion, which may substantially affect and confound the findings, were not reported.
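How an unreported confounder could distort a crude comparison can be illustrated with invented numbers. In the sketch below, which uses purely hypothetical failure counts for two retainers stratified by oral hygiene, retainer A fails less often than retainer B within each stratum, yet appears worse in the pooled data because it was mostly placed in patients with poor hygiene.

```python
# Hypothetical counts, not data from the study: (failures, total retainers).
counts = {
    "good hygiene": {"A": (2, 20), "B": (12, 80)},
    "poor hygiene": {"A": (32, 80), "B": (10, 20)},
}

# Failure rate of each retainer within each oral hygiene stratum.
stratum_rates = {
    stratum: {g: failures / total for g, (failures, total) in by_group.items()}
    for stratum, by_group in counts.items()
}


def crude_rate(group):
    """Failure rate after pooling the strata, i.e. ignoring oral hygiene."""
    failures = sum(counts[s][group][0] for s in counts)
    total = sum(counts[s][group][1] for s in counts)
    return failures / total


crude = {g: crude_rate(g) for g in ("A", "B")}

for stratum, rates in stratum_rates.items():
    print(f"{stratum}: A = {rates['A']:.0%}, B = {rates['B']:.0%}")
print(f"pooled: A = {crude['A']:.0%}, B = {crude['B']:.0%}")
```

The reversal between the stratified and pooled rates is an instance of Simpson's paradox, and it cannot be detected or adjusted for unless the confounder is recorded and reported.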

Unwanted tooth movement, although planned in the methods, was not reported in the results. Instead, different comparisons were made for different variables, which may indicate selective reporting.4

The present study may be prone to selection bias: patients who are happier with their treatment results may be more concerned about relapse, attend follow-up and, subsequently, participate in the study, whereas non-compliant patients with worse treatment results may have less interest in relapse and thus be lost to follow-up. Consequently, the true level of failure may be underestimated.

As the study was conducted in one private clinic and the same clinician treated all the patients, the generalisability of the findings is limited.