1 Introduction

Evidence-based medicine has been defined by Rosenberg as the process of “systematically finding, appraising and using contemporary research as the basis for clinical practice” [1]. This definition can also be applied to dentistry and, in turn, to orthodontics [2]. According to Bader [3], “Evidence-based Dentistry is not simply a new name for an old practice. The process is designed to answer specific questions, and it includes systematic and qualitative search of all available evidence.”

The scientific community perceived the need for evidence-based data, and many attempts were made to produce evidence-based conclusions when conducting studies. The research method used in a study depends primarily on the specific question(s) the study is addressing. For any posed clinical question, some research designs provide more valid information than others [4, 5]. Clinical decisions based on evidence-based conclusions are made with regard to a classification of the quality of the various study designs [6, 7]. A hierarchy of quality of evidence of the various study designs is presented in Table 6.1. Randomized controlled clinical trials (RCTs) can produce very strong evidence [8, 9] in contrast to other study designs, because with their use the effectiveness of a treatment intervention can be better evaluated [10], and it is feasible to assess whether one treatment intervention is better than another [7]. Further, systematic reviews (SRs) of RCTs should be credited with the production of even stronger evidence of treatment effects, because the source studies are precisely selected, and after appropriate evaluation, the outcomes are qualitatively synthesized following a specific protocol [7, 11–14]. Finally, in addition to the argument about which study design should be placed at the top of the pyramidal hierarchy of evidence, meta-analysis (MA) is regarded as the highest level of analysis, in which conclusions are made by quantitatively synthesizing the source data of several studies, such as RCTs, that already provide strong evidence; therefore, the evidence produced from MAs should be considered the strongest possible [15]. All these study designs create a challenging environment of knowledge, which could result in wrong conclusions if appropriate attention is not given. For this reason, the main challenge for the orthodontist is the necessity of integrating the gathered evidence into clinical practice [16].

Table 6.1 Evidence of clinical effectiveness: a hierarchy of quality of evidence of the various study designs (in descending order)

The aim of this chapter is to discuss the basic principles of MAs and their conduct and reporting, as well as to investigate the evidence currently existing in the orthodontic literature derived from high-quality studies, such as MAs. The results from these studies will be critically presented and discussed according to the specific fields of orthodontics that were investigated. Lastly, the meta-analytic procedures used to obtain conclusions that can be used for practicing evidence-based orthodontics will be evaluated.

2 Meta-analysis as a Tool for Evidence-Based Practice

2.1 Background

The introduction of MA in medical research has had an important positive impact on research synthesis by reopening the question of how best to summarize the results of many separate studies, using statistical procedures for computation of effect size in order to evaluate treatment effects. By combining the data from individual studies, a MA increases the overall sample size to a great extent, which in turn increases the statistical power of the analysis, as well as the precision for estimating the treatment effects. Because the “data” used for MAs are derived from original studies published in scientific journals, the quality of the MAs depends heavily on the quality of these studies. Therefore, issues such as how well these studies were conducted, how well the findings were reported, and how they came to the attention of the meta-analyst are of great importance. Well-performed MAs allow a more objective appraisal of the evidence than traditional narrative reviews, provide a more precise estimate of a treatment effect, and may explain the heterogeneity between the results of individual studies [17]. In contrast, imprecisely conducted MAs may be biased, mainly because of the exclusion of relevant studies or the inclusion of inadequate studies [18]. Clearly, MAs present several advantages over conventional narrative reviews and may constitute a very powerful tool in clinical research. Meta-analysis is, unfortunately, not a perfect tool. Any statistical procedure or analytic approach can be misused or abused. Most of the criticisms of quantitative approaches to reviewing the literature are objections to the misuse or abuse of MAs.

2.2 Definitions

The term “meta-analysis” was first proposed in 1976 by Glass, a psychologist [15]. He identified three levels of analysis and thereby established a meaningful context for meta-analytic statistical procedures. According to this classification, primary analysis refers to the original statistical analysis of the data as it is collected by the researcher; secondary analysis refers to analysis of this data by someone other than the researcher who collected the original data (possibly for purposes or with analytic strategies other than those of the original researcher); and meta-analysis refers to analysis of the data of several independent studies.

Meta-analyses are statistical procedures that integrate the results of several independent studies considered to be “combinable” [19]. In other words, a MA provides a logical framework to a research review. This means that similar measures from comparable studies are listed systematically and the available effect-measures are combined, where possible [20]. Trying to define more precisely the term “meta-analysis,” Kassirer [21] stated that “Meta-analyses are studies of studies ….”.

Sometimes MAs are called “statistical overviews” or “systematic reviews”, although the latter term should be used only when quantitative data synthesis is not possible. In detail, a SR performs a qualitative synthesis of the available data without any statistical analysis when the data derived from the original studies are not similar and cannot be combined, while a MA attempts a quantitative data synthesis with specific statistical meta-analytic procedures when the data are similar enough to be combined. In other words, a MA is a SR with statistical analysis.

2.3 Meta-analysis vs. Systematic or Narrative Review Articles

Meta-analyses are similar to SRs because they are based on thorough reviews of the literature about a single research topic. Nevertheless, they differ from these articles in that they statistically combine the results of several studies into a single outcome measure. In addition, a significant problem among narrative review articles is that, although they may start with the same basic aims and may also use the same material in the literature, different conclusions might be reached. Perhaps even more important than the disagreement of conclusions derived from different narrative reviews is the disagreement in conclusions between a narrative review and a MA or a SR of the same research field. For example, Cooper and Rosenthal [22] performed a study with 41 faculty and graduate students who read and integrated the results of the same seven studies of gender differences in task persistence. This study found that the traditional reviewers were significantly more likely to conclude that there was no support for the hypothesis tested, in contrast to the generally more accurate meta-analytic reviewers.

2.4 Purposes of Meta-analyses

Meta-analyses may have several purposes, such as (a) to summarize a large and complex amount of literature on a topic, thereby resolving conflicting reports, (b) to clarify or quantify the strengths and weaknesses of various studies in a specific research field, (c) to avoid the time and expense of conducting a clinical trial or, conversely, to establish the need for a major clinical trial, and (d) to increase the statistical power of the analysis by combining data from many smaller studies. By performing such studies, the precision of an estimated treatment effect can be improved, variations in treatment effects through subgroup analysis can be investigated, and the generalizability of known treatment effects can be improved.

2.5 Limitations and Strengths of Meta-analyses

Meta-analyses are not without controversies. The benefits and risks of the more complicated procedures of MAs continue to be debated in the medical research community [23–29]. Controversial issues include (a) the ability of the researchers to combine studies that differ in important aspects, such as study populations, experimental designs, and quality controls, (b) the possibility that publication or selection bias could exist when conducting such studies, and (c) the fact that sometimes the results of MAs on the same research topic have been contradictory [20, 30]. Meta-analyses present some limitations as well as strengths and advantages. The following paragraphs provide a summary of these aspects [31].

2.5.1 Limitations

Meta-analyses have been accused of oversimplifying the results of a research area by focusing mainly on overall effects while downplaying mediating or interaction effects, and they have been criticized for sometimes mixing studies that measure “apples” with those that measure “oranges” [32], so that ultimately no meaningful results can be obtained. Some researchers argue that MAs ignore the possible impact of study quality and risk of bias on the results of a review. By assessing the quality and risk of bias of the source studies included in the MA, this problem could be eliminated. When conducting a MA, mistakes can occur in the classification of studies or in calculating the effect sizes because of the complicated coding system used, as well as because many studies do not provide all the necessary information for inclusion in the analyses. A further weakness of MAs arises when the available studies for a particular research area are few in number, of low quality, or heterogeneous in their data [33]; in such cases, the corresponding results should be interpreted with great caution.

2.5.2 Strengths

The procedures employed in a MA permit quantitative synthesis of the literature that addresses specific research areas. A MA is likely to be more objective than a traditional extensive literature review, and therefore it is a more efficient way to summarize large bodies of literature [34]. By means of MA, it is possible to reach stronger conclusions because more studies can be analyzed objectively with specific statistical methods as opposed to traditional narrative literature reviews. Furthermore, a MA is also very helpful in revealing a lack of evidence in a specific research field, providing insight into new directions for research, and finding mediating or interactional relationships or trends that are either too small to be observed or cannot be hypothesized and tested in individual studies [34, 35].

2.6 Indications and Contraindications of Meta-analyses

When conducting MAs, misleading results could generally be avoided if some basic principles are defined at the outset and carefully followed throughout the investigation. First, the need for performing such a study must be examined, and this need must be later justified in the publication. Victor [36] proposed some indications and contraindications that should be seriously taken into consideration for a MA. According to him, a MA is indicated when: (a) an urgent decision is needed and, because of the lack of time, the performance of a new trial is impossible, (b) there is research on the safety aspects of drugs and other therapies and especially the evaluation of side effects, (c) there are many non-conclusive studies on a specific treatment, where small effects are important and when target-reaching trials are unrealistic in view of the needed sample size and time, and (d) there are contradicting results of studies or effects that vary too much among different types of subjects.

In contrast, MAs are contraindicated when (a) they are performed as the basis of drug approval and registration or (b) they are conducted in an attempt to make an irrelevant or unimportant effect significant by combining numerous insignificant studies showing small effects. In addition, it is also a misuse of MA when a researcher uses it to avoid the time and hard work of conducting a study of his or her own. Some basic principles to overcome these, as well as other, controversial issues are discussed below.

2.7 Conducting Meta-analyses

The most important issue before conducting a MA is to define a protocol to work with. This protocol, which will also help to reduce bias, should be precisely followed during the study and should include (a) the definition of the response variables, (b) the methods of literature searching for the individual studies to be included in the analysis, (c) the measures taken to reduce and identify publication bias, (d) the inclusion and exclusion criteria for the studies to be included in the analysis (selection bias), (e) how the data will be extracted from the studies, (f) how the quality (risk of bias) of the source studies will be assessed, and (g) how the data will be analyzed statistically (including, among others, information concerning the definition of the effect size, the use of a fixed-effect or random-effects model, heterogeneity assessment, as well as subgroup and sensitivity analysis).

2.7.1 Definition of the Response Variables

The definition of the response variables that are intended to be examined is significant, because differing definitions of the same variables in the original studies included in the analysis are often not suitable for combining. Furthermore, the minimum difference in the response variable that is considered clinically important should be presented, and not only the statistically significant differences. In this way, bias in interpreting the results of the analysis can be avoided, and the research remains focused on clinical importance.

2.7.2 Methods of Literature Searching

First of all, the period of time covered by the literature search during which the desired studies have taken place must be mentioned, because this places the MA in perspective with developments in medicine that may precede, coincide with, or follow the performed study, and allows other researchers to replicate the study if needed. Nevertheless, the most important issue is a detailed description of the information sources and search strategies that were used to locate the studies to be included in the MA. In MAs, the “material” to be studied consists of individual source studies of identical or similar research fields. It is very important to identify as many of these studies as possible, so that the “sample” used in the MA will be as large and as representative as possible. An incomplete search can lead to “selection bias” by failing to identify important studies. In order to avoid selection bias when conducting a MA, the literature should be searched thoroughly and systematically. It is recommended that these search methods should include: (a) Keyword (index terms) searches of computerized bibliographic databases (such as MEDLINE), which will be described later in this chapter [37]. All additional information about these searches should also be presented, such as the databases searched, the dates covered by the search, and whether or not the search was conducted by a professional medical librarian. (b) Cross-checking citations to appropriate studies, either through reviewing the bibliographies of published articles that have already been identified or through indexing and citation services, such as the Science Citation Index. (c) Checking with investigators, granting agencies, and industries or pharmaceutical companies for information on published or unpublished studies. (d) Searching trial registries of pertinent studies, such as The Cochrane Library Databases and the Oxford Database of Perinatal Trials. (e) Journal hand-searching.

In order to identify all possible studies to be included in a MA, not only one but several search strategies should be performed, and computer literature searches should not be the only strategy used to identify studies [20, 38, 39]. Another question that needs to be answered is whether non-English language studies should be included in the analysis [38]. Unfortunately, many MAs published in English-language journals restrict their search to papers that were also published in English, although they usually claim a very thorough search to retrieve every paper dealing with the topic to be investigated, which makes that claim rather questionable.

2.7.3 Measures to Reduce Publication Bias

The tendency for increased publication rates among studies that show a statistically significant effect of a specific treatment has already been documented [40–42]. Publication bias is related to the fact that studies with statistically significant results are more likely to be published than those without statistically significant results [43], and therefore measures should be taken to identify and reduce publication bias.

The inclusion of unpublished data in SRs or MAs is still controversial [20, 21, 44, 45]. However, before conducting a search for a MA, a decision should be made whether unpublished data that have not undergone a formal peer review process, including dissertations or conference presentations, should be included [21, 39, 42, 46].

Finally, publication and selection biases in MAs are more likely to affect small studies, which also tend to be of lower methodological quality. This may lead to “small-study effects,” where the smaller studies in a MA show larger treatment effects. Small-study effects may also arise because of between-trial heterogeneity. Thus, it is generally considered that studies with large sample sizes are less likely to be affected by publication bias than studies with smaller sample sizes, because larger studies are more likely to be reported even when they present negative or nonsignificant results [38, 39].

A very useful graphic method that can be utilized to identify and present possible publication bias in MAs, based on the association between treatment effect and study size, is the “funnel plot” [20, 38, 39]. This is a scatter plot of the treatment effects calculated from the individual studies on the horizontal axis, against study size or standard error on the vertical axis [47]. Funnel plots take advantage of the well-known “law of large numbers,” i.e., the larger the sample size, the more probable it is that the sample mean is a good estimate of the population mean. This suggests that the precision in estimating the underlying treatment effect will increase as the sample size of the component studies increases. Thus, large studies appear at the top of the graph and tend to cluster near the mean effect size. Smaller studies appear towards the bottom of the graph and tend to be dispersed across a range of values.

Fig. 6.1 Hypothetical funnel plot. A symmetrical plot indicates the absence of bias (smaller circles indicate smaller studies). The circle (study) on the right outside the curve indicates the possible existence of bias

In the absence of bias the studies are expected to be symmetrically distributed about the pooled effect size (Fig. 6.1), and thus visual inspection of a funnel plot may give a clear indication of publication bias. In the presence of bias, the lower part of the plot can be expected to show a higher concentration of studies on one side than on the other. In case of funnel plot asymmetry, additional statistical tests can be used to further assess the publication bias, such as Begg and Mazumdar’s rank correlation test [48], Egger’s test of the intercept [18], the Failsafe N or “file-drawer number” approach [49, 50], and Duval and Tweedie’s “trim and fill” method [51].
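As an illustration of how funnel plot asymmetry can be quantified, the following minimal sketch regresses the standardized effect of each study (effect divided by its standard error) on its precision (the reciprocal of the standard error) and reports the intercept, in the spirit of the intercept test mentioned above. It is a simplified, unweighted ordinary least squares version written for this text; the function name `egger_intercept` and its arguments are hypothetical and do not come from any particular software package.

```python
import math


def egger_intercept(effects, std_errors):
    """Regress standardized effect (effect/SE) on precision (1/SE).

    Returns (intercept, standard error of the intercept, t statistic).
    An intercept far from zero suggests funnel plot asymmetry.
    """
    x = [1.0 / se for se in std_errors]                 # precision of each study
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("at least three studies are needed")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    # The t statistic is undefined when the fit is exact
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat
```

When every study reports the same effect regardless of its size, the intercept is zero; when smaller studies (larger standard errors) report systematically larger effects, the intercept moves away from zero. The t statistic would be compared with a t distribution with n − 2 degrees of freedom.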

2.7.4 Selection Bias

Selection bias is a systematic error that results from the way the subjects are selected into the study or because there are selective losses of subjects before data analysis. Selection bias can occur in any kind of epidemiological study [52] and is one of the main reasons for divergent results among MAs [39]. Thus, there is a significant need to define the specific inclusion and exclusion criteria used for the studies to be included in the MA. Another aspect of selection bias is the populations utilized in the papers to be included in the MA. As MAs combine results of many different studies, the populations involved in these studies may be many and diverse, and there is a need to describe these populations and the population to which the results are to be generalized. The variability in the populations included in the MA may make interpreting the results difficult.

The inclusion and exclusion criteria of the source studies should be as specific as possible in order to be able to compare only compatible and relevant studies of suitable quality. According to Kassirer [21], the studies included in a MA should meet the following criteria: (a) The included studies should test the same hypothesis [46] and should therefore have the same outcome or end point. (b) These studies should compare similar patients or similar interventions [38, 46]. For example, a study testing a drug against a placebo should not be compared with one that tests it against a competing drug. (c) The source studies should present some characteristics of scientific quality, such as adequate sample size, random assignment between treatment and control groups, masking of patients and examiners, quality controls on data collection and management, and finally formal statistical analyses. At least two reviewers should individually perform the selection of studies to be included in the analysis in a blind procedure, and then the inter-reviewer agreement should be objectively evaluated (e.g., by calculating Cohen’s kappa).
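The inter-reviewer agreement mentioned above can be sketched as follows. This is a minimal illustration of Cohen’s kappa for two reviewers who each assign one category (for example, “include” or “exclude”) to every candidate study; the function name `cohens_kappa` and the category labels are assumptions made for the example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement, and a value of 0 indicates agreement no better than expected by chance.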

2.7.5 Quality (Risk of Bias) Assessment

Currently the term “risk of bias” assessment is widely used for evaluating each included study in a SR or a MA, while formerly, the authors tended to use the term “quality” analysis. When carrying out a SR or a MA it is important to distinguish between quality and risk of bias and to focus on evaluating and reporting the latter. Assessing the risk of bias should be part of the conduct and reporting of any SR or MA. In all situations, systematic reviewers are encouraged to think ahead carefully about what risks of bias (methodological and clinical) may have a bearing on the results of their SRs. There are three main ways to assess risk of bias: (a) scales, (b) checklists, and (c) individual component approaches. Scales that are commonly used for the assessment of risk of bias include the Downs and Black scale [53], the Jadad scale [54], and the Newcastle-Ottawa Scale (NOS) as suggested by the Cochrane Handbook for Systematic Reviews of Interventions [55]. Checklists are less frequently used and potentially have the same problems as scales. The most commonly used is the PRISMA checklist [56]. A component approach is advocated, specifically one based on domains for which there is good empirical evidence and perhaps strong clinical grounds. The new Cochrane Risk of Bias tool is one such component approach [57].

Characterization of the quality of the studies allows researchers to use more specific inclusion or exclusion criteria and sometimes to assign weights to studies of different quality when combining them in the analysis. At least two reviewers should individually evaluate the quality of each study in a blind procedure, and then the differences among them should be objectively evaluated (e.g., by calculating Cohen’s kappa) [38, 39].

2.7.6 Data Extraction

The criteria used to extract data from various studies included in the MA should be reported and must be specified in advance. In other words, the “who, what, and how” of the extraction process should be precisely described. When the extraction criteria are too general, data extraction could be a subjective procedure, which may introduce bias into the process. Similar to quality assessment of the source studies, data extraction should be performed by two different researchers, and the inter-examiner reliability should be evaluated by comparing their results for agreement (e.g., by calculating Cohen’s kappa).

2.7.7 Statistical Procedures

The effect of statistically pooling the results through a meta-analytic procedure is to increase the sample size. Consequently, the statistical power of the MA is greater than that of the individual studies. Different statistical methods can lead to different results, and therefore careful attention should be directed to choosing the statistical procedures for conducting MAs.

There are some basic differences in the statistical procedures used in the source studies and those used in MAs. The most significant difference is the unit of analysis, which in the individual source studies is the subject (e.g., patients, students, and observational entities), while in a MA, it is the results of a study.

Another important difference is the application of appropriate statistical techniques. Some meta-analysts simply apply the usual techniques used for primary analysis also in MAs [58]. As the hypothesis tested in a MA is based on differing sample sizes, these statistical tests will have different sampling variances and will probably violate the assumption of the homogeneity of these variances. Therefore, statistical procedures that are specially developed for MAs seem to be more appropriate for the quantification of the results of the individual studies [59–62].

Despite the existence of these specially designed procedures, some simple statistical methods for combining the results of individual source studies include various counting or summation procedures, such as the “head counting” or “vote counting” approach [32, 35, 63, 64] or simply combining significance tests of the source studies [35]. Researchers should be very careful in choosing these simple procedures for statistical analysis, which are actually not recommended. In the first procedure, the highest number of positive or negative studies determines the results of the analysis. In the second procedure, the results in a MA are summarized by mathematically combining the p-values of each of the studies into a single p-value. The latter approach is thus based entirely on p-values and because, as mentioned above, studies with nonsignificant results are published less often, this method will be associated with publication bias.
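To make the two simple procedures concrete, the sketch below tallies studies by direction and significance (vote counting) and combines p-values into a single p-value. The text does not name a specific combination rule; Fisher’s method is used here purely as one well-known example, and the function names and the 0.05 threshold are assumptions of this illustration. As noted above, neither procedure is recommended as the basis of a MA.

```python
import math


def vote_count(effects, p_values, alpha=0.05):
    """Tally significantly positive, significantly negative, and nonsignificant studies."""
    tally = {"positive": 0, "negative": 0, "nonsignificant": 0}
    for effect, p in zip(effects, p_values):
        if p >= alpha:
            tally["nonsignificant"] += 1
        elif effect > 0:
            tally["positive"] += 1
        else:
            tally["negative"] += 1
    return tally


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows a chi-square with 2k degrees of freedom.

    Returns (test statistic, combined p-value). All p-values must be > 0.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom (2k) the chi-square upper tail has a closed form:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total
```

Both functions discard the size and precision of the individual effects, which is one reason why effect-size-based procedures are preferred.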

2.7.7.1 Effect Size

Before conducting a MA, an “effect size” of the variable under investigation should be initially estimated, which combines the differences and standard deviations of the response variables into standardized units across the studies. An effect size is a measure of the strength of the relationship between two variables in a statistical population, or a sample-based estimate of that quantity, and it is calculated from the source data. It is a descriptive statistic that conveys the estimated magnitude of a relationship without making any statement about whether the apparent relationship in the data reflects a true relationship in the population. In that way, effect sizes complement inferential statistics such as p-values. The most commonly encountered effect sizes include (a) for continuous data the mean difference, the standardized mean difference, and the weighted mean difference, (b) for dichotomous data the odds ratio, the risk ratio, and the risk difference, and (c) for censored or survival data the hazard ratio.
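For continuous data, the standardized mean difference can be computed from the group means, standard deviations, and sample sizes reported in a source study. The following minimal sketch uses the pooled standard deviation (the form often called Cohen’s d) together with a common large-sample approximation of its variance; the function name and argument names are hypothetical, and small-sample corrections are deliberately omitted.

```python
import math


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (d, variance of d) for a treatment group versus a control group."""
    # Pooled variance of the two groups
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Large-sample approximation of the sampling variance of d
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var_d


# Example with invented numbers: two groups of 20 subjects each
d, var_d = standardized_mean_difference(12.0, 2.0, 20, 10.0, 2.0, 20)
```

The variance returned alongside the effect size is what allows each study to be weighted by its precision when the effect sizes are later combined.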

2.7.7.2 Fixed-Effect and Random-Effects Models

The specific statistical method used depends on several factors, such as the hypothesis being studied and the nature of the response variable. The statistical methods may also involve one of two types of models, and the model used should also be reported. A “fixed-effect” model makes stronger assumptions about the variability/heterogeneity in the analysis: it assumes that all included studies estimate one common treatment effect and that differences among their results arise from sampling error alone. A “random-effects” model makes fewer assumptions, allowing the true effect to vary between studies; it is more conservative, and thus it may be a better summary of the effects reported in the individual studies of the MA.
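The difference between the two models can be sketched with inverse-variance pooling. In the illustration below, the fixed-effect estimate weights each study by the reciprocal of its variance, while the random-effects estimate adds a between-study variance component to each study’s variance before weighting. The DerSimonian and Laird moment estimator is assumed here for that component, as one widely used option rather than the only one; all function names are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def pool_random_dl(effects, variances):
    """Random-effects pooled estimate (DerSimonian-Laird tau^2 assumed).

    Returns (estimate, standard error, tau^2).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-pool with the between-study variance added to every study
    estimate, se = pool_fixed(effects, [v + tau2 for v in variances])
    return estimate, se, tau2
```

When the studies agree closely, the estimated between-study variance is zero and both models give the same result; when they disagree, the random-effects standard error becomes larger, which is the sense in which that model is more conservative.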

2.7.7.3 Forest plot

A very useful way to report the results of MAs is to present a summary measure of the estimated size and direction of the effect of the treatment with the corresponding confidence intervals by means of, for example, the odds ratio [65]. This ratio corresponds to the odds of an outcome occurring in the treatment group divided by the odds that the same outcome will occur in the control group. The procedure can be very clear if it is presented by means of a table and/or a “forest plot of odds ratio” and its 95% confidence intervals for each study included in the analysis (Fig. 6.2). The studies may be arranged in several ways to reveal certain features of the findings, such as publication date, sample size, and quality of the study. An odds ratio greater than 1 indicates an increased risk in the treatment group, while an odds ratio less than 1 indicates a decreased risk. A ratio of 1 indicates no difference in risk. This means that the outcome is as likely to occur in the treatment group as it is in the control group, and therefore the treatment can be regarded neither as harmful nor as protective.

Fig. 6.2 A “forest plot of odds ratio.” The estimated mean odds ratio and 95% confidence limits are shown for each study. An odds ratio of 1 means that the treatment neither increases nor decreases the risk of the outcome of interest
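The quantities drawn for each study in such a plot can be computed from a 2 × 2 table of events and non-events. The following minimal sketch returns the odds ratio with an approximate 95% confidence interval based on the standard error of the log odds ratio. The function name is hypothetical, and adding 0.5 to every cell when a cell is zero is an assumption of this example (a common continuity correction), not a procedure taken from the text.

```python
import math


def odds_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Odds ratio and approximate confidence interval from a 2 x 2 table.

    events_t / n_t: events and group size in the treatment group
    events_c / n_c: events and group size in the control group
    """
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    if min(a, b, c, d) == 0:
        # Assumed continuity correction to avoid division by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

A confidence interval that includes 1 corresponds to a study whose horizontal line crosses the vertical no-effect line of the forest plot.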

2.7.7.4 Heterogeneity or Homogeneity of the Data of the Source Studies

As a MA combines the results of several individual studies, the degree of disagreement or agreement among them (heterogeneity or homogeneity) can also affect the results of the MA. When the source studies present homogeneous data, they can be more easily interpreted; in contrast, heterogeneous data are more difficult to explain. Sometimes, the heterogeneity among the data of the individual source studies is due to the fact that these source studies are conducted without similar protocols [66].

Heterogeneity (and possible causes for it) can be initially assessed by means of visual inspection of the forest plots. However, for a more precise estimate, the Q statistic and the corresponding P-value, as well as the I² index, which is an indicator of true heterogeneity in percentages, should be used. An I² index value of 0% indicates no observed heterogeneity, while larger values show increasing heterogeneity (with 25% indicating “low”, 50% “moderate” and 75% “high” heterogeneity) [58].
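A minimal sketch of these two indices follows, assuming that each study contributes an effect estimate and its variance. Q is the weighted sum of squared deviations from the inverse-variance pooled estimate, and I² expresses the part of Q that exceeds its degrees of freedom as a percentage. The function name is hypothetical; the P-value for Q is not computed here because it requires the chi-square distribution with k − 1 degrees of freedom (available, for example, as `scipy.stats.chi2.sf` if SciPy is installed).

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I^2 in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q statistic
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is truncated at zero when Q is smaller than its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```

For two equally precise studies with clearly different effects, for example effects of 0.0 and 1.0 each with variance 0.01, the function returns an I² close to 98%, which would be read as high heterogeneity on the scale given above.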

Another useful way to examine and report the degree of heterogeneity between source studies and to identify outliers is the L’Abbé scatter plot (Fig. 6.3) [68]. This plot is a representation of event rates in the treated group against those in the control group. If the treatment is beneficial, studies will fall to the right of the line of identity (the no-effect line). A homogeneous set of studies will scatter around a parallel line that corresponds to the combined treatment effect.

Fig. 6.3

The L’Abbé scatter plot shows the event rates in the treated group (“variable 1 proportion”) against those in the control group (“variable 2 proportion”). If the treatment is beneficial, studies will fall to the right of the line of identity (the no-effect line), which is not the case in the current diagram. In addition, because the set of studies does not scatter around the line that corresponds to the combined treatment effect, it could be concluded that this set of studies is heterogeneous

2.7.7.5 Sensitivity Analysis

When important choices and assumptions are made, they should be tested with a sensitivity analysis to determine how strongly they influence the results. In a sensitivity analysis, some studies are excluded to determine how their exclusion affects the results. If the effect is great, the excluded studies may have had an excessive impact on the results. If the effect is small, the results may be regarded as more representative of all the studies.
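One common form of this check is a leave-one-out analysis, sketched below under the assumption of a fixed-effect, inverse-variance pooled estimate. The function names and the effect sizes are hypothetical and serve only to show the mechanics.

```python
def pooled_fixed(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, variances):
    """Pooled estimate recomputed with each study excluded in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append(pooled_fixed(rest_effects, rest_variances))
    return results

# Hypothetical effect sizes; the last study is a deliberate outlier.
study_effects = [0.10, 0.15, 0.12, 0.90]
study_variances = [0.04, 0.05, 0.04, 0.04]
overall = pooled_fixed(study_effects, study_variances)
for index, estimate in enumerate(leave_one_out(study_effects, study_variances)):
    shift = estimate - overall
    print(f"without study {index + 1}: pooled = {estimate:.3f} (shift {shift:+.3f})")
```

A study whose removal shifts the pooled estimate far more than the others, as the fourth one does here, would be a candidate for closer scrutiny.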

2.7.7.6 Software for Meta-analysis

Over the past few years, computer software designed entirely for MAs has increasingly become available, while meta-analytic procedures have been introduced in general statistical software packages. These can be divided into commercial and freely available packages and are presented in detail in Tables 6.2a and 6.2b.

Table 6.2a Commercial software entirely designed for meta-analysis
Table 6.2b Freely available software entirely designed for meta-analysis

2.8 General Suggestions

In summary, when conducting a MA, there are some general rules that should be followed. Investigators are advised to:

  • Do everything for the right cause and be prepared to defend it.

  • Perform the procedures of statistical analysis in more than one way, whenever it is possible.

  • Report everything (also in the form of tables or diagrams) in order to provide readers, reviewers, or editors with all the information they would need to evaluate the meta-analytic procedures followed.

  • Finally, do everything to minimize mistakes and/or biases.

3 Meta-analysis in Orthodontic Literature

The first MA assessing the effect of a therapeutic intervention was published in 1955 and evaluated a placebo treatment [69]. Subsequently, the development of more sophisticated statistical techniques took place in the social sciences [70–73]. Meta-analysis was rediscovered by medical researchers, mainly for use in RCT research, particularly in the fields of cardiovascular disease [74–76], oncology [77, 78], and perinatal care [79]. Furthermore, MAs of observational studies [80] and “cross-design synthesis” [81, 82] have also been performed.

The number of papers published utilizing MAs in medical research has increased sharply during the last 10 years, and in the year 2003 represented 0.23% of all published studies (Fig. 6.4 and Table 6.3). Despite the large number of MAs in medical research, only 18 such papers were published (as cited in MEDLINE) in the field of orthodontics until the year 2003 (Tables 6.3 and 6.4). However, it must be emphasized that MAs are difficult to conduct in orthodontics because of the nature of the available literature [83].

Fig. 6.4

Diagram presenting the number of meta-analyses indexed in MEDLINE during the period 1966–2002 (Results from MEDLINE search using title word or medical subject heading [publication type] “meta-analysis”)

Table 6.3 Number of meta-analyses as indexed in the MEDLINE database during the period 1966–2002
Table 6.4 Meta-analyses in the field of orthodontics as indexed in MEDLINE during the period 1966–2003

Since MAs are considered, if not the best, one of the three most appropriate study designs that could provide the strongest level of evidence, and since there are very few studies of this type investigating orthodontic issues, we can assume that only some questions that concern the orthodontic community have been addressed satisfactorily so far. Furthermore, meta-analytic procedures are very complicated, and it is questionable whether the proper approaches have been used by the researchers when conducting these studies.

The aims therefore of this second part of the chapter are to investigate the evidence currently existing in the orthodontic literature that is derived from MAs, to critically present and discuss their results regarding the specific fields of orthodontics that they are dealing with, and to evaluate the methods used in these studies, in order to obtain conclusions that can be used for practicing evidence-based orthodontics.

For the identification of MAs, detailed search strategies were developed for the following databases: (a) MEDLINE (1966–2004), (b) the Database of Abstracts of Reviews of Effects (DARE) of the Centre for Reviews and Dissemination (CRD), and (c) the abstracts of Cochrane Reviews and the Cochrane Oral Health Group Trials Register. All electronic searches were conducted on February 25, 2004.

The search strategy for MEDLINE was conducted after appropriate changes in the vocabulary and according to the syntax rules for each database, using 61 terms related to orthodontics (such as orthodont*, malocclusion, crossbite, openbite, prognath*, retrognath*, etc.) and the term meta-anal*. The search strategy implemented in the Database of Abstracts of Reviews of Effects (DARE) of the Centre for Reviews and Dissemination (CRD) and in the abstracts of Cochrane Reviews and the Cochrane Oral Health Group Trials Register was performed using the term “orthodontic.” Hand-searching was also conducted from the reference lists of the studies initially selected, in order to identify more articles to be included in this evaluation. No language restriction was applied during the whole identification process of the studies. The criteria for considering studies to be included in this MA were: (a) The studies should not only have been indexed as MAs by the corresponding databases but should also have performed a statistical evaluation of the source data (since many SRs are indexed in MEDLINE as MAs although they do not perform quantitative synthesis of data). (b) The research papers should have discussed a subject related to orthodontics.

Following the utilization of the search strategy in MEDLINE, 57 studies were initially retrieved. Forty-eight of them were excluded because either their subject was not related to orthodontics or, following evaluation of the methodology used, it was found that statistical methods were not applied (most of these studies were actually just SRs) or that the ones used were not appropriate. Only nine papers met the inclusion criteria and remained for further evaluation. The search strategy of the abstracts of Cochrane Reviews and the Cochrane Oral Health Group Trials Register revealed four studies. Two were SRs, one was not related to orthodontics, and only one study was a MA. The search strategy of the Database of Abstracts of Reviews of Effects (DARE) initially revealed 40 studies. Thirty-seven of them were excluded, while only three met the inclusion criteria described above. All three studies were also included in MEDLINE, while one was also included in the abstracts of Cochrane Reviews. Finally, one further study was retrieved with the hand-searching process. In total, only 11 studies met the inclusion criteria as set above and remained for further evaluation; they are presented in Table 6.5. These MAs investigated the following subjects: (a) functional appliances in Class II treatment, (b) maxillary protraction in Class III treatment, (c) treatment of transversal problems, (d) orthodontics and temporomandibular disorders (TMD), (e) cephalometric landmarks identification, (f) overjet size in relation to traumatic dental injuries, and (g) obstructive sleep apnea syndrome (OSAS).

Table 6.5 Meta-analyses included in current evaluation, their subjects, and the corresponding statistical procedures for data analysis

3.1 Functional Appliances in Class II Treatment

In 1991, Mills [101] reported the effect of functional appliances on the skeletal pattern following treatment of Class II, division 1 patients. He reviewed the findings of 26 papers dealing with results of Andresen and Frankel appliances, as they were retrieved from cephalometric radiographs. These findings presented a high degree of consistency regardless of the backgrounds and the countries of the corresponding patient samples. In order to gain in statistical significance, the findings of the primary studies were combined to produce larger samples, and they were also compared with a control group produced from reports of untreated Class II, division 1 individuals. To accomplish this task, Mills [101] combined the means and the given standard deviations of the primary studies, using Student’s t-test for the full figures and the Mann–Whitney U-test for the annual results. The author concluded that (a) there was no appreciable restraining effect on the forward growth of the maxilla in either group (functional appliances group and control group), (b) a slight mean increase in mandibular growth could be observed, mainly in a vertical direction, (c) no change in the position of the glenoid fossa was evident, and (d) there was a wide individual response, and average changes were rarely observed in a patient.

Regarding the methodology used in this study, although there was no language bias, no multiple publication bias, and no citation bias, the meta-analytic procedure followed by the author presented some problems: (a) No reference was made to the search method performed and whether databases were investigated or not, which could result in an incomplete inclusion of studies for the MA. (b) Statistical tests for primary analysis of studies (Student’s t-test and Mann–Whitney U-test) were used and not specially designed tests for MA. Although some primary studies did not report standard deviations, they nevertheless stated that measurements did not vary to a great extent between samples. The author assumed that the overall measurements given were a reasonable estimate. (c) The treatment durations of the patients included in the primary studies were different, and the control group consisted of cases with rather milder symptoms, which could lead to selection bias. For these reasons, the MA performed by Mills [101] cannot be regarded as done in an appropriate way, and therefore the corresponding conclusions stated by the author should be viewed with caution.

More recently, Chen et al. [89] investigated the efficacy of functional appliances on mandibular growth in patients with Class II malocclusion, aiming to test the hypothesis that functional appliances enhance mandibular growth when they are used for the treatment of Class II malocclusions. In order to identify the corresponding studies, they performed an electronic search in MEDLINE for the years 1966–1999. Six articles meeting validity standards were evaluated for 12 cephalometric variables. The data of the studies included in the MA were evaluated using ANOVA, Student’s t-test for paired data, and 95% confidence intervals. For the variables Co-Pg, Co-Gn, SNB, LIA, and other measurements, the authors found no significant difference between the untreated control group and the group treated with functional appliances. However, for the variables Ar-Pg and Ar-Gn, there was a significant difference between the control and the treated groups. These results showed that it was not easy to reach definite conclusions about the effectiveness of the appliances used for the treatment of Class II malocclusion and suggested the need to reevaluate the use of functional appliances for mandibular growth enhancement. This was due to the many inconsistencies in measuring treatment results and to the different treatment durations.

Although there was no citation bias or multiple publication bias, some problems could be observed in the methodology used by the authors, such as: (a) the lack of searching databases in addition to MEDLINE, (b) the limitation of the language of the studies to English, (c) the statistical tests used for primary analysis of the studies (Student’s t-test and ANOVA) and lack of a specially designed test for MA, (d) no mention of control and treated group sizes in the main text, although the results of the analysis were given in box plots, and (e) the differences in the age of patients at the start of treatment as well as differences in treatment duration in the various studies included in the analysis, which could indicate the possible existence of selection bias. Consequently, the MA performed by Chen et al. [89] presents some weaknesses. However, if more detailed inclusion and exclusion criteria had been applied, most probably the conclusions would have been more indistinct.

To conclude, even if some positive effects of functional appliances on the skeletal pattern and especially on mandibular growth can be assumed as reported by Mills [101], there is still no strong evidence that actually could confirm these findings.

3.2 Maxillary Protraction in Class III Treatment

In the first study, an attempt was made to evaluate the effectiveness of maxillary protraction with orthopedic appliances in Class III patients by Kim et al. [94], aiming to determine whether a consensus exists regarding controversial issues, such as the timing of treatment and the use of adjunctive intraoral appliances. In order to identify the corresponding studies, they performed an electronic search in MEDLINE for the years 1966–1996 and reviewed the abstracts and summaries of these articles by hand-searching. Fourteen studies met the selection criteria. In order to combine the data in the primary papers, Kim et al. [94] performed an analysis by summarizing the means and standard deviations of the primary studies and by graphically representing these results.

The statistical analysis of the changes following treatment in selected cephalometric landmarks showed no distinct differences between the palatal expansion group and the non-expansion group except for one variable, which increased to a greater degree in the non-expansion group. Examination of the effect of age revealed greater treatment changes in the younger group of patients. These results indicated that protraction face mask therapy is effective in patients who are growing, but to a lesser degree in patients who are older than 10 years of age, and that protraction in combination with an initial period of expansion may provide more significant skeletal effects.

Although there was no language bias evident in this study, some problems of the meta-analytic procedures that weaken the results should be mentioned. These include the following: (a) use of common statistical approaches for the primary analysis of studies (summaries of the means and standard deviations along with graphic representations) and not specially designed tests for MA, (b) limitation of the literature search conducted by the authors to identify the corresponding studies only to MEDLINE without investigating other databases, (c) the ethnic maturation differential that may well exist in the primary studies and should also be taken into consideration, (d) the lack of a matched control group, and (e) the lack of standardization of the design of the various studies. For these reasons, the results of this MA should be regarded with caution.

Maxillary protraction was also the subject of a MA conducted by Jager et al. [92]. Their aim was to quantitatively assess the published results concerning the treatment effects of maxillary protraction on craniofacial growth of patients with Angle Class III malocclusion using a MA. In order to identify the corresponding studies, they performed an electronic search in MEDLINE for the years 1966–1998. In addition, the reference lists of the studies collected in the database search were surveyed for further information. Twelve studies remained for subsequent evaluation after implementation of the inclusion criteria. In order to combine the data of the primary papers, the results of different cephalometric measurements were reviewed by the Dstat 1.10 software in order to calculate a standardized treatment effect variable. The homogeneity of the variances of the different effect variables as well as a composite effect was calculated. The results of this analysis demonstrated that a significant composite effect of the protraction treatment on some craniofacial skeletal and dental components was evident. In conclusion, maxillary protraction was shown to have a significant treatment effect. However, several of the individual effect variables demonstrated a significant lack of homogeneity. Study characteristics that might possibly account for this variability were the patients’ ages at the treatment start and the combination of maxillary protraction with rapid maxillary expansion. Regarding the statistical procedures used, the fact that a specially designed analysis was performed adds to the strengths of this study, while the limitation of literature searching in only one database (MEDLINE) could be regarded as a weakness.
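A standardized treatment effect of the kind mentioned above can be illustrated with a standardized mean difference. The sketch below is not a reproduction of the Dstat computation, which is not described here; it assumes the common pooled-standard-deviation form with a small-sample correction, and the function name and cephalometric values are hypothetical.

```python
import math

def standardized_mean_difference(m_trt, sd_trt, n_trt, m_ctl, sd_ctl, n_ctl):
    """Return (d, g): difference in means in pooled-SD units, and its
    small-sample corrected version (assumed correction 1 - 3 / (4 * df - 1))."""
    df = n_trt + n_ctl - 2
    pooled_sd = math.sqrt(
        ((n_trt - 1) * sd_trt ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    d = (m_trt - m_ctl) / pooled_sd
    g = d * (1.0 - 3.0 / (4.0 * df - 1.0))
    return d, g

# Hypothetical change in an angular measurement (degrees): treated vs. control.
smd_d, smd_g = standardized_mean_difference(2.1, 1.5, 20, 0.4, 1.3, 20)
print(f"d = {smd_d:.2f}, corrected g = {smd_g:.2f}")
```

Expressing each cephalometric variable in standard deviation units is what allows measurements taken on different scales to be combined into a composite effect.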

To summarize, maxillary protraction could modify the skeletal and dental components of the face. Thus, it seems that the protraction treatment is effective in patients who are growing but to a lesser degree in patients who are older than 10 years of age. Moreover, protraction in combination with an initial period of expansion may provide more significant skeletal effects.

3.3 Treatment of Transversal Problems

The subject of the first study conducted by Burke et al. [96] was a MA of the changes of mandibular intercanine width during and after treatment and post-retention. In order to identify the corresponding studies, they performed an electronic search in MEDLINE. The reference lists of the studies collected through this search were reviewed, and recent issues of relevant journals were also hand-searched. Twenty-six studies that assessed the longitudinal stability of post-retention mandibular intercanine width were evaluated. For the statistical analysis, weighted averages and standard deviations for the means were compared for linear changes in intercanine transverse dimensions during treatment (T1), immediately after treatment (T2), and after removal of all retention (T3). Paired two-tailed t-tests were performed between T3 and T1 means on all groups, and the corresponding 95% confidence intervals were computed. Following the above meta-analytic procedure, the conclusions reached by the authors were the following: (a) Regardless of patient diagnostic and treatment modalities or whether treatment was extraction or non-extraction, mandibular intercanine width tends to expand during treatment by approximately 1–2 mm, to contract post-retention to approximately the original dimension, and to show a net change in post-retention between 0.5 mm expansion and 0.6 mm constriction. (b) While statistically significant differences could be demonstrated within various groups, the magnitudes of the differences were not considered clinically important. (c) The net change in mandibular intercanine width that was calculated from the sum of cases included in the MA was approximately zero, which supports the concept of maintenance of the initial intercanine width in orthodontic treatment.

Regarding the methodology used by the authors to evaluate the primary data, the following problems should be mentioned: (a) The authors used statistical tests for primary analysis of studies (paired two-tailed t-tests) and not specially designed tests for MAs. (b) No control group size was mentioned. (c) The literature search was limited only to MEDLINE, and no other databases were investigated. (d) The post-retention period varied from 4 months to 12 years, which could indicate the existence of selection bias. (e) No test for homogeneity was demonstrated. Consequently, it is questionable if the results of this MA are strongly supported by the methodology followed by the authors, and they should be considered with caution.

The second study performed by Schiffman and Tuncay [91] evaluated existing trials on maxillary expansion in order to understand the appropriateness and stability of this procedure. The authors searched MEDLINE from 1978 to 1999, and additional hand-searching was also performed. The evaluation of the primary data consisted of coding and scoring each study with respect to preestablished characteristics. Following this evaluation, only six studies remained in the final analysis. An overall effect size was computed, and aspects of the study design were analyzed. The results of this study were the following: (a) The mean expansion after appropriate adjustment was 6.00 mm, with a standard deviation of 1.29 mm. Of the 6 mm average, 4.89 mm was maintained while wearing retainers. (b) The 6 mm average expansion with retention in the short-term (less than 1 year) yielded a 4.71 mm residual expansion, which subsequently was reduced to 3.88 mm during the short-term post-retention period. (c) In the long-term post-retention study period, only 2.4 mm of the residual expansion remained, which was no greater than what has been documented as normal growth. The authors concluded that there was inadequate evidence to support the opinion that the expansion achieved beyond what is expected from normal development of the maxilla could be retained.
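To make the reported figures easier to compare, the residual expansion at each stage can be expressed as a share of the 6.00 mm mean expansion. The short calculation below uses only the values quoted above; the dictionary keys are shorthand labels introduced for this illustration.

```python
mean_expansion_mm = 6.00

# Residual expansion reported at each stage (mm), as quoted in the text.
residual_mm = {
    "while wearing retainers": 4.89,
    "short-term retention (< 1 year)": 4.71,
    "short-term post-retention": 3.88,
    "long-term post-retention": 2.4,
}

# Share of the original mean expansion still present at each stage.
retained_pct = {
    stage: 100.0 * value / mean_expansion_mm
    for stage, value in residual_mm.items()
}

for stage, pct in retained_pct.items():
    print(f"{stage}: {pct:.1f}% of the mean expansion retained")
```

On these figures, roughly 40% of the mean expansion remained in the long-term post-retention period.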

Although selection bias was not present in the study, it should be mentioned that: (a) The literature search included only one electronic database (MEDLINE). No other databases were investigated, and this search was conducted only for the period from 1979 to 1999. (b) There was a language restriction, since only English language publications were included in the study. (c) The authors did not present any p-values for the measurements given in their analysis, while on the other hand, they stated that the significance level was set at p = 0.10 (instead of the commonly accepted value in the medical community of p = 0.05). Therefore, this meta-analysis presents a few weaknesses, and the derived conclusions should be regarded with some caution.

More recently, Harrison and Ashby [102] performed a meta-analysis aiming to evaluate orthodontic treatment procedures used to expand the maxillary dentition and correct posterior crossbites. In order to identify relevant studies, the Cochrane Controlled Trials Register and MEDLINE were searched for all RCTs and controlled clinical trials (CCTs). In addition, hand-searching was performed. In total, 5 RCTs and 8 CCTs were included in the study. For the statistical analysis of the primary studies, the odds ratio, the relative risk, the relative risk reduction, the absolute risk reduction, the number needed to treat, and the corresponding 95% confidence intervals were calculated for event data. The weighted mean difference and the corresponding 95% confidence intervals were calculated for continuous data. The conclusions drawn from this MA were that: (a) occlusal grinding in the primary dentition with/without the addition of an upper removable expansion plate was shown to be effective in preventing a posterior crossbite present in the primary dentition from being perpetuated to the mixed and permanent dentitions, (b) no evidence of a difference in treatment effect (molar and canine expansion) between the test and control intervention was found in the trials that compared banded versus bonded and two-point versus four-point rapid maxillary expansion, banded versus bonded slow maxillary expansion, transpalatal arch with/without buccal root torque, or upper removable expansion appliance versus quad-helix, and (c) insufficient data were provided regarding two-point versus four-point rapid maxillary expansion to allow a formal analysis.
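The event-data measures listed above are all derived from the same two proportions. The sketch below shows their standard definitions for a single hypothetical trial; the function name, the dictionary keys, and the counts are assumptions for illustration and are not taken from the cited study. Confidence intervals are omitted for brevity.

```python
import math

def risk_measures(events_trt, n_trt, events_ctl, n_ctl):
    """Standard effect measures for a binary outcome in a two-arm trial."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    arr = risk_ctl - risk_trt                 # absolute risk reduction
    return {
        "relative_risk": risk_trt / risk_ctl,
        "relative_risk_reduction": arr / risk_ctl,
        "absolute_risk_reduction": arr,
        # Number needed to treat is undefined when the risks are equal.
        "number_needed_to_treat": 1.0 / arr if arr != 0 else math.inf,
        "odds_ratio": (risk_trt / (1 - risk_trt)) / (risk_ctl / (1 - risk_ctl)),
    }

# Hypothetical trial: crossbite persists in 10 of 50 treated and 20 of 50 controls.
measures = risk_measures(10, 50, 20, 50)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```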

Although the procedures used in this study were well performed and followed the guidelines for undertaking MAs [85], some trials included small sample sizes, and they were inadequately powered. Therefore, the authors concluded that further studies with appropriate sample sizes would be required to assess the relative effectiveness of these interventions.

To summarize, from the results of the studies that evaluated the treatment of transversal problems, it could be concluded that regarding mandibular expansion, the net change in mandibular intercanine width following treatment and retention was approximately zero, which supports the concept of maintenance of the initial intercanine width in orthodontic treatment. In addition, there was inadequate evidence to support the opinion that the maxillary expansion achieved beyond what is expected from normal development of the maxilla could be retained, while occlusal grinding in the primary dentition, with or without the addition of an upper removable expansion plate, was shown to be effective in preventing a posterior crossbite present in the primary dentition from being perpetuated to the mixed and permanent dentitions. Some weaknesses and problems in the methodology used (i.e., language bias, publication bias, and lack of control group size) were present in the above studies, with the exception of the third study, whose methodology could be considered an illuminating example for other meta-analysts.

3.4 Orthodontics and Temporomandibular Disorders

The study by Kim et al. [90] investigated the relationship between orthodontic treatment and temporomandibular disorders (TMD) in patients following completion of orthodontic therapy. The authors conducted a computerized search in MEDLINE, while additional material was retrieved from the reference lists of the articles found and from a list of published and unpublished studies. For the statistical analysis, the authors divided and extracted data from 31 articles according to study designs, symptoms, signs, or indexes. To test whether all primary studies attempted to estimate, or observed, the same true effect and whether variability between results of the studies was due to random error only, a statistical test for the hypothesis of parametric homogeneity (H) was conducted. In addition, probabilities of homogeneity P(H) and odds on parametric homogeneity (H) were calculated. The authors could not deliver a definitive conclusion regarding the question about the relationships between orthodontic treatment and temporomandibular disorders (TMD) because of the severe heterogeneity of the data of the primary studies. In addition, they stated that the data included in their MA do not indicate that traditional orthodontic treatment increased the prevalence of TMD and that a reliable and valid diagnostic classification system for TMD is needed for future research.

Despite the extreme heterogeneity of the data in the primary studies, the following parameters regarding the methodology used weakened even more the results of this investigation: (a) Only one electronic database was investigated. (b) Only English language papers were identified, which could lead to language bias. (c) Not all of the primary studies included in this attempt to conduct a MA were of the same quality. (d) The inclusion of studies regardless of the age of the patients, the large number of different types of appliances that were evaluated in the study, and the different durations of the treatment could suggest selection bias. To summarize, it was not possible for the authors of the above study [90] to produce evidence on the relationships between the prevalence of temporomandibular dysfunction and orthodontic treatment.

3.5 Identification of Cephalometric Landmarks

In this MA, Trpkova et al. [97] tried to assess the magnitude of identification error for 15 lateral cephalometric landmarks. In order to identify the corresponding studies to be included in the MA, the authors performed a computerized search in the MEDLINE database from 1966 to 1995 along with hand-searching. In total, six studies were evaluated. The statistical analysis of the primary data included a weighted average of the estimates in order to combine studies reporting means and standard error, and one-way analysis of covariance in order to combine studies reporting standard deviations. The results of this study were a measure of the systematic and random errors involved when locating landmarks on lateral head films, and they were presented as standard mean errors with the corresponding 95% confidence intervals for the repeatability and reproducibility of the 15 cephalometric landmarks. According to this investigation, the authors concluded that 0.59 mm of total error for the x-coordinate and 0.56 mm for the y-coordinate are acceptable levels of accuracy, and that only the landmarks B, A, Ptm, S, and Go on the x-coordinate and Ptm, A, and S on the y-coordinate presented an insignificant mean error and a small value for total error. Therefore, these landmarks may be considered reliable for cephalometric analysis of lateral radiographs. For these reasons, the authors emphasized the importance of critical interpretation of cephalometric measurements and careful selection of landmarks for cephalometric analysis and suggested that a separate analysis to estimate the identification errors of landmarks with dubious reliability should be a prerequisite both for research purposes and in clinical practice.

Regarding the methodology used when conducting this MA, the following limitations that could weaken its strength should be mentioned: (a) The literature search included only one electronic database (MEDLINE), and no other databases were investigated. (b) No information was given as to whether a language restriction was applied during the identification process of the papers to be included in the analysis. (c) No homogeneity test of the primary data was presented in the study. (d) No information was given regarding the ages of the patients in the primary studies. Consequently, the methodology followed by the authors weakens the evidence produced by this MA, and therefore the above-mentioned results should be treated with some caution.

3.6 Overjet Size in Relation to Traumatic Dental Injuries

In the only study where traumatic dental injuries were examined, Nguyen et al. [93] investigated the risk of traumatic injuries of the anterior teeth due to overjet. In order to identify relevant studies, they performed a literature search of MEDLINE (1966–1996) and Excerpta Medica (1985–1996) databases. Eleven articles were included and evaluated in their study. In order to qualitatively assess these articles, a methodological checklist for observational studies was developed. The relative risk of overjet, compared with a reference, was expressed as an odds ratio. For each primary study, the odds ratio and the corresponding 95% confidence intervals were computed, and subsequently these odds ratios were pooled across the studies. Finally, the influence of the quality of the studies on the pooled odds ratio was addressed. From the results of this analysis, the authors concluded that children with an overjet larger than 3 mm are approximately twice as likely to suffer traumatic injury to the anterior teeth as children with an overjet less than 3 mm and that the effect of overjet on the risk of dental injury is less for boys than for girls in the same overjet group. In addition, the risk of anterior teeth injuries tends to increase with increasing overjet size.
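Pooling odds ratios across studies is usually done on the logarithmic scale. The pooling method used in the cited study is not specified here, so the sketch below assumes a fixed-effect, inverse-variance approach; the function name and the 2x2 counts are hypothetical, and tables with zero cells are not handled.

```python
import math

def pooled_odds_ratio(tables, z=1.96):
    """Fixed-effect (inverse-variance) pooled odds ratio with confidence interval.

    Each table is (a, b, c, d): exposed with and without the outcome,
    then reference group with and without the outcome. All cells must be > 0.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weighted_sum += log_or / variance
        weight_total += 1 / variance
    pooled_log = weighted_sum / weight_total
    se = math.sqrt(1 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * se),
        math.exp(pooled_log + z * se),
    )

# Hypothetical counts: injured / uninjured children with large vs. small overjet.
overjet_tables = [(30, 70, 18, 82), (45, 155, 25, 175), (12, 38, 9, 61)]
pooled, pooled_lower, pooled_upper = pooled_odds_ratio(overjet_tables)
print(f"pooled OR = {pooled:.2f}, 95% CI {pooled_lower:.2f} to {pooled_upper:.2f}")
```

Larger studies have smaller variances and therefore contribute more weight, which is why the pooled interval is narrower than that of any single study.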

Regarding the meta-analytic procedures used in this study and despite the fact that a possibility of publication bias cannot be excluded, some additional problems should also be mentioned: (a) The studies selected in this MA showed data heterogeneity with the exception of the subgroup between the ages 6 and 18. (b) There was no reference to the control group size in any of the studies selected to be analyzed. (c) The authors did not mention if they included studies reported in languages other than English, which possibly could suggest an English language bias. Therefore, although this MA was performed adequately well, the above-mentioned conclusions should be appraised with some caution due to the limitations of the source studies and the heterogeneity of the primary data implemented in the analysis.

3.7 Obstructive Sleep Apnea Syndrome

The MA conducted by Miles et al. [98] investigated possible significant differences between the cephalometric variables describing the craniofacial skeletal or soft tissue morphology of individuals with and without obstructive sleep apnea syndrome (OSAS). To identify relevant studies, a MEDLINE search for the years 1966–1993 and hand-searching were conducted. Subsequently, a hierarchical analysis was performed to examine the quality of evidence within this body of literature. The meta-analytic procedures employed in the study were: (a) combined means and standard deviations for the OSAS and control groups in order to examine the distribution and consistency of outcomes across studies; (b) Z-scores for statistical significance testing between groups; and (c) the potential diagnostic accuracy of the variables, represented by the area under the receiver operating characteristic (ROC) curve. Following this evaluation, the authors concluded that: (a) The literature is characterized by several methodological deficiencies, and therefore it is equivocal regarding a causal association between craniofacial structure and OSAS. (b) Evidence for a direct causal relationship between craniofacial structure and OSAS is unsupported by the literature, both qualitatively and quantitatively. (c) The rationale for OSAS treatments based on morphologic criteria remains unsubstantiated. (d) The two most consistent, strong effect sizes with the highest potential diagnostic accuracies were variables related to mandibular structure (Sn/MPA, Go-Gn). (e) Although mandibular body length (Go-Gn) appears to be an associated factor, this does not support causality. (f) More standardization of research methods and data presentation is required.
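The three procedures listed above can be illustrated with a minimal sketch for a single cephalometric variable. It assumes that study-level summaries are combined as if the raw samples had been merged, that the Z-score is a two-sample statistic on the combined means, and that the area under the ROC curve is obtained under a binormal assumption. The source does not give the exact formulas used by Miles et al., so all three choices, the function names, and the numbers are hypothetical.

```python
import math

def combine_groups(groups):
    """Combine (n, mean, sd) summaries from several studies.

    Returns the (n, mean, sd) of the merged sample; the combined
    variance includes within-study and between-study components.
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    sum_sq = sum((n - 1) * sd ** 2 + n * (m - mean) ** 2
                 for n, m, sd in groups)
    return total_n, mean, math.sqrt(sum_sq / (total_n - 1))

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def z_and_auc(cases, controls):
    """Two-sample Z-score and binormal AUC from (n, mean, sd) tuples."""
    n1, m1, s1 = cases
    n2, m2, s2 = controls
    z = (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # Probability that a random case value exceeds a random control value.
    auc = normal_cdf((m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2))
    return z, auc

if __name__ == "__main__":
    # Hypothetical (n, mean, sd) summaries for one variable, in mm.
    osas = combine_groups([(30, 71.2, 5.1), (45, 69.8, 4.7)])
    controls = combine_groups([(28, 74.0, 4.9), (50, 73.1, 5.3)])
    print(z_and_auc(osas, controls))
```

An AUC near 0.5 indicates that the variable barely separates the two groups, while values near 0 or 1 indicate strong separation. When cases tend to have lower values than controls, the AUC computed this way falls below 0.5, and 1 minus that value gives the corresponding discriminative ability.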

Regarding the methodology followed by the authors, although publication bias was not present, the studies included in the MA were limited to those published in English, which implies a language bias. Furthermore, only one study met all the inclusion criteria set by the authors, while none of the treatment efficacy studies met them. Consequently, the methodology followed by the authors did not result in strong evidence, mainly because of the small number of original articles included in the MA.

4 Concluding Remarks

In summary, from the results of the MAs on orthodontic subjects discussed in the preceding pages, it can be concluded that: (a) There is not enough evidence to reach definite conclusions about the effectiveness of functional appliances used for the treatment of Class II malocclusion. (b) Maxillary protraction treatment is effective in patients who are growing, especially in those who are older than 10 years of age, and, when combined with an initial period of expansion, it may provide more significant skeletal effects. (c) Regarding mandibular expansion, the concept of maintaining the initial intercanine width during orthodontic treatment remains unchanged. (d) Regarding maxillary expansion, there is inadequate evidence to support the opinion that expansion achieved beyond what is expected from normal development of the maxilla can be retained, while occlusal grinding in the primary dentition, with or without the addition of an upper removable expansion plate, was shown to be effective in preventing a posterior crossbite. (e) There is not enough evidence concerning the relationship between the prevalence of temporomandibular dysfunction and orthodontic treatment. (f) Only the landmarks B, A, Ptm, S, and Go on the x-coordinate and Ptm, A, and S on the y-coordinate may be considered reliable for cephalometric analysis of lateral radiographs. (g) The risk of injury to the anterior teeth may increase with increasing overjet size. (h) The available evidence does not support a causal association between craniofacial structure and obstructive sleep apnea syndrome.

The discussion of the meta-analytic procedures employed in the various studies revealed some problems and limitations that weaken the evidence these studies produced, and their findings should therefore be appraised more critically. The main problems and limitations were the following: (a) bias in identifying and selecting the primary studies: language bias or publication bias was either present, or the authors did not report whether measures had been taken to avoid it; (b) the lack of an evaluation of the homogeneity of the primary data and, where such an evaluation was done, the existence of heterogeneity in many studies; (c) the lack of information on sample and control group sizes, which prevents other researchers from repeating the MA with the same data or from re-evaluating scientific progress by means of a cumulative MA; (d) limitations in the treatment groups, such as differences in patient age, the analysis of treatment groups with developmental differences, and the comparison of different types of interventions, all of which weaken the conclusions drawn from the meta-analytic statistical procedures; and (e) the small number of original articles eligible for inclusion in almost all the meta-analytic procedures, owing to the lack of high-quality original research articles in the orthodontic literature.

For all these reasons, more high-quality research papers on orthodontic subjects are clearly needed in order to produce strong evidence concerning the effectiveness of the various treatment approaches used in everyday clinical practice.

Key Issues of Clinical Interest

  • Evidence-based orthodontics is attracting rapidly increasing attention from researchers and clinicians.

  • Only a few study designs can provide sound evidence for clinical decisions in medicine and consequently in dentistry and orthodontics.

  • Among all available study designs, meta-analyses are considered to provide the strongest evidence.

  • Although the number of meta-analyses in orthodontics is still low, it has been increasing steadily in recent years.

  • Mainly because of methodological inconsistencies and the small number of original studies suitable for inclusion, the meta-analyses on orthodontic subjects conducted so far have provided only limited evidence.

  • There is not enough evidence to reach definite conclusions about the effectiveness of functional appliances used for the treatment of Class II malocclusion.

  • Maxillary protraction treatment is effective in patients who are growing and, when combined with an initial period of expansion, may provide more significant skeletal effects.

  • The concept of maintenance of the initial intercanine width during mandibular expansion remains unchanged.

  • There is inadequate evidence to support the opinion that maxillary expansion achieved beyond what is expected from normal development of the maxilla can be retained.

  • Occlusal grinding in the primary dentition, with or without the addition of an upper removable expansion plate, was shown to be effective in preventing a posterior crossbite.

  • There is not enough evidence concerning the relationship between the prevalence of temporomandibular dysfunction and orthodontic treatment.

  • The risk of injury to the anterior teeth may increase with increasing overjet size.

  • The available evidence does not support a causal association between craniofacial structure and obstructive sleep apnea syndrome.