Introduction

Antipsychotic drugs (APs), especially the second-generation ones (SGAs), are increasingly being used in children and adolescents, on- and off-label, to treat a wide range of acute and chronic psychiatric disorders [1]. The efficacy of APs comes at the cost of a progressive accumulation of metabolic disorders [2]. A key issue with the risk–benefit balance of APs is that, while efficacy is often connected to continued drug use and therapeutic compliance, metabolic adverse effects are cumulative and usually irreversible, progressively tilting the risk–benefit balance towards a disadvantageous one [3]. The metabolic adverse effects of APs are indeed a prominent factor that may limit patients’ compliance and thus the effectiveness of APs in the long term [4, 5]; in addition, youths display an increased vulnerability to metabolic adverse effects of APs, as compared to adults [6]. Initial symptoms are usually weight gain, dyslipidemia and insulin resistance, which may progress over time towards obesity, type 2 diabetes, and even overt metabolic syndrome [7, 8]. Many studies have focused on the metabolic adverse effects of APs by studying weight gain (or body mass index, BMI, increase), which is the metabolic outcome most often reported in randomized controlled trials (RCTs) on mixed-age populations. These studies have been reviewed systematically and meta-analyzed [9,10,11,12,13], with results that support the general notion of excessive weight gain during AP exposure in adults, more with SGAs than with older APs. In particular, olanzapine and clozapine seem to promote the largest weight gain, followed by quetiapine, risperidone and then aripiprazole and ziprasidone [14, 15]. These meta-analytical conclusions, however, may not be applicable to youths. Indeed, a more recent network meta-analysis focused on RCTs conducted only in the pediatric age has changed the order of risk of APs to olanzapine, quetiapine, clozapine, risperidone, aripiprazole, ziprasidone [16]. 
While the age-related discrepancy is now known, it is not yet clear whether moving from weight, which is a raw outcome measure, to more standardized outcomes like BMI and BMI-Z could provide comparable results. Another relevant clinical outcome lacking from most RCTs, and therefore from meta-analyses, is waist circumference, which is directly related to the risk of hypertension, diabetes, dyslipidemia and overt metabolic syndrome [17]. Another issue with the generalizability of the results from these meta-analyses is that data drawn primarily from RCTs are good at assessing efficacy, yet less informative when assessing safety. Clinical trials are often too short to capture long-term or developing adverse events. They typically exclude patients with complex poly-therapy or with concomitant medical conditions that may affect weight gain. Patients included in RCTs are intensively monitored and they usually receive psycho-educational counseling, which may not be a standard of care and may significantly improve their lifestyle [18]. Moreover, the metabolic adverse effects of APs are recognized as important factors that determine clinical decisions over therapy continuation vs. tapering or switch [4, 19], an aspect that cannot be verified in clinical trials having fixed treatment options. Given the above limitations, the adverse metabolic effects of APs are best observed within observational studies, as they can focus more easily than RCTs on youth, can capture all routinely monitored parameters and can report on patients’ behaviors regarding eating and exercise habits, as well as clinicians’ attitudes towards adverse drug reactions and therapeutic management. Another strength of observational studies, crucial for studying metabolic aspects, is their duration: clinical trials often last 1 to 3 months, whereas observational studies are usually longer, lasting up to 36 months. 
The duration of drug exposure in clinical trials allows the detection of weight changes at the study endpoint, when the clinical benefit can be measured through rating scales; however, it is too short to allow the assessment of weight when therapy is stably titrated according to the clinical needs of each patient, and a real clinical improvement can be observed. We thus conducted a systematic review and a meta-analysis of naturalistic long-term pediatric studies, trying to overcome the limitations present in the current literature and being widely inclusive in terms of outcomes, using weight, BMI, BMI-Z score and waist circumference. Providing results that are more accurate and applicable to pediatric populations is important, since there are two opposite opinions on the relationship between weight/BMI gain and therapy duration. One claims that different APs have different weight gain potential [14, 16], the other claims that all APs are ultimately equal in magnitude, yet they cause weight gain with different latency [20]. Solving this conundrum by analyzing data from appropriately conducted long-term observational studies can improve the guidance on AP switch and therapeutic management in child and adolescent psychopharmacology.

Methods

Literature search

The review protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO ID: CRD42019121617) before the analyses began. The search used for this review included terms related to antipsychotics, children and adolescents, metabolism and observational studies (the search string is available in the Supplemental Methods). The search was conducted in PubMed on February 1st, 2019. No publication date or language restriction was applied. Other potentially relevant studies were identified from the references of reviews and included as other sources. Duplicate studies were then removed (see the Preferred Reporting Items for Systematic reviews and Meta-Analyses, PRISMA, diagram in Fig. 1). Retrieved records were independently reviewed and screened based on title and abstract by the second and third authors, under supervision by the first author. Criteria for article inclusion were: the presence of an abstract and a full text; the actual presence in the abstract of terms related to antipsychotics and metabolism; a clearly observational study nature; a follow-up duration of at least 3 months; a study population composed of children, adolescents or young adults (below 21 years) with an average sample age below 18 years. Criteria for article exclusion were: being letters, commentaries, reviews or case reports; dealing with addiction, substance abuse disorders or eating disorders; including patients with clearly stated genetically determined syndromes known to alter the sense of satiety or hunger (e.g., Prader-Willi or other forms of syndromic neurodevelopmental disorders). The eligibility of screened publications was then assessed by examining full texts. The second and third authors performed the task under supervision by the first author. 
Studies were considered eligible for this review if they were performed in groups of patients all of less than 20 years of age; they reported quantitative effects of antipsychotics on at least weight or BMI; they were formally observational clinical studies.

Fig. 1

Flow-chart of the study selection process. The number of studies excluded or maintained at each step is reported with the related reasons

Data systematization

The second and third authors split the selected studies among themselves and extracted the following data using pre-specified forms: bibliographic reference; year of publication; study type (prospective, retrospective); study duration; patients’ diagnosis; concomitant medical conditions (especially comorbidities known to affect metabolism); number of treatment arms; type of control group (same patients; matched controls; unmatched controls); antipsychotic(s) used and dose (mean, sd, range); concomitant drugs used (especially psychologically or metabolically active drugs with an effect on appetite/weight); age of subjects (mean, sd, range); number of subjects; number of males and percentage; metabolic outcomes, including: weight, weight-Z, BMI, BMI-Z, waist circumference, waist-hip ratio; study time-points. The extracted information was reciprocally verified for concordance with full-texts by the second and third authors. Discrepancies were resolved by consultation with the first author. Record management was performed using Microsoft Excel. The risk of bias of selected studies was assessed using the Newcastle–Ottawa quality assessment scale for cohort studies (NOS) [21], with study-specific criteria (Supplemental Table 1). The first author then grouped studies based on the availability of baseline and follow-up observations for each of the outcomes proposed for meta-analysis, at each study time-point. Study time-points were determined following the distribution of study duration within observational studies, with most of them lasting around 6 months and 12 months, with a tolerance of ± 2 months, and with a minority lasting more than 12 months.

Meta-analysis and meta-regression

Eleven separate meta-analyses were conducted, according to data available for each combination of outcome and time-point: weight (6, 12, > 12 months), BMI (6, 12, > 12 months), BMI-Z (6, 12, > 12 months), waist size (6, 12 months). Weight-Z and waist-hip ratio were discarded due to insufficient data availability. We calculated changes from baseline to follow-up, in antipsychotic treatment arms and in control arms, as mean differences on each outcome, reporting 95% confidence intervals (CIs), using the inverse variance method in random effects models. We applied subgroups to each analysis to compare drugs among themselves. In addition, within each outcome and drug combination, we performed post-hoc Z-tests comparing time-points among themselves. Heterogeneity between studies was measured using the I2 statistic: p < 0.10 was considered indicative of statistically significant heterogeneity and an I2 value of 40% or more was considered indicative of sizeable heterogeneity. Sensitivity analyses were performed to determine the variability of results when the number of studies exceeded five. Begg’s funnel plots were drawn; Egger’s tests were used to assess statistically the asymmetry of the funnel plots; significant asymmetry was claimed when the intercept reached significance (p < 0.05) within the regression model: SND = intercept + slope × precision, where SND is the standard normal deviate (the effect estimate divided by its standard error) and precision is the inverse of the standard error. Review Manager 5.3 (Copenhagen, the Nordic Cochrane Centre, the Cochrane Collaboration) was used to conduct this meta-analysis. We attempted to conduct meta-regression analyses on the same outcomes and follow-up time points, using the inverse variance method in random effects models, to assess the effect of different treatments while controlling for confounding effects in a step-wise approach. Confounders used were: follow-up duration, patients’ age and sex, percentage of treatment-naïve patients, percentage of study completers at each time-point, study sample size, outcome values at the baseline. 
SPSS v22 with the additional MetaReg package (https://mason.gmu.edu/~dwilsonb/downloads/MetaReg.sps) was used to conduct meta-regressions.
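
The pooling step described above can be illustrated with a minimal Python sketch. It assumes the DerSimonian-Laird estimator for the between-study variance, a common choice for inverse-variance random-effects models; this is an assumption, not a statement about the internals of Review Manager. The function name `pool_random_effects` and the input values are hypothetical and are not taken from the included studies.

```python
import math

def pool_random_effects(effects, std_errors):
    """Pool per-arm mean changes with inverse-variance weights.

    Assumes a DerSimonian-Laird random-effects model (illustrative choice).
    Returns the pooled estimate, its standard error, a 95% CI,
    the between-study variance tau2 and the I2 statistic (percent).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and its degrees of freedom
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # DerSimonian-Laird between-study variance, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I2: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "se": se_pooled, "ci95": ci,
            "tau2": tau2, "i2": i2}

# Hypothetical weight changes (kg) and standard errors for three arms
result = pool_random_effects([4.0, 5.0, 4.5], [1.0, 1.0, 1.0])
print(result)
```

When the observed spread between arms is no larger than expected from sampling error alone, tau2 is truncated to zero and the random-effects result coincides with the fixed-effect one.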

Results

Description of studies

Following our search criteria, we identified 934 entries in PubMed. We expanded them by searching references from review articles and selected 215 abstracts, from which 83 full texts were selected as relevant to our topic. Of those, 38 observational study reports contained meta-analyzable information. A full description of included studies is available in Table 1. The study quality was rated by the NOS scale (Supplemental Table 1): individual study scores are available in Supplemental Table 2. Briefly, 10 studies were scored eight points, 20 studies seven points and 8 studies six points: the average score was 7. Seventeen studies were prospective and 21 retrospective, reporting data from time points spanning over 4–192 weeks. Twenty studies observed only 1 treatment arm, six observed 2 arms, nine observed 3 arms, three observed 5 or more arms; overall, there were 5 control arms. The drug treatments most frequently observed were risperidone, olanzapine and quetiapine; a minority of studies observed “mixed” treatment arms without distinguishing single drugs, containing mainly risperidone, quetiapine and antipsychotic polytherapy (details shown in Supplemental Table 3). Patients were predominantly males (67.8 ± 15.4%) of 13.1 ± 2.5 years of age. Fourteen studies focused on antipsychotic-naïve patients (see details at the end of results), while in the other studies the prevalence of naïve patients was 34.6%. Most studies included patients who were also using psychostimulant and/or antidepressant medications.

Table 1 Description of included studies

Following study criteria, we excluded all time-points shorter than six months, thus removing from the dataset four studies [22,23,24,25]. In view of the time-points found, we chose to organize the meta-analysis based on the duration of observations in three groups: 6 months (average duration 6.1 ± 0.6 months), 12 months (11.8 ± 0.7) and > 12 months (26.6 ± 10.5); each study could report on multiple time-points and thus be included in more than one meta-analysis. Regarding outcomes, weight was observed for 6 months in 16 studies (30 treatment arms, 879 patients) [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41], at 12 months in 10 studies (23 arms, 3473 patients) [28, 34, 35, 38,39,40,41,42,43,44] and at longer times in 4 studies (5 arms, 121 patients) [39, 41, 43, 45]. BMI was observed for 6 months in 17 studies (31 treatment arms, 903 patients) [26, 27, 30,31,32,33,34,35,36, 38,39,40,41, 46,47,48,49], at 12 months in 13 studies (28 arms, 3667 patients) [34, 35, 38,39,40,41,42,43,44, 46, 47, 50, 51] and at > 12 months in 6 studies (10 arms, 363 patients) [39, 41, 43, 46, 47, 52]. BMI-Z was observed for 6 months in 12 studies (24 treatment arms, 413 patients) [32, 33, 35,36,37,38,39,40,41, 46, 49, 53], at 12 months in 12 studies (28 arms, 1591 patients) [35, 38,39,40,41,42, 44, 46, 51, 53,54,55] and at > 12 months in 7 studies (14 arms, 647 patients) [39, 41, 46, 52, 56,57,58]. Waist size was observed in 3 studies (4 treatment arms, 88 patients) at 6 months [34, 49, 53] and in 2 studies, (3 arms, 53 patients) at 12 months [34, 53].

Meta-analyses

Complete meta-analysis and meta-regression results are provided in Table 2 for weight, Table 3 for BMI, Table 4 for BMI-Z and Supplemental Table 4 for waist size. Significant results are reported here below.

Table 2 Results of the meta-analysis and meta-regression of the outcome: weight
Table 3 Results of the meta-analysis and meta-regression of the outcome: BMI
Table 4 Results of the meta-analysis and meta-regression of the outcome: BMI-Z

For weight, at 6 months, there were significant increases with risperidone (+ 4.47 kg, 95% CI [2.71, 6.23], P < 0.00001), quetiapine (+ 5.84 kg, 95% CI [2.54, 9.13], P = 0.0005) and olanzapine (+ 10.91 kg, 95% CI [6.68, 15.13], P < 0.00001); weight gain with olanzapine was significantly larger than in the untreated (P = 0.02) and risperidone (P = 0.006) groups. We found no evidence of heterogeneity (risperidone P = 0.92, I2 = 0%, quetiapine P = 0.85, I2 = 0%, olanzapine P = 0.98, I2 = 0%, untreated P = 0.85, I2 = 0%). In meta-regression, higher patients’ age was associated with larger weight gain (P = 0.0002, B = 0.43). At 12 months, weight gain was significant without treatment (+ 2.3 kg, 95% CI [1.30, 3.31], P < 0.00001), in mixed treatment arms (+ 9.42 kg, 95% CI [6.88, 11.95], P < 0.00001), with risperidone (+ 6.01 kg, 95% CI [3.15, 8.87], P < 0.0001) and olanzapine (+ 10.7 kg, 95% CI [3.98, 17.42], P = 0.002); weight gain was significantly larger with mixed treatment (P < 0.00001), risperidone (P = 0.02) and olanzapine (P = 0.02) as compared to untreated arms. There was no significant heterogeneity (untreated P = 0.83, I2 = 0%, mixed P = 0.77, I2 = 0%, risperidone P = 0.77, I2 = 0%, olanzapine P = 0.22, I2 = 32%). In meta-regression, longer follow-up length was associated with smaller weight gain (P = 0.03, B = − 0.40), while higher dropout rates were associated with larger weight gain (P = 0.04, B = 0.37). At time points longer than 12 months, risperidone (+ 9.51 kg, 95% CI [2.01, 17.01], P = 0.01) and mixed treatment (+ 12.59 kg, 95% CI [9.87, 15.30], P < 0.00001) were associated with significant weight gain, without heterogeneity (respectively P = 0.81, I2 = 0%, P = 0.49, I2 = 0%). Meta-regression was not feasible, due to the low number of studies.
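
As a rough consistency check on the between-drug comparisons reported above, the following Python sketch compares two pooled estimates with a Z-test. It assumes a normal approximation and recovers each standard error from the reported 95% CI; this may differ from the exact subgroup test used for the published values. The helper names `se_from_ci` and `z_test` are hypothetical. The inputs are the 6-month weight estimates for olanzapine and risperidone given in the text.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Recover a standard error from a symmetric 95% CI (normal approximation)."""
    return (upper - lower) / (2.0 * z_crit)

def z_test(est_a, se_a, est_b, se_b):
    """Two-sided Z-test for the difference between two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# 6-month weight gain (kg) as reported in the text
se_olanzapine = se_from_ci(6.68, 15.13)
se_risperidone = se_from_ci(2.71, 6.23)
z, p = z_test(10.91, se_olanzapine, 4.47, se_risperidone)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions the resulting p-value lands close to the P = 0.006 reported for the olanzapine vs. risperidone comparison.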

For BMI, at 6 months, there were significant increases with quetiapine (+ 1.5, 95% CI [0.37, 2.62], P = 0.009), risperidone (+ 2, 95% CI [1.40, 2.60], P < 0.00001), mixed treatment (+ 3.37, 95% CI [1.75, 4.98], P < 0.0001) and olanzapine (+ 3.47, 95% CI [2.21, 4.72], P < 0.00001), with a predictably high heterogeneity among mixed treatment arms (P < 0.00001, I2 = 88%) but not others (risperidone P = 0.93, I2 = 0%, quetiapine P = 0.53, I2 = 0%, olanzapine P = 0.97, I2 = 0%); BMI gain was higher in the mixed treatment (P = 0.03) and olanzapine (P = 0.02) arms, as compared to untreated, and with olanzapine it was higher than with quetiapine (P = 0.02) and risperidone (P = 0.04) as well. Meta-regression showed no significant moderators. At 12 months, all treatments except for ziprasidone (+ 6.06, 95% CI [− 5.61, 17.73], P = 0.31) were associated with significant BMI gain, without heterogeneity (untreated P = 0.40, I2 = 0%, mixed P = 0.14, I2 = 45%, aripiprazole P = 0.70, I2 = 0%, olanzapine P = 0.42, I2 = 0%, quetiapine P = 0.33, I2 = 13%, risperidone P = 0.74, I2 = 0%); untreated arms (+ 0.75, 95% CI [0.45, 1.05], P < 0.00001) showed significantly lower BMI gain as compared to risperidone (+ 2.13, 95% CI [1.43, 2.84], P < 0.00001. Comparison P = 0.0004), mixed treatments (+ 2.95, 95% CI [2.13, 3.77], P < 0.00001. Comparison P < 0.00001) and olanzapine (+ 3.42, 95% CI [2.10, 4.75], P < 0.00001. Comparison P < 0.0001), but not with respect to aripiprazole (+ 1.7, 95% CI [0.28, 3.12], P = 0.02), quetiapine (+ 1.82, 95% CI [0.53, 3.11], P = 0.006) or clozapine (+ 3, 95% CI [0.29, 5.71], P = 0.03). 
In meta-regression, the prevalence of antipsychotic-naïve patients (P ≤ 0.0001, B = 1.15) and patients’ age (P = 0.017, B = 0.22) were associated with higher BMI gain, while longer follow-up length (P < 0.0001, B = − 1.19), lower dropout rates (P = 0.002, B = 0.36), larger sample size (P = 0.002, B = − 0.31) and higher baseline BMI (P = 0.001, B = − 0.40) were associated with lower BMI gain. At > 12 months, BMI gain was significant with aripiprazole (+ 2.1, 95% CI [0.26, 3.94], P = 0.03), risperidone (+ 2.16, 95% CI [1.00, 3.32], P = 0.0003. Heterogeneity P = 0.91, I2 = 0%), and with mixed treatments (+ 3.32, 95% CI [1.78, 4.86], P < 0.0001) that showed high heterogeneity (P = 0.02, I2 = 76%). No differences were found among treatments and meta-regression was not feasible.

For BMI-Z, at 6 months, there were significant increases in the treatment arms mixed treatment (+ 0.51, 95% CI [0.40, 0.63], P < 0.00001), quetiapine (+ 0.57, 95% CI [0.40, 0.74], P < 0.00001), risperidone (+ 0.62, 95% CI [0.45, 0.79], P < 0.00001) and olanzapine (+ 0.94, 95% CI [0.60, 1.29], P < 0.00001), which was the only treatment that showed a significantly larger increase than the mixed treatment arm (P = 0.02). In the aripiprazole arm, we did not detect a significant change. There was no evidence of heterogeneity (risperidone P = 0.22, I2 = 24%, quetiapine P = 0.37, I2 = 7%, olanzapine P = 0.86, I2 = 0%, mixed P = 0.48, I2 = 0%) and meta-regression was not feasible.

At 12 months, all treatment arms were associated with significant increases, except for ziprasidone. BMI-Z gain was significant with aripiprazole (+ 0.31, 95% CI [0.14, 0.48], P = 0.0004. Heterogeneity P = 0.92, I2 = 0%), without treatment (+ 0.37, 95% CI [0.18, 0.56], P = 0.0001), with quetiapine (+ 0.54, 95% CI [0.20, 0.88], P = 0.002. Heterogeneity P = 0.01, I2 = 69%), risperidone (+ 0.61, 95% CI [0.40, 0.82], P < 0.00001. Heterogeneity P = 0.001, I2 = 68%), clozapine (+ 0.8, 95% CI [0.08, 1.52], P = 0.03), mixed treatments (+ 0.94, 95% CI [0.20, 1.69], P = 0.01. Heterogeneity P < 0.00001, I2 = 98%), olanzapine (+ 0.98, 95% CI [0.46, 1.51], P = 0.0003. Heterogeneity P = 0.10, I2 = 49%). BMI-Z gain with olanzapine was higher than with ziprasidone (P = 0.006), aripiprazole (P = 0.02), without treatment (P = 0.03), and with risperidone (P = 0.03); BMI-Z gain with risperidone was higher than with ziprasidone (P = 0.02) and aripiprazole (P = 0.03); BMI-Z with mixed treatments was higher than with ziprasidone (P = 0.04). Meta-regression evidenced that higher baseline BMI-Z was connected to lower BMI-Z gain (P = 0.007, B = − 0.72). At > 12 months, mixed treatments (+ 0.44, 95% CI [0.29, 0.59], P < 0.00001), risperidone (+ 0.48, 95% CI [0.31, 0.66], P < 0.00001) and olanzapine (+ 0.89, 95% CI [0.21, 1.57], P = 0.01) were associated with significant BMI-Z increases; there was no sign of heterogeneity (aripiprazole P = 0.39, I2 = 0%, quetiapine P = 0.31, I2 = 2%, risperidone P = 0.73, I2 = 0%) and meta-regression was not feasible.

For waist size, there were significant increases in all arms at 6 and 12 months, including mixed treatment (+ 9.1 cm, 95% CI [3.41, 14.79], P = 0.002 and + 10.2 cm, 95% CI [5.62, 14.78], P < 0.0001), quetiapine (+ 6.9 cm, 95% CI [3.95, 9.85], P < 0.00001; + 9.1 cm, 95% CI [5.73, 12.47], P < 0.00001) and risperidone (+ 8.8 cm, 95% CI [1.62, 15.98], P = 0.02; + 11.5 cm, 95% CI [9.66, 13.34], P < 0.00001). There was no evidence of heterogeneity in the only arm reporting more than one study (risperidone at 6 months, P = 0.18, I2 = 44%), no difference across subgroups and meta-regression was not feasible.

Begg’s funnel plots were drawn for all meta-analyses and showed no suggestion of asymmetry; Egger’s tests likewise found no indication of significant asymmetry (Supplemental Results).
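
The Egger regression used for these asymmetry tests can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit of the standard normal deviate (effect divided by its standard error) on precision (inverse of the standard error); the function name `egger_regression` and the input values are hypothetical. The sketch returns the intercept with its standard error and degrees of freedom; the p-value would be obtained by comparing intercept / standard error against a t distribution, which is left to a statistics package.

```python
import math

def egger_regression(effects, std_errors):
    """Fit SND = intercept + slope * precision by ordinary least squares.

    Returns (intercept, slope, se_intercept, degrees_of_freedom).
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    snd = [y / se for y, se in zip(effects, std_errors)]
    precision = [1.0 / se for se in std_errors]
    mean_x = sum(precision) / n
    mean_y = sum(snd) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("precisions must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, snd))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance and standard error of the intercept
    df = n - 2
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(precision, snd))
    se_intercept = math.sqrt((rss / df) * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, se_intercept, df

# Hypothetical example: smaller (less precise) studies report larger effects
intercept, slope, se_int, df = egger_regression(
    [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.0, 4.0])
print(intercept, slope, se_int, df)
```

In a symmetric funnel, where the effect does not depend on study precision, the fitted intercept is close to zero and the slope approximates the common effect.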

We have described in Supplemental Table 5 the time-trends for all outcomes; however, due to the inter-correlation between values, we did not formally test the significance of changes across time-points.

Sensitivity analyses were carried out, considering only studies on drug-naïve patients (see Supplemental Table 7 for the dataset). Results obtained from the sensitivity analysis dataset (see Supplemental Results_Sensitivity analyses and Supplemental Results_Comment to sensitivity analyses) did not differ in an appreciable way from those obtained from the whole dataset.

Discussion

In conducting this systematic review, we found that evidence coming from well-conducted long-term observational studies, which may be able to answer the crucial questions linked to metabolic disturbances induced by APs, is currently limited. In fact, even in observational contexts, we found that most studies cover follow-up durations of only 6–12 months, a period that does not allow a deep understanding of the progression of metabolic disturbances, which may typically occur over years [2, 3, 8] and result in stabilization or waning. However, many studies suggest that statistically significant changes occur very rapidly, already after 2 or 4 weeks, as demonstrated by data from the TEA trial, which compared aripiprazole to quetiapine in youths [59]. In addition, treatment protocols and measured outcomes were highly heterogeneous, resulting in few studies suitable for a meta-analysis. Despite this, the collected studies provided valuable information and the results of our meta-analysis are still relevant for clinical practice and intriguing in terms of AP choice.

Our results on weight may be considered as the extension of those previously available from meta-analyses of short-term RCTs. Previously reported weight gain over placebo after 3 months of treatment was + 3.45/3.47 kg for olanzapine, + 1.72/1.77 kg for risperidone, + 1.41/1.43 kg for quetiapine, + 0.85/0.94 kg for aripiprazole and − 0.04 kg for ziprasidone [9,10,11]. In our observational, longer-term dataset olanzapine led to an average weight gain of 11 kg at 6 months, which did not seem to change at 12 months, while gain with quetiapine ramped from + 6 to + 10 kg and with risperidone it showed a slower increase, from + 4.5 to + 6 and to + 9.5 kg at > 12 months. Although risperidone was the only AP with data at > 12 months, it is reasonable to foresee that all three APs might have increased weight up to around 10–11 kg, with differing time spans. The risk order of APs for weight gain was different from the risk order for BMI or BMI-Z gain, which were instead consistent among themselves. A possible way to solve this discrepancy might be to use as outcome the percent weight gain over baseline. In fact, an absolute change in kilograms is not very informative when 13-year-olds are compared with 17-year-olds. In addition, many clinical trials report incidences of weight gain > 7%, an outcome that we have not found reported in the observational studies reviewed; future studies should include it among safety parameters to be monitored. A similar trend of risk across all outcomes was restored by using the relative weight gain measure, with an average gain of 17.5% with olanzapine, 10.9% with risperidone and 10.6% with quetiapine. These values increased progressively, becoming 19.7% for olanzapine, 18.5% for quetiapine and 14.3% for risperidone in the 12 months dataset, while in the > 12 months one, risperidone resulted in a 22.8% increase. 
This suggests that there may be in general a “ceiling effect” to relative weight gain that is time- and drug-dependent, ranging up to slightly above 20%, where olanzapine is quicker and risperidone slower in reaching the ceiling. From a methodological point of view, our results may thus suggest that, in studies conducted with neither a control group nor standardized values, percent weight gain over baseline may be a more accurate outcome than weight gain. When considering BMI, previously available pediatric results are limited: a meta-analysis on all APs used in naïve patients reported a + 1.97 mean BMI increase after one month [12]. Another meta-analysis on studies including patients with first-episode psychosis reported a mean BMI gain of 1.4 at three months, and 1.86 at longer terms [13]. In our dataset observing single APs, we found a risk order maintained through the different time points: olanzapine, risperidone and quetiapine. Olanzapine was associated, even at 6 months, with a significantly higher BMI gain (+ 3.5) than risperidone (+ 2) and quetiapine (+ 1.5), which instead did not differ from that in the untreated arm. In the 12 months dataset, the gain of the olanzapine arm did not change (+ 3.4), while that of risperidone (+ 2.1) became significant vs. untreated. The gains in the quetiapine and aripiprazole arms (+ 1.8 and + 1.7 respectively) were not different from those in the untreated arm. In the > 12 months dataset, risperidone and aripiprazole arms were indistinguishable (+ 2.2 vs. + 2.1). Thus, for BMI we observed a partial drug- and time-dependent trend. BMI gains, seen with olanzapine and risperidone already in the shortest time dataset, showed minimal changes over time, while aripiprazole arms showed increasing BMI gains through the time-points, so that in the > 12 months dataset, the gain of aripiprazole was similar to that of risperidone. 
However, this apparent trend with aripiprazole was not supported by post-hoc time-wise Z tests, probably due to the limited sample size. Data regarding aripiprazole may thus confirm previous studies in support of a slower, but neither absent nor lower in magnitude, weight gain propensity [55] as compared to other APs. Results concerning quetiapine may suggest a lower risk as compared to olanzapine and risperidone, yet we lacked data for a > 12 months dataset. Analysis of BMI-Z showed a trend consistent with that of BMI, while literature suitable for comparisons is lacking. In our datasets at 6 and 12 months, olanzapine arms (+ 0.9, + 1) showed higher gains than risperidone (+ 0.62, + 0.6) and quetiapine (+ 0.57, + 0.5) ones, with the risperidone arm showing significantly higher gain than the quetiapine arm in the 12 months dataset. Of note, gain in aripiprazole and untreated arms was similar, while the ziprasidone arm showed no significant gain. In the > 12 months dataset, while the gain with olanzapine stood still (+ 0.9), that of risperidone arms showed a slight decrease (+ 0.5). BMI-Z analysis may have thus confirmed a moderately inferior risk for quetiapine as compared to risperidone, which still must be confirmed in a dataset including observations lasting longer than 12 months.
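
The percent weight gain over baseline proposed above as an outcome for weight can be computed as in the following Python sketch. The function names and the example weights are hypothetical and serve only to illustrate why the same absolute gain reads differently at different baseline weights; the 7% default mirrors the threshold mentioned above for clinical trials.

```python
def percent_weight_gain(baseline_kg, followup_kg):
    """Relative weight change over baseline, in percent."""
    if baseline_kg <= 0:
        raise ValueError("baseline weight must be positive")
    return 100.0 * (followup_kg - baseline_kg) / baseline_kg

def exceeds_gain_threshold(baseline_kg, followup_kg, threshold_pct=7.0):
    """Flag a gain above the given percentage of baseline weight."""
    return percent_weight_gain(baseline_kg, followup_kg) > threshold_pct

# The same +10 kg gain in a lighter and in a heavier patient (hypothetical)
younger = percent_weight_gain(40.0, 50.0)
older = percent_weight_gain(70.0, 80.0)
print(f"lighter patient: {younger:.1f}%, heavier patient: {older:.1f}%")
```

An identical absolute gain therefore corresponds to a markedly larger relative gain in the lighter patient, which is the situation that arises when younger and older adolescents are pooled.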

Indications from meta-regression results suggest a role for the persistence in therapy of patients, as evidenced by the moderators dropout rate and follow-up length, although it is not clear whether the effect may be practically related to the retention into datasets of overweight or weight-gain-resistant patients over time. Patient selection is a prominent factor in observational studies, due to the extreme flexibility of drug doses and to the high incidence of therapeutic shifts or withdrawals. Most clinical trials provide results of analyses conducted on patients who completed predetermined follow-up assessments, thus automatically excluding patients who may drop out due to metabolic complications. It would be interesting, in this regard, to include in trials systematic measures of body parameters at dropout, or to report systematically the reasons for dropout. Notably, most clinical trials deal with dropout, for instance with statistical methods such as Last Observation Carried Forward (LOCF) or Mixed Model regression (MMR) analyses. However, we argue that MMR analyses with missing data imputation (for instance by LOCF) may be particularly inadequate when dealing with outcomes that are most likely in a constant worsening, such as the metabolic parameters during long-term treatment with antipsychotics. Other moderators which should be investigated more adequately are patients’ age and drug-naïve status, which may be reasonably entwined, considering that most patients access drug therapy when entering the school-age and, in adolescence, they can often be already past users of APs. Importantly, in the 12 months datasets of BMI and BMI-Z, the largest datasets, we could observe that as the baseline values were higher, the gains were smaller. 
This moderator, together with meta-analysis data showing drug-specific gain values consistent across several time-point datasets, suggests that a ceiling effect is very likely with respect to body measure increases, and the ceiling effect becomes more evident when using more standardized outcomes, i.e. BMI-Z. Indeed, baseline values were not significant moderators of weight gain, they were among moderators of BMI gain and they were the only significant moderator of BMI-Z gain. This is another observation in support of choosing BMI-Z as the only proper safety outcome. In addition to standardizing weight measures into BMI-Z, future studies should also monitor and standardize waist measures, for instance using waist-hip or waist-height ratio reference values [60] to express percentiles or Z values.
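
To make the standardization step concrete, the following Python sketch converts a raw measure into a Z value with the LMS method, a widely used approach for growth references. Its use here is an assumption for illustration: the included studies may have relied on different reference systems. The parameters L (skewness), M (median) and S (coefficient of variation) are tabulated by sex and age in growth references; the values used below are hypothetical and do not come from any published table.

```python
import math

def lms_z_score(value, l_param, m_param, s_param):
    """Z value of a measurement given LMS reference parameters.

    value   : raw measure (e.g. BMI in kg/m2 or waist-height ratio)
    l_param : Box-Cox power (skewness) for the sex/age stratum
    m_param : median of the reference population
    s_param : generalized coefficient of variation
    """
    if value <= 0 or m_param <= 0 or s_param <= 0:
        raise ValueError("value, M and S must be positive")
    if abs(l_param) < 1e-12:
        # Limiting case of the Box-Cox transform when L is zero
        return math.log(value / m_param) / s_param
    return ((value / m_param) ** l_param - 1.0) / (l_param * s_param)

# Hypothetical reference parameters for one sex/age stratum
L_REF, M_REF, S_REF = -2.0, 19.0, 0.12
baseline_z = lms_z_score(21.0, L_REF, M_REF, S_REF)
followup_z = lms_z_score(23.0, L_REF, M_REF, S_REF)
print(f"BMI-Z change: {followup_z - baseline_z:.2f}")
```

In a longitudinal setting the follow-up value would be standardized against the parameters for the patient's age at follow-up, so that growth expected for age does not count as drug-related gain; the example keeps a single stratum for brevity.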

Limitations

Several limitations affect the interpretation of our results, of which some are inherent to selected studies and unavoidable, while others are methodological. Regarding unavoidable limitations, there are two main weaknesses of observational studies. The first is that clinicians non-randomly assign patients to medications, implying that patients with a higher risk of metabolic issues may be prescribed lower doses and/or safer drugs, being also more subject to dose reduction and drug withdrawal. This may result in smaller effect sizes and in drug-specific biases, which cannot be controlled for. The second is that observational studies often lack, by definition, a proper control arm. It is usually unethical to have a placebo or untreated arm including patients in need of medication, yet without that, conclusions on the deviation of treatment arms from the norm can only be partial. We observed that a minority of studies reported untreated arms, yet there is generally a problem when comparing, over long periods of time, untreated patients with patients in antipsychotic treatment. For some time, schizophrenic patients may manage without medications, but over a longer time, one must assume these patients have substantially milder symptoms, including fewer negative symptoms. Presumably, these patients have stronger social networks, attend school, have more physical activities and are less prone to substance abuse and unhealthy living. As such, they are less prone than patients in treatment to be victims of disease-related risk factors for weight gain, which could be a substantial bias.

To minimize the limitation of lacking control groups, the best outcomes for observational studies that do not provide a control arm are Z values. Whether applied to weight, BMI, or other measures, Z values are advantageous as they incorporate sex, age and ethnicity factors, which are essential to provide a reference for body parameters. In spite of this, we observed that the majority of included studies reported unstandardized values, thus being incompletely informative about changes in body parameters. The presence of various outcomes (weight, BMI, BMI-Z) measuring the same clinical entity (body mass and its distribution) is thus another unavoidable limitation, since data are not sufficient to transform these outcomes to a common one. Another unavoidable limitation regards the drug-naïve status of patients. Clinical trials require patients to be drug-naïve, or at least to undergo a washout period before the initiation of experimental treatment. That is not the case in clinical practice. Thus, in most observational studies, patients may be enrolled when they are already in a compromised metabolic status and, therefore, baseline values of the study are not “real” baseline values. This can directly lead to finding a modest or null adverse effect of treatments, as supported by our meta-regression results, which assign an important role to baseline values. Nevertheless, we conducted sensitivity analyses on studies including only drug-naïve patients, which produced results very similar to those of the main analyses. One additional limitation of included studies is that in several cases data used for different publications may have been taken from the same patient samples, as shown in Supplemental Table 6, thus leading to potential duplication of data.

Methodological limitations comprise, in the first place, the choice of analyses conducted in the presence of insufficient sample sizes for most APs. We chose to conduct separate analyses for each molecule, rather than grouping them into risk classes, to try to provide drug vs. drug comparisons, at the expense of analytical power. Another choice that might limit our findings was to compare drugs among themselves, although they were drawn from studies with different designs, rather than performing a network meta-analysis: with such small datasets from heterogeneous studies, we preferred to use normal meta-analyses, suspecting that a network approach would not have been feasible. Consequently, much more data is required in order to provide rigorous results. Our study comprised several separate meta-analyses, one for each time-point. There are several methods to collapse time-points into single outcome variables, yet we found it would not be optimal to do so, since source studies had many diverse time-points and there were many missing data. Mixed models may, especially in such cases, provide very limited advantages over univariate analyses [61]. We thus preferred to conduct separate comparisons for each time-point. Finally, our dataset was not big enough to perform all meta-regression analyses. We observed that some moderators appeared as significant in some subsets, yet the heterogeneity in outcomes and time-points across studies did not allow us to verify the role of distinct moderators across all outcomes and/or time-points simultaneously; therefore, our meta-regression results provide useful suggestions that require further verification.
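
The approach of running a separate meta-analysis at each time-point, rather than collapsing time-points into one variable, can be sketched as follows. This is only an illustration under stated assumptions: the text above does not specify the pooling estimator, so a DerSimonian-Laird random-effects model is assumed here, and all study values and names (`HYPOTHETICAL_STUDIES`, `pool_by_time_point`) are invented for the example.

```python
import math
from collections import defaultdict


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    if k == 0 or k != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def pool_by_time_point(studies):
    """Run one independent meta-analysis per follow-up time-point."""
    groups = defaultdict(list)
    for s in studies:
        groups[s["months"]].append(s)
    return {
        months: random_effects_pool(
            [s["gain"] for s in group], [s["var"] for s in group]
        )
        for months, group in sorted(groups.items())
    }


# HYPOTHETICAL per-study mean BMI-Z gains and their variances (not real data).
HYPOTHETICAL_STUDIES = [
    {"months": 3, "gain": 0.30, "var": 0.010},
    {"months": 3, "gain": 0.45, "var": 0.020},
    {"months": 3, "gain": 0.25, "var": 0.015},
    {"months": 12, "gain": 0.50, "var": 0.030},
    {"months": 12, "gain": 0.80, "var": 0.025},
]

for months, (pooled, se, tau2) in pool_by_time_point(HYPOTHETICAL_STUDIES).items():
    print(f"{months:>2} months: pooled={pooled:.3f}, SE={se:.3f}, tau2={tau2:.4f}")
```

Each time-point uses only the studies that reported it, which sidesteps missing data at other time-points but, as noted above, leaves each analysis with few studies and limited power.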

Conclusions

Overall, the results of our meta-analyses are consistent with those drawn from meta-analyses of interventional trials [14,15,16], with some new insights. Olanzapine was confirmed as the AP with the highest potential for metabolic disruption. It was followed by risperidone and quetiapine, in a risk order that could not be clearly interpreted, yet suggesting that quetiapine may be associated with smaller, but quicker, weight gain as compared to risperidone in observational settings. Presently, this interpretation requires more data for adequate support: in particular, studies observing quetiapine effects on BMI and BMI-Z at time-points longer than 12 months are needed. We may have found, albeit in limited datasets, suggestions that aripiprazole induces slower metabolic disturbances as compared to olanzapine and risperidone, and that ziprasidone is weight neutral. Suggestions for clinical practice include the option of switching from olanzapine and risperidone to quetiapine or aripiprazole, and possibly ziprasidone, when wanting to slow down or contain weight gain, while balancing safety with efficacy. Olanzapine and clozapine must be considered as second choices in pediatric patients, owing to their much more rapid and larger adverse effects. BMI or, even better, BMI-Z should be closely monitored during treatment with all antipsychotics, including in the long term. Being overweight should not be considered a contraindication to beginning treatment with APs, yet patients must be monitored more closely and possibly offered psycho-educational interventions on healthy lifestyles. Suggestions for future observational studies are: to measure both BMI and waist-hip/waist-wrist ratio, standardizing them to Z values; to report body measures also for patients who drop out of studies; and to take into account baseline values as confounding factors. More observational studies, using standardized measures and lasting more than 12 months, are required to refine our preliminary interpretations.