Introduction

Bipolar disorder (BD) is a chronic and debilitating illness with an approximate prevalence of 1% (Ferrari et al. 2016). Mood stabilizers and antipsychotic drugs can treat mania but may not improve depressive symptoms, which makes the treatment of bipolar depression more challenging (Calabrese et al. 2007; Perlis et al. 2006; Sachs et al. 2007). Moreover, antidepressants may lead to phase switching, particularly when used as monotherapy (Post et al. 2006; Viktorin et al. 2014). Though the underlying pathophysiology of bipolar depression is poorly understood, several new biological hypotheses are emerging, including neuroinflammation, neurodegeneration, and mitochondrial dysfunction (Kato 2007; Kato and Kato 2000; Myint and Kim 2014; Naaldijk et al. 2016; Pereira et al. 2018; Sigitova et al. 2017).

Previous studies have revealed that BD is associated with a reduction in the amount and activity of electron transport chain complex I, downregulation of nuclear transcripts for proteins of the entire electron transport chain, increased lipid peroxidation, alterations in calcium metabolism, and increased production of reactive oxygen species (ROS), which can damage mitochondria and exacerbate the failure of mitochondrial energy production (Andreazza et al. 2010; Hagen et al. 2002; Kato 2008; Munakata et al. 2004; Naydenov et al. 2007). These advances in the understanding of the pathophysiology of bipolar disorder suggest that interventions that target mitochondrial dysfunction may provide a therapeutic benefit (Pereira et al. 2018). The candidate mitochondrial agents used in BD include N-acetylcysteine (NAC), coenzyme Q10 (CoQ10), alpha-lipoic acid (ALA), acetyl-L-carnitine (ALCAR), creatine monohydrate (CM), S-adenosylmethionine (SAMe), melatonin, pyrimidines, choline, vitamin A, vitamin C, vitamin D, vitamin E, and folic acid (Pereira et al. 2018). These agents help to normalize mitochondrial function through mechanisms such as scavenging ROS, maintaining membrane integrity, acting as methyl donors, and regulating mitochondrial biogenesis and bioenergetics (Baek et al. 2013; Berridge 2017; Tan et al. 2016). The actions of these agents on mitochondrial function are tabulated in Table S1.

A literature search identified a few clinical trials and meta-analyses evaluating different mitochondrial agents in BD (Brennan et al. 2013; Forester et al. 2015; Jahangard et al. 2019; Jensen et al. 2008; Kay et al. 1984; Kishi et al. 2019; Kondo et al. 2011; Lyoo et al. 2003; Marsh et al. 2017; Mehrpooya et al. 2018; Murphy et al. 2014; Naylor and Smith 1981; Nierenberg et al. 2017; Pittas et al. 2021; Toniolo et al. 2018). However, inconclusive and contradictory results are the main hindrance to their therapeutic translation. We used a multiple-treatment meta-analysis to provide a summary that can guide clinical decision-making. Initially, we planned to integrate data from direct and indirect comparisons to summarize the results of randomized controlled trials (RCTs) assessing the effects of single or combined therapies. However, as no studies compared active interventions with each other and most studies compared individual mitochondrial agents against placebo, each comparison was informed by either direct or indirect evidence, but not both. Hence, this network meta-analysis was planned to compare the efficacy of different mitochondrial agents in bipolar depression in terms of change in depression rating scales.

Network meta-analysis synthesizes all available evidence, whether direct or indirect, to compare multiple therapeutic options of interest. The method rests on assumptions that must be fulfilled to avoid erroneous results. The most important is that all direct evidence is connected in a single network of comparisons; a plot of the network can be drawn to confirm this and to give a macro-view of the pairwise relations. In a network meta-analysis, a network graph consists of nodes, depicting the interventions of interest, and edges connecting the nodes, which represent direct comparisons. The size of the nodes and the thickness of the edges give a weighted representation of the strength of the available evidence. In a nutshell, the network geometry allows graphical visualization of the evidence available for NMA. However, because the network geometry gives only a crude representation, other metrics such as rank probabilities and relative effects are needed for detailed interpretation of the data (Tonin et al. 2019).
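The connectivity assumption can be illustrated with a minimal Python sketch that checks whether a set of pairwise comparisons forms a single connected network. The comparison lists below are hypothetical (a star around placebo, plus an invented disconnected pair) and are not the study data set.

```python
from collections import defaultdict, deque

def is_connected(comparisons):
    """Return True if every treatment can be reached from every other one."""
    graph = defaultdict(set)
    for a, b in comparisons:
        graph[a].add(b)
        graph[b].add(a)
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the comparison graph
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(graph)

# Hypothetical direct comparisons, for illustration only
star = [("NAC", "placebo"), ("CoQ10", "placebo"), ("SAMe", "placebo")]
broken = star + [("drug X", "drug Y")]  # invented pair with no link to placebo

print(is_connected(star))    # True
print(is_connected(broken))  # False
```

A disconnected network, as in the second case, would mean that some treatments cannot be compared even indirectly.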

Method

Protocol development and registration

A standard network meta-analysis protocol was developed following the PRISMA-P guidelines and was registered in the International Prospective Register of Systematic Reviews (PROSPERO: CRD42021246296) (Moher et al. 2015). This network meta-analysis was conducted and reported in conformance with the PRISMA Extension Statement for Reporting of Systematic Reviews Incorporating Network Meta-analyses of Health Care Interventions (PRISMA-NMA) (Hutton et al. 2016). The protocol of the meta-analysis was exempted from full review and approved by the Institutional Ethics Committee, All India Institute of Medical Sciences (AIIMS), Bhubaneswar, as per the ICMR National Ethical Guidelines (2017) for biomedical and health research.

Search strategy

We searched MEDLINE/PubMed, the Cochrane database, and the International Clinical Trials Registry Platform (ICTRP) for randomized controlled trials (RCTs) on treatment modalities targeting mitochondrial dysfunction in bipolar disorder published till 31 March 2021. Other databases like EMBASE, CINAHL, and AMED could not be searched due to financial constraints. Search terms were constructed using the PICO method, applying MeSH terms along with Boolean operators. The reference lists of the published studies were also searched, and the ICTRP was checked for unpublished data. The detailed search strategy with MeSH terms and Boolean operators is presented in the supplemental text.

Study selection criteria

RCTs on agents targeting mitochondrial dysfunction in bipolar disorder either alone or in combination with first-line mood stabilizers were considered for inclusion. All included RCTs reported a change in symptom scoring for depression as an outcome measure. All studies published in peer-reviewed English language journals were included. The studies were not restricted by the date of publication. Commentary, editorials, case series, case reports, secondary analyses, and opinions were excluded.

Types of participants and intervention

Patients of either gender, aged more than 18 years, and diagnosed with bipolar disorder (current stage depression) were eligible for inclusion. Eligible patients were receiving agents targeting mitochondrial dysfunction (NAC, CoQ10, ALA, ALCAR, CM, SAMe, melatonin, pyrimidines, choline, vitamin A, vitamin C, vitamin D, vitamin E, folic acid) either as monotherapy or as an add-on to standard therapy with mood stabilizers and antipsychotics. Placebo or any standard mood stabilizer, such as lithium or valproate, was considered the control.

Type of outcomes

The efficacy outcome chosen for this network meta-analysis was a change in depression rating scales like Montgomery–Åsberg Depression Rating Scale (MADRS) or Hamilton Depression Rating Scale (HDRS). The studies reporting a change in depression rating scales irrespective of the study duration were included.

Study selection and data collection

The relevant articles were screened in a stepwise manner. All articles found through database searches were first screened by title and abstract against the eligibility criteria. The full texts of the articles selected in this initial screening were then retrieved and read. Inclusion criteria were decided a priori, before the literature search, and the articles that satisfied the selection criteria were included in the network meta-analysis. Three reviewers (AM, MJ, BRM) independently reviewed each study for inclusion and excluded the non-relevant studies; any disagreement between the three reviewers was resolved in consultation with another reviewer (RM).

Data extraction and management

Data extraction was done using a predefined form covering study characteristics, details of participants, interventions, comparators, and outcomes. Three authors (AM, MJ, BRM) extracted the data independently, and any disagreement between them was resolved by consensus or discussion with another author (RM). In some studies, data were presented only as plots and were converted to numerical values using a plot digitizer.

The geometry of the network

A network plot was constructed for mitochondrial agents as single or combined treatments in bipolar depression for the efficacy outcome. In the network plot, each circle represents an intervention, and the size of the circle represents the number of participants receiving that intervention. The thickness of the lines connecting the circles is proportional to the number of studies available. If any closed triangles were found, a node-splitting analysis, which compares direct effects with the indirect effects obtained via network meta-analysis, was planned.
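The quantities behind such a plot can be sketched in a few lines of Python. The trial list below is hypothetical (invented arm sizes, not the included studies); the sketch tallies participants per node and studies per edge, and lists any closed triangles that would permit node-splitting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-arm trials: (treatment, comparator, n_treatment, n_comparator)
trials = [
    ("NAC", "placebo", 38, 37),
    ("NAC", "placebo", 76, 73),
    ("CoQ10", "placebo", 36, 33),
    ("SAMe", "placebo", 9, 9),
]

def network_geometry(trials):
    """Return participants per intervention and studies per direct comparison."""
    node_size = Counter()
    edge_width = Counter()
    for treatment, comparator, n_t, n_c in trials:
        node_size[treatment] += n_t
        node_size[comparator] += n_c
        edge_width[frozenset((treatment, comparator))] += 1
    return node_size, edge_width

def closed_triangles(edge_width):
    """List every trio of interventions whose three pairwise edges all exist."""
    nodes = sorted({node for edge in edge_width for node in edge})
    return [
        trio for trio in combinations(nodes, 3)
        if all(frozenset(pair) in edge_width for pair in combinations(trio, 2))
    ]

node_size, edge_width = network_geometry(trials)
print(node_size["placebo"], edge_width[frozenset(("NAC", "placebo"))])  # 152 2
print(closed_triangles(edge_width))  # [] : a star network has no closed loops
```

In a star-shaped network like this one, the triangle list is empty, which is the situation in which node-splitting cannot be carried out.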

Data analysis

The contrast-level approach was used for the frequentist network meta-analysis and the arm-level approach for the Bayesian network meta-analysis. For the frequentist approach, each row of data represents one study, as all the studies were two-armed. The R package “netmeta” was used for conducting the frequentist network meta-analysis (Gerta Rücker, 2015). A random-effects network model was built, with a fitted value for each comparison in the network. The I² statistic was used to assess heterogeneity/inconsistency in the network model, and the Q statistic was used for the design-specific decomposition of comparison-wise inconsistency. As there were no closed loops in the network graph, direct and indirect evidence could not be contrasted and were not visualized separately. P-scores, which measure the certainty that one treatment is better than another, averaged over all competing treatments, were used to rank treatments from most beneficial to least beneficial. Network forest plots were created, taking placebo as the comparator. The relative effects of comparisons are presented as mean differences with 95% confidence intervals (95% CI). Comparison-adjusted funnel plots were created to assess publication bias. Meta-regression was performed using the package “metafor”, taking the duration of treatment as a moderator variable (Viechtbauer 2010).
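To make the heterogeneity statistics concrete, the following is a minimal Python sketch of random-effects pooling for a single comparison using the DerSimonian-Laird estimator. It is a pairwise simplification under stated assumptions, not the graph-theoretical estimator implemented in netmeta, and the input values are hypothetical.

```python
import math

def random_effects_pool(effects, std_errors):
    """DerSimonian-Laird pooling of mean differences for one comparison."""
    w = [1.0 / se**2 for se in std_errors]            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "ci": ci, "Q": q, "tau2": tau2, "I2": i2}

# Hypothetical mean differences (treatment minus placebo) and standard errors
result = random_effects_pool([-8.0, -1.0, 0.5], [1.0, 1.0, 1.0])
print({key: round(result[key], 2) for key in ("pooled", "Q", "tau2", "I2")})
```

When the study estimates disagree by much more than their standard errors, as in this invented example, Q greatly exceeds its degrees of freedom and I² approaches 100%.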

For the Bayesian approach, each row of data represented one treatment arm, and the data set included the arms of all included studies. Data were analysed using the GeMTC graphical user interface, which uses the R “gemtc” package for Bayesian analysis (Harrer et al. 2021; van Valkenhoef et al. 2012). A random-effects consistency model with Markov chain Monte Carlo simulation and non-informative priors was used to estimate the mean difference. Five thousand burn-in iterations and 40,000 inference iterations with a thinning factor of 10 were used to generate the models. Convergence was assessed using the Brooks-Gelman-Rubin diagnostic tool and visual inspection of time series and density plots. The potential scale reduction factor (PSRF) for each of the models was approximately 1 (< 1.001). The relative effects are presented as mean differences with 95% credible intervals (95% CrI). The distribution of rank probabilities of the mitochondrial agents for the efficacy outcome was plotted. Surface under the cumulative ranking (SUCRA) scores were calculated using the dmetar package. Model fit statistics such as residual deviance (Dres) and leverage (pD) were calculated to check the model fit and to assess each data point’s influence on the model parameters. An alternative Bayesian approach quantifies, for each study, the uncertainty about whether it is an outlier. This numerical estimate is called the Bayes factor, and predictive priors are then incorporated into the NMA model to adjust for outliers. However, these steps make that analysis more cumbersome than the posterior predictive modelling used in gemtc. A second advantage of the present method is that it uses a simulation approach, which is more appropriate for sparse data, as is the case in this NMA (Zhang et al. 2015).
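The sampling settings and the convergence check can be illustrated with a short Python sketch. It computes the basic Gelman-Rubin potential scale reduction factor for one parameter; R convergence tools may apply additional corrections, so this is an approximation and not the exact diagnostic used in the analysis. The chains are simulated stand-ins, and the number of chains (four) is an assumption.

```python
import random
import statistics

def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter."""
    n = len(chains[0])  # retained draws per chain
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

# Settings quoted in the text: 40,000 iterations after burn-in, thinned by 10.
retained = 40_000 // 10  # 4000 draws kept per chain

# Simulated chains for illustration only (not output from the study model)
rng = random.Random(2021)
chains = [[rng.gauss(-5.4, 4.0) for _ in range(retained)] for _ in range(4)]
print(retained, round(psrf(chains), 4))
```

Values close to 1 indicate that the chains agree with each other; chains stuck in different regions would push the factor well above 1.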

We compared and contrasted the frequentist and Bayesian approaches to estimating the treatment effects of mitochondrial agents. Netmeta uses a graph-theoretical model, while gemtc uses a hierarchical binomial-normal model for network meta-analysis. In the frequentist framework, a linear model is employed to estimate the treatment effect, and the study-specific treatment effect, contrasted against the reference treatment, is assumed to follow a random distribution. In the Bayesian approach with the package gemtc, the likelihood is placed on trial arms. With arm-level data, the events per trial arm are modelled as realizations from a binomial distribution with probability equal to the observed event rate in the trial. The treatment effect is then estimated using a generalized linear model (logistic regression) for the specific arm. However, the heterogeneity structure is similar in netmeta and gemtc. Treatments are ranked using P-scores in the frequentist approach and SUCRA scores in the Bayesian approach. While the P-score evaluates the certainty that a treatment is better than another one, averaged over all competing treatments, SUCRA is the inversely scaled average rank of a treatment in a network. SUCRA thus summarizes the whole posterior rank distribution, equalling 1 for a treatment that is certain to be the best and 0 for one that is certain to be the worst.
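As a rough illustration of the frequentist ranking metric, the Python sketch below computes P-scores from point estimates and variances under a normal approximation. It assumes that the estimates against the reference are independent, which is plausible only for a star-shaped network of two-arm trials; the treatment names and numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_scores(effects, variances, lower_is_better=True):
    """effects, variances: dicts mapping treatment -> estimate vs. reference."""
    names = list(effects)
    scores = {}
    for i in names:
        probs = []
        for j in names:
            if i == j:
                continue
            diff = effects[i] - effects[j]
            # Independence assumption: variance of a difference is the sum
            sd = math.sqrt(variances[i] + variances[j])
            z = -diff / sd if lower_is_better else diff / sd
            probs.append(normal_cdf(z))  # certainty that i beats j
        scores[i] = sum(probs) / len(probs)
    return scores

# Hypothetical mean differences vs. placebo (negative = larger score reduction)
effects = {"placebo": 0.0, "A": -5.0, "B": -1.0}
variances = {"placebo": 0.0, "A": 4.0, "B": 4.0}
scores = p_scores(effects, variances)
print({name: round(value, 3) for name, value in scores.items()})
```

Because the certainty that one treatment beats another and the reverse certainty sum to 1, the P-scores of a network always average 0.5, which is a convenient sanity check.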

Risk of bias assessment

The risk of bias in individual studies was assessed using the standardized risk-of-bias assessment tool 2 (RoB2) of the Cochrane Collaboration (Cumpston et al. 2019; Sterne et al. 2019). As per RoB2, bias is assessed in five distinct domains (bias arising from the randomization process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in measurement of the outcome, bias in selection of the reported result). Within each domain, one or more signalling questions are answered, which lead to judgements of “low risk of bias”, “some concerns”, or “high risk of bias”, and the judgements within each domain lead to an overall risk-of-bias judgement. Three reviewers (RM, BRM, AM) independently evaluated and recorded their judgements and justifications in each domain for each included study. Any disagreement was resolved by the consensus of the majority.

Quality of evidence

The GRADE approach was used to rate the quality of the evidence generated. Direct and indirect estimates were rated for study design, risk of bias, indirectness, inconsistency, imprecision, and publication bias. If only direct or only indirect evidence is available for a given comparison, the network quality rating is based on that estimate. When both direct and indirect evidence are available for a particular comparison, the higher of the two quality ratings can be used as the quality rating for the NMA estimate, but we did not have both for any comparison (Puhan et al. 2014). We also sorted the network meta-analysis results based on the quality of evidence and the effect estimates (Brignardello-Petersen et al. 2020).

Results

Search results and study characteristics

Through a systematic literature search, 2659 publications were identified, of which 2614 were excluded at the first level of screening, based on title and abstract. The articles excluded at the first level were review articles, case reports, case series, secondary analyses, commentaries, pre-clinical studies, unrelated studies, and letters to the editor. Full-text records of the remaining 45 articles were retrieved and screened. In the second level of screening, 30 articles were excluded, and 15 studies were included in the present network meta-analysis. The stepwise process of study selection is presented in the PRISMA flow chart (Fig. 1). The key characteristics of each included study are summarized in Table S2 (Bauer et al. 2018; Berk et al. 2008, 2012, 2019; Brennan et al. 2013; Coppen et al. 1986; Ellegaard et al. 2019; Jahangard et al. 2019; Lyoo et al. 2003; Magalhaes et al. 2011; Marsh et al. 2017; Murphy et al. 2014; Romo-Nava et al. 2014; Toniolo et al. 2018; Yatham et al. 2016). Risk of bias was assessed for individual studies and is presented in Table S3.

Fig. 1 PRISMA flow diagram for the study selection process

Summary of network geometry

A total of 15 studies with 904 patients with bipolar depression were included in the present network meta-analysis. Amongst them, 13 were two-arm studies, one was a three-arm study, and one was a four-arm study. From the three-arm and four-arm studies, only the data for the two arms of interest were extracted for the present meta-analysis. The network plot is depicted in Fig. 2.

Fig. 2 Network plot of the included studies. Each node represents an intervention in the comparison, and the edges represent available direct evidence. The size of the nodes and the thickness of the edges represent the weight of the available direct evidence. There are no closed triangles; thus, no studies with direct comparisons between active treatments are available

Analysis of comparison of all possible treatments

Frequentist approach

In the frequentist model, the fitted value for each comparison was calculated in a random-effects model and is presented in Table S4. The test of heterogeneity for the model was statistically significant (τ² = 1.0172, I² = 91.6% (84.4%, 95.4%)) and showed high heterogeneity. Within-design heterogeneity was higher than between-design inconsistency, and in the design-specific decomposition analysis, the Q statistic was 59.37 (p < 0.0001) for NAC versus placebo, which was statistically significant. The net treatment effects of all possible interventions with respect to one another and placebo are presented in Table S5(A). NAC was found to be the best treatment, as shown by the highest P-score, followed by CoQ10 and the combination of ALA plus ALCAR, which can be further confirmed from the forest plot in Fig. 3A. The P-scores of all the therapies are tabulated in Table S6.

Fig. 3 Relative effects plots. Comparative efficacy of all possible treatments as compared to that of placebo: A frequentist approach and B Bayesian approach. The “zero” on the x-axis represents the line of no difference between the intervention of interest and placebo. NAC appears to be more efficacious than placebo in the frequentist approach, but none of the treatments is better than placebo as per the Bayesian approach

A comparison-adjusted funnel plot was made, and no significant publication bias was found (Fig. 4). This finding was confirmed by a non-significant p-value (p = 0.423) of the Egger linear regression equation. The meta-regression model accounting for the effect of duration of therapy on the efficacy outcome in the included studies was not significant (coefficient = −0.0101; se = 0.229; p = 0.6602).
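For readers unfamiliar with the Egger test, the following is a minimal Python sketch of the underlying regression: the standardized effect (effect divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The inputs are hypothetical; in a comparison-adjusted analysis each effect would first be centred on its comparison-specific pooled estimate. The p-value, which requires a t distribution with n − 2 degrees of freedom, is not computed here.

```python
import math

def egger_regression(effects, std_errors):
    """Return (intercept, standard error of intercept, t statistic)."""
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)                    # needs at least 3 studies
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat

# Hypothetical centred effects and standard errors, for illustration only
intercept, se_intercept, t_stat = egger_regression(
    [3.1, 2.45, 2.3, 2.16], [1.0, 0.5, 0.25, 0.2]
)
print(round(intercept, 3), round(se_intercept, 3), round(t_stat, 3))
```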

Fig. 4 Comparison-adjusted funnel plot. Publication bias was found to be non-significant

Bayesian approach

The Bayesian model consists of all possible comparisons of interventions, whether monotherapy or combination, within the included studies, compared with placebo. The random-effects standard deviation for the model was 3.19. As a key metric of model fit, residual deviance (Dres) was calculated for each data point. Each data point contributed about one or less to the residual deviance (Dres = 30.2, number of data points = 30), suggesting a well-fitting model. Leverage statistics were also calculated to assess the influence of each data point on the model parameters. The mean difference for coenzyme Q10 was −5.8 (95% CrI: −24 to 13), SAMe −5.6 (95% CrI: −30 to 20), NAC −5.4 (95% CrI: −13 to 2.5), vitamin D 4.7 (95% CrI: −14 to 24), ALA + ALCAR −3.1 (95% CrI: −22 to 16), agomelatine −0.25 (95% CrI: −19 to 18), CM −2.3 (95% CrI: −22 to 17), choline 0.23 (95% CrI: −23 to 24), folic acid 1.3 (95% CrI: −17 to 20), and melatonin 1.4 (95% CrI: −18 to 21). For every treatment compared with placebo, the 95% credible interval of the mean difference in MADRS score crossed the line of null effect. The relative effects of treatments with respect to each other in direct and indirect comparisons, as well as those obtained from the network meta-analysis, are tabulated in Table S5(B), and the relative effects plot in comparison with placebo is shown in Fig. 3B. A node-splitting model could not be created due to the lack of closed triangles in the network plot. For change in depression rating scales as the outcome measure, the probability of the first rank was highest for SAMe, followed by CoQ10 and then choline. The probability of the second rank was greatest for CoQ10, followed by NAC. The third-rank probabilities were highest for NAC, CoQ10, and ALA + ALCAR, in decreasing order. The probabilities for all the ranks are plotted in Fig. 5. The SUCRA scores, which summarize the cumulative rank probabilities for each treatment, are presented in Table S6.
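The link between rank probabilities and SUCRA can be made concrete with a small Python sketch. The posterior draws below are a tiny invented set (four iterations, three treatments), chosen only so that the arithmetic can be followed by hand; they are not output from the study model.

```python
def rank_probabilities(draws, lower_is_better=True):
    """draws: dict treatment -> list of posterior draws aligned by iteration."""
    names = list(draws)
    n_iter = len(draws[names[0]])
    counts = {name: [0] * len(names) for name in names}
    for k in range(n_iter):
        ordered = sorted(names, key=lambda name: draws[name][k],
                         reverse=not lower_is_better)
        for rank, name in enumerate(ordered):  # rank 0 is the best treatment
            counts[name][rank] += 1
    return {name: [c / n_iter for c in counts[name]] for name in names}

def sucra(rank_probs):
    """Average of the cumulative rank probabilities over the first a-1 ranks."""
    scores = {}
    for name, probs in rank_probs.items():
        cumulative, total = 0.0, 0.0
        for p in probs[:-1]:
            cumulative += p
            total += cumulative
        scores[name] = total / (len(probs) - 1)
    return scores

# Hypothetical mean differences vs. placebo (negative = better)
draws = {
    "placebo": [0.0, 0.0, 0.0, 0.0],
    "A": [-6.0, -4.0, -7.0, 1.0],
    "B": [-2.0, -5.0, -1.0, -3.0],
}
probs = rank_probabilities(draws)
print(probs["A"], probs["B"])  # [0.5, 0.25, 0.25] [0.5, 0.5, 0.0]
print(sucra(probs))            # {'placebo': 0.125, 'A': 0.625, 'B': 0.75}
```

In this toy example, A and B have the same probability of ranking first, yet B has the higher SUCRA because it is never ranked last. This is why first-rank probabilities and SUCRA scores can order treatments differently.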

Fig. 5 The rank probability of different mitochondrial agents in bipolar depression for change in depression rating scale as outcome parameter

The NMA results for change in depression rating score, sorted by quality of evidence for each intervention compared with placebo, are tabulated in Table 1. We found that NAC may be more effective than placebo, whereas the remaining interventions did not show a benefit over placebo.

Table 1 Network meta-analysis results sorted based on GRADE certainty of evidence for the comparisons of mitochondrial agents versus placebo for bipolar disorder

Discussion

The present network meta-analysis summarizes 15 RCTs, comprising 904 patients with bipolar disorder, that assessed the efficacy of mitochondrial agents in bipolar depression with change in depression rating scores as the outcome of interest. Both frequentist and Bayesian approaches were used to calculate the relative effect sizes of interventions targeting mitochondrial dysfunction with respect to one another as well as placebo. NAC appeared to be the most efficacious treatment amongst the mitochondrial agents for BD, followed by CoQ10 and the combination of ALA and ALCAR. However, none of the agents except NAC showed significantly greater efficacy than placebo.

Despite the findings of morphological and physiological dysfunction in mitochondria in earlier studies, none of the existing mitochondrial agents has been able to alleviate the symptoms of patients with BD significantly (Cataldo et al. 2010; Pereira et al. 2018). This may also be the reason for the sparse data available in the form of RCTs. Though these agents are promising and well tolerated without any serious adverse effects, they have not been shown to be effective for symptomatic relief (Kuperberg et al. 2021).

The heterogeneity in this network meta-analysis was very high. Within-design heterogeneity was higher than between-design inconsistency. Meta-regression was carried out to explore the heterogeneity, taking the duration of therapy as a moderator variable, but the duration of therapy had no statistically significant effect on the reduction in depression rating scores. Though the effect estimates differed between the frequentist and Bayesian analyses, the direction of the estimates was conserved in both. In addition, the SUCRA scores and P-scores were almost identical, and the treatment rankings were the same for both approaches. No closed triangles were formed in the network graph, and thus the assumption of consistency could not be tested, as only direct evidence (comparison with placebo) or indirect evidence (comparison with another active intervention) was available for each comparison.

Confidence intervals from the frequentist approach were narrower than the credible intervals from the Bayesian approach, but that may come at the cost of coverage, as coverage and precision are reciprocally related. It has been shown that coverage in scenarios of high heterogeneity and sparse data, i.e. one or two trials per contrast, is higher when using Bayesian methods rather than frequentist methods (Seide et al. 2020). Overall, coverage also increases with sparser data in high heterogeneity circumstances.

The frequentist approach treats the parameters as fixed constants when making inferences, whereas the Bayesian method handles uncertainty through probability theory using priors (Shim et al. 2019). However, the magnitudes of the effect estimates from the two approaches converge when they are applied to large data sets. When the evidence was sorted, the evidence that NAC is better than placebo was of low certainty. This indicates that more randomized controlled trials with large sample sizes are needed to derive quality evidence before confirming or refuting its role in bipolar disorder.

The most important limitation of this study was the sparse data: most comparisons were informed by only one clinical trial, and no clinical trials compared the various mitochondrial agents with each other. Another important limitation was that databases like EMBASE, CINAHL, and AMED were not searched, so a few trials might have been missed by our search strategy. The safety data could not be analysed for the interventions, as some of the included studies did not report adverse effects.

Conclusion

In conclusion, the findings of the present network meta-analysis showed that the existing mitochondrial agents are not efficacious in bipolar depression, but targeted therapies with better efficacy profiles could be developed in future. Furthermore, when the two network meta-analysis approaches are compared, the effect estimates may vary in magnitude but not in the direction of effects or in treatment rankings. Future clinical trials using established active comparators should be conducted to generate conclusive data on the efficacy and safety of mitochondrial agents.