Key Points

This systematic review and meta-analysis examined the available pilot study and randomised controlled trial (RCT) evidence for drugs used to treat sleep and circadian rhythm disturbance in bipolar disorder (BD).

Despite sleep and circadian rhythm disturbance constituting common features of BD, few RCTs have adequately assessed these outcomes.

The largest efficacy signal was detected for manic symptoms in studies that used adjunctive melatonin treatment during bipolar mania.

1 Introduction

Bipolar disorder (BD) is a chronic affective disorder characterised by recurrent episodes of increased activity and elated or irritable mood (mania) and predominant episodes of low mood, energy, and motivation (depression). Pharmacotherapy (e.g. lithium, antiseizure medications, and second-generation antipsychotics) is the mainstay of treatment for acute mood episodes with continuation recommended during the maintenance phase of periods of relative mood stability (euthymia) [1]. Even though effective treatments exist, the recurrence rate is high; half of patients relapse within 2 years and 70–90% within 5 years [2]. Residual depressive symptoms and mood elevation during euthymia predict future depressive and manic episodes, respectively [2]. In addition to targeting core manic/depressive symptoms during acute episodic manifestations, identifying and managing subsyndromal symptoms during inter-episode phases are key treatment goals [3].

Sleep abnormalities are common in all phases of BD and constitute core symptoms of both bipolar depression and elevated mood. Insomnia and reduced subjective sleep need are core symptoms of mania, and insomnia or hypersomnia are core symptoms of depression [4]. Clinically significant sleep disturbances persist during euthymia in the majority of patients [5] and actigraphy studies reveal abnormal patterns of objectively estimated sleep throughout acute mood episodes and inter-episode periods [6, 7]. Several lines of evidence suggest that sleep disturbance is associated with worse clinical outcomes and relapse. Systematic review of the BD prodrome identifies sleep disturbance as the most common early symptom of mania and sixth most common early symptom of depression [8]. Prospective studies demonstrate that poor sleep in euthymic BD patients is associated with worse residual depression and mood elevation and predicts earlier mood episode recurrence [9, 10]. Appropriate treatment of sleep may be beneficial for managing acute mood episodes and preserving mood stability during euthymia.

Recently, there have been calls to prioritise sleep assessment and treatment for people with BD who likely have a comorbid sleep disorder or enduring sleep symptoms [11]. Prevalence estimates from a multicentre study suggest that insomnia disorder is present among 40% of BD patients [12]. Pilot trial data show that cognitive behavioural therapy for insomnia (CBT-I) modified for BD may be an efficacious therapy [13] and has been recommended during maintenance treatment [14]. Short-term sedative-hypnotic drug treatments for insomnia disorder are recommended when CBT-I is ineffective or unavailable [15, 16]. Yet, little is known about the efficacy of pharmacological interventions for insomnia and sleep disturbance in patients with BD. Benzodiazepine drugs have been used acutely to manage insomnia, anxiety, and high levels of irritability/restlessness during mania, but long-term use is not recommended [17]. Chart-review evidence suggests that non-benzodiazepine hypnotics (NBZHs) are commonly used and are effective for managing insomnia in BD [18]. However, the role of NBZHs and other sleep medications in BD is presently unclear, and efficacy studies of these medications in BD have yet to be assessed by systematic review. Moreover, the treatment mechanism of such drugs is principally through their hypnotic effects, but this does not inform our understanding of treatments targeting the circadian rhythms that underpin healthy sleep.

Sleep is regulated by both a homeostatic and a circadian process [19]. The circadian process is generated and maintained by the circadian timing system; a network of brain and peripheral tissues individually expressing circadian rhythms that are controlled hierarchically by the suprachiasmatic nucleus (SCN), which acts as a central pacemaker for the system [20]. The principal purpose of the circadian timing system is to synchronise or ‘entrain’ individual rhythms to operate in stable circadian phase (i.e. timing) with each other and to the environment. Circadian rhythm disturbance may emerge from any factor that perturbs the optimal function of the circadian timing system or how it is entrained to the environment. Such disturbance can arise, for example, from dysfunctional phase relationships within the circadian timing system causing internal desynchrony of endogenous circadian rhythms or from mismatch between the circadian timing system and environmental demands (e.g. school/work schedule), which in turn can severely negatively impact sleep–wake patterns [21].

Circadian rhythm disturbance is common in BD and likely contributes to maladaptive sleep–wake patterns. Multiple studies have found circadian rhythm disturbances in molecular, physiologic, and behavioural parameters in BD [22, 23]. A delayed-phase circadian phenotype is sustained during the inter-episode period of BD [24], which may contribute to sleep onset and maintenance insomnia and daytime sleepiness. Abnormal monoamine signalling and metabolism have been linked to repeated circadian phase delays in animal models of circadian rhythm disturbance [25]. A similar pattern of circadian rhythm disturbance in humans may impact brain circuits responsible for mood regulation that are also involved in the pathophysiology of BD. Furthermore, phase advances and delays of circadian rhythms in BD have been shown to precede the onset of mania and depression, respectively [26, 27]. This suggests that circadian rhythm disturbance might be an important precipitating factor for worsened symptoms or relapse in BD.

Melatonin is a key circadian hormone that expresses a robust circadian rhythm and acts as an important endogenous modulator of the circadian timing system. Unlike hypnotic drugs, which mainly have a GABAergic mechanism of action, a principal therapeutic mechanism of melatonin is its powerful modulatory effect on the circadian timing system. Melatonin may be deployed as a chronotherapy, where it is used to advance sleep–wake patterns that are phase delayed and not entrained optimally to the environment [28]. There is also evidence to suggest that the melatonin-receptor agonists ramelteon and agomelatine have similar circadian phase-advancing effects [29, 30]. Studies of the circadian rhythm in BD suggest that melatonin is reduced in amplitude and delayed in phase [31, 32]. Thus, treatment with exogenous melatonin and melatonin-receptor agonists is hypothesised to have potential adjunctive benefits for BD due to the phase-advancing effects of these substances on the circadian timing system [33]. A meta-analysis of melatonin and melatonin-receptor agonist randomised controlled trial (RCT) studies in BD suggests potentially lower relapse rates due to depression, but limited support for improvement in sleep quality or manic symptoms [34]. However, this previous review focused on melatonin and ramelteon only. Findings from recent RCTs examining the potential effects of agomelatine and melatonin treatment on BD depression and mania have yet to be assessed systematically. Moreover, the possible effects of treatment on sleep and circadian rhythms in people with BD merit systematic review.

Here, we present a comprehensive systematic review of experimental intervention studies in BD examining NBZHs and substances with either a primary indication for sleep disturbance such as ‘z-drugs’ (i.e. zaleplon, zolpidem, zopiclone and eszopiclone) or an established pattern of off-label use as a hypnotic medication. Furthermore, we review studies examining exogenous melatonin and melatonin-receptor agonist drugs with a potential chronotherapeutic influence used to treat BD during either acute or maintenance treatment phases. RCT studies examining the same symptom outcome were additionally selected for meta-analysis. While evidence suggests that some mood stabilisers and antipsychotics may have a direct or indirect influence on sleep and circadian rhythms [35, 36], these substances were beyond the scope of this review. Our aim was to examine the therapeutic potential of hypnotic agents and melatonin/melatonin-receptor agonists in BD as emerging novel treatments, instead of the putative circadian rhythm-related mechanisms of standard medications used for treating BD. The primary outcome parameters of this review were sleep quality and insomnia symptoms. Secondary outcomes comprised symptoms of mania and depression. An appraisal of study quality/risk of bias was also performed. Additionally, we evaluated the trends in study assessment and reported methods to enable recommendations for future studies.

2 Methods

2.1 Protocol and Registration

A review protocol outlining the following methodology and analytic approach was pre-registered on PROSPERO (reference number: CRD42020167528, https://www.crd.york.ac.uk/prospero/). A feasibility amendment to the protocol and results of search updates are described in the supplementary methods (see electronic supplementary material [ESM]). Initially we planned to include only RCTs in this review, but upon running our initial search we discovered several non-randomised/non-controlled studies that we judged to be potentially informative for understanding methodological trends and experimental outcomes reported in this literature. Thus, we expanded our criteria to include these studies but summarised these separately from the RCT findings. This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [37].

2.2 Search Strategy

Study records were identified via a search of the electronic databases AMED, Embase, MEDLINE, and PsycINFO from inception to February 1, 2020. The following search terms were used: (‘sleep’ OR ‘sleep disturb*’ OR ‘insomnia’ OR ‘circadian’ OR ‘parasomnia’ OR ‘hypersomnia’) AND (‘bipolar disorder’ OR ‘mania’ OR ‘bipolar depression’ OR ‘euthymia’) AND (‘pharmacological’ OR ‘drug’ OR ‘antihistamine’ OR ‘hormone’ OR ‘agomelatine’ OR ‘chloral’ OR ‘clomethiazole’ OR ‘diphenhydramine’ OR ‘doxepin’ OR ‘doxylamine’ OR ‘eszopiclone’ OR ‘melatonin’ OR ‘promethazine’ OR ‘ramelteon’ OR ‘suvorexant’ OR ‘zaleplon’ OR ‘zolpidem’ OR ‘zopiclone’). The aforementioned named drugs were selected by three authors (NMMcG, DSK, KEAS) based on having an indication for insomnia or sleep difficulty, or a wide pattern of usage for these complaints such as through off-label or non-prescription use, as well as those that were substances with a melatonergic mechanism of action. A description of each substance named above is included in the supplementary materials (see ESM). All retrieved title and abstract records were screened for inclusion to identify studies relevant to the review objectives. If the title and abstract indicated that the study could potentially meet the inclusion criteria, then the full-text record was retrieved. Titles and abstracts were screened by two of three authors (DSK, MdAC, NMMcG) with discrepancies adjudicated by the third author. Full-text records were screened in a similar manner with discrepancies resolved by consensus decision of all three authors. Based on the discovery of studies published since our initial search, we conducted an updated search on August 31, 2020 and updated this again on October 31, 2021.

2.3 Study Selection

The search was limited to human studies, published in English, in peer-reviewed journals. Experimental studies that examined the effect of any of the substances named in our search strategy in patients with BD were selected for inclusion. Permissible study designs were RCT studies using parallel-group or cross-over allocation. We also permitted experimental studies that were classified as non-controlled and/or non-randomised pilot trials that assessed the clinical feasibility of an intervention (henceforth termed ‘feasibility studies’). Due to differences in study design between feasibility studies and RCTs, the findings from each study type were presented separately. One study (Leibenluft et al. [38]) reported data from the first five patients of a cross-over RCT with limited statistical quantification of study outcomes (instead having a descriptive focus on physiologic outcomes) and had no later publication of a completed trial. Hence, it was included as a feasibility study. Participants included adults only (≥ 18 years old) with no upper age limit. Participants in included studies consisted of individuals diagnosed with BD based on validated psychiatric assessment using DSM/ICD diagnostic criteria. To expand our range of detectable studies, our initial strategy included studies with samples that comprised ≥ 80% BD cases, also confirmed by psychiatric assessment. However, in all the studies meeting inclusion criteria the individuals in each ‘BD group’ consisted wholly of patients with clinically confirmed BD. In the case of RCTs, studies had to contain a placebo control group, or a comparator group using another medication or receiving ‘treatment as usual’.

2.4 Exclusion Criteria

Studies that included individuals with a comorbid or clinical history of neurodevelopmental disorder or neurological disease were excluded. Studies examining treatment outcomes as a result of antipsychotic, antimanic, or mood-stabiliser interventions were not included in this review. However, their usual use in studies where hypnotic or melatonin/melatonin-receptor agonists were applied as an adjunctive treatment was permitted. We expected to detect studies that used short-term benzodiazepine treatment to stabilise agitation and restlessness in manic patients as is sometimes the case during acute management of bipolar mania. Consequently, studies using benzodiazepines acutely for manic or hypomanic patients, in both treatment and placebo control groups, were also included in this review. Similarly, studies reporting concurrent use of benzodiazepines as part of usual maintenance treatment were permitted where there was an established baseline period prior to randomisation and where dosage was not altered between baseline and the study endpoint. All other studies investigating benzodiazepine use as a primary intervention or those investigating an outcome related to benzodiazepine dependence or cessation deviated from the review protocol and were excluded.

2.5 Critical Appraisal and Risk of Bias

Individual studies were assessed for potential bias by two independent reviewers (DSK and MdAC) with disagreement between ratings resolved by a third reviewer if necessary (NMMcG). The methodology of RCT studies was assessed using the revised Risk of Bias tool 2.0 (RoB2) developed by the Cochrane Collaboration [39]. The following factors were judged to have a risk of bias designation of ‘Low’, ‘Some concerns’, or ‘High’: bias due to randomisation, bias due to deviations from the intended interventions (separate factors for assignment and adherence), bias due to missing data, bias due to outcome measurement, and bias due to selection of the reported result. An overall RoB2 score was also generated. The primary outcome parameter of each study was selected for appraisal using the RoB2 tool and is underlined and indicated in bold in Table 2.

We predicted that the methodology of feasibility studies would be too heterogeneous to assess using a risk of bias tool designed for non-randomised or cohort studies. However, we judged it important to appraise the rationale, methods, and clarity of reporting standards of these studies. Therefore, we used a modified version of the Appraisal tool for Cross-Sectional Studies (AXIS) [40] because of the ubiquitous applicability of most AXIS items to determine the reporting standards of a diverse range of studies beyond those that are cross-sectional. We modified the scale by excluding two items that were only relevant for cross-sectional studies (i.e. categorisation and reporting of non-responders). We also used the modified AXIS tool to appraise the reporting standards of RCTs as the tool captured additional useful information not covered by the RoB2.

2.6 Data Handling and Analysis

Primary measures of interest were participant questionnaires and sleep logs or clinician-rated scales that assessed sleep quality and insomnia symptoms. Secondary measures of interest were symptoms of mania and depression assessed by participant questionnaires or clinician-rated scales. Instruments used by each study to assess the primary and secondary measures of interest are indicated in Tables 1 and 2. All relevant findings emerging from feasibility studies and RCT studies were summarised and are described separately according to their respective study design and intervention.

Table 1 Experimental feasibility study characteristics and summary of findings
Table 2 RCT study characteristics and summary of findings

Additionally, all relevant RCT study outcomes were extracted for quantitative analysis. For each included study, effect sizes between intervention and placebo control group at post-intervention were calculated as Hedges’ g with corresponding 95% confidence intervals. Effect sizes were calculated using published means and standard deviations. These data consisted mainly of the reported raw means of completers or imputed data using the last observation carried forward (LOCF) method in the case of attrition of randomised participants before trial endpoint. In instances where other measures of central tendency or dispersion were reported (commonly standard error of the mean or confidence intervals), effect sizes were estimated using Cochrane Collaboration-recommended procedures [41]. If the relevant information required to calculate effect sizes was not reported in the published paper or materials, then the corresponding author was contacted in order to obtain these data.
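The effect size calculation described above can be illustrated with a short Python sketch. It is not the authors' code (the analyses were run in R), and the function names and example values are hypothetical. It computes Hedges' g with a 95% confidence interval from group means and standard deviations, and converts a standard error of the mean to a standard deviation when only the former is reported. The large-sample variance approximation used here is an assumption and may differ in detail from the formula implemented in the ‘esc’ package.

```python
import math


def sd_from_se(se, n):
    """Convert a standard error of the mean to a standard deviation."""
    return se * math.sqrt(n)


def hedges_g(m1, sd1, n1, m2, sd2, n2, z_crit=1.96):
    """Return (g, se, ci_low, ci_high) for group 1 (intervention) vs group 2 (control)."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    # Small-sample correction factor that turns Cohen's d into Hedges' g
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample approximation of the variance of d (assumption), scaled by j
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    se = j * math.sqrt(var_d)
    return g, se, g - z_crit * se, g + z_crit * se


# Hypothetical post-intervention scores: lower scores mean fewer symptoms
g, se, lo, hi = hedges_g(10.0, 5.0, 30, 14.0, 5.0, 30)
print(f"g = {g:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With this sign convention, a negative g indicates lower symptom scores in the intervention group than in the control group.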

Meta-analyses were conducted using random-effects models to estimate weighted pooled effect sizes (DerSimonian-Laird estimator) [42]; these were preferred over fixed-effects models due to expected heterogeneity between studies. Heterogeneity was estimated using the I² statistic and corresponding 95% confidence intervals. All analyses were performed using R (version 3.6.3, R Core Team, Vienna) with the following packages: ‘esc’ [43], ‘meta’ [44], ‘dmetar’ [45], and ‘robvis’ [46].
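For readers unfamiliar with the pooling step, the following Python sketch shows one common formulation of the DerSimonian-Laird random-effects estimator together with the I² statistic. It is an illustration under stated assumptions rather than a reproduction of the R ‘meta’ package output: the function name and input values are hypothetical, the confidence interval uses a normal approximation, and the confidence interval for I² is not computed.

```python
import math


def dersimonian_laird(effects, variances, z_crit=1.96):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # I^2: percentage of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical Hedges' g values and their variances (squared standard errors)
result = dersimonian_laird([-1.0, 0.0], [0.1, 0.1])
print(result)
```

Each study's variance is the square of the standard error of its Hedges' g, so the output of an effect size calculation can be passed directly to this function.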

3 Results

3.1 Details of Studies Included in the Review

A PRISMA flow diagram depicting the study selection process is shown in Fig. 1. Among the 3884 records retrieved by database searches, 710 duplicate records were excluded and a further 3125 records were excluded following title and abstract screen. After full-text review of the remaining articles, a total of nine studies met the inclusion criteria of this review. After the initial database search was conducted, two additional articles were published and were identified via database alerts (Moghaddam et al. [47] in March 2020 and Quested et al. [48] in May 2020). Both were RCTs that following full-text review were judged to meet the inclusion criteria defined in our protocol, thus bringing the total number of studies identified to 11. Subsequent repeated searches did not reveal additional studies that met inclusion criteria (additional search results are described in the supplementary methods, see ESM).
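The record counts reported above can be cross-checked with simple arithmetic. The Python sketch below uses only the figures stated in the text; the variable names are illustrative, and the number of full-text articles reviewed is derived from those figures rather than reported directly.

```python
# Counts as reported in the text
records_retrieved = 3884
duplicates_removed = 710
excluded_at_title_abstract = 3125
included_from_initial_search = 9
included_from_database_alerts = 2  # Moghaddam et al. [47], Quested et al. [48]

# Derived values (not stated explicitly in the text)
screened_after_deduplication = records_retrieved - duplicates_removed
full_text_reviewed = screened_after_deduplication - excluded_at_title_abstract
total_included = included_from_initial_search + included_from_database_alerts

print(f"Records screened after de-duplication: {screened_after_deduplication}")
print(f"Full-text articles reviewed: {full_text_reviewed}")
print(f"Total studies included: {total_included}")
```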

Fig. 1 PRISMA flow diagram of included studies

Among the studies identified were five clinical feasibility studies consisting of three open-label studies [49,50,51], one randomised cross-over placebo-controlled feasibility trial [38], and one mini-longitudinal intervention cohort study with a clinical control group [52]. Six remaining studies were all placebo-controlled, parallel-group RCTs [47, 48, 53,54,55,56]. All studies were published between 1997 and 2020 and recruited a total of 1279 participants (722 female, 56%).

The BD presentations focused on by studies were as follows: BD-I (n = 7, 64%), BD-II (n = 1, 9%), combined BD-I or BD-II (n = 2, 18%), and rapid-cycling BD (n = 1, 9%). Depression was the most common mood state (n = 6, 55%), followed by mania (n = 3, 27%) and finally stable/euthymic state (n = 2, 18%) (excluding the study by Leibenluft et al. [38] on rapid-cycling BD patients). Interventions used by studies involved melatonin (n = 4) or melatonin-receptor agonists (agomelatine = 4, ramelteon = 3).

3.2 Summary of Feasibility Study Findings

Key characteristics of feasibility studies are presented in Table 1. Five studies were published between 1997 and 2014, and these included data from 88 participants (55% female).

Three studies examined agomelatine treatment, each using Hamilton Depression Rating Scale (HDRS) scores as a primary outcome. Calabrese et al. [50] assessed 21 depressed BD-I patients treated with adjunctive agomelatine, taken in the evening during a 6-week open-label study with an optional extension period of 12 months. The authors reported an 81% response rate and 38% remission rate of depression at the week 6 endpoint. Sleep symptoms assessed via three HDRS sleep items were found to decrease but were not statistically analysed.

Fornaro et al. [51] assessed 28 depressed BD-II patients treated with adjunctive agomelatine taken at bedtime in a 6-week, open-label study and showed a 64% response rate on depression symptoms at study endpoint. Sleep disturbance assessed via the Pittsburgh Sleep Quality Index (PSQI) was statistically significantly reduced at study endpoint compared with baseline.

Tyuvina and Smirnova [52] examined the effects of agomelatine treatment for 8 weeks in a mixed sample of 23 depressed BD-I and BD-II patients compared with a clinical control group of 22 patients with recurrent depressive disorder. Agomelatine was applied as monotherapy, except in the case of BD-I where previously prescribed valproate or lamotrigine were maintained. The authors report a 91% depression response rate and 65% remission rate in the BD group at week 8. The BD group had a statistically significantly better remission rate than the depressed control group (55%). The authors report statistically significant improvements in sleep outcomes throughout the study, assessed by visual analogue scales. However, no standardised sleep instrument was used in this study and the sleep dimensions were described as non-specific patient ratings of sleep.

Two studies examined melatonin treatment. Bersani and Garavini [49] examined adjunctive melatonin treatment in a sample of 11 manic BD-I patients with treatment-resistant insomnia receiving ongoing antimanic treatment. The authors found a statistically significant improvement in mania severity and longer self-reported sleep duration at the 30-day study endpoint, assessed using a study-specific questionnaire.

Leibenluft et al. [38] conducted a 12-week, double-blinded, placebo-controlled, randomised cross-over trial of melatonin 10 mg in five rapid-cycling BD patients. Melatonin treatment was added to each patient’s stable regimen of medication and was administered at 22:00 in the evening. The authors did not find a statistically significant difference in daily observer-rated depressive or hypomanic or manic symptoms and reported no appreciable differences in sleep logs recording sleep onset, wake onset, and sleep duration. Sleep log accuracy was limited by a recording resolution of 15 min. The authors of this study noted that after withdrawal of treatment one participant developed an unstable sleep–wake cycle in which sleep onset and offset progressively phase delayed each day, rotating through an entire 24-h cycle within 3 weeks. The authors concluded that in this subject melatonin withdrawal may have precipitated a free-running circadian rhythm pattern (i.e. a condition where sleep–wake behaviour is unentrained by the environment, responding only to the endogenous influence of the circadian timing system). However, it may also be difficult to identify a free-running circadian rhythm in patients with rapid-cycling BD as behaviourally erratic sleep–wake patterns might present similarly to a circadian rhythm sleep–wake disorder. The authors also describe suppression of melatonin in blood samples obtained from two participants between baseline and post-treatment and suggest that this may have been caused by the administration and subsequent withdrawal of melatonin treatment. However, the magnitude of melatonin suppression was determined by numeric differences in melatonin profiles and could not be statistically analysed as there were too few participants to compare. The small sample size of this study did not permit any further statistical comparisons and there were no relevant data reported in this study to include in the meta-analysis with other RCT studies.

3.3 Summary of RCT Findings

Key characteristics of RCTs that met inclusion criteria are presented in Table 2. Six studies were published between 2010 and 2020, which in total included 1191 participants randomised to an intervention (57% female). All studies were double blinded and placebo controlled. The overall rate of intervention completion was 51%.

Yatham et al. [56] conducted the only study examining agomelatine. Participants were 344 depressed BD-I patients recruited through a large multicentre trial from centres in 15 countries. Treatment consisted of agomelatine taken at 20:00 in the evening for 8 weeks, adjunctive to mood stabiliser treatment. The authors found no differences between groups on the Montgomery–Åsberg Depression Rating Scale (MADRS) primary efficacy outcome, nor on any secondary symptom outcome. Sleep rated via the Leeds Sleep Evaluation Questionnaire (LSEQ) did not differ between treatment arms.

Two studies examined melatonin treatment, both during manic mood state. Moghaddam et al. [47] examined the effect of melatonin 6 mg adjunctive to antimanic treatment for 6 weeks in a trial involving 60 BD-I patients. The authors found a statistically significantly greater decrease in manic symptom severity in the melatonin-treated group, which was assessed by the Young Mania Rating Scale (YMRS). A greater proportion of the melatonin group had a large treatment response compared with placebo, registered via Clinical Global Impression (CGI) scores. No differences in depressive symptom outcomes were detected and sleep outcomes were not assessed.

Quested et al. [48] examined melatonin 2 mg modified-release treatment for 3 weeks in a trial involving 41 BD patients during mania or hypomania. Both BD-I and BD-II patients were eligible for enrolment in this trial. Treatment was adjunctive to previously prescribed medication and acute antimanic treatment involving lithium and antipsychotic medication. The authors found no statistically significant difference in improvement between intervention and placebo control group on the mania symptom primary efficacy outcome, assessed using the YMRS. Secondary symptom outcomes suggested a greater proportion of symptom improvement among the melatonin group on self-reported mania symptoms assessed by the Altman Self-Rating Mania scale (ASRM), and depressive symptoms assessed by the Quick Inventory of Depressive Symptomatology Self-Report (QIDS-SR). No group-wise differences were found for sleep outcomes assessed via the LSEQ and wrist-worn actigraphy, the baseline of which was limited to only one night before randomisation.

Three studies examined ramelteon use. McElroy et al. [54] examined adjunctive ramelteon treatment for 8 weeks in a trial involving 21 BD-I patients with mild-to-moderate mania and sleep disturbance. Participants were instructed to take the medication 30 minutes before bedtime. This study was the only RCT in this review to examine insomnia symptom change as a primary outcome, which did not differ between ramelteon and placebo control groups. No statistically significant differences in secondary outcomes, inclusive of YMRS mania severity, were detected in the endpoint analysis. Longitudinal analysis suggested that ramelteon may have decreased depressive symptoms.

The two remaining ramelteon studies examined BD-I participants during stable (euthymic) state. Norris et al. [55] examined 83 BD patients with sleep disturbance treated with adjunctive ramelteon or placebo for up to 24 weeks, taken at bedtime. The authors report that participants randomised to the ramelteon group were significantly less likely to relapse (depression or mania; odds ratio 0.48, p = 0.024). Secondary outcomes examining depressive symptoms (MADRS) and sleep quality (PSQI) did not differ between groups at the week 24 study endpoint. Longitudinal analysis suggested lower depressive symptoms at weeks 8 and 16 and better sleep quality at weeks 8 and 12 in the ramelteon-treated group, but these differences were not sustained until the end of the study. The reliability of this secondary analysis is negatively impacted by a high drop-out rate and the LOCF imputation method used (see Sect. 3.6 quality assessment and risk of bias).

Mahableshwarkar et al. [53] differed from all other studies in this review by conducting a trial with four treatment groups; three different formulations of sub-lingual ramelteon (doses: 0.1, 0.4, or 0.8 mg) versus sub-lingual placebo. In total, 642 participants were randomised between groups to receive treatment for 12 months. Treatment was adjunctive to the usual maintenance treatment of the patient which included mood stabilisers, antipsychotics, and indicated antidepressants. The authors report no differences between any ramelteon group and the placebo group on the time-to-relapse primary efficacy outcome. Secondary outcomes showed that the 0.1-mg dose group had improved self-reported quality of life scores, which was statistically significant compared with placebo. This study did not examine sleep as an outcome. The trial ended prematurely as it met futility criteria.

3.4 Quantitative Synthesis

Supplementary Table 1 shows the data extracted for quantitative analysis (see ESM). The quantitative analysis included five of the six identified RCT studies. The study by Mahableshwarkar et al. [53] was not included as it compared three different ramelteon intervention groups with placebo and additionally because it used a non-standard formulation of ramelteon (sub-lingual). CGI scale scores were not quantified due to differences among studies in reporting manner and differently reported subscales. Three studies reported data that examined sleep quality; McElroy et al. [54] and Norris et al. [55] both used the PSQI, and Quested et al. [48] used the LSEQ. The agomelatine trial by Yatham et al. [56] examined PSQI changes but data were not available to extract. No study demonstrated differences between treatment and placebo. Four studies examined symptoms of mania and all assessed symptoms using the YMRS [45,46,47,48]. Only in the study by Moghaddam et al. [47] did treatment (melatonin) perform significantly better than placebo (Hedges’ g = − 1.24; p < 0.001). Four studies examined symptoms of depression assessed using the MADRS [55, 56], HDRS [47], or the Inventory of Depressive Symptoms (IDS) [54]. No study demonstrated statistically significant differences between treatment and placebo.

3.5 Meta-Analysis

Figure 2 shows separate forest plots for the following treatment outcomes: (A) sleep quality, (B) manic symptoms, and (C) depressive symptoms. For the model estimating sleep quality, two studies using PSQI were pooled (as LSEQ is a multi-domain visual analogue scale with no overall score). The estimated mean effect size for sleep quality was non-significant at g = − 0.04 (95% CI − 0.81 to 0.73; p = 0.92; I² = 59.4, 95% CI 0–90.5, κ = 2). The estimated mean effect size for symptoms of mania was g = − 0.44 (95% CI − 1.03 to 0.14; p = 0.14; I² = 73.2, 95% CI 24.5–90.5, κ = 4) and for symptoms of depression was g = − 0.10 (95% CI − 0.27 to 0.08; p = 0.28; I² = 0.00, 95% CI 0.00–75.1, κ = 4); neither model was statistically significant.
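As a consistency check on the pooled estimates reported above, a two-sided p value can be approximately recovered from an effect size and its 95% confidence interval. The Python sketch below assumes a symmetric, normal-approximation interval (an assumption; the R ‘meta’ package offers other interval methods), and the function name is hypothetical.

```python
import math


def p_from_ci(estimate, ci_low, ci_high, z_crit=1.96):
    """Approximate (se, z, two-sided p) from an estimate and its 95% CI."""
    se = (ci_high - ci_low) / (2 * z_crit)  # half-width divided by critical value
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the normal distribution
    return se, z, p


# Pooled estimates and intervals as reported in the text
reported = {
    "sleep quality": (-0.04, -0.81, 0.73),
    "manic symptoms": (-0.44, -1.03, 0.14),
    "depressive symptoms": (-0.10, -0.27, 0.08),
}
for outcome, (g, lo, hi) in reported.items():
    se, z, p = p_from_ci(g, lo, hi)
    print(f"{outcome}: SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions the recovered p values are approximately 0.92 for sleep quality and 0.14 for manic symptoms, matching the reported values. For depressive symptoms the recovered value is approximately 0.26 rather than the reported 0.28, a gap that is plausibly attributable to rounding of the published estimate and interval.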

Fig. 2 Forest plots summarising the meta-analysis of pooled effect sizes. Forest plots indicate the pooled effect size for studies examining (a) sleep symptoms (Pittsburgh Sleep Quality Index; PSQI), (b) manic symptoms, and (c) depressive symptoms. Effect sizes were estimated using a random-effects model

3.6 Quality Assessment and Risk of Bias

Supplementary Table 2 in the ESM summarises the quality assessment of all included studies, according to the AXIS criteria modified for use in this review. Supplementary Figure 1 and Supplementary Figure 2 summarise the risk of bias of RCT studies using the RoB2 criteria. According to the AXIS criteria, the main limitation we identified was the failure to provide information justifying the sample sizes of participant groups (45% of studies). Furthermore, over half of studies (55%) had industry involvement or other potential conflicts of interest that were deemed to be a potential source of bias.

RCTs were generally well designed and met RoB2 criteria for low risk of bias in most domains. Four studies had an unclear risk of bias arising from potential selection of the reported results, because no statistical analysis plan had been pre-specified. The study by Norris et al. [55] had a high risk of bias on secondary outcome findings due to missing data: because the primary outcome was participant relapse, which was also a reason for dropout, the ramelteon group had twice as many participants as the placebo control group at study endpoint.

4 Discussion

There are few studies examining pharmacotherapeutic interventions for sleep and circadian rhythm disturbance in BD. Although our search remit was wide, the studies identified exclusively involved melatonin or melatonin-receptor agonist drugs, used adjunctively for maintenance treatment or during acute mood episodes. Two recent reviews, including one meta-analysis [34] and a systematic review performed by the International Society for the Study of Bipolar Disorders (ISBD) [14], examined similar interventions and concluded that there was little support for treatment efficacy in BD. Our review identifies two later-published melatonin RCTs that suggest promising treatment benefits during bipolar mania [47, 48]. Furthermore, the current review included agomelatine, the use of which in BD has not, to our knowledge, previously been reviewed.

Over the last decade, several well designed RCTs have been conducted, a substantial development compared with the previous generation of open-label and non-controlled feasibility-only studies. However, our meta-analysis of RCT results failed to demonstrate a superior treatment effect on sleep quality or symptoms of mania and depression. Despite reports of some favourable treatment effects across the studies identified by this review, the literature published to date has several limitations, including heterogeneous treatment and population characteristics, small sample sizes, and methodological constraints. Below we discuss the key outcome parameters of this review, reflect on the limitations of the included studies, and develop recommendations for future studies in this area.

4.1 Sleep Quality and Insomnia Symptoms

Among the studies that examined sleep quality and insomnia symptoms as outcomes [48, 51, 54,55,56], there was no consistent pattern indicating symptom improvement after treatment with melatonin or melatonin-receptor agonists. These findings are unexpected given the use of these substances to treat sleep disorders, but they are consistent with a previous meta-analysis examining the effects of ramelteon treatment in BD that similarly found no benefits for patient sleep [29]. This contrasts with meta-analyses of melatonin [57] and ramelteon [58] trials that demonstrate improvements in subjectively assessed sleep outcomes. However, these studies involved individuals with sleep disorders and had larger sample sizes. A major limitation of the studies identified in this review is the lack of focus on sleep outcomes. Interestingly, earlier non-controlled/non-randomised feasibility studies included experimental sleep outcomes [49,50,51,52] (albeit often non-standardised measures), but sleep was not reported routinely among later RCTs. Furthermore, poor sleep or insomnia symptoms were listed among the inclusion criteria of only two RCTs included in this review [54, 55]. Moreover, no study used polysomnography to objectively measure sleep. These omissions may contribute to the absence of a detectable sleep effect.

4.2 Mania Outcomes

Four RCTs examined manic symptoms in BD treated with either melatonin or ramelteon [47, 48, 54, 55]. Although the estimated pooled effect of treatment was not statistically significant, there was substantial heterogeneity between studies, likely arising from differences in treatment dose and patient mood state. Notably, positive treatment outcomes for manic symptoms were reported in both RCTs that examined adjunctive melatonin treatment during manic episodes [47, 48]. These findings also support the results of the earlier feasibility study of melatonin for bipolar mania by Bersani and Garavini [49]. Together, they suggest that melatonin may produce clinical benefits for patients with BD who are currently experiencing a manic or hypomanic episode.

The chronotherapeutic effects of melatonin represent a possible treatment mechanism. Sleep disturbance and reduced subjective sleep need are core symptoms of mania, with profound sleep pattern changes preceding manic shift [59]. Perturbed molecular and endocrine circadian rhythms also suggest acute circadian dysfunction during mania [27]. Moreover, lower plasma melatonin levels and hypersensitivity to melatonin suppression under light have been identified as bipolar traits, which in turn are hypothesised to contribute to the abnormal circadian entrainment expressed in the disorder [60, 61]. Based on these phenotypes, exogenous melatonin may function as an adjunctive treatment for mania due to its circadian rhythm-entraining properties. Indeed, previous studies that therapeutically target circadian rhythm disturbance and facilitate endogenous melatonin secretion, such as via dark therapy [62] and the use of glasses that block short-wavelength light (‘blue-blocking’ glasses) [63], have demonstrated positive treatment effects for bipolar mania. Despite the biological plausibility of this mechanism for melatonin, ramelteon does not appear to show the same treatment potential based on the studies published to date.

4.3 Depression Outcomes

We found no conclusive support for the use of melatonin or melatonin-receptor agonist drugs as either an acute or maintenance treatment in bipolar depression. Several non-controlled feasibility studies examined agomelatine as a treatment for acute bipolar depression and report positive effects for depressive symptoms and sleep quality [50,51,52]. However, the only RCT identified in this review suggested that agomelatine is not superior to placebo for either outcome [56]. Post-hoc sensitivity analysis of these multicentre data suggests a notably high placebo response rate in this trial may obfuscate a potential treatment effect of agomelatine [56]. There may be limited evidence supporting agomelatine use for bipolar depression from non-controlled studies, but more well designed RCT studies are required to interrogate this further. RCTs that additionally examined the maintenance efficacy of ramelteon report conflicting results [53, 55].

There is a paucity of effective treatments for acute depression in BD. The use of antidepressants is controversial owing to the risk of precipitating elevated mood in BD [64]. There has been much research interest in agomelatine as a first-in-class novel antidepressant with high melatonin-receptor affinity and agonist properties and limited affinity for the other neurotransmitter targets of classic antidepressant classes [65]. The RCT results published by Yatham et al. [56] suggest that a dose of 25–50 mg has a similar safety and tolerability profile to placebo. During the initial 8-week period of the trial, the proportion of participants who experienced manic or hypomanic symptoms was higher in the agomelatine group than in the placebo group, but this difference was not statistically significant [56]. During the 12-month trial extension period, the agomelatine group had a higher frequency of hypomanic episodes than the placebo control group, and an identical number of manic episodes was reported in each group [56].

Multimodal therapy for depression including an 8-week agomelatine course has previously demonstrated statistically significant improvements in depression and sleep symptoms, earlier sleep onset, and a phase advance of the endogenous melatonin rhythm [30]. Notably, in this study antidepressant response was strongly correlated with greater magnitude phase shifts, suggesting that the treatment effects of agomelatine may be driven by modification of the endogenous melatonin rhythm. However, as agomelatine is also a serotonin-receptor antagonist, the antidepressant effects cannot be explained completely by a melatonergic mechanism of action. It is perhaps for this reason that previous reviews examining melatonin and ramelteon in BD have not included agomelatine among the relevant literature on pharmacological interventions that target the circadian timing system. The present literature suggests agomelatine may be safe for BD, but future trials are needed to examine antidepressant efficacy for the disorder. Future mechanism evaluation studies are needed to disentangle the potential chronotherapeutic effects of this drug from its antidepressant properties.

4.4 Other Hypnotic Drug Studies

Surprisingly, this review did not identify any experimental studies examining NBZHs that are typically used to treat insomnia symptoms. Clinical records suggest that NBZHs are widely used by BD patients experiencing insomnia [18]. The efficacy of z-drugs is established for the management of insomnia disorder [66], but crucially these drugs are indicated for short-term use only, whereas continuous use of z-drugs for a period > 6 months and polypharmacy with benzodiazepines have been demonstrated in BD [17]. Moreover, almost 75% of chronic NBZH users identified by chart review were also taking antimanic maintenance treatment, suggesting incomplete remission of sleep symptoms with maintenance treatment alone [18].

Benzodiazepines are sometimes used adjunctively for the initial stabilisation of symptoms of agitation and restlessness during mania. Treatment guidelines indicate that benzodiazepines should not be used for long periods because of the risk of tolerance and dependence. We are not aware of studies exploring the adjunctive antimanic potential of NBZHs. Prescription patterns also indicate concerns about the long-term use of NBZHs comparable to those for benzodiazepines [17, 67]. Perhaps safety concerns limit the consideration of these medications as an experimental adjunctive antimanic or short-term maintenance treatment. This would explain the lack of experimental studies compared with other drug classes identified in this review. Until more evidence is available, the potential clinical benefits of NBZHs for BD remain unclear.

4.5 Study Limitations and Future Recommendations

There are several methodological limitations that are common across virtually all studies identified by this review. Here we discuss the prevailing limitations and develop recommendations for the design of future studies.

Despite an explicit focus on interventions that are hypothesised to ameliorate disrupted sleep–wake patterns in BD, we note that adequate assessment of sleep as an outcome parameter is surprisingly lacking. Sleep quality and symptoms of insomnia were mainly assessed using self-reported Likert-style questionnaires or visual analogue scales. Such measures do not have the detail to inform our understanding of variables such as sleep timing, sleep onset latency, and sleep duration. Sleep diaries could be used to better understand the effect of treatment course on sleep outcomes and could be easily recorded using personal electronic devices. Although one trial examining bipolar mania also employed actigraphy, the baseline period was limited to one night [48]. Obtaining a satisfactory baseline from manic or hypomanic patients referred rapidly for acute treatment may represent a challenge in future studies. Ideally, objective assessment of sleep–wake patterns using actigraphy would greatly improve the quality of future studies where feasible.

The gap between studies that describe circadian rhythm disturbance in BD and clinical studies examining the effects of melatonin and melatonin-receptor agonists is striking. Most of the studies identified in this review discuss the putative chronotherapeutic effects of these substances but none included physiologic or behavioural assessment of circadian rhythm phase. Dim-light melatonin onset (DLMO) is a measure of endogenous melatonin phase that is a reliable phase marker of the human circadian timing system [68]. DLMO has been utilised previously in clinical efficacy and mechanism evaluation studies involving exogenous melatonin and agomelatine to confirm that these substances elicit phase-advancing effects on sleep timing and physiologic circadian rhythms [28, 30]. DLMO assessment can be obtained from blood samples, or non-invasively from saliva samples, collected at regular intervals in the evening under a controlled dim-light environment that does not suppress melatonin levels. Incorporating detailed phase markers such as DLMO into future clinical studies should enable researchers to estimate the impact of exogenous melatonin and melatonin-receptor agonists on the circadian timing system and determine whether this predicts improvement of symptom outcomes. This might be an important step in elucidating the mechanistic effects of treatment. However, this may also be challenging for larger studies due to the requirement of a controlled lighting environment in the late evening and the additional expense of laboratory assessment. Perhaps it is for these reasons that we did not detect any studies that utilised DLMO. Clearly, the focus of clinical trials is foremost on core symptom outcomes that are of potential clinical significance, which require sample sizes that are often prohibitive for detailed physiologic assessment of circadian rhythm parameters. However, at minimum the quality of measures could be improved by monitoring parallel changes in sleep diary data and individual preferences for organising activities in the morning compared with the evening (a trait known as ‘chronotype’ previously examined in BD [24]). Objective monitoring of the 24-h rest–activity pattern using actigraphy could also be used to derive a phase marker of human activity for future studies, providing an estimate of circadian rhythm phase.
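One way an activity-based phase marker could be derived from actigraphy is a single-component cosinor fit, which estimates the clock time at which a fitted 24-h cosine curve peaks (the acrophase). The sketch below is illustrative only: cosinor analysis is our choice of example rather than a method used by the included studies, the function name `cosinor_acrophase` is hypothetical, and the closed-form estimate assumes evenly spaced samples that start at midnight and cover a whole number of days.

```python
import math

def cosinor_acrophase(counts, samples_per_day):
    """Return (acrophase in clock hours, amplitude) of a 24-h cosine fit.

    Assumes evenly spaced samples starting at 00:00 and spanning a whole
    number of days, so the least-squares fit reduces to simple sums.
    """
    n = len(counts)
    if samples_per_day < 3 or n == 0 or n % samples_per_day != 0:
        raise ValueError("need whole days with at least 3 samples per day")
    omega = 2 * math.pi / samples_per_day  # radians per sample
    # Least-squares coefficients of the cosine and sine components
    beta = 2 / n * sum(y * math.cos(omega * i) for i, y in enumerate(counts))
    gamma = 2 / n * sum(y * math.sin(omega * i) for i, y in enumerate(counts))
    # Convert the fitted phase angle to a clock time in hours
    phase = math.atan2(gamma, beta)
    acrophase_h = (phase / (2 * math.pi) * 24) % 24
    return acrophase_h, math.hypot(beta, gamma)

# Synthetic hourly activity over 3 days, constructed to peak at 15:00
activity = [10 + 5 * math.cos(2 * math.pi * (i - 15) / 24) for i in range(72)]
peak_h, amplitude = cosinor_acrophase(activity, samples_per_day=24)
print(f"acrophase = {peak_h:.1f} h, amplitude = {amplitude:.1f}")
```

A shift in acrophase between baseline and post-treatment recordings would give a behavioural estimate of phase change, although real actigraphy data are noisy and often non-sinusoidal, so this marker is coarser than DLMO.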

Better consideration of chronobiological principles could also improve the design of future studies. Importantly, the effect of chronotherapy on sleep–wake patterns is subject to inter-individual differences in the circadian timing system. Differences in administration time may have profound effects on the magnitude of circadian phase shifts. For example, examination of the phase-response curve indicates that melatonin can result either in an advance in sleep-phase onset, if it is administered at the appropriate time, or in an unwanted delay in sleep-phase onset, if the treatment is administered too late in the evening [28]. In contrast to reporting standards common in behavioural chronotherapy trials [14], several studies identified here do not report the exact time at which participants were instructed to take the melatonin/melatonin-receptor agonist or placebo. It is therefore challenging to evaluate patients' usual pattern of use. Administration time was also not matched to the underlying endogenous rhythm of melatonin for each individual, which could be estimated using DLMO, for example. Thus, any potential treatment effect may have been masked or reduced if timing was not properly calibrated to the circadian phenotype of the patient. It is difficult to predict the effects that exogenous melatonin and melatonin-receptor agonists might have on sleep and circadian rhythms without information about the underlying endogenous circadian rhythm of melatonin. Where DLMO assessment is not possible, future studies might employ a behavioural estimate of circadian phase, such as the midpoint of the sleep phase for each individual, to standardise a personal time window of administration.
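As a hedged illustration of the behavioural phase estimate mentioned above, the sketch below computes the midpoint of sleep from sleep diary entries and averages it across nights. The function names `mid_sleep` and `mean_clock_time` and the diary values are hypothetical; how an administration window should be positioned relative to this midpoint is a protocol decision that the sketch deliberately leaves open.

```python
import math

def mid_sleep(onset_h, wake_h):
    """Midpoint of one sleep period in clock hours (handles midnight)."""
    duration = (wake_h - onset_h) % 24  # hours asleep, wrapping past 24:00
    return (onset_h + duration / 2) % 24

def mean_clock_time(hours):
    """Circular mean of clock times, so that times either side of
    midnight average to around midnight rather than to midday."""
    angles = [h / 24 * 2 * math.pi for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return (math.atan2(s, c) / (2 * math.pi) * 24) % 24

# Hypothetical diary: (sleep onset, wake time) in decimal clock hours
diary = [(23.0, 7.0), (23.5, 7.5), (0.5, 8.5)]
midpoints = [mid_sleep(onset, wake) for onset, wake in diary]
personal_midpoint = mean_clock_time(midpoints)
print(f"mean midpoint of sleep = {personal_midpoint:.2f} h")
```

Averaging on the circle avoids the error that an arithmetic mean would introduce for participants whose sleep midpoints fall on both sides of midnight, which may be relevant in manic or hypomanic patients with irregular sleep timing.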

Additionally, there was substantial variation between studies regarding the dose of melatonin, which ranged between 2 and 10 mg. It is not clear whether different dose–response effects may have been elicited between studies. Although some evidence suggests that the dose of melatonin is less important than time of administration for human circadian entrainment [69], higher doses produce proportionally higher circulating levels of melatonin [70]. Differences in circulating levels of melatonin between studies could translate to more pronounced soporific effects as dose increases [28]. Furthermore, Quested et al. [48] suggest that a slower-release formulation such as the one used in their trial may attenuate the phase-advancing potential of melatonin by creating a spill-over effect that causes an unwanted phase delay later in the night. Future work is needed to clarify the optimal dose and formulation and to build on the potential beneficial effects of melatonin treatment in patients with bipolar mania and hypomania.

Finally, we note the limitation arising from concurrent medication use in virtually all studies assessed. Lithium and antiseizure and antipsychotic medications are recommended treatments for BD both during acute mood episodes and during maintenance therapy, and hence had common usage across the studies reviewed here. Lorazepam was also used as a supplementary antimanic treatment in one trial [47]. Importantly, these medications also have sedating properties and there is evidence suggesting that the mechanism of lithium involves the circadian timing system [35]. It is therefore challenging to isolate the effects of melatonin/melatonin-receptor agonists from standard BD treatments and rule out synergistic treatment effects. Well balanced placebo-controlled studies involving treatment as usual afford some experimental control, but we note that the limitation of concurrent use of other medications is often present in this patient group.

4.6 Limitations of the Current Review

We highlight several limitations of this systematic review and meta-analysis. Principally, the pooled analysis of treatment effect sizes was limited by data derived from studies with small sample sizes and outcomes assessed over short durations. Further, few studies provided justification for the sample size selected and, as such, may have produced false-positive results and conclusions. Thus, we caution that the pooled effect sizes generated in our meta-analysis may be unstable and could change as more findings emerge. There was heterogeneity between studies arising from differences in the gender composition, wide age range, and mood state of patient samples as well as the treatment type and dose. As this was anticipated, we planned a random-effects meta-analytic model, which assumes differences in treatment effects between studies. However, there were too few studies to conduct separate quantitative subgroup analyses for mood state or treatment type or to compare studies involving BD-I or BD-II patients exclusively or combined. Given these limitations, we consider the summary and critical appraisal of trial methodology and description of literature trends to be the most informative components of this review for guiding the development of future studies.

The focus of this review was the treatment potential of substances with a supposed hypnotic or circadian rhythm-entraining mechanism of action. Hence, the recommendations we produced for future studies mainly reflect findings from studies on human sleep and chronobiology. However, this approach may also be considered a limitation of this review. Alternative mechanisms of action of these substances were not explored, except for serotonergic transmission in agomelatine which was discussed briefly. Importantly, melatonin may have several other beneficial effects for people with BD and comorbid physical illnesses via anti-inflammatory and metabolic effects, for example [71]. Future trials are needed to investigate the non-circadian mechanisms of these substances in BD.

5 Conclusions

Few randomised placebo-controlled studies exist that examine pharmacotherapeutic interventions for sleep and circadian rhythm disturbance in BD, and these are limited to melatonin and related melatonin-receptor agonist medications. There is a paucity of high-quality evidence concerning the benefits of these substances for sleep disturbance in BD. Adjunctive melatonin treatment appears to exert clinical benefits for the acute treatment of mania, but its efficacy as a maintenance treatment remains unexamined. Ramelteon may produce benefits in preventing depressive relapse, but conflicting results require further investigation. More trials are needed to assess the potential of agomelatine in bipolar depression. It is strongly recommended that the design of future studies be improved by incorporating thorough evaluation of sleep outcomes and insomnia symptoms and by contemporaneous monitoring of physiologic circadian rhythms. Circadian rhythm disturbance is a recurring theme in BD research. However, until clinical trials objectively measure circadian rhythm changes associated with treatment in BD, the link between potential efficacy and therapeutic mechanism will remain unclear.