Introduction

From 2000 to 2010, the number of women in the age range expected to transition into menopause (50–54 years of age) increased by 26.6 % [1]. Worldwide, it is estimated that by 2025 the number of postmenopausal women will be 1.1 billion. With increased life expectancies, women live a third of their lives after menopause, and for some the decline of menopausal symptoms takes many years [2]. The burden associated with untreated menopausal symptoms results in more frequent outpatient visits and incremental health care costs [3]. Menopausal hormone therapy (MHT) is one of the most common treatments used to counteract these symptoms.

The applicability of the evidence for the use of MHT is complicated by the heterogeneity of available trials in terms of the age at which MHT is initiated, dose and type of estrogen, contraindications, and adjunct therapies. Additionally, concerns raised by the Women’s Health Initiative (WHI) report in 2002 led guidelines to advise shorter exposure to MHT. Yet, a decade after the WHI, the use of low-dose MHT has remained constant [4], gynecologists continue to favor MHT [5], and guidelines recommend MHT as the most effective treatment for menopause symptoms, including sleep disturbances [6].

Approximately 40–60 % of menopausal women report sleep-related symptoms, the most common complaint being nighttime awakenings [7]. The mechanism by which sleep disturbances arise during menopause is still unclear, and studies characterizing how other menopausal symptoms are associated with sleep alterations are conflicting. An inverse relationship between sleep quality and vasomotor symptoms (VMS) has been reported. Sleep difficulties, however, could present independently [8]. Poor sleep is a risk factor for cardiovascular disease, diabetes, obesity, and neurobehavioral dysfunction [9]. Therefore, reducing the burden of emerging sleep symptoms during menopause will result in an improvement in quality of life and overall health.

Sleep symptoms can be measured objectively (e.g., polysomnography) or subjectively (e.g., questionnaire, severity scale, or diary). A previous systematic review has shown that patient-reported measurements are highly predictive of quality of sleep [10], and a guideline has emphasized that such measures are important for diagnosing and monitoring response to treatment in many sleep disorders, including insomnia [11]. Further, they both empower patients and aid clinicians in recognizing and valuing the patient’s perspective in response to treatment [12, 13]. These previous publications, however, addressed all adults and did not tailor their conclusions to postmenopausal women. Therefore, understanding the effects of MHT on subjective sleep quality is important in helping patients and their clinicians manage the symptoms of menopause. However, synthesized evidence regarding the effects of MHT on sleep quality is scarce, leading to clinical uncertainty when choosing the best treatment.

This systematic review and meta-analysis aims to (1) evaluate the effects of MHT on self-reported sleep outcomes when compared to placebo in postmenopausal women and (2) explore the use of a multi-domain assessment of sleep quality across trials.

Methods

This review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statement [14] and was guided by a registered protocol (PROSPERO CRD42015027189). Screening and extraction were performed using online software (https://www.covidence.org/).

Eligibility criteria

Randomized clinical trials (RCTs) that compared the effects of MHT formulations, with each other or with placebo, on self-reported outcomes within a sleep questionnaire, symptom scale, or quality of life assessment tool were included. Selection was not restricted by blinding scheme, type or dose of MHT, whether sleep was a primary or secondary outcome, or type of self-reported sleep measurement tool. The minimum intervention length was 8 weeks. This timing was chosen arbitrarily, as there is no current agreement on the duration after which MHT-related changes in sleep quality would be anticipated. However, MHT alleviation of other menopausal symptoms has shown benefit as early as 8 weeks [15]. Trials where MHT was combined with compounds other than progesterone-derived compounds or selective estrogen receptor modulators were excluded. Women above 40 years old at any stage of natural or surgical menopause were included [16].

Identification and selection of trials

An experienced librarian developed search strategies, using methods recommended by the Institute of Medicine [17], in the following databases: PubMed, Scopus, Ovid MEDLINE, Ovid EMBASE, Ovid EBM Reviews CENTRAL, and Ovid PsycInfo (for the search strategy, see the eAppendix in the electronic supplementary material). The search included MeSH headings and keywords such as menopause, estrogen, and sleep. Databases were searched from 2002 to October 2015, aiming to gather evidence produced or published during and after the WHI reports. The electronic search was supplemented by hand searching eligible articles. There were no language restrictions; non-English articles were translated by fluent bilingual speakers. Full texts of included trials were screened in duplicate and independently (κ = 0.74) [18], and disagreements were resolved by arbitration.
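The agreement statistic quoted for duplicate screening is presumably Cohen's kappa for two reviewers; the text does not state which variant was used. As a minimal sketch under that assumption, with binary include/exclude decisions and a hypothetical function name, chance-corrected agreement could be computed as follows. The counts in the usage note are invented for illustration and are not the review's screening data.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions.

    Arguments are the four cells of the 2x2 agreement table (counts).
    """
    n = both_include + only_a + only_b + both_exclude
    # Proportion of articles on which the two reviewers agreed.
    observed = (both_include + both_exclude) / n
    # Marginal inclusion rates for reviewer A and reviewer B.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance alone, from the marginals.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)
```

For example, cohens_kappa(20, 5, 10, 15) gives an observed agreement of 0.70 against a chance agreement of 0.50, so kappa is 0.40.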

Data collection and study appraisal

Data were extracted using an electronic form designed by the reviewers, which was tested and piloted and contained information on patient characteristics, intervention descriptions, methodological quality indications, and outcomes of interest. Data extraction and risk-of-bias assessment were performed independently by two reviewers. Mean and standard deviation at baseline and longest follow-up were extracted for the outcome. RCTs were assessed for methodological quality using the Cochrane Risk of Bias Tool [19]. Available study protocols were searched in trial registries. If the blinding of study participants or personnel was rated to be at a high or unclear risk of bias, the trial was considered to be at high risk of bias overall. If all domains were judged to be at low risk of bias, the trial was considered at a low risk of bias. Otherwise, the trial was considered to be at a moderate risk of bias. The quality of evidence was evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [20].
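The rule for deriving an overall risk-of-bias rating from the individual domains can be written as a short decision function. The sketch below is only an illustration of the rule as stated in the text; the domain keys (for example "blinding_participants_personnel") and the rating labels are hypothetical names, not identifiers from the Cochrane tool or from the extraction form.

```python
def overall_risk_of_bias(domains):
    """Derive an overall rating from per-domain ratings.

    domains: dict mapping a domain name to 'low', 'unclear', or 'high'.
    The key 'blinding_participants_personnel' is an assumed name.
    """
    blinding = domains["blinding_participants_personnel"]
    # High or unclear risk in participant/personnel blinding -> high overall.
    if blinding in ("high", "unclear"):
        return "high"
    # Every domain at low risk -> low overall.
    if all(rating == "low" for rating in domains.values()):
        return "low"
    # Any other combination -> moderate overall.
    return "moderate"
```

Under this rule, a trial with adequate blinding but an unclear sequence-generation domain is rated moderate, whereas an unblinded trial is rated high regardless of its other domains.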

Author contact

When scores for sleep items within questionnaires were not available or when clarification was necessary, the corresponding author was contacted by e-mail. If there was no response, a second, final e-mail was sent after 2 weeks. Authors were given 6 weeks to answer and send the requested information.

Meta-analysis

Standardized mean differences (SMD) were pooled using random-effects models. This approach was preferred because the construct of sleep quality was evaluated using different scales; the results were therefore standardized and expressed in standard deviation units to allow meta-analysis. SMD results can be interpreted as 0.2 = small effect, 0.5 = moderate effect, and 0.8 = large effect [19]. For all trials, lower scores indicated better sleep quality (the direction was reversed for one trial to be consistent with the rest). In a trial with more than one active MHT arm, the weighted SMD between groups was compared to placebo. To explain possible inconsistencies across trials, a sensitivity analysis was used to assess the effect of the WHI on the pooled effect estimate. Inconsistency of effects across trials was assessed using forest plots and the I² statistic, with values over 50 % indicative of moderate to high heterogeneity [21]. Statistical analyses, including overall and subgroup effect estimates, were done using Review Manager v5.3 [22].
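To make the pooling steps concrete, the following sketch computes a bias-corrected SMD (Hedges' g) per trial, pools the trials with a DerSimonian-Laird random-effects model, and reports I². It is an illustration under stated assumptions rather than a reproduction of the Review Manager calculations: the variance formula is one common approximation, the estimator choice is assumed, and the function names are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance) for treatment (group 1) versus control (group 2)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample bias correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with 95 % CI and I^2 (%)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-trial variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"smd": pooled, "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2, "i2": i2}
```

A trial whose scale runs in the opposite direction (higher scores meaning better sleep) would have the sign of its g flipped before pooling, which corresponds to the direction reversal described above.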

Subgroup analysis

Pre-specified subgroup analyses were performed to explore heterogeneity. Trials with inclusion criteria restricted to women with VMS (hot flashes and night sweats) were compared to trials with no VMS criteria. To address duration of MHT and risk of bias, subgroup analyses by duration of intervention (8 weeks vs. >8 weeks) and by overall risk of bias (moderate vs. high) were performed.
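A comparison between two subgroups is typically summarized by a test for subgroup differences. As a hedged sketch, assuming exactly two subgroups, each summarized by a pooled SMD and its standard error, the between-subgroup heterogeneity statistic and its p value (chi-square with one degree of freedom) can be obtained as below. The function name is hypothetical, and this is not claimed to be the exact routine used by the analysis software.

```python
import math


def subgroup_difference(est_a, se_a, est_b, se_b):
    """Test for a difference between two subgroup estimates.

    Returns (Q, p), where Q follows a chi-square distribution with
    1 degree of freedom under the hypothesis of no subgroup difference.
    """
    q = (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)
    # Upper tail of chi-square(1) equals the two-sided normal tail of sqrt(Q).
    p = math.erfc(math.sqrt(q / 2))
    return q, p
```

With more than two subgroups the statistic generalizes to a weighted sum of squared deviations from the combined estimate, and the p value requires a chi-square distribution with (number of subgroups − 1) degrees of freedom.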

Outcome assessment

Two board-certified sleep specialists (M.L.; R.L.) classified sleep items across multiple measurement tools using the seven sleep domains of the Pittsburgh Sleep Quality Index (PSQI). This allowed standardizing results to seven sleep characteristics routinely assessed in clinical interviews of patients with sleep complaints [23]. The PSQI was found to cover most domains of relevance to researchers when studying sleep disorders [10]. For each questionnaire with at least one self-reported sleep item, each sleep specialist reviewed the questionnaire items and dichotomized each under primary and, if applicable, secondary domains of sleep quality. Conflicts were resolved by consensus.

Results

Search strategy and contact of authors

The search identified 424 articles; of these, 234 were excluded at title and abstract screening. The full-text assessment of the remaining 190 articles resulted in 64 meeting eligibility criteria. After merging multiple publications of the same trial, a total of 42 RCTs were included (Fig. 1). Thirty-five trials were missing data necessary for appraisal. Authors were contacted for 23 trials; the remaining 10 did not provide contact information. Two of 23 contacted authors provided the requested data [24, 25]. Thereafter, a total of nine trials had complete reporting of sleep quality.

Fig. 1

PRISMA flow diagram for the study selection process. * denotes that the number of included articles does not match the number of included RCTs, as some trials had multiple publications

Description of trials

The data from 42 trials were used for qualitative assessment and are summarized in supplementary eTable 1. Across all RCTs, 21 compared MHT interventions to each other and nine compared MHT to placebo. The most commonly administered formulation was oral conjugated equine estrogen (o-CEE) at a dose of 0.625 mg/day (12/42 trials).

The definition of menopause was variable. Most trials used self-report of last menstrual period (LMP) as the definition, with intervals post-LMP ranging from 6 months to 10 years. Seven of the 42 trials included sleep quality as a primary outcome measure. Across trials, significant variability was found in reports of the effect of MHT on sleeping problems.

One trial was judged to be at low risk of bias, 23 (55 %) at moderate risk and 18 (43 %) at high risk (supplementary eTable 2). Sequence generation and blinding of outcome assessors were the domains least reported. The 18 trials rated at high risk of bias had either not blinded participants or not clearly reported blinding methodology.

Effects of MHT on sleep quality

Of the 42 trials, nine reported mean and standard deviation at baseline and longest follow-up. Seven of the nine trials included a placebo treatment arm [15, 24–29], while two had parallel comparisons of MHT formulations [30, 31]. Therefore, the seven RCTs with a placebo arm as comparator had similar interventions and reported sufficient quantitative data to allow for statistical pooling (Table 1). The trials were at moderate to high risk of bias (Table 2).

Table 1 Characteristics of included eligible randomized placebo controlled trials with available data for meta-analysis
Table 2 Cochrane risk of bias quality assessment

Subgroup analysis showed that MHT improved sleep quality among women who had concomitant VMS [SMD −0.54 (−0.91 to −0.18, I² = 0 %), moderate quality evidence]; test for subgroup difference p = <0.007. No significant difference was noted in trials that included women without VMS criteria [SMD −0.04 (−0.15 to 0.24, I² = 43 %)], or when both groups were combined (with and without VMS) [SMD −0.12 (−0.37 to 0.13, I² = 66 %)]. Results are depicted as a forest plot in Fig. 2.

Fig. 2

SMD for subgroup analysis by VMS, smaller scores indicate better sleep quality. The green square markers indicate standardized mean difference from primary studies, with sizes reflecting the statistical weight of the study using random-effects meta-analysis. The horizontal lines indicate 95 % confidence intervals. The diamond markers represent the subtotal and overall effect estimate and 95 % confidence intervals. SMD interpretation, 0.2 = small effect, 0.5 = moderate effect, >0.8 = large effect

A sensitivity analysis performed to examine whether the WHI affected the effect estimate showed no significant difference [SMD −0.17 (−0.35 to 0.02, I² = 53 %)]. Subgroup analyses comparing duration of MHT and risk of bias did not show significant differences, as shown in supplementary eFig. 1 and eFig. 2.

Outcome assessment

Across 31 self-report sleep tools, the most frequently assessed domains of sleep quality were daytime dysfunction, followed by sleep quality and sleep disturbances (Fig. 3). Prior use of medication as a sleep aid was assessed in only two scales. Three scales were not accessible for item dichotomizing: two were independently created by authors’ institutions and were not provided, and one was inaccessible through library resources (supplementary eTable 3).

Fig. 3

Distribution of the PSQI seven domains of sleep quality across 27 self-reported sleep scales used in included studies

Quality of evidence

Following the GRADE approach, the certainty in the estimates was moderate in women with VMS and low in women without VMS (Table 3).

Table 3 Summary of the evidence quality grading using GRADE

Discussion

Summary of evidence

In this systematic review and meta-analysis of the effects of MHT on sleep quality, seven RCTs provided similar interventions and sufficient data for meta-analysis. MHT was associated with modestly improved sleep quality in women with concomitant VMS at baseline. The effect of MHT is uncertain in women without VMS.

The heterogeneity in trial populations and formulations of MHT limits conclusions. The absorption, distribution, and metabolism of MHT differ among women based on genotype, age, distribution of adipose tissue, comorbidities, and use of other medications [32]. These covariates should help guide the design of future comparative effectiveness trials. Following the WHI, the use of low-dose transdermal estrogen increased more than tenfold [4], yet only three trials [24, 33, 34] had a direct comparison between routes of administration. Additionally, there is still a need for a standard definition of menopause, given that both age and years from menopause have been shown to be important indicators of the benefit-risk ratio of MHT [35].

Self-reported sleep quality captures different parameters of sleep than objective measurements [36]. The lack of accessibility to polysomnography resources, and the limited utility of this clinical test within a large population setting, support the need to develop validated self-reported sleep measurements in menopausal women. It is understood that sleep is best characterized across multiple domains, including quality, duration, continuity, and effects on daytime function [37]. A thorough assessment of these measurable characteristics of sleep quality results in a detailed sleep scenario that is understood by both health professionals and patients [38].

In the present analysis, daytime dysfunction, sleep disturbances, and overall sleep quality were the most commonly assessed domains. The other domains were infrequently incorporated into questionnaires, including sleep duration and latency (the ease of falling asleep), which have both been associated with negative health outcomes such as higher mortality, coronary heart disease, and diabetes [38]. This finding underscores the importance of further work in the area of standardizing sleep assessment tools.

Limitations and strengths

This systematic review faces a number of limitations. First, the majority of trials lacked a baseline screen for sleep disorders. After menopause, there is an increased risk of sleep-disordered breathing due to fluctuating hormones and weight gain. Yet, only three trials [27, 39, 35] had exclusion or testing criteria for sleep-related breathing disorders, narcolepsy, or periodic limb movements. Second, evidence from this review cannot discern the magnitude of effect on sleep quality through indirect reduction of mood disturbances or frequency and severity of VMS, both known to affect sleep. This is due to heterogeneity in enrollment, as most trials do not follow the recommendations of the Food and Drug Administration for studies assessing treatment of moderate to severe VMS, where participants enrolled should have a minimum of 7 to 8 moderate to severe hot flushes per day, or 50 to 60 per week, at baseline [40]. Finally, MHT formulations vary in the inclusion of progesterone or selective estrogen receptor modulator compounds. Progesterone has independent effects on sleep through anxiolytic and respiratory stimulant action [41]. A number of studies used synthetic progestin derivatives that may not have the same effects as progesterone. Therefore, the independent effects of estrogen vs. progesterone and progestin compounds require further evaluation.

This systematic review also has several strengths. It provides a comprehensive review of the current evidence guided by an a priori registered protocol, with an extensive search for eligible studies in multiple databases including both published and unpublished work. All of the included studies were assessed in duplicate, and registered protocols were searched in online trial protocol databases. We incorporated an in-depth assessment of the sleep quality outcome scales to best provide information for clinicians to understand and apply in practice. This is, to our knowledge, the first use of a dichotomizing scheme to bring together self-reported sleep quality scales in menopausal women.

Implications for practice and research

Sleep disturbances are a common indication for MHT. Yet, at present there is insufficient evidence to determine how a woman’s self-reported sleep quality during menopause is affected by different routes of administration or formulations of MHT.

Future research focused on determining whether MHT is beneficial in improving sleep quality during menopause will be best addressed by head-to-head RCTs between various formulations and routes of administration. Likewise, consensus is needed on a sleep quality assessment tool that covers sufficient domains of sleep quality and is validated in menopausal women. We suggest use of the seven major sleep domains listed in the PSQI. Yet, important questions specific to menopause must also be addressed in the context of sleep quality. The evidence derived from this systematic review suggests that MHT benefits sleep in women with VMS. This is congruent with current North American Menopause Society recommendations, where use of MHT for sleep disturbance is suggested for women with bothersome nighttime hot flashes [42]. Other guidelines lack specifics regarding MHT use and may benefit from incorporating these results.

Conclusion

It is imperative to endorse meaningful conversations about sleep quality in the clinical setting with female patients nearing menopausal age and to tailor treatment recommendations towards patient-specific complaints. Validated clinical screening tools addressing various domains of sleep quality are needed in order to better describe the current burden of sleep disturbances in this population.