Introduction

Mental health professionals have traditionally used objective measures to assess outcomes in patients with bipolar disorders (BD). However, with an increasing emphasis on patient-centered medicine in the past several years, subjective assessment of patient experiences has increasingly drawn attention [1]. In this context, quality of life (QoL) has been recognized as a key outcome in the assessment of patients with BD [2, 3]. The World Health Organization (WHO) defines QoL as “an individual’s perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns” [4: p. 1405]. Based on this definition, QoL is a broad concept encompassing several physical, social and psychological domains of personal well-being that go beyond the usual borders of outcome assessment in the clinical setting.

The United States Food and Drug Administration [5] has emphasized the importance of using patient-reported outcomes, such as QoL, for assessing the efficacy of a medication. Moreover, the European Medicines Agency [6] has stated that measuring QoL is important for a comprehensive understanding of a patient’s overall health. Therefore, it is important for mental healthcare professionals to attend to the QoL of patients with BD. Several systematic reviews have demonstrated that BD places an enormous burden on the QoL of patients living with the condition [7–9]. Bipolar disorder impairs QoL to a greater extent than chronic physical illness [7]. Indeed, QoL is impaired even in patients with BD who are considered to be euthymic [2]. Reliance solely on objective symptom evaluation will not, therefore, necessarily capture the entire picture. Moreover, it has been shown that patients with BD have important illness-related concerns that go beyond symptom alleviation [10]. It is therefore recommended that in patients with BD, therapeutic goals always allow for trade-offs between treatment effects and QoL [11].

Most previous studies have used generic instruments such as the 36-Item Short Form Health Survey (SF-36) or EuroQoL to assess QoL in patients with BD [9]. Generic measures are useful in that they allow for comparisons of QoL among groups with different health conditions. However, such measures often fail to capture disease-specific impairment in well-being. For example, issues related to identity, finances and religion/spirituality may be of particular concern in patients with BD, but many QoL measures fail to assess these domains of interest [12]. Moreover, compared with disease-specific measures, generic instruments are often less responsive to change and therefore are less sensitive to the effects of therapeutic interventions [13].

Given the clear rationale for a BD-specific QoL instrument, in 2010, Michalak et al. [12] developed a disease-specific QoL instrument for patients with BD (QoL.BD) with valid and reliable scores. Moreover, a brief 12-item version (contents include physical, sleep, mood, cognition, leisure, social, spirituality, finance, household, self-esteem, independence and identity) of the QoL.BD (Brief QoL.BD) showed moderate-to-high correlations with the original 56-item QoL.BD and demonstrated good test–retest reliability. Importantly, the instrument effectively works across mood states and stages of BD and is responsive to change. Furthermore, with only 12 items, the brief form can be easily administered in even the busiest clinical settings [12].

The ways in which QoL and personal well-being are perceived vary significantly from culture to culture [14]. It is essential, therefore, to study the psychometric characteristics of the Brief QoL.BD in different cultural contexts. The primary objective of the present study was to assess the reliability and validity of scores on the Persian version of the Brief QoL.BD using a nationwide sample of Iranian patients with BD.

Method

This was a multicenter (six sites in Iran), cross-sectional prospective study to examine the reliability and validity of the Brief QoL.BD score for Iranian patients. Patients were progressively recruited from clinics at the universities of Iran, Tehran, Qazvin, Zanjan, Ahvaz and Tabriz.

Study participants

This study included 184 patients who had been diagnosed with BD according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [15] confirmed by administration of Structured Clinical Interview (SCID) [16]. All study participants were selected from outpatient clinics. The inclusion criteria were (a) 18 years of age or older, (b) able to communicate in Persian with research team and (c) written informed consent. Participants with any organic brain damage or severe physical diseases were excluded from the study. Demographic characteristics and clinical information were retrieved from patient medical files. Two hundred participants were invited to participate, and 184 (92.0 %) participants provided informed consent. Sixty-six percent of the patients were female, 41 % were married, and the mean ± SD age was 41.1 ± 5.1 years. Participants’ educational attainment was reported as follows: 23 % completed primary school; 56 %, secondary school and 12 %, higher education. Most patients (72.8 %) were diagnosed with BD type I (BD I). The mean age of diagnosis was 28.3 years, and the mean duration of illness was 13.0 years (Table 1). No significant differences were found between patients with BD I and those with BD type II (BD II) in terms of demographics.

Table 1 Sample characteristics for patients with bipolar disorder (BD)

Measures

Short Form-36 Questionnaire (SF-36)

The SF-36 is a generic questionnaire to assess health-related QoL in healthy and patient populations [17, 18]. It is a self-administered questionnaire which has 36 items covering eight dimensions: physical functioning (PF), role limitations due to physical health (RP), bodily pain (BP), general health perception (GH), social functioning (SF), role limitations due to emotional problems (RE), vitality (VT) and mental health (MH). These scales are commonly aggregated into two summary components: the physical component summary (PCS) and the mental component summary (MCS). All raw scales are linearly converted to a 0–100 scale, with higher scores indicating higher levels of health-related QoL. The SF-36 has been translated into many languages including Persian (Farsi), and its score has shown good psychometric properties in Iranians [19] and general population samples [20].

Quality of Life Enjoyment and Satisfaction Questionnaire Short Form (Q-LES-Q-SF)

The Q-LES-Q-SF is a 16-item self-report questionnaire to measure satisfaction and enjoyment in various domains of well-being [21]. Total score is computed by summing scores on the first 14 items, which are scored on a five-point Likert scale ranging from 1 (very poor) to 5 (very good) [21]. To facilitate interpretation, the total score is converted into a 0–100 scale with higher scores indicating greater enjoyment or satisfaction. Two general questions ask about satisfaction with medication use and overall life satisfaction and contentment. The Persian version of the Q-LES-Q-SF was developed, and the score was validated by Tagharrobi et al. [22].
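The conversion of a summed Likert total to a 0–100 scale described above can be sketched as a simple min–max rescaling. This is a minimal illustration, assuming the conversion is linear between the lowest and highest possible raw totals (14 and 70 for fourteen items rated 1 to 5); the function name `to_percent_of_max` is hypothetical and is not taken from the instrument's manual.

```python
def to_percent_of_max(raw_total, n_items=14, item_min=1, item_max=5):
    """Linearly rescale a summed Likert total to a 0-100 range.

    Assumes every item was answered; handling of missing items is not covered.
    """
    lowest = n_items * item_min    # 14 when every item is rated 1
    highest = n_items * item_max   # 70 when every item is rated 5
    if not lowest <= raw_total <= highest:
        raise ValueError("raw total outside the possible range")
    return 100.0 * (raw_total - lowest) / (highest - lowest)


if __name__ == "__main__":
    # Hypothetical respondent with a raw total of 42 on the 14 scored items.
    print(to_percent_of_max(42))
```

The same rescaling applies to any summed scale once the number of items and the response range are supplied.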

Positive and Negative Affect Schedule (PANAS)

The PANAS is a self-report tool commonly used to assess positive affect (PA) and negative affect (NA) [23]. The PANAS has two 10-item scales: one for PA and one for NA. Responses are rated on a 5-point scale from very slightly (1) to very much (5) [23]. The PANAS has been translated into many languages including Persian; the score of the Iranian version of the PANAS has been found to be highly valid and reliable [24].

Young Mania Rating Scale (YMRS)

The severity of manic symptoms was determined using the YMRS [25]. This scale contains 11 items, each of which rates a specific clinical feature of mania over the previous 48 h [25].

Hamilton Rating Scale for Depression (HAM-D)

The HAM-D is a commonly used interviewer-rating scale to assess depressive symptoms [26]. The HAM-D has 17 items rating participants’ experiences over the past week and behavior at interview. The Iranian version of the HAM-D score has demonstrated adequate reliability and validity for assessing depressive symptoms in Iranian participants with bipolar depression [27].

The Brief Quality of Life in Bipolar Disorder (Brief QoL.BD) questionnaire

The Brief QoL.BD is a 12-item disease-specific self-report measure designed to capture patients’ subjective perceptions of QoL [12]. The 12 items include physical, sleep, mood, cognition, leisure, social, spirituality, finance, household, self-esteem, independence and identity (Table 2). Patients describe their experiences over the past 7 days on a 5-point Likert scale ranging from strongly disagree (1) to strongly agree (5). Total score is the sum of the 12 items, with higher scores indicating better QoL [12]. The original version of the Brief QoL.BD score has been found to be valid and reliable [12].

Table 2 Item descriptions and characteristics and test–retest reliability of the Brief QoL.BD item and total scores

Translation and cultural adaptation

Permission was granted by the developers to translate the Brief QoL.BD into Persian (Farsi). The translation procedure was performed in accordance with Beaton’s recommendations [28]. The first step was forward translation. Two bilingual translators whose mother tongue was Persian/Farsi (with divergent backgrounds in medicine and sociology) independently translated the original English version into Persian. Next, the translators and the project manager compared the translations, resolved discrepancies and synthesized them into an interim version. Backward translation then involved independent translation of this interim version by two native English speakers who were not familiar with the original English version. Subsequently, the two translators and the project manager assessed agreement between the back-translations and the original English version. Finally, to assess cross-cultural equivalence of the interim Persian version, an expert committee was formed. Members of the committee were a methodologist, public health professional, language professional, psychiatrist, mental health specialist, nurse and the translators. All translated versions were assessed and checked for discrepancies. A pre-final Persian version was developed and pilot-tested in 31 participants with BD and diverse educational backgrounds. Each participant was asked to complete the questionnaire and explain what they thought about the questionnaire items. Recommended changes were made, and the final Persian version was administered to 184 participants with BD.

Procedure

The Ethics Committee of the Qazvin University of Medical Sciences approved the present study. Potential study participants were progressively identified and invited to enroll in the study between February 2014 and March 2015. The study aims and procedure were explained to the participants, and informed consent was obtained. Study participants completed the baseline questionnaires. Current depressive symptoms were assessed by two board-certified and trained psychiatrists at each center using the HAM-D. The same questionnaires were readministered at 10 days, 3 and 6 months after baseline assessment.

Statistical analyses

Reliability

The reliability of the Persian version of the Brief QoL.BD scores was assessed through a series of analyses covering internal consistency, reproducibility and agreement. To test item homogeneity, Spearman’s rank–order correlations were calculated for the inter-item and corrected item-to-total associations. Internal consistency reliability was measured using Cronbach’s alpha coefficient, with a criterion of ≥0.70 [29]. Score stability was assessed by administering the scale on two occasions 10 days apart. The intraclass correlation coefficient (ICC) was calculated and evaluated against a minimum standard of ICC ≥ 0.70 [29].
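For readers who wish to reproduce this type of item analysis, the following sketch computes Cronbach’s alpha and a corrected item-to-total Spearman correlation from a respondents-by-items matrix. It is an illustration under stated assumptions rather than the analysis code used in this study (which was run in SAS): it assumes complete cases only, and the function names and example responses are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    items = list(zip(*rows))                       # per-item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of the sum score
    return k / (k - 1) * (1 - item_var / total_var)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows, item):
    """Spearman correlation of one item with the sum of the remaining items."""
    scores = [r[item] for r in rows]
    rest = [sum(r) - r[item] for r in rows]
    return pearson(average_ranks(scores), average_ranks(rest))


if __name__ == "__main__":
    # Hypothetical responses from five people to three 5-point items.
    data = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 4, 5], [1, 2, 1]]
    print(round(cronbach_alpha(data), 3))
    print(round(corrected_item_total(data, 0), 3))
```

Spearman’s coefficient is obtained here as the Pearson correlation of average ranks, which handles the ties that are unavoidable with 5-point items.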

We report minimal detectable change (MDC95 %), which estimates the smallest score change that likely (p < 0.05) corresponds to observable behavioral change and not simply measurement error [29].
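As an illustration of how such a threshold is commonly obtained, the sketch below derives the standard error of measurement (SEM) from a score SD and a test–retest ICC and then computes MDC95 as 1.96 × √2 × SEM. These are the conventional formulas and are an assumption here, since the exact computation is not spelled out in the text; the baseline SD of 8.0 in the example is hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability (ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at the 95 % level.

    The sqrt(2) factor reflects measurement error on both occasions.
    """
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


if __name__ == "__main__":
    # Hypothetical SD of 8.0 points combined with an ICC of 0.79.
    print(round(sem_from_icc(8.0, 0.79), 2))
    print(round(mdc95(8.0, 0.79), 2))
```

Because the SEM shrinks as the ICC approaches 1, a more reliable score yields a smaller detectable change for the same SD.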

Validity

Convergent validity was assessed using bivariate Pearson correlations between the Brief QoL.BD total score and the scores of the following scales: PCS and MCS subscales of the SF-36, PA and NA subscales of the PANAS, HAM-D total score, YMRS total score, Q-LES-Q-SF total score on 14 items and two general item scores of Q-LES-Q-SF (satisfaction with medication use; overall life satisfaction and contentment). Based on Varni et al. [30] and Cheng et al. [31], we classified the correlation coefficients as small (0.1–0.29), medium (0.3–0.49) and large (≥0.5) [31].

In addition, the correlation between the Brief QoL.BD total score and the HAM-D total score was compared with that between the Brief QoL.BD total score and the YMRS total score using the Fisher r-to-z transformation test [32]. Moreover, known-group validity was tested: independent t tests, adjusted for multiple comparisons (Benjamini–Hochberg procedure [33]) and for age and gender, were performed to test whether the Brief QoL.BD score could differentiate between subgroups of patients. Based on the existing literature, it was hypothesized that patients with bipolar II disorder, owing to longer time spent depressed and a higher ratio of depression to mania, would report lower QoL scores than patients with bipolar I disorder [34, 35].
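The Benjamini–Hochberg step-up procedure used for the multiple-comparison adjustment can be sketched as follows. This is a generic illustration of the procedure, assuming a false discovery rate of 0.05; the function name and the example p values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p value: True if rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)   # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank        # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:   # reject everything up to that rank
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p values from four item-level group comparisons.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The procedure is step-up: once the largest passing rank is found, every hypothesis with a smaller p value is rejected as well, even if it missed its own threshold.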

Construct validity of the Brief QoL.BD was further assessed via analyses of factor structure. The Brief QoL.BD is a short version of the full 12-scale, 56-item QoL.BD [12], and a single-factor model was described by the developers [12]. Therefore, we conducted a confirmatory factor analysis (CFA) to examine how well this structure fit the data. Considering the ordinal nature of the data, weighted least squares (WLS) was used to estimate the parameters of the CFA model, with both the polychoric correlation matrix and the asymptotic covariance matrix used as input. Model fit was judged against the following criteria: a nonsignificant χ² test, comparative fit index (CFI) > 0.90, root-mean-square error of approximation (RMSEA) ≤ 0.08, standardized root-mean-square residual (SRMR) ≤ 0.08, Bentler–Bonett normed fit index (NFI) > 0.90, non-normed fit index (NNFI) > 0.90, goodness-of-fit index (GFI) > 0.90 and adjusted goodness-of-fit index (AGFI) > 0.90 [36].

There is some evidence that QoL is poorer among women with BD relative to men [37, 38], raising the possibility of a gender difference in the interpretation of QoL.BD items. Consequently, factorial invariance across genders was also assessed. Three hierarchical levels of factorial invariance were considered: configural invariance, metric invariance and scalar invariance. Configural invariance is achieved if a similar factor structure is found in both male and female samples, metric invariance additionally requires equal factor loadings across the two samples, and scalar invariance additionally requires equal item intercepts [39]. A nonsignificant χ² difference test suggests factorial invariance, while changes of <0.01 in the CFI, RMSEA and NNFI between levels also indicate acceptable factorial invariance [40].

Responsiveness to change

Participants were receiving a range of psychosocial and pharmacological interventions, and it was hypothesized that they would report higher QoL scores at the 3- and 6-month time points relative to baseline [41]. Predicted changes in the Brief QoL.BD scores were assessed using standardized response mean (SRM: mean change scores divided by pooled SD). Based on Cohen’s guidelines, SRM < 0.2 is trivial, 0.2–0.5 is small, 0.5–0.8 is medium, and >0.8 is large [42]. All statistical analyses were performed using SAS 9.2 and LISREL 8.80.
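A minimal sketch of the SRM calculation and the effect-size labels is given below. It divides the mean change by the SD of the paired change scores, which is the usual SRM definition; substituting a pooled SD of the two time points, as the description above suggests, would change only the denominator. The function names and the example scores are hypothetical, and a value of exactly 0.8 is treated as large because the stated bands leave that boundary open.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of paired change scores divided by the SD of those change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def srm_label(srm):
    """Map an SRM to the bands given in the text (<0.2, 0.2-0.5, 0.5-0.8, >0.8)."""
    size = abs(srm)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical total scores for four patients at baseline and at follow-up.
    t1 = [30, 32, 35, 40]
    t2 = [33, 33, 39, 42]
    value = standardized_response_mean(t1, t2)
    print(round(value, 2), srm_label(value))
```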

Results

No difficulties were experienced during the translation process, and almost all (99 %) patients found the questionnaire items and the instructions easy to understand and acceptable. Consistent with this, the average rate of incomplete (missing and not applicable) data at the item level was 1.9 %, with a range of 0.1–4.1 %. The average time to complete the Brief QoL.BD was 1.8 ± 0.5 min.

Item characteristics and reliability

Scores on all twelve items of the Brief QoL.BD correlated significantly with each other and with the total score. The inter-item correlations ranged from 0.31 to 0.68 (p < 0.05). Correlations between each item score and the corrected total scale score ranged from 0.47 to 0.79 (p < 0.05). Cronbach’s alphas for the 12 items were 0.87 and 0.89 for the first and second test administrations, respectively.

An average of 10.3 ± 2 days elapsed between administrations of the Brief QoL.BD. Table 2 shows the SEM and MDC95 % for each Brief QoL.BD item score as well as the total score. Neither individual item scores nor the total Brief QoL.BD score significantly differed between test and retest. The ICCs for the test–retest analysis were high, ranging from 0.74 (self-esteem) to 0.94 (leisure) (Table 2).

Validity

It was hypothesized that people diagnosed with BD II would have significantly lower scores on the Brief QoL.BD than people diagnosed with BD I. Using the method of validation comparing known groups, the data provide support for this hypothesis. Findings presented in Table 3 show that the average total Brief QoL.BD score of people with BD II was almost 7 points lower than that of people with BD I. In addition, people with BD II produced average scores on each item that were significantly lower than those of people with BD I.

Table 3 Comparisons of the Brief QoL.BD items scores and total scores for bipolar disorder (BD) types I and II

Table 4 shows the Spearman’s rank–order correlation coefficients between the Brief QoL.BD item scores and scores of external measures including the SF-36 subscales, PANAS subscales, HAM-D, YMRS and Q-LES-Q-SF. All coefficients were statistically significant at the 5 % level. Significant negative correlations were observed between the Brief QoL.BD item scores and the HAM-D total score, YMRS total score and NA score of the PANAS. Moreover, all Brief QoL.BD item scores correlated positively with the SF-36 summary scores, Q-LES-Q-SF scores and the PA score of the PANAS. Of the 117 correlations, 95 were medium or larger (≥0.3) and 22 were small (0.1–0.29). In addition, the absolute correlation between the Brief QoL.BD total score and the HAM-D total score (r = −0.39) was significantly greater than that between the Brief QoL.BD total score and the YMRS total score (r = −0.23; p = 0.046).
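The comparison of two correlation coefficients via the Fisher r-to-z transformation can be sketched as follows. The sketch uses the independent-samples form of the test and assumes that all 184 participants contributed to both coefficients; both points are assumptions made for illustration, not a description of the computation actually performed. Because the two coefficients here come from the same sample and share the Brief QoL.BD total score, a test for dependent correlations would often be preferred.

```python
import math
from statistics import NormalDist


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two correlations with the independent-samples Fisher z test.

    Returns the z statistic with its one-sided and two-sided p values.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)           # r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # SE of the z difference
    z = (z1 - z2) / se
    p_one_sided = 1.0 - NormalDist().cdf(abs(z))
    return z, p_one_sided, 2.0 * p_one_sided


if __name__ == "__main__":
    # Absolute correlations reported above; n = 184 for both is an assumption.
    z, p_one, p_two = fisher_z_compare(0.39, 184, 0.23, 184)
    print(round(z, 2), round(p_one, 3), round(p_two, 3))
```

Under these assumptions the one-sided p value falls just below 0.05, whereas the two-sided value does not, so the direction of the hypothesis matters for the conclusion.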

Table 4 Inter-correlations between Brief QoL.BD and SF-36, Q-LES-Q, PANAS, YMRS and HAM-D scores

Goodness-of-fit measures in the CFA showed that the single-factor solution was adequate: χ² = 112.712, df = 54, p < 0.001; RMSEA = 0.066 (95 % CI 0.049–0.082); CFI = 0.984; NFI = 0.973; NNFI = 0.987; SRMR = 0.042; GFI = 0.947; AGFI = 0.929. Factor loadings for the twelve items ranged from 0.42 to 0.77 and were all significantly different from zero (Table 5).

Table 5 Goodness-of-fit indices and factorial invariance results of the confirmatory factor analysis (CFA)

Invariance of the Brief QoL.BD across genders was tested with a series of multigroup CFAs. Results indicated that the configural, metric and scalar invariance models had acceptable fit indices, except for the χ² test. Nevertheless, the χ² difference was not significant between any two models (configural vs. metric invariance: difference = 14.133, p = 0.29; metric invariance vs. scalar invariance: difference = 20.825, p = 0.053; configural vs. scalar invariance: difference = 34.958, p = 0.07), and the differences in CFI, RMSEA and NNFI between any two models were <0.01. The overall changes in CFI, RMSEA and NNFI from the least constrained model (configural) to the most constrained model (equal factor loadings and item intercepts) were −0.001, 0.000 and 0.000, respectively.
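The two decision rules for nested invariance models, a nonsignificant χ² difference and changes in fit indices below 0.01, can be sketched as below. The degrees of freedom for each comparison are not stated in the text; the value of 12 per step used here is an assumption for illustration. The closed-form tail probability is exact only for even degrees of freedom, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def invariance_checks(delta_chi2, delta_df, delta_fit, alpha=0.05, tol=0.01):
    """Return the p value plus one flag per criterion described in the text."""
    p = chi2_sf_even_df(delta_chi2, delta_df)
    chi2_ok = p > alpha                                # nonsignificant difference
    fit_ok = all(abs(d) < tol for d in delta_fit)      # fit-index changes < 0.01
    return p, chi2_ok, fit_ok


if __name__ == "__main__":
    # Configural vs. metric difference from the text; df = 12 is an assumption.
    # The fit-index changes are the overall changes reported in the text.
    p, chi2_ok, fit_ok = invariance_checks(14.133, 12, [-0.001, 0.000, 0.000])
    print(round(p, 2), chi2_ok, fit_ok)
```

Keeping the two criteria as separate flags mirrors the text, in which either one is taken as support for invariance.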

Table 6 shows changes over time (T1–T3) in the Brief QoL.BD item scores. Repeated measures ANOVA showed significant longitudinal changes in all individual Brief QoL.BD item scores (p < 0.01), with the exception of finance. SRM values for the Brief QoL.BD item scores and total score ranged from 0.02 to 0.58, suggesting that QoL improved in response to treatment over time (3–6 months). However, when the MDC95 % results in Table 2 are also taken into consideration, the responsiveness appeared trivial.

Table 6 Responsiveness of the Brief QoL.BD scores

Discussion

Our primary aim was the cross-cultural adaptation of the Brief QoL.BD into Persian and the assessment of the validity and reliability of its score. Broadly, the findings demonstrate that the Persian Brief QoL.BD score has acceptable psychometric properties: acceptable reliability (internal consistency and test–retest reliability), adequate convergent validity, known-group validity and responsiveness to intervention over time. With more than 110 million Persian-speaking people today, this instrument will be an important tool for rapid evaluation of well-being in Persian-speaking people with BD. For the first time in any cultural group, we also showed that the meaning of the items in the Brief QoL.BD is perceived similarly by both genders. Therefore, differences in scores between males and females might indeed reflect true gender differences in QoL in BD.

The internal consistency reliability of the Persian Brief QoL.BD is comparable to that of the English version [12]. Cronbach’s alpha coefficients were 0.87 and 0.89 for the Persian Brief QoL.BD scale measured 10 days apart. These coefficients exceed the criterion of 0.70 for acceptable consistency. Test–retest reliability, as indexed by the intraclass correlation coefficient, was 0.79 over the approximately 10-day retest interval, which exceeds the acceptable criterion value of 0.70.

The Persian Brief QoL.BD scores showed acceptable known-group validity between patients with BD I and II disorder [35]. As expected, patients with BD II showed significantly poorer QoL than did patients with BD I. However, our results on known-group validity do not imply that clinicians should use the Brief QoL.BD as a diagnostic tool to distinguish patients with BD I from those with BD II. Our results add to the evidence that BD II may be associated with greater illness burden compared to BD I.

All correlations between the Persian Brief QoL.BD and other instruments were statistically significant, and most were greater than 0.30, indicating a medium effect size. The Brief QoL.BD correlates with instruments that measure similar health-related constructs. However, the magnitude of the coefficients indicates that the Brief QoL.BD may also measure some facets of the quality of life construct that are unique to BD. Previous studies showed that among patients with BD, QoL has a stronger negative relation to depressive than to manic symptoms [37, 38]. Our results support these findings by showing a larger correlation coefficient between the Brief QoL.BD total score and the HAM-D score than between the Brief QoL.BD total score and the YMRS score.

Key advantages of our study include a thorough translation process, multicenter data collection, diverse socioeconomic and geographical backgrounds of the participants, repeated measures analyses and multifaceted evaluation of the instrument’s validity. In particular, the diversity of the participants increases the external validity of the study by making the findings generalizable to a larger population. Extensive assessment of convergent validity was ensured using tools that measure various aspects of mood, psychopathology, QoL and functioning. Furthermore, in line with previous studies [12, 41], we found that the Brief QoL.BD scores (individual item scores and total score) improved significantly over time among this sample of people attending university clinics for treatment. Specifically, the improvement in QoL was trivial to small (SRM = 0.04–0.38) for all item scores at 3 months and small to medium (SRM = 0.20–0.58) at 6 months, except for the finance item score (SRM = 0.02). When we additionally consider the MDC95 % values, which indicate the smallest noticeable changes, all SRM values were smaller than their corresponding MDC95 % values. Hence, our results might indirectly indicate that the ordinary treatments for BD had weak effects on the QoL of our participants. The same trend was seen in the total score of the Brief QoL.BD, which is a much more reliable score than any individual item score. In particular, the SRM of the total score was nearly trivial (0.20), which suggests that, although the change was statistically significant, overall QoL did not improve noticeably.

There are some limitations to the study. First, our sample size for the CFA was not large. Some researchers [43] suggest that a minimum of 200 is essential, and we had only 184. However, a sample of 184 participants is unlikely to cause significant bias; Anderson and Gerbing [44] found that a sample size of 100 would usually be sufficient for convergence. Second, it is unclear whether our responsiveness results are attributable to the actual effects of intervention, as the patients in our study did not receive systematic intervention. Moreover, our SRM results showed that the responsiveness of the Brief QoL.BD was trivial and hard to detect. In other words, the interventions received during the study might not have had promising effects on QoL for patients with BD. Hence, future studies may also want to examine whether symptom management alone is sufficient to improve the QoL of patients with BD, or whether a comprehensive treatment such as systematic intervention substantially improves their QoL. Future studies using systematic intervention are warranted to corroborate our findings. Third, our participants were all recruited from outpatient clinics; the results may not, therefore, be generalizable to inpatient settings. Future studies recruiting patients from diverse settings are therefore also warranted.

In summary, the Persian Brief QoL.BD is a psychometrically sound measure whose score shows acceptable validity and reliability. Future studies might focus on the actual performance of the Persian Brief QoL.BD in routine clinical practice, as well as its responsiveness to change in the context of clinical trials.