Introduction

In recent years, mental well-being (MWB) and positive mental health (PMH; Footnote 1) have been increasingly discussed and assigned more importance, especially within the framework of health promotion. In the public health practice and policy arena, validated instruments measuring MWB outcomes are therefore in high demand.

From a theoretical perspective there is still a lack of consensus about the fundamental underlying concepts of MWB but at least there is agreement that it is a complex theoretical construct, encompassing more than just the absence of mental illness and one that has to integrate two meta-theoretical perspectives: namely hedonic and eudaimonic. As they imply rather distinct views of human nature and accordingly approaches to life and well-being, in theory and practice, they have been the subject of intense debate since the times of ancient Greece (Ryan and Deci 2001: 143). Nevertheless, they have been given new impetus in the course of the discussion about PMH and MWB.

The concept of hedonism refers to pleasure or happiness, and good life or well-being, which is interpreted as a maximum accumulation of such moments. Therefore, different approaches in hedonic psychology assess the experience of well-being within a pleasure-pain continuum. For instance, subjective well-being uses three components, namely: life satisfaction, presence of positive and absence of negative mood (together summarised as happiness) (Ibid.: 144). Often, the meta-theoretical starting point fits in with the expectancy value approach, which suggests that “well-being is a function of expecting to attain (and ultimately attaining) the outcomes one values, whatever those might be” (Ibid.: 145). Diener et al. (2009: 63) define subjective well-being as “a person’s cognitive and affective evaluations of his or her life”.

In contrast, the eudaimonic concept derives from Aristotle, who considered hedonism a vulgar ideal and held that the pursuit of happiness makes humans slavish followers of their desires. While the hedonic paradigm emphasises the internal affective and cognitive aspects of well-being, the eudaimonic concept adds an external dimension (Vanhoutte and Nazroo 2013). A good life goes beyond subjectively felt needs and also takes into account objectively valid needs, as Fromm (1978–79) put it. According to Waterman (1993), the eudaimonic conception of well-being calls upon people to live in accordance with their ‘daimon’, or true self. Under such circumstances one would feel intensely alive and authentic, in a state of “personal expressiveness”. This concept involves more activity, development and personal growth (also as a result of being challenged and exerting effort) than being relaxed and happy. Ryff and Keyes (1995) embraced eudaimonic aspects as psychological well-being, which comprises six distinct dimensions of human actualisation: autonomy, environmental mastery, personal growth, positive relations with others, purpose in life and self-acceptance.

To date there is a consensus that both concepts need to be taken into account to understand MWB. Thus, MWB is conceived as a multi-faceted construct that comprises both hedonic and eudaimonic elements. The issue that is still at stake regarding a concept of MWB is that these elements are both distinct and overlapping and at the same time converging and diverging (Ryan and Deci 2001: 148).

Consequently, this constitutes a challenge for designing a measuring instrument for well-being, several of which are currently available or under development (Vásquez and Hervás 2012). One is the Warwick-Edinburgh Mental Well-Being Scale (WEMWBS), developed by Stewart-Brown and colleagues at the University of Warwick Medical School. The WEMWBS (Footnote 2) aims to provide a short instrument to measure PMH and MWB, which is meaningful and easily understandable for people from various social backgrounds (Stewart-Brown 2013: 134).

The WEMWBS

The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) is a 14-item Likert scale that covers “both hedonic and eudaimonic aspects of mental health including positive affect (feelings of optimism, cheerfulness, relaxation), satisfying interpersonal relationships and positive functioning (energy, clear thinking, self-acceptance, personal development, competence and autonomy)” (Tennant et al. 2007: 3).

Respondents to the scale items are asked to describe their experience over a two-week reference period for which an ordinal five-point frequency answer format, ranging from 1 = “none of the time” to 5 = “all of the time”, is employed. Based on the assumption of one single general MWB factor, the total score is computed as the sum of all items. If respondents answer all items, the theoretical range of the WEMWBS is from 14 to 70 scale points with a higher score indicating a higher level of MWB (Stewart-Brown and Janmohamed 2008).
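The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `wemwbs_total` is hypothetical, and returning no score for incomplete responses is an assumption made here, not a rule taken from the scale's documentation.

```python
from typing import Optional, Sequence

N_ITEMS = 14                 # number of WEMWBS items
MIN_CODE, MAX_CODE = 1, 5    # 1 = "none of the time" ... 5 = "all of the time"


def wemwbs_total(responses: Sequence[Optional[int]]) -> Optional[int]:
    """Return the sum score (14 to 70), or None if any item is unanswered."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    if any(r is None for r in responses):
        # Assumption: no imputation, so incomplete cases receive no total score.
        return None
    if any(not (MIN_CODE <= r <= MAX_CODE) for r in responses):
        raise ValueError("item codes must lie between 1 and 5")
    return sum(responses)


lowest = wemwbs_total([1] * N_ITEMS)    # every item "none of the time"
highest = wemwbs_total([5] * N_ITEMS)   # every item "all of the time"
```

A respondent who answers every item with the lowest or highest category reaches the theoretical bounds of the scale, which is what the two example calls show.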

The WEMWBS was originally developed in English. So far, the scale has been translated into a number of languages (cf. Warwick Medical School 2016). To date, several validation studies have been published, based on both smaller convenience and community-based samples (e.g. community-dwelling individuals) and cross-sectional samples (e.g. representative of a national population) from the UK (Clarke et al. 2011; Lloyd and Devine 2012; Taggart et al. 2013; Tennant et al. 2007), Spain (Castellví et al. 2014; López et al. 2013), Italy (Gremigni and Stewart-Brown 2011), Turkey (Keldal 2015), Pakistan (Waqas et al. 2015), Australia (Hunter et al. 2015), South Korea (Kim et al. 2014), Brazil (Santos et al. 2015) and Argentina (Azcurra 2015). The English language version was used most frequently, followed by translated versions. The scale was most often tested with adult populations (i.e. 18 to 75 years of age), seldom with adolescents (or pupils, students, e.g. for pre-validation purposes) and only once specifically with older people (i.e. 71 to 96 years of age) (Azcurra 2015).

To date, psychometric studies examining the factor structure (construct validity) of the WEMWBS have substantial shortcomings. Although the hypothesised one-factor scale structure has been supported (tested primarily with exploratory and/or confirmatory factor analytic techniques), many of these studies yielded fit indices that were far from ideal (see also Hunter et al. 2015) when considering conventional criteria (Brown 2006). Some reported a quite good fit (Castellví et al. 2014; Clarke et al. 2011; Keldal 2015; Lloyd and Devine 2012; Tennant et al. 2007) but others reported a less than optimal fit of the empirical data to the hypothesised one-factor model (Gremigni and Stewart-Brown 2011; Hunter et al. 2015; Kim et al. 2014; López et al. 2013). In some applications a two or three factor model was considered, or even implemented (Azcurra 2015; Gremigni and Stewart-Brown 2011; López et al. 2013; Santos et al. 2015; Taggart et al. 2013; Waqas et al. 2015). In addition, adaptations to the scale were also implemented to improve the model fit, e.g. by the specification of residual error covariances (Clarke et al. 2011; Hunter et al. 2015) or even by a reduction of scale items (Gremigni and Stewart-Brown 2011; Santos et al. 2015).

Interestingly, studies have not sought to identify the sources of misfit in greater detail and have also failed to examine whether the scale measurement covers unique variance over and above a general factor of MWB and to test whether it allows for a precise scaling of individuals.

Reise et al. (2010) stated that the problem with traditional psychometric scale testing is that the wrong default model is often used. In particular for multi-faceted constructs and complex measures that include heterogeneous item content, research literature has pointed out that the item responses are possibly not only a reflection of the general factor (Chen et al. 2012; Reise 2012). Consequently, the assumption of uni-dimensionality is often too restrictive and can also result in model misfits. In contrast, a bi-factor modelling approach provides a viable solution and alternative modelling strategy. A bi-factor model is more suitable for a factor structure of multi-faceted items that cover broad (i.e. hedonic and eudaimonic) concepts and highly related (i.e. relationships, affect, functioning) domains that are hypothesised to comprise a general (i.e. MWB) construct (cf. Chen et al. 2006: 190). Recently, bi-factor modelling has been widely implemented and has helped researchers clarify the measurement structure of scales in mental health-related contexts, too (Chen et al. 2013; de Bruin and du Plessis 2015; Deng et al. 2015; Jovanović 2015a, b; Longo et al. 2016; Mészáros et al. 2014). Specifically, Reise concluded that “(…) bi-factor modeling potentially provides a solid foundation for conceptualizing psychological constructs, constructing measures, and evaluating a measure’s psychometric properties” (Reise 2012: 667).

To the best of our knowledge, bi-factor modelling has not yet been applied in studies investigating the psychometric properties of the WEMWBS. The benefit of a bi-factor analysis is that it reveals information about residualised (group) factors above and beyond the hypothesised general MWB factor. The analysis allows a comparison of the strength of a general MWB factor relative to the group factors and estimates the size of the factor loadings of each item. In addition, it delivers additional insights for the appropriateness of the assumption of uni-dimensionality of the WEMWBS. If detected, a strong general factor where each item adequately loads the general factor would provide a justification for a WEMWBS total score.

Aims and objectives

The aim of the present research is to verify the theoretical assumptions of the WEMWBS. Therefore, the research questions are whether the theoretical construct of MWB—which is multi-faceted and comprises distinct and overlapping, but at the same time converging and diverging hedonic and eudaimonic elements—can be empirically proved by the WEMWBS and whether the scale is a valid measurement tool for MWB.

In order to achieve this aim, the psychometric properties of the first German translation of the WEMWBS were evaluated. This represents its preliminary validation, which was done in several steps: first, the face validity of the scale was assessed by a descriptive item analysis. Second, construct validity was tested by confirmatory factor analysis techniques for which a series of one-factor and bi-factor models were specified and compared. Third, criterion (concurrent) validity of the scale was assessed with another well-established external quality of life measure (the EUROHIS-QOL scale). Fourth, properties of the WEMWBS total score were evaluated. The data came from a community-based sample randomly drawn from adults of advanced age in Vienna, Austria.

Materials and methods

Survey instrument

After obtaining permission from the University of Warwick Medical School, the authors of this article translated the English WEMWBS version following the guidelines drawn up by Beaton and colleagues (Beaton et al. 2000), aiming for a cross-culturally sensitive adaptation and a conceptually equivalent German version.

The translated scale (Table 1) was included in a questionnaire for an omnibus survey by the Viennese Red Cross, which included questions about the organisation’s member magazine, opinions about charitable giving and general socio-demographic information. Respondents were additionally asked to take part in a supplementary survey about well-being and quality of life, which included the WEMWBS and EUROHIS-QOL scales.

Table 1 English original and German translation of the 14-item WEMWBS

Sampling and data collection

The gross population consisted of all supporting members of the Viennese Red Cross (N = 77,191 based on organisational data from 2013). The sampling frame comprised 41,684 members (54%) who lived in the community of Vienna and who had been members for at least 1 year (excluding staff). Out of this, a simple random sample of 6000 was drawn representing a fraction of 14.4% to whom the postal survey was administered in May 2014. The survey aims and the selection process were explained in an accompanying letter. The cover letter included information on how to complete and return the questionnaire in the enclosed envelope, free of charge. In addition, it was possible to respond electronically by completing an identical online questionnaire. For ethical reasons, respondents were informed that participation was voluntary and data handling was based on confidentiality and anonymity. It was possible to contact the telephone service line for further information. In total, 625 people completed the questionnaire (only 14 responded online). After taking neutral non-responses (e.g. deceased, sick persons, unknown addresses) into account, the response rate to the survey was 10.4%. The data of the completed questionnaires were stored using SPSS statistical analysis software without personal identifiers.
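The sampling figures above can be retraced with simple arithmetic. The sketch below uses only the counts reported in the text. Because the number of neutral non-responses is not reported, the last ratio is the unadjusted return rate, which is an assumption about the denominator rather than the authors' exact calculation.

```python
gross_population = 77_191   # all supporting members (organisational data, 2013)
sampling_frame = 41_684     # Vienna residents, members for at least 1 year, no staff
sample_size = 6_000         # simple random sample drawn from the frame
returned = 625              # completed questionnaires

frame_share = sampling_frame / gross_population
sampling_fraction = sample_size / sampling_frame
# Assumption: neutral non-responses are NOT removed from the denominator here.
raw_return_rate = returned / sample_size

print(f"frame share of gross population: {frame_share:.1%}")
print(f"sampling fraction:               {sampling_fraction:.1%}")
print(f"unadjusted return rate:          {raw_return_rate:.1%}")
```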

Participants

In the sample, more than half of the respondents were female (58.4%) and around two fifths were male (41.6%); 63.5% were married or cohabiting with a partner; 40.1% had attended school for 10–12 years. The majority fell into higher age categories: 82.1% were 60 years or older and, correspondingly, 76.6% were retired. The majority had no migration background, but for 14.7%, at least one of their parents had been born abroad. With regard to inner, outer and peripheral districts, the respondents’ living areas were a good representation of the Viennese population (wien.at 2014) (Table 2).

Table 2 Socio-demographic characteristics of survey participants (n = 625)

Analysis methods

Face, construct and criterion validity were analysed in several steps:

  1.

    High readability and face validity are shown if there is a low rate of incomplete responses (Moser and Kalton 1971). Additionally, there should be no truncated distributions indicating floor/ceiling effects (Bortz and Döring 2006). Item (non-)responses were assessed by frequency analysis in SPSS and described by item mean (m), standard deviation (s2), skewness (s3) and kurtosis (s4). Items were expected to show meaningful variation around their means, with approximately normal (|s3| < 2.0) and uni-modal (|s4| < 7.0) distributions (Byrne 2012).

  2.

    A uni-dimensional latent construct with fixed relations between MWB and the 14 items (v1-v14) was hypothesised (Brown 2006). To test construct validity (Cronbach and Meehl 1955), several models were tested: first, a Confirmatory Factor Analysis (CFA) with one general MWB factor with robust maximum likelihood (MLR) estimation was performed in Mplus 7 (Muthén and Muthén 2012). The CFA results were assessed by standardised factor loadings (> 0.50) and significant (p < 0.050) residuals and by plausible modification indices (Byrne 2012), e.g. error covariances of lower magnitude (Urban and Mayerl 2014). The models were rigorously evaluated by goodness-of-fit statistics: χ²-test (χ²/d.f. < 3.0; Footnote 3) (Kline 2011), Root Mean Square Error of Approximation (RMSEA < 0.08), Comparative Fit Index (CFI > 0.95) and the Standardised Root Mean Square Residual (SRMR < 0.05) (Browne and Cudeck 1993; Hu and Bentler 1999).

    Second, for bi-factor modelling Reise (2012: 677) emphasised the necessity of an exploratory inspection of the data before carrying out a Confirmatory Bi-factor Analysis (CBA) in which additional group factors may be detected over and above the general MWB factor. In any set of items, the CBA enables comparison of the strength of a general factor relative to group factors. Besides the overall model fit criteria stated above, the CBA was evaluated by the proportion of total variance in items explained by the general factor and group factors. For reliability assessment, general MWB factor saturation is expressed by McDonald’s coefficient omega hierarchical (ωH > 0.80); the specific factor saturation of the group factors is evaluated by the coefficient omega specific (ωS) (McDonald 1999; Zinbarg et al. 2005). Furthermore, if there is a relatively high proportion of common variance (ECV, e.g. > 0.70) explained by the general factor, this provides information about the validity of the uni-dimensionality of the WEMWBS (a high ECV for the group factors indicates multi-dimensionality). The higher the ECV, the stronger the general factor relative to the group factors.

  3.

    Criterion validity (concurrent validity) was tested with a well-established measurement scale, the EUROHIS-QOL (Schmidt et al. 2006), derived from the WHO Quality of Life assessment tool (WHOQOL-100) (Power et al. 1998). It is a Likert scale self-assessment instrument of generic quality of life (QOL), with an ordinal five-point answer format covering a two-week reference period. The scale revealed good psychometric properties, especially for the German language version, e.g. high internal consistency (α = 0.80) and factorial validity (CFI = 0.94, RMSEA = 0.08) (Brähler et al. 2007). Since previous studies revealed rather moderate correlations with various life satisfaction scales (Kim et al. 2014; López et al. 2013; Tennant et al. 2007), a correlation of r > 0.50 between the two scales was hypothesised.

  4.

    Finally, the scale properties were analysed by the mean (m) and 95% confidence interval (CI) of the total scale score across major socio-demographic groups. In line with previous studies, no significant gender differences in WEMWBS scores were expected, but higher total scale scores were expected for younger age groups, for better-educated and employed people, for those living with a partner (married, cohabiting) and for those with no migration background/history.
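The item screening in step 1 can be illustrated with a short Python sketch. It assumes the conventional cut-offs of absolute skewness below 2.0 and absolute (excess) kurtosis below 7.0, and it uses plain moment-based formulas. SPSS applies small-sample corrections, so its values may differ slightly. The function names are hypothetical.

```python
from statistics import fmean


def moments(values):
    """Return mean, standard deviation, skewness and excess kurtosis."""
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("item has no variation")
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    return m, m2 ** 0.5, skew, excess_kurtosis


def passes_screening(values, max_abs_skew=2.0, max_abs_kurt=7.0):
    """True if the item's distribution stays within both cut-offs."""
    _, _, skew, kurt = moments(values)
    return abs(skew) < max_abs_skew and abs(kurt) < max_abs_kurt


# Hypothetical item responses on the five-point answer format.
balanced_item = [1, 2, 3, 4, 5]
ceiling_like_item = [1] * 99 + [5]   # nearly everyone in one category
```

An item whose responses pile up in a single category, such as the second hypothetical example, fails the screening, which is the pattern the floor/ceiling check is meant to catch.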

Results

Face validity

Out of 625 respondents, 24 did not complete the supplement that included the WEMWBS and were consequently excluded from the analysis; 556 out of 601 responded to all 14 items of the scale, indicating a high response rate of 94.2%. The highest item non-response was 2.2% (item 3). All categories were used and frequency distributions mostly showed “often” as the modal answer category. All items had a mean between m = 3.0 (items v3, v5) and 4.0 (items v11-13), mostly with a standard deviation between s2 = 0.7 (items v6-7, v11) and 0.9 (items v1–v3, v5, v12). Two items had a higher negative skew of −1.5 (item v11) and −1.2 (item v7) with a higher kurtosis (s4 > 1.7) (Table 3).

Table 3 Frequency distribution and descriptive statistics of the scale items

Construct validity

First, the CFA tested general MWB in a one-factor model that showed significant and high factor loadings between 0.47 and 0.78. However, this model (1a) was not able to reproduce all the item variability, as shown by a poor overall fit: χ² = 593.10 (d.f. = 77), RMSEA = 0.11, CFI = 0.82 and SRMR = 0.07. Eleven error covariances between pairs of items needed to be specified in model 1b to achieve a reasonable fit: χ² = 226.98 (d.f. = 66), RMSEA = 0.06, CFI = 0.94, SRMR = 0.04 (Table 4).

Table 4 Model comparison

Second, the exploratory bi-factor analysis in model 2a with three group factors achieved an acceptable fit: χ² = 247.62 (d.f. = 52), RMSEA = 0.08, CFI = 0.93, SRMR = 0.03. The loading pattern indicated that the items mostly assemble according to the affective (GF1), functioning (GF2) and relationship (GF3) domains of MWB. Based on these results, a CBA in model 2b with three group factors (GF1-3) was specified (see item groupings in Table 5). Additionally, three significant (p < 0.010) error covariances between items v1 and v2 (r = 0.33), v1 and v10 (r = 0.29) and v9 and v12 (r = 0.21) were defined, which resulted in an excellent overall fit of the model: χ² = 160.63 (d.f. = 60), RMSEA = 0.05, CFI = 0.97, SRMR = 0.03.
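To make the model comparison easier to follow, the reported fit statistics can be checked against the cut-offs named in the analysis methods. The sketch below applies those cut-offs strictly to the rounded values printed in the text, so a model that sits exactly on a threshold (for example RMSEA = 0.08) counts as not passing. This is a deliberately conservative reading, not the authors' own decision rule, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    chi2: float
    df: int
    rmsea: float
    cfi: float
    srmr: float


def check(fit: Fit) -> dict:
    """Evaluate one model against the four cut-offs from the methods section."""
    return {
        "chi2/df < 3.0": fit.chi2 / fit.df < 3.0,
        "RMSEA < 0.08": fit.rmsea < 0.08,
        "CFI > 0.95": fit.cfi > 0.95,
        "SRMR < 0.05": fit.srmr < 0.05,
    }


# Fit statistics as reported (rounded) in the results.
MODELS = {
    "1a": Fit(593.10, 77, 0.11, 0.82, 0.07),  # one factor
    "1b": Fit(226.98, 66, 0.06, 0.94, 0.04),  # one factor, 11 error covariances
    "2a": Fit(247.62, 52, 0.08, 0.93, 0.03),  # exploratory bi-factor
    "2b": Fit(160.63, 60, 0.05, 0.97, 0.03),  # confirmatory bi-factor
}

for name, fit in MODELS.items():
    passed = [label for label, ok in check(fit).items() if ok]
    print(f"model {name}: chi2/df = {fit.chi2 / fit.df:.2f}, passes {passed}")
```

Under this strict reading, only model 2b meets all four criteria at once, which is consistent with it being singled out as the best-fitting model.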

Table 5 Standardised factor loadings of one-factor and bi-factor models of the WEMWBS

In a model comparison, CBA model 2b was the best-fitting model. It showed a strong general MWB factor, with significant (p < 0.001) and high factor loadings between 0.51 (v7) and 0.70 (v10) on the general MWB factor. All items except v7 had higher loadings on the general factor than on the group factors, indicating that the variances were predominantly explained by the general MWB factor. Also noteworthy was the finding that no factor loading of group factor GF3 was significant (p > 0.050), indicating that this group factor may not exist above and beyond the general MWB factor and that the variance related to these items is already explained by the general MWB factor (Footnote 4; Table 5).

The general MWB factor explained 70% of the common variance (ECV), while the group factors explained only a small proportion between 6 and 14%, which illustrates that the general factor accounted for a substantially greater amount of variance than the three group factors.
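The explained common variance (ECV) is the share of the summed squared standardised loadings that belongs to each factor. The sketch below shows the calculation under the usual assumption of orthogonal general and group factors. The loadings are hypothetical round numbers chosen for illustration, not the values from Table 5.

```python
def ecv(general, groups):
    """Return (ECV of the general factor, ECV of each group factor).

    general: standardised general-factor loadings, one per item
    groups:  mapping of group-factor name to its items' group loadings
    """
    general_part = sum(l ** 2 for l in general)
    group_parts = {name: sum(l ** 2 for l in loads) for name, loads in groups.items()}
    common = general_part + sum(group_parts.values())
    return general_part / common, {k: v / common for k, v in group_parts.items()}


# Hypothetical six-item example: every item loads 0.6 on the general factor
# and 0.3 on one of two group factors.
general_loadings = [0.6] * 6
group_loadings = {"GF_A": [0.3] * 3, "GF_B": [0.3] * 3}

ecv_general, ecv_groups = ecv(general_loadings, group_loadings)
print(f"ECV general: {ecv_general:.2f}")
print({name: round(value, 2) for name, value in ecv_groups.items()})
```

In this toy case the general factor accounts for 80% of the common variance and each group factor for 10%, the same kind of split that the ECV values in the text describe.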

Superior to Cronbach’s alpha (Footnote 5; Zinbarg et al. 2005), McDonald’s coefficient omega hierarchical (ωH) is used in bi-factor models as a measure of reliability and indicates how precisely the WEMWBS items measure the general factor of MWB. In this study, ωH was 0.81 and showed that the greatest amount of variance of the composite score is attributable to the general factor. The omega specific (ωS) coefficients were 0.26, 0.28 and 0.09, which showed poor subscale reliability.
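A minimal sketch of how ωH and ωS are commonly computed from a standardised bi-factor solution follows. It assumes orthogonal factors, each item loading on exactly one group factor, and no error covariances. The model reported here does include three error covariances, so the sketch will not reproduce the published coefficients. All loadings are hypothetical.

```python
from collections import defaultdict


def omegas(general, specific, group_of):
    """Return (omega hierarchical, {group: omega specific}).

    general:  general-factor loading per item
    specific: group-factor loading per item
    group_of: group-factor label per item
    """
    # Assumption: standardised solution, so uniqueness = 1 - communality.
    uniq = [1.0 - g ** 2 - s ** 2 for g, s in zip(general, specific)]
    items = defaultdict(list)
    for i, label in enumerate(group_of):
        items[label].append(i)

    general_sq = sum(general) ** 2
    group_sq = {k: sum(specific[i] for i in idx) ** 2 for k, idx in items.items()}
    total_var = general_sq + sum(group_sq.values()) + sum(uniq)
    omega_h = general_sq / total_var

    omega_s = {}
    for k, idx in items.items():
        sub_general_sq = sum(general[i] for i in idx) ** 2
        sub_var = sub_general_sq + group_sq[k] + sum(uniq[i] for i in idx)
        omega_s[k] = group_sq[k] / sub_var
    return omega_h, omega_s


# Hypothetical six-item example with two group factors.
omega_h, omega_s = omegas(
    general=[0.6] * 6,
    specific=[0.3] * 6,
    group_of=["A", "A", "A", "B", "B", "B"],
)
print(f"omega_H = {omega_h:.2f}")
print({name: round(value, 2) for name, value in omega_s.items()})
```

Even in this toy example the pattern resembles the one reported: a total score dominated by the general factor and subscale scores that carry little reliable variance of their own.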

Criterion validity

As hypothesised, the Spearman’s rank correlation between the EUROHIS-QOL and the WEMWBS was moderate, with r = 0.64 (p < 0.010). As expected, the total score did not differ between genders but discriminated between age groups. In addition, the mean score was higher for those living with a partner (married, cohabiting) compared to those with no partner (single, separated, divorced or widowed). People who attended school for less than 10 years scored significantly lower than those with more years in school. There were no differences by occupational status or by migration background. Numerical details are shown in Table 6.
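For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation of the ranked scores, with tied values receiving their average rank. The following standard-library sketch illustrates the calculation. The score pairs are hypothetical and do not come from the survey data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1   # 1-based average rank of the tie block
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical total scores on the two scales for five respondents.
wemwbs_scores = [42, 55, 55, 61, 68]
qol_scores = [24, 30, 28, 33, 36]
print(f"Spearman correlation: {spearman(wemwbs_scores, qol_scores):.2f}")
```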

Table 6 WEMWBS total scores across socio-demographic groups

Scale properties

The total WEMWBS score ranged empirically between 25 and 70 with a mean value of m = 54.5 and a median of md = 55.0. There was a meaningful standard deviation s2 = 8.1, and the 95% confidence interval was between 53.8 and 55.2. The shape of the distribution was slightly skewed (s3 = −0.40, s4 = −0.21). There were no floor/ceiling effects because only 0.2% had the lowest and 1.4% had the highest total score (Fig. 1).
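The reported confidence interval can be retraced from the mean and standard deviation with the usual normal approximation. The sample size used below (n = 556, the number of respondents who answered all 14 items) is an assumption, since the text does not state which n underlies the interval.

```python
from math import sqrt


def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean."""
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Reported mean and standard deviation; n = 556 is an assumption (complete cases).
low, high = mean_ci(mean=54.5, sd=8.1, n=556)
print(f"95% CI: {low:.1f} to {high:.1f}")
```

With these inputs the interval rounds to the reported bounds of 53.8 and 55.2.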

Fig. 1 WEMWBS total score distribution

Discussion and conclusions

Discussion

The aim of the present research was to verify the theoretical assumptions of the German version of the WEMWBS by evaluating its validity and to assess whether it is a valid tool for the assessment of MWB. In contrast to the rather strong empirical assumption of the original authors of the scale (Tennant et al. 2007), the one-factor structure of the WEMWBS was not supported in the present study. In a non-parsimonious approach to model fitting, the one-factor model was only acceptable after the specification of 11 error covariances between items. Besides the varying cultural/country contexts, diverse study populations and language forms, this might also explain why the one-factor solution has not been able to achieve an excellent model fit in many previous studies.

By contrast, the CBA model indicated that the structure of the WEMWBS was best described by one general MWB factor and three group factors, which yielded the best fit for the data in the present sample. This demonstrated that the specific domains of MWB have unique explanatory power over and above general MWB. It is noteworthy that the bi-factor result primarily yielded strong support for the general MWB factor within the WEMWBS items, but gave only limited evidence for a viable multi-dimensional structure. All items had high and significant loadings on the general factor and—very importantly—relatively low loadings on the group factors.

The bi-factor analysis revealed that the WEMWBS measures a strong general MWB factor and relatively weak residualised group factors that correspond to domains of MWB. It indicates that items measuring hedonic (positive affect) and eudaimonic concepts (positive functioning, satisfying relationships) are indeed manifestations of the general MWB factor. It was empirically demonstrated that these domains do not reflect additional factors of MWB because of the small proportions of total and common variance explained by the residualised group factors as well as by their low reliabilities. A substantial amount of variance was accounted for by the general factor with high score reliability. As evidenced, the general factor accounted for 81% of the total test variance and 70% of the common variance. Although the discovery of the group factors suggests that they capture different domains of MWB over and above general MWB, the group factors accounted for only 9% of its observed variance and 6% of its common variance. Moreover, the low factor loadings indicated that group factors do not yield precise enough measures of unique aspects of MWB to be useful in practical applications. In other words, within the WEMWBS there is little common variance beyond general MWB, which suggests that calculating only a total WEMWBS score is both empirically justified and recommended. The findings from bi-factor modelling provide important information on how to use and interpret the WEMWBS scores.

Although not hypothesised, three additional measurement error covariances of lower magnitude needed to be specified. This is reasonable because they may arise from similar semantics (i.e. close and intimate relationships of items v9 and v12), be specific for higher age groups (i.e. feeling of being useful and optimistic about the future of items v1 and v2) or be due to translation (e.g. a better translation for “confident” might be “selbstsicher” rather than “zuversichtlich”). Even more significantly, there was no reason to reduce the number of items as has happened in some other studies.

Besides these unique findings regarding construct validity, the German translation showed very similar results when compared to previous validation studies of the WEMWBS. Again, face validity with high response rates, full answer category usage and only slightly skewed answer patterns to the scale items were all replicated in our study. Also reflecting international results, criterion validity was demonstrated with just one quality of life indicator. Yet again, and very similar to previous studies, the German scale version was capable of covering differences in MWB across major socio-demographic groups. In comparison, the total score was slightly skewed but showed neither floor nor ceiling effects as in other studies.

Conclusions

The present study extends available research about the validity of the WEMWBS by using a bi-factor modelling approach, a psychometric tool that enables the division of specific and common variance. Most of the variance in total WEMWBS scores reflects a single common dimension of MWB. This finding deserves further attention and careful analysis, because it suggests that one could use the composite score to scale individuals on a single dimension of MWB. Interpretation of the WEMWBS total score was psychometrically supported in the present study.

With our study, we added new information to the available scale validations about the measurement quality and the feasibility of assuming factorial validity (uni-dimensionality) for a broad, multi-faceted measure with heterogeneous indicators. Our alternative and advanced modelling strategies provided substantial additional insights into the scale’s performance.

Nevertheless, the present study also has limitations that simultaneously highlight future research priorities: first, since the study sample included participants from a cultural context that might include some specifics of the community-based sample (i.e. supporting members of the Viennese Red Cross of an advanced age living in Vienna), further research on the structural validity of the German WEMWBS in a population-based sample (e.g. in other German-speaking populations or countries) across diverse cultural contexts and wider socio-demographic groups is needed to quantitatively test whether the scale possesses measurement invariance, for example (Steenkamp and Baumgartner 1998). Qualitative scale testing could also profit from the application of cognitive interviewing techniques (Willis 2005) to confirm that item translations and wordings are appropriate, for example regarding item 5 (“I’ve had energy to spare”) or item 10 (“I’ve been feeling confident”). Second, in the present study it was only possible to examine the relation of MWB with one convergent quality-of-life measure (i.e. the EUROHIS-QOL scale). Future research with the German scale version should demonstrate in more depth and detail the external correlations of the WEMWBS, ideally with the general and group factors of the bi-factor model (Chen et al. 2012). Last but not least, the present data are cross-sectional, but in future the longitudinal stability and reliability of the WEMWBS need further exploration in a structural equation modelling framework, for example (Little 2013).

Above all, the German version of the scale achieved good validity and reliability in our test sample. With this background, the German version should be included in further research activities as a measure for MWB to investigate positive health across (groups of) populations of interest.