Introduction

Well-being research and positive psychology have boomed in popularity over the past couple of decades (Antolikova 2012; Helliwell et al. 2012). The debate between the ancient concept of eudemonic well-being (Ryff and Keyes 1995; Ryan and Deci 2001) and hedonic well-being (Diener et al. 1985) has led to an understanding of well-being as a dichotomy. Hedonic well-being refers to the pursuit of pleasure or happiness, whereas eudemonic well-being refers more to human potential, the latter being a more global concept (Ryan et al. 2013).

While more interest has been paid to hedonic well-being in the last decade, measures of well-being in the broader sense, especially measures of “flourishing”, have recently gained attention (Hone et al. 2014a, b; Keyes 2002). Authors such as Felicia Huppert, Corey Keyes, Ed Diener and Martin Seligman have been key in developing this concept, each with their own particular emphasis (Hone et al. 2014a, b). Keyes was the first to coin the term flourishing (Keyes 2002). Three of Keyes’ domains of flourishing appear in every flourishing measure, namely positive relationships, meaning or purpose, and self-acceptance or self-esteem.

One of the most popular flourishing measures is Diener’s Flourishing Scale (FS) (Diener et al. 2010). The original scale comprised 12 items, although it was later reduced to eight. The 8-item FS showed sound psychometric properties in its development, with an internal consistency of .87 and a temporal reliability of .71. Factor analysis showed that the scale had a unidimensional structure, with a single factor explaining 53% of the variance. The authors presented normative scores and item analyses, and found correlations of .42 to .73 between the FS and the Basic Need Satisfaction in General Scale (Ryan and Deci 2000), Ryff’s Scales of Psychological Well-being (Ryff and Keyes 1995) and the Satisfaction With Life Scale (Diener et al. 1985), indicating good convergent validity.

The FS has been validated in countries such as New Zealand (Hone et al. 2014a, b), Portugal (Silva and Caetano 2013), Japan (Sumi 2014a, b), China (Tang et al. 2014), France (Villieux et al. 2016) and Germany (Esch et al. 2013). All these validations have consistently found a one-factor structure explaining between 44% (France) and 73.1% (Japan) of the variance. Convergent validity has been tested using instruments including the Subjective Happiness Scale (Lyubomirsky and Lepper 1999), Satisfaction With Life Scale (Diener et al. 1985), Life Orientation Test-Revised (LOT-R) (Scheier et al. 1994), Positive and Negative Affect Schedule (Watson et al. 1988), Chinese Virtues Questionnaire (Duan et al. 2013) and Brief Symptom Inventory (Wang et al. 2013). Convergent validity correlations with the above-mentioned scales ranged from .21 to .68, all of them statistically significant. Two studies mention testing discriminant validity using the Hopkins Symptoms Checklist (Derogatis et al. 1974), Perceived Stress Scale (PSS) (Cohen et al. 1983) and the Centre for Epidemiological Studies Depression Scale (Radloff 1977). However, these results arguably tested criterion validity rather than discriminant validity, since discriminant validity requires that correlations among measures of the same concept obtained with different methods be consistently higher than correlations among measures of different concepts obtained with the same method (Campbell and Fiske 1959). In addition, unlike the original validation of the FS, all the other validations conducted confirmatory analyses, which showed good fit (CFI between .90 and .986, RMSEA between .041 and .08). Regarding reliability, all validation studies report internal consistency scores between .83 and .95, except the French version, which does not report it.

Different well-being instruments have been validated and used in the Spanish context, including the Satisfaction With Life Scale (SWLS) (Atienza et al. 2003), Day Reconstruction Method (DRM) (Caballero et al. 2014) and the Subjective Happiness Scale (SHS) (Extremera et al. 2011). All these instruments measure the individual perception of satisfaction using different question formats. A recent Spanish validation of the FS was conducted using non-probability samples from Spain and Colombia (Pozo-Muñoz et al. 2016). While this validation used samples from two Spanish-speaking countries and assessed invariance, some limitations should be noted. First, the authors did not carry out a back-translation of the items, as recommended by the International Test Commission Guidelines for test translation and adaptation (Muñiz et al. 2013; Muñiz et al. 2016). Second, the Spanish spoken in Spain and in Colombia differs in some respects, so a single Spanish translation of the scale may not be equivalent in both countries. Third, although the authors assessed invariance between samples, cross-cultural adaptations require semantic, idiomatic, experiential and conceptual equivalence (Arafat et al. 2016; Muñiz et al. 2016), not only metric equivalence. Fourth, the Pozo-Muñoz version of the FS used a 5-point Likert scale rather than the 7-point Likert scale used by the original FS and all other validations (Diener et al. 2010; Hone et al. 2014a, b; Sumi 2014a; Silva and Caetano 2013; Tang et al. 2014), without justifying the change either theoretically or statistically. Other limitations include a fully academic sample, the lack of psychometric analysis of features such as convergent validity, criterion validity and temporal reliability, and the use of the same sample to analyse exploratory and confirmatory factor structures rather than randomly splitting it in two as suggested in the literature (Brown 2015).

The present study aimed to address the limitations of the previous Spanish validation of the FS. Given the promising psychometric properties of the FS in different countries and populations, the objective of this paper was to validate the FS in a sample of Spanish adults. The psychometric properties of the scale were analysed from an exploratory and confirmatory perspective.

Methods

Design and Participants

The present project was a cross-sectional study using a non-probabilistic sample of the general population. Participants were recruited via email and social media through messages that included an explanation of the study and a link to LimeSurvey, an open source survey tool (LimeSurvey Project Team, Schmitz, Carsten 2012). Participants were included if they identified themselves as Spanish (irrespective of the country they lived in) and were at least 18 years of age. The final sample comprised 999 participants. Following standard guidelines for analysing the factorial structure of a scale, the sample was randomly split in two in order to conduct exploratory (n = 502) and confirmatory (n = 497) analyses (Anderson and Gerbing 1988; Brown 2015). The sample used for the retest was drawn from the total sample, but considerable attrition reduced the number of observations to 102 individuals.
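The random split described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the function name `split_sample`, the seed value and the consecutive participant IDs are assumptions made for illustration, and only the subsample sizes (502 and 497) come from the text.

```python
import random


def split_sample(participant_ids, n_first, seed=2015):
    """Randomly split participant IDs into two non-overlapping subsamples.

    The seed is an arbitrary, illustrative value that makes the split
    reproducible; it is not taken from the study.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_first], ids[n_first:]


# Hypothetical IDs 1..999; the sizes mirror the subsamples reported above.
efa_ids, cfa_ids = split_sample(range(1, 1000), n_first=502)
print(len(efa_ids), len(cfa_ids))
```

Shuffling once and cutting the list guarantees that every participant lands in exactly one subsample, which is what keeps the exploratory and confirmatory analyses independent.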

The average age of the total sample was 28.4 (SD = 11.7), ranging from 18 to 71 years of age. Of the sample, 31.3% were men, 96.0% had completed at least college/university education and 34.1% were married or cohabiting. Regarding working status, 41.9% were students only, 27.3% were employed only, 24.9% were employed and students at the same time, 3.6% were unemployed, 1% inactive and 3.6% retired. In Sample 1, the average age was 28.98 (SD = 11.99) and 31.3% were men. In Sample 2, the average age was 27.77 (SD = 11.27) and 31.7% were men. There were no statistically significant differences between the socio-demographic characteristics of the two samples (see Table 1).

Table 1 Socio-demographic characteristics of the total sample and the two subsamples

Measures

The Flourishing Scale (Diener et al. 2010) is an eight-item instrument describing important aspects of human functioning including positive relationships, feelings of competence and having meaning and purpose in life. The instrument uses a seven-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). Total scores range from 8 to 56 with high scores indicating respondents viewing themselves in positive terms in important areas of functioning.
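As a minimal sketch of the scoring rule just described (eight items, each answered from 1 to 7, summed to a total between 8 and 56), the following Python function may help. The function name `score_fs` and the input checks are illustrative assumptions, not part of the original instrument.

```python
def score_fs(responses):
    """Sum eight item responses (each 1-7) into a total FS score (8-56)."""
    if len(responses) != 8:
        raise ValueError("The FS has eight items")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("Each response must be an integer from 1 to 7")
    return sum(responses)


# Hypothetical respondent who answers 6 ("agree") on every item.
print(score_fs([6] * 8))
```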

Convergent Validity

The Satisfaction With Life Scale (SWLS) is a 5-item instrument designed to measure global cognitive judgment of satisfaction with one’s life (Diener et al. 1985). Participants indicate how much they agree or disagree with each of the 5 items using a seven-point scale that ranges from 1 (strongly disagree) to 7 (strongly agree). Items include ‘In most ways my life is close to my ideal’ or ‘I am satisfied with my life’. Scores range from 5 to 35, with higher scores indicating higher satisfaction. The Spanish version used in the present study has been validated in a general sample of Spanish adults and has shown an internal consistency of .858 (Atienza et al. 2003). In our total sample, internal consistency was .858.

Criterion Validity

The Life Orientation Test-Revised (LOT-R) has been used to measure optimism (Scheier et al. 1994). The scale comprises ten items: four filler (control) items, three measuring pessimism and three measuring optimism. Each item of the LOT-R is answered on a 1–5 scale that ranges from strong disagreement to strong agreement. Examples of items from this scale are ‘In uncertain times, I usually expect the best’ or ‘I rarely count on good things happening to me’. The scores range from 0 to 12, with higher scores indicating higher optimism and lower pessimism. The Spanish version used in the present study has been validated in a general sample of Spanish adults and has shown sound psychometric properties (Ferrando et al. 2002). In our total sample, internal consistency was .763 for optimism and .701 for pessimism.

The Positive and Negative Affect Schedule (PANAS) measures two domains, positive and negative affect (Watson et al. 1988). This 20-item instrument uses a five-point Likert scale ranging from 1 (not at all) to 5 (extremely). Items include ‘interested’, ‘excited’, ‘ashamed’ or ‘irritable’. Scores on each domain range from 10 to 50, with higher scores indicating higher affect. The Spanish version used in the present study has been validated in a large sample of Spanish adolescents and young adults, showing internal consistency scores from .80 to .86 (Ortuño-Sierra et al. 2015). In our total sample, internal consistency was .845 for positive affect and .855 for negative affect.

Procedure

The Spanish adaptation of the FS followed the criteria proposed by Muñiz et al. (2013, 2016). The authors stress the need to ensure conceptual, linguistic and metric equivalence. The scale was adapted using the translation and back-translation method with two bilingual translators. Discrepancies between the two translators were reported to the original author, Ed Diener, for clarification. Later, two qualified judges studied the conceptual and linguistic meaning of the items in Spanish for refinement. The final Spanish version of the FS is shown in Appendix 1. Participants who agreed to enrol had to read and accept an online informed consent form before entering the study. Data collection complied with the Spanish Law of Data, ensuring confidentiality and anonymity. Data collection started in February 2015 and finished in September 2015. A retest was conducted one month later in order to obtain temporal reliability data. The study met the standards set by the Comité de Ética de la Investigación y Docencia at the Universidad de Valencia (UV).

Data Analysis

A first sample was used to conduct an exploratory factor analysis (EFA) and to carry out reliability and item analyses of the FS. Temporal reliability was obtained by calculating the correlation between the total FS score at baseline and the total FS score at the retest one month later. Convergent validity was measured by calculating the correlation between the total FS score and the SWLS. It was hypothesised that high FS scores would be associated with high SWLS scores. Criterion validity was measured using simple linear regression relating the total FS score to scores of optimism and pessimism (LOT-R) and of positive affect and negative affect (PANAS). It was hypothesised that higher FS scores would be associated with higher scores in optimism and positive affect and lower scores in pessimism and negative affect.
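The two reliability statistics used in this step can be sketched as follows. This is an illustration only, since the study's analyses were run in SPSS. It assumes that internal consistency means Cronbach's alpha and that the test-retest correlation is a Pearson correlation; the function names and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k items."""
    k = len(item_scores[0])
    # Sample variance (n - 1) of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy data (not from the study): four respondents, three items each.
toy_items = [[5, 6, 5], [3, 3, 4], [7, 6, 7], [4, 5, 4]]
baseline_totals = [sum(row) for row in toy_items]
retest_totals = [15, 11, 19, 14]  # hypothetical retest totals
print(round(cronbach_alpha(toy_items), 3))
print(round(pearson_r(baseline_totals, retest_totals), 3))
```

The same `pearson_r` function would serve for the convergent validity correlation between the FS and SWLS totals.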

A second sample was used to conduct confirmatory factor analysis (CFA) testing the one-dimensional model as suggested in the literature (Diener et al. 2010). Polychoric correlations and robust χ2 fit indicators were calculated given the number of response options and the non-parametric nature of the data (Babakus et al. 1987; DiStefano 2002; Finney and DiStefano 2006; Hutchinson and Olmos 1998; Jöreskog et al. 1999). Because the χ2 goodness-of-fit test is sensitive to sample size, we used the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA) to determine model fit. CFI values of .90 or above and RMSEA values of .08 or below are indicative of good empirical fit (Schumacker and Lomax 2004). Reliability in this confirmatory sample was calculated using the Composite Reliability Index and the Average Variance Extracted Index (Raykov 1997). Data analyses were conducted using SPSS 22.0 (SPSS, I. 2013) and EQS 6 (Bentler 2006).
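For readers unfamiliar with the two confirmatory reliability indices, the sketch below shows one common way to compute them from standardized factor loadings. It assumes a single factor with uncorrelated errors, so that each error variance equals one minus the squared loading; the exact computation performed in this study may differ. The function names and the loadings are hypothetical.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings on a single factor.

    Assumes uncorrelated errors, so each error variance is 1 - loading**2.
    """
    sum_loadings = sum(loadings)
    error_variance = sum(1 - l ** 2 for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)


def average_variance_extracted(loadings):
    """Mean squared standardized loading across the items of the factor."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical loadings for an eight-item scale (not the study's estimates).
example_loadings = [0.7] * 8
print(round(composite_reliability(example_loadings), 3))
print(round(average_variance_extracted(example_loadings), 3))
```

Composite reliability weights each item by its loading, whereas the Average Variance Extracted summarises how much item variance the factor captures relative to measurement error.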

Results

Results from the exploratory factor analysis (EFA) can be found in Table 2. Using principal axis factoring with the eight FS items, we found good sampling adequacy (KMO = .876; Bartlett’s χ2 = 1377.923; p < .001). EFA led to a one-dimensional factor solution explaining 42.3% of the variance, with factor loadings ranging from .540 to .730. Internal consistency (alpha) was .846 and did not improve with the deletion of any item. Corrected item-total correlations ranged from .356 to .493. Inter-item correlations also seemed adequate, ranging from a minimum of .318 to a maximum of .570.

Table 2 Item analysis of the FS in sample 1

Temporal reliability was calculated using the 102 participants who completed the retest survey. The test-retest correlation was .749, p < .001. Regarding convergent validity, the correlation with the SWLS was .521, p < .001. Regarding criterion validity, correlations with positive affect, optimism, negative affect and pessimism were .422, .488, −.270 and −.385 respectively. All of them were significant and in the expected direction.

Confirmatory factor analysis (CFA) was carried out to test the one-dimensional model. The CFA’s χ2 was significant (χ2 = 65.5765; df = 20), although the goodness-of-fit indicators were satisfactory (CFI = .982 and RMSEA = .06). Factor loadings ranged from .53 to .74. The model’s goodness of fit was tested again after deleting item 1, which showed lower variability and a lower loading than the rest of the items. The deletion of item 1 did not improve the model (CFI = .981; RMSEA = .08). The Average Variance Extracted Index was .518 and the Composite Reliability Index was .841.

Discussion

The aim of this study was to adapt the FS to Spanish and to validate it in a sample of Spanish adults. The psychometric properties have been analysed using standard criteria (Muñiz et al. 2013, 2016). Overall, the present Spanish-language adaptation of the FS showed adequate reliability and criterion validity. Both exploratory and confirmatory factor analyses provided support for a one-dimensional flourishing construct using an eight-item scale. This one-factor solution has been found previously in other studies using the same scale. The present adaptation has addressed many of the limitations of a recent Spanish translation validated in Colombia and Spain (Pozo-Muñoz et al. 2016). The limitations addressed include conducting the back-translation of the scale, using a general population sample, analysing convergent validity, criterion validity and temporal reliability, and randomly splitting the sample in two to analyse exploratory and confirmatory factor structures.

Our study showed high scores on the FS items, as has also been found in previous studies including that of the original scale (Diener et al. 2010). The FS was created to measure high levels of well-being (Keyes 2002). Therefore, on a continuum of psychological well-being, flourishing would be located at the most positive end. Only the Japanese version found slightly lower scores. These lower scores could be related to a different cultural meaning of well-being (Kitayama et al. 2000).

The exploratory factor analysis in this study shows both differences from and similarities with previous validations of the FS. This study found a lower percentage of explained variance than previous studies (Esch et al. 2013; Hone et al. 2014a, b; Tang et al. 2014), and a percentage similar to that of the French version (Villieux et al. 2016). Item 1 showed low variability and a high mean compared to the other items on the scale. Low variability and a high mean may compromise the scale’s cumulative error and could be one of the reasons why the percentage of explained variance is not as high as in previous versions of the scale, even though the scale showed good evidence of construct validity. Regarding other types of validity, the present study found levels of convergent and criterion validity, as well as internal consistency, similar to those of previous validations (Silva and Caetano 2013; Sumi 2014a). Temporal reliability was higher than in the original validation of the scale (Diener et al. 2010) but lower than in the Japanese study (Sumi 2014b).

Regarding the confirmatory factor analysis, the Spanish sample showed adequate goodness of fit, and the Average Variance Extracted Index and the Composite Reliability Index were satisfactory. The CFI was higher than those found in the Portugal, New Zealand and China validations (Hone et al. 2014a, b; Silva and Caetano 2013; Tang et al. 2014), but lower than in the Japanese validation (Sumi 2014a). The RMSEA values were lower than the ones found in the Japan and New Zealand validations (Hone et al. 2014a, b; Sumi 2014a). In the latter adaptation, the RMSEA was not adequate and only reached an acceptable value once the errors of some items were allowed to correlate (Hone et al. 2014a, b).

This study has two main implications. First, the FS can now be used to assess the efficacy of interventions aiming to increase well-being as a primary or secondary aim. Trials have shown that interventions such as a six-week yoga intervention (Manincor et al. 2016), a web-based happiness training (Feicht et al. 2013) and body-mind medicine (Gimpel et al. 2014) can increase FS scores in different populations and settings. Second, there has been an increasing interest from governments in measuring well-being at a national and international level in the past decade (Helliwell 2003). The FS is being adapted and validated in different countries and languages. Therefore, the FS has the potential to be used in large international epidemiological studies to compare levels of well-being and to assess risk and protective factors. The scale would provide a quick, novel and psychometrically sound way to measure well-being.

This study has some limitations. First, the paucity of retest data made it impossible to assess confirmatory temporal reliability. Second, the present study did not assess discriminant validity. Further evidence is needed to test whether constructs that are supposed to be unrelated to flourishing, such as affect intensity and impulsivity, are indeed unrelated (Diener et al. 1985). Third, the sample used in this study was a non-probabilistic one. In order to establish external validity, the present scale should be validated in a representative sample of the Spanish population.

Further studies could assess the suitability of the response scale, as the literature suggests that seven-point Likert scales are excessive and no more efficient than five-point scales (Hartley 2014; Lai et al. 2010; McDonald 2004). Additionally, many studies question central values in response scales, given that the verbal statement that accompanies the central number does not mean that individuals consider themselves to be in the middle of the continuum of the corresponding latent construct (Dalal et al. 2014; González-Romá and Espejo 2003; Hernández et al. 2006; Kulas et al. 2008; Kulas and Stachowski 2009; Murray et al. 2015; Onwuegbuzie and Weems 2004). Also, the probability of choosing a central value in the scale is much lower than that of choosing any other value, even for people whose trait levels are central (González-Romá and Espejo 2003; Hernández et al. 2006). If future studies showed that the number of response options could be reduced, the scale would be more parsimonious and the validity of the responses would increase without decreasing reliability. Future studies could also analyse the item characteristics from the Item Response Theory perspective. These studies would help analyse the information contributed by each item and decide whether item 1 provides enough information to keep it in the scale. Also, psychometric equivalence studies could be conducted using different versions of the scale in order to test for differential item functioning.

Conclusion

There is enough evidence to conclude that the Spanish version of the FS is a reliable and valid method for measuring high levels of well-being. The FS is short and easy to understand, and can be used in different research designs including trials and large surveys. This study has laid the groundwork for these study designs in Spain and for clinical use.