1 Introduction

The number of empirical applications of the capability approach (CA) has grown steadily since its foundation in the early 1980s. As an approach to assess human development, the CA focuses on the substantive freedom of opportunities that enable individuals to achieve certain goals and to live a life that they have reason to value (see e.g., Sen 1979, 1985a, b, 1987). Thus, a capability is the ability to achieve rather than the achievement. Accordingly, the CA has been increasingly acknowledged in the theoretical (e.g., Clark 2002; Alkire 2005; Kuklys and Robeyns 2005; Robeyns 2005a; Gasper 2007; Fleurbaey 2008) and often agency-focused (e.g., Ibrahim and Alkire 2007; Crocker 2008; Abel and Frohlich 2012) literature. During the last two decades, however, the CA has also become increasingly utilised in empirical studies of inequalities, indicators of well-being and broader quality-of-life measures (e.g., Burchardt and le Grand 2002; Anand et al. 2005; Robeyns 2005b, 2006a; Robeyns and van der Veen 2007; Grasso and Canova 2008; van Ootegem and Spillemaeckers 2010; Burchardt and Holder 2012). Most recently, closed survey instruments on self-reported capabilities have found their way into the published literature (Anand and van Hees 2006; Burchardt and Vizard 2007; Coast et al. 2008a; Lorgelly et al. 2008; Anand et al. 2009; van Ootegem and Verhofstadt 2012). The number of empirical applications that employ those instruments is expected to grow during the next decade (Robeyns 2000; Volkert and Schneider 2011). Most often, these empirical applications refer to a theoretical list of central human capabilities developed by Nussbaum (2000). Details and further information on the list by Nussbaum are not part of this study, but they are described in detail elsewhere (see Nussbaum 2006; Robeyns 2006b; Schokkaert 2007, 2009).

Despite the indisputable appeal of the approach to many disciplines, psychometric properties including reliability, validity, and factor structure have not yet been presented in published studies. To our knowledge, the only exception is a study by Coast et al. (2008b), who reported preliminary findings on the construct validity of a self-reported capability index among elderly people in the United Kingdom. Coast et al. (2008b) provide evidence of good construct validity for their items, but do not report on the internal consistency or the factor structure of their measure.

The purpose of the present study is to investigate the psychometric properties of three new language versions of a previously suggested measure of self-reported capabilities (Anand and van Hees 2006; van Ootegem and Verhofstadt 2012). The original English instrument of eight capability items was developed and first applied by Anand and van Hees (2006) in a small population of 273 English voters. The eight questions capture individuals’ self-reported capabilities in seven important domains of life: “happiness”, “sense of achievement”, “health”, “intellectual stimulation”, “social relations”, “environment”, and “personal integrity”. A global summary item on “capabilities overall” supplements the seven capability domains. The seven domains have already been widely used in empirical applications of the CA in affluent populations (see van Ootegem and Spillemaeckers 2010; Binder and Coad 2011; Volkert and Schneider 2011). Recently, van Ootegem and Verhofstadt (2012) applied the list and its items in a sample of Belgian students to explore determinants of self-reported capabilities and life-satisfaction. In line with other authors who have employed similar lists of self-reported capabilities, however, they did not report on the psychometric properties of the instrument and its distinct items.

1.1 Study Aims

The present study responds to some of the gaps in the current literature and aims to contribute to the empirical literature on self-reported capabilities. We provide sound translations of an established English set of eight self-reported capability items into German, French, and Italian. We report on psychometric properties including reliability, validity, and factor structure of these three language versions. Further, we investigate the contribution of the seven capability domains to reported overall capabilities. Our sample is considerably larger than that of the original study (Anand and van Hees 2006), and we are thus able to demonstrate that the application of the instrument and its items is feasible in large-scale survey questionnaires. Finally, the study also represents the first attempt to assess self-reported capabilities in young male adults in Switzerland.

2 Methods

2.1 Translation

In the absence of any translation of the instrument and its items, we put additional emphasis on the translation process. We used the current practice guidelines of the Translation-Review-Adjudication-Pre-Testing-Documentation (TRAPD) Protocol as standard methodology to translate, design, and adapt (Harkness 2003). As a multistage survey translation protocol it included five different stages: translation; review; adjudication; pre-testing; and documentation. As an initial step, drawing on the English originals, a German master version was created by a committee of qualified translators. Translators were systematically selected to include both experts who were familiar with the theoretical concept of the CA and uninformed linguists who were not aware of the concept. A back-translation method was applied. In the second step, the German master version was translated into French and Italian. This process was implemented simultaneously. Each translation involved an independent expert committee without prior knowledge of the other translation processes. Translators involved in the process were bilingual with respect to the language of the master version and the target language.

2.2 Pre-Testing

Seven focus group tests with young adults (84 participants) and one expert group test were conducted to check for face validity, errors, and deviations between the translations. Preliminary consensus versions were pre-tested with 227 participants (aged 18–28 years). Focus-group tests, the expert group test and the pre-test included in-depth feedback, cognitive debriefing, and evaluations of specific parts. Results from the pre-test were discussed in a second round of three focus groups of young adults (30 participants). Final consensus versions included a few refinements and were tested during a pilot study with 1,257 participants (aged 18–25 years). Final versions were presented to and approved by the survey’s scientific advisory board which had observed the entire translation cycle. Figure 1 shows the translation process and the samples that were administered during pre-testing, piloting, and the main survey administration in 2010.

Fig. 1 Translation process

2.3 Survey Participants

Data for the analysis were obtained as part of the Swiss Federal Surveys of Adolescents (ch-x) conducted in 2010. This large cross-sectional survey biannually enrols young males between the ages of 18 and 25 years from all language regions in Switzerland. The survey takes place during mandatory conscription at six national recruiting centres in Switzerland. A detailed description of the aims and conceptual framework of the survey can be found elsewhere (Mohler-Kuo et al. 2006).

In Switzerland, military service is mandatory for all male residents holding Swiss citizenship. Every Swiss male living in Switzerland, unless severely disabled, has to attend conscription, usually at the age of 19. However, a small proportion of young Swiss men have to bring conscription forward or postpone it, which is why the age of the conscripts varies from 18 to 25 years. For Swiss females, military service is voluntary. Because only a few choose to serve (Federal Department of Defence, Civil Protection and Sport 2012: 157 in 2008; 115 in 2009; and 141 in 2010), there is only sufficient data on young male adults.

2.4 Procedure

The survey is conducted separately from the actual recruitment process and is administered as a paper-and-pencil questionnaire in a classroom setting. The conscripts are provided with a standardised introduction by trained non-military staff. They are informed that participation is voluntary as well as anonymous and that members of the army will not see any of the questionnaires. The final survey sample consisted of 17,152 Swiss male citizens aged 18–25 years from the three major language regions. The average age was 19 years. The majority of participants were German speaking (72.92 %), 20.68 % were French speaking, and 6.40 % were Italian speaking. We excluded individuals who had one or more missing values in the capability items. Omitting these individuals reduced the total sample to an effective sample of 16,193 individuals (German: 11,796; French: 3,353; Italian: 1,044) with valid answers to all items.
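The exclusion rule described above amounts to listwise deletion. The following is a minimal sketch in Python, not the procedure actually run for the study (the analyses were conducted in STATA); the function name and the toy responses are hypothetical. The second half only checks that the counts reported in the text are mutually consistent.

```python
from typing import Optional, Sequence


def listwise_delete(rows: Sequence[Sequence[Optional[int]]]) -> list:
    """Keep only respondents with a valid answer to every item."""
    return [list(row) for row in rows if all(v is not None for v in row)]


# Toy data (invented): three respondents, eight items each; None marks a missing answer.
toy = [
    [7, 6, 6, 5, 7, 6, 6, 6],
    [5, None, 6, 4, 6, 5, 5, 5],
    [6, 6, 7, 5, 6, 6, None, 6],
]
complete = listwise_delete(toy)

# Consistency check on the sample sizes reported in the text.
total_sample = 17_152
effective = {"German": 11_796, "French": 3_353, "Italian": 1_044}
effective_total = sum(effective.values())
excluded_share = (total_sample - effective_total) / total_sample

print(len(complete), effective_total, round(100 * excluded_share, 2))
```

The three language subsamples add up to the effective sample of 16,193, and the excluded share is just under 6 % of the 17,152 participants.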

2.5 Instrument

2.5.1 Capability Domains

In their initial study, Anand and van Hees (2006) developed and applied a set of questions to measure individuals’ self-reported capabilities. Seven items capture self-reported capabilities in different domains of life, that is, “happiness”, “sense of achievement”, “health”, “intellectual stimulation”, “social relations”, “environment”, and “personal integrity”. Participants can answer all items on a 7-point Likert scale with only the middle and the extreme values labelled (“7 = very inadequate”, “4 = moderate”, “1 = very good”). In our analyses, we use reversed coding to ensure an ascending order of responses. Furthermore, we changed the wording of the lowest possible answer category from “very inadequate” to “very bad” to better reflect uniform answer categories (for exact wording of items and the reversed coding see the “Appendix”). The seven capability domains are considered as particularly relevant in affluent populations (Nussbaum 2000, 2006; Schokkaert 2007; van Ootegem and Spillemaeckers 2010; Binder and Coad 2011; Volkert and Schneider 2011) and are assumed to represent a disaggregation of overall capabilities (Anand and van Hees 2006, p. 271).
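The reversed coding can be illustrated with a short sketch. It assumes the usual reflection of a 1–7 scale (new value = 8 − old value), so that 7 denotes the most favourable answer after recoding; the helper name and constants are hypothetical, and the exact recoding used is documented in the “Appendix”.

```python
SCALE_MIN = 1  # original label: "very good"
SCALE_MAX = 7  # original label: "very inadequate" ("very bad" in the translated versions)


def reverse_code(response: int) -> int:
    """Reflect a response so that higher values mean greater self-reported capability."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response must lie between {SCALE_MIN} and {SCALE_MAX}")
    return SCALE_MAX + SCALE_MIN - response


original = [1, 4, 7, 2]
recoded = [reverse_code(r) for r in original]
print(recoded)  # [7, 4, 1, 6]
```

The midpoint (4) is unchanged by the reflection, which is why only the two extreme labels swap positions.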

2.5.2 Overall Capabilities

The set of questions also includes a global summary item measurement of self-reported overall capabilities: “Taking all things together, I think my options are:…”. Again, participants can answer on a 7-point Likert scale. This methodological approach (i.e., multiple domain specific measurement and a single global summary measurement) has been found fruitful in studies on subjective well-being and quality-of-life (for more detailed discussions see Diener 1984; Larsen et al. 1985; Lucas et al. 1996; Diener 2000; Cummins et al. 2003; International Wellbeing Group 2006; Wu and Yao 2007; Tomyn and Cummins 2010; Casas et al. 2012). Generally, in a bottom-up approach, satisfaction in different domains of the same construct contributes to the explanation of satisfaction in a global measure of this construct.

2.6 Data Analysis

All analyses were of an exploratory nature and were conducted using STATA (version 11, StataCorp 2009). First, we computed descriptive results for all eight categorical items of self-reported capabilities. Second, we computed Cronbach’s alpha coefficient to assess internal reliability for the seven domain specific items. Third, we computed measures of association between all items using Spearman rank correlations. Fourth, we conducted exploratory factor analysis to assess the underlying factor structure of the seven domains. Fifth, we assessed concurrent validity using Spearman correlations between a simple sum score of the seven capability domains and the global measure of overall capabilities. Finally, to further validate each domain’s contribution, we used standard multiple regression analyses to assess the predictive capacity of each capability domain to the variability in the summary item.

3 Results

3.1 Descriptive Statistics and Missing Data

Tables 1 and 2 report the descriptive results and missing data for all eight categorical items. Table 1 presents the distribution of answer categories in absolute numbers. Participants with missing values in at least one of the items were omitted from all analyses (<6 % of all cases). We considered the number of missing values among the items acceptable because it was generally low, with a maximum of 2.5 % in the French language version (Table 2). Because the sample was large enough, we decided not to replace missing values. In the resulting effective sample, 16,193 respondents had valid responses to all eight items. Mean scores for the seven domain specific items were generally high; means (±SD) ranged from 5.707 (±1.149; “intellectual stimulation”) to 6.265 (±.964; “sense of achievement”) in the German, from 5.668 (±1.216; “intellectual stimulation”) to 5.908 (±1.136; “happiness”) in the French, and from 5.905 (±1.111; “intellectual stimulation”) to 6.123 (±1.112; “social relations”) in the Italian language versions respectively. Mean values and SDs for the summary item “capabilities overall” were 6.000 (±1.025), 5.889 (±1.100), and 5.686 (±1.194) in the German, French, and Italian language versions respectively. When ranked by their means, we found that the respondents reported the lowest capability score for “intellectual stimulation” and “capabilities overall” in all three language versions. In contrast, “sense of achievement” and “happiness” appeared consistently among the questions with the highest reported capability score (Table 2).

Table 1 Distribution of answer categories in absolute numbers
Table 2 Items with corresponding domains, means, standard deviations and missing data

3.2 Reliability

We assessed internal reliability for the seven domain specific items. We took a Cronbach’s alpha (α) coefficient of at least .7 and item-total correlations of at least .5 to indicate good internal reliability, that is, consistency among the items (Clark and Watson 1995; Zumbo et al. 2002). We found α coefficients to be sufficiently high with .853, .870 and .877 in the German, French, and Italian language versions respectively. In all three language versions α coefficients decreased if any of the seven domain specific items were deleted. Item-total correlations were all above .5 and ranged from .569 (“intellectual stimulation”) to .688 (“sense of achievement”) in the German version, from .574 (“intellectual stimulation”) to .720 (“sense of achievement”) in the French version, and from .556 (“personal integrity”) to .722 (“sense of achievement”) in the Italian language version. We decided to keep all items in our analyses based on these satisfactory results.
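For readers less familiar with these statistics, the sketch below computes Cronbach’s α and an item-total correlation on invented toy data. It is an illustration only, not the STATA routine used for the results above. Two assumptions are made: α is computed from sample variances, and the item-total correlation is the corrected variant (each item against the sum of the remaining items); the text does not state which variant was reported. All function names are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (no missing values)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of all other items (assumed variant)."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


# Toy data (invented): five respondents, three items on a 1-7 scale.
toy = [
    [7, 6, 7],
    [5, 5, 6],
    [6, 6, 6],
    [4, 3, 5],
    [7, 7, 6],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3), round(corrected_item_total(toy, 0), 3))
```

Applied to the real data, the same calculation would be run separately for each language version on the seven domain items.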

3.3 Correlations

Table 3 reports Spearman rank correlations (r) for all item pairs. We required correlations of at least .3 for items to be included in the subsequent factor analysis. Overall, we found correlation coefficients to lie within a .3 to .7 range in all language versions. Most item pairs showed correlations between .4 and .6, which we defined as moderate associations. In the German language version, four of the 28 item correlations were below .4, which we defined as weak associations. In the French language version, only one of the 28 correlations was weak. In the Italian language version, two of the 28 correlations were weak. However, not all correlations are equally important. We observed that “intellectual stimulation” showed weak associations with other domains more often than any other item in all three language versions. “Sense of achievement” and “happiness”, in contrast, showed a strong association above .6 with one another in all three language versions (German: r = .623; French: r = .627; and Italian: r = .647). The summary item “capabilities overall” was positively correlated with all seven capability domains, but it showed the highest association with “sense of achievement” and “happiness” (r: .543–.606).
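As an illustration of the statistic reported in Table 3, the sketch below computes a Spearman rank correlation as the Pearson correlation of average ranks, which handles the many ties that 7-point items produce. It is a hedged illustration rather than the STATA routine used in the study; the helper names and the label "overall" for the summary item are hypothetical. It also shows why eight items yield 28 pairwise correlations.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation with tie correction via average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Eight items (seven domains plus the summary item) give 8 * 7 / 2 pairs.
items = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "overall"]
n_pairs = len(list(combinations(items, 2)))
print(n_pairs)  # 28

# Toy responses (invented) for two items.
print(round(spearman([7, 6, 6, 4, 5, 7], [6, 6, 5, 4, 4, 7]), 3))
```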

Table 3 Spearman correlation matrix and results from exploratory factor analysis

3.4 Factor Analysis

To further assess whether our data were suitable for factor analysis, we examined the results of the Kaiser–Meyer–Olkin Measure of Sampling Adequacy (KMO) and Bartlett’s Test of Sphericity (Thompson and Daniel 1996; Pett et al. 2003). We found the KMO of .883 to be larger than the recommended value of .6. Bartlett’s Test of Sphericity was significant with χ² (21; 16,546) = 45,177.865 and p < .001. Given these results, a factor analysis was undertaken. Krishnakumar and Nagar (2008, p. 490) suggest exploratory factor analysis (EFA) as the simplest latent variable model that should be preferred over the commonly used method of principal component analysis because the latter is limited to data description. We computed EFA using principal axis factoring. We selected the number of factors based on eigenvalues greater than 1 and factors that lay above the elbow of the graphical results of a scree plot (corresponding scree plots are presented in the “Appendix”). We found a single common factor solution in all three language versions (see Table 3). All seven domain specific items loaded between .6 and .9 on this single factor. Although we found that the factor loadings differed in their magnitude between the three language versions, the domain with the highest loading (.795–.814; “sense of achievement”) and the domain with the lowest loading (.685–.725; “intellectual stimulation”) onto the extracted single factor were identical across the three language versions. Further, we found most communality values to be above the commonly recommended .5 (Tabachnick and Fidell 2006) and estimates for each domain ranged between .4 and .7 (Table 3). Based on these results, we concluded that all items demonstrated a substantial overlap with the extracted single factor that explained between 53 and 58 % of the variance in the three language versions (German: 53.50 %; French: 56.26 %; and Italian: 57.63 %).
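For reference, the two suitability checks and the link between loadings and communalities can be written out as follows. These are the common textbook forms, given here as an assumption; the text does not state which variant the software implements. Here p is the number of items, n the number of respondents, R the item correlation matrix, r_ij its off-diagonal entries, q_ij the corresponding partial correlations, and λ_i the loading of item i on the single factor.

```latex
% Bartlett's test of sphericity (textbook form); p = 7 items gives 21 degrees of freedom
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert R \rvert ,
\qquad df = \frac{p(p-1)}{2} = \frac{7 \cdot 6}{2} = 21

% Overall Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} q_{ij}^{2}}

% With a single common factor, the communality of item i is its squared loading
h_i^{2} = \lambda_i^{2} , \qquad .685^{2} \approx .469 , \qquad .814^{2} \approx .663
```

The degrees of freedom agree with the value of 21 reported for Bartlett’s test, and the squared extreme loadings fall inside the reported communality range of .4 to .7.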

3.5 Sum Score

We constructed a sum score for the extracted common factor, that is, the sum of questions C1 to C7 ranging from 7 to 49. A high score on the scale expresses greater self-reported capabilities. Figure 2 shows kernel density plots of the sum score for each language version. The plots provide an impression of the distribution of the scale. For each language version, the plot suggests that most of the response scores are at the high end of the scale (i.e., representing high self-reported capabilities). However, we found no floor or ceiling effects (>20 %) in any of the language versions. Based on these results we did not exclude respondents with the highest possible score of 49.
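The construction of the sum score and the floor and ceiling check can be sketched as follows. The sketch assumes that a floor or ceiling effect is flagged when more than 20 % of respondents obtain the lowest (7) or highest (49) possible score, which is how the threshold in the text is read here; the function name and the toy responses are hypothetical.

```python
THRESHOLD = 0.20  # share of respondents above which an effect is flagged (assumed reading)


def floor_ceiling_shares(scores, minimum=7, maximum=49):
    """Return the share of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == minimum) / n
    ceiling = sum(1 for s in scores if s == maximum) / n
    return floor, ceiling


# Toy data (invented): six respondents, reversed-coded items C1 to C7.
toy = [
    [7, 7, 7, 7, 7, 7, 7],
    [6, 6, 5, 6, 7, 6, 6],
    [5, 5, 4, 5, 6, 5, 5],
    [6, 7, 6, 6, 6, 7, 6],
    [4, 5, 4, 5, 5, 4, 4],
    [3, 4, 3, 4, 4, 3, 3],
]
scores = [sum(row) for row in toy]  # simple unweighted sum per respondent
floor, ceiling = floor_ceiling_shares(scores)

print(scores)
print(floor > THRESHOLD, ceiling > THRESHOLD)
```

In this toy example one of six respondents reaches the maximum of 49, so neither share exceeds the threshold.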

Fig. 2 Kernel density plots of the sum score of the extracted common factor. a German language version. b French language version. c Italian language version

3.6 Concurrent Validity

It was hypothesized that the seven domains are positively related to the global summary item. We used “capabilities overall” as criterion to assess concurrent validity by computing Spearman rank correlation coefficients between the sum score of the seven domain specific capability items and the global summary item (Hsieh 2003). We found a strong and statistically significant association in all three language versions (German: r = .651; French: r = .676; and Italian: r = .671; with p < .001 for all three language versions).

3.7 Domain Validation

Table 4 presents the results of the standard multiple regression analysis. We regressed the summary measure “capabilities overall” on the seven domain specific items, entering all seven capability domains simultaneously as predictors. We used squared semipartial correlations (sr²) to calculate the unique variance explained by each of the seven domains in each language version. The goodness-of-fit of the regression analyses is measured by the adjusted R-squared coefficient (adjusted R²). We found similar adjusted R² values for all three language versions, with coefficients of .450, .456, and .491 for the German, Italian and French language versions respectively, which we accepted as reasonably high. The results from these analyses also showed that the contributions of all seven domain specific items to the prediction of the global summary item were positive. The sr² values ranged between .001 (i.e., .1 %; “intellectual stimulation”) and .049 (i.e., 4.9 %; “happiness”). Most domain contributions were highly statistically significant (p < .001). “Social relations” contributed positively in all three language versions but was not statistically significant in the Italian language version (p > .05).
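The squared semipartial correlation of a predictor can be obtained as the drop in R² when that predictor is removed from the full model. The sketch below illustrates this definition with a small ordinary least squares routine written from scratch; it is not the STATA code used for Table 4, all function names are hypothetical, and the toy data are invented so that the result can be verified by hand.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(X, y):
    """R^2 of an OLS regression of y on the columns of X (intercept added)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def squared_semipartials(X, y):
    """sr^2 of each predictor: R^2 of the full model minus R^2 without that predictor."""
    full = r_squared(X, y)
    result = []
    for j in range(len(X[0])):
        reduced = [[v for c, v in enumerate(r) if c != j] for r in X]
        result.append(full - r_squared(reduced, y))
    return result


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Toy data (invented): two uncorrelated predictors, four observations.
X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
y = [-2, -2, 0, 4]
full_r2 = r_squared(X, y)
sr2 = squared_semipartials(X, y)
print(round(full_r2, 4), [round(v, 4) for v in sr2], round(adjusted_r2(full_r2, 4, 2), 4))
```

Because the two toy predictors are uncorrelated, their sr² values add up to the full R². With correlated predictors, as in the seven capability domains, the unique contributions sum to less than R², which is consistent with small sr² values alongside adjusted R² values of .450 to .491.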

Table 4 Prediction of overall capabilities using the seven domain specific items

4 Discussion

Our results demonstrate good psychometric properties and a unitary factor structure of seven capability domains in all three language versions. Internal reliability was high with Cronbach’s α coefficients ranging between .853 and .877. Our findings provide support that the seven domains could be grouped under a common unitary factor structure. All seven items loaded on a single common factor that explained between 53 and 58 % of the variance in all three language versions.

Some domains merit particular mention. “Sense of achievement” consistently showed the highest loading onto the unitary factor (.795–.814). This finding is in line with Sen’s concept of capabilities per se. According to Sen (1985b, p. 203), the degree of capability an individual perceives to achieve his or her goals in life provides the most complete picture of his or her global freedom to “achieve in pursuit of whatever goals or values he or she regards as important”. This result may be especially valuable if other studies confirm it, as it reflects an agreement with the overarching concept of the CA. Apart from “sense of achievement”, “happiness” was an important predictor of “capabilities overall” and was highly associated with “sense of achievement”. This finding is less surprising. In studies on well-being, happiness is traditionally counted among the most important domains (Diener 2000; International Wellbeing Group 2006). Accordingly, measures of subjective well-being are key elements of the CA; but more importantly, they are considered only in conjunction with elements that have an intrinsic value to well-being (Sen 1985b, 1987; Schokkaert 2007). Thus, the opportunity to lead a happy life is important as such, but the opportunity to achieve whatever an individual perceives as valuable in life is superior from a capabilities perspective. “Intellectual stimulation”, in contrast to the former two domains, showed the lowest factor loadings (.685–.725) in all three language versions. While Anand and van Hees (2006) included this domain in their initial study in an adult population, van Ootegem and Verhofstadt (2012) changed the wording of the domain to “education, information and culture” when they distributed the instrument among Belgian students. Although van Ootegem and Verhofstadt (2012) do not discuss their motivation, “intellectual stimulation” might be more difficult than other domains to evaluate at age 18–25 years. Thus, it is possible that the age of our sample explains our finding.

Furthermore, our findings suggest that a global measure of “capabilities overall” can be used as an alternative to seven capability domains in all three language regions. We evaluated the instrument’s concurrent validity by examining its two inherent measurement approaches to self-reported capabilities (domain specific versus global). We found strong associations (r > .6) between respondents’ scores on “capabilities overall” and the sum score of the seven capability domains. Our results show that both measures are very closely associated, without being congruent. Our results also confirm a positive contribution of each of the seven domains to the prediction of explained variance in the global summary item. Altogether, around 47 % of the variability in “capabilities overall” could be predicted from the variability in the seven capability domains (adjusted R²: .450–.491). Comparing this magnitude to studies on subjective well-being, we find surprisingly similar magnitudes (Cummins et al. 2003; Tomyn and Cummins 2010; Casas et al. 2012). Some of them are even lower than our results. From a methodological point of view, this finding is promising because the two measurement approaches are in line with studies that relate to subjective well-being (Diener 1984; Cummins et al. 2003). This finding is also particularly valuable because it is in line with the theoretical hypothesis that self-reported overall capabilities represent an aggregation of different capability domains (Anand and van Hees 2006, p. 271; van Ootegem and Verhofstadt 2012, p. 142). “Taking all things together, I think my options are:…” may, however, capture some domains more than others. Because, intuitively, a single indicator of capabilities may be less reliable than a multiple-item indicator, future research should explore this relationship further.

Besides strong similarities, our results also reveal differences between the language versions. The relative contributions the seven domain specific items made in terms of factor loadings and unique variances are somewhat different. Although we regarded those differences as rather small, it is possible that the contribution of some domains is not equivalent across different cultural contexts (such as those that language regions may represent). This finding is in line with Casas et al. (2012), who found differences between countries in the contribution of subjective well-being domains to a measure of overall life satisfaction. The authors argue that there is good reason to assume that socio-cultural contexts have an influence on predictors of subjective well-being (p. 480). Their contention reflects recent findings and recommendations from the International Wellbeing Group (2006) that support our findings.

Before the present study, psychometric properties of closed survey instruments of self-reported capabilities were not available from published studies. To our knowledge, the only exception is a British study by Coast et al. (2008b). Recently, van Ootegem and Verhofstadt (2012) have used the same list of self-reported capabilities by Anand and van Hees (2006) that we used in the present study. In the present study we respond to this lack of knowledge and use three newly translated language versions to assess their psychometric properties in a sample of young male adults aged 18–25 years in Switzerland.

4.1 Strengths and Limitations

There are strengths and limitations to our study that should be mentioned. It is worth highlighting that the present study represents the first attempt to assess self-reported capabilities in Switzerland using translated versions of a previously established set of self-reported capabilities (Anand and van Hees 2006; van Ootegem and Verhofstadt 2012). With the present study we respond to the lack of knowledge on the psychometric properties of published capability instruments. The current study is also a response to the lack of translated language versions and applications of published instruments. However, the present study has some limitations in terms of external validity, as our results are drawn from a sample of young male adults. To enhance generalisability, the assessment of the instrument and its items should be repeated among females, different age groups, and possibly other countries. Also, because our aim in the present study was to provide sound translations and to explore the psychometric properties of the set of items, we were not concerned with potential sources of satisfaction or dissatisfaction with specific capability domains, and analyses of consecutive questions will have to follow. Last, the rigorous translation procedure was helpful to achieve conceptual and linguistic equivalence between the German, French and Italian versions. However, because the translation procedure included two steps, that is, translation from English into German and translation from German into French and Italian, we cannot be sure that we achieved maximal equivalence between the English original and the French and Italian translations.

4.2 Directions for Future Research

The results we report here are encouraging, but future work is needed. New lists have been compiled during and after the period in which the present study was conducted (see e.g., Anand et al. 2009). It would have been ideal to compare these instruments of self-reported capabilities to the set of eight items we have used in this study. However, the timing of the study set-up and the availability of alternative lists did not allow such a comparison in the present study. Doing so in the future would be useful and desirable. Finally, future work should also be directed to the translation and application of already existing instruments of self-reported capabilities, which runs counter to the current trend of producing new lists and items. Cultural and linguistic adaptations of existing instruments may be time consuming and resource intensive, but they play a key role in the process of developing assessment tools that bear similar psychometric properties.

5 Summary and Conclusion

Using a previously suggested instrument to measure self-reported capabilities, the present study contributes to the growing body of empirical research on capabilities. In this study we use three translated versions of a set of eight capability items. Psychometric analyses support each of the translated versions as a valid and reliable tool to assess self-reported capabilities in a Swiss sample of young male adults. These analyses also suggest that the instrument’s summary item on overall capabilities represents an alternative measure to seven capability domains. While the applicability of the overall set of items in other populations would need further empirical evaluation, the experiences gained from the current study are intended to encourage and inform such future work.