1 Introduction

The World Health Organization’s Quality of Life (WHOQOL) instrument was initially developed for three purposes: (1) to extend the dimensions of health measurement beyond traditional health indicators, (2) to develop a more universal instrument for assessing quality of life (QoL) cross-culturally, and (3) to assess more humanistic elements in order to promote a holistic approach to health and health care (WHO 1996, 1998; Kuyken et al. 1994). To base the WHOQOL on a universally agreed-upon definition, the WHO defined QoL (WHO 1996, 1998) as “individuals’ perceptions of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns.” This definition reflects the multidimensional nature of QoL and focuses on the effects of disease and health interventions on QoL rather than on the measurement of specific symptoms, diseases, or conditions. Beginning with an original WHOQOL pilot assessment, the WHOQOL-100 was developed by the WHOQOL Group with 15 international field centers (Power et al. 1999). Later, an abbreviated version of the WHOQOL-100, the WHOQOL-BREF, was developed because the WHOQOL-100 was too lengthy for practical use. The WHOQOL-BREF was introduced in a generic English-language version and translated into multiple languages (WHO 1996; WHOQOL Group 1998; Skevington 2001). It has been validated internationally (Skevington et al. 2004a, b) and widely used in a variety of cultures over the last two decades.

Worldwide, studies using the WHOQOL have mainly been based on older adults (Lisiane et al. 2007) or adult patients (Bonomi et al. 2000; Trompenaars et al. 2005). Although the WHOQOL is suitable for administration among young adults (WHOQOL Group 1998), studies with the WHOQOL-BREF among healthy young people, e.g., college students, are sparse. In Thailand, one of the original 15 participating countries, the reliability and validity of the Thai version of the WHOQOL-BREF have been tested among a variety of Thai populations, including middle-aged women living with a disability (Rukwong et al. 2007), radiotherapy cancer patients (Phungrassami 2004), patients living with HIV/AIDS (Sakthong 2007), breast cancer patients (Hwang and Wang 2004), and the elderly (Taboonpong et al. 2001). However, its reliability and validity have not yet been well studied among Thai college students. Most studies that tested the psychometric properties of the WHOQOL-BREF assessed the discriminant validity between “ill” and “well” subjects, not only to estimate the ability of the WHOQOL to differentiate between “ill” and “well” individuals, but also to determine the “distances” between “ill” and “well” people on multiple aspects (e.g., an individual’s perception of health status, psychosocial status, and other aspects of life). In other words, understanding and monitoring well people’s QoL may help researchers and clinical practitioners to tailor more specific and appropriate programs to people with certain health problems (WHOQOL Group 1994). Furthermore, although the WHOQOL-BREF has been validated in a number of studies, some specific items appear not to be appropriate for young people. For example, in a study on the applicability of the WHOQOL-BREF in early adolescence, Chen et al. (2006) found that two physical items, i.e., “To what extent do you feel that your pain hinders you in doing what you need to do?” and “Do you need medical treatment to cope with your daily life?”, and one social relationships item, i.e., “Are you satisfied with your sex life?”, were not suitable for the early adolescent population in Taiwan. Previous studies thus indicate that testing the reliability and validity of the WHOQOL-BREF among healthy populations helps health and clinical practitioners apply the WHOQOL more effectively to their “ill” counterparts who suffer from acute and/or chronic health problems. The present study therefore aimed to examine the psychometric properties of the Thai version of the WHOQOL-BREF among Thai college students.

2 Methods

2.1 Subjects

Convenience sampling was used to recruit 407 Thai college students from one of the largest universities in Bangkok (mean age = 20.52 years, SD = 1.22, range 17.94 to 25.09). One participant who did not respond to the question “How satisfied are you with yourself?” was excluded from the data analysis, so the final data set had 406 complete observations. This sample consisted of 177 males (43.6%) and 229 females (56.4%). Of these, 116 participants were freshmen (28.6%), 176 sophomores (43.3%), 73 juniors (18.0%), and 41 seniors or graduates (10.1%). The study protocol was approved by the institutional review board of the university.

2.2 Instruments

This study used the Thai version of the WHOQOL-BREF, which consists of 26 standard items (WHOQOL Group 1998). The 26 items comprise two generic items, i.e., overall QoL and general health, and 24 other items classified into four domains, i.e., physical health (7 items), psychological health (6 items), social relationships (3 items), and environmental health (8 items). Each of the 24 domain items was derived from one of the 24 facets in the WHOQOL-100 to ensure a broad and comprehensive assessment (WHO 1996; WHOQOL Group 1998).

The two generic items ask about the individual’s overall perception of QoL and general perception of his or her health, respectively. The response options of these two items are scored in a positive direction. Likewise, the response options of 21 of the 24 domain items are scored in a favorable direction (i.e., higher scores indicate higher QoL). For example, for the question “How safe do you feel in your daily life?” the five response options are “1 = Not at all,” “2 = A little,” “3 = A moderate amount,” “4 = Very much,” and “5 = An extreme amount.” The response options of the remaining three items are worded in a negative direction. These three items are “To what extent do you feel that physical pain prevents you from doing what you need to do?” and “How much do you need any medical treatment to function in your daily life?” with endpoints “Not at all” and “An extreme amount,” and “How often do you have negative feelings such as blue mood, despair, anxiety, depression?” with endpoints “Never” and “Always.” These items were reverse-coded during the data analysis. The domain scores were calculated by averaging the scores of the items within each domain. The mean domain scores were then multiplied by 4 to make them comparable with those used in the WHOQOL-100.
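The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SPSS syntax. The item-to-domain mapping and the item numbers of the three reverse-coded items follow the standard WHOQOL-BREF numbering, which this section does not spell out, so both are assumptions here. The function name `domain_scores` is hypothetical.

```python
# Assumed item-to-domain mapping (standard WHOQOL-BREF numbering; not given in the text).
DOMAINS = {
    "physical": [3, 4, 10, 15, 16, 17, 18],
    "psychological": [5, 6, 7, 11, 19, 26],
    "social": [20, 21, 22],
    "environment": [8, 9, 12, 13, 14, 23, 24, 25],
}
# Assumed numbers of the three negatively worded items (pain, medical treatment,
# negative feelings).
REVERSED = {3, 4, 26}


def domain_scores(responses):
    """Map raw 1-5 responses (dict: item number -> score) to 4-20 domain scores."""
    scores = {}
    for name, items in DOMAINS.items():
        # Reverse-code negatively worded items so that higher always means better QoL.
        recoded = [6 - responses[i] if i in REVERSED else responses[i] for i in items]
        # Mean item score within the domain, multiplied by 4 (WHOQOL-100 metric).
        scores[name] = 4 * sum(recoded) / len(recoded)
    return scores
```

A respondent who answers 3 on every item obtains 12.0 on each domain, the midpoint of the 4–20 scale.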

2.3 Data Analysis

Two statistical programs were used for the data analysis: the Statistical Package for the Social Sciences (SPSS for Windows, version 15.0.1, released Nov. 2006) (SPSS Inc. 2006) and LISREL 8.80 (released July 2006) (Mels 2006). Frequency analyses were performed to assess response distributions at the item level. Descriptive statistics, including mean values, standard deviations (SD), skewness, and kurtosis, were calculated at both the item and domain levels. Floor (the proportion of respondents getting the lowest possible score) and ceiling (the proportion of respondents getting the highest possible score) effects were reported and analyzed. Item analyses were performed to assess internal consistency, corrected item-total correlation, criterion-related validity, and item discrimination. Internal consistency was assessed using Cronbach’s alpha, and the values of “Alpha if Item Deleted” were reported. The corrected item-total correlation within each domain is the correlation between one item and the rest of the scale (the total of the other items in the same domain). It was used rather than the uncorrected item-total correlation to avoid spuriously inflating the correlation by including the item in its own total. A low corrected item-total correlation indicates that the item is not measuring what the rest of the scale measures. Criterion-related validity was assessed by measuring the strength of the association of each item and its corresponding domain with the two generic items, i.e., overall QoL and general health, as two criteria, using the Pearson correlation. Item discrimination conventionally refers to the ability of an item to separate respondents on the basis of how well they know the material being tested. In this study on QoL, item discrimination was used to determine how well each item differentiated between participants reporting higher QoL scores and those reporting lower scores. It was assessed by comparing the mean scores of each item between the upper and lower 30% (approximately) groups, with domain scores as the criteria (Findley 1956). Multiple regressions were used to assess the association of the four domains with overall QoL, general health, and QoL plus general health. Standardized coefficients and R² were reported. Exploratory factor analyses (EFA with Varimax rotation) were conducted to explore the factor structure. Confirmatory factor analysis (CFA) was performed with LISREL 8.80 (Mels 2006) to examine construct validity. A two-order CFA (Bollen 1989) was used to examine the hierarchical factor model of the WHOQOL-BREF, with the four domains as first-order latent variables and overall QoL as the second-order latent variable, which may influence the four domains directly and the observed variables (items) indirectly (Findley 1956). The path coefficients and their significance, together with a set of fit indices, such as the χ² goodness of fit (χ² GoF), normed fit index (NFI), and root mean square error of approximation (RMSEA), were used to determine the fit of the data to the model.
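To make the two reliability statistics concrete, the following sketch computes Cronbach’s alpha and the corrected item-total correlation for one domain. It is an illustration using the Python standard library, not the SPSS procedure used in the study. The function names and the data layout (one list of scores per item, with respondents in the same order) are assumptions.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # per-respondent scale total
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items, j):
    """Correlate item j with the total of the *other* items in the domain."""
    rest = [sum(row) - row[j] for row in zip(*items)]
    return pearson(items[j], rest)
```

Excluding item j from its own total is what distinguishes the corrected from the uncorrected item-total correlation.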

3 Results

Table 1 presents distribution information for items and domains. The skewness coefficients of all items and domains were between −1.0 and 1.0. The kurtosis coefficients of 21 items and all domains fell between −1.0 and 1.0; only three items had kurtosis coefficients (1.16–1.28) slightly outside this range. Therefore, the scores of items and domains can be considered approximately normally distributed. Mean scores of the 26 single items ranged from 3.33 to 4.12, with SDs from 0.66 to 1.00, on the 1–5 Likert-type scale. Mean scores of the domains were 15.13 ± 2.08 (physical health), 14.93 ± 2.18 (psychological health), 14.97 ± 2.41 (social relationships), and 14.08 ± 1.96 (environment) on a 4–20 scale. Mean scores for both items and domains were greater than the respective midpoints of the scales (3 and 12), indicating that the college students in this study tended to rate their QoL positively.

Table 1 Descriptive statistics (N = 406)

3.1 Internal Consistency

The internal consistency of the WHOQOL-BREF is displayed in Table 2. Cronbach’s α for the four domains was 0.75 (physical health), 0.82 (psychological health), 0.73 (social relationships), and 0.79 (environment), all above 0.70, indicating acceptable internal consistency. The “Alpha if Item Deleted” values indicated that deleting any item within a domain would not improve the internal consistency of that domain. In terms of the corrected item-domain correlation, all items had a reasonable correlation (0.36–0.70) with the other items within the same domain.

Table 2 Item analyses

3.2 Item Analyses

Although about one third of the items displayed high ceiling effects (>15%), the floor effects for all items were low (≤1%). Furthermore, for all but one of the items with high ceiling effects, the proportion of respondents choosing the highest possible score was smaller than the proportion choosing the second or third highest score. Therefore, the ceiling and floor effects were considered acceptable. The exception, the item “How much do you need any medical treatment to function in your daily life?”, is a concern because it had a high ceiling effect and the proportion for the best option (47.0% on 5) was greater than for other options (e.g., 26.4% on 4), indicating that the distribution of this item was strongly skewed toward the positive endpoint (Table 1).
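The floor and ceiling proportions can be computed as in the brief sketch below, which applies the 15% ceiling threshold mentioned above. The function names are hypothetical and the response vector is made up for illustration. It is not study data.

```python
def floor_ceiling(scores, lowest=1, highest=5):
    """Return (floor, ceiling) proportions for one item's recoded responses."""
    n = len(scores)
    return scores.count(lowest) / n, scores.count(highest) / n


def high_ceiling(scores, threshold=0.15, highest=5):
    """True when the share of responses at the highest score exceeds the threshold."""
    return floor_ceiling(scores, highest=highest)[1] > threshold


# Made-up responses for illustration only (not study data).
toy = [5] * 4 + [4] * 9 + [3] * 5 + [2] * 2
floor, ceiling = floor_ceiling(toy)
```

For reverse-coded items, the proportions should be computed after recoding so that the highest score always denotes the most favorable response.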

3.3 Validity

Tables 2 and 3 present the criterion-related validity of items and domains, respectively. All items and domains were significantly correlated (p < 0.01) with the generic items, i.e., overall QoL (Q1) and general health (Q2). At the domain level, all domain scores were fairly to moderately correlated with Q1 (0.33 ≤ r ≤ 0.52, p < 0.01) and Q2 (0.39 ≤ r ≤ 0.50, p < 0.01). At the item level, all individual items were likewise fairly to moderately correlated with Q1 (0.20 ≤ r ≤ 0.52, p < 0.01) and Q2 (0.20 ≤ r ≤ 0.45, p < 0.01). These results indicate that all items used in the scale and the four derived domains exhibited reasonable criterion-related validity, in that the correlations with the two criteria were in the expected direction and of reasonable magnitude.

Table 3 Association of domains with general facet items

Table 3 also shows the results for predictive validity, i.e., the ability of the scale scores to predict a criterion measure. Although there were significant correlations between the four domains and the two criterion items, the predictive effects of the four domains on Q1 and Q2 differed. Adjusting for the other domains, the physical, psychological, and environmental domains had significant predictive effects on overall QoL (p < 0.05, 0.001, and 0.001, respectively), whereas only the physical and psychological domains had significant predictive effects on general health (p < 0.001 and 0.05, respectively). Furthermore, all domains except the social domain significantly predicted the combination of overall QoL and general health (p < 0.001). Notably, after adjusting for the other domains, the social domain did not predict overall QoL, general health, or their combination. One might suspect that the nonsignificant effect of the social domain resulted from multicollinearity due to the high correlations among the four domains (0.54 ≤ r ≤ 0.72, p < 0.01). However, using cut-off criteria of VIF ≥ 4 and tolerance ≤ 0.20 to determine whether an independent variable displayed “too much” multicollinearity, the social domain did not exhibit a serious multicollinearity problem (VIF = 1.8 and tolerance = 0.55). The coefficients of multiple determination (R²) of the three regression models were 0.34, 0.30, and 0.39, respectively. These results indicate acceptable predictive validity of the domains, further supporting the criterion-related validity.
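The two multicollinearity diagnostics reported above are related by a simple identity, shown below as a worked check. The symbol R_j² is introduced here for illustration and denotes the coefficient of determination obtained when domain j is regressed on the other three domains.

```latex
\[
\mathrm{Tolerance}_j = 1 - R_j^2, \qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\mathrm{Tolerance}_j}, \qquad
\mathrm{VIF}_{\mathrm{social}} = \frac{1}{0.55} \approx 1.8 .
\]
```

The reported tolerance of 0.55 and VIF of 1.8 for the social domain are therefore mutually consistent, and both lie well within the stated cut-offs.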

The t-test results for item discrimination are shown in Table 2. For every item, the difference in scores between participants in the upper and lower 30% on the corresponding domain was highly significant (p < 0.001).
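The upper versus lower group comparison can be sketched as follows. This is a minimal illustration rather than the authors' SPSS analysis. The text does not state which form of the t-test was used, so the Welch statistic computed here is an assumption, as are the function name and the tie handling (ties in domain score are broken by respondent order).

```python
from statistics import mean, variance


def item_discrimination(item, domain, frac=0.30):
    """Return (mean difference, Welch t) for upper vs lower groups by domain score.

    item, domain: equally long lists, one entry per respondent.
    Raises ZeroDivisionError if both groups have zero variance on the item.
    """
    n = len(item)
    g = max(2, round(frac * n))  # group size, roughly 30% of respondents
    order = sorted(range(n), key=lambda i: domain[i])
    lower = [item[i] for i in order[:g]]
    upper = [item[i] for i in order[-g:]]
    diff = mean(upper) - mean(lower)
    se = (variance(upper) / g + variance(lower) / g) ** 0.5
    return diff, diff / se
```

A larger positive difference means that the item separates respondents with high and low domain scores more sharply.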

3.4 Factor Analyses

The WHOQOL-BREF has four domains, as presented above. The EFA (principal axis factoring with Varimax rotation) yielded five factors under the criterion of eigenvalues greater than 1.0. Because the last eigenvalue was only 1.01, the fifth factor was ignored in the analysis.

A hierarchical CFA (first level: four latent domain variables; second level: one latent QoL variable) was performed to evaluate the factor model. The hypothesized model is presented in Fig. 1, where circles represent latent variables (factors) and rectangles represent measured variables (indicators). The latent variables were of two orders: the first order comprises four factors (i.e., the four domains) and the second order is a common QoL factor. The four first-order latent variables, which may directly influence the observed variables, may in turn be influenced by the second-order latent variable. The first model was evaluated without the paths shown as dotted lines in Fig. 1. The independence model, which tests the hypothesis that all variables are uncorrelated, was rejected (χ² (df = 276, N = 406) = 12291.16, p < 0.001). The hypothesized model, tested next, was rejected as well (χ² (df = 248, N = 406) = 746.09, p < 0.001), but the χ² difference test indicated a significant improvement in fit over the independence model. Because the RMSEA was still high (RMSEA = 0.072 > 0.06), model modifications were performed to explore possible improvement. The modification indices strongly suggested adding error covariances between items 3 and 4 and between items 8 and 9. After these two error covariances were set free, the model fit improved further (χ² (df = 246, N = 406) = 552.69, p < 0.001). Other fit indices also indicated a better fit than the former model, e.g., RMSEA = 0.054 versus 0.072, NFI = 0.96 versus 0.94, non-normed fit index (NNFI) = 0.97 versus 0.96, comparative fit index (CFI) = 0.97 versus 0.96, root mean square residual (RMR) = 0.03 versus 0.04, and standardized RMR = 0.048 versus 0.057. The fit indices indicated that the revised model was more parsimonious and acceptable. In this final model, all items had substantial factor loadings on their corresponding factors (p < 0.001) and all first-order factors had substantial loadings (p < 0.001) on the common factor (QoL) (Table 4).
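As an illustration of how nested models can be compared, the sketch below applies a χ² difference test to the reported values for the hypothesized model (χ² = 746.09, df = 248) and the revised model (χ² = 552.69, df = 246). This is a worked check under stated assumptions, not output from LISREL. The function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom, which holds for this comparison (Δdf = 2).

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Return (delta chi-square, delta df, p-value) for two nested models."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf_even_df(d_chi2, d_df)


# Reported values: hypothesized model vs model with two freed error covariances.
d_chi2, d_df, p = chi2_difference(746.09, 248, 552.69, 246)
```

The difference of about 193.4 on 2 degrees of freedom far exceeds the 0.001 critical value of roughly 13.8, in line with the improvement in the other fit indices.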

Fig. 1 WHOQOL-BREF: two-order confirmatory factor model (see Table 4 for the unstandardized estimates). Dotted lines denote the freed paths that were included in the final model

Table 4 Unstandardized estimation of two-order CFA

4 Discussion and Conclusions

The purpose of this study was to examine the psychometric properties of the WHOQOL-BREF in a general population of Thai college students. The results showed that the instrument performs well for assessing the QoL of Thai college students, although some areas deserve further attention. The distribution of items showed that more than half of the 26 items were at risk of a ceiling effect, with participants’ responses skewed toward the highest scores. This is understandable because the sample was drawn from general college students, who were not expected to have serious health problems and therefore tended to give favorable responses. It is reasonable to expect that the ceiling effect would decrease if the instrument were used to assess Thai college students with certain health problems. However, the item “Do you need medical treatment to cope with your daily life?” is a concern because the proportion of responses at the highest score (47%) was greater than that at the second highest score (26.4%). The LISREL modification indices strongly suggested adding an error covariance between items 3 and 4, which would decrease χ² by 122.6. After the error covariances between those two items and between two other items (items 8 and 9, discussed below) were set free, the model was greatly improved. Furthermore, items 3 and 4 had R² values of only 14% and 11%, respectively, in relation to the physical health domain. Considering the low likelihood that general college students depend on medical treatment or suffer from serious pain, items 3 and 4 need to be used and interpreted carefully among college students. If the WHOQOL-BREF is used only among general college students, these two items might be deleted, as Chen et al. (2006) found them unsuitable for adolescents. But if the WHOQOL-BREF is used for college students with some health problems, they might work well in measuring the decrease in QoL due to limitations in physical health. In addition to items 3 and 4, the LISREL modification indices strongly suggested adding an error covariance between items 8 and 9, with a further decrease of 68.1 in χ². As mentioned above, the fit of the model improved after adding error covariances between items 3 and 4 and between items 8 and 9. This suggests that, because the students were living in a relatively uniform environment, i.e., the campus, their beliefs about the safety and healthiness of their surroundings might be more homogeneous. Therefore, when administering the WHOQOL-BREF among college students, the risk of collinearity between items 8 and 9 needs to be considered.

Unlike some studies conducted in Asian countries (Wang et al. 2005; Yoo et al. 2005; Chen et al. 2006), in which young people usually had a low response rate on some sensitive sex-related questions, all Thai college students in this study responded to item 21, “How satisfied are you with your sex life?” This suggests that emphasizing the confidentiality of the survey and providing sufficiently private rooms for filling out the questionnaire are critical for young Asian people to respond comfortably to sensitive sex-related questions.

Discriminant validity, construct validity, and criterion-related validity of the Thai version of the WHOQOL-BREF were tested in this study. The Thai version showed good discriminant validity when the mean item scores were compared between the upper and lower 30% (approximately) groups of respondents, with the corresponding domain scores as criteria. The results were consistent with previous studies, although discriminant validity was assessed in different ways (WHO 1998; Skevington et al. 2004a; Sakthong 2007). Construct validity was assessed using CFA and was acceptable, as in other studies across different countries (WHO 1998; Skevington et al. 2004a, b), patients (Sakthong 2007), and general populations (Bonomi et al. 2000; Chen et al. 2006). Criterion-related validity was assessed by examining the association of each item score and domain score with the two generic items, i.e., overall QoL and general health. The validity results indicate that the Thai version of the WHOQOL-BREF is well validated among Thai college students. Cronbach’s alpha values for the four domain scores ranged from 0.73 to 0.82, demonstrating acceptable internal consistency. The relatively low Cronbach’s alpha for the social relationships domain should be read carefully, because it was based on only three items rather than the four that are generally recommended for assessing internal reliability (DeVellis 2003).

This study has some limitations. First, the convenience sampling may limit the generalization of the findings to all Thai college students. Although the sample size was adequate (n = 406), all of the participants were recruited from a single university. Thus, the representativeness of the study sample may be limited to some extent. Second, the study was based only on general college students and did not include college students with certain health problems, which also limits the generalization of the findings to the whole college student body. Third, because of limited time, test-retest reliability was not assessed.

Although studies with the WHOQOL-BREF among patients and the elderly have been widely conducted, studies among college students are sparse. In summary, the present study provides support for using the WHOQOL-BREF with Thai general college students. The instrument appears to have decent item characteristics, good reliability, and good discriminant, construct, and criterion-related validity, and is therefore an appropriate instrument for assessing the QoL of Thai general college students.