Introduction

The prevalence of mental disorders among people aged 12–24 years ranges from 8 to 57%, according to different studies, and it is considered that an average of one in every five young people will suffer from some type of mental health problem [1]. Some psychological disorders that begin in adolescence persist into adulthood, but they may improve if treated early [2].

The General Health Questionnaire (GHQ) is a self-administered questionnaire to assess psychiatric disorders [3]. It was designed by Goldberg to be used in non-psychiatric clinical settings [4]. Nevertheless, the GHQ is also widely used as a health questionnaire in healthy community settings [3]. The shortest version consists of 12 items (GHQ-12) and has shown satisfactory validity and reliability in several studies [3, 5–8].

Although the GHQ was originally aimed at the adult population, it has also been used successfully in adolescents [9–14]. French and Tait [10] showed that children aged 11–15 years interpret the GHQ in a manner similar to adults.

The GHQ was originally designed as a unidimensional scale. However, several papers have explored its possible dimensions. Even Goldberg himself developed a scaled version of the GHQ. According to him, “it was intended for studies in which an investigator requires more information than is provided by a single severity score”. Since then, various studies have shown different dimensions in the structure of the GHQ versions and, in particular, in the GHQ-12 [6, 7, 13–16]. The three-dimensional model proposed by Graetz is usually considered to be the most appropriate [9, 10, 16–19]. It consists of the following dimensions: anxiety and depression (items 2, 5, 6, and 9), anhedonia and social dysfunction (items 1, 3, 4, 7, 8, and 12), and loss of confidence or self-esteem (items 10 and 11). However, there is no consensus as to the validity or usefulness of the different dimensions.

Little information is available on the psychometric characteristics of the GHQ-12 in adolescents from Southern Europe. In Spain, one study has assessed its internal consistency and explored its dimensions [13]; to date, however, no confirmatory factor analysis (CFA) has been conducted to evaluate the factor structure of the GHQ-12 and to assess how well it fits with different theory-driven models formulated a priori. The objective of this study was to evaluate the factor structure of the GHQ-12 by using CFA in a population of Spanish adolescents.

Methods

Study design and population

The study data were drawn from the Non-Communicable Diseases Risk Factor Surveillance System (Sistema de Vigilancia de Factores de Riesgo de Enfermedades No Transmisibles, SIVFRENT) targeting the adolescent population. This system monitors the principal health-related lifestyles in a representative sample of the population in the 4th year of Compulsory Secondary Education in the region of Madrid [20].

The study participants were selected through two-stage cluster sampling, with the first-stage units (schools) stratified by area (city of Madrid and other municipalities) and type of school (public or private). The questionnaire was self-administered in the classroom under the guidance of experienced field workers. The participation of schools and students was voluntary, after obtaining informed consent. For this study, we analyzed data from 4,146 adolescents that completed the survey in 2008 and 2009 in 94 schools, which included 185 classrooms.

The response rate was 81.3% for all the selected schools and 90.9% for students within the schools (8.3% were absent on the day of the survey, 0.08% refused to participate, and 0.7% of the questionnaires were removed due to inconsistent responses). Thus, the overall study response (schools and students) was 73.9%. We collected data from 4,210 students. Some relevant information was missing from 1.5% of the questionnaires. The data from those students were excluded, so that the final analytical sample consisted of 4,146 individuals.
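The overall response figure is the product of the school-level and student-level response rates. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of the study's software.

```python
def overall_response_rate(school_rate: float, student_rate: float) -> float:
    """Combine the response rates of two sampling stages into one overall rate."""
    return school_rate * student_rate


# Rates reported in the text: 81.3% of schools, 90.9% of students within schools.
rate = overall_response_rate(0.813, 0.909)
print(f"Overall response: {rate:.1%}")  # prints "Overall response: 73.9%"
```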

Questionnaire

The questionnaire included the GHQ-12, validated in Spanish by Lobo and Muñoz [3]. The GHQ-12 items can be classified as either positively or negatively worded. An item is considered to be positively worded (6 items) if it refers to health. The response options range from “More than usual” to “Much less than usual.” Items that refer to disease are considered to be negatively worded (6 items). The response options range from “Not at all” to “Much more than usual.” Responses are assigned to a 4-point Likert scale. The score for each response on the GHQ-12 ranges from 0 to 3, with the overall score ranging from 0 to 36. A cut-point of 12 or higher was used to classify subjects as “cases” [5].
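The scoring rule described above can be sketched as follows. This is an illustration only: it assumes that each response has already been coded from 0 to 3 so that higher values indicate more distress, which the text does not spell out item by item. The function names and the example respondent are hypothetical.

```python
CUT_POINT = 12  # cut-point used in the text to classify a subject as a "case"


def ghq12_total(item_scores):
    """Sum 12 Likert item scores (each 0-3) into an overall score from 0 to 36."""
    if len(item_scores) != 12:
        raise ValueError("GHQ-12 requires exactly 12 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)


def is_case(item_scores):
    """Return True when the overall score reaches the cut-point."""
    return ghq12_total(item_scores) >= CUT_POINT


# Hypothetical respondent, not study data.
example = [1, 2, 0, 1, 3, 1, 0, 1, 2, 0, 1, 1]
print(ghq12_total(example), is_case(example))  # prints "13 True"
```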

Statistical analysis

We first calculated the distribution of the scores on each question and on the overall scale, estimating the percentage of individuals with the highest scores (ceiling effect) and the lowest scores (floor effect). Reliability or internal consistency was evaluated with Cronbach’s alpha coefficient. An exploratory factor analysis (EFA) was subsequently conducted using principal components analysis. The criteria for selecting the number of factors were the eigenvalues (minimum value close to 1) and their graphic representation (scree plot). An oblique rotation (promax = 2) was used to identify and characterize these factors. Although the Promax rotation is similar to the Oblimin rotation, we preferred the Promax rotation because it is computationally faster for large datasets [21]. To assign an item to a particular factor, we selected factor loadings equal to or greater than 0.4 [22].
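For readers unfamiliar with the reliability coefficient, the standard formula for Cronbach's alpha can be sketched in a few lines. This is a generic illustration using the Python standard library, not the SPSS procedure used in the study, and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    # total_var is zero only if every respondent has the same total score.
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 respondents, 2 items (not study data).
toy = [[1, 2], [2, 1], [3, 3], [0, 0]]
print(round(cronbach_alpha(toy), 3))
```

When every item gives identical scores for each respondent, the coefficient equals 1; it falls as the items share less variance.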

A CFA was then performed with four factor structure models: (1) unidimensional; (2) two-dimensional: positively and negatively worded items; (3) three-dimensional, as proposed by Graetz [16]: anxiety and depression (four negative items: 2, 5, 6, and 9), anhedonia and social dysfunction (six positive items: 1, 3, 4, 7, 8, and 12), and loss of confidence or self-esteem (two negative items: 10 and 11); (4) dimensionality obtained from the EFA.
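The three models specified a priori can be written down as item-to-factor assignments, which makes it easy to check that each model uses every item exactly once. The sketch below is illustrative; the dictionary keys are our own labels, and the fourth model is omitted because it depends on the EFA results.

```python
ALL_ITEMS = set(range(1, 13))  # GHQ-12 items are numbered 1 to 12

MODELS = {
    "unidimensional": {
        "general": list(range(1, 13)),
    },
    "two_dimensional": {
        "positively_worded": [1, 3, 4, 7, 8, 12],
        "negatively_worded": [2, 5, 6, 9, 10, 11],
    },
    "graetz": {
        "anxiety_depression": [2, 5, 6, 9],
        "anhedonia_social_dysfunction": [1, 3, 4, 7, 8, 12],
        "loss_of_confidence": [10, 11],
    },
}


def is_partition(model):
    """True when every item loads on exactly one factor of the model."""
    assigned = [item for items in model.values() for item in items]
    return len(assigned) == len(ALL_ITEMS) and set(assigned) == ALL_ITEMS


for name, model in MODELS.items():
    print(name, len(model), is_partition(model))
```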

We conducted an EFA before the CFA for various reasons. First, few EFAs have been carried out in adolescents [6, 13, 14], and only one of them was done in Spain [13]. Second, in most CFAs in adolescents, the three-dimensional model proposed by Graetz [16] showed the best fit to the data; that model, however, was derived from a study in an adult population. Given that CFA only assesses models specified a priori, we decided to test a model obtained in our study population.

The selection of the rest of the models was based on a literature review. The model most strongly supported by studies with CFA is that of Graetz [9, 10, 16–19]. The unidimensional model was also used because the GHQ-12 was designed as an overall scale; in addition, some authors who identified several dimensions with the CFA recommended using a unidimensional model to obtain a uniform score [9, 10]. Other authors have assessed the existence of two dimensions, one with the positively worded items and the other with the negatively worded items [23], and there has even been a study of possible response bias for the negatively worded items [24]. Lastly, French and Tait [10] also suggested that a two-dimensional model of positively and negatively worded items can be obtained by grouping the first and third factors in the Graetz [16] model, which they found to be highly correlated.

The CFA parameters were estimated by the generalized weighted least squares method using the asymptotic covariance matrix. The use of the asymptotic matrix relaxes the hypothesis on the distribution of the observed variables and improves the fit and the statistics used in the tests [25]. To evaluate the models, we considered various indicators of goodness-of-fit. The Root Mean Square Error of Approximation (RMSEA) measures the discrepancy between the observed covariance matrix and the one adjusted by the model [26]. An RMSEA of less than 0.05 is considered to indicate a good model fit, and an RMSEA of less than 0.08 is considered a reasonable fit. The Comparative Fit Index (CFI) and the Goodness-of-Fit Index (GFI) evaluate the improvement in the fit in comparison with a simpler model [27], generally one that assumes independence among the items. These indices vary between 0 and 1, and values over 0.95 are considered acceptable [28]. The Expected Cross-Validation Index (ECVI) also compares models [29], with a lower value indicating better model fit. The Akaike Information Criterion (AIC), which penalizes the likelihood function based on the number of parameters in the model, is used in a similar way. A model with a lower AIC is considered more plausible than one with a higher AIC. Lastly, for the best fitting model, we also assessed the standard errors and t values of the factor loadings.
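As an illustration of how the RMSEA thresholds above are applied, the sketch below uses one common textbook formulation of the index, computed from the model chi-square, its degrees of freedom, and the sample size. This is an assumption for illustration: the exact expression differs slightly between software packages (for example, N versus N − 1 in the denominator), and the input values shown are hypothetical, not results from this study.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """One common formulation: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def describe_fit(value: float) -> str:
    """Apply the thresholds stated in the text (0.05 and 0.08)."""
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "reasonable"
    return "above 0.08"


# Hypothetical inputs, not study results.
value = rmsea(chi2=100.0, df=50, n=201)
print(round(value, 3), describe_fit(value))
```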

The descriptive analyses and the EFA were performed with SPSS 16.0 for Windows (2008 SPSS Inc., Chicago, IL, USA). For the CFA, LISREL 8.8 for Windows (Scientific Software International Inc., Lincolnwood, IL, USA) was used.

Results

The mean age of the study participants was 16.3 years (range 13.7–18.7 years) and 51.6% were women.

The mean GHQ-12 score was 10.9 (SD = 5.3). Taking greater than or equal to 12 as the cut-point, 38.5% of participants were considered as cases. Table 1 shows the mean score for each item and the percentage of responses in each response category. The items with the highest mean scores were 1, 2, and 5; of these, item 5 “Have you felt constantly under strain?” was notable for having the highest score. About 0.2% of the sample had zero points on the overall scale, and only one person had the maximum score.

Table 1 Mean and standard deviation of replies to each item in the GHQ-12, percentage of subjects classified as cases, and score distribution for each item

The Cronbach’s alpha for internal consistency of the GHQ-12 was 0.82. Table 2 shows the correlation of each item with the overall scale, and the Cronbach’s alpha after eliminating the corresponding item. Item 4 “Have you felt capable of making decisions about things?” had the lowest correlation coefficient (0.358). For the rest of the items, the correlations with the overall scale ranged between 0.492 and 0.742. The Cronbach’s alpha did not change substantially when eliminating the corresponding item.

Table 2 Correlation between items and overall GHQ-12 scale

The EFA suggests a three-dimensional structure. All the factor loadings that permit assignment of an item to a specific factor exceeded 0.4 (Table 3). The first factor includes items 1, 2, 5, 6, 7, and 9 and explains 34.7% of the variance. The second factor includes items 3, 10, and 11 and explains 10.9% of the variance. And the third one includes items 4, 8, and 12 and explains 8.2% of the variance. The three factors together explain 53.7% of the variance.

Table 3 Exploratory factor analysis of the GHQ-12

In the CFA, all the models had an RMSEA of less than 0.08 and the values for the rest of the indicators were acceptable, but only Graetz’s three-dimensional model [16] had an RMSEA of less than 0.06 (Table 4). In general, the best results for all the goodness-of-fit indices were obtained with this model. The second best goodness-of-fit indices were obtained with the model derived from the EFA, whereas the unidimensional model yielded the worst values. For Graetz’s three-dimensional model, the t values for the factor loadings were statistically significant (Table 5).

Table 4 Confirmatory factor analysis of the GHQ-12
Table 5 Confirmatory factor analysis of Graetz’s model. Factor loadings for each item

The factors in the three-dimensional model proposed by Graetz [16] were highly correlated (Fig. 1). The correlation was 0.84 between the first and third factors, 0.74 between the second and third factors, and 0.72 between the first and second factors. Given the high correlation between factors 1 and 3, a nested model with two factors was also fitted. The Chi-square difference was 314.3 with 2 degrees of freedom, so that the model with three factors was preferred. Similar results were obtained when the model was compared with other nested models.
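The nested-model comparison can be checked by hand, because the survival function of a chi-square distribution with exactly 2 degrees of freedom has the closed form exp(−x/2). The sketch below applies it to the reported difference of 314.3; it is a simplified illustration and does not reproduce the LISREL output.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """P(X > x) for a chi-square variable with exactly 2 degrees of freedom."""
    if x < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.exp(-x / 2.0)


# Chi-square difference reported in the text: 314.3 with 2 degrees of freedom.
p_value = chi2_sf_df2(314.3)
print(p_value < 0.05)  # prints "True": the two-factor restriction worsens the fit
```

Because the p value is far below 0.05, merging factors 1 and 3 significantly worsens the fit, which is why the three-factor model is retained.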

Fig. 1 Flow diagram of the three-dimensional model proposed by Graetz for the GHQ-12. Presented from left to right: variance of item not explained by its factor, correlations of items with their corresponding factor, and correlations between factors

Discussion

Our results show that, in Spanish adolescents, the GHQ-12 has a high internal consistency and that the three-factor structure proposed by Graetz [16] yields the best fit in the CFA. However, the high between-factor correlations observed may call into question the use of the GHQ-12 beyond a unidimensional scale.

The satisfactory value that we obtained for Cronbach’s alpha is consistent with the values in the range 0.81–0.88 reported in other healthy adolescent populations [6, 10, 11, 13, 14, 30]. We did not calculate Cronbach’s alpha for the different subscales because its use is controversial when the measurement scale is composed of very few items. This is the case with factors 2 and 3 of the EFA, because each of them comprises only 3 items, which is considered to be the minimum number for a scale aiming to explore a factor [31]. Moreover, we considered that the reliability of a test, assessed with Cronbach’s alpha, is meaningful only when the test is composed of a number of items combined to represent a single global summative score, and this is not the case for our subscales. The mean GHQ-12 score is also similar to that found in other studies with non-clinical adolescent populations [10, 14]. To our knowledge, most of the articles that assessed the validity and/or reliability of the GHQ-12 in adolescents are based on healthy community samples, such as ours. The ceiling/floor effects of the overall scale were small.

A three-dimensional structure of the GHQ-12 was identified with EFA. The first dimension includes items 1, 2, 5, 6, 7, and 9. The second includes items 3, 10, and 11 and coincides with Graetz’s [16] “loss of confidence” dimension plus item 3. The third dimension includes items 4, 8, and 12. This three-dimensional model explains 53.7% of the variance. Lopez-Castedo and Fernandez [13] obtained a two-dimensional model in an EFA of an adolescent population in Spain. The first factor (anxiety) included items 1, 2, 5, 6, 9, 10, and 11, and the second one (social dysfunction) included items 3, 4, 7, 8, and 12. This model is similar to the one obtained in our study. With the exception of items 3 and 7, our first two dimensions are the same as the “anxiety” dimension described by Lopez-Castedo and Fernandez [13]. However, their two-dimensional model explained less variance (37.8%) than ours did.

Politi et al. [6], in a study of 18-year-old men, identified two dimensions that explained 46.7% of the total variance. These were “general dysphoria”, which included items 2, 5, 6, 9, 10, 11, and 12, and “social dysfunction” which included items 1, 3, 4, 7, 8, and 12. This model is also similar to the model of Lopez-Castedo and Fernandez [13].

Li et al. [14] identified two factors in Chinese adolescents that explained 53.3% of the variance. The first included items 1, 3, 4, 7, 8, and 12, and the second included 2, 5, 6, 9, 10, and 11. Their model classified the items as positive or negative and coincides with the model of Politi et al. [6]. With the exception of items 1, 3, and 7, our first two dimensions are the same as their second factor.

Few studies in adolescents have examined the GHQ-12 with CFA [9, 10, 14]. As far as we know, ours is the first such study in southern Europe. In the study of Li et al. [14], the models with the best fit were the two-dimensional model with 8 items, described by Kalliath et al. [32], followed by the three-dimensional model of Worsley and Gribbin [33] and Graetz’s three-dimensional model with 12 items [16]. In our study, Graetz’s model yielded the best fit in the CFA [16]. The results of the modification indices for the matrix of factor loadings and the measurement error covariance matrix showed that there were no additional paths that would significantly improve the fit of the hypothesized structural model.

This result is consistent with other studies in adolescents [9, 10], but the validity and usefulness of the different dimensions are subject to debate. The high between-factor correlations in the CFA, which have also been assessed in other studies [9, 10], suggest that the factors are not independent and can be explained by signs and symptoms common to all of them. For example, anxiety and depression can cause insomnia. In turn, depression can lead to lower self-esteem and social functioning. Furthermore, chronic stress promotes depression. The blurred boundaries between the factors of psychological distress, together with the statistical evidence of high correlations between the dimensions, would support the use of the GHQ-12 only to obtain an overall score, without assigning too much value to the different dimensions.

Although a possible response bias between the positively and negatively worded statements has been investigated, we know of no study, including ours, in which the two-dimensional model of positively and negatively phrased items yields the best fit in the CFA. French and Tait [10], in a CFA with adults, found that the model with the best fit was Graetz’s three-dimensional model [16], but the correlations between the first and third dimensions (negatively phrased statements) were much higher than with the second dimension (positively phrased statements). For this reason, they suggested the possibility of a bias in the negatively worded items. In our study, however, the correlations were only slightly higher between the first and third factors. Cheung [17], in an older population, obtained even higher correlations between the second and third factors than between the first and second. Nevertheless, the possible response bias is an important issue that lies beyond our current objectives and should be evaluated thoroughly in further analyses of the data.

In our study, the model derived from the EFA did not yield the best fit in the CFA. This is not uncommon [34] because the possible sources of model misfit are not assessed in the EFA. In CFA, it is usually assumed that each item loads on only one factor, whereas in EFA each item may depend on all the common factors, making it difficult to interpret the results. Furthermore, EFA does not allow evaluation of the model because identification of the factors is based on arbitrary cut-points, and it is not possible to use goodness-of-fit criteria. Li et al. [14] obtained similar results in the context of the GHQ-12.

Our study has some limitations. Specifically, although the response rate was quite high, it is possible that student absence on the day of the survey was related to some health problem that could affect the GHQ-12. However, given that only 8.3% of the students were absent during the survey, it could have only a modest impact on the results of the psychometric characteristics of the GHQ-12. The main strength of the study was its large size and the fact that it was representative of all students in upper Secondary Education in Madrid. This allowed considerable variation in the GHQ-12 score, so that the results can be extrapolated to similar populations.

Conclusions

We conclude that, in an adolescent population from Southern Europe, the GHQ-12 has shown good psychometric characteristics. In addition, the factor structure that best fitted the data in the CFA was the Graetz three-dimensional model [16]. However, the high correlations observed between factors suggest that the GHQ-12 should be used as a unidimensional scale, as is currently done.