Introduction

Several recent studies in Afghanistan have demonstrated very high levels of depressive and anxiety symptoms among the general population, especially in women [1, 22, 36]. These studies, like many others conducted in the aftermath of humanitarian emergencies in low-income countries, used brief questionnaires administered by laypersons to obtain a quick impression of the mental health status of the population. The authors were involved in one of these studies, a cross-sectional survey in which the Hopkins Symptom Checklist-25 (HSCL-25) was used [36]. The decision to use this instrument was pragmatic: people who are not trained mental health professionals can be easily instructed, and little time is lost to administration. These instruments were, however, not originally designed to distinguish between mental disorders and normal reactions to severe environmental stress, and have not been validated for use in Afghanistan. It remains unclear to what extent they can be used to estimate the prevalence of mental disorders in this context [4]. We therefore felt the need to conduct this additional research. In this study, which should ideally have been done before the cross-sectional survey mentioned above, we assessed the psychometric properties of the HSCL-25 and of another frequently used brief, lay-administered mental health questionnaire, the Self-Reporting Questionnaire-20 (SRQ-20), as tools to detect persons with psychiatric disorders in a primary care setting in Eastern Afghanistan. As a gold standard we used a structured psychiatric interview conducted by a trained clinician from the same culture as the respondents, a procedure widely used in instrument validation in western and non-western populations [2, 5, 10, 11, 15, 19, 21, 25, 31, 33].

Methods

The study site and population

Nangarhar province in Eastern Afghanistan is ethnically dominated by the Pashtun, the largest and most conservative of Afghanistan’s ethnic groups. The province is relatively well off economically due to its strategic location near the Khyber Pass, which connects Afghanistan with Pakistan. During the Russian occupation of Afghanistan (1979–1989) the province was the scene of heavy resistance by mujahedeen forces. After the fall of the Soviet-backed communist government, a prolonged fight for power broke out between several mujahedeen factions. During this period hundreds of thousands of residents fled the ongoing destruction in the capital Kabul and were given temporary shelter in huge camps in Nangarhar. The period of Taliban government (1996–2001) was relatively peaceful but was characterized by considerable human rights violations and severe restrictions on the rights of women. During the US-led attack on the Taliban in the fall of 2001, some parts of the province where Al Qaida training camps were suspected (such as the mountain region of Tora Bora) suffered heavy bombardment. After the installation of the new government, attacks by insurgents on the government, aid organizations, and women’s organizations continued.

Study design

In June 2004 we assessed five rural basic health centers in Nangarhar Province run by HealthNet TPO, a non-governmental organization specialized in health care and psychosocial assistance in post conflict areas. In each health facility a sample of persons older than 15 years was drawn from the registration book in which each patient’s data has to be entered before being seen by the health care staff. Sampling ratios at the health centers differed due to variations in the number of people attending (average sampling ratio was 1:3). The purpose of the study was explained to each potential participant by the local study coordinator (NSR). The literacy rate of the Afghan population is very low: 28.1% for ages 15 and older in 2004 [38]. Therefore informed consent was obtained from each respondent by reading aloud an explanatory text and then asking for participation. Using verbal instead of written consent is often a necessity when conducting research in low-income countries with high rates of illiteracy and a high level of distrust toward signing documents.

The SRQ-20 and HSCL-25 were administered by trained lay interviewers of the same sex as the participant. Subsequently a mental health professional held a clinical interview with each participant on site. The interviewer used a semi-structured clinical psychiatric interview and had no information about the participant’s scores on the HSCL-25 and SRQ-20. The mental health professionals were all male and were therefore accompanied by a female chaperone for those female participants who did not want to talk to a male in a one-to-one situation. This survey formed part of a larger research project for which formal review and approval was given by the medical ethical committee of the University of Amsterdam, Amsterdam, the Netherlands. Local program directors, their boards, and local authorities approved the research procedures, which were consistent with the Declaration of Helsinki [44].

Instruments

Hopkins Symptom Checklist-25 (HSCL-25)

The Hopkins Symptom Checklist-25, derived from the 90-item Symptom Checklist (SCL-90) [8], is a screening tool designed to detect symptoms of anxiety and depression. It is composed of a 10-item subscale for anxiety and a 15-item subscale for depression, with each item scored on a Likert scale from 1 (not at all) to 4 (extremely). The period of reference is the past month. Originally developed as a self-report symptom inventory, it is often used as an interviewer-administered scale in settings with non-literate populations. A cut-off point of 1.75 was found to be optimal in the UK [42]. This finding was replicated in a clinic sample of Indochinese refugees in the USA, and the cut-off point became widely accepted in refugee settings and in cross-cultural research [25, 26]. Few validation studies of the HSCL-25 have been done with non-western populations. For Vietnamese refugees in the USA, sensitivity and specificity for detection of DSM-III major depression were estimated as 88% and 73%, based on a cut-off score of 1.75 for ‘caseness’ [15]. In a population of human immunodeficiency virus (HIV) positive pregnant women in Tanzania, a significantly lower value (1.06, with sensitivity 89% and specificity 80%) was found to be the optimum cut-off point [19]. In Nepal, Thapa and Hauff [37] calculated a sensitivity of 87% and a specificity of 60% against DSM-IV mild depression diagnosed using the Composite International Diagnostic Interview (CIDI; area under the curve of 0.79).
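The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring procedure used in the study: it assumes that the first 10 responses belong to the anxiety subscale and the remaining 15 to the depression subscale, that scale scores are mean item scores, and that a mean at or above the cut-off counts as screen-positive. The function name `score_hscl25` and the example respondent are hypothetical.

```python
from statistics import mean

ANXIETY_ITEMS = range(0, 10)      # assumption: items 1-10 form the anxiety subscale
DEPRESSION_ITEMS = range(10, 25)  # assumption: items 11-25 form the depression subscale


def score_hscl25(responses, cutoff=1.75):
    """Return mean item scores and a screen-positive flag for one respondent."""
    if len(responses) != 25:
        raise ValueError("HSCL-25 expects exactly 25 item responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("each item is scored from 1 (not at all) to 4 (extremely)")
    total = mean(responses)
    return {
        "total": total,
        "anxiety": mean(responses[i] for i in ANXIETY_ITEMS),
        "depression": mean(responses[i] for i in DEPRESSION_ITEMS),
        # assumption: a mean at or above the cut-off counts as screen-positive
        "screen_positive": total >= cutoff,
    }


# Hypothetical respondent (not study data): mild anxiety items, no depression items.
example = [2] * 10 + [1] * 15
result = score_hscl25(example)
print(result)
```

Passing a different `cutoff` value (for example 1.06 or 2.00) shows how strongly the classification of the same respondent depends on the threshold chosen.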

Self-Reporting Questionnaire-20 (SRQ-20)

The Self-Reporting Questionnaire-20 (SRQ-20) was developed by the World Health Organization (WHO) as a screening tool for common mental disorders in primary health care settings, especially in developing countries [13]. When patients are literate it can be self-administered, but in developing countries it is usually administered by lay interviewers. The instrument consists of 20 yes/no questions about common mental health symptoms such as anxiety, depressive symptoms, and psychosomatic complaints. The SRQ-20 has been used in numerous settings [43]. Cut-off points vary considerably depending on setting (community, primary care, hospital) and culture. A cut-off point of 8 is widely used [14]. Among primary care attenders in India the most appropriate cut-off score was found to be 12 [43]. In a community sample in the Punjab province of Pakistan, a validation study led to a cut-off score of 9 [35]. As far as we know, no validation study has been performed for Afghanistan. The SRQ was administered among Afghan refugees in Pakistan using a cut-off point of 13; however, this value was not empirically validated in the study population [32].
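To make the effect of the differing cut-off points concrete, the sketch below scores one hypothetical respondent against the four values cited above (8, 9, 12, and 13). It assumes that the total score is the number of ‘yes’ answers and that a total at or above the cut-off counts as screen-positive; the cited studies may use a different convention. The function name `score_srq20` is hypothetical.

```python
def score_srq20(answers, cutoff=8):
    """Count 'yes' answers and compare the count with a cut-off."""
    if len(answers) != 20:
        raise ValueError("SRQ-20 expects exactly 20 yes/no answers")
    total = sum(1 for a in answers if a)
    # assumption: a total at or above the cut-off counts as screen-positive
    return total, total >= cutoff


# Hypothetical respondent (not study data) who endorses 11 of the 20 items.
answers = [True] * 11 + [False] * 9

# Classify the same respondent under each cut-off cited in the text.
classification = {c: score_srq20(answers, cutoff=c)[1] for c in (8, 9, 12, 13)}
print(classification)
```

The same respondent is screen-positive under the lower two cut-offs and screen-negative under the higher two, which is why a locally validated value matters.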

Psychiatric Assessment Schedule (PAS)

A semi-structured psychiatric interview was conducted using the Psychiatric Assessment Schedule. This instrument uses selected questions from the Present State Examination (PSE [41]) as screening items, followed by the appropriate ICD-10 research diagnostic criteria. The PAS systematically assesses depressive disorders, anxiety disorders (obsessive-compulsive disorder, panic disorder, agoraphobia, social phobia, specific phobia, generalized anxiety disorder), and somatization disorder. The instrument has never been used in Afghanistan but is well known in the region through its use in various areas of neighboring Pakistan [16, 27–29, 34]. The original English version was expanded with a section for post-traumatic stress disorder (PTSD). This was done within the format of the original instrument, during a workshop of the Afghan research team and the team from Rawalpindi Psychiatric Institute that had earlier developed Urdu and Chitrali PAS versions for Pakistan.

Instrument translation

All instruments were translated into Pashto as spoken in Nangarhar province. The HSCL-25 was translated in 2002, in preparation for the epidemiological survey [36], following the principles described by van Ommeren et al. [39], which include translation into Pashto and back-translation into English by separate groups of bilingual clinicians. Discrepancies between the various translations were subjected to a panel discussion with the clinicians and translators involved. Each item was then examined to ensure face validity and a transfer of conceptual meaning. Final changes to the Pashto versions were made after field-testing that included focus group discussions. The SRQ-20 was translated according to the same guidelines in 2003, and was subsequently used in training activities for primary health care staff.

Training of study staff

A team of two male and two female interviewers who had no background in mental health care administered the HSCL-25 and SRQ-20. Two of the interviewers had also participated in the earlier epidemiological study [36]. They received a 3-day training in Jalalabad in administering the instruments (elements of the training: ‘explanation and discussion of each item’, ‘learning how to interview with role playing and real patients’, and a ‘1-day field test’). The psychiatric interviews were carried out by five mental health professionals (subsequently referred to as ‘MHPs’) who were all native Pashto speakers. Three were Afghan medical doctors (NRS, HF, and RN) with extensive experience as mental health supervisors in the primary mental health care project of HealthNet TPO in Nangarhar; two were psychiatrists from Peshawar, Pakistan. A 4-day training in the use of the PAS was organized at the Institute of Psychiatry in Rawalpindi, Pakistan by two senior Pakistani psychiatrists and a Dutch psychiatrist (PV). The training consisted of detailed discussions of the different items of the instrument (using the conceptual definitions of symptoms in the glossary of the PSE as a reference point), role-playing, and group interviewing of typical patients. Subsequently the inter-rater reliability among the mental health professionals was assessed through 52 independent interviews with 12 different Pashto-speaking patients from Lady Reading Hospital Postgraduate Medical Institute in Peshawar, Pakistan. The level of agreement among raters was high: 96% for the main diagnosis and 78% for second diagnoses. Cohen’s kappa values for diagnoses were calculated for different combinations of raters and ranged from 0.76 to 1.00.
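For readers unfamiliar with the agreement statistics mentioned above, the following sketch shows how percent agreement and Cohen’s kappa can be computed for one pair of raters. It is an illustration under stated assumptions, not the authors’ analysis: the diagnostic labels and ratings are invented, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters' labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of patients")
    n = len(rater_a)
    # Proportion of patients for whom both raters gave the same diagnosis.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1:
        return observed, 1.0  # both raters used a single identical label throughout
    return observed, (observed - expected) / (1 - expected)


# Hypothetical main diagnoses from two raters for six patients (not study data).
rater_1 = ["depression", "depression", "anxiety", "none", "depression", "anxiety"]
rater_2 = ["depression", "depression", "anxiety", "none", "anxiety", "anxiety"]

agreement, kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement because it discounts the agreement that two raters would reach by chance given how often each uses every label.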

Statistical analysis

Factor analyses were performed to check the cross-cultural measurement equivalence of the HSCL-25 and the SRQ-20. Measurement equivalence is based on the concept of construct or theoretical validity, which is defined as the correlation of an observed variable with some theoretical construct (latent variable) of interest [18]. Measurement equivalence refers to the equivalence of theoretical validities across populations. Evidence for measurement equivalence is a psychometric prerequisite for the comparison of prevalence rates or mean scores of (sub)scales [7].

Factor analyses with principal axis factoring extraction were performed to uncover the structure underlying the covariances between items (i.e., the latent constructs). To facilitate interpretation of the factor structures, varimax rotations were performed on the initial factor solutions. The resulting factor structures were visually inspected.

We used independent t-tests to compare the continuous instrument scores between men and women. For the comparison of the number of psychiatric diagnoses between men and women, the Pearson Chi-square statistic was used. Pearson’s correlation coefficient was calculated to examine the relationship between the HSCL-25 and SRQ-20 scales.
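As an illustration of the Pearson Chi-square comparison, the sketch below applies it to a 2×2 table built from the counts given in the Results for any common mental disorder (24 of 53 men, 38 of 63 women). This is a hedged example only: the resulting statistic is not a value reported in this paper, the authors’ own comparison may have used a different table, and no continuity correction is applied. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: men, women; columns: any disorder, no disorder (counts from the Results text).
chi2, p_value = chi_square_2x2(24, 53 - 24, 38, 63 - 38)
print(f"chi-square = {chi2:.2f}, two-sided P = {p_value:.3f}")
```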

Receiver operating characteristic (ROC) curves were used to explore the optimal cut-off scores for the HSCL-25 and the SRQ-20 in the current sample, with the psychiatric diagnoses established with the PAS as the gold standard. ROC curves plot the sensitivity (on the y-axis) against 1 − specificity (on the x-axis) for each possible cut-off point. Each ROC curve is characterized by an area under the curve (AUC), which indicates the overall accuracy of the questionnaire (over the whole range of possible cut-off points) in distinguishing correctly between a case and a non-case. The AUC can range from 0.0 to 1.0. An AUC of 1.0 indicates perfect prediction, and an AUC of 0.5 indicates that the ability of a questionnaire to correctly identify a case is equal to chance. The AUC is used to compare the validity of the two screeners over the total range of scores. Calculations are based on the empirically derived values and not on interpolations from the binormal distribution. In addition, sensitivity, specificity, and AUC were calculated for the HSCL-25 scale, for its subscales for depressive disorders and anxiety disorders, and for the SRQ-20 scale.
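The empirical (non-interpolated) ROC construction described above can be sketched as follows. This is a minimal illustration rather than the software procedure used in the study: scores and case labels are invented, a score at or above the cut-off is assumed to count as screen-positive, and the area is obtained with the trapezoidal rule. The function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(scores, is_case):
    """Empirical ROC: one (1 - specificity, sensitivity) point per observed cut-off."""
    n_pos = sum(1 for c in is_case if c)
    n_neg = len(is_case) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    points = [(0.0, 0.0)]  # a cut-off above every score flags nobody
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
        fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the empirical ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical mean questionnaire scores and gold-standard diagnoses (not study data).
scores = [1.2, 1.5, 1.8, 2.0, 2.3, 2.6, 1.4, 2.9]
is_case = [False, False, True, False, True, True, False, True]

points = roc_points(scores, is_case)
area = auc(points)
print(f"AUC = {area:.4f}")
```

In this toy sample one non-case scores higher than one case, so 15 of the 16 case/non-case pairs are ranked correctly and the AUC falls just short of 1.0.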

Differences were considered statistically significant when the two-sided P-value was < 0.05. To test whether the criterion validity of the instrument at hand exceeded chance level (AUC > 0.5), a one-sided P-value was used.

Results

Socio demographic data

The sample was composed of 116 patients (53 men; 63 women) visiting a health center. The male respondents ranged in age from 18 to 80 years, with a mean of 33 years (SD = 14.8). Female respondents ranged in age from 17 to 57 years, with a mean of 29 years (SD = 9.3). The sample differed in several respects from the cross-sectional population sample of the epidemiological study [36]. The current sample consisted of health care users in the eastern part of the province, where HealthNet TPO supports the governmental health services. Our sample was ethnically more homogeneous (100% Pashtun compared to 92% in the epidemiological survey), more rural, less well educated, and more often unemployed. Unsurprisingly, male and female respondents differed on some background variables such as educational level and occupation. On all other variables no statistically significant differences were found (see Table 1).

Table 1 Socio-demographic characteristics of the study sample (N = 116)

Outcomes on HSCL-25, SRQ-20, and PAS

Factor analysis of the HSCL-25 items (principal axis factoring extraction and varimax rotation) revealed a two-factor model (factors ‘depression’ and ‘anxiety’) that explained 48% of the item variance. A factor analysis of the SRQ-20 items revealed a two-factor model (factors ‘common disorders’ and ‘social disability’) that explained 39% of the item variance. The factor structures of both instruments agreed with those reported in the literature. Thus we found support for the measurement equivalence of the HSCL-25 and the SRQ-20.

Table 2 presents the results of the SRQ-20 and HSCL-25, including relevant subscales. The mean score on the HSCL-25 total scale was 2.07 (SD = 0.62), with significant gender differences (men: mean = 1.59; women: mean = 2.47; P < 0.001). For the SRQ-20 the mean number of endorsed items (out of 20) was 16 for women and 9 for men. The correlation between the two screening instruments was 0.82, indicating a high overlap between them. Table 3 gives an overview of the diagnoses made with the PAS. Many men (24/53 = 43%) and many women (38/63 = 60%) had a common mental disorder, mainly depressive and anxiety disorders.

Table 2 Outcomes on screening instruments HSCL-25, SRQ-20 for both males and females
Table 3 Clinical psychiatric diagnoses with PAS for both males and females

HSCL-25 and SRQ-20 as screener for psychopathology

The primary analysis focused on the ability of the HSCL-25 and SRQ-20 to detect any psychiatric disorder. Both the HSCL-25 and the SRQ-20 performed moderately; the area under the curve was 0.73 for the HSCL-25 and 0.72 for the SRQ-20 (see Table 4, Figs. 1, 2). When analyzed separately for men and women, the HSCL-25 showed a tendency to perform better in men (AUC 0.78) than in women (AUC 0.67). Additional separate analyses were made for the HSCL-25 depression subscale to detect any depressive disorder, and for the HSCL-25 anxiety subscale to detect any anxiety disorder. For men, the SRQ-20 scale performed moderately (AUC 0.74), while the HSCL-25 depression and anxiety scales did reasonably well (AUC 0.79 and AUC 0.81, respectively). For women the AUCs for the HSCL-25 subscales were low (both AUC 0.65). For all analyses except that of the HSCL-25 anxiety scale (P = 0.05), the scales still differed significantly from chance level (AUC 0.5). Table 5 gives the sensitivity, specificity, positive and negative predictive values, kappa values, and percent agreement of the scales at different cut-off points. In evaluating the HSCL-25 and SRQ-20 as potential screeners for psychopathology, choosing the most appropriate cut-off point involves a trade-off between sensitivity and specificity. Because the instruments are used here for screening, we took as optimal the cut-off point that combined high sensitivity with acceptable specificity. For the HSCL-25 this optimal cut-off was 2.00 (sensitivity 0.69, specificity 0.67). However, when men and women were analyzed separately, the cut-off point for men had to be lowered to 1.50 and that for women raised to 2.25. The optimum cut-off point for the SRQ-20 was 10 for men and 17 for women.
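The quantities tabulated per cut-off point can be derived from a simple 2×2 comparison of screen result against clinical diagnosis. The sketch below is illustrative only: the scores and diagnoses are invented, a score at or above the cut-off is assumed to count as screen-positive, and the function name `screening_metrics` is hypothetical.

```python
def screening_metrics(scores, is_case, cutoff):
    """Sensitivity, specificity, predictive values and agreement at one cut-off."""
    tp = fp = fn = tn = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff  # assumption: at or above the cut-off is positive
        if positive and case:
            tp += 1
        elif positive:
            fp += 1
        elif case:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "agreement": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical HSCL-25 mean scores and clinical diagnoses (not study data).
scores = [1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 1.50, 2.75, 2.00, 1.75]
is_case = [False, False, True, False, True, True, False, True, True, False]

at_200 = screening_metrics(scores, is_case, cutoff=2.00)
at_175 = screening_metrics(scores, is_case, cutoff=1.75)
print(at_200)
print(at_175)
```

Lowering the cut-off from 2.00 to 1.75 in this toy sample raises sensitivity and lowers specificity, which is the trade-off discussed above. Sex-specific cut-offs can be explored by running the same function on the male and female subsets separately.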

Table 4 Properties of HSCL-25 and SRQ-20 to detect psychopathology as measured by clinical interview with PAS
Fig. 1 ROC curve of the classification of any disorder for the SRQ-20. The solid line displays the curve found for male subjects, the dashed line displays the curve found for female subjects

Fig. 2 ROC curve of the classification of any disorder for the HSCL-25. The solid line displays the curve found for male subjects, the dashed line displays the curve found for female subjects

Table 5 Properties of HSCL-25 and SRQ-20 with different cut-off points

Discussion

The aim of the study was to assess whether frequently used, easy-to-administer questionnaires were able to detect clinical cases of anxiety and depression. It proved difficult to identify an optimal cut-off point for the HSCL-25 or SRQ-20 with both a reasonable positive predictive value and a satisfactory degree of sensitivity. Our study corroborates the conclusions of a study of the HSCL-25 among Afghan refugees in Japan, in which the usual cut-off point would have led to overestimation of the rate of depression among the respondents; the authors concluded that the usual cut-off points would not even have been acceptable for screening purposes [17]. The implication of our findings is that inferences drawn from checklist research with the HSCL-25 or SRQ-20 in Afghanistan need to be interpreted with caution. This again highlights the need to calibrate and evaluate screening instruments in cross-cultural epidemiological research.

Some might argue that the use of an ICD-10 derived diagnosis in itself lacks cultural validity. International and presumably ‘universal’ diagnostic classifications such as the ICD-10 and DSM-IV are to a large extent embedded in the culturally bound Euro-American psychiatric conceptualization of mental disorders and thus might not capture culturally unique patterns of distress [9]. Our goal was to assess the accuracy of screening questionnaires in detecting mental disorders as defined by the ICD-10. We have not assessed the cross-cultural validity of the ICD-10 classification itself. Such an undertaking would have required a different type of research based on in-depth ethnographic fieldwork, which was not feasible in the context of the current research. However, we feel that a diagnosis made by well-trained clinicians, who are ethnic Pashtuns themselves and familiar with the cultural ways of expressing distress among Pashtun tribesmen, guarantees a minimal level of cultural competency in the clinical assessment procedure.

An interesting finding is the difference between men and women with regard to the predictive value of scores on the questionnaires. The HSCL-25 and its subscales showed a tendency to perform worse for women than for men, while this was not observed for the SRQ-20. A possible explanation is that the HSCL-25, compared to the SRQ-20, has fewer questions about somatic equivalents of mental distress (such as ‘Is your appetite poor?’ or ‘Do you have uncomfortable feelings in your stomach?’) and about social dysfunction (such as ‘Is your daily work suffering?’). Socially or somatically oriented questions are, among the Pashtun, less sensitive to gender-specific interpretations than ‘psychologically oriented’ questions about ‘feeling sad’ or ‘crying much’. Indeed, a questionnaire for depression and anxiety that was constructed in neighboring Pakistan included more somatic items and items about social functioning [30]. Similarly, in a sample of Mongolian women of reproductive age, the SRQ-20 performed better than the Edinburgh Postnatal Depression Scale (EPDS), which does not contain somatic items [31].

Optimizing the cut-off points for men and women separately had opposite effects: for men the cut-off point had to be lowered while for women it had to be elevated. This may be related to the strong differences within traditional Pashtun culture between male and female modes of expression, differences so strong indeed that some speak of a ‘schism between men’s and women’s emotional worlds’ [12]. We concur with Miller et al. [23] who suggest that the use of self-reporting questionnaires as read aloud by interviewers might lead to underreporting of mental distress in Afghan men who have the tendency to downplay the frequency of certain expressions of distress (e.g., crying) in order to save face in the eyes of the surveyors. This suggests that earlier epidemiological studies about common mental disorders among Pashtun in Afghanistan [1, 22, 36] might have overestimated the prevalence in women and underestimated the prevalence in men.

Several factors might have contributed to the rather limited discriminative ability of the HSCL-25 and SRQ-20 in this setting. First, the self-reported symptoms might not represent psychopathology but rather general psychological distress, as was suggested by Bolton and Betancourt [4]. Second, respondents might have exaggerated the severity of symptoms on the questionnaires, hoping to get attention for their suffering and possibly better treatment. Respondents may have answered affirmatively if they perceived an advantage in being seen as ill. This has been suggested in studies with the SRQ-20 in Guinea Bissau [6] and Ethiopia [20]. In the latter study an analysis of the affirmative responses on SRQ-20 items showed a large percentage of the affirmative answers to be invalid due to linguistic problems, a lack of conceptual clarity, or deliberately affirmative answers given in order to gain something. Third, it may be that the psychiatric diagnoses made by the mental health professionals were biased because of the lack of female clinician interviewers. It is possible that women who endorsed certain symptoms to a same-sex lay interviewer showed restraint in doing so in the presence of a male clinician. Without a doubt the use of mental health professionals of the same sex as the participant would have improved the quality of the data. However, Pashto-speaking female mental health professionals are extremely rare; therefore this was not feasible.

Another striking feature of our sample of primary care attendees was the high percentage of people who had a mental disorder as assessed by a mental health professional. High prevalence rates for mental disorders among primary care patients have been found in many cultural settings, but had not been established before in Afghanistan. The particularly high figures in our sample can probably, at least partly, be explained by the fact that the ICD-10 diagnostic criteria used do not require a significant social or occupational disability. Unfortunately, no data on associated social disability in this sample are available to put these figures in context.

Conclusion

The results of this study point to a rather limited usefulness of the HSCL-25 and SRQ-20 as screening tools for common mental disorders in the Afghan population. Earlier studies in Afghanistan using the HSCL-25 with the standard cut-off points might have overestimated the prevalence of mental disorder among women and underestimated the prevalence in men. The high prevalence of mental disorders in primary care patients in Nangarhar magnifies this problem, since the properties of a screening instrument need to be extremely robust to be useful for diseases with high prevalence among the screening population. If the HSCL-25 and SRQ-20 are nevertheless used as screening instruments, different cut-off points should be used for men and women. Rather than advocating the use of screeners to dichotomize primary care attendees into ‘probably mentally ill’ and ‘probably not mentally ill’, we believe that efforts should be made to increase the ability of primary health care staff in Afghanistan to identify depression and anxiety in their clinical encounters with patients. This involves training and supervision of Afghan doctors and nurses in primary mental health care skills, something that the Afghan government and institutional donors have high on their list of health care priorities [1, 24, 40].