Introduction

Differences in prevalence estimates for depression and anxiety between countries have been frequently reported [38] and generally attributed to methodological differences: different instruments, diagnostic criteria, level of impairment and sampling methodologies; however, even in surveys using common instruments, criteria and design, such as ESEMED [16] or WMH [19], differences have to be interpreted with caution since little is known on the stability of the instruments across translations and cultures. A recent study [19] has established concordance between CIDI [39] and SCID [17] by comparing English, French, Italian and Spanish data but without exploring possible cross-cultural and inter-language differences.

In addition, the formulation of mental health questions may suit some cultures better than others. An approach closely based on DSM-IV criteria, such as that used in the CIDI [39], may be efficient in some cultures, whereas an approach based on the kinds of questions asked by primary care physicians, such as that used by the CIS-R [28] in the UK, may be more suitable in others.

The objective of our study was to compare the psychometric properties of two short diagnostic instruments, the CIDI-SF [23] and the CIS-R [28], with respect to a diagnostic reference, the Structured Clinical Interview for Non-Patient (SCID-I/NP [17]) across four countries: Italy, Romania, Spain and France taking explicitly into account results across languages on a large re-interview sample (500 cases).

The short form version of the CIDI has been selected in accordance with the EU monitoring group recommendations (ECHI [24]). This questionnaire, which can be completed quickly, was derived from the CIDI and designed in the USA [23]. It has since been widely used for epidemiological studies in Canada [4, 31–33] and Europe [20, 25–27] and compared with the CIDI [23, 33]. The short form version of the CIDI has been demonstrated to be generally reliable, although both the CIDI and the CIDI-SF tend to overestimate the prevalence of major depressive disorders [33]. The psychometric properties of the CIDI-SF have been studied in Finland with respect to the SCAN: results were fair, but the specificity of the CIDI-SF was higher and agreement better for a more global affective disorder category than for major depressive episode [1].

The CIS-R [28, 29] has been widely used in the UK as part of the National Survey of Psychiatric Morbidity [10, 11, 14], as well as in other countries such as Brazil [6, 12], Chile [3] and Taiwan [30], and has been used to compare the prevalence of mental health disorders between different ethnic communities in Great Britain [5, 8, 13, 36].

The respective validity of the CIS-R and the full version of the CIDI has already been assessed in the UK [9, 20] with respect to the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) [37]. This last study [21], using concordance and receiver operating characteristic (ROC) analysis, found that the CIDI was a highly valid assessment of common mental disorders, whilst the CIS-R was moderately valid, contrary to what was found in the Brugha study [9]. These differences have been attributed to differences in the interval between interviews, since in the protocol used by Jordanova et al. both interviews were conducted during the same session.

Methods

To compare the psychometric properties of the CIDI-SF vis-à-vis the CIS-R to a semi-structured clinical interview, the SCID-I/NP, we conducted face-to-face interviews in four countries (Italy, Romania, Spain and France).

The CIDI-SF and the CIS-R, two structured instruments with open and closed questions on the most common DSM-IV mental health problems, were administered by trained lay interviewers (graduate students in psychiatry or psychology).

The SCID-I/NP is a semi-structured instrument which permits the evaluation of the most common mental health problems of the DSM-IV Axis 1. The SCID-I/NP was administered by experienced psychiatrists.

For practical reasons, only depressive and anxiety disorders were considered. Seven DSM-IV diagnostic categories [2] were selected in the CIDI-SF 12-month prevalence version (Table 1): major depression, generalised anxiety, specific phobia, social phobia, agoraphobia, panic attack and obsessive–compulsive disorders. Since CIS-R provides diagnostic categories according to the ICD-10 classification [40], we used syntaxes implemented in previous surveys in the UK [20] to generate equivalent DSM-IV diagnostic categories (Table 1). The CIS-R yields information on 1-month prevalence rates. For the SCID-I/NP, diagnoses were identified for three types or groups of disorder (Table 1). These are ‘any mood disorder’, including major depressive disorder, mood disorder due to a general medical condition and dysthymia, ‘any anxiety disorder’, including any phobia (specific, social, agoraphobia), general anxiety disorder, panic disorder, obsessive–compulsive disorder, anxiety disorder due to general medical condition and ‘any diagnosis’, which refers to at least one of the above diagnoses being present.

Table 1 Diagnostic categories

Translations

The instruments were specifically translated and then back translated for validation using a standard methodology [34]. Translations were performed by bilingual specialists with experience in the mental health domain. For the CIDI-SF, we extracted items from validated translations of the CIDI 3.0 full version (French and Spanish) when this was available, which were subsequently validated by the reference bilingual translator for the study.

Study population

The study included 500 volunteers aged 18 years or more attending primary care clinics in France, Italy, Spain and Romania.

Interviews

The interviewers administering the questionnaires are presented in Table 2.

Table 2 Interview methods

Lay interviewers met the potential respondents, presented the project, obtained their consent and began the interviews with the survey instruments. The psychiatrist was not present during that phase and, conversely, the interviewer did not attend the psychiatrist's interview. Depending on the availability of the psychiatrist, the survey interviews were administered before or after the psychiatrist's interview with the SCID-I/NP; both were done on the same day.

All the respondents who agreed to participate completed all the survey instruments. About 60 interviews were done by each interviewer for Italy and for Romania, about 35 for France and about 40 for Spain.

Statistical analysis

The socio-demographic characteristics of the sample and the diagnoses were compared between countries using the χ² test. To measure the risk of having a disorder, we calculated prevalence rates, that is, the proportion of individuals in the sample with a disorder. We then investigated whether the CIDI-SF, the CIS-R or the SCID-I/NP prevalence estimates differed between countries.
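As an illustration of this comparison, the sketch below computes prevalence rates and the Pearson χ² statistic for a country-by-diagnosis contingency table. It is a minimal example under stated assumptions: the counts and country labels are hypothetical and are not the study data, and the function names are introduced here for illustration only.

```python
# Minimal sketch: prevalence per country and Pearson chi-square statistic.
# All counts below are HYPOTHETICAL and do not come from the study.

def prevalence(cases, total):
    """Proportion of individuals with the disorder in a sample."""
    return cases / total


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows; each row holds the observed counts,
    here (cases, non-cases) for one country.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical (cases, non-cases) counts for four countries.
counts = {
    "country_1": (6, 134),
    "country_2": (15, 105),
    "country_3": (8, 112),
    "country_4": (11, 109),
}

for country, (cases, non_cases) in counts.items():
    print(country, round(prevalence(cases, cases + non_cases), 3))

statistic, df = chi_square_statistic([list(pair) for pair in counts.values()])
print(f"chi2({df}) = {statistic:.2f}")
```

With four countries and a dichotomous diagnosis, the test has 3 degrees of freedom, which matches the χ²(3) values reported in the Results. The p value would then be read from the χ² distribution, for example with a statistical package.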

To quantify the association between each survey instrument (CIS-R or CIDI-SF) and the gold standard (the SCID-I/NP), we used several statistical measures of the performance of a binary classification test (presence/absence of a disorder). Sensitivity (Se; the proportion of SCID-I/NP cases correctly identified by the CIDI-SF or the CIS-R) and specificity (Sp; the proportion of SCID-I/NP non-cases correctly identified by the CIDI-SF or the CIS-R) were the intrinsic values measured. Positive (PPV; the proportion of CIDI-SF or CIS-R cases confirmed by SCID-I/NP) and negative predictive values (NPV; the proportion of CIDI-SF or CIS-R non-cases confirmed by SCID-I/NP) were determined for the two test instruments.
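The four measures defined above can be sketched from a 2 × 2 cross-classification of the test instrument against the gold standard. The code is a minimal illustration, not the analysis code used in the study; the class name, function names and the example responses are assumptions made for this sketch.

```python
# Minimal sketch of Se, Sp, PPV and NPV from paired binary diagnoses.
# Names and example data are HYPOTHETICAL, introduced for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # test positive, gold standard positive
    fp: int  # test positive, gold standard negative
    fn: int  # test negative, gold standard positive
    tn: int  # test negative, gold standard negative


def cross_classify(test, gold):
    """Build the 2 x 2 table from two equal-length lists of booleans."""
    pairs = list(zip(test, gold))
    return TwoByTwo(
        tp=sum(1 for t, g in pairs if t and g),
        fp=sum(1 for t, g in pairs if t and not g),
        fn=sum(1 for t, g in pairs if not t and g),
        tn=sum(1 for t, g in pairs if not t and not g),
    )


def sensitivity(t):
    return t.tp / (t.tp + t.fn)  # gold-standard cases found by the test


def specificity(t):
    return t.tn / (t.tn + t.fp)  # gold-standard non-cases cleared by the test


def positive_predictive_value(t):
    return t.tp / (t.tp + t.fp)  # test cases confirmed by the gold standard


def negative_predictive_value(t):
    return t.tn / (t.tn + t.fn)  # test non-cases confirmed by the gold standard


# Hypothetical responses for ten participants (True = disorder present).
lay_instrument = [True, True, True, False, False, False, False, True, False, False]
gold_standard = [True, True, False, True, False, False, False, False, False, False]

table = cross_classify(lay_instrument, gold_standard)
print(table)
print(sensitivity(table), specificity(table))
print(positive_predictive_value(table), negative_predictive_value(table))
```

Sensitivity and specificity condition on the gold-standard status, whereas PPV and NPV condition on the test result, which is why the predictive values depend on how many cases each instrument detects.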

Individual-level CIDI-SF/CIS-R versus SCID-I/NP diagnostic concordance was evaluated using the area under the receiver operating characteristic curve (AUC [18]). For a dichotomous outcome (case versus non-case), the AUC reduces to (Se + Sp)/2. This measure can be interpreted as the probability that a randomly selected clinical case will score higher on the CIDI-SF or the CIS-R than a randomly selected non-case. AUC values lie between 0 and 1: 1 indicates perfect performance and 0.5 indicates performance no better than chance. Above 0.9, the concordance level is excellent; between 0.7 and 0.9, it is considered indicative and useful; between 0.5 and 0.7, concordance is fair.
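The link between the probabilistic interpretation of the AUC and the expression (Se + Sp)/2 can be checked numerically. The sketch below compares every case with every non-case, counting ties as one half, and shows that for a dichotomous test result the outcome equals (Se + Sp)/2. The data and function names are hypothetical and serve only to illustrate the identity.

```python
# Minimal sketch: for a binary test, the pairwise (rank-based) AUC
# equals (Se + Sp) / 2. Example data are HYPOTHETICAL.

def auc_pairwise(case_scores, noncase_scores):
    """Probability that a random case scores higher than a random non-case.

    Ties count as half a win, the usual convention for the rank-based AUC.
    """
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def auc_binary(se, sp):
    """AUC of a dichotomous test expressed through Se and Sp."""
    return (se + sp) / 2


# Test results (1 = positive, 0 = negative) split by gold-standard status.
cases = [1, 1, 1, 0]        # 3 of 4 cases detected, so Se = 0.75
non_cases = [0, 0, 0, 0, 1]  # 4 of 5 non-cases cleared, so Sp = 0.80

se = sum(cases) / len(cases)
sp = non_cases.count(0) / len(non_cases)

print(auc_pairwise(cases, non_cases))
print(auc_binary(se, sp))
```

Both computations give the same value because a case beats a non-case with probability Se × Sp, and the two tie with probability Se × (1 − Sp) + (1 − Se) × Sp; adding half of the ties to the wins simplifies to (Se + Sp)/2.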

We also used the Youden index (Y = Se + Sp − 1), which provides a single numerical estimate of overall diagnostic effectiveness and summarises the accuracy of the test instrument. A value of 1 means that the test is perfect.
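As a short worked example, the sketch below computes the Youden index from the pooled CIDI-SF sensitivity and specificity reported for mood disorders in the Results (about 0.56 and 0.90). The function name is introduced here for illustration. For a dichotomous test, Y is also equal to 2 × AUC − 1, so the two summaries carry the same information.

```python
# Minimal sketch of the Youden index, Y = Se + Sp - 1.
# Se and Sp are the rounded pooled CIDI-SF values for mood disorders
# quoted in the Results; the function name is an illustrative choice.

def youden_index(se, sp):
    """Youden index: 0 for an uninformative test, 1 for a perfect test."""
    return se + sp - 1.0


se_mood, sp_mood = 0.56, 0.90
y_mood = youden_index(se_mood, sp_mood)
auc_mood = (se_mood + sp_mood) / 2

print(round(y_mood, 2))            # Youden index
print(round(auc_mood, 2))          # AUC of the dichotomous test
print(round(2 * auc_mood - 1, 2))  # same value as the Youden index
```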

The level of concordance between the SCID-I/NP and the test instruments (CIDI-SF and CIS-R) was evaluated by the kappa concordance test. The kappa measures the agreement between two different judgements or instruments beyond that expected by chance. In psychiatry, a kappa of more than 0.40 is considered to indicate good agreement between two instruments. The test of equality of ROC areas (χ²) was calculated to measure differences between countries and between test instruments compared to the SCID-I/NP [15]. This specific χ² is calculated with a non-parametric approach that compares the areas under two correlated receiver operating characteristic curves and depends on the number of observations.

We analysed the results both globally (all countries pooled) and cross-culturally (country by country).
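The kappa statistic for two dichotomous instruments can be sketched as follows. This is a minimal illustration of Cohen's kappa computed from the four cells of the 2 × 2 table; the counts are hypothetical and the function name is an assumption of this example.

```python
# Minimal sketch of Cohen's kappa for two binary instruments.
# The counts used below are HYPOTHETICAL.

def cohen_kappa(tp, fp, fn, tn):
    """Chance-corrected agreement between a test and a reference.

    tp/fp/fn/tn are the cells of the 2 x 2 table (test versus reference).
    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both instruments place every subject in the same category.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)


# 100 hypothetical subjects: 85 agreements, 15 disagreements.
kappa = cohen_kappa(tp=20, fp=10, fn=5, tn=65)
print(round(kappa, 3))
```

In this example the raw agreement is 0.85, but 0.60 of it would be expected by chance given the marginal totals, so kappa is (0.85 − 0.60)/(1 − 0.60) = 0.625.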

Results

Socio-demographic features of the study sample

With respect to gender (female 67.9%), age (mean: 50 ± 18.0 years) and marital status (26.21% ‘never married’; 55.04% ‘married’; 18.75% ‘widowed or divorced’), the four participating countries are comparable. For employment status, differences are observed in the proportion of retired subjects (43.6% French, 50% Romanian, but only 22% Italian and 21% Spanish). With respect to educational level, France differs from the other countries in having a majority of participants (74.8%) with higher education. This is a consequence of the fact that the primary care study centre in France is a Health Management Organisation (HMO) for teachers and related professions.

Estimated prevalence rates of the SCID-I/NP versus the CIDI-SF diagnosis instruments

Concerning the SCID-I/NP estimated prevalence rates, no significant difference between countries is found for the mood disorders. The estimated prevalence ranges from 4.3% for France to 12.5% for Italy. However, divergences are found for anxiety and overall disorders. Italy presents the highest estimated prevalence for any anxiety disorder (from 6.38% for France to 25.83% for Italy) and for any diagnosis (from 9.9% for France to 30.8% for Italy).

For any anxiety disorder and for any disorder, the prevalence estimated by the CIDI-SF is higher than that estimated by the SCID-I/NP, but follows the same divergences between countries. In Table 3, with the CIDI-SF, the estimated prevalence of any mood disorder is significantly higher (χ²(3) = 9.78, p < 0.01) in France and Spain (14.2 and 21%, respectively) than in Italy and Romania (7.5 and 11.7%, respectively). Discrepancy is also observed for any anxiety disorder (χ²(3) = 12.55, p < 0.01) and for ‘any diagnosis’ (χ²(3) = 14.23, p < 0.01).

Table 3 Consistency of current DSM-IV CIDI-SF 12-month prevalence and SCID-I 1-month prevalence diagnoses in the cross-cultural sample

Measure of the performance for the CIDI-SF

For all countries combined, the performance measures are good to excellent for mood, anxiety and any disorders. The Youden indexes (0.46, 0.44 and 0.48, respectively) indicate that sensitivity (from 0.56 to 0.76) and specificity (from 0.72 to 0.90) balance each other and that, whatever the type of disorder, the CIDI-SF gives important information on the concordance level. The AUC values, between 0.72 and 0.73, confirm that the performance is indicative. The kappa values (0.36, 0.33 and 0.37, respectively) are quite good whatever the type of disorder. The PPV is only fair (0.15–0.67), whereas the NPV is very high (0.84–1).

Looking at the sensitivity and specificity values, we note divergences between disorder types. For the mood disorders, the sensitivity is about 0.56 and the specificity 0.90. In other words, the capacity of the CIDI-SF to detect true positives is only fair in comparison with its capacity to detect true negatives. For the anxiety disorders, true positives are detected better, at the cost of a minor loss in the detection of true negatives. For any diagnosis, sensitivity and specificity are quite similar (0.76 and 0.72, respectively) and indicate that, globally, three quarters of the individuals are well classified, regardless of whether they are true positives or true negatives. For any diagnosis, the CIDI-SF is as sensitive as it is specific. On the contrary, for the mood and anxiety disorders, the CIDI-SF is more specific than sensitive.

Comparing the discriminative indexes between countries gives more precise results, and several observations can be made. For the mood disorders, good intrinsic values were found for Italy and Spain, but not for France and Romania. In Italy, for instance, the results are better balanced: the specificity and NPV are high, whilst the sensitivity and PPV are correct. These results indicate a very good capacity to detect both true positives and true negatives. Moreover, the agreement between the two instruments is significant (0.57). The Spanish results are very good for sensitivity, specificity and NPV, whilst the PPV is poor (0.32). For Spain, the probability that an individual is detected as being a case by both the SCID-I/NP and the CIDI-SF is fair. This is a direct consequence of the very small proportion of cases (6.7%) estimated by the SCID-I/NP. France and Romania report results illustrating a strong capacity to detect true negatives, but a marked incapacity to detect true positives (sensitivity and PPV, fair). These discrepancies are also found in the test of equality of the AUC, which showed variability in performance between countries (χ²(3) = 29.53, p < 0.01).
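The dependence of the PPV on the proportion of gold-standard cases can be illustrated with Bayes' rule. The sketch below holds sensitivity and specificity fixed and varies only the prevalence. The Se and Sp values (0.80 and 0.90) are hypothetical, chosen only to represent a test with good intrinsic values; they are not the Spanish estimates, and the function name is introduced for this example.

```python
# Minimal sketch: PPV as a function of prevalence for fixed Se and Sp.
# Se = 0.80 and Sp = 0.90 are HYPOTHETICAL illustrative values.

def ppv_from_rates(se, sp, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_positive_rate = se * prevalence
    false_positive_rate = (1 - sp) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


for prevalence in (0.067, 0.15, 0.30):
    print(prevalence, round(ppv_from_rates(0.80, 0.90, prevalence), 2))
```

Under these assumptions, the same instrument has a PPV of roughly 0.36 when 6.7% of the sample are cases and roughly 0.77 when 30% are cases, which shows how a low gold-standard prevalence can depress the PPV even when sensitivity and specificity are good.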

For the anxiety disorders, the Italian, Spanish and Romanian results were quite good. The Italian results showed particularly good detection of true positives. The large number of CIDI-SF estimated cases has a weak influence on the PPV value. However, no performance (AUC values) differences were found between countries (χ²(3) = 5.84, NS). In the French sample, because the CIDI-SF overestimated the number of cases, the instrument had excellent sensitivity, NPV and AUC for the anxiety disorders, but poor PPV and kappa values. In brief, the CIDI-SF detects true negatives better than true positives.

For any diagnosis, because this category groups together the mood and anxiety disorders, and because a large proportion of cases come from the anxiety disorders category, the same results are found. In brief, good results are observed for the Italian sample, correct ones for the Romanian sample, quite fair ones for the Spanish sample and very poor ones for the French sample for the PPV and kappa values. No performance discrepancies (AUC values) were found between countries (χ²(3) = 0.84, NS).

In conclusion, results comparing the discriminative performance and the intrinsic values of the CIDI-SF are quite good if only sensitivities, specificities, AUC and Youden indexes are considered. However, because of an overestimation of probable cases by the CIDI-SF, the agreement values are very poor. This observation, which is visible for the whole sample (all countries pooled together), is the direct effect of results in the French and the Spanish samples, whilst for the two other countries, results are usually significant. Concerning the AUC equality, differences between countries are only found for the mood disorders.

Estimated prevalence rates of the CIS-R diagnosis instruments versus SCID-I/NP

Compared with the SCID-I/NP and the CIDI-SF, the CIS-R gives lower estimated prevalence rates. For all countries, the CIS-R estimates the prevalence of mood disorders at about 4.82%, of any anxiety disorder at about 12.6% and of overall disorders at about 14.2%. No significant differences are found for the estimated prevalence rates between countries. It is important to note that for France, the CIS-R gives higher prevalence rates than the SCID-I/NP does for any anxiety disorder and for any diagnosis. For Spain, higher prevalence is found for any mood disorder.

Measure of the performance for the CIS-R

The classification tests are better for the mood disorders and for any diagnosis than for any anxiety disorders (Table 4). It seems that specificity (0.88–1) and the NPV (0.77–0.97) are very good. However, sensitivities (0.22–0.44) and PPV (0.11–0.67) are quite fair (except for PPV for Italy and any mood disorders). In fact, the detection of true positives and the assurance that individuals who have a positive result on the test instrument are true positives are very poor. The PPVs are correct (0.54 for all diagnoses) and are the direct consequence of the fewer CIS-R observations compared to those estimated by the SCID-I/NP. As a consequence, the probability of detecting a case with both the CIS-R and SCID-I/NP is more relevant. However, agreements between both instruments are low to moderate for any anxiety disorders (from 0.07 to 0.37).

Table 4 Consistency of current DSM-IV CIS-R and SCID-I 1 month prevalence diagnoses in the cross-cultural sample

In the cross-cultural analyses, results for any mood disorders are very close except for the PPV. These values vary depending on the difference in the number of cases between the SCID-I/NP and the CIS-R. The larger the difference between the number of true positives detected by the CIS-R and by the SCID-I/NP, the better the PPVs.

When the lay questionnaire detects fewer cases than the gold standard (SCID-I/NP), the probability that a case detected by the questionnaire is also detected by the gold standard is all the greater, the smaller the number of cases detected by the CIS-R and the larger the number detected by the SCID-I/NP. In Italy, for instance, the number of cases detected with the CIS-R is four times lower than the number detected by the SCID-I/NP. As a result, the PPV is equal to 1, which means that all the individuals detected by the CIS-R are also detected by the SCID-I/NP.

For any anxiety disorders, results are comparable to the mood disorders for the Italian and Spanish samples. There is a minor difference in the Romanian sample: the inter-instrument agreement is fair (from 0.23 to 0.35) with a moderate PPV (from 0.50 to 0.56). In the French sample, the kappa (0.07–0.16) and PPV (0.11 for all mood disorders and 0.22 for any disorders) are very poor. However, for the French sample, kappa and PPV are better for the mood disorders than for the other categories. The detection pattern is unusual: the CIS-R detects twice as many individuals with mood disorders as the SCID-I/NP does, and half as many individuals with anxiety disorders.

For any diagnosis, the results are acceptable for the Italian and Spanish samples, fairly good for the Romanian sample and mediocre for the French sample. In this last sample, the PPV and the kappa are only fair. In brief, it seems that the CIS-R is more specific than sensitive. The cases observed by the CIS-R are fewer than those detected by the SCID-I/NP, which yields a good PPV and a good probability of detecting true cases. Concerning the inter-instrument agreement, values range from fairly good to poor. However, the CIS-R appears stable between countries when the equality of AUC is compared (test of equality of ROC areas; NS).

Comparison of performance detection between both lay questionnaires compared to the SCID-I/NP

Globally, the CIDI-SF gives significantly better AUC values than the CIS-R does for the anxiety disorders (cf. Table 5; χ²(3) = 7.91; p < 0.01) and for any diagnoses (χ²(3) = 9.75; p < 0.01). For the mood disorders, there is no AUC difference between the CIS-R and the CIDI-SF.

Table 5 Test of equality of the ROC areas comparison between lay questionnaires and by country

In a cross-cultural analysis, the CIDI-SF has significantly better AUC values except for Romania in the mood disorders category. However, even where the CIS-R appears to have the better AUC value, there is no significant difference in terms of comparison of ROC areas (χ²(1) = 0.20; NS). For the mood disorders, the CIDI-SF’s AUC is significantly better for Spain (χ²(1) = 5.01; p < 0.05) and for Italy (χ²(1) = 7.99; p < 0.01), whilst for France and Romania no statistically significant differences are found.

For the anxiety disorders, a strong difference is found for the French AUC only (CIDI-SF: 0.81 and CIS-R: 0.55; χ²(1) = 11.56; p < 0.01). The differences for Italy and Spain are close to significance.

For the overall diagnosis category, the CIDI-SF AUC values for France and Romania are significantly different from, and better than, the CIS-R ones (respectively, χ²(1) = 8.97; p < 0.01 and χ²(1) = 4.84; p < 0.05).

In brief, the CIDI-SF globally gives better AUC values for the anxiety disorders and any diagnoses. Some cross-cultural differences have been found and confirm this tendency. The test of equality of ROC areas also takes into consideration the number of observations, which can explain why the differences are significant in some situations but not in others (i.e. for all diagnoses when countries are pooled together compared to Italy or Spain).

Discussion

Compared to the SCID-I/NP, results concerning the CIDI-SF and the CIS-R are quite different. The CIDI-SF tends to be both sensitive and specific, but its PPV is only fair: a substantial share of the cases it detects are false positives. The risk in using the CIDI-SF is therefore that it detects more cases than expected. The CIS-R, by contrast, appears to be more specific than sensitive, so the risk is one of underestimating the number of cases. The fact that the CIS-R has a medium PPV also indicates medium reliability in detecting true positives.

In addition, divergences between countries have been found concerning the performance of both instruments in detecting cases. The Italian and the Spanish results are globally good, whereas the results from Romania are good with the CIDI-SF and medium with the CIS-R. The results are globally poor for the French survey and for both instruments.

In consequence, if we require a good instrument for detecting cases, even if the instrument overestimates the number of cases and classifies non-cases as cases, the CIDI-SF might be preferred. On the contrary, if the main objective is to detect true negatives and to have greater confidence in the true positives, while accepting that some cases will certainly be missed, the CIS-R may be chosen. One important factor in choosing between the two lay questionnaires is the fact that the CIDI-SF gives globally better AUC values.

The results obtained by the WMH authors on the comparison between the CIDI 3.0 and SCID are very similar to those obtained when AUC is considered: for mood disorder, 0.73 versus 0.75; for anxiety disorders, 0.72 versus 0.73; and for any diagnoses, 0.73 versus 0.76. In addition, those results are very similar across countries and languages except for mood disorders in one of the countries.

The results using kappa are lower than in the WMH study; however, kappa is very sensitive to prevalence and when a country (France) is removed, the results become much closer.
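The sensitivity of kappa to prevalence can be illustrated with a small calculation. The sketch below derives kappa from sensitivity, specificity and prevalence, holding Se and Sp at the pooled CIDI-SF values reported above for any diagnosis (0.76 and 0.72) and varying only the prevalence. The two prevalence values (10% and 30%) are illustrative choices and the function name is an assumption of this example; the output is not a reanalysis of the study data.

```python
# Minimal sketch: kappa changes with prevalence even when Se and Sp
# are held constant. Prevalence values are ILLUSTRATIVE assumptions.

def kappa_from_rates(se, sp, prevalence):
    """Cohen's kappa implied by Se, Sp and the gold-standard prevalence."""
    tp = prevalence * se
    fn = prevalence * (1 - se)
    fp = (1 - prevalence) * (1 - sp)
    tn = (1 - prevalence) * sp
    observed = tp + tn
    # Chance agreement from the marginal proportions of each instrument.
    expected = (tp + fp) * prevalence + (fn + tn) * (1 - prevalence)
    return (observed - expected) / (1 - expected)


low = kappa_from_rates(0.76, 0.72, 0.10)
high = kappa_from_rates(0.76, 0.72, 0.30)
print(round(low, 2), round(high, 2))
```

With identical sensitivity and specificity, kappa is about 0.24 at 10% prevalence and about 0.43 at 30% prevalence. A low-prevalence sample can therefore pull the pooled kappa down without any change in the intrinsic performance of the instrument.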

It is striking that the abridged version gives results so close to those of the full CIDI version, at least for the AUC. It is also noteworthy that results are stable across countries, although it could be argued that all the countries included speak Latin (Romance) languages, which are much more similar to one another than they are to English and, a fortiori, to very different languages such as Chinese or Yoruba.

The CIDI-SF seems to have the same tendency as the full CIDI, the sensitivity of which has also been considered to be inadequate [1], but to a greater extent. This may be explained by the absence in this questionnaire of organic exclusion or bereavement items and of questions that evaluate the clinical significance of the reported symptoms [33]. In addition, difficulty in assessing ‘clinically significant distress or impairment’ was one of the reasons why the authors of the CIDI-SF did not take these notions into account when developing the CIDI-SF syntax. These authors argued that a person who would present dysphoria or anhedonia ‘for most of the day, nearly every day’ for 2 weeks or more would be considered to show clinically significant distress [35].

An alternative explanation for the low sensitivity of the CIDI-SF may relate to differences in the time periods considered by the different instruments. Both the SCID-I/NP and the CIS-R provide data on 1-month prevalence rates (the SCID-I/NP provides lifetime prevalence rates as well), whereas the CIDI-SF provides 12-month prevalence rates. However, this is unlikely to be the principal explanation for the discrepancy, since it is the CIDI-SF, rather than the CIS-R, that shows a better concordance with the SCID-I/NP. Moreover, since anxiety and depressive disorders tend to be chronic, the time base of the prevalence estimate is unlikely to be a significant source of bias.

One possibility is that the thresholds or cut points used in the CIDI-SF to determine detection may vary between countries or may be incorrect for such a population. For instance, for CIDI-SF major depressive disorder, the threshold is greater than or equal to 3, which corresponds to a probability of 0.55 of receiving the same diagnosis with the full CIDI [22].

In addition, this primary care population was selected because it was expected to be sufficiently representative of the general population, and to contain sufficient cases of mental health disorders to generate prevalence rates that could be compared in a meaningful way between countries.

All subjects are successively exposed to all three instruments. Although the questionnaires are proposed in a random order, so as not to influence responses and to reduce fatigue or boredom associated with responding to multiple questions, we cannot assess whether replies to any of the questionnaires are influenced by fatigue. Fatigue bias may thus contribute to inconsistency between questionnaires. For example, tired participants may have wanted to interrupt the interview, and thus endeavoured to minimise symptoms and to respond in a negative way. Differences between diagnostic instruments may thus be explained by the fact that during the second phase of the interview, respondents may be less likely to report symptoms [7] or may simply deny diagnostic questions [22].

The attitude of the respondent to the instrument can also be a source of bias. We noted that subjects’ attitudes to the CIS-R are incompatible with our initial hypothesis that the CIS-R would be better accepted by the sample population because of its friendly question structure compared to the CIDI-SF, which is often perceived as a clinical interview that could negatively influence responses. In fact, both interviewers and interviewees considered the CIS-R to be less accessible than the CIDI-SF. In particular, participants found that the CIS-R was too long and too tedious, which led some participants to respond in a negative way and to stop the interview prematurely. We could thus hypothesise that participants’ attitudes to the instrument used may influence replies to diagnostic questions. Interestingly, these remarks were reported similarly in all countries.

Another possible explanation is the effect of interviewing methods on the respondents’ answers. The experienced clinicians who use the semi-structured instrument are able to explore the symptoms in depth and to take time to reformulate questions. With the structured interviewing method, the lay interviewers and the respondents are guided by the questionnaire and the item choices, which restricts the possibility of obtaining precise information.

In addition, the interviewers and the way the instruments were presented can also be a source of bias. As mentioned previously, the interviewers tended to consider the CIS-R to be much longer and more tedious than the CIDI-SF. This feeling may have affected the way the CIS-R was administered. The fact that the CIS-R yields fewer observations can be explained by the difficulties some interviewers had in handling it. One of the problems with the CIS-R is the difficulty of following the different steps and of calculating the number of points for each diagnosis section before moving on to the next section. On the contrary, for the CIDI-SF, no calculation is needed between the diagnosis sections and the interviewers only have to follow simple skips after each question. The CIDI-SF is clearly easier to administer and can easily be self-administered.

Conclusion

In brief, lay questionnaires, such as the CIDI-SF and the CIS-R, are used to reduce cost by eliminating the need for an experienced clinician and to reduce the duration of administration. However, the need for an acceptable, feasible length and a low cost can entail limitations and a loss of validity.

From our results, we have quite acceptable validity and performance values for both test instruments compared to the gold standard instrument, though they are not perfect. It is clear that the considerable differences in the interviewing methods used for the SCID-I/NP and both lay questionnaires can explain a part of the surprising differences between the prevalence rates observed. In addition, we are quite suspicious about the validity of the SCID-I/NP in large-scale general population surveys. Test and re-test reliability studies on the SCID are far from perfect and describe strong variability in the kappa values in the general population [17].

Though our results suggest a lack of accuracy of case identification for the lay questionnaires, our choice of an instrument is the CIDI-SF. In our study, the performance of the CIDI-SF as a lay diagnostic instrument is globally better than that of the CIS-R for the anxiety disorders and, by consequence, for the overall diagnoses, compared to the SCID-I/NP, though performance disparities are found between countries. The CIDI-SF results are, surprisingly, quite similar to those of the CIDI 3.0 and better than those of the CIS-R, compared to the SCID-I/NP. In addition, the CIDI-SF is well accepted by the interviewers and the interviewees. Consequently, the CIDI-SF seems to be a good choice for determining globally the prevalence of the most frequent diagnoses, mainly depressive and anxiety disorders, in the general population when time is restricted, for example in a phone survey. However, researchers have to take into consideration the CIDI-SF’s tendency to produce false positives and thus to overestimate the number of cases. This overestimation can be a disadvantage for some public health research. Nonetheless, it is more cautious to choose an instrument that overidentifies false positives than an instrument that overidentifies false negatives. It is also important to highlight that the respondents considered to be false positives by the lay questionnaires report symptoms that are highly associated with the presence of the corresponding disorder. The possible limitations of this survey are the use of a general practice population, which is more likely to have borderline disorders, the type of interviewing methods used and, finally, the cutoff points used by the CIDI-SF, which could explain the overestimation.

Furthermore, better estimating the sensitivity level of the cutoff points, adding some impairment measures and adding bereavement and organic restriction items may increase the performance of the CIDI-SF questionnaire for the selected diagnoses.

Thus, we would like to argue that our results and prevalence rates gathered in a large-scale survey with short instruments like the CIDI-SF and the CIS-R must and can be interpreted in a cautious manner. Information has to be given on the instrument limits and methodology biases. However, the short instruments are good and useful tools to provide preliminary pictures of the general population’s mental health. Given the importance of the burden of mental health and its high co-morbidity with many physical disorders, especially chronic illnesses, such short instruments could easily fit into any health-related survey.