A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure

Cameron, Isobel M.; Scott, Neil W.; Adler, Mats; Reid, Ian C.

doi:10.1007/s11136-014-0719-3

Brief Communication
Published: 22 May 2014

Volume 23, pages 2883–2888, (2014)

Quality of Life Research


Isobel M. Cameron¹,
Neil W. Scott²,
Mats Adler³ &
Ian C. Reid¹

Abstract

Purpose

It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF.

Method

Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners.

Results

Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive.

Conclusions

Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.

Background

Measuring psychological well-being and quality of life is more complicated than measuring other aspects of health where aetiology and pathology make a greater contribution [1]. Where scales observe differences in scores between groups, the differences may be due to a characteristic of test items other than the scale attribute. For example, some items within a given scale may be more likely to be endorsed by those in a particular age, gender or ethnic group. Differential item functioning (DIF) is considered present where items on a scale show such bias. Awareness of this bias is of particular importance where scale thresholds are used to inform decisions on diagnosis and subsequent treatment. Where DIF is present, it could lead to under- or over-treatment of particular groups, depending on the direction of bias. As such, it is important that scales are assessed for DIF and that the extent of its presence is taken into account when interpreting scale scores. Several methods have been applied to measure DIF in health-related scales, for example: structural equation modelling [2]; ordinal logistic regression [3]; item response theory (IRT) analysis [4]; and contingency table methods [5]. Personal preference has been advocated to guide choice of method [6], yet these methods may have varying degrees of sensitivity to detect DIF.

We compared three approaches for assessing DIF in the Hospital Anxiety Depression Scale (HADS) [7], a 14-item self-reported instrument that comprises anxiety (HADS-A) and depression (HADS-D) subscales, where higher scores represent greater symptom severity. HADS is commonly used in clinical practice and research [8]. It is therefore important that comparable results are obtained regardless of demographic aspects such as age and gender of respondents. Three methods of measuring DIF were applied to one dataset and the relative findings examined: (1) ordinal logistic regression; (2) an IRT method using Rasch analysis; and (3) a contingency table method using Mantel χ². The objective was to assess whether the methods identified the same items as exhibiting DIF, and whether some methods were more sensitive in detecting DIF than others.

Methods

Sample

In four practices in North East Scotland, 1,068 adults consulting primary care professionals completed the HADS [9]. Approval was granted by the North of Scotland Research Ethics Committee (06/S0802/27).

Statistical methods

DIF analyses were conducted independently by different researchers: ordinal logistic regression (Scott), Rasch analysis (Adler) and Mantel χ² procedure (Cameron). In DIF analysis, a comparison is made between a ‘reference’ group and a ‘focal’ group. Each researcher received an anonymised dataset of HADS items, sex (reference group = female, focal group = male) and age (reference group = <65 years, focal group = ≥65 years). Each researcher completed their analysis before appraising the other analyses, to reduce interpretative bias. A fourth author (Reid), free of methodological preference bias, appraised the findings.

Method 1: ordinal logistic regression

For each item in HADS-D and HADS-A, an ordinal logistic regression (OLR) model was fitted with the item response as the dependent variable and age group, sex and the overall scale score as independent variables. A log odds ratio greater than zero indicated that those in the focal group (age ≥ 65 or males) were more likely to have higher anxiety/depression symptoms on this item than those in the reference group (age < 65 or females). Items were regarded as having important DIF if p < 0.001 and the magnitude of the log odds ratio was greater than 0.64 [10]. Items associated with p < 0.05 were also noted. For greater detail on the OLR approach, see Scott et al. [3], Crane et al. [11] and Zumbo [12].
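The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' analysis code: the function names and the example estimates are hypothetical, and the log odds ratio and p value are assumed to come from an OLR model fitted elsewhere. For orientation, a log odds ratio of 0.64 corresponds to an odds ratio of roughly 1.9.

```python
import math

P_IMPORTANT = 0.001      # significance level for "important" DIF (from the text)
P_NOTED = 0.05           # weaker level that is noted but not treated as important
LOG_OR_THRESHOLD = 0.64  # minimum magnitude of the log odds ratio (from the text)


def classify_olr_dif(log_or, p_value):
    """Classify one item's group effect using the thresholds in the text."""
    if p_value < P_IMPORTANT and abs(log_or) > LOG_OR_THRESHOLD:
        return "important"
    if p_value < P_NOTED:
        return "noted"
    return "none"


def higher_scoring_group(log_or):
    """A positive log odds ratio means the focal group tends to score higher."""
    if log_or > 0:
        return "focal"
    if log_or < 0:
        return "reference"
    return "neither"


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only (not values from Table 1).
    examples = {"item A": (0.90, 0.0002), "item B": (-0.30, 0.01), "item C": (0.05, 0.60)}
    for name, (log_or, p) in examples.items():
        print(name, classify_olr_dif(log_or, p), higher_scoring_group(log_or))
    print("odds ratio at threshold:", round(math.exp(LOG_OR_THRESHOLD), 2))
```

Requiring both conditions means that a very small p value alone, which is easy to obtain with more than 1,000 respondents, is not enough to flag an item.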

Method 2: Rasch model

Parametric IRT models are built on the premise that it is possible to formulate a mathematical function that adequately describes the probability that respondents at different levels of the dimension will endorse a response option in a rating scale. In the present study, the one-parameter Rasch model was applied [13]. The quality of the measurement can be evaluated by fit to the model, dimensionality and DIF. The analysis was performed using the Winsteps programme [14]. The magnitude of DIF is referred to as a DIF contrast. DIF contrasts <0.5 are considered negligible, contrasts of 0.5 to 1 moderate and contrasts >1 substantial, provided that the DIF contrasts are statistically significant (p < 0.05, T value > 2). For greater detail on the Rasch model approach, see Bond and Fox [13] and Tennant et al. [15].
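As a rough sketch of how these bands might be applied, the Python below classifies a DIF contrast given its t value. Several things here are assumptions rather than details from the study: the contrast is taken as the difference between an item's difficulty estimates in the two groups, the t value is formed by dividing the contrast by the joint standard error, the bands are applied to the absolute value of the contrast, and all function names are hypothetical.

```python
import math


def dif_contrast(measure_a, measure_b):
    """Difference between one item's difficulty estimates in two groups (logits)."""
    return measure_a - measure_b


def joint_t(contrast, se_a, se_b):
    """Assumed form of the t value: contrast divided by the joint standard error."""
    return contrast / math.sqrt(se_a ** 2 + se_b ** 2)


def classify_contrast(contrast, t_value):
    """Apply the bands from the text; size is only judged when |t| > 2."""
    if abs(t_value) <= 2:
        return "not significant"
    size = abs(contrast)
    if size < 0.5:
        return "negligible"
    if size <= 1:
        return "moderate"
    return "substantial"


if __name__ == "__main__":
    # Hypothetical difficulty estimates and standard errors, for illustration only.
    contrast = dif_contrast(0.85, 0.10)
    t_value = joint_t(contrast, 0.12, 0.20)
    print(round(contrast, 2), round(t_value, 2), classify_contrast(contrast, t_value))
```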

Method 3: Mantel chi-square procedure

DIF analyses were performed using DIFAS-5 [16]. Data were stratified by the sums of the respective scales and assessed for DIF by sex and age. The Mantel χ² [17] statistic was computed (a contingency table method of assessing DIF in scales made up of polytomous items). The total score on the scale was divided into slices and the performance of each item assessed at these different score levels according to the grouping variables of interest. As fourteen items were being tested by two different groupings, a Mantel χ² value >10.83 was considered indicative of a statistically significant difference at the 0.001 level. The Mantel χ² value was then considered in the context of the effect size. Standardised Liu-Agresti Cumulative Common Log-Odds Ratios (LOR Z) are presented. Where this value is >2 or <−2, evidence of DIF is indicated [18]. Positive values indicate greater propensity for item endorsement by the reference group and negative values by the focal group. For greater detail on this method, see Penfield and Algina [19].
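To make the procedure concrete, the following Python sketch computes a Mantel χ² statistic for one polytomous item from stratified contingency tables, together with the p value for one degree of freedom. It is an independent illustration, not the DIFAS implementation: the function names, the data layout (one pair of count lists per score slice) and the example counts are assumptions, and the item is assumed to be scored 0 to 3.

```python
import math


def mantel_chi_square(strata, scores):
    """Mantel chi-square (1 df) for an ordinal item across total-score slices.

    strata: list of (reference_counts, focal_counts), one pair per slice, where
            each list holds the number of respondents in each response category.
    scores: numeric score attached to each response category.
    """
    observed = expected = variance = 0.0
    for ref, focal in strata:
        n_ref, n_focal = sum(ref), sum(focal)
        if n_ref == 0 or n_focal == 0:
            continue  # a slice with only one group says nothing about DIF
        n = n_ref + n_focal
        totals = [r + f for r, f in zip(ref, focal)]
        s1 = sum(y * t for y, t in zip(scores, totals))
        s2 = sum(y * y * t for y, t in zip(scores, totals))
        observed += sum(y * f for y, f in zip(scores, focal))  # focal score sum
        expected += n_focal * s1 / n                           # under no DIF
        variance += n_ref * n_focal * (n * s2 - s1 ** 2) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("no informative strata")
    return (observed - expected) ** 2 / variance


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


if __name__ == "__main__":
    # Hypothetical counts for three score slices, categories scored 0-3.
    slices = [([10, 6, 3, 1], [4, 5, 4, 2]),
              ([5, 8, 6, 2], [2, 4, 6, 4]),
              ([1, 4, 7, 6], [1, 2, 4, 7])]
    chi2 = mantel_chi_square(slices, scores=[0, 1, 2, 3])
    print(round(chi2, 3), p_value_1df(chi2) < 0.001)
```

Evaluating `p_value_1df(10.83)` gives a value very close to 0.001, which is where the 10.83 threshold quoted above comes from.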

Assessment of unidimensionality and model fit

Prior to the DIF analyses, dimensionality was assessed to ensure that HADS-D and HADS-A were each measuring one underlying construct. Additionally, for a valid analysis of DIF within the Rasch model, it is mandatory that the data also show an acceptable fit to the model (within the recommended range of 0.5–1.5 [14]). Using an IRT approach to test for unidimensionality, HADS-D and HADS-A were analysed using a principal component analysis (PCA) of the residuals left after the Rasch model was fitted to the data. Each item is modelled to contribute one unit of information (=1 eigenvalue) to the principal components decomposition of residuals. The eigenvalue of each PCA contrast therefore corresponds to the number of items that the contrast represents. Contrasts with eigenvalues below two imply low influence from secondary dimensions.

Results

Sample

For age, there were 814 respondents in the reference group (<65 years) and 254 in the focal group (≥65 years). For sex, there were 633 in the reference group (female) and 435 in the focal group (male).

Dimensionality and model fit assessment

All fit values of the Rasch model were within the recommended range (HADS-D: item infit 0.76–1.27, item outfit 0.67–1.41; HADS-A: item infit 0.84–1.31, item outfit 0.81–1.32), and eigenvalues of the first residuals were below two (eigenvalue of first contrast for HADS-D = 1.4 and for HADS-A = 1.6) indicating unidimensionality in both HADS-D and HADS-A.

Method 1: Ordinal logistic regression method

Using the combined criteria of p < 0.001 and |log(OR)| > 0.64 as indicating important DIF, three items (Q1, 6 and 8) had age group DIF but no items met the stricter criteria for sex DIF (Table 1).

Table 1 Uniform DIF based on ordinal logistic regression

Method 2: Rasch model method

DIF contrasts are shown in Table 2. There were four items with a DIF contrast >0.5 for age group (Q1, 6, 8 and 10) and one for sex (Q11). All items with contrast values >0.5 were also statistically significant (T value >2).

Table 2 DIF based on the Rasch model

Method 3: Mantel chi-squared procedure

Significant DIF by age was identified for four items (Q1, 6, 8 and 10) and two by sex (Q9, 11) (Table 3).

Table 3 DIF based on the contingency tables method

Discussion

Ordinal logistic regression, Rasch analysis and Mantel chi-square methods of measuring DIF in HADS-D and HADS-A led to similar findings regarding the presence of DIF. There was remarkable consistency between the methods in the size and direction of the DIF effects found, although there were differences in the number of items crossing the threshold indicating important DIF.
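The overlap between methods can be tallied directly from the items reported in the Results. The short Python sketch below uses only the flagged item numbers given there (OLR by the combined criteria, Rasch by a DIF contrast >0.5, Mantel by significant DIF); the variable and function names are illustrative.

```python
# Items flagged by each method, taken from the Results section.
flagged = {
    "age": {
        "ordinal logistic regression": {1, 6, 8},
        "Rasch": {1, 6, 8, 10},
        "Mantel": {1, 6, 8, 10},
    },
    "sex": {
        "ordinal logistic regression": set(),
        "Rasch": {11},
        "Mantel": {9, 11},
    },
}


def agreement(by_method):
    """Return (items flagged by every method, items flagged by only some)."""
    item_sets = list(by_method.values())
    by_all = set.intersection(*item_sets)
    by_any = set.union(*item_sets)
    return by_all, by_any - by_all


if __name__ == "__main__":
    for grouping, by_method in flagged.items():
        by_all, by_some = agreement(by_method)
        print(grouping, "all methods:", sorted(by_all), "some methods:", sorted(by_some))
```

Three items (Q1, 6 and 8, for age) are flagged by every method, and three further items (Q10 for age; Q9 and Q11 for sex) cross the threshold under only some methods, which matches the summary given in the Abstract.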

Regardless of method, the analyses of DIF implied that the HADS-D and HADS-A subscales are valid tools for comparisons between sexes, and that HADS-A is also valid for comparisons between age groups. In HADS-D, all three methods identified significant levels of age-related DIF. This is a potential concern given how frequently HADS is used in research studies and clinical practice. However, the DIF on the problematic HADS-D items went in different directions, so the effect might be numerically cancelled out at the scale score level.

Within the Rasch model, it is possible to remove DIF by splitting the estimation of measures between the subgroups on the items showing substantial DIF. This is an alternative to removing or reformulating items with DIF.
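One common way to prepare data for such a split is to replace the affected item with one group-specific copy per subgroup, leaving the copy missing for respondents outside that subgroup, so that Rasch software can estimate a separate difficulty for each copy. The Python sketch below illustrates that data layout only; it is an assumption about how the split might be set up, not a description of the authors' procedure, and the function name, column names and example rows are hypothetical.

```python
def split_item_by_group(rows, item, group_key):
    """Replace `item` with one column per group; other groups get None (missing)."""
    groups = sorted({row[group_key] for row in rows})
    split_rows = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != item}
        for group in groups:
            column = f"{item}_{group}"
            new_row[column] = row[item] if row[group_key] == group else None
        split_rows.append(new_row)
    return split_rows


if __name__ == "__main__":
    # Hypothetical respondents, for illustration only.
    data = [
        {"id": 1, "age_group": "under65", "Q8": 2, "Q2": 1},
        {"id": 2, "age_group": "65plus", "Q8": 0, "Q2": 3},
    ]
    for row in split_item_by_group(data, item="Q8", group_key="age_group"):
        print(row)
```

Items without DIF stay as single columns and continue to anchor the two groups on a common scale.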

We studied only three of many methods proposed to assess DIF. Further investigations with structural equation modelling methods would add to the understanding of the benefits and drawbacks of the different methods.

Our findings, in relation to the presence of DIF in HADS, concur with other studies [5, 20, 21]. A small number of studies have examined the relative performance of DIF detection methods in depressive symptom tools [22–24]. The Mini-Mental State Examination (MMSE) has been subjected to several methods to assess for the presence of DIF in relation to translation and other variables [11, 25–28]. The methods assessed included logistic regression, IRT, contingency tables and structural equation modelling methods. In considering the relative findings, there was a general lack of agreement between methods.

Conclusion

Ordinal logistic regression, Rasch analysis and contingency tables methods of investigating DIF yielded consistent results when identifying DIF in HADS-D and HADS-A. Regardless of method, investigators should combine statistical significance, magnitude of DIF effect and investigator judgement to interpret the results.

References

1. Warner, J. (2004). Clinicians’ guide to evaluating diagnostic and screening tests in psychiatry. Advances in Psychiatric Treatment, 10(6), 446–454.
2. Crawford, J. R., Garthwaite, P. H., & Slick, D. J. (2009). On percentile norms in neuropsychology: Proposed reporting standards and methods for quantifying the uncertainty over the percentile ranks of test scores. The Clinical Neuropsychologist, 23, 1173–1195.
3. Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., De Graaf, R., Groenvold, M., et al. (2010). Differential Item Functioning (DIF) analysis of health-related quality of life instruments using logistic regression. Health and Quality of Life Outcomes, 8(81), 1–9.
4. Isacsson, G., & Adler, M. (2011). Randomized clinical trials underestimate the efficacy of antidepressants in less severe depression. Acta Psychiatrica Scandinavica, 125(8), 453–459.
5. Cameron, I. M., Crawford, J. R., Lawton, K., & Reid, I. C. (2013). Differential item functioning of the HADS and PHQ-9: An investigation of age, gender and educational background in a clinical UK primary care sample. Journal of Affective Disorders, 147(1–3), 262–268.
6. Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31–44.
7. Zigmond, A. S., & Snaith, P. (1983). The Hospital Anxiety and Depression Scale (HAD). Acta Psychiatrica Scandinavica, 67, 361–370.
8. Herrmann, C. (1997). International experiences with the Hospital Anxiety and Depression Scale—a review of validation data and clinical results. Journal of Psychosomatic Research, 42, 17–41.
9. Cameron, I. M., Lawton, K., & Reid, I. C. (2009). Appropriateness of antidepressant prescribing: An observational study in a Scottish primary-care setting. British Journal of General Practice, 59, 644–649.
10. Bjorner, J. B., Kreiner, S., Ware, J. E., Damsgaard, M. T., & Bech, P. (1998). Differential item functioning in the Danish translation of the SF-36. Journal of Clinical Epidemiology, 51(11), 1189–1202.
11. Crane, P. K., Gibbons, L. E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar. Medical Care, 44(11 Suppl 3), S115–S123.
12. Zumbo, B. D. (1999). A handbook on the theory and methods of Differential Item Functioning (DIF). Ottawa: Directorate of Human Resources Research and Evaluation, National Defense Headquarters.
13. Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model. Fundamental measurement in the human sciences (2nd ed.). New Jersey: Lawrence Erlbaum Associates Inc.
14. Linacre, J. M. (2010). Winsteps Rasch Measurement, 3.70.0.
15. Tennant, A., Penta, M., Tesio, L., Grimby, G., Thonnard, J. L., Slade, A., et al. (2004). Assessing and adjusting for cross-cultural validity of impairment and activity limitation scales through differential item functioning within the framework of the Rasch model: the PRO-ESOR project. Medical Care, 42(1 Suppl), I37–I48.
16. Penfield, R. D. (2007). DIFAS 4.0: Differential item functioning analysis system user’s manual.
17. Mantel, N. (1963). Chi square tests with one degree of freedom: Extension of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58, 690–700.
18. Liu, I., & Agresti, A. (1996). Mantel-Haenszel-type inference for cumulative odds ratios with a stratified ordinal response. Biometrics, 52, 1223–1234.
19. Penfield, R. D., & Algina, J. (2003). Applying the Liu-Agresti estimator of the cumulative common odds ratio to DIF detection in polytomous items. Journal of Educational Measurement, 40, 353–370.
20. Lambert, S., Pallant, J. F., & Girgis, A. (2010). Rasch analysis of the Hospital Anxiety and Depression Scale among caregivers of cancer survivors: Implications for its use in psycho-oncology. Psycho-Oncology, 20(9), 919–925.
21. Pallant, J. F., & Tennant, A. (2007). An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology, 46(1), 1–18.
22. Yang, F. M., & Jones, R. N. (2007). Center for Epidemiologic Studies-Depression scale (CES-D) item response bias found with Mantel-Haenszel method was successfully replicated using latent variable modeling. Journal of Clinical Epidemiology, 60(11), 1195–1200.
23. Cole, S. R., Kawachi, I., Maller, S. J., & Berkman, L. F. (2000). Test of item-response bias in the CES-D scale. Experience from the New Haven EPESE study. Journal of Clinical Epidemiology, 53(3), 285–289.
24. Huang, F. Y., Chung, H., Kroenke, K., Dellucchi, K. L., & Spitzer, R. L. (2006). Using the Patient Health Questionnaire 9 to measure depression among racially and ethnically diverse primary care patients. Journal of General Internal Medicine, 21, 547–552.
25. Dorans, N. J., & Kulick, E. (2006). Differential item functioning on the Mini-Mental State Examination. An application of the Mantel-Haenszel and standardization procedures. Medical Care, 44(11 Suppl 3), S107–S114.
26. Jones, R. N. (2006). Identification of measurement differences between English and Spanish language versions of the Mini-Mental State Examination. Detecting differential item functioning using MIMIC modeling. Medical Care, 44(11 Suppl 3), S124–S133.
27. Orlando Edelen, M. O., Thissen, D., Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2006). Identification of differential item functioning using item response theory and the likelihood-based model comparison approach. Application to the Mini-Mental State Examination. Medical Care, 44(11 Suppl 3), S134–S142.
28. Morales, L. S., Flowers, C., Gutierrez, P., Kleinman, M., & Teresi, J. A. (2006). Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework. Medical Care, 44(11 Suppl 3), S143–S151.

Acknowledgments

We would like to thank the primary care participants and general practices who kindly took part in the original study from which the data were collected. The original research from which the data presently analysed were collected was funded by the Centre for Change and Innovation, of the then Scottish Executive; and from Support for Science funding, Grampian NHS Research and Development. The present methodological investigations were conducted without additional funding.

Ethical standards

The anonymised data analysed in this study were originally collected for research conducted with the approval of the North of Scotland Research Ethics Committee (06/S0802/27).

Conflict of interest

IMC and NWS have nothing to declare. MA has received fees for speaking from Otsuka, AstraZeneca and Servier and served as a consultant for Otsuka. ICR has received fees for speaking from AstraZeneca UK and received travel and meeting registration assistance from Lundbeck.

Author information

Authors and Affiliations

Psychiatry Group, Division of Applied Medicine, Royal Cornhill Hospital, University of Aberdeen, Aberdeen, AB25 2ZH, Scotland, UK
Isobel M. Cameron & Ian C. Reid
Medical Statistics Team, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, Scotland, UK
Neil W. Scott
Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
Mats Adler

Corresponding author

Correspondence to Isobel M. Cameron.

About this article

Cite this article

Cameron, I.M., Scott, N.W., Adler, M. et al. A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure. Qual Life Res 23, 2883–2888 (2014). https://doi.org/10.1007/s11136-014-0719-3

Accepted: 13 May 2014
Published: 22 May 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s11136-014-0719-3
