Introduction

In the past decade, diagnosis and treatment for rheumatoid arthritis (RA) have advanced greatly. Development and marketing of the anti-cyclic citrullinated peptide antibody (anti-CCP) assay, with high specificity for RA, has allowed earlier and more specific identification of new onset RA. Additionally, the institution of early therapy has resulted in improved outcomes. The 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) criteria for RA classification were developed to facilitate the identification and study of RA at earlier stages than the existing 1987 ACR criteria. The 2010 criteria incorporated anti-CCP assay results and measures of systemic inflammation, and did not include criteria such as nodules and radiographic erosions, which are seen primarily in individuals with long-standing RA. As the 2010 criteria become the new standard for classification of RA, ascertainment of the degree of overlap between the criteria is important in order to determine the extent to which previous research can be applied in newly classified patients.

The Nurses’ Health Study (NHS) and Nurses’ Health Study II (NHSII) are large prospective cohorts of more than 230,000 women living across the US, who have been followed for many years, with data on lifestyle factors and development of disease collected on biennial questionnaires. Data from these cohorts have been used to examine relationships between multiple environmental, hormonal and lifestyle factors, as well as genetics, and the development of RA [3–5]. The correct identification and classification of incident RA cases in the NHS cohorts is thus essential to these ongoing studies, as it is to other large population-based cohorts being followed for incident RA [6, 7]. To date, this has been done in a standardized two-step procedure, described below, based on the 1987 ACR criteria and reviewers’ expert opinion. Approximately 40 % of past RA cases included in NHS cohort analyses have been seronegative. While the 1987 and the new 2010 criteria for the classification of RA have been compared in several different clinical early arthritis populations [8–13], it is not clear how newly reported RA cases in a population-based cohort study such as this should be classified, in particular because classification is based on medical records received up to 2 years after the initial report. We aimed to compare the 1987 ACR and the 2010 ACR/EULAR classification criteria in the NHS/NHSII cohorts, and to examine their performance characteristics and the characteristics of participants classified as having RA by the two different systems, in order to determine how to classify new cases in the future.

Materials and methods

The NHS is a prospective cohort study involving 121,700 female nurses in the USA, aged 30–55 years in 1976, followed since that time. The NHSII was established in 1989 and includes 116,608 female nurses aged 25–42 years at cohort inception and followed since then. All participants completed a baseline questionnaire about their medical histories and lifestyles, and have been followed with biennial questionnaires to update exposures and new disease diagnoses [2]. The RA confirmation procedure is a two-stage process, in which all women newly reporting a doctor diagnosis of RA on a biennial questionnaire are asked to complete the Connective Tissue Screening Questionnaire (CSQ) [14] and to sign a medical records release form. For all women with signs and symptoms of RA on the CSQ and available medical records, these records are then independently reviewed by two board-certified rheumatologists (KC, EK) for the 1987 ACR classification criteria for RA [15], the treating physician’s ultimate diagnosis, the reviewer’s ultimate diagnosis, and whether the treating physician was an ACR member rheumatologist.

In 2010, we began using the 2010 ACR/EULAR classification criteria for RA [1] as well. For the current study, we reviewed the medical records of women in NHS and NHSII who newly self-reported doctor-diagnosed RA in 2009–2012. The criteria sets were only applied when no other diagnosis (e.g., gout, systemic lupus erythematosus, psoriatic arthritis) better explained the signs or symptoms. Results of testing for rheumatoid factor (RF), anti-CCP, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), and other features of RA were based on medical record review. We reviewed radiograph findings in the medical record, both as documented by the physician and in the radiology reports themselves. Cases with ≥4 of the 7 1987 ACR criteria documented in the medical record were considered to have definite RA by 1987 criteria, and cases with a score of ≥6 out of 10 on the 2010 ACR/EULAR criteria were considered to have definite RA by 2010 criteria. Ultimate agreement between the two reviewers as to the diagnosis of RA was reached by consensus. Demographic characteristics of the women at cohort baseline and at time of self-report of RA were obtained from the NHS and NHSII cohort datasets. Using the NHS expert rheumatologists’ opinion as the gold standard, we compared the sensitivity, specificity, and positive and negative predictive values of classification of incident RA by either or both classification systems.
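The two thresholds described above can be sketched as a simple decision rule. This is only an illustrative sketch, not the study's actual review procedure: the function names and category labels are hypothetical, and it assumes the reviewer has already excluded alternative diagnoses and tallied the 1987 criteria count and the 2010 score from the medical record.

```python
def meets_1987_acr(criteria_met: int) -> bool:
    """Definite RA by 1987 ACR criteria: at least 4 of 7 criteria documented."""
    if not 0 <= criteria_met <= 7:
        raise ValueError("1987 ACR criteria count must be between 0 and 7")
    return criteria_met >= 4


def meets_2010_acr_eular(score: int) -> bool:
    """Definite RA by 2010 ACR/EULAR criteria: score of at least 6 out of 10."""
    if not 0 <= score <= 10:
        raise ValueError("2010 ACR/EULAR score must be between 0 and 10")
    return score >= 6


def classify(criteria_met_1987: int, score_2010: int) -> str:
    """Return which criteria set(s) a case fulfils (labels are hypothetical)."""
    by_1987 = meets_1987_acr(criteria_met_1987)
    by_2010 = meets_2010_acr_eular(score_2010)
    if by_1987 and by_2010:
        return "both"
    if by_1987:
        return "1987 only"
    if by_2010:
        return "2010 only"
    return "neither"
```

For example, a record documenting five of the seven 1987 criteria but scoring only 4 on the 2010 scale would fall into the "1987 only" group under this sketch.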

Statistical analysis

We calculated the sensitivity and specificity, using the reviewers’ opinion as the gold standard for the diagnosis of RA. Agreement between the two criteria sets was based on the κ coefficient. All statistical analyses involved use of SAS, version 9.2 (Cary, NC). The Partners’ Institutional Review Board approved this study.
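For readers less familiar with these measures, the following minimal sketch shows how sensitivity, specificity, and the predictive values follow from a 2×2 table of criteria result versus gold-standard opinion. The class name and the counts in the usage lines are hypothetical placeholders chosen for illustration; they are not study data, and the study's own analyses were run in SAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a criteria set versus the gold standard (reviewer opinion)."""
    tp: int  # criteria positive, gold standard RA
    fp: int  # criteria positive, gold standard not RA
    fn: int  # criteria negative, gold standard RA
    tn: int  # criteria negative, gold standard not RA

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only (NOT taken from the study).
example = TwoByTwo(tp=8, fp=1, fn=2, tn=9)
print(example.sensitivity(), example.specificity())
```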

Results

Baseline characteristics of the women self-reporting new RA in the NHS and NHSII cohorts are described in Table 1. Participants in NHSII were younger, and 80 % of participants were seen by an ACR member rheumatologist. The characteristics of the women classified by one or the other criteria system differed from those meeting both sets of criteria (Table 2). Ninety-eight (77 %) of the 128 participants fulfilled the 1987 ACR criteria (69 % of NHS and 80 % of NHSII records reviewed), while 79 (63 %) fulfilled the 2010 ACR/EULAR criteria (59 % of NHS and 65 % of NHSII records reviewed). Seventy-two (56 %) met both sets of criteria, 21 (16 %) met neither, 26 (20 %) met only the 1987 ACR criteria, and 9 (7 %) met only the 2010 ACR/EULAR criteria. Discordance between the classification criteria thus occurred in 35 cases (27 %). Concordance between the two sets per the kappa statistic was 0.36 (95 % CI 0.20–0.53). Notably, participants who met the 1987 ACR criteria only were more likely to have involvement of 1–10 small joints (62 vs. 11 %, p < 0.018), and to have a negative RF and negative anti-CCP (69 vs. 11 %, p < 0.005), than those meeting only the 2010 ACR/EULAR criteria. Women who met only the 2010 criteria were less likely to have symmetric arthritis (44 vs. 96 %, p < 0.002).
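As a worked check, Cohen's kappa can be recomputed from the four cell counts reported above (72 both, 26 only 1987, 9 only 2010, 21 neither). The sketch below uses the standard kappa formula; the function and argument names are illustrative and are not taken from the study's SAS code.

```python
def cohen_kappa(both: int, only_a: int, only_b: int, neither: int) -> float:
    """Cohen's kappa for two binary classifications of the same cases."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n          # proportion of concordant cases
    a_yes = both + only_a                    # positives by criteria set A
    b_yes = both + only_b                    # positives by criteria set B
    # Agreement expected by chance, from the marginal totals
    expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n**2
    return (observed - expected) / (1 - expected)


# A = 1987 ACR criteria, B = 2010 ACR/EULAR criteria
kappa = cohen_kappa(both=72, only_a=26, only_b=9, neither=21)
print(round(kappa, 2))  # 0.36
```

Observed agreement is 93/128 (73 %), but because most cases are positive by each criteria set, chance agreement is already about 57 %, which is why kappa is only modest.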

Table 1 Characteristics of participants in the NHS/NHSII cohorts newly self-reporting doctor-diagnosed RA in 2009–2012
Table 2 RA classification by individual criteria among participants in the NHS/NHSII cohorts newly self-reporting doctor-diagnosed RA in 2009–2012

Of the 128 women self-reporting new RA, 95 (74 %) had a radiograph report in the medical record. Of those who had a radiograph, 31 had erosions. Of the 56 women who were seronegative for RF or anti-CCP, 11 had radiographic erosions. All 11 fulfilled the 1987 ACR criteria for RA, only three fulfilled the 2010 ACR/EULAR criteria for RA, and ultimately nine were thought to have RA. On the other hand, of the 72 women who were seropositive, 20 had radiographic erosions. All 20 fulfilled the 1987 ACR criteria, 19 of 20 fulfilled the 2010 ACR/EULAR criteria, and all 20 were ultimately thought to have RA (chi-squared test for reviewers’ impression of RA among seropositive vs. seronegative women with erosions, p = 0.049). Of the 31 patients with erosions, 100 % were thought to have RA by their treating rheumatologists.

Using the NHS rheumatologist medical record reviewers’ opinion as the gold standard for RA classification, sensitivity was 0.93 for the 1987 ACR criteria compared with 0.79 for the 2010 ACR/EULAR criteria (Table 3). Specificity of the 2010 ACR/EULAR criteria was 0.87, greater than the 0.77 specificity of the 1987 ACR criteria. The positive and negative predictive values when compared to the reviewers’ opinion were highest when either criteria set was fulfilled. NHS rheumatologist reviewers ultimately agreed with the non-ACR rheumatologists’ diagnoses slightly more than the ACR rheumatologists’ diagnoses (kappa coefficient 0.70 vs. 0.54).

Table 3 Sensitivity and specificity of two criteria systems for RA classification compared to the gold standard of the reviewers’ opinion among participants in the NHS/NHSII cohorts newly self-reporting doctor-diagnosed RA in 2009–2012

Discussion

The 2010 ACR/EULAR criteria were developed with the intent of identifying early RA in particular. While past studies have examined the performance of these new criteria in the classification of early arthritis patients in several clinical settings [8–13], no studies have examined their performance in the setting of a population-based cohort study in which the identification and classification of RA is done on the basis of mailed questionnaires and medical record review, with an unavoidably longer time course than that of a clinical interaction. We were not certain how the new criteria would compare to the older criteria, or whether transitioning entirely to the new criteria would lead us to misclassify or exclude new cases, or to include a different type of RA. We therefore performed a side-by-side review of the medical records of women newly self-reporting RA, applying the 1987 and the 2010 criteria as well as the expert reviewers’ opinion, which has always been our gold standard. We found that agreement between the two criteria sets was not high (kappa 0.36), and that a substantial proportion of seronegative cases, in particular, would not be captured by use of the 2010 criteria alone. The best positive and negative predictive values were found when both criteria sets were used.

Several past clinical studies have also documented high levels of disagreement in RA classification by the two sets of criteria, from 21 to 34 % [8–11], and we found 27 % disagreement. As has been reported in clinical cohorts [10–12], the cases that would not be captured in the NHS cohorts with use of the new classification criteria alone were more often seronegative. Use of the 2010 criteria exclusively would thus primarily miss seronegative RA that would otherwise have been identified with the 1987 criteria. All prior studies in the NHS and NHSII cohorts have included both seropositive and seronegative populations and allowed for stratified analyses of predictors of different phenotypes of RA. In adding cases identified by the 2010 criteria to those classified by the 1987 criteria, we will continue to identify RA cases comparable to those identified since cohort inception, while increasing sensitivity for early RA and anti-CCP positive cases. Our gold standard of the expert rheumatologist reviewers’ opinion is somewhat problematic in that our opinions are influenced by our knowledge of the criteria. The reviewers were also limited by the need to rely on medical records, which at times were incomplete and did not offer insight into the treating physicians’ thought processes. However, in the absence of a true gold standard, rheumatologist diagnosis has been and continues to be the accepted outcome against which both old and new criteria have been compared [8, 10, 13]. Thus, these data support use of both the 1987 and 2010 criteria for identification of RA study populations in the NHS and NHSII cohorts, and other studies following population-based cohorts over many years may choose to do the same.