Pathergy phenomenon (PP) is a skin hyper-reactivity to trauma [1]. The pathergy test (PT) is an easy-to-perform skin test used to detect the pathergy phenomenon and to aid in the diagnosis of Behcet’s disease (BD) [2–8]. The phenomenon was first described in 1937 [9], and PT is an important item in many classification and diagnostic criteria sets for BD [10]. The reported sensitivity of PT was 83% in Russia [11], 77% in Morocco [12], 71% in Iraq [13], 62% in China [14] and in Egypt [15], 61.5% in Iran [16], 55% in Germany [17], 44% in Japan [18], and 18% in Saudi Arabia [19]. The sensitivity of PT has been changing over time: more recent figures are 53.1% in Morocco [20], 52.5% in Iran [21], 40% in Portugal [22], and 33.7% in Germany [23]. In Iran, the sensitivity of PT decreased gradually from 71.8% for the first 1,000 patients to 33.9% for patients 5,000 to 6,000. The same decline was observed when patients were classified by the time of disease onset: sensitivity went from 61.5% for patients whose first symptom appeared before 1977 to 41% for patients whose disease began between 1998 and 2007 [24]. The aim of this study was to assess the current diagnostic value of PT in light of its declining sensitivity, and to examine how its diagnostic value has changed over the past 35 years.

Patients and methods

The Behcet’s disease registry at the Behcet’s Disease Unit, Rheumatology Research Center (RRC), Tehran University of Medical Sciences, contains complete data on 6,607 Behcet’s disease patients. The data are updated regularly as patients are followed during their disease course.

PT was performed in all patients 1 day before their first visit, using disposable needles. After thorough asepsis of the forearm skin with povidone iodine 10% (Betadine®), three intradermal needle pricks were made: the first with a 25-gauge needle, the second with a 21-gauge needle, and the third with a 25-gauge needle with injection of one or two drops of normal saline. Results were read 24 h later, on the day of the patient’s first visit, by one of the dermatologists and one of the rheumatologists of the BD clinic. A positive result was defined as the formation of a papule or pustule at the needle-prick site, surrounded by erythema.

Among BD patients, 123 did not undergo the pathergy test, corresponding to missing data for 1.9% of all BD patients. There were 4,292 control patients, all with conditions mimicking BD, who had been referred over the past 35 years to confirm or rule out the diagnosis of BD. Data were missing for 113 controls (2.6%).

Patients were divided into four groups of 1,650 patients according to the date of their first evaluation at the RRC, the first group comprising the earliest patients and the fourth group the most recent. The first patient was seen in 1975, and the last patient included in this study on 24 May 2010. Controls were divided in the same way into four groups of 1,073 patients.
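The grouping step can be sketched as follows, assuming each registry entry is reduced to a patient identifier and a date of first evaluation (a hypothetical field layout, not the registry’s actual schema). Entries are sorted by date and cut into four consecutive, equally sized blocks. How the study handled patients left over when the total is not divisible by four is not stated; this sketch simply appends them to the most recent block.

```python
from datetime import date
from typing import List, Sequence, Tuple


def split_chronologically(
    records: Sequence[Tuple[str, date]], n_groups: int = 4
) -> List[List[str]]:
    """Sort (patient_id, first_visit_date) pairs by date and cut them into
    n_groups consecutive, equally sized blocks. Any remainder that does not
    divide evenly is appended to the last (most recent) block (an assumption)."""
    ordered = sorted(records, key=lambda rec: rec[1])
    size = len(ordered) // n_groups
    groups: List[List[str]] = []
    for g in range(n_groups):
        start = g * size
        end = len(ordered) if g == n_groups - 1 else (g + 1) * size
        groups.append([pid for pid, _ in ordered[start:end]])
    return groups


# Hypothetical registry extract: eight patients listed out of date order.
registry = [
    ("P5", date(1995, 3, 1)), ("P1", date(1975, 6, 2)),
    ("P8", date(2010, 5, 24)), ("P3", date(1984, 1, 9)),
    ("P2", date(1979, 11, 30)), ("P7", date(2004, 7, 15)),
    ("P4", date(1990, 8, 8)), ("P6", date(1999, 2, 20)),
]
print(split_chronologically(registry))
```

Here the eight hypothetical patients are returned as four groups of two, ordered from earliest to most recent first evaluation.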

To assess the performance and diagnostic value of PT, the following measures were calculated: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and Youden’s index (YI) [25–28].
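All of these measures can be derived from a 2×2 table of test result against diagnosis. The following Python sketch uses the standard textbook definitions; the function and field names are illustrative and not taken from the study. Note that PPV and NPV computed directly from the table reflect the proportion of patients in the sample, not a clinical prevalence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    plr: float
    nlr: float
    dor: float
    youden: float


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> DiagnosticMetrics:
    """Standard 2x2 accuracy measures.

    tp / fn: patients with a positive / negative test
    fp / tn: controls with a positive / negative test
    """
    sens = tp / (tp + fn)      # positive tests among patients
    spec = tn / (tn + fp)      # negative tests among controls
    ppv = tp / (tp + fp)       # patients among all positives (sample prevalence)
    npv = tn / (tn + fn)       # controls among all negatives (sample prevalence)
    plr = sens / (1 - spec)    # raises ZeroDivisionError if spec == 1
    nlr = (1 - sens) / spec
    dor = plr / nlr            # equals (tp * tn) / (fp * fn)
    youden = sens + spec - 1
    return DiagnosticMetrics(sens, spec, ppv, npv, plr, nlr, dor, youden)


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=60, fn=40, fp=10, tn=90))
```

With these made-up counts, sensitivity is 0.6, specificity 0.9, PLR 6, DOR 13.5, and Youden’s index 0.5.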

Results

PT was positive in 64.2% of the first group of BD patients (95% confidence interval [95% CI], 61.8% to 66.5%), 59.2% of the second group (95% CI, 56.8% to 61.5%), 50.65% of the third group (95% CI, 48.2% to 53.1%), and 35.8% of the fourth group (95% CI, 33.5% to 38.1%). In the control groups, PT was positive in 13.4% of the first group (95% CI, 11.5% to 15.6%), 9% of the second (95% CI, 7.4% to 10.9%), 1% of the third (95% CI, 0.5% to 1.9%), and 1.6% of the fourth (95% CI, 1% to 2.6%). Details are shown in Table 1. The difference between the first and second BD groups was statistically significant (chi-square test).
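The method used for the 95% confidence intervals is not stated. As one common choice, a Wilson score interval for a proportion can be computed as below; the counts in the usage line are hypothetical (the exact number of tested patients per group is not given here).

```python
import math
from typing import Tuple


def wilson_ci(positives: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical: 1,059 positive tests out of 1,650 patients (about 64.2%).
print(wilson_ci(1059, 1650))
```

With these hypothetical counts the interval is approximately 0.618 to 0.665, that is, 61.8% to 66.5%. Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 and is asymmetric for small proportions such as the 1% positivity in the third control group.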

Table 1 Details of Behcet’s disease and control patients

Sensitivity is the proportion of patients with the disease who have a positive test, expressed as a percentage. The sensitivity of PT was 64.2% for the first group, 59.2% for the second, 50.6% for the third, and 35.8% for the fourth. The difference between the first and second groups was 5 percentage points, which was statistically significant (Chi2 = 8.284, p = 0.004). The difference between the second and third groups was 8.6 percentage points (Chi2 = 24.588, p < 0.0001), and between the third and fourth groups 14.8 percentage points (Chi2 = 72.564, p < 0.0001).
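Comparisons of positivity rates between two groups, as above, are chi-square tests on a 2×2 table. The paper does not state whether a continuity correction was applied; the sketch below uses the uncorrected Pearson statistic with 1 degree of freedom, for which the p-value can be obtained from the complementary error function without any statistics library.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) for
        [[a, b],   # group 1: positive, negative
         [c, d]]   # group 2: positive, negative
    Returns (chi2, p_value)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (about 64.2% vs 59.2% positive); not the study's actual table.
print(chi_square_2x2(1059, 591, 977, 673))
```

Because the counts are hypothetical and no continuity correction is used, the result will not match the reported Chi2 values exactly.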

Specificity is the proportion of people without the disease (controls) who have a negative test, expressed as a percentage. The specificity of PT was 86.6% for the first group, 91% for the second, 99% for the third, and 98.4% for the fourth. The difference between the first and second groups was 4.4 percentage points, which was statistically significant (Chi2 = 10.406, p = 0.0013). The difference between the second and third groups was 8 percentage points (Chi2 = 66.037, p < 0.0001), and between the third and fourth groups 0.6 percentage points (Chi2 = 1.321, p = 0.25).

Without adjustment for the prevalence of BD, the PPV was 82.7% for the first group (patients and controls), 86.8% for the second, 98.1% for the third, and 95.7% for the fourth. The PPV adjusted to the prevalence of BD in the RRC Behcet’s disease clinic (33%) or in the general population of Iran (80 per 100,000 inhabitants) was different; details are given in Table 2. Without adjustment for prevalence, the NPV was 82.7% for the first group (patients and controls), 69% for the second, 66.7% for the third, and 60.5% for the fourth. Results after adjustment for the prevalence of BD are shown in Table 2.

Table 2 Performance and diagnostic value of pathergy test

The positive likelihood ratio was 4.8 for the first group, 6.6 for the second, 50.6 for the third, and 22.4 for the fourth. The negative likelihood ratio was 0.41 for the first, 0.45 for the second, 0.5 for the third, and 0.65 for the fourth group of patients and controls (Table 2).

The diagnostic odds ratio was 11.6 for the first group of patients and controls, 14.7 for the second, 101.4 for the third, and 34.3 for the fourth (Table 2).

The Youden’s index was 0.5 for the first, second, and the third groups of patients and controls. It decreased to 0.34 in the fourth group (Table 2).
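As an arithmetic check, PLR, NLR, DOR, and YI can be recomputed directly from the sensitivities and specificities reported above, using the standard definitions. The sketch below (variable names are illustrative) reproduces the reported values to within rounding.

```python
from typing import Dict

# Sensitivity and specificity of PT reported for groups 1-4, as proportions.
REPORTED_SENS = [0.642, 0.592, 0.506, 0.358]
REPORTED_SPEC = [0.866, 0.910, 0.990, 0.984]


def summary_indices(sens: float, spec: float) -> Dict[str, float]:
    """Likelihood ratios, diagnostic odds ratio and Youden's index from Se and Sp."""
    plr = sens / (1 - spec)        # positive likelihood ratio
    nlr = (1 - sens) / spec        # negative likelihood ratio
    return {
        "PLR": plr,
        "NLR": nlr,
        "DOR": plr / nlr,          # diagnostic odds ratio
        "YI": sens + spec - 1,     # Youden's index
    }


results = [summary_indices(se, sp) for se, sp in zip(REPORTED_SENS, REPORTED_SPEC)]
for group, r in enumerate(results, start=1):
    print(f"group {group}: " + ", ".join(f"{k}={v:.2f}" for k, v in r.items()))
```

Small differences in the last digit (for example PLR 4.79 vs 4.8 in the first group) come only from rounding of the published percentages.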

Discussion

The sensitivity of the pathergy phenomenon decreased gradually in Iranian patients, from 64.2% to 35.8% (Table 1). The differences between consecutive groups (first vs second, second vs third, and third vs fourth) were all statistically significant. A similar decrease in the sensitivity of PP has been observed in Morocco [20], Portugal [22], and Germany [23]. The reason for the decline in the rate of positive pathergy reactions, in both patients and controls, remains obscure.

Specificity increased from 86.6% to 98.4%. The differences in the rate of positive PT between the first and second control groups and between the second and third control groups were statistically significant. Specificity decreased slightly in the fourth group, but the difference from the third group was not statistically significant.

Overall, sensitivity fell by 28.4 percentage points, while specificity rose by 11.8 percentage points. It is therefore important to assess how the overall performance of PP as a diagnostic tool has changed.

The positive predictive value (PPV) is the probability that a person with a positive test actually has the disease. PPV is strongly influenced by the prevalence of the disease in the population tested. Because prevalence changes depending on where and in which setting patients are seen, results obtained in different settings will differ. The prevalence of BD in Iran is 80 per 100,000 population (0.08%). In the Behcet’s Disease Unit of the Rheumatology Research Center, one third of new patients referred to confirm or rule out BD actually have the disease (prevalence, 33%). As a result, the PPV of PT ranges from 0.38% (general population) to 70.2% (BD Unit) for the first group of patients, from 0.52% to 76.4% for the second group, from 3.89% to 96.1% for the third group, and from 1.76% to 91.7% for the fourth group. These results show that, although the sensitivity of PP has decreased over time, its value as a diagnostic test when positive has improved during the past 35 years, from 70.2% to 91.7%. However, this holds only for the BD clinic of the RRC. In the general population, the PPV has improved only from 0.38% to 1.76% (a difference of 1.38 percentage points). The PPV also reflects the value of a test as a screening tool: whereas a positive PP in the BD clinic of the RRC gives a 91.7% probability of BD, in the general population the probability would be only 1.76%, so PP has no value as a screening test.
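The prevalence-adjusted predictive values follow from Bayes’ theorem applied to the reported sensitivity and specificity. The sketch below is a minimal illustration of that calculation (function and variable names are hypothetical); with the two prevalences quoted above it reproduces the PPVs in the text, and the NPVs of the first and fourth groups in the BD Unit setting (83.1% and 75.7%).

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV for a given pretest probability (Bayes' theorem)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENS = [0.642, 0.592, 0.506, 0.358]   # reported sensitivity, groups 1-4
SPEC = [0.866, 0.910, 0.990, 0.984]   # reported specificity, groups 1-4

# 0.33: referrals to the BD Unit; 0.0008: general population of Iran.
for prevalence in (0.33, 0.0008):
    for g, (se, sp) in enumerate(zip(SENS, SPEC), start=1):
        ppv, npv = predictive_values(se, sp, prevalence)
        print(f"prevalence {prevalence:.2%}, group {g}: PPV {ppv:.2%}, NPV {npv:.2%}")
```

The same sensitivity and specificity give a PPV near 92% in the specialist clinic but below 2% in the general population, which is the screening argument made above.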

The negative predictive value (NPV) is the probability that a person with a negative test is truly free of the disease. NPV is also influenced by disease prevalence. In the BD clinic, NPV has lost some of its value, falling from 83.1% to 75.7% (Table 2), whereas in the general population it has not changed.

The likelihood ratio (LR) shows how much the odds of having the disease change after a positive or a negative result. Because the LR does not depend on prevalence, it can be applied in any setting. The positive likelihood ratio (PLR) is the factor by which a positive result multiplies the pretest odds of disease; a PLR above 5 is usually considered to indicate a useful test. The negative likelihood ratio (NLR) is the factor by which a negative result multiplies the pretest odds; the closer it is to 0, the better a negative result rules out the disease. Overall, PP has a good PLR (22.3) and NLR (0.5). The PLR improved from 4.8 in the first group to 22.4 in the fourth group, which is a substantial improvement. The NLR, in contrast, increased from 0.41 to 0.65, which is a deterioration. The current PLR of 22.4 means that a positive PP multiplies the odds of having BD by 22.4, whereas the NLR of 0.65 means that a negative PP reduces the odds of BD by only 35%, so a negative test does little to rule out the disease.
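Since post-test odds equal pretest odds multiplied by the LR, the fourth-group likelihood ratios can be turned into post-test probabilities for any assumed prevalence. The sketch below illustrates this relationship with the two prevalences used in the discussion of PPV; it is not part of the original analysis, and the function name is hypothetical.

```python
def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Post-test probability from a pretest probability and a likelihood ratio,
    using post-test odds = pretest odds x LR."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PLR, NLR = 22.4, 0.65  # fourth group, as reported
for setting, prevalence in (("BD Unit", 0.33), ("general population", 0.0008)):
    pos = post_test_probability(prevalence, PLR)
    neg = post_test_probability(prevalence, NLR)
    print(f"{setting}: pretest {prevalence:.2%} -> "
          f"{pos:.2%} if positive, {neg:.2%} if negative")
```

With a 33% pretest probability, a positive test gives about 91.7%, matching the prevalence-adjusted PPV of the fourth group, while a negative test lowers the probability only to about 24%.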

The diagnostic odds ratio (DOR) combines PLR and NLR into a single measure. A value of 1 means the test does not discriminate between patients and controls; higher values indicate better discrimination. The DOR was 11.6 for the first group and improved to 34.3 for the fourth group, although it fell sharply from the third group (101.4) to the fourth (Table 2). This drop reflected both the continued decrease in sensitivity and the slight decrease in specificity from the third to the fourth group.
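To see why the DOR fell between the third and fourth groups, it can be written as the odds of a positive test among patients divided by the odds of a positive test among controls. Substituting the reported sensitivities (Se) and specificities (Sp) gives the following worked check (not an additional analysis):

```latex
\begin{aligned}
\mathrm{DOR} &= \frac{\mathrm{PLR}}{\mathrm{NLR}}
  = \frac{\mathrm{Se}/(1-\mathrm{Se})}{(1-\mathrm{Sp})/\mathrm{Sp}} \\
\mathrm{DOR}_3 &= \frac{0.506/0.494}{0.010/0.990} \approx \frac{1.0243}{0.01010} \approx 101.4 \\
\mathrm{DOR}_4 &= \frac{0.358/0.642}{0.016/0.984} \approx \frac{0.5576}{0.01626} \approx 34.3
\end{aligned}
```

The odds of a positive test among patients roughly halved (1.02 to 0.56), while the odds of a positive test among controls rose by about 60% (0.0101 to 0.0163); both changes lower the DOR.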

Youden’s index (YI) is a simple measure combining sensitivity and specificity to summarize the overall accuracy of a test. It ranges from 0 to 1, with 1 indicating a perfect test. The YI was 0.5 in the first three groups and deteriorated to 0.34 in the fourth, showing a decrease in accuracy over time. Because sensitivity and specificity carry equal weight in YI, and sensitivity decreased more than specificity increased, YI deteriorated over time.
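Using the reported values, the change in YI between the first and fourth groups splits into a sensitivity term and a specificity term, which makes the trade-off explicit:

```latex
\begin{aligned}
\mathrm{YI} &= \mathrm{Se} + \mathrm{Sp} - 1 \\
\mathrm{YI}_1 &= 0.642 + 0.866 - 1 = 0.508 \\
\mathrm{YI}_4 &= 0.358 + 0.984 - 1 = 0.342 \\
\mathrm{YI}_4 - \mathrm{YI}_1 &= (0.358 - 0.642) + (0.984 - 0.866) = -0.284 + 0.118 = -0.166
\end{aligned}
```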

Looking at the rate of positive PT in the four groups of BD patients and controls, positive results decreased over time, except in the fourth control group, where the rate increased from 1% to 1.6%. Although this increase was unexpected, the difference was not significant. We have no satisfactory explanation for the decrease in sensitivity together with the increase in specificity; both result from fewer positive tests in patients and controls. One explanation could be stricter discrimination between positive and negative results. However, this could apply only to the first 100 or 200 patients, not to the rest, because the same dermatologists have read the PT from the beginning until now. Dilsen and colleagues showed in 1995 that using disposable needles instead of blunt needles decreases the rate of positive results [29]. As stated in “Patients and methods,” all our patients were tested with disposable needles, so needle type cannot explain the trend. A major cause of false-negative results is corticosteroid use at the time of testing. However, few patients were taking corticosteroids at their first visit; their proportion varies from one session of the Behcet’s disease clinic to another, but overall it has remained the same. Therefore, corticosteroid use is not the cause of the decrease in positive tests over time. Moreover, even if it could explain the decrease among patients, control patients were not taking corticosteroids, so it could not explain the decrease among them.

The rate of positive PT differs among countries. Besides possible differences in disease pattern, the interpretation of a positive test may also differ, as there is no consensus on what defines a positive test. In our opinion, the definition used here (see “Patients and methods”) may lead to fewer errors than others in both directions, false positive and false negative.

Conclusion

Although the pathergy test has lost some of its sensitivity over the past 35 years, it has not lost its value as a diagnostic test; several of its characteristics, such as the positive predictive value, positive likelihood ratio, and diagnostic odds ratio, have even improved. In practical terms, the chance of obtaining a positive test has decreased over time. However, a positive test strongly suggests Behcet’s disease: its specificity is now 98.4%, and in the BD clinic of the RRC a positive test corresponds to a 91.7% probability of BD (positive predictive value).