Introduction

Atopic dermatitis (AD) is a heterogeneous [1, 2] and highly symptomatic disease. Itch is the most common and burdensome symptom of AD [3]. In addition, multiple other symptoms occur in AD, including sleep disturbances [4,5,6,7,8], anxiety and depression [9,10,11,12], and cognitive dysfunction [13], each with varying levels of intensity, frequency, duration, and burden.

Skin-pain was recently identified as a common and burdensome symptom in AD of multifactorial etiology [14, 15]. Skin-pain was the most or second most burdensome symptom in 3.8% and 8.2% of United States adults with AD; 8.0% and 7.4% of adults with moderate and severe AD reported pain to be their most burdensome symptom in AD [3]. AD patients who report having skin-pain use a variety of descriptors of their pain, including burning and stinging [15]. In contrast with itch, for which there are multiple measures that were previously validated in AD [16,17,18,19,20,21,22], few outcome measures of pain have been examined in AD. In this study, we sought to evaluate the construct validity, responsiveness, floor/ceiling-effects, reliability, and interpretability of a novel numeric rating scale (NRS) for skin-pain in adults with AD, and compare its performance to an existing validated NRS for pain that is not attributed to skin [23].

Methods

Study design

A prospective, dermatology practice-based study of adults (≥ 18 years), male or female, with AD as defined by the Hanifin and Rajka diagnostic criteria [24] was performed as previously described [19]. Surveys were administered between June, 2017 and February, 2019. The study was approved by the institutional review board of Northwestern University. Informed consent was obtained electronically from patients.

Outcome measures assessed

Self-administered questionnaires were completed by patients of the eczema clinic at an academic medical center prior to their encounter. Patients were assessed with full-body skin-examination by a dermatologist. The patient-reported outcome measures (PROMs) and clinician-reported outcome measures (ClinROMs) examined are reported in Supplemental Methods.

Statistical analysis

Data analysis was performed as previously described [19]. Summary statistics were estimated for baseline population characteristics. Convergent and divergent validity of NRS skin-pain and average overall-pain was established using Spearman correlations with each other and with other PROMs and ClinROMs at baseline. Correlation coefficients of ≥ 0.70 or ≤ − 0.70 were considered strong, 0.40–0.69 or − 0.69 to − 0.40 moderate, and 0.10–0.39 or − 0.39 to − 0.10 weak [25]. We hypothesized that the NRS pain assessments would have strong correlations with each other, weaker correlations with other PROMs of AD severity, and weakest correlations with ClinROMs of AD severity. We hypothesized that skin-pain is an important predictor of AD severity, with significant and stepwise differences of pain assessments with each level of AD severity. Criterion validity was determined by comparing NRS pain levels across each level of patient-reported AD severity using the Kruskal–Wallis test at baseline. There is no gold-standard assessment for AD severity. Thus, we compared pain assessments with self-reported global AD severity to examine criterion validity. Floor- or ceiling-effects of total scores and individual items were considered present if > 15% of responses fell in the lowest or highest scores [26, 27].
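The correlation bands used here can be written as a small helper that maps a Spearman rho to a strength label. This is an illustrative sketch only, not the study's analysis code (analyses were run in SAS); the function name and the "negligible" label for |rho| < 0.10 are assumptions, since the text does not name that range.

```python
def correlation_strength(rho):
    """Label a correlation coefficient using the bands cited in the text [25].

    The sign is ignored because the bands are symmetric around zero.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1 and 1")
    magnitude = abs(rho)
    if magnitude >= 0.70:
        return "strong"
    if magnitude >= 0.40:
        return "moderate"
    if magnitude >= 0.10:
        return "weak"
    # Assumption: the paper does not label coefficients below 0.10.
    return "negligible"


# The baseline correlation between the two pain scales reported in Results.
print(correlation_strength(0.53))
```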

Responsiveness of scores was determined using Spearman correlations between change from baseline and follow-up visit for NRS skin-pain and NRS average overall-pain with change of each other, other PROMs and ClinROMs. We hypothesized that the changes of pain assessments would have strong correlations with each other, weaker correlations with changes in other PROMs of AD severity, and weakest correlations with changes of ClinROMs of AD severity.
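As a rough illustration of this responsiveness analysis, the sketch below computes change scores (follow-up minus baseline) for two measures and correlates them with a Spearman rho, implemented as a Pearson correlation on tie-averaged ranks. The function names and the example values are hypothetical and are not study data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def change_scores(baseline, follow_up):
    """Change from baseline (negative values indicate improvement)."""
    return [f - b for b, f in zip(baseline, follow_up)]


# Hypothetical scores for four patients (not study data).
delta_pain = change_scores([6, 4, 8, 2], [3, 3, 2, 2])
delta_poem = change_scores([20, 12, 25, 8], [10, 9, 5, 8])
rho = spearman_rho(delta_pain, delta_poem)
```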

Determination of interpretability bands and thresholds for meaningful clinically important difference (MCID) of NRS skin-pain and average overall-pain are presented in Supplemental Methods.

Test–retest reliability was assessed by intraclass correlation coefficient (ICC) and 95% CI, using a mixed-effects model for absolute agreement, among patients with NRS pain score ≥ 1 at baseline who had no change of patient-reported global AD severity or VRS average-pain between the baseline and follow-up visits. ICC < 0.50 were considered poor, 0.50–0.74 moderate, 0.75–0.89 good, and ≥ 0.90 excellent [28].
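For illustration, one common absolute-agreement ICC can be computed from two-way ANOVA mean squares, as sketched below. The single-measure form, ICC(A,1), is an assumption: the text specifies absolute agreement under a mixed-effects model but not which ICC form was used. The function names and the example data are hypothetical.

```python
def icc_absolute_agreement(scores):
    """Single-measure, absolute-agreement ICC, i.e. ICC(A,1).

    `scores` is a list of rows, one per patient, each holding that
    patient's score at every visit (e.g. [baseline, follow_up]).
    """
    n = len(scores)       # patients
    k = len(scores[0])    # visits
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def icc_rating(icc):
    """Label an ICC using the thresholds cited in the text [28]."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"


# Hypothetical data: every patient scores exactly 1 point higher at follow-up.
example = [[1, 2], [2, 3], [3, 4]]
icc = icc_absolute_agreement(example)
```

In this made-up example the rank order of patients is perfectly preserved, yet the ICC is about 0.67 rather than 1, because an absolute-agreement ICC penalizes the systematic 1-point shift between visits.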

Statistical analyses were performed using SAS version 9.4.3 (SAS Institute, Cary, NC). Missing values were encountered in < 5% of respondents for all analyzed variables. Complete case analysis was performed, i.e., respondents with missing values were excluded from analysis. A two-sided P value of < 0.05 was considered statistically significant.

No formal power calculation was performed. However, a sample size of > 100 participants per analysis was recommended as sufficient for validation studies [29].

Results

Patient characteristics

Overall, 463 adults (ages 18–97 years) were included in the study; 412 of whom had a follow-up visit, with mean ± std. dev. follow-up visit duration of 0.4 ± 0.5 years (maximum = 1.2 years). The patient cohort included 276 females (64.0%) and 279 self-reported Caucasian/white (60.3%), with a mean ± std. dev. age at enrollment of 43.1 ± 18.4 years. Baseline characteristics of AD severity are presented in Table 1.

Table 1 Subject characteristics (n = 463)

Overlap of NRS skin-pain and NRS average overall-pain

More patients endorsed average overall-pain (n = 344, 74.3%) than skin-pain (n = 205, 44.3%), with a subset reporting at least some skin-pain and average overall-pain (n = 184, 39.7%). Most patients reported skin-pain that was less severe than (n = 194, 41.9%) or equally severe as (n = 143, 30.9%) average overall-pain; 126 (27.2%) reported skin-pain being more severe than average overall-pain. That is, the two NRS questions performed differently from each other, and there was only modest overlap between question responses.

Concurrent and construct validity

NRS skin-pain had a moderate correlation with NRS average overall-pain (Spearman correlation, rho = 0.53) (Fig. 1). NRS skin-pain had moderate correlations with Patient-Oriented Eczema Measure (POEM) scores, NRS worst-itch and Dermatology Life Quality Index (DLQI), and weak correlations with NRS average-itch, Scoring AD (SCORAD), objective-SCORAD, Eczema Area and Severity Index (EASI), Validated Investigator’s Global Assessment * body surface area (vIGA-AD*BSA), and Rajka-Langeland (Fig. 1).

Fig. 1

Spearman correlations between baseline scores and change of scores over time for assessments of pain, atopic dermatitis severity, and quality of life. Spearman rho are presented for correlations of assessment scores at baseline and change of scores from baseline at follow-up. Values are presented using a color-gradient from dark green (lowest) to dark red (highest). # Spearman correlation of scores at baseline. ## Spearman correlation of the change of scores from baseline at follow-up

NRS average overall-pain had correlations similar to those of NRS skin-pain with EASI, vIGA-AD*BSA, Rajka-Langeland, SCORAD, objective-SCORAD, and DLQI, but lower correlations with NRS worst-itch, NRS average-itch, and POEM.

Criterion validity

There were significant and stepwise increases of NRS skin-pain and average overall-pain scores at each level of patient-reported global severity (Kruskal–Wallis test, P < 0.0001) (Fig. 2a, b). Similarly, among patients with no skin-pain, NRS average overall-pain scores increased at each level of patient-reported global severity (P = 0.0005) (Fig. 2c).

Fig. 2

Criterion validity of NRS for skin-pain and average overall-pain scales. Box–Whisker plots and overlaid jitter plots of NRS for skin-pain (a), and NRS for average overall-pain across all patients (b), and among those with no skin-pain (c) stratified by patient-reported global AD severity

Responsiveness

Changes from baseline in NRS skin-pain were weakly-to-moderately correlated with changes of NRS average overall-pain, POEM, and vIGA-AD*BSA, but only weakly correlated with changes of NRS worst-itch and average-itch, EASI, Rajka-Langeland, SCORAD, and objective-SCORAD (Fig. 1). Changes from baseline of NRS average overall-pain had correlations that were overall similar to those of NRS skin-pain.

Floor- or ceiling-effects

The proportions of patients with the lowest values for NRS skin-pain (55.7%) and average overall-pain (25.7%) were above 15%, indicating that there were floor-effects. The proportions of patients with the highest values for NRS skin-pain (2.6%) and average overall-pain (1.3%) were below 15%, indicating that there were no ceiling-effects.
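The 15% rule behind these results can be expressed as a short check. The sketch below is illustrative only; it assumes a 0–10 NRS, and the function name and example scores are hypothetical rather than study data.

```python
def floor_ceiling_effects(scores, lowest=0, highest=10, threshold=0.15):
    """Flag floor/ceiling effects when more than `threshold` of responses
    sit at the lowest or highest possible score.

    Assumes a 0-10 numeric rating scale unless other bounds are passed.
    """
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical responses: 11 of 20 patients score 0 and 1 of 20 scores 10.
made_up_scores = [0] * 11 + [3, 5, 7, 2, 4, 6, 1, 8] + [10]
result = floor_ceiling_effects(made_up_scores)
```

With these made-up responses the check flags a floor-effect but no ceiling-effect, which mirrors the pattern reported above.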

Interpretability

The distribution, mean, median, and mode of VRS average-pain for each level of NRS skin-pain and NRS average overall-pain are presented in Supplemental Table 1.

Based on assessment of mean, median, and mode NRS skin-pain values, lower thresholds of 1 for mild, 5 for moderate, 7 for severe, and 10 for very severe pain were identified (weighted kappa = 0.4921). However, previously established thresholds for NRS-itch [30] had the highest concordance (weighted kappa = 0.4923).

For NRS average overall-pain, a higher lower threshold of 2 was identified for mild, and 4 for moderate average overall-pain. These strata were tested and found to have the highest weighted kappa coefficient (0.5276) (Supplemental Table 2).

Smallest detectable and meaningful change

The smallest detectable change (SDC) for NRS skin-pain and average overall-pain was 0.5 and 0.7, respectively. The thresholds for MCID for NRS skin-pain were similar based on anchors of a 1-point improvement of patient-reported global severity (− 2.7), ≥ 3.4-point improvement of POEM (− 2.9), ≥ 6.6-point improvement of EASI (− 2.2), and 1-point improvement of VRS average-pain (− 2.5), average-pain (− 2.8), and vIGA-AD success (− 2.5) (Table 2). In contrast, the thresholds for MCID for NRS average overall-pain were consistently lower, ranging from − 1.3 to − 1.7 points.

Table 2 Thresholds for meaningful change for NRS skin-pain and average overall-pain at follow-up from baseline

Reliability

Among patients who reported baseline NRS pain ≥ 1 and had no change of patient-reported global AD severity as the anchor (n = 83), the ICC [95% CI] for NRS skin-pain was 0.50 [0.35–0.62] and that for NRS average overall-pain was 0.64 [0.49–0.74], indicating moderate reliability. Similar results were found using no change of VRS pain as the anchor (n = 68), and the ICC [95% CI] for NRS skin-pain was 0.72 [0.64–0.79] and that for NRS average overall-pain was 0.83 [0.78–0.88], indicating moderate-to-good reliability.

Discussion

This study demonstrated that NRS skin-pain and NRS average overall-pain had similar measurement properties overall, with good concurrent validity, divergent validity, discriminant validity, moderate-to-good reliability, and fair-to-good responsiveness. NRS average overall-pain showed somewhat higher reliability than NRS skin-pain. Both measures showed floor-effects. However, this may not be a shortcoming of the NRS skin-pain or average overall-pain. Rather, skin-pain occurs only in a subset of AD patients, which differs from itch that occurs in all AD patients. Moreover, both measures showed divergent validity, as judged by weak-to-moderate correlations with other AD severity domains. Together, the results indicate that skin-pain is a distinct symptom of AD that is not merely related to itch. Indeed, a previous study of adults in the US found that skin-pain in AD is heterogeneous, with approximately half (48%) of those with skin-pain reporting that pain occurred only after frequent scratching, whereas 42% reported intermittent pain, and 11% reported constant pain throughout the day. Moreover, AD pain was most commonly associated with open areas caused by scratching (27%) and fissures in the skin (27%), followed by inflamed red skin (25%), with only a minority reporting pain mostly caused by burning from creams or ointments (10%). The present study found that NRS skin-pain is simple, time-efficient, easy to interpret, and inherently feasible as a single-item measure, and may be integrated into clinical practice in conjunction with other assessments of AD symptoms, signs, and QOL.

Numeric rating scales for skin-pain and average overall-pain appear to be distinct measures that are not interchangeable. First, they only had moderate correlations with each other. Second, there were three distinct subsets of overlap between the measures. Many patients had the same scores for skin-pain and average overall-pain. However, the largest subset reported more severe average overall-pain than skin-pain, which may be attributable to pain from extra-cutaneous sources. Furthermore, there was an increase in NRS average overall-pain scores with more severe AD, even among those who reported having no skin-pain. Increased extra-cutaneous pain may be due to central mechanisms, particularly central sensitization and hyperalgesia after prolonged afferent pruriceptive triggers [31, 32]. Extra-cutaneous pain could also be due to comorbid systemic disorders that are associated with pain. Future studies are needed to confirm whether AD is associated with extra-cutaneous pain, and, if so, why.

The results of this study are consistent with a previous study showing good validity and reliability of an NRS for average skin-pain or soreness using a 24-h recall period [33]. However, there were some notable differences between studies. First, that study included only 74 participants from multiple centers/regions, including adolescents and adults who had inadequate response to topical and/or systemic AD therapy [33]. In contrast, this study included a much larger cohort of adults only, from a single center, regardless of their previous AD treatment history. Weaker correlations were observed for AD severity with NRS skin-pain and average overall-pain in this study compared to that other study. These differences may be attributable to different pain-descriptors (average vs. worst pain) used in the questions, as well as differences of cohort characteristics. In addition, the NRS used in that study used a 24-h recall period, whereas this study used a 7-day recall period based on previous qualitative research that found 7 days to be the optimal recall period for assessing severity of itch and other AD symptoms in clinical practice [19]. While a 7-day recall period may not capture the day-to-day fluctuations of pain, it better integrates the patient-experience over an extended period. Finally, this study used a two-step process to assess skin-pain severity, with a yes/no question to screen for skin-pain in the past week, followed by the NRS for severity of pain. For future use, this could be simplified by combining the questions.

To our knowledge, this is the first published study to develop interpretability bands and thresholds for meaningful improvement for NRS skin-pain and average overall-pain in AD. The optimal thresholds for moderate and severe skin-pain were the same as previously identified thresholds for itch [19]. The optimal thresholds for mild pain were slightly different for skin-pain (1) and average overall-pain (2). Nevertheless, the optimal interpretability bands for NRS skin-pain and itch (0/1–3/4–6/7–9/10) also performed well for NRS average overall-pain. Thus, it may be more practical to use the same thresholds for both pain measures. The optimal thresholds for meaningful improvement of NRS skin-pain and NRS average overall-pain were consistently a 2-point and 1-point reduction from baseline, respectively, across all anchors used. This threshold for clinically meaningful improvement of NRS pain is similar to the threshold previously observed for NRS-itch [19]. Based on these results, we recommend using a 2-point reduction of skin-pain to identify clinically meaningful improvement of skin-pain in clinical practice and trials of AD. In addition, we found that the SDC for NRS skin-pain and average overall-pain were lower than all corresponding MCID estimates, indicating that the MCID are meaningful and able to be measured beyond measurement error.
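As an illustration of how these bands and the 2-point threshold might be applied in practice, the sketch below maps an NRS score to the 0/1–3/4–6/7–9/10 bands and flags a clinically meaningful improvement. The function names and the "none" label for a score of 0 are assumptions; the other labels follow the severity terms used in Results.

```python
def pain_band(score):
    """Map a 0-10 NRS score to the 0/1-3/4-6/7-9/10 interpretability bands."""
    if not 0 <= score <= 10:
        raise ValueError("NRS scores range from 0 to 10")
    if score == 0:
        return "none"  # assumed label; the text does not name the 0 band
    if score <= 3:
        return "mild"
    if score <= 6:
        return "moderate"
    if score <= 9:
        return "severe"
    return "very severe"


def meaningful_improvement(baseline, follow_up, mcid=2):
    """True if the reduction from baseline meets the MCID.

    The default of 2 points is the threshold recommended for NRS skin-pain;
    pass mcid=1 for NRS average overall-pain.
    """
    return (baseline - follow_up) >= mcid


# Hypothetical patient: skin-pain falls from 7 at baseline to 4 at follow-up.
print(pain_band(7), "->", pain_band(4), meaningful_improvement(7, 4))
```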

Skin-pain is a clinically relevant and burdensome symptom in AD patients [15, 34]. Skin-pain was one of several key predictors of how patients described the severity of their AD [15]. The results of this study and previous studies [15, 34] indicate that skin-pain is not adequately reflected by other PROMs and ClinROMs for itch and AD severity. Thus, skin-pain may represent a distinct symptom domain to be considered for inclusion in core outcome sets for studies and clinical practice of AD. Neuro-immune interactions, and peripheral and central sensitization processes may shape the dual burden of itch and pain in AD. However, several important aspects of skin-pain are not well elucidated in AD, including the longitudinal course of skin-pain, how skin-pain responds to various AD therapies, and whether skin-pain persists in some patients despite resolution of itch. A better understanding of these points is needed before establishing skin-pain as an additional symptom to be included in the core outcome set of AD.

This study has several strengths, including good representation across gender, race/ethnicity and AD severity, testing of multiple pain assessments, and use of multiple PROMs and ClinROMs when examining the psychometric properties. However, there are some limitations. Patients were recruited from a single academic center, which may limit generalizability. We did not assess content validity of the two pain measures. Future studies are needed to address these points.

In conclusion, both skin-pain and non-skin-pain are common in adults with AD, and worse with more severe AD. NRS skin-pain and average overall-pain were found to have good divergent validity, responsiveness, reliability, and feasibility in the assessment of pain in adults with AD in clinical practice, though floor-effects were observed. However, NRS overall-pain had poorer measurement properties and measured a different construct than NRS skin-pain. Thus, we recommend specifically measuring NRS skin-pain in AD. NRS skin-pain may be incorporated into the assessment of AD patients and provides important information about the severity of AD symptoms that can guide therapeutic decision-making.