Introduction

Lupus nephritis (LN) is common in children with systemic lupus erythematosus (SLE) and is associated with poorer renal and non-renal outcomes than in affected adults [1, 2]. The role of the renal biopsy in prognosticating and guiding therapy is still unclear [3].

Several biopsy indices for LN exist. The recent International Society of Nephrology/Renal Pathology Society (ISN/RPS) classification for LN [4] is an extension of the previous World Health Organization (WHO) classification of LN [5], with a more systematic and standardized description of glomerular lesions. This new classification does include a footnote instructing the biopsy reader to note the grade of tubulointerstitial (TI) and vascular lesions in patients with diffuse LN, acknowledging evidence that changes in the TI compartment might be important for prognosis [6–9]; however, the focus and grading of the ISN/RPS classification remains on the glomerulus. The National Institutes of Health (NIH) index for LN, composed of the activity index (AI) and chronicity index (CI), has been well described in the literature on adults, but it focuses mainly on the glomerular compartment [10]. The NIH index has been only sporadically described in pediatric LN, and, thus, it is unknown to what extent this scoring system is useful in children [11–14]. In 2000, Hill et al. derived a new LN biopsy scoring system, the “biopsy index” (BI), which includes a tubulointerstitial activity index (TIAI) [9]. They found that the TIAI score may help to predict renal prognosis. The use of this index has not been described or validated by other authors.

With increasing choices for the treatment of LN, the renal biopsy may play an important role in clinical decision making and in clinical trials of LN. Therefore, the NIH index and the recently derived BI should be systematically examined and described in childhood LN to determine if this improves our understanding of this disease.

The goals of this study were to (1) examine the clinicopathologic correlations and responsiveness to change of the NIH AI and CI, and a recently derived TIAI, in childhood LN, and (2) estimate the extent to which the TIAI adds to the evaluation of the renal biopsy in childhood lupus nephritis, relative to the NIH index.

Methods

Subject selection

This was a retrospective study that took place at the Montreal Children’s Hospital, Montreal, Canada, between July 2002 and September 2004. Ethics board approval for the research was obtained. Inclusion criteria were: (1) diagnosed with SLE according to American College of Rheumatology (ACR) criteria [15] at our institution up to 10 years prior to study start; (2) biopsy-proven LN and (3) aged <18 years at the time of first kidney biopsy.

Biopsy data

Patients with SLE who demonstrated evidence of renal disease (proteinuria, renal dysfunction or hypertension) underwent an initial kidney biopsy (herein referred to as biopsy 1) within 0.7 to 8 (median = 2.0) weeks of nephritis presentation. Some patients underwent a follow-up protocol surveillance biopsy (herein referred to as biopsy 2), at approximately 1 year after biopsy 1.

Renal biopsies were processed for light, electron and immunofluorescence (IF) microscopy using standard methods of fixation and staining. An acceptable biopsy specimen consisted of at least eight glomeruli. Biopsies were read prior to the publication of the most recent ISN/RPS classification of LN, by one pediatric nephropathologist (C.B.), according to the 1995 WHO revised classification for LN [5]. These findings have been published in a previous manuscript [16]. Biopsies were re-scored by the same pediatric pathologist (C.B.) using the NIH index [10] and the BI for LN proposed by Hill et al. [9]; the pathologist was blind to the patients’ clinical status.

The NIH and BI indices are shown in Table 1 for comparison. As shown, the BI consists of a global activity index (GAI) and a chronic lesions index (CLI), which are almost identical in description and scoring to the NIH AI and CI, respectively. In addition, the BI contains a tubulointerstitial activity index (TIAI), and an immunofluorescence index (IFI, Table 1) [9]. We did not evaluate the IFI, because the quality of the IF staining varied due to variations in technique over time, and we felt that the measurement error would be too high. The items of the NIH index were scored on a scale of 0 (no lesions), 1 (lesions in <25% of glomeruli), 2 (lesions in 25–50% of glomeruli) or 3 (lesions in >50% of glomeruli), as previously described [10]; the items of the BI were scored on a scale of 0, 0.5 (lesions in 5–10% of glomeruli), 1, 2 or 3, as in the original publication [9] (Table 1). A total composite score was calculated for each index (AI, CI, GAI, CLI, TIAI). Because we did not score the IFI, we did not evaluate the total composite BI score (which is the sum of the GAI, CLI, TIAI and IFI).

Table 1 Renal pathology scoring systems of the NIH activity index (AI) [10], chronicity index (CI) and of the global activity index (GAI), chronic lesions index (CLI) and the tubulointerstitial activity index (TIAI) of the biopsy index (BI) [9] (PMN polymorphonuclear)

Clinical data

The following clinical and laboratory data on renal disease activity were collected retrospectively from the dates corresponding to biopsy 1 and biopsy 2 (during a period ranging from 1 week prior to biopsy to as late as the day of biopsy): 24-hour urine protein excretion (in milligrams per square meter body surface area per day); the need for non-angiotensin converting enzyme inhibitor or angiotensin receptor blocker anti-hypertensive medication (“need for BP meds”); estimated creatinine clearance (eCCl) by the Schwartz formula [17] (eCCl in milliliters per minute per 1.73 m² body surface area = k × height in centimeters/serum creatinine in micromoles per liter), using a previously published locally derived value of k = 41.5 [18] (divide serum creatinine and k by 88.4 for conventional units). The presence of hematuria was not included, because it was observed with both menstruation and nephritis and, therefore, was not a reliable marker of renal disease activity. Details on immunosuppressive treatment were recorded.
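The Schwartz calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated formula; the function names and the example height and creatinine values are hypothetical and are not taken from the study data.

```python
# Sketch of the Schwartz estimate as stated in the text:
#   eCCl (ml/min per 1.73 m^2) = k * height (cm) / serum creatinine (umol/l)
# Function names and example inputs are illustrative assumptions.

UMOL_L_PER_MG_DL = 88.4  # conversion factor quoted in the text


def schwartz_eccl(height_cm, serum_creatinine_umol_l, k=41.5):
    """Estimated creatinine clearance with creatinine in umol/l (k = 41.5)."""
    if height_cm <= 0 or serum_creatinine_umol_l <= 0:
        raise ValueError("height and serum creatinine must be positive")
    return k * height_cm / serum_creatinine_umol_l


def schwartz_eccl_conventional(height_cm, serum_creatinine_mg_dl, k_si=41.5):
    """Same estimate with creatinine in mg/dl; k is divided by 88.4 as in the text."""
    if height_cm <= 0 or serum_creatinine_mg_dl <= 0:
        raise ValueError("height and serum creatinine must be positive")
    k_conventional = k_si / UMOL_L_PER_MG_DL
    return k_conventional * height_cm / serum_creatinine_mg_dl


if __name__ == "__main__":
    # Hypothetical child: 150 cm tall, serum creatinine 60 umol/l.
    print(round(schwartz_eccl(150, 60), 2))
```

Both functions should return the same estimate for the same patient, since dividing k and the creatinine value by 88.4 leaves the ratio unchanged.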

Statistical analysis

Descriptive statistics for continuous variables were expressed as mean (SD) and medians (interquartile ranges, IQRs). Categorical variables were expressed as proportions (in percent). Because of the extreme overlap between different segments of the NIH index and the BI, these differences/similarities were examined in detail. The intraclass correlation coefficient (ICC, with 95% confidence interval) of the sum scores of the five similar items of the AI and GAI was calculated, to evaluate their extent of agreement when scored in the setting of the NIH index and BI, respectively. Because the items glomerulosclerosis and glomerular scars of the CLI (from the BI), together generally describe the same features described by the item glomerulosclerosis of the CI (from the NIH index, Table 1), we directly evaluated the level of agreement of the CLI and the CI, by calculating the ICC.
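For readers unfamiliar with the ICC, the following Python sketch shows one common form of the calculation. The text does not state which ICC model was used, so the one-way random-effects, single-measure form shown here is an assumption, and the paired scores in the example are hypothetical.

```python
# Sketch of a one-way random-effects, single-measure ICC.
# ASSUMPTION: the paper does not specify the ICC model; this form is
# chosen only for illustration. Example scores are hypothetical.


def icc_oneway(scores):
    """scores: one row per biopsy, each row holding the score given to that
    biopsy under each scoring system (e.g. (AI sum, GAI sum))."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two biopsies")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each biopsy needs the same number (>= 2) of scores")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-biopsy and within-biopsy sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical paired sum scores for five biopsies.
    paired = [(2, 2), (8, 8.5), (15, 15), (4, 4), (11, 10.5)]
    print(round(icc_oneway(paired), 3))
```

Values close to 1 indicate that the two scoring systems assign nearly the same score to each biopsy; confidence intervals, which the study also reports, are not computed in this sketch.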

The Kappa statistic (which measures level of agreement between categorical scales) was calculated between the similar items mononuclear cell infiltrate of the AI and interstitial inflammation of the TIAI (Table 1) and between the item glomerulosclerosis of the CI and the sum of the items glomerulosclerosis and glomerular scars from the CLI.
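The agreement calculation can be illustrated with a short Python sketch of the unweighted Cohen kappa. Whether the study used a weighted or unweighted kappa is not stated, so the unweighted form is an assumption, and the item scores in the example are hypothetical.

```python
# Sketch of the unweighted Cohen kappa for two categorical score lists.
# ASSUMPTION: unweighted kappa; the text does not say whether weights
# were applied. Example scores are hypothetical.
from collections import Counter


def cohen_kappa(scores_a, scores_b):
    """Agreement beyond chance between two lists of categorical scores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(scores_a)

    # Observed proportion of biopsies given the same category by both items.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n

    # Agreement expected by chance from the marginal category frequencies.
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical item scores (0 to 3) for six biopsies under each index.
    mononuclear_infiltrate = [0, 1, 2, 3, 1, 0]
    interstitial_inflammation = [0, 1, 2, 3, 1, 0]
    print(cohen_kappa(mononuclear_infiltrate, interstitial_inflammation))
```

Identical score lists give a kappa of 1, which corresponds to the 100% agreement case; the P value for kappa is not computed here.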

The biopsy index scores followed non-normal distributions. Therefore, non-parametric testing was used for comparisons. To assess the relationship between each of the biopsy index scores with each other, the Spearman correlation coefficient was calculated between each set of biopsy index scores. The relationship between the presence of WHO class IV LN at each biopsy and the other biopsy scores was evaluated, using the Mann–Whitney test. Because we knew from our previous study that lupus disease activity improved from biopsy 1 to biopsy 2 [16], we evaluated the responsiveness to change of each biopsy index by comparing biopsy 1 and biopsy 2 scores, using the Wilcoxon matched-pairs signed-rank test.
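As an illustration of the rank-based approach, the following Python sketch computes a Spearman correlation coefficient by ranking both variables (with average ranks for ties) and then taking the Pearson correlation of the ranks. It is a minimal sketch with hypothetical data and helper names; it does not compute P values and does not cover the Mann–Whitney or Wilcoxon tests, for which the study used STATA.

```python
# Sketch of the Spearman rank correlation with average ranks for ties.
# Helper names and example data are illustrative assumptions.


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical AI and TIAI scores for six biopsies.
    ai = [2, 8, 15, 4, 11, 0]
    tiai = [3, 6, 7, 5, 9, 2]
    print(round(spearman_r(ai, tiai), 3))
```

Working on ranks rather than raw scores is what makes the coefficient suitable for the non-normally distributed index scores described above.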

Clinicopathologic correlations of each biopsy index were evaluated by the following analyses:

  1. The relationships between biopsy scores and clinical renal variables at biopsy 1 and biopsy 2 (eCCl; proteinuria; need for BP meds) were evaluated by Spearman correlation analysis and the Mann–Whitney test.

  2. Biopsy 1 to biopsy 2 change scores of all the biopsy indices, proteinuria and eCCl were calculated by subtracting the biopsy 1 values from biopsy 2 values. Spearman correlation was used to evaluate whether change in biopsy score correlated with the change in clinical parameter. For these analyses, only patients who had undergone two biopsies were included.
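The change-score step can be sketched as follows in Python. The data layout (dictionaries keyed by a patient identifier, with None for a missing measurement) and the example values are assumptions made for illustration only.

```python
# Sketch of the change-score calculation: biopsy 2 value minus biopsy 1 value,
# restricted to patients with a value at both biopsies.
# ASSUMPTION: values are stored per patient id, with None marking missing data.


def change_scores(biopsy1, biopsy2):
    """Return {patient_id: biopsy2 - biopsy1} for patients measured twice."""
    changes = {}
    for patient_id in sorted(set(biopsy1) & set(biopsy2)):
        first, second = biopsy1[patient_id], biopsy2[patient_id]
        if first is None or second is None:
            continue  # skip patients lacking one of the two measurements
        changes[patient_id] = second - first
    return changes


if __name__ == "__main__":
    # Hypothetical AI scores; patient "p3" had no second biopsy.
    ai_biopsy1 = {"p1": 8, "p2": 15, "p3": 4}
    ai_biopsy2 = {"p1": 2, "p2": 14}
    print(change_scores(ai_biopsy1, ai_biopsy2))
```

With this sign convention, a negative change in a biopsy score or in proteinuria indicates a decrease from biopsy 1 to biopsy 2. The resulting paired change scores would then be passed to a Spearman correlation.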

Data were analyzed with the STATA statistical software package (Intercooled STATA 8.0; STATA Corporation; College Station, TX, USA).

Results

Patients’ characteristics

The characteristics of this group have been described in a previous manuscript [16]. Briefly, there were 25 patients [mean (SD) age 12.4 (2.7) years; 20 female, five male; 12 White, four Black, five East Asian, two Hispanic, two East Indian] who had undergone biopsy 1; 15 patients (13 female, two male) had undergone biopsy 2. Biopsy 2 was performed as part of a disease-monitoring protocol, at 9 months to 17 months after biopsy 1, except in one patient, who was initially followed at another center, for whom biopsy 2 was performed 6 years after biopsy 1.

At biopsy 1, patients had been diagnosed with SLE within the preceding 2 months and had been receiving hydroxychloroquine/orally administered prednisone for approximately 2 weeks (one patient had been receiving hydroxychloroquine and oral prednisone treatment for 2 months). Forty percent (n = 10) had WHO class IV LN, 36% (n = 9) had WHO class III LN, and 24% (n = 6) had WHO class II LN. The mean ± SD hemoglobin level was 108.6 ± 16.3 g/l (normal 130–160 g/l); serum C3 was 0.6 ± 0.3 g/l (normal 0.75–1.4 g/l), serum albumin was 35.0 ± 5.1 g/l (normal 31–48 g/l), anti-dsDNA binding level was 69.3 ± 31.8% (normal < 30%) and eCCl was 99.2 ± 31.0 ml/min per 1.73 m² body surface area. Six patients (24%) required calcium channel blockers for high blood pressure. Only 21 of 25 patients had undergone 24-hour urine collection for proteinuria: 7/21 patients (33%) had ≥ 200 mg/m² per day protein excretion; 5/21 patients (24%) had ≥ 500 mg/m² per day protein excretion. Two patients were initiated on an angiotensin-converting enzyme inhibitor, and one patient was started on an angiotensin receptor blocker within 6 months of biopsy 1. We were unable to determine retrospectively whether angiotensin-converting enzyme inhibitors were initiated for proteinuria or for hypertension. After biopsy 1, patients were treated with prednisone/hydroxychloroquine (5/6 patients with WHO class II LN); hydroxychloroquine/prednisone/azathioprine (1/6 patients with WHO class II LN, all nine patients with WHO class III LN, and 5/10 patients with WHO class IV LN); or hydroxychloroquine/prednisone plus 6 months of monthly intravenous cyclophosphamide followed by azathioprine (5/10 patients with WHO class IV LN), depending on biopsy findings and clinical presentation. At biopsy 2, the patients demonstrated statistically significant improvement in hemoglobin, serum albumin, platelet count and DNA binding, and non-statistically significant improvements in eCCl and proteinuria [16].

Examination of differences between the NIH index and the BI

Within the BI, the GAI and CLI are almost identical to the AI and CI within the NIH index, except for the following differences (Table 1): (1) item 6 of the AI (mononuclear cell infiltrate) is not included in the GAI but is part of the TIAI and is called interstitial inflammation (item 7, TIAI); (2) an additional item, glomerular monocytes, defined as the presence of more than one monocyte/macrophage in the glomerular lumen, is part of the GAI but is not part of the AI; (3) glomerular scars are in the CLI but are absent in the CI. The CLI items glomerulosclerosis and glomerular scars are, together, almost identical to the description of glomerulosclerosis in the CI, except that glomerular scars in the CLI specifies only segmental areas of solidification, while glomerulosclerosis in the CLI specifies total glomerular obsolescence. The TIAI provides a detailed description of tubular cell and luminal findings and interstitial inflammation (Table 1).

Correlation of similar portions of the NIH index and BI

Biopsy 1 specimens contained 19.8 ± 12.4 glomeruli (median = 16, range 8 to 51 glomeruli) and biopsy 2 specimens contained 18.3 ± 7.1 glomeruli (median = 21, range 8 to 31 glomeruli). There was no statistically significant difference in the number of glomeruli per biopsy specimen between patients with WHO class II, III or IV LN (P > 0.05, Kruskal–Wallis test).

The ICCs (with 95% confidence intervals) of the following pairs of biopsy index components at biopsy 1 and biopsy 2 were calculated: the sum of the first five items of the AI vs the sum of the first five similar items of the GAI (biopsy 1, ICC = 0.997, 95% confidence interval = 0.994–0.999; biopsy 2, ICC = 0.998, 95% confidence interval = 0.993–0.999); and the CI vs the CLI scores (biopsy 1, ICC = 0.994, 95% confidence interval = 0.987–0.997; biopsy 2, ICC = 0.981, 95% confidence interval = 0.947–0.994). The agreement between the similar items mononuclear cell infiltrate of the AI vs interstitial inflammation of the TIAI was complete (100% agreement, kappa = 1.0) at both biopsy 1 and biopsy 2. The agreement between the item glomerulosclerosis from the CI and the summed items glomerular scars and glomerulosclerosis from the CLI was also high (96% and 73% at biopsy 1 and biopsy 2, with respective kappa statistics of 0.82 and 0.62, P < 0.001). Owing to the very high levels of agreement in the scoring of the items common to both scoring systems (AI vs GAI, CI vs CLI), only the NIH AI and CI are reported in the subsequent analyses.

Association between the biopsy indices

We determined how the AI, CI and TIAI correlated with each other. The NIH AI score was statistically significantly correlated with the TIAI at biopsy 1 (Spearman r = 0.76, P = 0.0001), but less so at biopsy 2 (Spearman r = 0.52, P = 0.05). Correlation of the AI and the TIAI was similar, even when the item interstitial inflammation from the TIAI (similar to the AI item mononuclear cell infiltrate) was removed (not shown). There was moderate correlation between the AI and CI at biopsy 1 (Spearman r = 0.41, P = 0.04); the correlation was of similar magnitude at biopsy 2 but did not reach statistical significance (Spearman r = 0.39, P = 0.11). There was no significant correlation between the CI and the TIAI at biopsy 1 or biopsy 2 (not shown).

Figure 1 displays the median AI, CI and TIAI scores of patients with WHO class IV LN vs those with WHO class II or III LN. At biopsy 1 and biopsy 2, patients with WHO class IV LN had significantly worse AI [biopsy 1, medians (IQR) = 15.5 (4.0) vs 2.0 (6.0); biopsy 2, 14.0 (10.0) vs 1.5 (1.5), Fig. 1a] and TIAI [biopsy 1, 7.0 (1.0) vs 3.0 (3.0); biopsy 2, 9.0 (4.0) vs 4.5 (3.0), Fig. 1c] scores than those without WHO class IV LN. At biopsy 2, patients with WHO class IV LN tended to have substantially worse CI scores [6.5 (7.0) vs 2 (2.0), Fig. 1b]; while this difference may be clinically significant, it did not achieve statistical significance (P = 0.1).

Fig. 1 Histograms comparing median AI (a), CI (b) and TIAI (c) scores between patients with WHO class IV LN (black bars) vs those without WHO class IV LN (gray bars), at biopsy 1 (left) and biopsy 2 (right). *P < 0.05, **P < 0.0005

Responsiveness to change of the biopsy scores

Figure 2 displays the median AI, CI and TIAI biopsy scores at biopsy 1 and biopsy 2. The AI scores improved from biopsy 1 to biopsy 2 [median (IQR) = 8.0 (13.0) vs 2.0 (6.0), P < 0.05], whereas the CI scores worsened [1.0 (2.0) vs 2.0 (3.0), P < 0.05]. There was no statistically significant change in TIAI score [6.0 (4.0) vs 5.0 (4.0), P > 0.05]. Figure 3 demonstrates that individual subjects’ AI and CI scores changed systematically from biopsy 1 to biopsy 2, generally improving and worsening, respectively. Conversely, although individual subjects did have a change in score of TIAI from biopsy 1 to biopsy 2, the changes did not follow any particular pattern (some improved and some worsened, Fig. 3).

Fig. 2 Change in AI (diamonds), CI (circles) and TIAI (triangles) from biopsy 1 to biopsy 2 (n = 15) for patients who had undergone two biopsies. *P < 0.05 for change in biopsy index score from biopsy 1 to biopsy 2. AI activity index, CI chronicity index, TIAI tubulointerstitial activity index

Fig. 3 In each graph (panel a: Activity Index; panel b: Chronicity Index; panel c: Tubulointerstitial Activity Index), the biopsy 2 scores are plotted against the biopsy 1 scores, and the line of unity is the solid line. Dots plotted below the line of unity represent subjects who had an improvement (decrease) in biopsy score from biopsy 1 to biopsy 2. Those above the line of unity represent worsening scores. Values plotted in the top left and bottom right corners represent the most drastic changes in biopsy score from biopsy 1 to biopsy 2, and values around the line of unity represent smaller changes

Clinicopathologic correlations

At biopsy 1 and biopsy 2, higher AI score correlated with higher 24-hour urine protein excretion (biopsy 1: r = 0.81, P < 0.0005; biopsy 2: r = 0.90, P < 0.005). The TIAI scores also correlated significantly with proteinuria at both biopsies, but less strongly (biopsy 1: r = 0.71, P < 0.0005; biopsy 2: r = 0.66, P < 0.05). The CI scores did not correlate significantly with proteinuria at either biopsy period (r in the range of 0.3, P > 0.05). At biopsy 1, higher AI, CI and TIAI scores correlated modestly with lower eCCl (r = −0.41, −0.46 and −0.42, respectively, all P < 0.05), whereas at biopsy 2, only the CI score was significantly correlated with eCCl (r = −0.58, P < 0.05). The need for BP meds was significantly associated with worse AI and TIAI scores at biopsy 1 (P < 0.05, Mann–Whitney test, not shown), but not at biopsy 2. The CI was not associated with a need for BP meds at either of the biopsy periods.

Improvement of 24-hour urine protein excretion from biopsy 1 to biopsy 2 was significantly correlated with improvement of AI score (r = 0.85, P < 0.02), but not with change in the CI score or the TIAI score. Change in eCCl from biopsy 1 to biopsy 2 did not correlate with changes in any of the biopsy index scores (not shown).

Discussion

The NIH index has been reported in the LN literature for over 20 years. Although controversial [3], the literature suggests that higher AI and CI scores may predict poor renal outcome [10, 19–21]. The NIH index has not been systematically examined in pediatric LN. Therefore, it is difficult to know whether this biopsy index adds to the clinical care of children with LN. Given the current trends and controversies regarding aggressiveness, length and types of treatment provided for patients with LN [22–25], the role of the renal biopsy may become more important in helping to predict who will be more likely to relapse, who will benefit from specific immunosuppression therapies, and who will be least likely to benefit from aggressive therapy. Thus, it is important for us to examine critically the clinicopathologic correlations of the NIH AI and CI in children with LN.

We found that the AI and CI were not highly correlated with each other, suggesting that they provide different information about the kidney biopsy. They were both responsive to change, which is a desirable feature of a biopsy index, with the AI score substantially improving (after 1 year of immunosuppressive treatment) and the CI score worsening (presumably due to chronic damage) between biopsies. At initial biopsy, early in the disease, those with WHO class IV LN had much higher AI scores (both being measures of glomerular cellular activity), whereas, at follow-up biopsy, subjects with persistent WHO class IV LN had both higher AI and CI scores. These findings support the validity of the NIH index for describing pediatric LN biopsies. The clinicopathologic correlations of the AI and CI were also what would be expected. The AI correlated with proteinuria at initial and follow-up biopsy, and an improved AI mirrored a decrease in proteinuria. Conversely, the CI was the only score that correlated with eCCl at follow-up biopsy, which might be expected if chronic damage had occurred. Our patients were mostly biopsied very early in their disease, which might explain why the correlation of the CI score and the eCCl was not as strong at initial biopsy.

The BI has yet to be reported or validated by other authors. As we suspected, the GAI and CLI components of the BI were extremely highly correlated, in magnitude and direction, with the AI and CI scores, respectively. Thus, we conclude that there is no reason to recommend using the GAI and CLI in place of the AI and CI. The only exception would be if there were some benefit to using a composite BI score. However, we question the usefulness of aggregating scores from components that describe different renal compartments because this assumes that information from each compartment is comparable and additive, which is unlikely to be true. For example, a patient with very significant chronic changes may be more likely to develop stage 5 chronic kidney disease within a few years but may not have active inflammation at the time of biopsy. Therefore, although such a patient’s CLI might be fairly high, the GAI and, thus, the BI score, would be, at most, moderately elevated. Thus, we believe that individual components should be examined separately.

The significance of the TI compartment in LN disease has been a topic of much interest, and, for many years, authors have suggested that examination of the TI compartment might be useful for predicting outcome in LN [6, 26]. Recently, studies of urinary biomarkers in LN kidney injury suggest that biomarkers may play a role in diagnosis and risk stratification of kidney damage [24, 27–29]. Presumably, biomarkers of tubular injury would correlate more highly with a TI index than a glomerular index [28, 29]; but whether this is true is unknown. Our findings on the use of the TIAI were somewhat contradictory and open to interpretation. The TIAI scores correlated moderately with AI scores at initial biopsy, and were higher in subjects with WHO class IV LN at both biopsies, suggesting that the index will be positive when there is active, inflammatory disease. The fact that it was not perfectly correlated with the AI at biopsy 1 and slightly less correlated at biopsy 2 might indicate that the TIAI is providing different information, on repeat biopsy, from the AI. The clinicopathologic correlations of the TIAI were similar to, but slightly less strong than, those of the AI. Conversely, the fact that the TIAI scores did not change from biopsy 1 to biopsy 2 was concerning, because it implied that this measure could not be used to follow changes in disease activity. However, when we specifically examined individual subjects’ changes in TIAI scores from biopsy 1 to biopsy 2, we observed that individuals did, in fact, have substantial changes in TIAI scores, but that on average, the change approached zero. Thus, it is possible that worsening of TIAI may have a predictive role in a particular subgroup of patients. We did not have a large enough sample to evaluate this further, and this should be a topic for future research.

A limitation of this study was the small sample size, which made it impossible to evaluate the effect of initial biopsy scores on outcomes. There may not have been enough power to detect statistical significance of some correlation coefficients. Thus, future studies should confirm, and elaborate on, our findings in larger samples. Our population was predominantly White, with only four Black children, which limits the generalizability of results to most United States lupus patient populations. In addition, we did not perform biopsy readings using the most recent International Society of Nephrology/Renal Pathology Society (ISN/RPS) classification for LN [4]. However, the focus of our work was to evaluate the use of the NIH index and the TIAI. Pediatric studies should be performed with the aim of evaluating the extent to which the new ISN/RPS classification provides additional information to pediatric lupus biopsies. We were unable to perform an assessment of reliability and reproducibility of the biopsy scoring systems. Our finding that clinicopathologic correlations were similar at biopsies 1 and 2 does suggest that biopsy interpretation was performed in a consistent manner, especially given that the pathologist was blind to the patients’ clinical status. However, given the controversy in the literature as to inter- and intra-observer reliability of the NIH scoring system [30–32], future pediatric studies should evaluate these characteristics of the NIH index and the TIAI.

In conclusion, our study supports the systematic use and further study of the AI and CI of the NIH index in future pediatric LN studies. Future research should evaluate their prognostic utility at follow-up biopsy. The TIAI offers researchers the opportunity to have a consistent and standardized description of tubulointerstitial disease in LN, for both adults and children, and special consideration should be given to reporting on the TI compartment in LN.