Introduction

Henoch–Schönlein purpura (HSP) is the most common form of vasculitis in children [1]. Although the prognosis is generally good, severe nephritis (Henoch–Schönlein nephritis, HSN) remains the major cause of morbidity and mortality among children with HSP [2]. In a systematic review of 12 studies involving 1133 unselected patients with HSP, 34.2% of the patients had renal involvement [3]. The outcome and severity of HSN are difficult to predict because its signs and symptoms vary widely [4]. A kidney biopsy has therefore remained the gold standard for evaluating the severity of HSN and its prognosis.

The classical grading system for renal biopsies in HSN is the International Study of Kidney Disease in Children (ISKDC) classification [5], which is based mainly on the presence and number of affected glomeruli. However, the ISKDC classification has been criticized for ignoring tubulointerstitial and vascular changes [4, 6], which has led to the introduction of various semiquantitative classifications. We have previously developed a practical and sensitive histological semiquantitative classification (SQC), which was used by Ronkainen et al. to evaluate immunoglobulin A (IgA) nephropathy [7].

The aim of the study reported here was to evaluate the feasibility of the modified SQC in cases of HSN. To this end, we compared the ISKDC and SQC classifications for their ability to predict the clinical outcome in a cohort of HSN patients. Clinical variables at the time of the biopsy were also evaluated relative to patient outcomes.

Methods

The study population consisted of patients recruited for our previous HSN projects at Helsinki University Hospital (2000–2010) and Oulu University Hospital (1985–2005) and for a nationwide HSN cohort study (1999–2006) [8] (Electronic Supplementary Material Table S1). A total of 53 patients (24 boys, 29 girls) aged <17 years at diagnosis with biopsy-proven HSN were identified from our patient register. Their medical histories and laboratory results from the onset of HSP until the latest control visit were retrieved and analysed retrospectively, focusing on the clinical course of the disease over time in terms of symptoms, laboratory findings, treatment administered and resolution of symptoms. We also recorded the time elapsed from the first symptoms to the first biopsy and from the biopsy to the start of any treatment. Renal function was evaluated as the estimated glomerular filtration rate (eGFR), calculated with the Bedside Schwartz equation [9] or, for patients aged >18 years, the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [10]. Proteinuria was defined as urine protein excretion of >200 mg/24 h, either measured from a 24-h urine collection (dU-Prot) or estimated from a spot urine sample [11]. Haematuria was defined as >5 red blood cells (RBCs) per high-power field, >20 × 10⁶ RBCs/L, or a positive dipstick test (+ to +++).
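As an illustration of the renal function calculation, the Bedside Schwartz equation [9] is usually written as eGFR (ml/min/1.73 m2) = 0.413 × height (cm) / serum creatinine (mg/dl). The minimal sketch below assumes creatinine is reported in µmol/l and converts it with 1 mg/dl = 88.4 µmol/l (so the constant becomes roughly 36.5); the function names are illustrative only, and the CKD-EPI equation used for older patients is not implemented here.

```python
# Sketch of the Bedside Schwartz eGFR formula and the study's proteinuria threshold.
# Function and constant names are illustrative assumptions, not the study's code.

UMOL_PER_MG_DL = 88.4  # 1 mg/dl creatinine = 88.4 micromol/l


def bedside_schwartz_egfr(height_cm: float, creatinine_umol_l: float) -> float:
    """eGFR (ml/min/1.73 m2) = 0.413 * height (cm) / creatinine (mg/dl)."""
    if height_cm <= 0 or creatinine_umol_l <= 0:
        raise ValueError("height and creatinine must be positive")
    creatinine_mg_dl = creatinine_umol_l / UMOL_PER_MG_DL
    return 0.413 * height_cm / creatinine_mg_dl


def has_proteinuria(protein_mg_per_24h: float) -> bool:
    """Proteinuria as defined in the text: excretion > 200 mg/24 h."""
    return protein_mg_per_24h > 200


if __name__ == "__main__":
    # Hypothetical child: height 140 cm, creatinine 61.88 micromol/l (0.70 mg/dl)
    print(round(bedside_schwartz_egfr(140, 61.88), 1))  # 82.6
```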

The outcome for each patient was assessed at the last control visit using a modified version of the grading system reported by Goldstein et al. [12] and Ronkainen et al. [13]. The clinical outcome was graded as follows: A = no signs of renal disease in laboratory tests and normal blood pressure; B = minor urinary abnormalities (microscopic haematuria and/or a protein/creatinine ratio of 21–200 mg/mmol) or anti-proteinuric/anti-hypertensive medication in use, with normal blood pressure and GFR; C = protein/creatinine ratio ≥200 mg/mmol, hypertension [blood pressure (BP) >160/95 mmHg] or immunosuppressive medication in use; D = reduced renal function (GFR <60 ml/min/1.73 m2). Grades A and B were both considered favourable outcomes but were analysed separately, whereas grades C and D were considered unfavourable outcomes and were analysed together as group C + D because of the small number of patients in each (3 and 5 patients in groups C and D, respectively).
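The grading rules above can be expressed as a short decision function. This is a minimal sketch, not the study's own procedure: the field names are hypothetical, the criteria are checked from most to least severe (an assumed precedence, since a patient could meet criteria for more than one grade), "BP >160/95 mmHg" is read as systolic >160 or diastolic >95 mmHg, and intermediate cases not covered by the published definitions (for example, untreated BP that is raised but below 160/95 mmHg) simply fall through to the lower grades.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Findings at the last control visit (hypothetical field names)."""
    gfr: float                       # ml/min/1.73 m2
    upcr_mg_mmol: float              # urine protein/creatinine ratio
    microscopic_haematuria: bool
    systolic_bp: float               # mmHg
    diastolic_bp: float              # mmHg
    immunosuppression: bool
    antiproteinuric_or_antihypertensive: bool


def outcome_grade(f: FollowUp) -> str:
    """Return grade A-D, testing the most severe criterion first (assumed order)."""
    if f.gfr < 60:
        return "D"
    # ASSUMPTION: 'BP > 160/95' means systolic > 160 or diastolic > 95 mmHg
    hypertensive = f.systolic_bp > 160 or f.diastolic_bp > 95
    if f.upcr_mg_mmol >= 200 or hypertensive or f.immunosuppression:
        return "C"
    if (f.microscopic_haematuria or 21 <= f.upcr_mg_mmol < 200
            or f.antiproteinuric_or_antihypertensive):
        return "B"
    return "A"


def is_favourable(grade: str) -> bool:
    """Grades A and B are favourable; C and D are analysed together as unfavourable."""
    return grade in ("A", "B")
```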

The clinical characteristics at the time of the biopsy, for all patients and for each outcome group, are presented in Table 1. At the end of follow-up (median 7.3 years), 27 patients (51%) were in outcome group A, 18 (34%) in group B, three (6%) in group C and five (9%) in group D. In group C, one patient was hypertensive despite treatment with two anti-hypertensive drugs, one had nephrotic-range proteinuria and one was receiving cyclosporine (CyA) for persistent proteinuria. In group D, one patient had died of an HSN-related hypertensive crisis [14], two had undergone kidney transplantation (14.4 and 5.7 years after the initial diagnosis) and two had developed renal insufficiency (eGFR 37 and 45 ml/min/1.73 m2).

Table 1 Clinical characteristics of the patients at the time of the biopsy

Of the 53 patients enrolled in the study, 38 (72%) had received immunosuppressive treatment based on the kidney biopsy findings. Briefly, most patients with ISKDC grade ≥III and some with ISKDC grade II were treated with methylprednisolone pulses followed by oral prednisolone, CyA, cyclophosphamide, azathioprine or mycophenolate mofetil in various combinations. Angiotensin-converting enzyme inhibitors (ACEs) and angiotensin-receptor blockers (ARBs) were typically added to control proteinuria. Some patients with ISKDC grade II, and some with ISKDC grade III and non-nephrotic proteinuria, received only ACEs and/or ARBs. All patients with an ISKDC grade higher than III had received immunosuppressive treatment. The ISKDC grades reported here are those from the re-evaluation, whereas treatment decisions were based on the original biopsy reports. Oral prednisolone or prednisone for extrarenal symptoms was prescribed in 26 cases (49%) and ACEs and/or ARBs in 45 cases (85%).

All of the kidney biopsy samples were re-analysed by experienced renal pathologists (P.H. and J.L. at Helsinki University Hospital and H.A.-H. at Oulu University Hospital) who were blinded to the patients’ histories. The biopsies were classified using both the ISKDC classification (Table 2) and the modified SQC (Table 3), in which glomerular, tubular, interstitial and vascular findings are scored separately, giving a maximum total biopsy score of 26 points. The SQC can also be divided into an activity index (maximum 9 points), a chronicity index (maximum 16 points) and focal or diffuse mesangial proliferation (0 points for focal, 1 point for diffuse). In addition, a tubulointerstitial index (maximum 5 points), combining all tubular and interstitial parameters of the SQC, both active and chronic, was calculated. Inter-rater reliability (IRR) was assessed on a randomly chosen subset of ten biopsies that were scored with the SQC by the renal pathologists.
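The maxima given above are internally consistent: 9 (activity) + 16 (chronicity) + 1 (diffuse mesangial proliferation) = 26, the maximum total biopsy score. Assuming the total biopsy score is simply the sum of these three parts (our reading of the text; the item-level scoring in Table 3 is not reproduced here), it can be computed and range-checked as in this minimal sketch. The tubulointerstitial index overlaps both indices and is therefore not added to the total.

```python
ACTIVITY_MAX = 9      # activity index maximum (points)
CHRONICITY_MAX = 16   # chronicity index maximum (points)
TOTAL_MAX = 26        # maximum total biopsy score (points)


def total_biopsy_score(activity: int, chronicity: int, diffuse_mesangial: bool) -> int:
    """Total SQC score, ASSUMED to be activity + chronicity + 1 point if diffuse."""
    if not 0 <= activity <= ACTIVITY_MAX:
        raise ValueError(f"activity index must be 0-{ACTIVITY_MAX}, got {activity}")
    if not 0 <= chronicity <= CHRONICITY_MAX:
        raise ValueError(f"chronicity index must be 0-{CHRONICITY_MAX}, got {chronicity}")
    total = activity + chronicity + (1 if diffuse_mesangial else 0)
    # The three maxima sum to 26, matching the stated maximum total score
    assert total <= TOTAL_MAX
    return total


if __name__ == "__main__":
    # Hypothetical biopsy: activity 6, chronicity 4, diffuse mesangial proliferation
    print(total_biopsy_score(6, 4, True))  # 11
```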

Table 2 The grading system of the International Study of Kidney Disease in Children classification for renal biopsies in cases of Henoch–Schönlein nephritis
Table 3 Histological scoring system of the modified semiquantitative classification for renal biopsies in cases of Henoch–Schönlein nephritis

Statistical analyses

The statistical analyses were performed using IBM SPSS for Windows, version 22 (IBM Corp, Armonk, NY). In addition, areas under the curve (AUC) were compared in Stata 12.1 (StataCorp LP, College Station, TX) using a user-written Stata module [15]. A total of 5000 bootstrap resamples were drawn to estimate the 95% confidence intervals (95% CI) of the AUCs of the histological classification systems and of their differences. For the AUC analysis, outcome grades A and B were coded as non-diseased and grades C and D as diseased. Youden indices (J = sensitivity + specificity − 1, whose maximum identifies the optimal cut-off point when sensitivity and specificity are given equal weight) were calculated from receiver operating characteristic (ROC) curves [16]. Histological classifications were also compared using logistic regression and reported as odds ratios with profile likelihood CIs and the Akaike Information Criterion (AIC), with lower AIC values indicating better model fit. IRR was assessed using two-way mixed, absolute-agreement intra-class correlation coefficients (ICC), interpreted as poor (<0.40), fair (0.40–0.59), good (0.60–0.74) or excellent (0.75–1.00) [17]. Normally distributed continuous variables are reported as means with standard deviation and non-normally distributed data as medians with interquartile range (IQR). Categorical variables are presented as numbers and percentages. Missing values were not imputed. Multiple groups were compared with the Kruskal–Wallis test and, if a difference was found, post hoc pairwise comparisons were performed with the Mann–Whitney U test using the Bonferroni correction and exact p values. Categorical variables were compared with Fisher’s exact test, and the results are also presented as relative risks (RR) with 95% CIs. Statistical significance was set at p < 0.05.
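The single-measure and average-measure ICCs can be computed from a two-way ANOVA decomposition of a biopsies × raters table. The minimal sketch below uses the McGraw and Wong absolute-agreement formulas, ICC(A,1) and ICC(A,k); to our understanding, SPSS gives the same point estimates for the two-way mixed and two-way random absolute-agreement models, the difference being one of interpretation. The function names and the toy ratings are illustrative, not study data.

```python
def icc_absolute_agreement(ratings):
    """Return (ICC(A,1), ICC(A,k)) for n biopsies (rows) scored by k raters (columns)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between biopsies
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def interpret_icc(value: float) -> str:
    """Cut-offs as stated in the text [17]."""
    if value < 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Toy data: the second rater scores every biopsy one point higher
    print(icc_absolute_agreement([[1, 2], [2, 3], [3, 4]]))
```

In this toy input the raters rank the biopsies identically, yet the systematic one-point offset lowers absolute agreement to 2/3 for a single rater (0.8 for the average of two), which is the distinction the absolute-agreement model is meant to capture.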

Results

The distributions of ISKDC grades and SQC scores across the four outcome groups are presented in Tables 4 and 5, respectively. The total biopsy score (p = 0.001), activity index (p = 0.003), chronicity index (p = 0.030) and tubulointerstitial index (p = 0.022) differed significantly among the three outcome groups (A, B, C + D; Table 5; Fig. 1). In the post hoc analysis, the total biopsy score and activity index were significantly higher in group C + D than in group A (Bonferroni-adjusted p <0.001 for the total biopsy score and p = 0.001 for the activity index) and in group B (Bonferroni-adjusted p = 0.004 and p = 0.008, respectively). The chronicity and tubulointerstitial indices differed significantly only between groups C + D and A (Bonferroni-adjusted p = 0.009 and p = 0.013, respectively), not between groups C + D and B (Bonferroni-adjusted p = 0.37 and p = 0.33, respectively). None of the four SQC biopsy categories differed significantly between outcome groups A and B (Bonferroni-adjusted p >0.99 for the total biopsy score, activity index and chronicity index, and p = 0.71 for the tubulointerstitial index). The median number of glomeruli per biopsy was 22 (IQR 12–31).
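The post hoc p values above are adjusted for the three pairwise comparisons among outcome groups A, B and C + D. A minimal sketch of the adjustment, assuming its usual form (each raw p value multiplied by the number of comparisons and capped at 1); the cap is consistent with several adjusted values being reported as p >0.99. The raw p values in the usage lines are hypothetical.

```python
from itertools import combinations

OUTCOME_GROUPS = ("A", "B", "C+D")
# Three pairwise post hoc comparisons: (A, B), (A, C+D), (B, C+D)
PAIRS = list(combinations(OUTCOME_GROUPS, 2))


def bonferroni_adjust(raw_p: dict) -> dict:
    """Multiply each raw p value by the number of comparisons, capping at 1."""
    m = len(raw_p)
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}


if __name__ == "__main__":
    # Hypothetical raw (unadjusted) p values, not study data
    raw = dict(zip(PAIRS, [0.40, 0.0002, 0.0015]))
    for pair, p in bonferroni_adjust(raw).items():
        print(pair, round(p, 4))
```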

Table 4 International Study of Kidney Disease in Children (ISKDC) grades in all patients and separately according to long-term renal outcomes
Table 5 Semiquantitative classification (SQC) scores in all patients and separately according to long-term renal outcomes
Fig. 1

Box-plots of all semiquantitative classification biopsy categories with respect to the three outcome groups: a total biopsy score, b activity index, c chronicity index, d tubulointerstitial index

Eighteen biopsies (34%) were classified as ISKDC grade II, 32 (60%) as grade III, two (4%) as grade IV and one (2%) as grade V. None of the patients with ISKDC grade II and six of the 32 (19%) with grade III had an unfavourable outcome. Of the two patients with grade IV, one had a favourable and the other an unfavourable outcome. The only patient with ISKDC grade V had an unfavourable outcome.
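The ISKDC AUC can be checked by hand from these grade-by-outcome counts (favourable/unfavourable: grade II 18/0, grade III 26/6, grade IV 1/1, grade V 0/1). A minimal sketch, assuming the non-parametric (Mann–Whitney) AUC in which tied grades count one half, reproduces the reported AUC of 0.765 and a Youden index of 0.40 at the cut-off point 2.5 (grade ≥III). The function names are illustrative.

```python
# Grade-by-outcome counts from the text:
# grade -> (favourable A+B, unfavourable C+D)
ISKDC_COUNTS = {2: (18, 0), 3: (26, 6), 4: (1, 1), 5: (0, 1)}


def totals(counts):
    """Return (number favourable, number unfavourable)."""
    return (sum(f for f, _ in counts.values()),
            sum(u for _, u in counts.values()))


def auc_from_counts(counts):
    """Non-parametric AUC: P(unfavourable grade > favourable grade), ties count 1/2."""
    n_fav, n_unf = totals(counts)
    score = 0.0
    for g_unf, (_, u) in counts.items():
        for g_fav, (f, _) in counts.items():
            if g_unf > g_fav:
                score += u * f
            elif g_unf == g_fav:
                score += 0.5 * u * f
    return score / (n_fav * n_unf)


def youden_by_cutoff(counts):
    """Youden J for each rule 'grade above the cut-off predicts an unfavourable outcome'."""
    n_fav, n_unf = totals(counts)
    grades = sorted(counts)
    result = {}
    for cut in grades[1:]:
        sensitivity = sum(counts[g][1] for g in grades if g >= cut) / n_unf
        specificity = sum(counts[g][0] for g in grades if g < cut) / n_fav
        result[cut - 0.5] = sensitivity + specificity - 1
    return result


if __name__ == "__main__":
    print(round(auc_from_counts(ISKDC_COUNTS), 3))  # 0.765
    j = youden_by_cutoff(ISKDC_COUNTS)
    best = max(j, key=j.get)
    print(best, round(j[best], 2))  # 2.5 0.4
```

Because most biopsies fall into grade III, 157 of the 360 favourable/unfavourable pairs share the same grade and contribute only one half each to the AUC.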

The true positive rate (sensitivity) and false positive rate (1 − specificity) of the biopsy classifications were compared using ROC curve analyses and the respective AUC values (Fig. 2). The total biopsy score had the largest AUC, 0.912 (95% CI 0.824–1.0), followed by the activity index, 0.878 (95% CI 0.780–0.975), the chronicity index, 0.776 (95% CI 0.647–0.906), and the ISKDC classification, 0.765 (95% CI 0.607–0.923). The AUC difference between the total biopsy score and the ISKDC classification was 0.15 (p = 0.04; normal-based 95% CI 0.007–0.29, bias-corrected 95% CI −0.004 to 0.28). The Youden index for the total biopsy score was 0.72, with a corresponding cut-off point of 10.5. With this cut-off, seven of the 14 (50%) patients with a total biopsy score of ≥11 had an unfavourable outcome, compared with only one of the 39 (3%) patients with a score of ≤10 (7/14 vs. 1/39; RR 19.5, 95% CI 2.6–144.7; Fisher’s exact test p <0.001). For the ISKDC classification, the Youden index was 0.40, with a corresponding cut-off point of 2.5. With this cut-off, eight of the 35 (23%) patients with ISKDC grade III or higher and none of the 18 (0%) patients with ISKDC grade II or lower had an unfavourable outcome (8/35 vs. 0/18; RR 9.0, 95% CI 0.5–147.2, calculated after adding 0.5 to all cells; Fisher’s exact test p = 0.040). We also compared the biopsy classifications using univariate logistic regression (Table 6), assessing model fit with the AIC: the total biopsy score of the SQC had the lowest value (30.8), compared with 38.8 for the ISKDC classification. The IRR of the SQC, assessed with the ICC, was 0.43 for single measures and 0.61 for average measures, corresponding to fair and good agreement, respectively.
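The 2 × 2 statistics above can be reproduced from the reported counts. This minimal sketch assumes a Katz log-scale 95% CI for the RR, the Haldane correction (0.5 added to every cell) applied only when a cell is zero, as stated in the text, and the common two-sided Fisher p definition (the sum of the probabilities of all tables no more likely than the observed one). These methodological choices are our assumptions, although the outputs agree with the reported values: Youden index 0.72; RR 19.5 (95% CI 2.6–144.7), p <0.001; RR 9.0 (95% CI 0.5–147.2), p = 0.040.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """J = sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def relative_risk(a: int, n1: int, c: int, n2: int):
    """RR of a/n1 versus c/n2 with a log-scale (Katz) 95% CI.

    If any of the four cells is zero, 0.5 is added to every cell."""
    b, d = n1 - a, n2 - c
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return rr, rr * math.exp(-Z95 * se_log), rr * math.exp(Z95 * se_log)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Hypergeometric probability of every table with the observed margins
    probs = {x: math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Total biopsy score, cut-off 10.5: unfavourable 7/14 (>= 11) vs 1/39 (<= 10)
    print(youden_index(tp=7, fn=1, tn=38, fp=7))
    print(relative_risk(7, 14, 1, 39), fisher_exact_two_sided(7, 7, 1, 38))
    # ISKDC, cut-off 2.5: unfavourable 8/35 (grade >= III) vs 0/18 (grade II)
    print(relative_risk(8, 35, 0, 18), fisher_exact_two_sided(8, 27, 0, 18))
```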

Fig. 2

Receiver operating characteristic (ROC) curves for the total biopsy score, International Study of Kidney Disease in Children (ISKDC) classification, activity index and chronicity index. For the ROC analyses, outcome groups A and B were coded as non-diseased and C and D as diseased. Areas under the curve (AUC) for the total biopsy score, ISKDC classification, activity index and chronicity index were 0.912 [95% confidence interval (CI) 0.824–1.0], 0.765 (95% CI 0.607–0.923), 0.878 (95% CI 0.780–0.975) and 0.776 (95% CI 0.647–0.906), respectively. The AUC difference between the total biopsy score and ISKDC classification was 0.147 (normal-based 95% CI 0.007–0.287, bias-corrected 95% CI −0.004 to 0.281), that between the activity index and ISKDC classification was 0.112 (normal-based 95% CI −0.021 to 0.246, bias-corrected 95% CI −0.028 to 0.241) and that between the chronicity index and ISKDC classification was 0.011 (normal-based 95% CI −0.162 to 0.184, bias-corrected 95% CI −0.168 to 0.178). a ROC curves for the total biopsy score and ISKDC classification, b ROC curves for the activity index and chronicity index

Table 6 Comparisons between the International Study of Kidney Disease in Children (ISKDC) classification and Semiquantitative Classification (SQC) scoring systems using univariate logistic regression analysis

Urine protein excretion at the time of the biopsy differed significantly between the outcome groups (p = 0.018; Table 1; Fig. 3). Post hoc pairwise comparisons showed that dU-Prot at the time of the first biopsy was significantly higher in the combined group C + D than in group A (Bonferroni-adjusted p = 0.036) or group B (Bonferroni-adjusted p = 0.026), whereas it did not differ between groups A and B (Bonferroni-adjusted p >0.99). Systolic blood pressure at the time of the biopsy was also higher in group C + D than in group A (Bonferroni-adjusted p = 0.025) or group B (Bonferroni-adjusted p <0.001) (Table 1; Fig. 3), and again groups A and B had similar systolic blood pressures (Bonferroni-adjusted p = 0.64). None of the other laboratory or demographic parameters studied here differed significantly between the three outcome groups (Table 1).

Fig. 3

a Box-plot of 24-h urine protein excretion at the time of the biopsy in relation to the three outcome groups, b box-plot of systolic blood pressure at the time of the biopsy in relation to the three outcome groups. dU-Prot, 24-h urine protein excretion

Discussion

The correlation between primary kidney biopsy findings and patient outcome was evaluated here in a nationwide cohort of paediatric HSN patients. Re-evaluation of the biopsy samples using both the ISKDC classification and a modified SQC scoring system showed the SQC to be more coherent and more sensitive in predicting patient outcomes. In addition, proteinuria and systolic blood pressure at the time of diagnosis correlated with the outcome. Eight of our 53 patients (15%) had a poor outcome, a slightly lower proportion than reported in other surveys with similar or shorter follow-up times [18, 19]. One explanation for this difference may be that most of the patients in our cohort had received immunosuppressive and/or antiproteinuric treatment for nephritis.

The ISKDC classification was published in 1977 and has been widely used as a grading system for HSN and IgA nephropathy. It has nevertheless been criticized for considering only mesangial proliferation and the percentage of crescentic glomeruli while ignoring other glomerular and tubulointerstitial parameters [4, 6]. We have previously shown in an HSP cohort that patients with a low ISKDC grade (I or II) in their first biopsy may have an unfavourable long-term outcome [13]. This suggests that the activity and chronicity components should be evaluated separately: early treatment is warranted in patients whose biopsy shows mainly active changes, whereas a primary biopsy showing predominantly chronic changes does not justify prolonged, aggressive immunosuppressive treatment. The influence of tubulointerstitial changes on clinical severity in patients with HSN has also been addressed [20–22], and several authors have reported inconsistent correlations between ISKDC grades and outcome [13, 19, 23, 24]. In addition to our SQC, other semiquantitative classifications evaluating multiple glomerular, tubulointerstitial and vascular parameters have been introduced for the histological evaluation of HSN [20, 22, 25–27], but they are not widely used in clinical practice. The Oxford classification [28] was published in 2009 as a histological classification for IgA nephropathy, a disease that is histologically similar to HSN [29]. To our knowledge, the feasibility of the Oxford classification has not been studied in children with HSP, although Kim et al. suggested that it could be used to predict the long-term outcome in adult HSP patients [30]. The main difference between the Oxford classification and the SQC is the number of histological parameters evaluated: four in the Oxford classification versus 14 in the SQC.
There are also some pathophysiological differences between HSP and IgA nephropathy. For example, onset of the disease is typically more acute in HSP than in IgA nephropathy [31], emphasizing the need for analysing the activity and chronicity components separately.

We adopted here a modified version of the classification used in our earlier work, in which the chronicity index and the total biopsy score were the best histological predictors of outcome in childhood IgA nephropathy [7]. For the present study, the classification was expanded to include evaluations of mesangial proliferation, lobulation and focal or diffuse mesangial proliferation, and was applied to a cohort of HSN patients. In our analyses, the total biopsy score and activity index were the best predictors of outcome, and both tended to outperform ISKDC grades in terms of AUC and AIC values. The importance of active lesions is understandable, since we were analysing primary kidney biopsies. Other studies have also shown a correlation between the acuity score of the primary biopsy and clinical severity [20] and reported that active lesions predict a poor outcome [18]. The total biopsy score of the SQC was also useful for distinguishing patients with favourable and unfavourable outcomes: only one of the 39 patients (3%) with a total biopsy score of ≤10 had an unfavourable outcome, compared with seven of the 14 (50%) with a score of ≥11.

Several authors have concluded that the long-term outcome of HSN is determined by the severity of renal involvement at disease onset [3, 12, 13], a notion supported by our finding that patients with an unfavourable outcome had significantly higher dU-Prot than those with a favourable outcome. In addition, long-term follow-up studies of HSN have shown that clinical recovery does not necessarily mean a favourable long-term outcome [12, 13, 32]. With a median follow-up of only 7.3 years, some patients in our cohort may therefore still deteriorate clinically in the future. Our patients with an unfavourable outcome also had higher systolic blood pressure at the time of the biopsy than those with a favourable outcome, whereas other studies have not found initial hypertension to predict a poor outcome in multivariate regression analysis [24, 32]. One possible explanation for this discrepancy is that we analysed absolute blood pressure values rather than categorizing patients as hypertensive or normotensive.

There are a number of limitations to our findings. The first, and probably the most important, is the variability in treatment within our cohort, which is undoubtedly a confounding factor. Active treatment may also have reduced the overall prognostic value of the kidney biopsies [19, 33]. Thirty-eight of our patients (72%) had received immunosuppressive treatment for nephritis and 45 (85%) had received ACE/ARB medication for hypertension and/or to control proteinuria. On the other hand, all eight patients with an unfavourable outcome had received immunosuppressive treatment and seven (88%) had received ACE/ARB medication at some point during their disease, although the immunosuppressive agents used and their timing and duration varied between patients. It is therefore difficult to draw any conclusions on the influence of treatment on patient outcome. Twenty-six patients had also received oral prednisone or prednisolone for extrarenal symptoms before the biopsy, but we have previously shown that early treatment with low-dose steroids does not prevent the development of nephritis in HSP [34] and does not affect its frequency or timing [35]. Moreover, although treatment introduces bias, it does not hamper the comparison of the two classifications, since both were scored on the same biopsies from the same patients and compared directly in their ability to predict the outcome. Another limitation is that we did not adjust for confounding factors in the ROC curve and logistic regression analyses, because the sample size, and especially the number of patients with an unfavourable outcome, was small, which might bias logistic regression results [36].

In conclusion, although a kidney biopsy is mandatory for the diagnosis of HSN, its value as a predictor of outcome depends on many clinical variables and also on the classification used. We suggest that a semiquantitative classification including activity and chronicity indices be introduced into clinical practice. To this end, larger prospective studies of the prognostic value of kidney biopsies in the treatment of HSN patients are needed to evaluate the SQC and other scoring systems.