1 Introduction/Background

Chronic Kidney Disease (CKD) is a pathological condition consisting of a functional degeneration of the kidney. Kidney transplantation is the primary therapy for patients affected by CKD: it is more effective than dialysis in terms of long-term mortality risk [1, 2] and, at the same time, has a smaller impact on the public health system. Due to the increasing need for kidney transplants [3], several studies have tried to widen the inclusion criteria [4, 5]. In [6], dual kidney transplantation (DKT) from expanded criteria donors (ECDs) was compared with single kidney transplantation (SKT) from concurrent ECDs and standard criteria donors. The authors concluded that dual kidney transplantation from marginal donors is a viable option and that renal function can be achieved with older, low nephron mass donors, provided that both kidneys are transplanted into a single recipient. In [7], a technique to assess the kidney condition by histological biopsy is proposed. The evaluation criterion, called the Karpinski score, is based on the percentage evaluation of the pathological condition of four main functional areas: glomerulosclerosis, tubular atrophy, interstitial fibrosis and arterial sclerosis. The score ranges from 0 to 12, and a higher number means a worse condition [7,8,9]. Kidneys with a Karpinski score from 0 to 3 and from 4 to 6 are considered suitable for single and dual transplant, respectively. Computing the score requires pathologists to analyse kidney biopsies, a procedure that is usually time-consuming, error-prone and subjective. For these reasons, a clinical support system that uses tissue image analysis to support the computation of the score is a desirable step forward.

This work focuses on the automatic evaluation of kidney biopsies, dealing with one of the four pathological conditions evaluated in the Karpinski score: glomerulosclerosis. The task consists in discriminating sclerotic glomeruli from non-sclerotic ones. A glomerulus is part of the nephron, the functional renal unit involved in blood filtration; the discrimination is challenging because glomeruli show wide intensity variation and inconsistent shape and size.

In a previous work [10], a Computer Aided Diagnosis (CAD) system for the segmentation and discrimination of blood vessels versus tubules in kidney tissue biopsies was designed and tested. Histological images with PAS staining were used to segment Regions of Interest (ROIs) and extract Haralick features, allowing a subsequent classification procedure based on Artificial Neural Network (ANN) algorithms. Test results showed that the supervised ANN approach is consistent and performs well.

In this work, a combination of different feature extraction algorithms has been designed and evaluated, starting from Whole Slide Images (WSI) with Periodic acid–Schiff (PAS) staining, for discriminating two glomerulus conditions: sclerotic and non-sclerotic. The extracted features come from two widely used, well-known and general-purpose families of feature extraction algorithms; in particular, morphological and texture features have been computed. The set of features was then reduced by means of a feature reduction algorithm and used as input to a shallow Artificial Neural Network.

2 Materials

Whole Slide Images were collected between 07/2011 and 02/2015 by physicians from the Department of Emergency and Organ Transplantations (DETO) of the Bari University Hospital. All the kidney biopsies with PAS staining were scanned using the Aperio ScanScope CS at 20x with a resolution of 0.50 μm/pixel. The WSIs were collected from a total of 26 kidney biopsies coming from 19 donors and stored at full resolution in SVS file format (an Aperio file format consisting of pyramidal tiled TIFF with non-standard metadata and compression). The collected images presented wide differences in colour and saturation, even though all of them were treated with PAS staining.

Two medical graduands manually annotated the glomeruli independently; the annotations were subsequently validated by a renal pathologist. The manual annotation was performed by outlining the real glomerulus region using the Aperio ImageScope tool; at the same time, each glomerulus was labelled as sclerotic or non-sclerotic.

The initial dataset was composed of 428 sclerotic glomeruli and 2,344 non-sclerotic glomeruli, with a ratio between the two classes of about 1:5.5.

The dataset was subsequently divided into two subsets, called train set and test set. About 20% of the original dataset was used as the test set, and the target labels of the test set were used only to assess the final performance. The selection was random, with the constraint that if a glomerulus appears in the test set, all the other glomeruli belonging to the same biopsy must appear in the test set only. This is equivalent to stating that the train/test division was performed at biopsy level. The constrained division prevents context information specific to a biopsy from being present in both subsets, which would lead to an unfair dataset split. The final dataset configuration is reported in Table 1.
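The biopsy-level constraint can be illustrated with a minimal sketch. The function name `split_by_biopsy`, the greedy filling strategy and the `seed` parameter are assumptions made for illustration; the original selection procedure is only described as random and constrained at biopsy level.

```python
import random
from collections import defaultdict


def split_by_biopsy(biopsy_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no biopsy contributes to both sets.

    biopsy_ids[i] is the identifier of the biopsy that glomerulus i comes from.
    """
    by_biopsy = defaultdict(list)
    for idx, biopsy in enumerate(biopsy_ids):
        by_biopsy[biopsy].append(idx)

    # Shuffle whole biopsies, never single glomeruli.
    biopsies = sorted(by_biopsy)
    random.Random(seed).shuffle(biopsies)

    target = test_fraction * len(biopsy_ids)
    train_idx, test_idx = [], []
    for biopsy in biopsies:
        # Assumption: fill the test set greedily until about 20% is reached.
        if len(test_idx) < target:
            test_idx.extend(by_biopsy[biopsy])
        else:
            train_idx.extend(by_biopsy[biopsy])
    return sorted(train_idx), sorted(test_idx)
```

Because whole biopsies are assigned, the resulting test fraction is only approximately equal to the requested one.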

Table 1. Dataset configuration.

3 Methods

The main goal of this research was to design a CAD module able to classify the glomerulus condition using a feature-based approach. In detail, the designed solution uses image processing and machine learning techniques to assign a class to each glomerulus: sclerotic or non-sclerotic.

A detailed representation of the full workflow, described in the following paragraphs, is reported in Fig. 1.

Fig. 1. Full features extraction and classification workflow.

As depicted in Fig. 1, the discrimination process can be divided into three main steps. The first two extract several features and reduce the feature space by means of a feature reduction algorithm; the last one assigns the label.

3.1 Features Extraction

Feature extraction is the first step of the workflow [11, 12]; it defines a set of characteristics able to describe and discriminate between the two types of glomeruli. Following the reasoning of the physicians who routinely address this problem, the most suitable features belong to two main families: morphological and texture-based features.

As reported by the pathologist involved in this study, the main distinctions between sclerotic and non-sclerotic glomeruli are the shape of the Bowman’s capsule, the size, and the texture related to blood vessels. All the evaluations, the threshold values and the decisions regarding the best configuration of the algorithms were made on the train set only.

Morphological Features

Regarding the morphological characteristics, two features are related to the Bowman’s capsule and the Bowman’s space.

The first feature is computed as the sum of the areas of the Bowman’s capsule, the blood vessels and the inter-capillary spaces. Due to the PAS staining, these structures appear whitish, and the mask describing the region is detected with three parallel image processing procedures. Each procedure considers channels from a different colour space: RGB, CMYK and Lab. In detail:

  • Green channel of RGB colour space, as it is the most representative of the glomerulus structure [13];

  • the complement of the Magenta channel of the CMYK colour model, chosen because of the empirically observed significance of this colour component;

  • a and b components of Lab colour space due to the link with the human colour vision.

The extraction of the masks of green and magenta channels follows the same steps:

  1. binarisation: to keep the pixels related to white regions, a threshold value has been empirically set to 190 [14];

  2. morphological operators: to clean the image obtained from the previous step, erosion, dilation and median filtering have been used with a disk of radius ranging from 1 to 3 as structuring element;

  3. active contour: to clean the shape of the obtained mask, the active contour algorithm [15] has been used with 200 iterations (the chosen number of iterations avoids an extreme smoothing of the glomerulus shape).

The three previous steps led to the computation of two masks, one for the green channel and one for the magenta channel; the last mask was computed from the a and b components of the Lab colour space. The ab matrix was used as input to the k-means clustering algorithm [16]; the number of clusters was empirically set to 5, and the clustering was repeated 3 times with new initial centroid positions to avoid local minima. The mask was then computed by retaining only the pixels belonging to the cluster with the greatest mean grey-scale intensity value. Finally, steps 2 and 3 of the Green-Magenta segmentation process were applied.

The final mask was the composition of the three resulting masks, computed with a majority criterion. The obtained mask was then processed to remove artefacts and regions of no interest; in detail, regions smaller than 1000 pixels were removed, and a logical AND was performed with a circle whose radius is equal to the smaller dimension of the image reduced by 1/8 of its value.
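The majority criterion and the circular AND can be sketched as follows with NumPy. This is an illustrative sketch, not the original implementation: the function names are hypothetical, the circle is assumed to be centred on the image, `radius` is left as a parameter because the text does not fully specify it, and the removal of small regions (which requires connected-component labelling) is omitted.

```python
import numpy as np


def majority_mask(green_mask, magenta_mask, lab_mask):
    """Keep a pixel when at least two of the three boolean masks select it."""
    votes = (green_mask.astype(np.uint8)
             + magenta_mask.astype(np.uint8)
             + lab_mask.astype(np.uint8))
    return votes >= 2


def circular_mask(shape, radius):
    """Boolean disk of the given radius, centred on an image of the given shape."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    centre_row = (shape[0] - 1) / 2.0
    centre_col = (shape[1] - 1) / 2.0
    return (rows - centre_row) ** 2 + (cols - centre_col) ** 2 <= radius ** 2


def final_mask(green_mask, magenta_mask, lab_mask, radius):
    """Majority vote followed by a logical AND with the centred disk."""
    voted = majority_mask(green_mask, magenta_mask, lab_mask)
    return voted & circular_mask(voted.shape, radius)
```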

Figure 2 shows the overview of the Bowman’s space segmentation workflow.

Fig. 2. Workflow of Bowman’s space segmentation.

Starting from the final mask, the feature of interest was the sum of the Bowman’s space, the blood vessels and the inter-capillary region of the glomerulus, that is, in our workflow, the total area of the white regions. This value was finally normalised with respect to the image area.

The second morphological feature was related to the diameter of the glomerulus. Assuming that the white regions inside the mask computed for the previous feature describe the shape of the glomerulus, the convex hull containing all these regions was computed. Then, approximating the convex hull ROI as a circle, the diameter of the circle with the equivalent area was computed.

As a result of the morphological workflow, a total of two features were computed.
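The two morphological features reduce to simple arithmetic once the final mask and its convex hull are available. The sketch below assumes that the white pixel count and the convex hull area have already been measured; both function names are hypothetical.

```python
import math


def normalised_white_area(white_pixels, image_height, image_width):
    """White region area divided by the image area (first feature)."""
    return white_pixels / (image_height * image_width)


def equivalent_diameter(hull_area):
    """Diameter of the circle whose area equals the convex hull area.

    From A = pi * (d / 2) ** 2 it follows that d = 2 * sqrt(A / pi).
    """
    return 2.0 * math.sqrt(hull_area / math.pi)
```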

Texture Features

Due to the particular texture of the glomerulus and the differences in blood vessels and inter-capillary space between sclerotic and non-sclerotic glomeruli, two well-known texture analysis algorithms were used: Local Binary Pattern (LBP) and Haralick features.

As proposed in [17], multi-radial colour LBP (mrcLBP) is a variation of classical LBP suited to the glomerulus identification problem. The same configuration was applied to the raw RGB glomerulus images. Ten features were obtained for each radius and channel, leading to a total of 120 features (10 features per radius, four radii, three channels).

The second set of texture-based features was obtained by extracting Haralick features. Four Grey-Level Co-occurrence Matrices (GLCMs), one for each direction, were computed; then, the 14 Haralick indexes were computed for each matrix, leading to 56 features. To reduce this number, the mean and the range over the four directions were calculated. The final features were 28 (14 means and 14 ranges, one pair for each Haralick feature).
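The reduction from 56 to 28 Haralick values can be sketched as below. The function name is hypothetical, and the four directions are assumed to be the usual 0, 45, 90 and 135 degrees, which the text does not state explicitly; the computation of the GLCMs and of the 14 indexes themselves is not shown.

```python
def aggregate_directions(per_direction):
    """Collapse per-direction Haralick values into means and ranges.

    per_direction: four lists (one per GLCM direction) of 14 values each.
    Returns 28 values: the 14 means followed by the 14 ranges.
    """
    means, ranges = [], []
    # zip(*...) groups the four directional values of the same Haralick index.
    for values in zip(*per_direction):
        means.append(sum(values) / len(values))
        ranges.append(max(values) - min(values))
    return means + ranges
```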

As a result of the texture features extraction, a total of 148 features were computed.

3.2 Feature Reduction

The resulting feature set is the union of the morphological and texture-based features, for an overall number of 150 features. Due to the possible correlation among the different subsets of features, and to reduce the total number of inputs to the classification step, Principal Component Analysis (PCA) was applied as feature reduction algorithm. Prior to PCA, each feature was z-score normalized.

As stated in Sect. 2, the dataset was previously split into train and test sets so that all the image pre-processing and classification decisions could be taken fairly on the train set only. The feature reduction algorithm, instead, does not need or use the label information; for this reason, PCA could be applied either to the whole dataset or to the training dataset only. Because the two options produced reduced datasets with a similar number of features, and in order to take into account all the information inside the dataset and to avoid the need to preserve the transformation matrix for the test phase, the first approach was chosen.

As a result of the PCA feature reduction, a total of 95 features were obtained and used for the classification phase.
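The normalisation and reduction step can be sketched with NumPy as follows. This is a minimal sketch under assumptions: PCA is computed through a singular value decomposition of the z-scored matrix, the sample standard deviation is used, and `n_components` is left as a parameter because the criterion that led to 95 components is not stated in the text.

```python
import numpy as np


def zscore(features):
    """Centre each column and scale it to unit sample standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero after centring
    return (features - mean) / std


def pca_reduce(features, n_components):
    """Project z-scored features onto the first principal components.

    Returns the projected samples and the explained variance ratio
    of each retained component.
    """
    z = zscore(features)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return z @ vt[:n_components].T, explained[:n_components]
```

With a matrix of shape (number of glomeruli, 150), a call such as `pca_reduce(features, 95)` would return the reduced inputs of the classifier.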

3.3 Glomeruli Classification

The glomeruli classification steps are based on Artificial Neural Network (ANN), specifically on shallow ANN (Fig. 3).

Fig. 3. Shallow artificial neural network.

All the decisions about the ANN architecture and the tuning of its parameters were taken considering the train set only, whereas all the reported results and performance discussions refer to the test set. To improve generalisation, to avoid overfitting and to obtain a classifier independent of the input dataset, k-fold cross-validation was used. Several network initialisations inside each fold and hard voting among the folds were used both to obtain independence from a particular network initialisation and to compute the overall class label.
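Hard voting among the folds can be sketched as a per-sample majority vote over the labels predicted by each fold model. The function name is hypothetical and the tie-breaking rule (the label seen first wins) is an assumption of this sketch, not a detail taken from the workflow.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label for each sample.

    predictions: one list of predicted class labels per model (or fold),
    all of the same length.
    """
    voted = []
    # zip(*...) groups the labels that the different models gave to one sample.
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted
```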

The input of the classifier was the feature set obtained from the image processing algorithms and the subsequent PCA feature reduction; the number of input features was 95, and 10-fold cross-validation was used. The fixed training parameters were the following: one hidden layer, tansig and softmax as activation functions for the hidden and output layers, respectively, cross-entropy as loss function and scaled conjugate gradient as backpropagation algorithm. The stop criterion was based on the validation set: early stopping was implemented to promote generalisation and to avoid overfitting, halting the training if the performance on the validation set did not improve within a sliding window of 6 epochs. The last relevant parameter is the number of neurons in the hidden layer; the choice of its value is reported below.
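The early stopping rule can be illustrated on a recorded sequence of validation losses. This sketch assumes that "performance" is a loss to be minimised and that the window counts consecutive epochs without a new best value; the function name and the `patience` parameter are hypothetical.

```python
def early_stopping_epoch(validation_losses, patience=6):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to reach a new
    minimum for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1
```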

To deal with the strongly unbalanced distribution between sclerotic and non-sclerotic glomeruli, we selected the Matthews Correlation Coefficient (MCC) [18] as the general performance measure for comparing the folds. MCC (Eq. 5) takes into account false negatives and false positives and computes a correlation coefficient between predicted and target classes. As stated in [19], among the usual performance scores, MCC is the only one that takes into account the ratio of the confusion matrix size, and it proved to be a better index of performance than accuracy or F1 score on unbalanced datasets.

Subsequently, we used the Receiver Operating Characteristic (ROC) curve to choose the classification threshold value. Two approaches were analysed. The first one (Approach A) takes as optimal value the first intersection point between the ROC curve and a line with slope equal to the ratio between the total number of negative and positive samples, sliding from the upper left corner of the ROC plot ((FPR, TPR) = (0, 1)). The second approach (Approach B), proposed in [20], selects the point of minimum distance from the point (0, 1) of the ROC plot, as expressed in Eq. 1.

The comparison of the two methods in terms of different performance indexes (Eqs. 2, 3, 4 and 5) is reported in Table 2. Due to the medical domain of the work, a higher recall is preferred, thus the second method was chosen.

Table 2. Comparison between the two ROC thresholding approaches. The reported values are the mean over the 10 folds.
$$ \min_{i} \sqrt{\left( 1 - sensitivity(i) \right)^{2} + \left( 1 - specificity(i) \right)^{2}} $$
(1)
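The two threshold selection rules can be sketched on a ROC curve given as parallel lists of false positive rates, true positive rates and thresholds. The function names are hypothetical. For Approach A, the sketch assumes that the first point touched by a line of slope m sliding from (0, 1) is the one maximising TPR − m·FPR; Approach B follows Eq. 1, where 1 − specificity is the FPR and 1 − sensitivity is 1 − TPR.

```python
import math


def threshold_sliding_line(fpr, tpr, thresholds, n_negative, n_positive):
    """Approach A: first ROC point met by a line of slope N/P sliding from (0, 1)."""
    slope = n_negative / n_positive
    best_index = max(range(len(thresholds)),
                     key=lambda i: tpr[i] - slope * fpr[i])
    return thresholds[best_index]


def threshold_closest_to_corner(fpr, tpr, thresholds):
    """Approach B: ROC point with minimum Euclidean distance from (0, 1)."""
    best_index = min(range(len(thresholds)),
                     key=lambda i: math.hypot(fpr[i], 1.0 - tpr[i]))
    return thresholds[best_index]
```

On an unbalanced dataset the slope N/P is large, so Approach A penalises false positives strongly, whereas Approach B weighs sensitivity and specificity equally.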

The architecture of the shallow ANN chosen for glomeruli classification was fixed to one hidden layer. To choose the number of neurons in the hidden layer, the performances of 95 networks were compared. In detail, networks with a number of hidden neurons ranging from 1 to 95 were trained (95 is the fixed number of input features). Based on the best MCC value, computed as the mean MCC of the folds, the final number of neurons for the hidden layer was set to 27.

4 Results

In this section, the results of the proposed glomerulus classification workflow are reported. In particular, we report the performance obtained with the set of features reduced by means of PCA and classified with the cross-validated shallow ANN. As reported in Table 1, the test set consisted of 579 glomeruli images (87 sclerotic, 492 non-sclerotic).

Several metrics were considered for the evaluation. In particular, Accuracy (Eq. 2) and Matthews Correlation Coefficient (MCC) (Eq. 5) were evaluated on the test set, according to the confusion matrix reported in Table 3.

Table 3. Confusion matrix for metrics computation.
$$ Accuracy = \frac{TP + TN}{TP + FP + FN + TN} $$
(2)
$$ Precision = \frac{TP}{TP + FP} $$
(3)
$$ Recall = \frac{TP}{TP + FN} $$
(4)
$$ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{\left( TP + FP \right)\left( TP + FN \right)\left( TN + FP \right)\left( TN + FN \right)}} $$
(5)
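Equations 2, 3, 4 and 5 can be computed directly from the four entries of the confusion matrix, as in the sketch below. The function name is hypothetical, and returning 0 when a denominator vanishes is a convention assumed here, not one stated in the text.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and MCC (Eqs. 2-5) from a confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}
```

As an illustration of why MCC suits unbalanced data, a hypothetical classifier that labels all 579 test glomeruli as non-sclerotic (`classification_metrics(0, 0, 87, 492)`) would still reach an accuracy of about 0.85, while its MCC would be 0.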

To evaluate the stability of the workflow, 10 runs of the whole process were performed; the corresponding results are summarized in Table 4 in terms of mean and standard deviation of the metrics defined in Eqs. 2, 3, 4 and 5. The best result is reported in Table 5, and the corresponding confusion matrix is reported in Table 6. An example of misclassified glomeruli is reported in Fig. 4; as confirmed by the pathologist, the misclassifications are mainly due to the presence of artefacts on the images.

Table 4. Metrics comparison of 10 network initializations.
Table 5. Metrics comparison of the best network.
Table 6. Confusion matrix of the best network.
Fig. 4. False negative misclassified by the best model.

As reported in Table 4, the workflow classifies sclerotic and non-sclerotic glomeruli with good performance and low variability. Precision and Recall are equal to 0.98 and 0.93, respectively, showing a better performance in the non-sclerotic evaluation. The results show that the proposed workflow is a valid solution for classifying glomeruli and, as assessed by the domain expert, the misclassified images are a challenging problem for pathologists too and are usually discarded.

5 Discussion and Conclusion

In this work, a complete workflow for sclerotic and non-sclerotic glomeruli classification has been designed and developed. Several feature extraction algorithms were analysed and tested, and two feature typologies were selected among them. Both morphological and texture feature extraction algorithms were tuned to achieve good performance on the training set. We collected 150 features: 2 morphological features and 148 texture features extracted by means of the mrcLBP and Haralick algorithms. Then, PCA was performed to reduce the number of features to 95. A cross-validated artificial neural network was trained, and the unbalanced dataset and network tuning problems were addressed. The final results were computed on an independent test set. The classification workflow achieved a mean MCC and Accuracy of 0.9501 and 0.9874, respectively, with low variability over 10 independent iterations. Good precision and recall were obtained too. The reported results suggest that the proposed workflow set-up is reliable for the investigated domain, supporting the clinical practice of discriminating the two classes of glomeruli. However, glomeruli are still commonly misclassified in images affected by artefacts, which are usually discarded by pathologists even in clinical practice.

In future work, we will investigate how to reduce the number of empirical assumptions in the feature extraction process and how to introduce a weighted classification among the folds; furthermore, a feature analysis will be conducted to identify the most informative features. Different classification techniques, such as deep learning [21, 22], will also be evaluated, and the results will be compared with the approach proposed in this paper. Finally, the presented workflow will be integrated into a complete CAD tool for kidney biopsy analysis.