
1 Introduction

Pituitary adenomas are rare intracranial tumors, with a prevalence of 1/1500 in the general population. In most cases they are benign lesions whose clinical manifestations are related to mass effect, depending on tumor size, and/or to hormone hypersecretion syndromes. Conversely, small intrasellar adenomas may be clinically silent, and their diagnosis is often an incidental finding on MR scans [1,2,3,4,5].

Radiomics, which consists of the conversion of images into mineable data and their subsequent analysis for decision support, is an emerging field that enables tumor classification [6]. In particular, texture analysis is a post-processing technique that extracts quantitative parameters from the heterogeneity of pixel gray levels. It relies on statistical analysis of both simple intensity distribution histograms and more complex gray-level distribution matrices, which also retain information on the spatial distribution of voxel intensities [7].
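To illustrate the difference between the two families of features, the following minimal Python sketch (assuming NumPy; the function name `first_order_features` and the choice of 32 histogram bins are hypothetical, not taken from the study) computes a few histogram-based statistics of ROI intensities. Because these statistics ignore where each voxel lies, shuffling the voxels leaves them unchanged, and this spatial information is exactly what the matrix-based features retain.

```python
import numpy as np


def first_order_features(roi_values, n_bins=32):
    """Histogram-based (first-order) statistics of the intensities inside an ROI.

    Only the distribution of values is used; their spatial arrangement is ignored.
    """
    x = np.asarray(roi_values, dtype=float).ravel()
    mean = x.mean()
    variance = x.var()
    std = np.sqrt(variance)
    skewness = ((x - mean) ** 3).mean() / std**3 if std > 0 else 0.0
    # Shannon entropy (in bits) of the intensity histogram
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return {"mean": mean, "variance": variance, "skewness": skewness, "entropy": entropy}


# A synthetic 2D "ROI" and a spatially shuffled copy with the same intensities
rng = np.random.default_rng(0)
roi = rng.normal(loc=300.0, scale=40.0, size=(20, 20))
shuffled = rng.permutation(roi.ravel()).reshape(roi.shape)

original_feats = first_order_features(roi)
shuffled_feats = first_order_features(shuffled)
for name in original_feats:
    # First-order features cannot tell the two images apart
    assert np.isclose(original_feats[name], shuffled_feats[name])
```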

In this setting, machine learning can be applied to predict patient outcomes and to support clinical decision-making [8,9,10,11]. Machine learning has a wide range of applications across medicine, from cardiology to radiology [12, 13]. In particular, studies applying machine learning to texture analysis according to the “radiomic process” were described by Kumar et al. [14]: Zacharaki et al. classified brain tumor type and grade from MRI texture and shape features using Linear Discriminant Analysis with Fisher’s discriminant rule, k-nearest neighbour (KNN) and nonlinear Support Vector Machine (SVM) classifiers, with leave-one-out cross-validation [15]; Juntu et al. differentiated benign from malignant soft-tissue tumours on T1-weighted MRI images, testing three classifiers (neural networks, decision tree and SVM) [16]; Romeo et al. characterized adrenal lesions on unenhanced MRI images [17]; finally, Stanzione and colleagues recently demonstrated the potential of this approach for local staging of prostate cancer [18]. Moreover, a recent study investigated the relevance of first- and second-order histogram features obtained from diffusion-weighted magnetic resonance imaging for differentiating functional from non-functional pituitary macroadenomas, using classical statistical analysis [19].

Therefore, the aim of this study is to apply machine learning algorithms to parameters obtained by texture analysis of MRI images in order to distinguish functional from non-functional pituitary macroadenomas.

2 Materials and Methods

2.1 Subjects

We retrospectively reviewed the data of 50 patients who underwent the so-called standard endoscopic endonasal approach for removal of a pituitary adenoma between January 2013 and December 2017 at the Division of Neurosurgery of the University of Naples ‘Federico II’, Italy. All patients underwent MRI at our Institution before surgery. Demographic data, preoperative assessment (i.e. endocrinological and visual status and presenting signs), tumor features, prior treatments, surgical results and complications were retrieved from our electronic database (FileMaker Pro 11, FileMaker Inc., Santa Clara, CA, USA).

2.2 MRI Acquisition and Texture Analysis

All exams were acquired on a 1.5-Tesla scanner (Gyroscan Intera, Philips, Eindhoven, The Netherlands). The imaging protocol always included a coronal T2-weighted Turbo Spin Echo sequence (TR/TE: 2600/89 ms; FOV: 180 × 180 mm; matrix: 288 × 288; slice thickness: 3 mm), which was used for the subsequent radiomic feature extraction.
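As a quick worked check of the native voxel size implied by these parameters, assuming that the reconstruction matrix equals the reported 288 × 288 matrix (the reconstruction matrix is not stated):

```latex
\Delta x = \Delta y = \frac{\mathrm{FOV}}{N_{\text{matrix}}} = \frac{180\ \text{mm}}{288} = 0.625\ \text{mm},
\qquad \Delta z = 3\ \text{mm}
```

so each native voxel would measure 0.625 × 0.625 × 3 mm before any resampling.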

First, lesions were identified by an expert neuroradiologist, who then manually contoured each one with a two-dimensional polygonal ROI on the slice showing its maximum extension (Fig. 1). When needed, the contour was further edited with a brush tool. This process was carried out with freely available segmentation software (ITKSnap v3.6.0) [20].

Fig. 1.

Coronal T2-weighted MRI exam showing the maximum extension slice of a functioning pituitary macroadenoma (A). Panel B shows the annotated region of interest used for subsequent texture feature extraction.

Image pre-processing and feature extraction were performed with an open-source Python radiomics package (Pyradiomics v2.1.2) [21]. The first step was gray-level normalization of the images with a scale of 100. This step was necessary because T2-weighted images are not quantitative: unlike those of T2 maps, their intensity values are not absolute. T2 maps were not available, since only routine clinical scans were selected for the analysis, also to ensure the reproducibility of the results in the clinical setting. Subsequently, all volumes and the corresponding lesion masks were resampled to a 2 × 2 × 2 mm voxel resolution. The next pre-processing step was intensity discretization, for which a fixed bin width approach was chosen so as to obtain a bin count within the 16 to 128 range suggested as ideal in previous studies [22].
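The normalization and discretization steps can be sketched in NumPy as follows. This is a minimal illustration, not Pyradiomics itself: the helper names (`normalize`, `discretize_fixed_bin_width`, `choose_bin_width`), the candidate bin widths and the synthetic data are hypothetical, and resampling is omitted. The discretization follows, as we understand it, the fixed bin width formula given in the Pyradiomics documentation, X_b = floor(X / W) - floor(min(X) / W) + 1, where W is the bin width.

```python
import numpy as np


def normalize(image, scale=100.0):
    """Z-score normalization rescaled so that intensities have mean 0 and SD `scale`."""
    image = np.asarray(image, dtype=float)
    return (image - image.mean()) / image.std() * scale


def discretize_fixed_bin_width(values, bin_width):
    """Fixed bin width discretization: X_b = floor(X / W) - floor(min(X) / W) + 1."""
    values = np.asarray(values, dtype=float)
    return (np.floor(values / bin_width) - np.floor(values.min() / bin_width) + 1).astype(int)


def choose_bin_width(values, candidates, min_bins=16, max_bins=128):
    """Return the first candidate width whose bin count falls within [min_bins, max_bins]."""
    for width in candidates:
        n_bins = int(discretize_fixed_bin_width(values, width).max())
        if min_bins <= n_bins <= max_bins:
            return width, n_bins
    raise ValueError("no candidate bin width yields a bin count in the requested range")


# Synthetic T2-weighted intensities (arbitrary units) standing in for an ROI
rng = np.random.default_rng(42)
roi = rng.normal(loc=800.0, scale=120.0, size=(40, 40))
normalized = normalize(roi, scale=100.0)
width, n_bins = choose_bin_width(normalized, candidates=[5, 10, 25, 50])
discretized = discretize_fixed_bin_width(normalized, width)
print(f"bin width = {width}, bin count = {n_bins}")
```

Since normalization fixes the standard deviation at 100 units, the bin count mostly depends on the chosen width W and on how heterogeneous the lesion is.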

Additional features were extracted from derived images obtained by wavelet decomposition, which yields all possible combinations of high- and low-pass filtering along the x, y and z dimensions, and by edge-enhancing Laplacian of Gaussian (LoG) filters, which emphasize gray-level changes at different levels of texture coarseness.
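To make the eight wavelet combinations concrete, here is a minimal NumPy sketch of a single-level 3D decomposition. It uses the Haar wavelet with downsampling purely for simplicity; this is an illustrative assumption, and Pyradiomics' own wavelet filter uses a different default wavelet and, to our understanding, no downsampling. Array axes 0, 1 and 2 stand in for x, y and z, the function names are hypothetical, and LoG filtering is not shown.

```python
import numpy as np


def haar_split(data, axis):
    """One level of the orthonormal Haar transform along `axis` -> (low-pass, high-pass)."""
    moved = np.moveaxis(data, axis, 0)
    even, odd = moved[0::2], moved[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def haar_subbands_3d(volume):
    """Return the 8 sub-bands of a 3D volume, keyed by the filter per axis (e.g. 'LHL')."""
    volume = np.asarray(volume, dtype=float)
    if any(size % 2 for size in volume.shape):
        raise ValueError("this sketch expects even dimensions along every axis")
    bands = {"": volume}
    for axis in range(3):
        next_bands = {}
        for name, data in bands.items():
            low, high = haar_split(data, axis)
            next_bands[name + "L"] = low   # low-pass along this axis
            next_bands[name + "H"] = high  # high-pass along this axis
        bands = next_bands
    return bands


rng = np.random.default_rng(1)
volume = rng.normal(size=(8, 8, 8))
subbands = haar_subbands_3d(volume)
print(sorted(subbands))  # ['HHH', 'HHL', 'HLH', 'HLL', 'LHH', 'LHL', 'LLH', 'LLL']
```

Because the Haar transform is orthonormal, the total energy of the eight sub-bands equals that of the input volume, and a perfectly homogeneous volume produces all-zero high-pass bands.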

Finally, in addition to two-dimensional shape and first-order statistics features, higher-order texture features were extracted. These were derived from the symmetrical Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), Neighboring Gray Tone Difference Matrix (NGTDM) and Gray Level Dependence Matrix (GLDM).
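As an illustration of how a co-occurrence matrix retains spatial information, the minimal NumPy sketch below builds a normalized symmetric GLCM for a single horizontal offset on a 2D image of already-discretized gray levels and derives two classical features. The function names are hypothetical; a full implementation such as Pyradiomics also restricts counting to the ROI mask and typically aggregates several directions, which is omitted here.

```python
import numpy as np


def symmetric_glcm(levels, offset=(0, 1)):
    """Normalized symmetric GLCM of a 2D image of discretized gray levels (1..N)."""
    img = np.asarray(levels, dtype=int)
    n_levels = int(img.max())
    d_row, d_col = offset
    rows, cols = img.shape
    glcm = np.zeros((n_levels, n_levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r, c] - 1, img[r2, c2] - 1
                glcm[i, j] += 1
                glcm[j, i] += 1  # symmetry: count each pair in both directions
    return glcm / glcm.sum()


def glcm_contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbouring gray levels differ a lot."""
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())


def glcm_joint_energy(p):
    """Sum of p(i, j)^2 (angular second moment): large for homogeneous textures."""
    return float((p**2).sum())


# Two images with identical histograms but different spatial arrangement
stripes = np.array([[1, 2, 1, 2],
                    [1, 2, 1, 2]])
blocks = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2]])
print(glcm_contrast(symmetric_glcm(stripes)), glcm_contrast(symmetric_glcm(blocks)))
```

Both images contain four voxels of each gray level, so their first-order features coincide, yet the alternating stripes give a GLCM contrast of 1.0 whereas the blocks give about 0.33.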

2.3 Tool

The Knime analytics platform (v. 3.7.1) was chosen to conduct this machine learning study, as it is a well-known open-source platform that implements a wide range of machine learning algorithms and integrates with Weka, Python and other software; moreover, it has already been employed in other studies in the literature [23, 24]. The algorithms used in this paper are briefly presented in the next section.

2.4 Algorithms and Evaluation Metrics

J48 is the Java implementation of the C4.5 decision tree [25], an evolution of the ID3 algorithm. It is a simple structure made up of leaves, representing classes, and nodes, representing tests on an attribute. Multinomial Logistic Regression (MLR) with a ridge estimator was applied through the “Logistic” node of Weka, which follows the implementation of Le Cessie and van Houwelingen [26, 27]. K Nearest Neighbour (KNN) is a simple instance-based classifier that assigns a label according to the dominant class among the nearest neighbours [28]. For all these algorithms, SMOTE (Synthetic Minority Over-sampling Technique) was applied [29]. SMOTE generates artificial data by interpolating between a real object of a given class and one of its nearest neighbours of the same class. Boosting was implemented for J48: it converts weak learners into strong learners that predict with higher accuracy and, during the training phase, selects only the parameters that can improve the predictive ability of the algorithm, reducing dimensional complexity and improving execution time [30]. The evaluation metrics employed in this study are:

  • Accuracy: correct classifications over the total;

  • Error: misclassifications over the total;

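  • Recall: the proportion of actual positives that are correctly classified;

  • Precision: the proportion of predicted positives that are actually positive;

  • Sensitivity: the capacity to detect true positives (equivalent to recall);

  • Specificity: the capacity to detect true negatives, i.e. the proportion of actual negatives that are correctly classified.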
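Because these definitions are all ratios of confusion-matrix counts, they can be summarized in a short Python sketch. The function name and the counts below are hypothetical, chosen only to show the arithmetic; they are not the values reported in Table 1. Note that recall and sensitivity are the same quantity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the listed metrics from confusion-matrix counts (positive = one chosen class)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # correct classifications over the total
        "error": (fp + fn) / total,        # misclassifications over the total
        "recall": tp / (tp + fn),          # actual positives correctly classified
        "precision": tp / (tp + fp),       # predicted positives that are truly positive
        "sensitivity": tp / (tp + fn),     # same quantity as recall
        "specificity": tn / (tn + fp),     # actual negatives correctly classified
    }


# Hypothetical counts for 100 records (NOT the study's results), taking
# "functioning adenoma" as the positive class
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```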

Moreover, the Area Under the Receiver Operating Characteristic Curve (AUCROC) was computed for each algorithm and for both the bagging and boosting groups.
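For reference, the AUCROC can be computed directly from classifier scores through its rank-based interpretation. The following minimal Python sketch uses a hypothetical function name and toy scores (not the study's data or Knime's implementation), and a quadratic pairwise loop chosen for clarity rather than efficiency.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the probability that a random positive outscores a random negative.

    This pairwise (Mann-Whitney) form equals the area under the empirical ROC curve;
    ties between a positive and a negative score count as 1/2.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (probability of "functioning") and true labels
print(auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUCROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.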

3 Results

Of the included lesions, 25 were functioning adenomas (5 secreting adrenocorticotropic hormone, 8 growth hormone, 5 growth hormone and prolactin, 6 prolactin and 1 thyroid-stimulating hormone) and 25 were non-functioning. A total of 1128 features were extracted for each patient.

Due to the small number of patients, the SMOTE technique was applied to increase the number of records from 50 to 100. Then, feature selection was applied to reduce the number of features extracted from the images: the correlation matrix among all variables was computed, a correlation threshold of 0.4 was chosen, and variables with a correlation higher than this threshold were excluded as redundant, since they did not add information to the classifiers. This reduced the number of features from 1128 to 28. Given the limited number of patients, leave-one-out cross-validation was applied to all the implemented algorithms. J48, MLR and KNN were implemented together with the boosting node of Knime. Results are summarized in Table 1, while Table 2 shows the features used to build the models.
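The augmentation and feature reduction steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the Knime workflow used in the study: SMOTE is applied separately to each class with k = 5 neighbours (an assumption, since k is not reported), the 25-per-class split of the synthetic records and all data are hypothetical, and the filter compares absolute Pearson correlations and keeps features in column order, which is only one possible reading of the exclusion rule.

```python
import numpy as np


def smote(X_class, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling for the records of ONE class.

    Each synthetic record lies on the segment between a real record and one of its
    k nearest neighbours of the same class: x_new = x + lam * (x_nn - x), lam in [0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X_class, dtype=float)
    k = min(k, len(X) - 1)
    # Pairwise Euclidean distances between records of this class
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a record is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X))            # random real record
        j = neighbours[i, rng.integers(k)]  # one of its k nearest neighbours
        lam = rng.random()                  # interpolation factor in [0, 1)
        synthetic[s] = X[i] + lam * (X[j] - X[i])
    return synthetic


def correlation_filter(X, threshold=0.4):
    """Greedily keep a feature only if |r| <= threshold with every feature kept so far.

    Assumes no constant columns (their correlation would be undefined).
    """
    r = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for f in range(X.shape[1]):
        if all(r[f, g] <= threshold for g in kept):
            kept.append(f)
    return kept


# Hypothetical data: 25 records per class, 6 features (not the study's data)
rng = np.random.default_rng(0)
functional = rng.normal(size=(25, 6))
non_functional = rng.normal(loc=0.5, size=(25, 6))
augmented = np.vstack([
    functional, smote(functional, 25, rng=rng),
    non_functional, smote(non_functional, 25, rng=rng),
])
print(augmented.shape)  # (100, 6)
print(correlation_filter(augmented))
```

Because the greedy filter keeps the first feature of each correlated group, the retained subset depends on column order; ranking features beforehand would be an alternative design.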

Table 1. Scores for each algorithm
Table 2. Features used to build the predictive models

MLR obtained the highest accuracy, recall, precision, sensitivity, specificity and AUCROC among the three implemented algorithms. Despite having the lowest accuracy (83.0%), J48 reached an AUCROC comparable to that of KNN.

4 Discussion

First, MRI examinations of 50 patients were acquired at the Department of Advanced Biomedical Sciences of the University Hospital “Federico II” of Naples. A texture analysis was then conducted to extract more than one thousand quantitative features from the MRI images. Finally, a machine learning analysis was performed, and the algorithms were compared using several evaluation metrics.

Among other studies that employed radiomics and machine learning, Romeo et al. [17] characterized adrenal lesions with a diagnostic accuracy of 80%, Juntu et al. [16] distinguished benign from malignant soft-tissue tumours with an accuracy of 93%, and Zacharaki et al. [15] obtained an accuracy of 85% in classifying brain tumour type and grade. Although a direct comparison with these studies would not be entirely fair, given the different datasets, the present study shows a greater capacity to correctly classify lesions (functional versus non-functional pituitary macroadenomas) using features extracted through texture analysis. A closer comparison can be made with the study of Sanei et al. [19], who distinguished functional from non-functional pituitary macroadenomas with lower scores than those obtained here through machine learning.

The functional status of pituitary lesions has a significant influence on the clinical manifestations of the disease: a correct diagnosis is crucial for selecting the appropriate therapeutic strategy and, therefore, for curing this multifaceted disease. Although a previous study has shown the promise of Apparent Diffusion Coefficient values of pituitary lesions in this assessment [19], Diffusion Weighted Imaging is not routinely performed when imaging the sellar region, since this area is prone to artefacts on echo-planar imaging and the technique is time consuming. For these reasons, an approach that obtains similar results with routine MRI sequences has greater potential for application in the current clinical setting.

This study has some limitations: the dataset was augmented with artificial data to increase its size and allow the analysis to be performed. Larger datasets should be studied, with the aim of further improving the accuracy of this classification. Nevertheless, machine learning proved effective in distinguishing functional from non-functional pituitary macroadenomas using texture analysis on MRI images.

This study shows that the combination of radiomics and machine learning can be used to predict tumoral behaviour preoperatively, information that until now could be obtained only from blood tests or histopathological analysis.