Introduction

Hypophysitis is a heterogeneous condition that includes inflammation of the pituitary gland and infundibulum, and it can cause symptoms related to mass effect and hormonal deficiencies [1]. The prevalence of hypophysitis ranges from 0.2 to 0.88% [2,3,4], and the annual incidence of hypophysitis is 1 case per 9 million individuals [4]. According to the anatomic region of inflammation, hypophysitis is classified as adenohypophysitis, infundibuloneurohypophysitis, and panhypophysitis [4]. According to histopathologic features, it is classified as lymphocytic (71.8%), granulomatous (18.6%), and rarely, xanthomatous (3.3%), immunoglobulin (Ig)-G4-related, necrotizing pituitary, and mixed forms [5]. Secondary hypophysitis can develop in the course of inflammatory, autoimmune, vascular, infectious, and neoplastic diseases, and with adverse effects of some drugs [5].

Pituitary adenomas are the most common intracranial neoplasms, with a prevalence of 0.1% and a prevalence at autopsy of 15% [6, 7]. Approximately 35% of pituitary adenomas do not secrete hormones [8]. Non-functioning pituitary adenomas (NFPAs) can be detected during the work-up of hypopituitarism, incidentally, or during the evaluation of neurologic symptoms due to mass effect, such as visual disturbances and headache [9, 10]. There are also non-adenomatous lesions, such as hypophysitis, that cause symptoms similar to those of NFPA and do not always require surgery. Autoantibodies, imaging, and pituitary biopsy are used for the differential diagnosis between hypophysitis and NFPA. However, there is currently no reliable autoantibody available [8].

Machine learning can help create a more reliable assisted diagnostic tool. Establishing the correct diagnosis in these patients may provide better clinical decision support. Machine learning has advantages over other predictive methods as it enables a predictive computer model to automatically learn the best predictive features found in the training data. Unlike using a human operator to manually identify these features, machine learning models can automatically identify the most robust predictive features and potentially generalize this knowledge to new patient groups.

In this study, we aimed to develop an automatic system that can distinguish between hypophysitis and NFPA, rather than a manual method performed by a human operator.

Materials and methods

The medical records of patients with hypophysitis and NFPAs who were followed by the Endocrinology and Metabolism outpatient clinics of Istanbul University-Cerrahpasa, Cerrahpasa Medical School, and Hacettepe University, Hacettepe Medical School were retrospectively reviewed. This study was approved by the Local Ethics Committee of Cerrahpasa Faculty of Medicine. The study adhered to the tenets of the Declaration of Helsinki.

Patient selection

Inclusion criteria were (a) NFPAs with histopathologic diagnosis; (b) clinical and radiologic or histopathologically established diagnosis of hypophysitis; (c) high-quality dynamic contrast-enhanced pituitary magnetic resonance imaging (MRI) before surgery or treatment. The exclusion criteria were as follows: (a) a history of another intracranial disease; (b) having undergone radiotherapy, or neurosurgery before MRI; (c) complicated lesion with hemorrhage in NFPAs; (d) presence of apoplexy.

MRI protocol

Pituitary MRI was available in all patients, including T2-weighted (T2-W) sequences and T1-weighted (T1-W) sequences enhanced with the Gd-based contrast agent gadopentetate dimeglumine (0.1 mmol/kg). A 3.0 T MRI scanner (Ingenia; Philips Healthcare, Best, Netherlands) was used, acquiring axial, coronal, and sagittal data. The dynamic MRI scan was conducted within 200 s following intravenous injection of the contrast agent. Examples of MRI of NFPA and hypophysitis are shown in Fig. 1.

Fig. 1 Non-functioning pituitary adenoma (a) and hypophysitis (b) on post-Gd T1-W MRI images

MR texture analysis

The volume of interest (VOI) was manually segmented using the LifeX software by one senior and one junior radiologist in a three-dimensional (3D) fashion, slice by slice, on coronal and sagittal MRI [11]. The entire tumor volume and the areas of hypophysitis involvement were included in the VOI. Any disagreement was resolved by the senior radiologist. Feature reproducibility was assessed using the intraclass correlation coefficient (ICC) with a cutoff value of 0.8 and the coefficient of variation (CV) with a cutoff value of 20%.
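
The reproducibility check described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not state which ICC form was used or how the CV was aggregated, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single reader) and a CV averaged over lesions. The function names and the direction of the cutoffs (ICC at least 0.8, CV at most 20%) are likewise assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the absolute mean."""
    return 100.0 * stdev(values) / abs(mean(values))


def icc_2_1(ratings):
    """ICC(2,1), assumed form: two-way random effects, absolute agreement.

    ratings holds one row per lesion and one column per reader.
    """
    n = len(ratings)       # number of lesions
    k = len(ratings[0])    # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def is_reproducible(ratings, icc_cutoff=0.8, cv_cutoff=20.0):
    """Assumed rule: keep a feature if ICC >= 0.8 and mean per-lesion CV <= 20%."""
    mean_cv = mean(coefficient_of_variation(row) for row in ratings)
    return icc_2_1(ratings) >= icc_cutoff and mean_cv <= cv_cutoff
```

Under this form, a constant offset between the two readers lowers the ICC even when the lesions are ranked identically, which is why absolute agreement is a stricter criterion than consistency.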

Texture features were extracted using the LifeX software on post-contrast T1-weighted images (post-Gd T1-W) and T2-W images. Voxels within the VOI with intensities outside the range mean ± 3 standard deviations (SD) were rejected and not considered in the MRI texture analysis. Spatial resampling was 0.5 mm (x), 0.5 mm (y), and 4 mm (z) for each image. Intensity rescaling was performed as a preprocessing step, with 256 chosen as the number of grayscale levels. Texture features were extracted from each T2-W coronal, T1-W coronal, and T1-W sagittal MRI.
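
The two intensity preprocessing steps can be illustrated with a short sketch. It assumes that the rejection limits use the population SD of the VOI and that rescaling maps the remaining minimum and maximum linearly onto 256 integer levels. The exact discretization used by the software may differ, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def reject_outliers(voxels, n_sd=3.0):
    """Drop voxel intensities outside mean +/- n_sd standard deviations."""
    m = mean(voxels)
    sd = pstdev(voxels)
    lo, hi = m - n_sd * sd, m + n_sd * sd
    return [v for v in voxels if lo <= v <= hi]


def rescale_to_levels(voxels, n_levels=256):
    """Map intensities linearly onto integer gray levels 0..n_levels-1."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        # A constant region carries no contrast; use a single level.
        return [0 for _ in voxels]
    scale = (n_levels - 1) / (hi - lo)
    return [int(round((v - lo) * scale)) for v in voxels]


# Hypothetical VOI: twenty similar voxels plus one extreme value.
voi = [100.0] * 20 + [10000.0]
kept = reject_outliers(voi)
levels = rescale_to_levels([0.0, 2.0, 10.0])
```

Rejecting extreme voxels before rescaling keeps a single bright voxel from compressing the rest of the lesion into a few gray levels.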

We extracted first-order features based on intensity-based histograms. Six histogram features were computed for each VOI: mean, variance, skewness, kurtosis, entropy, and energy. Second-order features were also extracted: six features from the grey-level co-occurrence matrix (GLCM), 11 features from the grey-level run-length matrix (GLRLM), 11 features from the grey-level zone length matrix (GLZLM), and four features from the neighborhood grey-level dependence matrix (NGLDM). A total of 38 features were computed for each image.
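
As an illustration of the six first-order features, the sketch below computes them from a list of discretized gray levels. The definitions used here (population variance, non-excess kurtosis, base-2 entropy, and energy as the sum of squared histogram probabilities) are common conventions and are assumed; the extraction software may use different normalizations.

```python
import math
from collections import Counter


def first_order_features(levels):
    """Six histogram features of a list of discretized gray levels."""
    n = len(levels)
    mean = sum(levels) / n
    variance = sum((v - mean) ** 2 for v in levels) / n
    sd = math.sqrt(variance)
    # Skewness and kurtosis are undefined for a constant region; report 0.
    skewness = sum((v - mean) ** 3 for v in levels) / (n * sd ** 3) if sd else 0.0
    kurtosis = sum((v - mean) ** 4 for v in levels) / (n * sd ** 4) if sd else 0.0
    # Histogram probabilities of the gray levels that actually occur.
    probs = [count / n for count in Counter(levels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    energy = sum(p * p for p in probs)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
        "energy": energy,
    }


features = first_order_features([0, 0, 1, 1])
```

For this two-level example the histogram is uniform, so the entropy is 1 bit and the energy is 0.5.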

From these features, those that differentiated the two lesion classes significantly (p < 0.01) were identified using the chi-square test. Classifiers using these features were trained with different machine learning methods in Matlab, and their performance in classifying NFPA and hypophysitis lesions on test data was measured. The following machine learning algorithms were used for model development: linear discriminant analysis; fine, medium, and coarse decision trees; k-nearest neighbors; support vector machines (SVM); naive Bayes; and ensemble classifiers. The learning and testing phases were performed using 10-fold internal and 10-fold external cross-validation. The performances of the methods were compared using the Matthews correlation coefficient (MCC). An algorithm that correctly classified the two lesion types with high probability was selected using receiver operating characteristic (ROC) analysis and calculation of error matrices.
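
To make the cross-validation step concrete, the sketch below builds one level of stratified 10-fold splits in plain Python. It is an illustration only: the study used Matlab, the text does not say whether folds were stratified by class, and the nesting of internal and external loops is not reproduced here. The labels `"NFPA"` and `"HP"` and the function names are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of sample indices with each class spread across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal samples round-robin, continuing where the previous class
        # stopped so that fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    all_idx = set(range(len(labels)))
    for test_idx in folds:
        yield sorted(all_idx - set(test_idx)), sorted(test_idx)


labels = ["NFPA"] * 17 + ["HP"] * 17
folds = stratified_folds(labels)
```

With 17 lesions per class, each test fold holds only three or four cases, so per-fold metrics are coarse and only the results pooled over all folds are informative.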

Results

In this study, a total of 34 patients were evaluated, 17 (50%) with NFPAs and 17 (50%) with hypophysitis. Of the 17 patients with hypophysitis, 14 (82%) had primary hypophysitis and three (18%) had secondary hypophysitis. One case of secondary hypophysitis was due to Erdheim-Chester disease, one was IgG4-related hypophysitis, and the other was cancer immunotherapy-associated hypophysitis. Nine (52.9%) of the patients with hypophysitis were female and eight (47.1%) were male; eight (47.1%) of the patients with NFPA were female and nine (52.9%) were male. There was no significant difference between the groups in terms of sex (p = 0.732). The age at diagnosis was significantly higher in the NFPA group: the median age at diagnosis was 27.0 [range, 23.0–43.0] years for hypophysitis and 50.0 [range, 45.0–56.0] years for NFPA (p < 0.001).

Among the 38 radiomic parameters obtained from post-Gd T1-W images, 10 parameters that could differentiate the lesions were selected. These 10 texture features (p < 0.01) are listed in Fig. 2 in order of importance. To avoid the effect of confounding factors and to increase the performance of the classifiers, parameters that were highly correlated with each other (correlation coefficient > 0.7) were eliminated using the independent component analysis method. The differences between hypophysitis and NFPA were significant for gray-level run-length matrix-low gray-level run emphasis (GLRLM_LGLRE), gray-level co-occurrence matrix-correlation, and gray-level co-occurrence matrix-entropy. Machine learning algorithms were run with the combination of these three selected texture features (Fig. 3).
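
For readers unfamiliar with the two GLCM features named above, the following sketch shows how entropy and correlation can be derived from a co-occurrence matrix of a small 2D image. It assumes a single pixel offset, a symmetric and normalized matrix, and base-2 logarithms; the software used in the study works in 3D and may average over several directions, so this is an illustration and not a reimplementation.

```python
import math


def glcm(image, n_levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for one pixel offset.

    Assumes the offset yields at least one pixel pair inside the image.
    """
    counts = [[0.0] * n_levels for _ in range(n_levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_entropy(p):
    """Entropy (bits) of the co-occurrence probabilities."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


def glcm_correlation(p):
    """Linear dependence between the gray levels of paired pixels."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    mu_i = sum(i * p[i][j] for i, j in cells)
    mu_j = sum(j * p[i][j] for i, j in cells)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i, j in cells)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i, j in cells)
    if var_i == 0 or var_j == 0:
        return 0.0  # undefined for a constant image; 0 is a placeholder
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i, j in cells)
    return cov / math.sqrt(var_i * var_j)


stripes = glcm([[0, 1], [0, 1]], n_levels=2)   # neighbors always differ
bands = glcm([[0, 0], [1, 1]], n_levels=2)     # neighbors always match
```

In these toy images the horizontal neighbors are either always different or always identical, which gives the two extreme correlation values of -1 and +1 with the same entropy of 1 bit.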

Fig. 2 Ten texture features used to differentiate hypophysitis from non-functioning pituitary adenoma. The univariate chi-square test was used to calculate the importance score of each texture feature. The scores are the negative logarithms of the p-values

Fig. 3 Boxplots of the three features with the highest predictor importance scores. The GLRLM_LGLRE feature has a significantly higher mean in hypophysitis (HP) cases, whereas GLCM entropy and GLCM correlation are higher in NFPA

Error matrices were calculated in Matlab for the decision tree classifier, Bayesian classifier, k-nearest neighbor algorithm, and SVM. SVM showed the best performance in distinguishing the two lesion types according to ROC analysis (Fig. 4) and confusion matrix criteria (Fig. 5) (sensitivity: 0.74, specificity: 0.97, PPV: 0.96, NPV: 0.78, accuracy: 0.85, MCC: 0.72). In contrast, the k-nearest neighbor algorithm showed the poorest performance (sensitivity: 0.64, specificity: 0.52, PPV: 0.57, NPV: 0.60, accuracy: 0.58, MCC: 0.11). The prediction performances of the machine learning classifiers are shown in Table 1.
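
The reported metrics all follow from the four cells of a confusion matrix, as the sketch below shows. The counts in the example are hypothetical: the per-cell counts are not given in the text, so the values were chosen for a balanced sample to match the SVM sensitivity of 0.74 and specificity of 0.97.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard performance metrics from confusion matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        # MCC is reported as 0 when any marginal total is empty.
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Hypothetical counts per 100 lesions in each class.
svm_like = binary_metrics(tp=74, fn=26, tn=97, fp=3)
```

With these counts, the derived PPV, NPV, accuracy, and MCC come out close to the values reported for the SVM. Unlike accuracy, MCC uses all four cells, so it stays informative when one class is predicted much better than the other.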

Fig. 4 The classification results of SVM with the three highest-ranked features. The linear SVM has the highest performance among all classifiers with AUC of 0.91. HP: Hypophysitis, SVM: Support vector machines, AUC: The area under the ROC curve

Fig. 5 The performance of the SVM model shown as a confusion matrix. The algorithm differentiates hypophysitis from non-functioning adenoma with sensitivity = 0.74, specificity = 0.97, precision = 0.96, negative predictive value = 0.78, false positive rate = 0.03, false discovery rate = 0.039, false negative rate = 0.04, accuracy = 0.85, and Matthews correlation coefficient = 0.72. HP: Hypophysitis, NFPA: Non-functioning pituitary adenoma, SVM: Support vector machines

Table 1 The prediction performances of machine-learning classifiers

Discussion

This study reports that texture analysis-based machine learning tools show feasible performance in discriminating hypophysitis from NFPAs on post-Gd T1-W MRI. In this discrimination, three texture features were significant: gray-level run-length matrix-low gray-level run emphasis (GLRLM_LGLRE), gray-level co-occurrence matrix-correlation, and gray-level co-occurrence matrix-entropy. SVM successfully differentiated the two lesions according to ROC analysis and confusion matrix criteria.

Patients with hypophysitis may be misdiagnosed as having NFPA and may even undergo surgery [12,13,14]. Although some radiologic and biochemical methods have been developed [8], a non-invasive method that clearly distinguishes hypophysitis from NFPA has not yet been identified. Gutenberg et al. reported a new radiologic score to distinguish autoimmune hypophysitis from NFPA [8]. Although the use of this score provides a significant improvement in the differential diagnosis of hypophysitis and NFPA, discrimination is still problematic [2]. For this reason, machine learning (ML) has recently gained an important place in the differential diagnosis of hypophysitis and NFPA. In our study, SVM was able to successfully differentiate hypophysitis from NFPA. However, owing to the rarity of hypophysitis, the number of patients in our study was small; larger studies may strengthen the role of ML in the differentiation of hypophysitis and NFPA.

ML has been used successfully in the diagnosis of diseases of the sellar region and in predicting their post-treatment outcomes. Kitajima et al. used artificial intelligence (AI) and reported that their artificial neural network showed high performance in differentiating pituitary adenoma, craniopharyngioma, and Rathke’s cleft cyst (area under the receiver operating characteristic curve, 0.990) [15]. In another study, Steiner et al. showed with AI that pituitary adenomas could be distinguished from normal pituitary tissue [16]. Ugga et al. evaluated the preoperative prediction of the pituitary adenoma proliferative index class and demonstrated that ML is effective in predicting the Ki-67 proliferation index class of pituitary macroadenomas [17]. Zhang et al. reported that quantitative radiomic methods provided a preoperative prediction of NFPA subtypes, especially distinguishing between null cell adenomas and other subtypes, using T1 and CE-T1 images [18]. Cuocolo et al. evaluated the accuracy of texture analysis combined with ML in the preoperative evaluation of pituitary macroadenoma consistency in patients undergoing endoscopic endonasal surgery. Trained on radiomic data extracted from T2-weighted MRI, the ML model showed high accuracy in classifying soft and fibrous macroadenomas [19]. In our study, hypophysitis and NFPA were successfully differentiated using machine learning methods based on MRI-derived texture features.

In addition to distinguishing diseases, AI has also made significant advances in determining personalized treatments. Koçak et al. showed that machine learning-based quantitative texture analysis (qTA) of T2-weighted MRI performed better than qualitative and quantitative relative signal intensity and immunohistochemical evaluations in predicting response to a somatostatin analog in patients with acromegaly [20]. In another study, Hollon et al. evaluated a large cohort of 400 patients with pituitary adenomas using machine learning and were able to predict the early outcomes of pituitary adenoma surgery with 87% accuracy [21]. In our study, we could not evaluate the response to treatment, the parameters affecting treatment, or the personalization of treatment because of the small number of patients in the hypophysitis group. However, larger-scale studies on this subject may contribute to the literature.

The main limitation of this study is the heterogeneity of the hypophysitis group. In addition, our study is retrospective in design and the number of patients is small.

In conclusion, we showed that machine learning tools had feasible performance in distinguishing hypophysitis from NFPA on post-Gd T1-W MRI. With further development of this tool, patients with hypophysitis could be diagnosed using a non-invasive method and unnecessary surgeries could be avoided.