Introduction

The lung is the organ most commonly involved in sarcoidosis, with a frequency of 90 percent in most series [1, 2]. In addition, pulmonary sarcoidosis is problematic to diagnose, with an average delay of 3 months between symptom onset and diagnosis and a delay of more than one year in 20 percent of cases [3]. The delay in the diagnosis of pulmonary sarcoidosis has been attributed to the lack of specificity of its presenting symptoms, which are often found in several common alternative pulmonary diseases [3, 4]. Substantial delays in the diagnosis of pulmonary sarcoidosis may lead to significant disease-related morbidity as well as inappropriate treatment for presumed alternative conditions.

There is currently no gold-standard diagnostic test for sarcoidosis. The diagnosis of sarcoidosis requires a compatible clinical presentation, histologic evidence of non-caseating granulomatous inflammation, and exclusion of other disorders capable of producing similar histology and clinical features [5]. However, the diagnosis of sarcoidosis is never completely secure, because the diagnostic criteria of “a compatible clinical presentation” and “exclusion of other disorders capable of producing similar histology and clinical features” are clinical judgments that cannot be rigorously defined and depend on the opinion of the medical provider [6, 7].

Although it was previously thought that a tissue biopsy was the gold-standard diagnostic test for diffuse lung disease, certain radiologic features have reached the level of diagnostic specificity for certain pulmonary disorders. Idiopathic pulmonary fibrosis (IPF) is the prototypical diffuse lung disease where the diagnosis can be established on the basis of the clinical presentation and lung imaging findings [8]. Several chest computed tomography (CT) features of pulmonary sarcoidosis are thought to be highly specific for the disease [9], although their diagnostic power has not been formally tested.

There is increasing evidence that artificial intelligence (AI) has the potential to provide clinical radiologic assessment at an expert level [10]. In the past several years, numerous AI and machine learning tools have been developed to diagnose lung diseases [11,12,13,14,15], including interstitial lung diseases [13]. The establishment of objective radiologic criteria for the diagnosis of pulmonary sarcoidosis has the potential to accelerate the diagnostic process as well as avoid invasive biopsy procedures. Herein we present an AI/Deep Learning (DL)-based method designed to diagnose pulmonary sarcoidosis based on chest CT imaging features. We present data from a pilot study using this platform to distinguish CT scans of pulmonary sarcoidosis patients from those of smokers who had negative lung cancer screening chest exams.

Methods

This research was approved by the Albany Medical College Institutional Review Board (study number 6039). The purpose of this study was to determine the sensitivity and specificity of a machine learning AI platform to identify chest CT scan images of pulmonary sarcoidosis patients versus chest CT scan images obtained from patients who underwent lung cancer screening where the scan was interpreted as showing no evidence of lung malignancy (defined below in the data section). This research involved first identifying chest CT scans for analysis and then subjecting them to the AI/DL method.

Data

The chest CT scans for this study were identified and screened consecutively as follows. The chest CT scans of pulmonary sarcoidosis patients (n = 126) were obtained either from an institution-approved clinical database or through the radiology records at Albany Medical Center. An author with extensive experience in pulmonary sarcoidosis (MAJ) carefully reviewed the clinical records of these patients and confirmed their diagnosis using established international criteria [5]. Subsequently, a board-certified radiologist (CD) with expertise in chest CT scan interpretation reviewed the chest CT scans of these patients to confirm that the findings were consistent with pulmonary sarcoidosis. When a sarcoidosis patient had multiple chest CT scans, the first scan showing significant disease was selected to minimize the possibility that a second pulmonary condition had developed by the time of imaging. The clinician excluded sarcoidosis patients who had clinical evidence of a concomitant additional lung disease. Similarly, the chest CT radiologist excluded scans from sarcoidosis patients that showed radiologic evidence of a concomitant additional pulmonary disease or findings inconsistent with sarcoidosis. No CT scan of a pulmonary sarcoidosis patient was excluded because of a specific radiographic form of the disease (e.g., fibrocystic disease, micro-nodularity), because we wanted our model to learn to distinguish the chest imaging findings of all pulmonary sarcoidosis cases from those of other pulmonary disorders.

The CT scans of the controls (n = 96) were obtained from patients cared for at Albany Medical Center who had undergone lung cancer screening. Because the criteria for patients to undergo chest CT scan screening for lung cancer changed in 2021 [16], these patients ranged from 50 to 80 years old, had at least a 20 pack-year history of cigarette smoking, and were either currently smoking or had quit smoking within 15 years [16, 17]. The CT chest exams from these patients were low-dose lung cancer screening studies that received a Lung-RADS score (Lung Imaging Reporting and Data System score) of “1” or “negative for lung cancer” [18]. These CT scans were either normal exams without evidence of an acute or chronic pulmonary disease or revealed minimal findings of chronic smoking-related changes, such as mild emphysema.

The AI Method to Diagnose Pulmonary Sarcoidosis

We developed an AI/Deep Learning (DL)-based method, an ensemble network architecture combining Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), to classify pulmonary sarcoidosis vs. Lung-RADS score 1 from 3D chest CT volumes. CNNs have an inherent capability to learn discriminative features within convolutional blocks for classification tasks from image patches. However, with more recent advancements in DL, ViTs have become popular in building robust classification models, sometimes outperforming CNNs [19]. Unlike CNNs, which capture only local information within the receptive field of the convolutional filters, ViTs capture global dependencies for contextual understanding within an image. However, one limitation of ViTs is that they require a large amount of data to train the model [20, 21]. Combining CNNs and ViTs in an image recognition method dramatically reduces the number of training images required for learning [20, 22, 23]. Furthermore, CNNs are superior to ViTs in capturing local contextual information, which was another motivation for combining these two techniques. In addition, our method requires only the diagnosis and the image data. No further human interaction, such as identifying regions of interest, is required.
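As an illustration only, the idea of combining two feature extractors ahead of a single sigmoid output can be sketched in a few lines of Python. The text does not state how the CNN and ViT branches are merged, so the concatenation step, the linear head, and every name below (fuse_and_classify, cnn_features, vit_features, weights, bias) are hypothetical assumptions, and the toy numbers stand in for learned embeddings.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_and_classify(
    cnn_features: Sequence[float],
    vit_features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Concatenate two branch embeddings, apply a linear head, then a sigmoid.

    Hypothetical fusion scheme: the source does not specify how the CNN and
    ViT branches are combined, so this is an assumption for illustration.
    """
    fused = list(cnn_features) + list(vit_features)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(logit)


# Toy embeddings standing in for the outputs of the two branches.
p = fuse_and_classify([0.5, -1.0], [2.0], weights=[1.0, 0.5, 0.25])
print(f"probability of sarcoidosis: {p:.4f}")
```

In a real network the two embeddings and the head weights would be learned jointly from the CT volumes; the sketch only shows where the single probability output comes from.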

The overall analytic approach of this study consisted of training and validation of the AI/DL classification method using a K-fold cross-validation technique [24] with K = 5. In each fold, four-fifths of the available dataset was used to train a diagnostic model, which was then tested on the remaining one-fifth; this was repeated four more times with non-overlapping validation/test data. For each validation fold, similar numbers of sarcoidosis and control (Lung-RADS score 1) scans were maintained. Table 1 shows the data that were used in each fold of the five-fold cross-validation. The AI/DL framework was developed using Python 3.7.16 and PyTorch 1.8.1 on an NVIDIA V100 graphics processing unit (GPU) with CUDA 10.1 and cuDNN 8.0.5. The five-fold cross-validation was performed using scikit-learn 1.0.2.
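The partitioning scheme described above can be sketched as follows. The study used scikit-learn for this step; the version below uses only the Python standard library so that it is dependency-free, and the function name stratified_k_fold, the random seed, and the round-robin assignment are assumptions made for illustration. The cohort sizes (126 and 96) come from the Data section, but the resulting partitions are not the ones reported in Table 1.

```python
import random
from typing import Dict, List, Tuple


def stratified_k_fold(
    labels: List[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs with class balance."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    splits = []
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        splits.append((train, validation))
    return splits


# 1 = pulmonary sarcoidosis (n = 126), 0 = Lung-RADS score 1 control (n = 96).
labels = [1] * 126 + [0] * 96
splits = stratified_k_fold(labels, k=5, seed=0)
for fold, (train, validation) in enumerate(splits, start=1):
    n_sarc = sum(labels[i] for i in validation)
    print(
        f"fold {fold}: train={len(train)} validation={len(validation)} "
        f"(sarcoidosis={n_sarc}, controls={len(validation) - n_sarc})"
    )
```

Each scan appears in exactly one validation fold, so every scan is scored once by a model that did not see it during training.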

Table 1 Training and validation data partitions, across each cross-validation fold for different lung conditions

Statistical Analysis

The AI/DL method takes each 3D CT scan as input. It then extracts, manipulates, and reduces these inputs to a set of features in the CNN + ViT framework, and a sigmoid function converts the result into the probability that the scan belongs to the sarcoidosis class. A probability value greater than 0.5 was assigned the label of sarcoidosis and a probability value of ≤ 0.5 was assigned the label of Lung-RADS score 1. Therefore, a binary classification decision was made for each input test CT scan. The probability values were also used to generate the Receiver Operating Characteristic (ROC) curves that measure the discrimination power of the predictive classification model. The area under the curve (AUC) of the ROC curve was computed by examining the proportion of true positives versus the proportion of false positives at different probability cutoffs. The performance metrics of the AI/DL method for the diagnosis of sarcoidosis vs Lung-RADS score 1 subjects were computed for each fold from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) using the following equations: Sensitivity = TP/(TP + FN); Specificity = TN/(TN + FP); positive predictive value (PPV) = TP/(TP + FP); negative predictive value (NPV) = TN/(TN + FN); and Accuracy = (TP + TN)/(TP + FP + FN + TN).
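The decision rule and the five metrics above translate directly into code. The following is a minimal sketch in standard-library Python; the function names (classify, safe_ratio, confusion_metrics) and the eight example probabilities are invented for illustration and are not study data.

```python
from typing import Dict, List


def classify(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Assign 1 (sarcoidosis) when p > threshold, else 0 (Lung-RADS score 1)."""
    return [1 if p > threshold else 0 for p in probabilities]


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute the five metrics defined in the text from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
    }


# Invented example: four sarcoidosis scans followed by four control scans.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.95, 0.80, 0.60, 0.40, 0.50, 0.30, 0.10, 0.70]
y_pred = classify(probabilities)
metrics = confusion_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that a probability of exactly 0.5 falls on the Lung-RADS score 1 side of the rule, as specified in the text.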

Results

The validation results for the five-fold cross-validation are presented in Table 2. High values of performance metrics for the diagnosis of sarcoidosis were achieved in all folds of cross-validation. The overall sensitivity, specificity, positive predictive value, negative predictive value, and accuracy for the model to distinguish sarcoidosis from Lung-RADS score 1 were at least 94 percent.

Table 2 Sensitivity, specificity, positive predictive value, and negative predictive value as evaluation of the AI model performance across 5 folds of cross-validation

Figure 1A and B show the ROC curves for each validation fold and for the overall validation set, respectively. The AUCs for each of the 5 validation folds and for the overall validation set were all at least 97%. We also constructed training/validation loss and accuracy curves for each of the 5 folds (Fig. 1), which demonstrated well-converged training and validation loss curves in every fold. This suggests that our model was well fitted in each fold. While the loss was tracked for training convergence, the model (epoch) with the best validation accuracy was chosen for prediction on the test data for the fold.
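The checkpoint selection rule described above (track the loss, keep the epoch with the best validation accuracy) can be sketched as follows. The history records, the key names val_loss and val_acc, and the tie-breaking rule (earliest epoch wins) are assumptions for illustration; the text does not say how ties were handled.

```python
from typing import Dict, List


def select_best_epoch(history: List[Dict[str, float]]) -> int:
    """Return the 0-based index of the epoch with the highest validation accuracy.

    Ties are resolved in favour of the earliest epoch (an assumption).
    """
    if not history:
        raise ValueError("history must contain at least one epoch")
    best = 0
    for i, record in enumerate(history):
        if record["val_acc"] > history[best]["val_acc"]:
            best = i
    return best


# Invented per-epoch records for one fold.
history = [
    {"val_loss": 0.62, "val_acc": 0.80},
    {"val_loss": 0.41, "val_acc": 0.91},
    {"val_loss": 0.35, "val_acc": 0.95},
    {"val_loss": 0.33, "val_acc": 0.95},
    {"val_loss": 0.36, "val_acc": 0.93},
]
best_epoch = select_best_epoch(history)
print(f"selected epoch index: {best_epoch}")
```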

Fig. 1

A: Receiver operating characteristic (ROC) curves for all cross-validation folds for classifying pulmonary sarcoidosis from Lung Imaging Reporting and Data System (Lung-RADS) score = 1. B: Combined ROC curve across all validation folds for classifying pulmonary sarcoidosis from Lung-RADS score = 1
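For readers who want to reproduce curves of this kind, the AUC can be computed without plotting: it equals the probability that a randomly chosen sarcoidosis scan receives a higher score than a randomly chosen control scan, with ties counted as one half. The sketch below applies this to each fold and to the pooled out-of-fold predictions. Whether panel B was produced by pooling in this way is an assumption, and the function name roc_auc and the two toy folds are invented for illustration.

```python
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented (labels, probabilities) pairs for two validation folds.
fold_outputs: List[Tuple[List[int], List[float]]] = [
    ([1, 0, 1], [0.90, 0.20, 0.60]),
    ([0, 1, 0], [0.40, 0.70, 0.65]),
]
per_fold_auc = [roc_auc(y, s) for y, s in fold_outputs]

# Pool the out-of-fold predictions to obtain a single combined AUC.
pooled_true = [t for y, _ in fold_outputs for t in y]
pooled_scores = [p for _, s in fold_outputs for p in s]
pooled_auc = roc_auc(pooled_true, pooled_scores)

print("per-fold AUC:", per_fold_auc)
print(f"pooled AUC: {pooled_auc:.3f}")
```

In this toy example each fold separates its own scans perfectly, yet the pooled AUC is lower, because scores produced by models trained on different folds are not guaranteed to share a common scale.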

Discussion

We found that our artificial intelligence/deep learning method of chest CT scan analysis accurately distinguished CT scans of pulmonary sarcoidosis patients from those of patients with a Lung-RADS score of 1 on a lung cancer screening examination. The sensitivity, specificity, positive and negative predictive values, and accuracy of this method were all at least 94%. The area under the ROC curve of at least 97% suggests that our method can reliably distinguish these two groups radiologically.

Artificial intelligence and machine learning have tremendous potential to be useful for chest medical image analysis and interpretation [25, 26]. These techniques have already been shown to be as accurate as or more accurate than radiologists in the detection of lung nodules [27], tuberculosis [28], and pneumonia [29]. We suspect that artificial intelligence/machine learning chest imaging platforms will be a particularly valuable assessment and diagnostic tool for interstitial lung diseases because multidisciplinary conferences attended by clinicians, pathologists, and radiologists are now considered the standard of care in the management of these diseases [30]. In particular, these platforms should be very useful for pulmonary sarcoidosis, where the diagnosis is commonly delayed [3] and based on subjective criteria [5]. AI methods could serve as an excellent screening tool prior to a final read by a radiologist.

The AI/DL method that we used is particularly useful in the radiographic diagnosis of ILD for several reasons. First, unlike ViT methods, which require a large quantity of data, the combination of CNNs and ViTs vastly decreases the data required for training [20,21,22,23]. This is important in the case of ILDs, as many of them are relatively rare diseases and large numbers of ILD radiographic images are not available for machine learning. Second, this method does not require segmentation of lung parenchymal regions of interest as a preprocessing step. This allows the method to be developed without human direction and may allow for novel approaches in radiographic diagnosis that equal or surpass the current clinical approach. We believe that the use of ViTs is a critical component of our method, because many ILDs are distinguished on the basis of the specific location of the radiographic abnormalities relative to anatomic structures within the thorax. ViTs explicitly capture relative positional information along with image features for classification tasks. Finally, many ILDs such as sarcoidosis have no known cause and no standardized diagnostic test; therefore, the diagnosis of these ILDs is based on clinical judgment. It is conceivable that an analytic diagnostic approach to the radiographic features of these ILDs may surpass clinical judgment and lead to the establishment of a diagnostic standard based on chest imaging findings.

Our analysis has some limitations. First, our sarcoidosis and lung cancer screening patients were all from one medical center. Furthermore, this pilot study included a relatively small population with only 126 pulmonary sarcoidosis patients. It is possible that these patients were not representative of the wider population of individuals with these conditions. Second, it is possible that these patients were misdiagnosed or had additional or alternative pulmonary diagnoses. However, we believe this was not common, as we rigorously searched for these conditions. Third, in this pilot analysis, we only distinguished pulmonary sarcoidosis from lung cancer screening patients with Lung-RADS score 1 chest CT scans. It is possible that other lung diseases mimic the radiologic features of sarcoidosis more closely and that it would be more difficult to distinguish pulmonary sarcoidosis from such diseases. These limitations suggest that future studies should analyze the diagnostic power of our AI/DL platform in a multicenter trial with multiple non-sarcoidosis diseases as alternative conditions. Fourth, although there was no referral or selection bias in the study, potential biases related to demographics, race, ethnicity, gender, CT machine vendor, and CT reconstruction remain. AI models are typically unaware of such biases, and a model trained on unrepresentative data can produce faulty predictions unless the training data represent all of the relevant variation. One method to mitigate this bias is to engage a human in the loop, i.e., a radiologist who reviews the model’s predictions to confirm sarcoidosis. Finally, although we did not observe over-fitting in the K-fold cross-validation method, the lack of validation on multicenter data is a potential limitation of this study.

In summary, we have demonstrated that our AI/DL method can reliably distinguish CT scans of pulmonary sarcoidosis patients from those of patients with a Lung-RADS score of 1 on a lung cancer screening examination. In principle, our method could be applied to any specific lung disease. We plan to test our method to distinguish pulmonary sarcoidosis from a variety of other pulmonary diseases.