
1 Introduction

Segmentation is the process of delineating specific areas within an image by assigning a label to each pixel according to regions of interest (ROIs). Various methodologies exist for performing semi-automatic or automatic segmentation as part of intelligent image analysis [1, 2]. Thresholding is one of the simplest image segmentation algorithms; it groups pixels based on intensity differences [2].

In lung images, segmentation serves as a preprocessing step to separate the lungs, enabling various machine learning tools to concentrate their actions exclusively on this tissue, thus improving performance. Thresholding algorithms work well when lung pathologies are absent, as there is a noticeable contrast between the lungs and the surrounding tissue in both computed tomography (CT) images and plain X-rays. However, complications arise when the lung tissue density increases due to diseases such as pulmonary fibrosis [3], interstitial lung disease [4], and cancer [5], among others, resulting in X-ray beams interacting similarly with both lung and surrounding tissues [6, 7].

To overcome this challenge, state-of-the-art (SOA) approaches often employ K-means due to its simplicity and interpretability. For instance, Gupta et al. (2022) used K-means and fuzzy C-means to separate anatomical structures within the CT images, incorporating wavelet techniques to enhance the obtained mask. Their method achieved an accuracy, Dice Similarity Coefficient (DSC), and Jaccard Similarity Index (JSI) of 0.9928, 0.9872, and 0.9787, respectively [1]. Similarly, Hu et al. (2020) employed Convolutional Neural Networks for lung region mapping and utilized Bayes, Support Vector Machines, and K-means as kernels, achieving an accuracy of 0.97 [8]. Liu et al. (2023) utilized K-means in conjunction with the Hough transform to remove cavities, obtaining a DSC and JSI of 0.9786 and 0.9512, respectively [9].

While these studies effectively tackle lung segmentation (LS), they rely on the manual identification of lung clusters per image to generate masks. This can be time-consuming for specialists dealing with sizable image volumes during diagnosis and treatment, where accuracy is vital. Hence, this work proposes an approach to automate the selection of lung and non-lung clusters. The goal is a methodology for segmenting diseased lungs whose performance is competitive with SOA methods.

2 Materials and Methods

K-means is an unsupervised clustering algorithm designed to identify K groups within a dataset. In the context of image segmentation, the dataset consists of the elements of an M × N image matrix denoted as X. The algorithm groups the pixels in X into K clusters based on their similarity. The user determines the number of clusters, K, based on a prior analysis of the data in X. The resulting segmentation, Y, allows for drawing conclusions and characterizing the K groups [10].

2.1 Image Segmentation Process using K-means

The image segmentation process is shown in Fig. 1 [11]. The image is treated as an M × N matrix, denoted as X, where each element Xmn represents the intensity information. In CT images, this intensity is expressed in Hounsfield Units [12]. To facilitate processing, the matrix is transformed into a vector using the lexicographical ordering \(\mathcal{L}\{\cdot\}\), which stacks the columns of X one after another, from left to right [13]. This vector is then passed to the K-means algorithm, which returns a vector in which each pixel is assigned to one of the K clusters. The K-means output is then reorganized through the inverse lexicographical ordering \(\mathcal{L}^{-1}\{\cdot\}\), resulting in a matrix Y of dimension M × N. With this, the user can identify and select the clusters of interest. In the context of LS, the user assigns an intensity value of 1 to all pixels within the lung clusters and an intensity value of 0 to non-lung pixels. This process is referred to as binarization, given its binary decision nature in this study.
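The vectorize, cluster, and reshape steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `kmeans_1d` and `segment_image` are hypothetical helper names, the clustering is a plain Lloyd iteration on scalar intensities, and the column-by-column stacking is mapped to NumPy's column-major (`order="F"`) flattening.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on a 1-D intensity vector (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = values.astype(np.float64)
    # Initialise centers from k distinct intensities (assumes at least k exist).
    centers = rng.choice(np.unique(values), size=k, replace=False)
    for _ in range(n_iter):
        # Assign each pixel to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels, centers


def segment_image(X, k):
    """Stack X column by column, cluster, and reshape the labels to M x N."""
    M, N = X.shape
    x_vec = X.flatten(order="F")           # lexicographical ordering of X
    labels, centers = kmeans_1d(x_vec, k)
    Y = labels.reshape((M, N), order="F")  # inverse ordering back to M x N
    return Y, centers
```

Because the same `order="F"` is used for flattening and reshaping, each label in Y lands on the pixel it was computed from. The cluster indices themselves are arbitrary, which is why a separate step is needed to decide which clusters correspond to lung tissue.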

Fig. 1. Representation of the image segmentation process using K-means.

One drawback of the image segmentation process using K-means is the manual assignment of labels (such as lung or not-lung) to each cluster in every image, which can compromise its effectiveness.

2.2 Automatic Cluster Selection

To address the issue of manual selection and labeling of clusters in K-means LS, an automatic methodology is proposed (see Fig. 2). The main idea of this methodology is to determine an upper threshold \(\alpha \) and a lower threshold \(\beta \). Once these thresholds are computed, the ratio of pixels within each cluster that falls within the limits of \(\alpha \) and \(\beta \) is calculated.

First, the data set A is divided into two sets: 70% of the studies for training (A1) and 30% for testing (A2). The training set, A1, contains the thorax CT images Bi, while the test set, A2, consists of the thorax CT images Ci. Additionally, a mask Di is required as the counterpart of each Bi and Ci; every image and mask is a matrix of dimension M × N. The Di mask contains labeled pixels indicating whether they belong to lung or non-lung regions, with lung pixels assigned an intensity value of 1 and non-lung pixels assigned an intensity value of 0. This mask serves as the gold standard for each Bi and Ci image.

The training process involves the following steps: for each Bi image in A1, the element-wise product (×) between Bi and its mask Di is computed, which extracts the lung intensities. This procedure is repeated for all A1 images. Using the lung intensities from all A1 images, the global mean \(\overline{x}\) and global standard deviation \(\sigma \) are calculated. These values are then used to determine the thresholds using the following equations:

$$ \alpha = \overline{x} + \sigma , $$
(1)
$$ \beta = \overline{x} - \sigma . $$
(2)
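Equations (1) and (2) can be illustrated with a short NumPy sketch. The function name `lung_thresholds` is hypothetical, and two choices here are assumptions rather than details given in the text: lung pixels are selected by boolean indexing instead of the element-wise product (so that the zeros produced for non-lung pixels do not enter the statistics), and the population standard deviation is used.

```python
import numpy as np


def lung_thresholds(images, masks):
    """Return (alpha, beta) = (mean + std, mean - std) over all lung pixels."""
    # Keep only the pixels whose gold-standard mask value is 1 (lung).
    lung_values = np.concatenate(
        [img[mask == 1].astype(np.float64) for img, mask in zip(images, masks)]
    )
    mean = lung_values.mean()
    std = lung_values.std()  # population standard deviation (ddof=0), an assumption
    alpha = mean + std       # upper threshold, Eq. (1)
    beta = mean - std        # lower threshold, Eq. (2)
    return alpha, beta
```

The statistics are computed once over the pooled lung pixels of the whole training set, which matches the "global" mean and standard deviation described above, rather than averaging per-image values.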

During the experiment, all Ci images from A2 are collected and concatenated into a matrix C′ with dimensions M × Ni. Subsequently, the K-means image segmentation process, as described in Sect. 2.1, is applied to C′.

Fig. 2. Schematic of the proposed methodology. δ and ε are values greater than 0.001.

To determine the number of clusters, various experiments were conducted with values of K ranging from 2 to 10. Optimal results were achieved with 4 clusters. Through the K-means segmentation process, every pixel in the matrix C′ is assigned a label corresponding to one of the K clusters. To determine whether a cluster represents the lungs, a ratio is calculated based on the pixels that lie within the thresholds defined by \(\alpha \) and \(\beta \). This ratio is computed using the following equation:

$$ \gamma = \frac{p}{P} $$
(3)

where p is the number of pixels within the thresholds for a given cluster, and P is the total number of pixels in that cluster. After \(\gamma \) is calculated for each cluster, the clusters with values greater than 0.001 are selected. Subsequently, all pixels belonging to the selected clusters are assigned a value of 1, while the remaining pixels are assigned 0. This generates a binary mask of dimension M × Ni for C′, which is then divided into i separate matrices. As a result, we obtain the matrices Zi, each with M × N dimensions.
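The cluster selection rule of Eq. (3) can be sketched as follows. The names `select_lung_clusters` and `split_mask` are hypothetical, the bounds are treated as inclusive (an assumption), and the test images are assumed to be concatenated side by side so that the mask can be split into equal-width blocks.

```python
import numpy as np


def select_lung_clusters(C, labels, alpha, beta, min_ratio=0.001):
    """Binarise a label map: 1 for clusters whose gamma exceeds min_ratio."""
    in_range = (C >= beta) & (C <= alpha)  # inclusive bounds are an assumption
    mask = np.zeros(C.shape, dtype=np.uint8)
    for cluster in np.unique(labels):
        members = labels == cluster
        P = members.sum()                  # total pixels in the cluster
        p = (in_range & members).sum()     # cluster pixels inside [beta, alpha]
        gamma = p / P                      # Eq. (3)
        if gamma > min_ratio:
            mask[members] = 1
    return mask


def split_mask(mask, n_images):
    """Split an M x (N * n_images) mask into n_images masks of size M x N."""
    return np.hsplit(mask, n_images)
```

Note that a whole cluster is switched on or off: pixels of a selected cluster are set to 1 even when their own intensity lies outside the thresholds, which is what distinguishes this rule from plain thresholding.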

2.3 Post-processing Mask Strategy

The resulting Zi, obtained for each Ci, undergoes post-processing, which applies morphological transformations and image-processing techniques. This process aims to refine the mask and ensure that it covers most lung pixels. Further details of this post-processing procedure are shown in Fig. 3.

Fig. 3. Schematic of the post-processing, including the gold-standard comparison.

The initial step applies a median filter, as suggested by Tukey [13], with a kernel size of three pixels. This eliminates scattered pixels in the image and smooths the relevant ones. Next, a morphological closing operation, initially proposed by Matheron and Serra [14], is performed: the image undergoes dilation followed by erosion to enhance the solidity of the mask. A circular structuring element with a radius of eight pixels is used for this operation.
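To make the two operations concrete, the sketch below implements a 3 × 3 median filter and a closing with a disk-shaped structuring element directly in NumPy. It is an illustration only: the function names are hypothetical, the border handling (edge replication for the median, neutral padding for the morphology) is an assumption, and these are stand-ins for, not copies of, the OpenCV routines used in this work.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter3(mask):
    """3 x 3 median filter on a 0/1 mask (edges padded by replication)."""
    padded = np.pad(mask, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(2, 3)).astype(mask.dtype)


def disk_offsets(radius):
    """Offsets (dy, dx) covered by a disk-shaped structuring element."""
    r = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(r, r, indexing="ij")
    keep = dy**2 + dx**2 <= radius**2
    return list(zip(dy[keep], dx[keep]))


def _morph(mask, radius, dilate):
    """Binary dilation (dilate=True) or erosion (dilate=False) with a disk."""
    M, N = mask.shape
    # Pad with the neutral value so the image border does not affect the result.
    pad_value = 0 if dilate else 1
    padded = np.pad(mask.astype(bool), radius, constant_values=pad_value)
    out = np.zeros((M, N), bool) if dilate else np.ones((M, N), bool)
    for dy, dx in disk_offsets(radius):
        window = padded[radius + dy: radius + dy + M, radius + dx: radius + dx + N]
        out = (out | window) if dilate else (out & window)
    return out


def closing(mask, radius):
    """Morphological closing: dilation followed by erosion."""
    return _morph(_morph(mask, radius, True), radius, False).astype(mask.dtype)
```

With these helpers, the two steps described above would read `closing(median_filter3(Z), radius=8)` for a 0/1 mask `Z`. The median removes isolated pixels first, so that the subsequent dilation does not grow them into spurious blobs.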

Air outside the body and air within the lungs have similar intensity values, so the air outside the body must be removed from the mask. To accomplish this, a morphological transformation is applied, taking advantage of the proximity of the outside air pixels to the image edge. By intersecting the input image with its edge, a marker image is created, containing seeds for each connected pixel or particle at the edge. Through reconstruction, an image consisting of these particles is obtained and subsequently erased [15]. To fill any remaining unconnected holes, an erosion-based reconstruction is performed using the mask and a marker image with consistent lung values [15].
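The effect of these two reconstructions can be sketched with a breadth-first flood fill that produces the same kind of result on a binary mask: components touching the image edge are erased, and background regions enclosed by the mask are filled. This is an equivalent-outcome illustration under assumptions (hypothetical function names, a 0/1 `uint8` mask, 4-connectivity), not the reconstruction procedure of [15] itself.

```python
from collections import deque

import numpy as np


def clear_border(mask):
    """Remove every connected component of 1-pixels that touches the image edge."""
    M, N = mask.shape
    out = mask.copy()
    # Seeds: foreground pixels lying on the image edge.
    queue = deque(
        (r, c)
        for r in range(M)
        for c in range(N)
        if (r in (0, M - 1) or c in (0, N - 1)) and out[r, c] == 1
    )
    for r, c in queue:
        out[r, c] = 0
    # Grow from the seeds, erasing everything connected to them.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity (assumption)
            rr, cc = r + dr, c + dc
            if 0 <= rr < M and 0 <= cc < N and out[rr, cc] == 1:
                out[rr, cc] = 0
                queue.append((rr, cc))
    return out


def fill_holes(mask):
    """Fill background regions that are not connected to the image edge."""
    holes = clear_border(1 - mask)  # background components enclosed by the mask
    return mask | holes
```

Hole filling reuses the same routine on the inverted mask: any background region that survives border clearing is, by definition, enclosed by foreground.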

Depending on the disease, cavities can appear at the edge of the region of interest (ROI). To address this, the circular shape of the cavities can be leveraged using an algorithm described by Liu et al. [9]. The algorithm utilizes the Hough transform to detect circles to be filled. Selection criteria were defined as follows: circles with less than 1/2 of lung pixels, and circles with more than 2/3 of lung pixels on their perimeter. This process results in the post-processing matrix \({\widehat{Z}}_{i}\) with M × N dimensions.

2.4 Evaluation Metrics

To assess the performance of the proposed methodology, the post-processed matrix \({\widehat{Z}}_{i}\) was compared to the corresponding gold-standard mask Di using DSC [16] and JSI [17]. These metrics provide a similarity value ranging from 0 to 1, where 0 indicates no spatial overlap and 1 represents complete spatial agreement.
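Both indexes follow directly from the overlap counts of two binary masks, as the sketch below shows. The function name `dice_jaccard` is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not one stated in the text.

```python
import numpy as np


def dice_jaccard(pred, gold):
    """Return (DSC, JSI) for two binary masks of equal shape."""
    pred = pred.astype(bool)
    gold = gold.astype(bool)
    intersection = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    total = pred.sum() + gold.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dsc = 2.0 * intersection / total
    jsi = intersection / union
    return dsc, jsi
```

For a predicted mask with two lung pixels and a gold standard with one, sharing a single pixel, the function gives a DSC of 2/3 and a JSI of 1/2, which illustrates that DSC is never smaller than JSI for the same pair of masks.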

Following extensive comparisons using DSC and JSI metrics, the global mean and standard deviation for each metric were calculated and then compared to previous works in the SOA.

DSC and JSI are used in this study because they are common in segmentation methodologies and well suited to evaluating performance. However, JSI offers an advantage over DSC: unlike DSC, it satisfies all the properties of a metric, including the triangle inequality [18]. DSC satisfies only a relaxed triangle inequality, which can affect efficiency and approximation ratios, so it is not considered a proper metric [19].
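A small worked example makes the difference concrete. It uses the standard set-based definitions of the two indexes and the associated distances 1 − DSC and 1 − JSI; the three sets below are chosen purely for illustration.

$$ \mathrm{DSC}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad \mathrm{JSI}(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{DSC} = \frac{2\,\mathrm{JSI}}{1 + \mathrm{JSI}} $$

Take \(A = \{a\}\), \(B = \{b\}\), and \(C = \{a, b\}\). For the Dice distance \(d_{D} = 1 - \mathrm{DSC}\):

$$ d_{D}(A,B) = 1, \qquad d_{D}(A,C) = d_{D}(C,B) = 1 - \frac{2}{3} = \frac{1}{3}, \qquad d_{D}(A,B) = 1 > \frac{2}{3} = d_{D}(A,C) + d_{D}(C,B) $$

so the triangle inequality is violated. For the Jaccard distance \(d_{J} = 1 - \mathrm{JSI}\):

$$ d_{J}(A,B) = 1, \qquad d_{J}(A,C) = d_{J}(C,B) = 1 - \frac{1}{2} = \frac{1}{2}, \qquad d_{J}(A,B) = 1 \le 1 = d_{J}(A,C) + d_{J}(C,B) $$

and the inequality holds for these sets.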

2.5 Computational Tools

This methodology and experimentation were developed using Python 3.9.16, with Pydicom 2.3.1 for reading DICOM files and their metadata; OpenCV 4.7 for morphological image transformations; NumPy 1.21.5 and Pandas 1.5.3 for matrix analysis, operations, data transformation, and data structures; and Matplotlib 3.6.3 for displaying images and results. All work was carried out on an AMD Ryzen 7 5800H 3.20 GHz CPU, an Nvidia GeForce RTX 3060 GPU with 6 GB of VRAM, and 16 GB of RAM.

3 Results and Discussion

The proposed methodology was evaluated using the Multimedia Database of Interstitial Lung Diseases created by Depeursinge et al., whose use for research purposes is permitted by the ethics committee of the University Hospitals of Geneva [20]. This database comprises 3076 CT images in DICOM format, obtained from 113 patients diagnosed with various interstitial lung diseases (ILDs). Each image has dimensions of 512 × 512 pixels and is stored as uint16.

Additionally, a gold-standard lung mask, manually annotated by a medical specialist, is available for each image in the database. For the quantitative assessment, the metrics DSC and JSI were employed. Figure 4 presents box plots illustrating the outcomes obtained by applying the proposed methodology to all 3076 images split randomly into a training set (70% - 2,153 images) and test set (30% - 923 images). Furthermore, the results reported by Gupta et al. [1] and Liu et al. [9] are also depicted in Fig. 4.
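The split sizes quoted above can be reproduced with a simple random permutation of the image indices. The function name `split_indices`, the fixed seed, and the choice to round the training size down are assumptions made for this sketch; the text does not specify how the random split was drawn.

```python
import numpy as np


def split_indices(n_images, train_fraction=0.7, seed=0):
    """Randomly split image indices into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    n_train = int(n_images * train_fraction)  # rounded down (assumption)
    return order[:n_train], order[n_train:]


train_idx, test_idx = split_indices(3076)
print(len(train_idx), len(test_idx))
```

For 3076 images, rounding 0.7 × 3076 = 2153.2 down gives 2153 training images and leaves 923 for testing, matching the counts reported above.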

Fig. 4. Performance of the proposed methodology in terms of JSI and DSC.

The performance of the proposed automatic cluster selection using K-means can be observed in Fig. 4. The box plot reveals a low dispersion of 0.0660 and 0.0520 for JSI and DSC, respectively. Notably, Quartile 1 is 0.887 for JSI and 0.9351 for DSC, while Quartile 3 is 0.9403 for JSI and 0.9723 for DSC. These results indicate that approximately half of the automatic segmentation indexes fall within these ranges, suggesting that the proposal is stable. However, there are three outliers for JSI and two for DSC, implying that certain images are challenging to segment with this approach. Hence, performances close to the median values of 0.9092 for JSI and 0.9539 for DSC can be expected. In the comparison with the SOA, Fig. 4 shows that the segmentation results obtained by this proposal are comparable to those of Gupta et al. [1] and, in certain cases, even surpass the results reported by both Gupta et al. [1] and Liu et al. [9].

Additionally, Table 1 presents the results obtained using the same metrics and dataset, with K-means as the clustering method. Gupta et al.'s approach [1] incorporated fuzzy C-means and wavelets into their methodology, while Liu et al.'s approach [9] incorporated the Hough transform. Both authors noted that manual intervention is required for selecting lung or non-lung clusters. Gupta et al. also reported the need for manual selection of clusters during image decomposition and reconstruction using wavelets, which further increases the level of user intervention required for creating the mask.

Table 1 demonstrates that, with the proposed automatic cluster selection approach, it is possible to achieve performances above 0.90 for both JSI and DSC, with a standard deviation of 0.066 or lower. In contrast, Gupta et al. [1] and Liu et al. [9] did not report the error rate or standard deviation resulting from manual interaction. Consequently, the potential impact of this manual selection on reproducibility within their methodologies cannot be determined.

Table 1. Comparison of the proposal with the SOA.

Furthermore, results above 0.90 in DSC and JSI (see Table 1) were achieved, which are close to the values reported by each author, but without the manual intervention of experts, who would otherwise spend considerable time and effort.

4 Conclusions

This work introduces a methodology that enables automatic cluster-based segmentation, eliminating the need for manual selection of clusters by experts. This approach facilitates fully automated LS, which can be valuable for applications such as disease classification and delineating ROIs within the lungs for radiological analysis, reducing the work and time that specialists spend analyzing large image volumes.

In future work, this methodology will be enhanced by incorporating additional image characteristics such as textures and pixel relationships. It is also crucial to evaluate the performance of the proposed methodology on different datasets to assess its robustness. By testing the methodology in diverse scenarios, its applicability and generalization capabilities can be examined.