
1 Introduction

Magnetic resonance imaging (MRI) is non-invasive and non-ionizing and provides superior soft tissue contrast. The technique has developed rapidly in recent years. For example, continuous technological advances such as parallel imaging [16, 54] and integrated parallel acquisition techniques (iPAT) [30] significantly reduce the acquisition time and enable standardized image acquisition at very high spatial and temporal resolution, providing highly detailed morphological and functional information. This makes MRI attractive in both clinical practice and research settings.

Lung imaging is one of the most challenging topics in MRI due to motion artifacts and low signal. Nevertheless, in recent years lung MRI has become increasingly popular as an alternative to the “gold standard” of pulmonary imaging, computed tomography (CT). High-quality MRI assists in the detection of numerous pulmonary diseases. Moreover, MRI of the lung is important for specific clinical applications. For instance, it can contribute to decision making in the case of diseases such as lung cancer [67], malignant pleural mesothelioma, and acute pulmonary embolism. The advantages of MRI over CT are not limited to the lack of ionizing radiation, which is of particular interest for the assessment of lung disease in children, pregnant women, or patients who require frequent follow-up examinations (e.g., immunocompromised patients with fever of unknown origin). Chest wall invasion by a tumor and mediastinal masses are accepted indications that benefit from the superior soft tissue contrast of MRI. Dynamic examinations to study respiratory mechanics and contrast-enhanced first-pass perfusion imaging reach far beyond the scope of CT [25].

MRI of the lung has multiple application areas. For example, conventional (so-called proton or ¹H) MRI provides anatomical details and is routinely used for assessing lung volumes and boundaries and for detecting nodules, infiltrates, and masses. A significant number of protocols have been designed for these purposes [5, 6]. In Fig. 1, two example slices of anatomical MR images from two different sequences, namely, T1-weighted VIBE (volume-interpolated breath hold examination) and T2-weighted HASTE (half-Fourier single shot turbo spin-echo), are shown. The VIBE sequence has a higher spatial resolution and a smaller slice thickness than the HASTE sequence (for instance, 1.8 × 1.8 × 3 mm vs. 2.3 × 1.8 × 5 mm on a 1.5 Tesla Siemens Magnetom Avanto device) and is preferable for lung volumetry, whereas lung infiltrates are the key pathology for which the HASTE sequence is applied.

Fig. 1

An example of two slices from two anatomical MR thoracic scans from the same participant. Left: T2-weighted Half-Fourier Acquisition Single-Shot Turbo Spin-Echo sequence (HASTE), 256 × 206 × 44; Right: T1-weighted volume-interpolated breath hold sequence (VIBE), axial orientation, 512 × 416 × 88. Image courtesy of Study of Health in Pomerania, Germany [18]

MRI with non-radioactive noble gases, such as hyperpolarized ³He (helium-3) or ¹²⁹Xe (xenon-129), is applied for the detection and evaluation of functional abnormalities, e.g., of ventilation [40, 42, 62]. MR angiography (MRA) is gaining popularity for the assessment of pulmonary vascular disease [25].

(Semi-)automatic segmentation is an essential step in medical image analysis, medical visualization, and computer-aided diagnosis. Since manual segmentation of large volumes of MR data is laborious and time-consuming and is prone to inter- and intra-observer variability, automated segmentation methods for the extraction of different organs and structures are being actively developed.

This paper aims to provide an overview of recently published literature on automatic and semi-automatic segmentation techniques for MR images of human lungs. Our main source of references was the Internet: we have searched for the terms lung, segmentation, volumetry, pulmonary, magnetic resonance imaging (MRI) on Google Scholar, PubMed, and IEEE-Xplore.

The paper is organized as follows. First, a short introduction to human lung anatomy is given in Sect. 2. In Sect. 3, we introduce the methods and the data to which they are applied. The performance of the methods is discussed in Sect. 4. Section 5 concludes the paper.

2 Pulmonary Anatomy

In this section, we briefly present the human pulmonary system. For a more detailed description, we refer to specialized medical literature, for example [35].

Human lungs are located within the chest, bounded by the rib cage, the diaphragm, and the mediastinum, the area between the lungs that includes all of the organs in the chest except the lungs, such as the heart, vessels, and trachea. The right lung is composed of three lobes, while the left lung has only two lobes. The lobes are further divided into sub-lobar segments, which are defined by the branching pattern of the airway tree. The lobar fissures, the spaces between the surfaces of the lobes, allow the lobes to rotate relative to one another to accommodate body posture-related changes in chest wall geometry.

Air travels into the lungs through the nose or mouth and through the pharynx, larynx, and into the tracheobronchial (airway) tree (cf. Fig. 2). The pulmonary blood circulation is provided by pulmonary arteries and veins.

Fig. 2

A schematic description of the human respiratory system

3 Lung Segmentation

Segmentation is the first and essential step in computer analysis of medical imaging data. Segmentation results are further used, for instance, for volumetric measurements, visualization, and biomechanical modeling.

In thoracic MR images, various anatomical entities are present, such as the lungs (including lobes and fissures), the bronchial tree, and the pulmonary vessels. Whereas in high-resolution CT images all these structures are clearly visible and can be distinguished, lung imaging is a challenge in MRI. Here, usually only organ borders can be identified. Moreover, fine structures such as the complete bronchial tree or the fissures are hardly visible in anatomical MR images. Therefore, the task of lung segmentation from MR images usually consists of lung region detection, separation of the lungs from the large airways, and exclusion or inclusion of the pulmonary vessel regions.

Numerous approaches have been proposed for CT data [13, 38, 53], whereas only a few methods for MRI data have been presented in the literature. Here, we classify and discuss these methods for automatic and semi-automatic lung segmentation from MRI data. The literature can be roughly divided into the following categories: intensity-based, model-based, and active contour-based approaches. A summary of published works is presented in Table 1. The table rows follow the order in which the techniques are discussed in the text. Table cells are marked with a dash if the corresponding information is missing in the original paper.

Table 1 A summary of published works on MR lung segmentation

3.1 Intensity-Based Segmentation

Intensity-based segmentation steps are similar in MR and CT data. Namely, intensity-based approaches usually consist of the following steps. First, lungs are extracted and separated from the large airways. Second, if needed, anterior and posterior lung junctions are corrected. Third, the lung contours are smoothed and the cavities are filled.

Sensakovic et al. proposed a multi-step approach based on thresholding, shape descriptors, and morphological operations [50–52]. The method works on a slice-by-slice basis. In each slice, the thorax is separated from the background using histogram-based thresholding and boundary smoothing [17]. Once the thorax-segmented image is computed, histogram-based thresholding [17] is applied to it to create a lung-segmented image. Thereafter, the connected components are analyzed, and all regions that fail to satisfy the descriptor thresholds (area, compactness, center of mass, and perimeter) are classified as non-lung. If no lung region is detected, the threshold is decreased and the procedure is repeated. After this procedure, which is referred to as the “core stage”, a refinement step (the “correction stage”) starts. This step is especially important in the presence of disease and acquisition artifacts, since it aims to capture valid lung regions that they may have obscured. First, a gray-scale erosion operator [55] is applied to the thorax-segmented image. It lowers the intensity values of the lung regions masked by acquisition artifacts or disease. Second, a thresholding procedure similar to the one in the core stage is applied. Third, a series of rolling ball filters [2] is applied to the internal boundaries of the lungs. Candidate lung regions from the correction-stage lung-segmented image that satisfy the circularity criterion are combined with lung regions from the core stage, and the final lung-segmented image is constructed.
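The core-stage loop (threshold, connected component analysis, descriptor check, lowering of the threshold) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that lung pixels are darker than the threshold, uses only the area descriptor, 4-connectivity, and a fixed threshold step, and the names `connected_components` and `extract_lung_candidates` as well as all parameter values are hypothetical.

```python
from collections import deque

def connected_components(mask):
    """Return the 4-connected foreground regions of a 2D boolean grid as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

def extract_lung_candidates(image, threshold, min_area, max_area, step=10, floor=0):
    """Keep dark regions with a plausible area; lower the threshold while none survives."""
    while threshold > floor:
        mask = [[value < threshold for value in row] for row in image]
        kept = [region for region in connected_components(mask)
                if min_area <= len(region) <= max_area]  # area is the only descriptor here
        if kept:
            return threshold, kept
        threshold -= step  # assumed fixed decrement
    return threshold, []
```

In this toy setting, a threshold that is too high merges the lung with surrounding tissue into one oversized region, which fails the area check; the lowered threshold then isolates the dark region.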

Ivanovska et al. implemented a 3D method for lung segmentation [22]. The approach consists of four steps: the main extraction and three refinement steps. First, the lung and trachea are segmented. Any automatic clustering technique (for example, K-Means [34]) is applicable here. Second, the trachea and main bronchi are found using a 3D connected component analysis [17]. The separation of the large airways from the lungs is done slice-wise using 2D Watershed [63] on the inverted distance map [55]. Third, the lungs are separated from each other using a 3D Watershed procedure on a subimage, which represents an anterior or posterior junction. The subimage is selected such that the computational costs of the watershed procedure stay relatively low. Fourth, the lung cavities are corrected with a morphological closing [55].
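The first step, intensity clustering, can be sketched with a two-class K-Means on voxel intensities. This is a minimal illustration under stated assumptions, not the pipeline of [22]: intensities are given as a flat list, the initial centers are spread evenly over the intensity range, and the function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, k=2, iterations=50):
    """Lloyd's algorithm on scalar intensities; returns cluster centers and per-value labels."""
    lo, hi = min(values), max(values)
    # Assumed deterministic initialization: centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each intensity goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

With two clusters, the cluster with the lower center would correspond to the dark, air-filled voxels (lungs, airways, and background), which the subsequent refinement steps have to separate.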

Kullberg et al. presented an automatic approach for lung segmentation as a part of the segmentation of adipose tissue depots from whole-body MRI datasets [27]. Segmentation of the lungs is performed with thresholding and morphological operations [17]. To increase robustness to erroneous inclusion of intestinal gas, the lungs are segmented one by one. First, the right lung is segmented, since the position of the liver increases the distance between the right lung and the intestines. The segmentation is performed as follows. Initial thresholding is followed by a distance transform [55]. Then, region growing [17] is applied to the distance transform image. The condition for the growing process is a continuous decrease of the distance transform values. The result is dilated to ensure inclusion of the whole lung. An intensity threshold is applied in the dilated lung to determine the final result. The threshold value is computed from a Gaussian function fitted to the histogram of the dilated lung region.

Lui et al. proposed a semi-automatic segmentation algorithm to isolate areas of ventilation from hyperpolarized ³He MRI [33]. First, the background is removed by determining an optimal threshold from a sampled background noise distribution located outside of the lung field. Second, the lung mask is refined with a four-class fuzzy C-Means [56] clustering. Third, the trachea is semi-automatically removed with a seeded region growing method [56]. Finally, morphological operators [55] are used to refine the result.

Plathow et al. applied a semi-automatic method based on an interactive region growing technique [56] to evaluate lung volumes during the breathing cycle using dynamic MRI [45]. First, the displacement of the chest wall and the diaphragm as a surrogate of the lung volume and lung surface was measured in dynamic 2D MR images during the breathing cycle. Thereafter, vital capacity (VC) was calculated using a model consisting of a half ellipse, an ellipsoid cone, and a prism [44]. An automatic segmentation in 3D dynamic MRI failed due to intensity inhomogeneities and motion artifacts. Therefore, the 3D data was segmented in a slice-by-slice manner using interactive region growing. For each dataset, the 2D segmentation results were combined into a voxel-based 3D description of the current breathing state. The voxel-based segmentation results were transformed into triangular meshes using the marching cubes algorithm [31]. The isosurfaces were then smoothed in a post-processing step. Total lung surface areas were computed by summing up the areas of the individual triangles.
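The last step, summing triangle areas over a mesh, is straightforward to illustrate. The sketch below assumes a mesh given as a vertex list and a list of index triples; the function names are hypothetical, and mesh extraction and smoothing are not covered.

```python
import math

def triangle_area(p0, p1, p2):
    """Area of a 3D triangle: half the norm of the cross product of two edge vectors."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def mesh_surface_area(vertices, triangles):
    """Total surface area of a triangular mesh (vertex coordinates in physical units)."""
    return sum(triangle_area(vertices[a], vertices[b], vertices[c])
               for a, b, c in triangles)
```

If the vertex coordinates are given in millimeters, the result is in square millimeters.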

Heydarian et al. proposed a semi-automated method for generating ³He measurements of individual slice (2D) or whole lung (3D) gas distribution [19]. The same technique was also used by Kirby et al. [26]. The authors applied hierarchical K-Means [34] clustering for the ³He images and a seeded region-growing algorithm [56] for the anatomical MR images. Thereafter, the images were registered semi-automatically using landmark-based registration [41]. Finally, the segmented areas were postprocessed with a morphological closing algorithm with a disk-shaped structuring element [55]. The vertical central region was excluded from the processing to avoid connecting the right and left lung areas.

Tokuda et al. measured lung volumes in dynamic 3D lung images [58]. The segmentation was performed in a slice-by-slice manner. In each slice, the lung area was segmented using a combination of confidence-connectedness and fuzzy-connectedness region growing algorithms, implemented in the Insight Segmentation and Registration Toolkit [21]. First, a rough segmentation of the lung area was obtained with the confidence-connectedness region growing method. The algorithm extracts a connected set of pixels whose intensities are consistent with the pixel statistics (the mean and variance across a neighborhood) of predefined seed points. The pixels connected to the seed points whose values lie within the confidence interval are grouped together. The width of the confidence interval is controlled by a multiplier parameter. Second, the mean and variance of the intensities in the lung area were used by the fuzzy connectedness algorithm [21] to compute an affinity map, which represents the degree of adjacency and the similarity of pairs of nearby voxels. Finally, the refined lung area was extracted by thresholding the affinity map. The multiplier and threshold parameters were tuned such that the segmented lung area included the blood vessels.
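The confidence-connectedness idea can be sketched in a few lines. The following is a simplified 2D illustration written from the description above, not the toolkit's implementation: it assumes a single seed, a square seed neighborhood, 4-connectivity, and a fixed number of re-estimation passes; the function name and default values are hypothetical.

```python
from collections import deque
import math

def confidence_connected(image, seed, multiplier=2.5, radius=1, iterations=2):
    """Grow a region of pixels whose intensity lies within mean +/- multiplier * std."""
    rows, cols = len(image), len(image[0])
    sy, sx = seed
    # Initial statistics come from a small square neighborhood around the seed.
    sample = [image[y][x]
              for y in range(max(0, sy - radius), min(rows, sy + radius + 1))
              for x in range(max(0, sx - radius), min(cols, sx + radius + 1))]
    region = {seed}
    for _ in range(iterations):
        mean = sum(sample) / len(sample)
        std = math.sqrt(sum((v - mean) ** 2 for v in sample) / len(sample))
        low, high = mean - multiplier * std, mean + multiplier * std
        region, queue = {seed}, deque([seed])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in region
                        and low <= image[ny][nx] <= high):
                    region.add((ny, nx))
                    queue.append((ny, nx))
        # Re-estimate the statistics from the current region for the next pass.
        sample = [image[y][x] for y, x in region]
    return region
```

A larger multiplier widens the accepted intensity interval, which is how brighter structures such as vessels can be drawn into the region.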

Woodhouse et al. used a combination of anatomical and ³He MR images to compare the ventilated and thoracic lung volumes in groups of smokers and never-smokers [66]. The processing was done manually in a slice-by-slice manner. ³He images were also segmented semi-automatically with adaptive thresholding [56]. The threshold value was derived from the signal-to-noise ratio (SNR) [56] of each image and the mean signal in a manually selected region of interest in each image.

Virgincar et al. analysed ¹²⁹Xe and SSFP (steady state free precession) ¹H images in a semi-automatic manner [64]. The anatomical MRI was segmented by a region growing method [56]. The seeds were placed manually in the lowest intensity areas of the right and left lung in the central slice of the thoracic cavity. Thereafter, a threshold range was computed from the intensities of the seeds. Then, morphological closing [55] was applied to the extracted lung. Since the ¹H and ¹²⁹Xe images were acquired over different breath-holds, they required registration. It was performed using either an affine transform [3] or a similarity transform [20]. After the registration, the ventilation images were also segmented with the region growing method.

3.2 Model-Based Segmentation

Lelieveldt et al. developed a model-based method to simultaneously segment the lungs as well as some other nearby organs [28, 29]. An anatomical model of the thorax is built by modeling individual organs with implicit surfaces derived from manually delineated training images and subsequently grouping the single-organ models into a tree structure. The whole hierarchical scene is described by a boundary model, which characterizes the scene volume as a boundary potential (or energy) function. Thereafter, the model is matched to the image data. First, an initial parameter set for the pose and scale parameters is selected. Second, an automatic thresholding based on empirical histogram evaluation discriminates air from tissue, and a set of target boundary points is obtained. Third, the model is placed and the energy function is minimized with Levenberg-Marquardt nonlinear minimization [32] by fitting the target boundary points to the model. The authors reported that their method was designed for coarse pre-segmentation and that a more accurate local segmentation process is subsequently required.

Tustison et al. proposed an automated segmentation method for differentiation of the ventilated lung volumes on ³He MRI [60]. The method consists of two main steps: template and statistical model construction, and individual subject processing. All images in the database are registered to a normalized space. A normalized unbiased template is built from seven representative subjects with a symmetric diffeomorphic registration algorithm [4]. Thereafter, a principal component analysis (PCA) model [15] is built from an image database. Each image is transformed to the template using an affine transformation, so that the presence of any global shape differences in the statistical model is avoided. Then the processing of individual subjects starts. Each dataset is mapped to the template, the bias field is corrected [59], and then a shape-based level set procedure [43], which uses the PCA model, extracts the lungs.

3.3 Active Contours and Neural Network Segmentation

Ray et al. implemented a slice-based active contour approach for automatic segmentation of lungs from ¹H MRI [48]. Initial snakes are placed automatically within the lung cavities that are to be segmented. Then all snakes are evolved independently of each other. Finally, the union of the regions covered by all snakes is taken as the final result. For the snake evolution, the authors proposed to apply a modified gradient vector flow (GVF) with Dirichlet boundary conditions [47].

Middleton and Damper proposed a combined method, consisting of neural networks and active contours (snakes) to segment lungs in MRI [39]. First, a neural network was trained for binary classification of each pixel as a “boundary” or a “non-boundary”. The inputs to the neural network are 7 × 7 image patches of the pixel to be classified. The weights were determined by training on MR sections with lung boundary pixels segmented by an expert observer. The resulting edge-point image was used as the external energy for the snake evolution.

Tetzlaff et al. analysed lung areas on 2D dynamic MR images [57]. The segmentation was performed semi-automatically using the Graph Cut algorithm [10]. The algorithm was initialized by adapting a bounding box to the size of the region of interest, and then the areas inside and outside of the lungs were roughly marked. If the segmentation leaked into the thoracic wall, an additional scribble had to be drawn at the point of leakage.

Böttger et al. presented a new segmentation approach [8, 9], based on parametric active contours [24]. Discrete surface meshes [12] were used here. The authors proposed to use an additional speed term, which was derived from the magnitude of the Gaussian gradient image. New attractor forces were introduced into the simplex mesh deformation scheme [7]. These forces were obtained from the user defined attractor points. To cope with complex surface geometries and avoid oscillating behavior during the evolution, the mesh was refined during the deformation automatically after every 50 iterations.

4 Discussion

4.1 Methodical Analysis

Several aspects define the relationships between the described algorithms. As shown above, most of the presented techniques are pipelines consisting of well-known algorithms, such as region growing, intensity clustering or thresholding, and morphological operations [55, 56], combined for a specific purpose. The novelty of the presented pipelines lies in their application areas. Therefore, the algorithms are usually designed for a specific data type, i.e., MR sequence, and would often require substantial changes to be adapted to other data. Most of the presented techniques were developed for coronal or axial anatomical MRI. Unfortunately, the papers often give no exact data protocols (cf. Table 1). Some techniques are designed for the detection of ventilation defects in MRI with noble gases (³He and ¹²⁹Xe). Since ³He images are often acquired concurrently with conventional anatomical MR images, some ³He segmentation techniques are based on a prior segmentation of the ¹H images and subsequent image registration, for instance, [19]. Moreover, some methods were designed only for 2D (slice-wise) processing [48, 51] and do not consider 3D object consistency and connectivity. More complicated approaches, such as those presented in [22, 60], were developed for 3D organ extraction. Most of the presented techniques consist of one or two steps and include user interaction. Usually, the more the user is involved in the processing, the simpler the algorithmic pipeline is, since the complicated parts are accomplished by the user. For example, for many algorithms [19, 33] the mediastinum area is manually excluded from the processing beforehand, and the user initiates the computations by placing the initial seed points [33, 58, 64] or adapting threshold values [66]. Of course, semi-automatic algorithms are less tedious for the users than completely manual segmentation, but they can still be inappropriate for processing thousands of datasets. The fully automated pipelines, such as [22, 60], have a significant number of algorithmic steps and pre-selected parameters.

Therefore, practitioners should consider the following questions when selecting a technique:

  • What type of data is available?

  • How many datasets need to be processed?

  • Is 2D or 3D processing required?

  • What level of user interaction is accepted?

For example, if user interaction and 2D processing are acceptable, then slice-by-slice algorithmic solutions for ¹H or ³He MR data, similar to the ones presented in [19, 26, 66], can be considered.

4.2 Performance Evaluation

All automatic and semi-automatic methods require some form of quantitative accuracy measure. Usually, ground truth masks are obtained manually and the results produced by the algorithm are compared to them. Here, region-based metrics, such as the Dice coefficient [14] or sensitivity and specificity, are often applied. Moreover, for certain applications the correlation with clinical examinations, such as spirometry, is important. Hence, the correlation coefficients between these measurements are assessed.

However, there is no unified system of quality measures used in the community, which complicates the evaluation of the algorithms’ efficiency. This section reviews the approaches presented above from the quantitative and qualitative perspectives. A summary of the test sets and details of validation for each method is given in Table 2. The order of the rows is kept the same as in Table 1.

Table 2 A summary of performance measurements of the segmentation methods on MR lung segmentation

4.2.1 Region-Based and Edge-Based Metrics

Sensakovic et al. [51, 52] and Sensakovic and Armato [50] applied their technique to a random sample of 101 thoracic MR sections. True lung regions were manually delineated by two radiologists. 90 % of the patient data included abnormalities, such as mesothelioma, scarring, enlarged lymph nodes, and others. To measure the segmentation quality, the authors used an area-of-overlap measure (AOM), also known as the Jaccard coefficient [23]. The AOM of two regions, A and B, is defined as the number of pixels contained within the intersection of the regions divided by the number of pixels contained within the union of the regions:

$$\displaystyle{ AOM = \frac{\vert A \cap B\vert } {\vert A \cup B\vert }\mbox{, }AOM \in [0,1], }$$
(1)

where 0 corresponds to disjoint regions and 1 to complete overlap. The AOM is similar to the Dice coefficient [14], but the normalizing part (the denominator) is different. The reported AOM values were 0.82 ± 0.16 and 0.83 ± 0.13 when compared to the first and second observers, respectively.
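As a small worked example of the pixel-count definition, the AOM of two binary masks can be computed as follows. The sketch assumes equally sized 2D masks given as nested lists; the function name and the convention of returning 1 for two empty masks are assumptions.

```python
def area_of_overlap(mask_a, mask_b):
    """Jaccard coefficient: pixels in the intersection divided by pixels in the union."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    union = sum(a or b for a, b in zip(flat_a, flat_b))
    # Two empty masks are treated as identical (an assumed convention).
    return intersection / union if union else 1.0
```

For two masks that share two pixels and together cover four, the function returns 0.5.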

Ivanovska et al. [22] tested the proposed method on ten randomly selected participants with normal lungs. The accuracy was evaluated with the methodology proposed by Udupa et al. [61]. Namely, the following volume fractions (VF) were computed: TPVF (true positives: the fraction of voxels in the intersection of the automatic and manual segmentation results), FPVF (false positives: the fraction of voxels falsely identified by the automatic segmentation), and FNVF (false negatives: the fraction of voxels defined in the manual segmentation but missed by the automatic method). Additionally, TPVE (true positive volume error) was calculated. Let \(C_{exp}\) and \(C_{auto}\) denote the binary masks produced by the expert and the automatic pipeline, respectively. Then, the measures are defined as

$$\displaystyle{ TPVF = \frac{\vert C_{auto} \cap C_{exp}\vert } {\vert C_{exp}\vert }, }$$
(2)
$$\displaystyle{ FNVF = \frac{\vert C_{exp} - C_{auto}\vert } {\vert C_{exp}\vert }, }$$
(3)
$$\displaystyle{ FPVF = \frac{\vert C_{auto} - C_{exp}\vert } {\vert C_{exp}\vert }. }$$
(4)

Let \(V_{auto}\) and \(V_{exp}\) denote the volumes computed by the algorithm and the expert, respectively. Then TPVE is defined as

$$\displaystyle{ TPVE = \frac{\vert V _{auto} - V _{exp}\vert } {\vert V _{exp}\vert }. }$$
(5)

The accuracy measures were computed for the left and right lungs separately. For the left lung: \(TPVF = 97.06\pm 1.36\,\%,\ FPVF = 2.97\pm 1.08\,\%,\ FNVF = 2.94\pm 1.36\,\%,\ TPVE = 1.58\pm 1.16\,\%\). For the right lung: \(TPVF = 97.33\pm 1.57\,\%,\ FPVF = 3.96\pm 1.69\,\%,\ FNVF = 2.66\pm 1.57\,\%,\ TPVE = 2.37\pm 1.74\,\%\). Moreover, the authors acquired the ground truth from two independent experts and observed an agreement of about 93.5 % between them, which indicates that the accuracy of the automatic result lies within the interval of variation between the experts.
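The four measures follow directly from set operations on the voxels of the two masks. The sketch below is an illustration only; it assumes that both masks are given as collections of voxel identifiers on the same grid, so that volumes are proportional to voxel counts, and the function name is hypothetical.

```python
def volume_fraction_measures(auto_voxels, expert_voxels):
    """TPVF, FNVF, FPVF, and TPVE, all normalized by the size of the expert mask."""
    auto, expert = set(auto_voxels), set(expert_voxels)
    n_exp = len(expert)
    return {
        "TPVF": len(auto & expert) / n_exp,   # voxels found by both
        "FNVF": len(expert - auto) / n_exp,   # expert voxels missed by the algorithm
        "FPVF": len(auto - expert) / n_exp,   # voxels wrongly added by the algorithm
        "TPVE": abs(len(auto) - len(expert)) / n_exp,  # relative volume difference
    }
```

Note that TPVF and FNVF always sum to one, whereas FPVF can exceed one if the automatic mask is much larger than the expert mask.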

Lui et al. [33] validated their method on four healthy and six asthmatic subjects, calculating the ventilated lung volume (LVL). To assess the accuracy of the method, the authors calculated a Dice coefficient [14], comparing their results to the manually obtained ground truth. The Dice coefficient measures the overlap between the regions A and B:

$$\displaystyle{ DICE = \frac{2\vert A \cap B\vert } {\vert A\vert + \vert B\vert }\mbox{, }DICE \in [0,1], }$$
(6)

where 0 indicates no overlap and 1 a perfect match. The reported Dice value is 0.96 ± 0.01. Bland-Altman analysis [36] was used to determine the 95 % limits of agreement, calculated from the mean and standard deviation of the volume difference between the segmentation results and the ground truth. The authors processed a total of 109 coronal slices from ten subjects. The mean LVLs of the semi-automatic approach were 3.88 ± 0.75 L and 3.83 ± 1.11 L for healthy and asthmatic subjects, respectively. The differences from the manual and standard spirometric measurements were statistically insignificant.
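Both quantities are simple to compute. The following sketch assumes masks given as collections of pixel identifiers and paired volume measurements given as lists; it uses the common form of the limits of agreement, bias ± 1.96 sample standard deviations of the differences. The function names are hypothetical.

```python
import statistics

def dice(region_a, region_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two pixel sets."""
    a, b = set(region_a), set(region_b)
    total = len(a) + len(b)
    # Two empty regions are treated as a perfect match (an assumed convention).
    return 2 * len(a & b) / total if total else 1.0

def bland_altman_limits(measured, reference):
    """Bias and 95 % limits of agreement of paired measurements."""
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Here, `measured` could hold the semi-automatic volumes and `reference` the manually obtained ones for the same subjects.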

Ray et al. [48] applied their approach to ten different datasets. The authors measured Pratt’s figure of merit (FOM) [1]. FOM ∈ [0, 1]. The FOM quantifies the comparison between ideal edges (ground truth) and detected edges of the image.

$$\displaystyle{ FOM = \frac{1} {\max (I_{A},I_{I})}\sum _{i=1}^{I_{A}} \frac{1} {1 +\alpha d_{i}^{2}}, }$$
(7)

where \(I_{A}\) and \(I_{I}\) are the numbers of detected and ideal edge pixels, respectively, \(d_{i}\) is the distance between the i-th detected edge pixel and the nearest ideal edge pixel, and α is a penalty factor for displaced edges. The average FOM obtained was 0.69. The authors also utilized a region-based measure, namely, the percentage error, computed as

$$\displaystyle{ Error = \frac{\sum _{i}\vert Segm(i) - I_{g}(i)\vert } {\sum _{i}I_{g}(i)} 100\,\%, }$$
(8)

where \(Segm\) and \(I_{g}\) are the segmentation result and the ground truth, respectively. This metric is a combination of the FP (false positives) and FN (false negatives) measures. The mean percentage error for the ten datasets was about 6 %.
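For binary masks, the percentage error simply counts the mismatching pixels (false positives plus false negatives) relative to the size of the ground truth. A minimal sketch, assuming equally sized 2D binary masks given as nested lists and a hypothetical function name:

```python
def percentage_error(segmentation, ground_truth):
    """Sum of absolute label differences, normalized by the ground-truth size, in percent."""
    seg = [int(bool(v)) for row in segmentation for v in row]
    gt = [int(bool(v)) for row in ground_truth for v in row]
    if sum(gt) == 0:
        raise ValueError("ground truth must contain at least one foreground pixel")
    # Each mismatch is either a false positive or a false negative.
    mismatches = sum(abs(s - g) for s, g in zip(seg, gt))
    return 100.0 * mismatches / sum(gt)
```

One missed and one spurious pixel against a ground truth of four pixels give an error of 50 %.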

Tustison et al. [60] used data from seven random subjects to build the unbiased template. Experimentally, this number of datasets was found to provide a satisfactory compromise between the quality of the results and the required computational time. For the PCA statistical model, 156 images from normal subjects were used. For the evaluation, two comparative analyses were performed. First, the number of defects detected by the algorithm was compared with the number scored by two human readers in 43 subjects. Here, the intraclass correlation coefficient (ICC) [37] was used. This metric is a general measure of agreement or consensus, where the measurements are assumed to be parametric (continuous and normally distributed). There was a high agreement between the algorithm and the readers (ICC = 0.85 and ICC = 0.86 for the first and second observers, respectively). Second, the simultaneous truth and performance estimation (STAPLE) [65] was performed on 18 subjects in which the ventilation defects were manually segmented by four human readers. Here, the sensitivity and specificity are defined as

$$\displaystyle\begin{array}{rcl} Sens = \frac{TP} {TP + FN},& &{}\end{array}$$
(9)
$$\displaystyle\begin{array}{rcl} Spec = \frac{TN} {TN + FP},& &{}\end{array}$$
(10)

where TP, FN, FP, TN denote the true positives, the false negatives, false positives, and true negatives, respectively. The STAPLE results yielded the best sensitivity and specificity combination for the algorithm (\(Sens = 0.898,Spec = 0.905\)).
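Given a reference labeling, the two definitions translate directly into code. The sketch below only illustrates Eqs. (9) and (10) for flat lists of binary labels; it does not implement the STAPLE estimation itself, and the function name is hypothetical.

```python
def sensitivity_specificity(predicted, reference):
    """Sensitivity TP / (TP + FN) and specificity TN / (TN + FP) of binary labels."""
    tp = fn = fp = tn = 0
    for p, r in zip(predicted, reference):
        if r and p:
            tp += 1
        elif r and not p:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

The function assumes that the reference contains both foreground and background labels; otherwise one of the denominators is zero.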

Heydarian et al. [19] tested their method on two healthy subjects and two subjects with chronic obstructive pulmonary disease (COPD). The results of hierarchical K-means 2D and 3D segmentation were compared to an expert observer’s manual segmentation results using linear regression, Pearson correlations [11], and the Dice similarity coefficient. 2D hierarchical K-means segmentation of ventilation volume (VV) and ventilation defect volume (VDV) was strongly and significantly correlated with manual measurements (VV: r = 0.98, P < 0.0001; VDV: r = 0.97, P < 0.0001), and the mean Dice coefficients were greater than 0.92 for all subjects. 3D hierarchical K-means segmentation of VV and VDV was also significantly correlated with manual measurements (VV: r = 0.98, P < 0.0001; VDV: r = 0.64, P < 0.0001), and the mean Dice coefficients were greater than 0.91 for all subjects.

Kirby et al. [26] applied the same approach for segmentation of ventilation defect volume (VDV) and ventilation volume (VV) on five patients with asthma, five patients with COPD, and five patients with cystic fibrosis (CF). They compared semi-automatic and manual measurements and observed strong significant correlations between the VDV values generated by each method (asthma: r = 0.89, P < 0.0001; COPD: r = 0.84, P < 0.0001; CF: r = 0.89, P < 0.0001). The spatial agreement for VV values was measured with the Dice coefficient (asthma: 0.95; COPD: 0.88; CF: 0.9).

Middleton and Damper [39] applied their technique to 13 datasets. To evaluate the performance, they utilized the following region-based measures: precision (P), recall (R), effectiveness E [49], and segmentation performance \(F = 1 - E\). The measures are defined as:

$$\displaystyle\begin{array}{rcl} P = \frac{TP} {TP + FP},& &{}\end{array}$$
(11)
$$\displaystyle\begin{array}{rcl} R = \frac{TP} {TP + FN},& &{}\end{array}$$
(12)
$$\displaystyle\begin{array}{rcl} E = 1 - \frac{PR} {(1-\alpha )P +\alpha R},& &{}\end{array}$$
(13)

where α = 0.5, i.e., the precision and recall are weighted equally. The mean of F was 0.866 and 0.844 for the left and right lungs, respectively.
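Equations (11)–(13) translate directly into code. The sketch below is only an illustration with hypothetical voxel counts and a hypothetical function name. For α = 0.5 the performance F reduces to the harmonic mean \(2PR/(P + R)\), which equals \(2TP/(2TP + FP + FN)\) and therefore coincides with the Dice coefficient.

```python
def precision_recall_effectiveness(tp, fp, fn, alpha=0.5):
    """Return (P, R, E, F) following Eqs. (11)-(13), with F = 1 - E."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for an empty mask")
    p = tp / (tp + fp)                      # Eq. (11)
    r = tp / (tp + fn)                      # Eq. (12)
    denom = (1 - alpha) * p + alpha * r
    # If there is no overlap at all (P = R = 0), report the worst case E = 1.
    e = 1.0 if denom == 0 else 1 - (p * r) / denom   # Eq. (13)
    return p, r, e, 1 - e


# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives.
p, r, e, f = precision_recall_effectiveness(tp=80, fp=20, fn=10)

# Dice coefficient computed from the same counts, for comparison with F.
dice = 2 * 80 / (2 * 80 + 20 + 10)
```

With these counts, P = 0.8, R ≈ 0.889, and F and the Dice coefficient both evaluate to 160/190 ≈ 0.842.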

Böttger et al. [9] applied their tool to 10 datasets and compared the semi-automatically obtained results to two expert readings. For evaluation, the average surface distance, the Hausdorff distance, and the Dice coefficient between the two compared segmentations were computed. For all segmentations, the average surface distance was significantly lower than 2 mm, and the Hausdorff distance was lower than 20 mm, apart from one outlier. The Dice coefficient was higher than 0.88.
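The two distance measures can be illustrated with a brute-force sketch. It assumes that each surface is given as a list of point coordinates already scaled to millimetres, and it uses a symmetric average of nearest-neighbour distances. This is one common definition of the average surface distance; the variant applied in the cited work is not stated here. The function names and points are hypothetical.

```python
import math


def nearest_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    if not src or not dst:
        raise ValueError("both surfaces must contain at least one point")
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the largest nearest-neighbour distance."""
    return max(max(nearest_distances(a, b)), max(nearest_distances(b, a)))


def average_surface_distance(a, b):
    """Mean of all nearest-neighbour distances, taken in both directions."""
    d_ab = nearest_distances(a, b)
    d_ba = nearest_distances(b, a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Hypothetical 2D contours (coordinates in mm).
contour_a = [(0, 0), (1, 0), (2, 0)]
contour_b = [(0, 1), (1, 1), (2, 3)]
hd = hausdorff_distance(contour_a, contour_b)
asd = average_surface_distance(contour_a, contour_b)
```

The example shows why both values are reported: a single distant point, here (2, 3), determines the Hausdorff distance, whereas the average surface distance is far less sensitive to such outliers. The quadratic search is adequate only for small point sets; larger surfaces would call for a spatial index or a distance transform.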

4.2.2 Other Metrics

Plathow et al. [45] measured the displacement of the chest wall and the diaphragm on dynamic 2D MRI as a surrogate of the lung volume and lung surface in 20 healthy subjects. Moreover, a 3D volumetric evaluation of the breathing cycle was performed using dynamic 3D MRI. The results were correlated to spirometry, and the vital capacity (VC) [46] was measured. VC was 4.3 ± 1.0 L using spirometry, 4.9 ± 1.2 L using the 2D model, and 4.65 ± 0.9 L using 3D MRI. The correlation between spirometry and the 2D and 3D models was highly significant (Spearman’s correlation coefficient r > 0.83, P < 0.005) [11]. The differences in absolute VC values between spirometry, 2D MRI, and 3D MRI were not significant.

Tokuda et al. [58] assessed lung motion sequences from two volunteers; the quality of segmentation was determined by visual comparison with the original images.

Woodhouse et al. [66] measured the ventilated lung volumes from a combination of ³He and proton single-shot fast spin echo (SSFSE) coronal MR images in groups of “healthy” smokers (five subjects), smokers with moderate COPD (five subjects), and never-smokers (eight subjects). The volumes obtained from slice-wise semi-automatic segmentation and from manual segmentations by two observers were compared to each other. The comparison showed high agreement (Pearson correlation coefficient r ≥ 0.94, P < 0.05). The main source of disagreement occurred in the central five slices of each dataset, with the middle slice centered on the bifurcation of the trachea.

Virgincar et al. [64] developed their method for ¹²⁹Xe MR ventilation and anatomical MR images and tested it on a group of forty-four participants: 24 healthy subjects, 10 subjects with COPD, 9 subjects with Global Initiative for Chronic Obstructive Lung Disease (GOLD), and 10 age-matched control (AMC) subjects. Ventilation images were quantified by two methods: the ventilation defect percentage (VDP), defined as the ratio between the thoracic cavity volume and the ventilated volume, was computed from the semi-automatically segmented images, and an expert computed the ventilation defect score percentage (VDS%). For the ventilation images, the intensity histograms within the thoracic cavity volume were analysed, and the coefficient of variation (CV) was computed from them. The study showed a correlation between VDS% and VDP (r = 0.97, P < 0.0001) and between VDS% and CV (r = 0.82, P < 0.0001).
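The two automatically derived quantities can be sketched as follows. The text above describes VDP only as a ratio of the two volumes, so the sketch assumes that VDP is the non-ventilated fraction of the thoracic cavity volume, expressed in percent. The CV is taken as the population standard deviation divided by the mean. Both choices, the function names, and the numbers are assumptions made for illustration and are not confirmed details of the cited method.

```python
import statistics


def ventilation_defect_percentage(thoracic_cavity_volume, ventilated_volume):
    """Assumed definition: share of the thoracic cavity that is not ventilated, in %."""
    if thoracic_cavity_volume <= 0:
        raise ValueError("thoracic cavity volume must be positive")
    defect_volume = thoracic_cavity_volume - ventilated_volume
    return 100.0 * defect_volume / thoracic_cavity_volume


def coefficient_of_variation(intensities):
    """CV = standard deviation / mean (population standard deviation assumed)."""
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean intensity")
    return statistics.pstdev(intensities) / mean


# Hypothetical volumes in millilitres.
vdp = ventilation_defect_percentage(5000, 4500)

# Hypothetical voxel intensities inside the thoracic cavity mask.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

With these hypothetical inputs, a 500 ml defect in a 5000 ml cavity gives a VDP of 10%, and the intensity sample has a mean of 5 and a standard deviation of 2, that is, a CV of 0.4.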

Lelieveldt et al. [28, 29] tested their pipeline on 15 MR scans and assessed the total model matching. However, no separate evaluation of the lung segmentation results was done.

Kullberg et al. [27] used the segmented lungs only as a marker for further detection of adipose tissue in the body. No additional evaluation of the lung segmentation accuracy was done.

Tetzlaff et al. [57] tested their method on 10 datasets from healthy subjects and compared the semi-automatically obtained volumes with the spirometric volumes. The comparison showed high agreement (mean Pearson correlation coefficient ≥ 0.97).

5 Conclusion

The paper provides a review of existing automatic and semi-automatic methods for human lung segmentation from MR data. The methods are categorized according to their main segmentation strategy. The approaches are related and compared to each other from the application point of view; namely, the data types for which the techniques are designed, the level of user involvement, and the methods’ complexity are discussed. Because the accuracy of a segmentation method is crucial for its application, the approaches to performance evaluation are also reviewed. Unfortunately, there is no standardized evaluation system that would make the methods directly comparable to each other. Moreover, for certain application areas there is no ground truth, which further complicates the evaluation.