Accurate assessment of liver volume is fundamental in hepatic surgery prior to major hepatectomy and transplantation. Performing liver volumetry is of growing importance given recent increases in extended hepatectomies and split-liver and living-donor liver transplantations [1]. Automation of liver volumetric methods has been shown to improve repeatability and accuracy while reducing processing times [2–4].

Liver segmentation has traditionally been performed on CT images due to easy accessibility, short acquisition time, and high spatial resolution [4–6]. However, MRI offers the advantage of simultaneous assessment of vascular and biliary anatomy and biomarkers of diffuse liver disease (fat, iron, and fibrosis) [7–10]. Advances in MRI have prompted new indications for accurate whole-liver segmentation in estimating volume-averaged biomarkers, such as steatosis distribution maps [10–12]. Furthermore, MRI minimizes the risk of radiation exposure and nephrotoxicity [7, 13].

Studies examining automated liver volumetry on MRI are limited, presumably because of increased variability and difficulty compared to CT [14]. Once validated, automated liver volumetry could be integrated into a complete preoperative evaluation which includes the assessment of vascular and biliary anatomy and diffuse liver disease on MRI [7, 11].

Although numerous studies have proposed automated segmentation methods, these have not necessarily translated to routine clinical use [15]. Limitations in clinical validation, rather than lack of technical ingenuity, are thought to be the cause of this slow adoption by the medical community [16]. To overcome such methodological weaknesses, a validation framework for a novel automated segmentation method should include the following elements [16]: use of a valid reference standard; datasets for validation which are reflective of actual clinical practice; clear metrics for measurement of segmentation precision, accuracy, efficiency, and error; and comparison of metrics using effective statistical tools. We attempted to incorporate these defined elements into our validation framework.

In this article, we evaluate a novel semiautomated segmentation method which uses variational shape interpolation and a Laplacian mesh optimization framework [17]. This method is compatible with both MRI and CT, which has only sparingly been previously described [18]. The method does not require prior statistical input and includes mesh-based correction tools to improve precision during interactive segmentation.

The purpose of our study was to evaluate the repeatability, agreement, and efficiency of MRI- and CT-based semiautomated liver segmentation relative to CT-based manual segmentation. A secondary aim was to validate segmentation quality using error metrics which highlight various aspects of segmentation agreement and facilitate comparison with prior literature [15]. Subsegmental volumetry was performed based on classic anatomic principles and vascular landmarks as defined by Couinaud [19].

Materials and methods

This study received approval prior to commencement from our institutional review board. Given the study design (retrospective, cross-sectional), informed consent requirements were waived.

Study subjects

Patients were included if they underwent both MRI and CT examinations within 2 weeks of each other between January 2010 and March 2013 for preoperative assessment of hepatobiliary and pancreatic disease. The MRI study protocol was required to include gadolinium injection. The CT study protocol required image acquisition in portal venous phase. A total of 31 subjects (18 men, 13 women; mean age, 59 years) requiring preoperative evaluation using MRI and CT were included. These subjects had a spectrum of liver diseases. The demographic and clinical information of the study subjects is summarized in Table 1.

Table 1 Subject demographics

MRI technique

MRI was performed with a 1.5-T unit (Discovery MR450, GE Medical Systems, Milwaukee, WI) using a 12-channel phased-array body coil. Segmentation was subsequently performed on the portal venous phase of a dynamic contrast-enhanced fat-suppressed 3-dimensional (3D) T1-weighted gradient-recalled echo (GRE) sequence (LAVA sequence). The 3D GRE sequence parameters were as follows: repetition time, 3.9–4.8 ms; echo time, 1.7–2.1 ms; flip angle, 12°; section thickness, 4–8 mm (average 5.5 mm); spacing between sections, 2.2–4.5 mm (average 2.7 mm); field of view, 380 mm; reconstruction matrix, 256 × 256 or 512 × 512; and parallel imaging acceleration factor, 2. A weight-adjusted dose (0.1 mmol/kg body weight) of gadobenate dimeglumine (MultiHance; Bracco Diagnostic Inc., Princeton, NJ) was administered intravenously as a bolus at a rate of 2 ml/s using a power injector (Mallinckrodt, Optistar™ Elite, St. Louis, MO), followed by a saline flush of 15 ml.

CT imaging technique

CT imaging was performed with a 64-detector MDCT scanner (Brilliance 64, Philips Medical Systems, Cleveland, OH) under standard abdominal imaging protocols. The parameters were as follows: rotation time, 0.75 s; detector collimation, 64 × 0.625 mm; helical pitch, 0.9; tube voltage, 120 kV; X-ray tube current, 126–499 mA; and tube current–time product, varied based on noise index. Image reconstruction was in a 282–500 mm display field of view, depending on the patient’s physique. Reconstruction section thickness was 2.5 mm with a section gap of 2 mm. Reconstructed CT slices had a matrix size of 512 × 512 pixels with pixel spacing ranging from 0.55 to 0.78 mm. Prior to all examinations, a weight-adjusted dose of a non-ionic, low osmolar, iodinated contrast agent (375 mgI/ml Isovue; Bracco Diagnostic Inc., Princeton, NJ) was administered intravenously at a rate of 4 ml/s. All CT examinations included a portal venous phase with a delay of 60 s.

Study workflow

Liver segmentation was performed by three image analysts: two radiology residents (AG and KV, 2 and 3 years of experience) and one biomedical engineering PhD candidate (GC, 3 years of experience). The image analysts were previously trained during a CT-based liver segmentation validation study on a different dataset. Two analysts independently performed semiautomated segmentation of MRI and CT images. The same analysts repeated segmentation in a random order 2 weeks later to prevent recall bias. A third analyst performed manual segmentation of CT images to establish the reference standard. The manual segmentation results were supervised by an abdominal radiologist (AT, 8 years of experience). Image analysts were blinded to their own segmentation results and to the results of the other readers. Interaction time was recorded for all segmentations.

Manual segmentation

To establish the reference standard, axial portal venous phase CT images were uploaded onto an imaging display software (SliceOmatic 4.3 Rev-11, TomoVision, Montreal, QC). Analysts manually outlined the liver using a cursor on each axial slice. This allowed for the creation of “active contours” which could be propagated to adjacent slices [20]. Furthermore, manual deformation of the active contours was performed for each axial image to adequately delineate the liver. Cross-sectional areas were compiled and multiplied by the slice thickness to obtain section volumes. These were added to determine the total liver volume for each patient. 3D surface meshes created for each liver were used for visual comparison and error metric calculations. Manual segmentation of MR images was also separately performed in a similar manner. These segmentations were specifically used for error metric calculations when comparing semiautomated MRI and manual MRI surface meshes.
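The slice-summation step described above can be illustrated with a minimal Python sketch. The function name `liver_volume_ml` and the area values are hypothetical and chosen only for illustration; the sketch assumes a uniform slice thickness and cross-sectional areas expressed in mm², and it is not the SliceOmatic implementation.

```python
def liver_volume_ml(slice_areas_mm2, slice_thickness_mm):
    """Sum per-slice section volumes (area x thickness) and convert mm^3 to ml."""
    section_volumes_mm3 = [area * slice_thickness_mm for area in slice_areas_mm2]
    return sum(section_volumes_mm3) / 1000.0  # 1 ml = 1000 mm^3


# Hypothetical cross-sectional liver areas (mm^2) on four consecutive axial slices
areas = [12000.0, 15000.0, 14000.0, 9000.0]
volume = liver_volume_ml(areas, slice_thickness_mm=2.5)
print(f"Total volume: {volume:.1f} ml")
```

In practice the number of slices is much larger, but the computation is the same: each outlined area contributes one section volume to the total.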

Semiautomated segmentation and subsegmentation

The semiautomated segmentation method was developed at an imaging laboratory (Imaging and Orthopaedics Research Laboratory (LIO), Montreal, QC) in collaboration with the biomedical imaging team. The method is adapted from a previously validated CT-based liver segmentation method [21]. Certain features were modified to allow compatibility with both MRI and CT modalities, such as the feature-matching step outlined below. The code was implemented in C++ using VTK (Kitware Inc., 2014, Clifton Park, NY) as a rendering external library. Contrast-enhanced MRI and CT examinations were uploaded to the segmentation program. The user (interactive) and automated (computer) tasks required for semiautomated liver segmentation are presented in Fig. 1. The segmentation method consists of three main steps.

Fig. 1

Overview of steps involved in semiautomated liver segmentation of CT and MRI images. The user initially delineates the liver surface (2 contours per multi-planar view) from which an initial shape is defined. Variational shape interpolation is then applied to generate a 3D surface mesh. Feature-matching and Laplacian mesh optimization deform the mesh vertices toward matched targets on the actual liver boundary. The surface mesh can then be further manipulated with the aid of locally constrained optimization

Initialization

In order to generate an initial shape, the user must click to position nodes around the liver contour in multi-planar views. Two contours are placed per orthogonal plane in such a way to globally outline the liver contour while being sufficiently apart to capture specific hepatic features. In our experience, this number of contours is generally sufficient to generate a reliable initial shape, corroborating the findings of Wimmer et al. [22]. The drawn contours automatically snap onto the liver boundary using an algorithm based on image warping and minimal path segmentation [17]. An energy-minimizing implicit function (variational shape interpolation) is then applied to generate a 3D surface mesh from these initial sparse contours [23, 24].

Shape deformation

After adequate initialization of a primary liver shape, an automated optimization method is used to further refine the segmentation. Feature-matching assigns each vertex of the initial 3D surface mesh with a corresponding target point representing the most probable location of the liver boundary. This target point is determined along intensity profiles as the point of maximal intensity difference between inward (liver) and outward (non-liver) intensities. This step differs for CT and MRI images, contributing to the multi-modality versatility of the segmentation method. For MRI, the inward intensity is predicted for each vertex based on intensity of the surrounding tissues, while for CT it is a fixed value based on the estimated liver parenchymal intensity. Laplacian mesh optimization [25] is used to deform the mesh vertices toward their matched targets on the liver boundary while ensuring a smooth local curvature.
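As a rough illustration of the feature-matching idea, the following Python sketch searches a one-dimensional intensity profile, sampled from inside the liver outward along a vertex normal, for the position where the inward samples best match an expected liver intensity while the outward samples differ most from it. The scoring rule, the `window` parameter, the function name `find_boundary_index`, and all intensity values are assumptions made for this sketch; the source does not specify these details, and the local MRI intensity prediction is replaced here by a simple mean of the innermost samples.

```python
from statistics import mean


def find_boundary_index(profile, inward_intensity, window=3):
    """Return the index where the profile most plausibly leaves the liver.

    For each candidate index k, compare the mean of the `window` samples just
    inside k with the mean of the `window` samples starting at k. The score is
    high when the inside matches the expected liver intensity and the outside
    does not.
    """
    best_k, best_score = None, float("-inf")
    for k in range(window, len(profile) - window + 1):
        inside = mean(profile[k - window:k])
        outside = mean(profile[k:k + window])
        score = abs(outside - inward_intensity) - abs(inside - inward_intensity)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# CT-like case: fixed inward value (hypothetical parenchymal attenuation)
ct_profile = [100, 102, 99, 101, 100, 40, 38, 41, 39, 40]
ct_idx = find_boundary_index(ct_profile, inward_intensity=100, window=3)

# MRI-like case: inward value estimated locally (here, from the innermost samples)
mri_profile = [220, 215, 225, 218, 300, 310, 305, 298]
mri_inward = mean(mri_profile[:3])
mri_idx = find_boundary_index(mri_profile, inward_intensity=mri_inward, window=2)

print(ct_idx, mri_idx)
```

The only difference between the two calls is how `inward_intensity` is obtained, which mirrors the modality-specific part of the step described above. The subsequent Laplacian mesh optimization is not shown.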

Interactive corrections

At times, the initial 3D surface mesh might be too distant for the intensity profile to reach the liver boundary. Additionally, adjacent structures may display similar intensity as the liver parenchyma leading to target error. For such cases, a correction tool was implemented to modify the final mesh shape. This tool allows the user to click on the surface mesh and manipulate it to the desired location. This launches a locally constrained optimization of the mesh with relocation of adjacent vertices.

For subsegmentation, the vertical planes were defined by drawing straight lines through the hepatic veins and the insertion at the inferior vena cava (IVC). A horizontal plane was established by drawing a straight line at the level of the portal bifurcation, and these separated segments II/III, IVa/IVb, V/VIII, and VI/VII. The liver tissue between the posterior aspect of the portal bifurcation and the IVC was encapsulated via a polygon which was automatically propagated to define the caudate lobe (Fig. 2).

Fig. 2

A Axial portal venous phase contrast-enhanced MRI slice demonstrating the caudate lobe, and segments II, IVa, VII, and VIII. Three vertical planes are defined by drawing straight lines through the left, middle, and right hepatic veins with their insertion at the IVC. A polygonal shape is automatically propagated to define caudate lobe. B Oblique anterior-posterior 3D rendering defining the liver subsegments

Statistical analysis

Statistical analysis was performed with SPSS software for Windows, version 21.0 (Chicago, IL). Mean whole liver volumes obtained from semiautomated segmentation of MRI and CT images were calculated by averaging the four readings for each modality. Intra-class correlation coefficients (ICC) and Bland–Altman analysis were used to determine intra-reader repeatability for semiautomated segmentation of CT and MRI images. ICC and Bland–Altman analysis were also used to determine inter-reader and inter-method agreement, with manual CT-based segmentation being used as the reference standard for the latter. The agreement for liver volume was reported as bias ±1.96 SD of the differences, followed by the 95% limits of agreement interval [26].
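The Bland–Altman quantities reported in this study (bias, 1.96 SD of the differences, and the resulting 95% limits of agreement) can be computed as in the sketch below. This is an illustrative Python sketch rather than the SPSS procedure used in the study; the function name `bland_altman` and the paired volumes are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, 1.96 * SD of differences, (lower limit, upper limit))."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, half_width, (bias - half_width, bias + half_width)


# Hypothetical paired liver volumes (ml) for four subjects
semiautomated = [1500.0, 1820.0, 2100.0, 1250.0]
manual = [1490.0, 1850.0, 2080.0, 1260.0]
bias, half_width, limits = bland_altman(semiautomated, manual)
print(f"{bias:.1f} ± {half_width:.1f} ml ({limits[0]:.1f} to {limits[1]:.1f} ml)")

# The same arithmetic links a reported "bias ± 1.96 SD" to its limits of agreement,
# e.g. -14 ± 136 ml corresponds to limits of -150 to 122 ml.
reported_limits = (-14 - 136, -14 + 136)
```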

The differences between semiautomated and manually segmented surface meshes for both MRI and CT were analyzed with four additional error measures: volumetric overlap error, average symmetric surface distance, root mean square (RMS) symmetric surface distance, and maximum symmetric surface distance. A detailed description of these error metrics can be found in a study by Heimann et al. [15]. In addition, paired t-tests were used to compare total interaction time for MRI- and CT-based semiautomated segmentations with CT-based manual segmentation.
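For readers unfamiliar with these error measures, the sketch below shows one common way to compute them in Python, using sets of voxel coordinates for the overlap error and lists of surface points for the distance measures. The function names and the toy coordinates are hypothetical, the nearest-neighbor search is brute force, and the sketch works on point samples rather than on the surface meshes used in the study.

```python
import math


def volumetric_overlap_error(voxels_a, voxels_b):
    """100 * (1 - |A intersect B| / |A union B|), in percent."""
    a, b = set(voxels_a), set(voxels_b)
    return 100.0 * (1.0 - len(a & b) / len(a | b))


def _nearest_distances(points_from, points_to):
    """Distance from each point in points_from to its closest point in points_to."""
    return [min(math.dist(p, q) for q in points_to) for p in points_from]


def symmetric_surface_distances(surface_a, surface_b):
    """Return (average, RMS, maximum) symmetric surface distance."""
    d = _nearest_distances(surface_a, surface_b) + _nearest_distances(surface_b, surface_a)
    average = sum(d) / len(d)
    rms = math.sqrt(sum(x * x for x in d) / len(d))
    return average, rms, max(d)


# Toy segmentations: two 3-voxel volumes sharing 2 voxels
seg_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
seg_b = [(1, 0, 0), (2, 0, 0), (3, 0, 0)]
voe = volumetric_overlap_error(seg_a, seg_b)

# Toy surfaces (coordinates in mm)
surf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
surf_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
avg_d, rms_d, max_d = symmetric_surface_distances(surf_a, surf_b)
print(voe, avg_d, rms_d, max_d)
```

By construction, the average distance never exceeds the RMS distance, which in turn never exceeds the maximum distance, so the three measures emphasize progressively larger local errors.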

Results

Liver volumes

The mean liver volume obtained from semiautomated MRI segmentations was 1831 ± 679 ml (mean ± 1.96 SD), from semiautomated CT segmentations was 1756 ± 702 ml, and from manual segmentations of CT images (reference standard) was 1817 ± 680 ml.

Detailed repeatability and agreement results are reported for both readers in Table 2. To simplify the results in this section, we report below the weaker (i.e., larger limits of agreement) of the results obtained by reader 1 or 2. Inter-method agreement for segmental volumetry is shown in Table 3.

Table 2 Intra-reader repeatability and inter-reader and inter-method agreement
Table 3 Inter-method segmental correlation and agreement

Intra-reader repeatability

The ICC were above 0.987 for MRI-based intra-reader repeatability and above 0.995 for CT-based intra-reader repeatability. Bland–Altman analysis revealed an intra-reader repeatability of 30 ± 217 ml (mean ± 1.96 SD) (95% limits of agreement: −187 to 247 ml) for MRI-based semiautomated segmentation and −10 ± 143 ml (−153 to 133 ml) for CT-based semiautomated segmentation.

Inter-reader agreement

The ICC was 0.996 for both MRI- and CT-based inter-reader agreement. Bland–Altman analysis revealed an inter-reader agreement of 6 ± 123 ml (−117 to 129 ml) for MRI-based semiautomated segmentation and 20 ± 125 ml (−105 to 145 ml) for CT-based semiautomated segmentation.

Inter-method agreement

The ICC were above 0.995 for MRI-based semiautomated segmentation and above 0.986 for CT-based semiautomated segmentation when compared to manual CT. Bland–Altman analysis revealed an inter-method agreement of −14 ± 136 ml (−150 to 122 ml) for MRI-based semiautomated segmentation (Fig. 3) and 50 ± 226 ml (−176 to 276 ml) for CT-based semiautomated segmentation when compared to manual CT (Fig. 4).

Fig. 3

Bland–Altman plot of the volume difference between MRI-based semiautomated and CT-based manual liver segmentation and their mean volume for reader 2. Mean bias is demonstrated with solid line and 95% limits of agreement with dashed lines

Fig. 4

Bland–Altman plot of the volume difference between CT-based semiautomated and CT-based manual liver segmentation and their mean volume for reader 1. Mean bias is demonstrated with solid line and 95% limits of agreement with dashed lines

Correlation for segmental volumetry varied between 0.584 (segment IVb) and 0.865 (segment VII) for MRI- and between 0.596 (segment II) and 0.890 (segment VII) for CT-based semiautomated segmentation (Table 3). Inter-method segmental agreement ranged from 10 ± 47 ml (−37 to 57 ml) (segment I) to 2 ± 214 ml (−212 to 216 ml) (segment V) for MRI- and 9 ± 45 ml (−36 to 54 ml) (segment I) to −46 ± 183 ml (−229 to 137 ml) (segment IVa) for CT-based semiautomated segmentation.

Clinical examples

Examples of concordant and discordant cases displaying MRI- and CT-based semiautomated segmentations are shown in Figs. 5 and 6, respectively.

Fig. 5

A 61-year-old male with a Klatskin tumor. A MRI and B CT axial images demonstrating segmentation concordance between readers using manual and semiautomated segmentation methods. Manual CT = green tracing, reader 1 semiautomated = blue tracing, reader 1′ semiautomated = red tracing, reader 2 semiautomated = magenta tracing, reader 2′ semiautomated = yellow tracing

Fig. 6

A 47-year-old man with pancreatic cancer metastases. A MRI and B CT axial images demonstrating segmentation discordance between readers using manual and semiautomated segmentation methods. Segmentation error on MRI and CT is noted at indistinct boundaries with adjacent organs (stomach, body muscles, vessels) and at the liver hilum. Segmentation error on MRI is also noted at convex boundaries and areas of high curvature. Manual CT = green tracing, reader 1 semiautomated = blue tracing, reader 1′ semiautomated = red tracing, reader 2 semiautomated = magenta tracing, reader 2′ semiautomated = yellow tracing

Error measures with MRI

Intra-reader measures of MRI-based segmentation performance were comparable between readers 1 and 2. Comparison between semiautomated MRI and manual MRI surface meshes for reader 2 revealed a volumetric overlap error of 11.6 ± 3.4% (mean ± standard deviation), an average symmetric surface distance of 2.3 ± 0.6 mm, a root mean square symmetric surface distance of 3.8 ± 1.2 mm, and a maximum symmetric surface distance of 28.0 ± 10.4 mm (Table 4).

Table 4 Segmentation performance measures for MRI

Error measures with CT

Intra-reader measures of CT-based segmentation performance were comparable between readers 1 and 2. Comparison between semiautomated CT and manual CT surface meshes for reader 1 revealed a volumetric overlap error of 9.2 ± 2.5% (mean ± standard deviation), an average symmetric surface distance of 1.7 ± 0.4 mm, a root mean square symmetric surface distance of 3.0 ± 0.9 mm, and a maximum symmetric surface distance of 24.9 ± 6.9 mm (Table 5).

Table 5 Segmentation performance measures for CT

Interaction time

Interaction time (mean ± SD) per case was significantly shorter for MRI-based semiautomated segmentation (7.2 ± 0.1 min, p < 0.001) and CT-based semiautomated segmentation (6.5 ± 0.2 min, p < 0.001) than for CT-based manual segmentation (14.5 ± 0.4 min).

Discussion

This retrospective study evaluated the repeatability, agreement, and efficiency of MRI- and CT-based semiautomated segmentation, using CT-based manual segmentation as the reference standard method. A strength of our study lies in the paired comparison of two imaging modalities, while using the same independent reference standard. Our choice of a semiautomated liver segmentation method was supported by recent studies which found interactive methods to be generally more accurate and reliable than fully automated methods [15]. Segmentation was customized for MRI and CT using a varying feature-matching strategy, demonstrating the versatility of our method.

Overall, semiautomated volume measurements for both MRI and CT strongly correlated with volumes obtained by manual segmentation. MRI-based and CT-based semiautomated volumetry were highly repeatable and showed strong agreement with the manual method. Intra-reader repeatability for MRI-based semiautomated segmentation was comparable to the results for CT. However, Bland–Altman analysis showed slightly higher variability compared to previous studies evaluating automated segmentation methods with two readers. Mazonakis et al. [27] examined 38 consecutive patients referred for MRI examination and found repeatability coefficients of 51.6 and 68.2 ml, while Hermoye et al. [2] studied 18 liver donors and found repeatability coefficients of 52 and 64 ml. Our study examined only pathological livers, which may explain the increased variability in the repeatability calculation.

Our study showed superior ICC values for MRI-based inter-method agreement compared to prior studies: 0.98 [14, 18, 28] and 0.76–0.93 [29]. Furthermore, our limits of agreement were similar to those obtained in recent studies: −108 to 91 ml [27], −163 to 134 ml [14], and −278 to 204 ml [29].

Inter-method agreement between CT-based semiautomated segmentation and manual segmentation compared favorably to recently published studies. Previous studies have shown ICC values for CT-based semiautomated segmentation from 0.94 to 0.994 [18, 30] and limits of agreement of −117 to 124 ml [21], −230.3 to 327 ml [6], −211 to 278 ml [30], and −503 to 509 ml [31].

In our study, the 95% limits of agreement extended ±136 ml around the bias for MRI-based semiautomated segmentation and ±226 ml for CT-based semiautomated segmentation when compared to manual CT segmentation. When expressed as a percentage of the mean liver volume obtained by manual CT segmentation (1817 ml), these half-widths represent only 7.5 and 12.4%, respectively.
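The percentages quoted above follow directly from the reported values, as the short Python check below shows. The function name `percent_of_mean` is hypothetical; the inputs are the limits of agreement and the mean manual CT volume reported in this article.

```python
def percent_of_mean(half_width_ml, mean_volume_ml):
    """Express a limits-of-agreement half-width as a percentage of the mean volume."""
    return 100.0 * half_width_ml / mean_volume_ml


manual_ct_mean_ml = 1817.0  # mean liver volume from manual CT segmentation
mri_pct = percent_of_mean(136.0, manual_ct_mean_ml)
ct_pct = percent_of_mean(226.0, manual_ct_mean_ml)
print(f"MRI: {mri_pct:.1f}%  CT: {ct_pct:.1f}%")  # MRI: 7.5%  CT: 12.4%
```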

In comparison, studies that have assessed differences between liver volumes determined by CT and volumes measured by water displacement (the intraoperative reference standard) have reported differences of 33% [32, 33], 13% [34, 35], and 2% [36]. Hence, while the volume differences between semiautomated and manual segmentation volumes may appear large, they remain small compared to errors observed intraoperatively. Further, the error of our semiautomated method is expected to be closer to the small bias of 14 ml for MRI-based segmentation and 50 ml for CT-based segmentation than to the extreme differences implied by the 95% limits of agreement.

Subsegmentation was performed using anatomic and vascular landmarks to divide the hepatic segments. Our previous work on CT-based liver subsegmentation showed that the caudate lobe is difficult to segment because its boundaries are defined somewhat arbitrarily rather than by vascular structures [37], which was also observed in this study. Segmentation of the caudate lobe demonstrated the narrowest limits of agreement, likely due to segment I being the smallest of the liver segments.

Inter-method segmental agreement was similar for both CT- and MR-based semiautomated segmentation. The segmental limits of agreement were generally wider than those for whole-liver volumetry. The choice of portal venous phase sequences likely contributed to this discrepancy. Although this phase is ideal to outline liver parenchyma, the hepatic veins were often not fully enhanced, which increased error when establishing the vertical planes for subsegmentation. Future research focused on liver subsegmentation may benefit from alternative injection phases for this purpose.

Segmentation quality was further evaluated with volumetric and surface error measures which have previously been used in the setting of segmentation evaluation frameworks [21, 38]. The comparison of meshes obtained from semiautomated and manual methods also aided in the direct visualization of segmentation discrepancies. In order to adequately evaluate MR segmentation meshes, we also performed manual MR segmentation. This was required as inter-modality comparison of meshes (i.e., semiautomated MR and manual CT) is not possible due to inherent differences in image acquisition such as variable breath-holds and elastic liver deformation. Such a comparison would result in misregistration of meshes and artificial elevation of surface error measures.

For MRI, the comparison between semiautomated and manual surface meshes revealed a volumetric overlap error similar to CT: 11.6 ± 3.4%. Our results were comparable to those of another MRI-based automated segmentation study which achieved a volumetric overlap error of 11.2%, an average symmetric surface distance of 2.2 mm, and a maximum symmetric surface distance of 34 mm [39].

Semiautomated segmentation significantly reduced the interaction time required for determination of liver volume. Recently published studies have described semiautomated segmentation times ranging from 8 ± 2 min to 13.3 ± 4.5 min for MRI-based methods [11, 28] and from 4.4 ± 1.9 min to 8.0 ± 1.2 min for CT-based methods [6, 21, 40]. Our aim was to reduce the total interaction time to less than 10 min as any task requiring a longer period would not be sustainable in the context of a busy radiology practice. We believe that such an interaction time offers the right balance of user feedback and software automation.

Segmentation errors on MRI were noted at similar locations to CT: primarily at the liver interface with adjacent structures (i.e., muscles, diaphragm, spleen, stomach), at the liver hilum, adjacent to tumors, and near blood vessels. In addition, areas of convex and concave boundaries and high curvature (such as liver dome) contributed significantly to segmentation error. Under-segmentation on MRI occurred at low-contrast liver boundaries and areas of inhomogeneous density, whereas over-segmentation usually occurred at organs abutting the liver, as noted by Huynh et al. [14]. Motion, pulsation, and partial volume artifacts have also been shown to impede segmentation accuracy.

Certain outliers were identified in the Bland–Altman analysis which likely influenced the final results. In the MR-based analysis, two patients had advanced metastatic pancreatic cancer. Imaging of these patients demonstrated paucity of peritoneal fat, likely secondary to cachexia. This made differentiation of liver and adjacent soft tissues on MRI difficult, leading to semiautomated segmentation error. A third patient with colorectal cancer metastasis had a particularly dysmorphic liver which the semiautomated method could not accurately delineate. For the CT-based analysis, one patient had a very large abscess within segments V/VIII. While this abscess was completely included during manual segmentation, it was only partially included during semiautomated segmentation due to similarities in attenuation between liver parenchyma and the abscess itself.

Our study had certain limitations. First, our choice of manual CT segmentation as the reference standard for validating an MRI-based semiautomated method had not previously been described. Other studies validating automated MRI methods have traditionally relied on manual MRI segmentation as the reference standard [2, 14, 18, 27–29, 41, 42]. We opted for an independent reference standard in order to validate both MRI- and CT-based semiautomated segmentation as our method is compatible with both. This common standard promotes head-to-head comparison of automated segmentation accuracy between MRI and CT, which was not addressed previously. Manual CT segmentation has been used as the reference standard in numerous other similar studies [1, 6, 15, 30, 31, 43, 44]. Resected surgical volume or weight could also have served as alternative reference standards. However, ex vivo liver volume may be falsely estimated due to blood loss and changes in hydrostatic pressure following surgical resection [43, 45, 46], thus making manual segmentations a more reliable choice.

Second, our validation scheme utilized similar MRI acquisition parameters as we did not perform a systematic study of segmentation robustness. As in previous studies, a 3D T1-weighted GRE sequence was used for MRI-based segmentation [45, 46]. The portal venous phase was chosen as it maximizes contrast between the liver and adjacent structures [30].

In conclusion, our validation study suggests that a semiautomated liver segmentation method compatible with both MRI and CT can provide strong agreement and repeatability when compared to manual segmentation while shortening the interaction time. Given recent advances in MRI-based biomarkers of chronic liver disease, accurate estimation of liver volume using MRI is of significance. Automated volumetry could also be integrated into a complete MRI-based preoperative evaluation to assess vascular and biliary anatomy and liver quality. Future studies may validate alternative MRI sequences for liver volumetry, particularly fat quantification sequences, and study vascular subsegmentation.