Introduction

Myocardial perfusion SPECT (MPS) has been one of the most important imaging modalities for the assessment of left ventricular (LV) function.1,2,3,4 Photons emitted by an injected radioactive perfusion tracer taken up by the LV myocardium are detected and used to reconstruct perfusion images. With electrocardiographic gating, MPS provides 8 or 16 volumetric perfusion image sets corresponding to different phases of the cardiac cycle.5 These images are evaluated visually and quantitatively to estimate how the LV changes during the cardiac cycle.6 LV contractile functional indices can then be derived from MPS images for the diagnosis and prognosis of coronary artery disease and for patient risk assessment.7,8,9,10,11,12

The fidelity of LV function assessment by MPS is directly related to how accurately the LV myocardium volume is quantified.13,14 The LV myocardium volume is measured by first delineating the epicardial and endocardial boundaries on the perfusion images and then calculating the volume bounded by the epicardial and endocardial surfaces. Manual segmentation is tedious when it involves multiple volumetric phase images of the cardiac cycle, and it depends on the observer's experience. It is therefore desirable to develop an observer-independent segmentation method that improves efficiency and reproducibility with comparable accuracy.

Current automated methods extract the epicardial and endocardial boundaries based on general assumptions and rules with empirical parameters. For example, commercially available methods estimate the myocardial profiles by identifying the maximal myocardial count and then applying Gaussian fitting with an empirical standard deviation, or a threshold, to extract the endocardial and epicardial boundaries.15,16 This approach is simple and fast to implement, but it neglects anatomical variations and pathological abnormalities among patients. Studies have shown that it can over- or underestimate the LV myocardium volume,17,18 and manual adjustments are usually required.14

In recent years, machine learning methods have been increasingly integrated into segmentation studies. For CT and MR images, they have been shown to yield better results in less time than traditional methods,19,20,21 owing to their data-driven approach of automatically learning image features and model parameters. Compared with these imaging modalities, MPS images are well suited to machine learning because their image size is much smaller and their image contrast is higher, which allows global features to be extracted efficiently from the whole image set during training. Machine learning is therefore a promising approach for automatic MPS image segmentation.

In this paper, we propose a novel machine-learning-based method that automatically segments the LV myocardium by delineating its endocardial and epicardial surfaces, and measures its volume in gated MPS imaging. Our method uses a multi-class 3D V-Net, an end-to-end fully convolutional neural network. A compound loss function, which simultaneously encourages similarity and penalizes discrepancy between the prediction and the training data, was used in the training stage to improve segmentation performance. To evaluate the proposed method, we retrospectively investigated 32 normal patients and 24 abnormal patients with clinically acquired MPS. The LV myocardium was segmented by the proposed method and compared with physician-approved ground truth contours for all 56 (32 + 24) patients.

Methods and Materials

The proposed SPECT LV myocardium segmentation method consists of a training stage and a segmentation stage. For each SPECT image dataset, the clinically used physician-drawn contours of the endocardial and epicardial surfaces of the myocardium are available, and these clinical contours were used as the learning targets. The region within the endocardial surface, the region within the epicardial surface, and the background are treated as the training and segmentation classes in our method. The original SPECT images were first automatically cropped to 32 × 32 × 16 voxels to reduce the background region: a threshold was applied to remove the background, the centroid of the active heart region was calculated, and a 32 × 32 × 16 voxel region around this centroid was cropped to cover the active heart region. A volume-based deep learning network was trained on the cropped SPECT image volumes. A 3D multi-class V-Net architecture was used to enable voxel-wise error back-propagation during training and to directly output a prediction patch of the same size as the input patch during testing.22 Low-level [4 × 4 × 2], modest-level [8 × 8 × 4], and high-level [16 × 16 × 8] feature volumes from each stage of the forward (left-to-right) path of the hidden network were up-scaled with additional deconvolutional layers and combined with the last output feature volume, and a softmax function was applied to these equal-sized feature volumes to obtain the final contour prediction. The V-Net was trained with the Adam gradient descent optimizer. Each whole cropped volume was used as a single patch, with a batch size (number of patches) of 20 and 180 training epochs. Compound loss supervision, combining binary cross entropy and Dice loss, was used to supervise the back-propagation of gradients for parameter updates in each training epoch.
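The automatic cropping step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold (a fraction `threshold_frac` of the maximum count) is a hypothetical choice because the paper does not report its value, and the box is clamped so it stays inside the volume.

```python
import numpy as np


def crop_heart(volume, size=(32, 32, 16), threshold_frac=0.5):
    """Crop a fixed-size box around the centroid of the thresholded heart region.

    threshold_frac is hypothetical: voxels above this fraction of the maximum
    count are treated as the active heart region, the rest as background.
    """
    volume = np.asarray(volume, dtype=float)
    if any(n < s for n, s in zip(volume.shape, size)):
        raise ValueError("volume is smaller than the crop size")
    mask = volume > threshold_frac * volume.max()   # remove background
    centroid = np.argwhere(mask).mean(axis=0)       # centroid in voxel indices
    starts = []
    for c, s, n in zip(centroid, size, volume.shape):
        start = int(round(c)) - s // 2              # centre the box on the centroid
        starts.append(min(max(start, 0), n - s))    # keep the box inside the volume
    box = tuple(slice(st, st + s) for st, s in zip(starts, size))
    return volume[box], tuple(starts)
```

Returning the start indices as well makes it possible to paste the predicted masks back into the original image grid after segmentation.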
During the segmentation stage, each newly acquired 3D SPECT volume was automatically cropped to reduce the background region (to [32 × 32 × 16] voxels in this study) and then input to the trained network. The output was a set of multi-class contour probability maps, and the final segmentation was generated by thresholding these probability maps at 0.5. Figure 1 outlines the workflow of our segmentation method.
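A minimal sketch of the post-processing step, assuming the channel order of Figure 1 (background, region within endocardium, region within epicardium) and that each foreground channel is thresholded independently at 0.5. How the paper combines channels is not specified; if the epicardial channel instead excludes the cavity, that channel is already the myocardium mask.

```python
import numpy as np


def masks_from_probabilities(prob, threshold=0.5):
    """Turn a (3, X, Y, Z) probability output into binary masks.

    Assumed channel order (from Figure 1): 0 = background,
    1 = region within endocardium, 2 = region within epicardium.
    """
    prob = np.asarray(prob, dtype=float)
    endo = prob[1] > threshold      # LV cavity, inside the endocardial surface
    epi = prob[2] > threshold       # everything inside the epicardial surface
    myocardium = epi & ~endo        # shell between the two surfaces
    return endo, epi, myocardium
```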

Figure 1

Schematic flow chart of the proposed algorithm for LV segmentation. The upper part of the figure shows the training stage of the proposed method, including the V-Net architecture, which has a single-channel volume input and a 3-channel volume output (background, region within the endocardium, and region within the epicardium). The lower part (brown) shows the segmentation stage, in which a new SPECT heart image is fed into the trained model to obtain the segmentation.

In this study, we propose a compound loss function that combines the strengths of the logistic loss and the Dice loss to supervise our network. Since the prediction for each class is a voxel-wise binary classification, we first used the voxel-wise binary cross entropy (BCE) loss as the logistic loss function. The BCE loss is defined as follows:

$$ L_{\text{BCE}} \left( C,\hat{C} \right) = - \sum_{j} \left[ C_{j} \log \hat{C}_{j} + \left( 1 - C_{j} \right) \log \left( 1 - \hat{C}_{j} \right) \right], $$
(1)

where \( C_{j} \) and \( \hat{C}_{j} \) denote the values of the jth voxel in the clinical contour mask \( C \) and the prediction \( \hat{C} \), respectively.
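Eq. (1) can be written directly in NumPy as a sketch. The clipping constant `eps` is an implementation detail assumed here to avoid log(0); it is not reported in the paper. Eq. (1) sums over voxels, whereas many deep learning frameworks average by default, which only rescales the loss.

```python
import numpy as np


def bce_loss(c, c_hat, eps=1e-7):
    """Voxel-wise binary cross entropy of Eq. (1), summed over voxels j.

    c: ground-truth mask (0/1); c_hat: predicted probabilities.
    eps (assumed) keeps c_hat away from 0 and 1 so the logs stay finite.
    """
    c = np.asarray(c, dtype=float)
    c_hat = np.clip(np.asarray(c_hat, dtype=float), eps, 1.0 - eps)
    return -np.sum(c * np.log(c_hat) + (1.0 - c) * np.log(1.0 - c_hat))
```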

The region within the endocardial surface often occupies a small portion of the MPS image compared with the region within the epicardial surface. This class imbalance may cause the network to miss the foreground regions and bias its output toward the background, and the learning process can become trapped in local minima without reaching accurate results. To address this issue, we combined the logistic loss with a Dice loss to form the final objective function. The Dice loss was originally proposed for 3D volumetric segmentation and is defined as

$$ L_{\text{Dice}} \left( {C,\hat{C}} \right) = 1 - \frac{{2 \times V\left( {C \cap \hat{C}} \right)}}{{V\left( C \right) + V\left( {\hat{C}} \right)}}, $$
(2)

where \( V \) indicates the volume of the region enclosed in the contours.
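For binary masks, the volumes in Eq. (2) are voxel counts (the constant voxel size cancels in the ratio), so the loss can be sketched as below. Passing probabilities for `c_hat` gives the common "soft" Dice variant used for gradient-based training; which form the paper uses is not stated, so this is an assumption.

```python
import numpy as np


def dice_loss(c, c_hat, eps=1e-7):
    """Dice loss of Eq. (2): 1 - 2 V(C ∩ Ĉ) / (V(C) + V(Ĉ)).

    With 0/1 masks the sums are voxel counts; with probabilities in c_hat
    this becomes the soft Dice loss. eps (assumed) guards against 0/0.
    """
    c = np.asarray(c, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    intersection = np.sum(c * c_hat)
    return 1.0 - 2.0 * intersection / (np.sum(c) + np.sum(c_hat) + eps)
```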

The compound loss function combines the above BCE and Dice loss functions, which penalize the dissimilarity and reward the similarity between the prediction and the training data, respectively. It is defined as follows:

$$ L_{\text{final}} \left( {C,\hat{C}} \right) = L_{\text{BCE}} \left( {C,\hat{C}} \right) + \mu L_{\text{Dice}} (C,\hat{C}), $$
(3)

where \( \mu \) is an empirical parameter that balances the binary cross entropy loss and the Dice loss. To compare the performance of the method fairly across patients, the network hyperparameters were fixed before the leave-one-out experiments were conducted. The batch size was set to 20 and the number of epochs to 180. The parameter μ was selected by fourfold cross-validation, which showed that performance was insensitive to μ in the range [0.7, 1.3]; we therefore set μ = 1.
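Putting Eqs. (1) to (3) together gives the sketch below, with μ = 1 as selected above. The soft Dice form and the per-channel summation in `multiclass_compound_loss` are assumptions for illustration; the paper does not say how the loss is aggregated over its three output channels.

```python
import numpy as np


def compound_loss(c, c_hat, mu=1.0, eps=1e-7):
    """L_final = L_BCE + mu * L_Dice (Eq. 3) for one output channel."""
    c = np.asarray(c, dtype=float)
    p = np.clip(np.asarray(c_hat, dtype=float), eps, 1.0 - eps)
    l_bce = -np.sum(c * np.log(p) + (1.0 - c) * np.log(1.0 - p))         # Eq. (1)
    l_dice = 1.0 - 2.0 * np.sum(c * p) / (np.sum(c) + np.sum(p) + eps)   # Eq. (2), soft form
    return l_bce + mu * l_dice


def multiclass_compound_loss(masks, probs, mu=1.0):
    """Hypothetical aggregation: sum the compound loss over output channels."""
    return sum(compound_loss(m, p, mu) for m, p in zip(masks, probs))
```

Setting `mu=0.0` recovers the pure BCE loss, which is a quick way to see how much the Dice term contributes for a given prediction.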

To evaluate the performance of the proposed method for LV myocardium segmentation, we compared the contours generated by our method with the clinical contours. In this retrospective study, we analyzed a dataset of 32 patients without hypertension, diabetes, heart dysfunction, or family history of heart disease (mean ± STD age: 63 ± 10 years; 23 males, 9 females). This cohort of 32 patients was used to evaluate our method with leave-one-out cross-validation: for each test patient, the model was trained on the remaining 31 patients, and it was re-initialized and re-trained on the corresponding 31 patients for the next test patient. The training and testing datasets were thus kept separate and independent in each experiment. In addition, 24 patients (mean ± STD age: 57 ± 10 years; 17 males, 7 females) diagnosed with myocardial ischemia of mild, moderate, or severe extent were included to further test the proposed segmentation method with a leave-one-out validation strategy. Institutional review board approval was obtained, and informed consent was not required for this HIPAA-compliant retrospective analysis. Each patient underwent 8-frame ECG-gated resting SPECT performed 30 minutes after injection of 20-30 mCi Tc-99m sestamibi. The SPECT images were acquired on a dual-headed camera (CardioMD, Philips Medical Systems) using a standard resting protocol. The acquisition parameters were a 20% energy window around 140 keV, 180° orbit, 32 steps with 25 seconds per step, 8-bin gating, and 64 projections per gate. Images were reconstructed into transaxial slices by ordered subsets expectation maximization with 3 iterations and 10 subsets and a Butterworth filter of power 10 and cutoff frequency of 0.3 cycles/cm. The reconstructed voxel dimensions of each SPECT image volume were 6.4 × 6.4 × 6.4 mm³.
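The leave-one-out protocol can be sketched as a generator of train/test splits. Model construction and training are deliberately left out; in the study, a freshly initialized network would be trained on each `train_ids` list (31 patients for the normal cohort) and evaluated on the held-out `test_id`.

```python
def leave_one_out_splits(patient_ids):
    """Yield (train_ids, test_id) pairs for leave-one-out cross-validation.

    Each patient is held out exactly once; the remaining patients form the
    training set, so training and test data never overlap within a split.
    """
    ids = list(patient_ids)
    for i, test_id in enumerate(ids):
        train_ids = ids[:i] + ids[i + 1:]
        yield train_ids, test_id
```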

Each patient had myocardial contours, with endocardial and epicardial surfaces delineated and approved by physicians, which were treated as the ground truth. Corresponding contours were generated by our proposed method. The region between the endocardial and epicardial surfaces was considered the myocardium, from which the LV myocardium volume was calculated. We first visually compared the contours of the ground truth with those of the proposed method. Quantitatively, we characterized the accuracy of the proposed method with two widely used metrics: the Dice similarity coefficient (DSC) and the Hausdorff distance. The DSC describes the overlap between the volumes segmented by the ground truth and by the proposed method; it equals 1 minus the Dice loss in Eq. (2), with \( C \) and \( \hat{C} \) being the contours of the ground truth and the proposed method, respectively. A DSC closer to 1 indicates greater overlap with the ground truth and thus higher accuracy of the proposed method. The DSC was calculated separately for the regions enclosed by the endocardial and epicardial surfaces, as well as for the myocardium between them. The Hausdorff distance measures the distance between two contours. It is the larger of two directed distances, each being the maximum, over all points on one contour, of the distance to the closest point on the other contour,23 i.e.,

$$ H\left( {C,\hat{C}} \right) = \hbox{max} \left\{ {\mathop {\hbox{max} }\limits_{a \in C} \mathop {\hbox{min} }\limits_{{b \in \hat{C}}} \left\| {a - b} \right\|,\mathop {\hbox{max} }\limits_{{b \in \hat{C}}} \mathop {\hbox{min} }\limits_{a \in C} \left\| {a - b} \right\|} \right\}, $$
(4)

where \( a \) and \( b \) are points on the ground truth contour \( C \) and on the contour from the proposed method \( \hat{C} \), respectively, and \( \left\| a - b \right\| \) is the Euclidean distance between points \( a \) and \( b \). A smaller Hausdorff distance indicates higher similarity between the two contours. The Hausdorff distance was calculated separately for the endocardial and epicardial surface contours. To determine interobserver variability, the DSC and Hausdorff distance were also calculated between the ground truth and the contours delineated by a second observer for all phases of three patients randomly selected from the total dataset.
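The evaluation metrics can be sketched per patient and phase as below. Several choices here are assumptions rather than details from the paper: contours are represented by 3D boundary voxels (mask voxels with a 6-neighbour outside the mask), distances use voxel centres scaled by the 6.4 mm voxel size, and the myocardium volume is estimated by counting voxels. The Hausdorff distance is computed by brute force over all point pairs, which is affordable for 32 × 32 × 16 volumes.

```python
import numpy as np

SPACING_MM = 6.4                            # reconstructed isotropic voxel size
VOXEL_VOLUME_CC = (SPACING_MM / 10) ** 3    # about 0.262144 cc per voxel


def dsc(a, b):
    """Dice similarity coefficient between two binary masks (1 minus Eq. (2))."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def surface_points_mm(mask, spacing_mm=SPACING_MM):
    """Coordinates (mm) of mask voxels that have at least one 6-neighbour outside."""
    m = np.pad(np.asarray(mask, dtype=bool), 1)   # pad with False so edges count as boundary
    core = m[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(m, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior) * spacing_mm


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance of Eq. (4) between two point sets."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise ||a - b||
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def evaluate_case(endo_gt, epi_gt, endo_pred, epi_pred):
    """DSC, Hausdorff distance and myocardium volume for one patient and phase."""
    endo_gt, epi_gt, endo_pred, epi_pred = (
        np.asarray(m, dtype=bool) for m in (endo_gt, epi_gt, endo_pred, epi_pred))
    myo_gt, myo_pred = epi_gt & ~endo_gt, epi_pred & ~endo_pred
    return {
        "dsc_endo": dsc(endo_gt, endo_pred),
        "dsc_epi": dsc(epi_gt, epi_pred),
        "dsc_myo": dsc(myo_gt, myo_pred),
        "hd_endo_mm": hausdorff_distance(surface_points_mm(endo_gt), surface_points_mm(endo_pred)),
        "hd_epi_mm": hausdorff_distance(surface_points_mm(epi_gt), surface_points_mm(epi_pred)),
        "myo_volume_cc_gt": myo_gt.sum() * VOXEL_VOLUME_CC,
        "myo_volume_cc_pred": myo_pred.sum() * VOXEL_VOLUME_CC,
    }
```

If the contours are evaluated slice by slice in 2D instead, the same `hausdorff_distance` applies to (N, 2) point arrays.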

The accuracy of the LV myocardium volume was evaluated by Pearson correlation analysis between the ground truth and the proposed method across all patients for each gating phase (0, 1/8, …, 7/8), and the correlation coefficient (r) and P value were calculated. A correlation coefficient closer to 1 indicates stronger agreement with the ground truth, and a P value of less than 0.05 was considered statistically significant. The relative error of the measured LV myocardium volume was calculated as the difference between the ground truth and the volume measured by the proposed method, relative to the ground truth. A Bland–Altman plot was then used to show the systematic bias of the error and the dependence of the error on volume size.
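A sketch of the per-phase volume statistics follows. Two conventions are assumptions: the relative error is taken as (proposed − ground truth)/ground truth, so underestimation is negative, and "volume size" for the error-size correlation is the Bland–Altman mean of the two volumes. `np.corrcoef` gives only r; `scipy.stats.pearsonr` additionally returns the P value.

```python
import numpy as np


def volume_agreement(v_gt, v_pred):
    """Agreement between ground-truth and predicted volumes for one gating phase."""
    v_gt = np.asarray(v_gt, dtype=float)
    v_pred = np.asarray(v_pred, dtype=float)
    r = np.corrcoef(v_gt, v_pred)[0, 1]          # Pearson correlation coefficient
    rel_err = 100.0 * (v_pred - v_gt) / v_gt     # % error; assumed sign: negative = underestimation
    mean_vol = 0.5 * (v_gt + v_pred)             # Bland-Altman x-axis, assumed "volume size"
    bias = rel_err.mean()                        # systematic bias
    sd = rel_err.std(ddof=1)
    return {
        "r": r,
        "rel_err_pct": rel_err,
        "bias_pct": bias,
        "loa_pct": (bias - 1.96 * sd, bias + 1.96 * sd),   # 95% limits of agreement
        "r_err_vs_size": np.corrcoef(mean_vol, rel_err)[0, 1],
    }
```

With the example volumes reported for patient #1 (191.5 cc measured, 194.9 cc ground truth), this convention gives a relative error of about −1.74%.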

Moreover, we calculated the LV ejection fraction (EF) to further evaluate our method. The region within the endocardial surface of the myocardium was considered the LV cavity. The maximum and minimum cavity volumes across all phases were used as the end-diastolic volume (EDV) and end-systolic volume (ESV), respectively, to calculate EF = (EDV − ESV)/EDV. The EF calculated from the segmentations of our proposed method was compared with that from the ground truth and with results from a commercial software package (Emory Cardiac Toolbox 4.0, Atlanta, USA).
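The EF definition above reduces to a few lines. The sketch takes the LV cavity volume (region inside the endocardial surface) for each of the 8 gating phases; the example volumes in the usage check are made up for illustration only.

```python
import numpy as np


def ejection_fraction(cavity_volumes):
    """EF from per-phase LV cavity volumes.

    EDV and ESV are the maximum and minimum cavity volumes over all phases.
    """
    v = np.asarray(cavity_volumes, dtype=float)
    edv, esv = v.max(), v.min()
    return (edv - esv) / edv, edv, esv
```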

Results

Normal Group

In Figure 2, the segmentations by the proposed method are compared side by side with the clinical ground truth at different slices of gating phase 0 for patient #1 (normal) as an example. In this case, the LV myocardium volume measured by the proposed method was 191.5 cc, an underestimation of 1.74% relative to the ground truth of 194.9 cc.

Figure 2

The axial views of patient #1 (normal) at different slices of gating phase 0 with segmentations of ground truth and proposed method. The black lines indicate the contours of endocardial and epicardial surface

Figure 3 shows a side-by-side comparison between the segmentations by the proposed method and the ground truth for the same patient as in Figure 2, across different gating phases of the same slice. The mean and standard deviation (STD) of the DSC for the endocardial surface, epicardial surface, and myocardium, and of the Hausdorff distance for the endocardial and epicardial surfaces, among all 32 normal patients are plotted in Figure 4 for each phase and summarized in Table 1. The average DSC values are all larger than 0.900, and the average Hausdorff distances are all less than 1 cm. The minimum myocardium DSC is 0.783, the only case with a DSC below 0.800. These results quantitatively demonstrate the high accuracy of the contours delineated by the proposed method. Our method produced no implausible result among the 32 patients and 8 phases.

Figure 3

The axial views of patient #1 (normal) from phase 0 to 7/8 at the same slice with segmentations of ground truth and proposed method. The black lines indicate the contours of endocardial and epicardial surface

Figure 4

Mean and STD of DSC and Hausdorff distance of contours between ground truth and proposed method among all 32 normal patients for each phase

Table 1 Mean ± STD of DSC and Hausdorff distance among all 32 normal patients

The changes in LV myocardium volume across phases are shown in Figure 5 for patients #1 to #4. Our method accurately quantified the variation of the LV myocardium volume during the cardiac cycle. Note that we present the results of four patients as examples; similar results were observed for the other patients.

Figure 5

LV myocardium volumes from the ground truth and measured by the proposed method at different phases for patients #1 to #4 (normal patients)

The correlation analysis of LV myocardium volume between the ground truth and the proposed method is shown in Figure 6. The mean (± STD) of r across all phases is 0.910 ± 0.061, and all P values are less than 0.001, indicating a statistically significant linear correlation between the LV myocardium volumes measured by the proposed method and the ground truth. The relative error of the LV myocardium volume measurement for each phase is presented as a Bland–Altman plot in Figure 7. The mean (± STD) relative error among all patients and phases is −1.09 ± 3.66%. The linear correlation coefficient between volume error and volume size, averaged over all phases, is −0.222 (P = 0.238).

Figure 6

Correlation analysis of LV myocardium volume between ground truth and proposed method at each gating phase among all 32 normal patients. Blue circles indicate the measurements for individual patients at that phase, and the dashed red line is the line of identity

Figure 7

Relative error of the LV myocardium volume measured by the proposed method for each normal patient at each phase

For the normal patients, EF was calculated from our results and from the ground truth, and was also obtained from the commercial software; the comparisons are shown in Figure 8 as correlation and Bland–Altman plots. EF from our results correlates well with the ground truth (r = 0.893, P < 0.001), whereas the correlation between our results and the commercial software is fair (r = 0.644, P < 0.001). The corresponding analyses of ESV and EDV are shown in Figure 9. Excellent correlations are found between our method and the ground truth for both ESV and EDV, and between our method and the commercial software for EDV. The correlation of ESV between our method and the commercial software is fair.

Figure 8

Left: Correlation analysis of EF between ground truth and proposed method (upper) and between commercial software and proposed method (bottom). Right: Difference of EF between ground truth and proposed method (upper) and between commercial software and proposed method (bottom)

Figure 9

Correlation analysis of ESV (left) and EDV (right) between ground truth and proposed method (upper) and between commercial software and proposed method (bottom)

Abnormal Group

Figure 10 shows a side-by-side comparison between our results and the clinical ground truth at different slices of gating phase 0 for patient #33 (diagnosed with moderate ischemia) as an example. In this case, the LV myocardium volume measured by the proposed method was 212.0 cc, an overestimation of 1.49% relative to the ground truth of 208.9 cc. The mean and STD of the DSC for the endocardial surface, epicardial surface, and myocardium, and of the Hausdorff distance for the endocardial and epicardial surfaces, among all 24 abnormal patients are plotted in Figure 11 for each phase and summarized in Table 2. Overall, the results for abnormal patients are very similar to those for normal patients, with mean DSC larger than 0.9 and mean Hausdorff distance less than 1 cm. The correlation coefficient of the LV myocardium volume between the ground truth and our results is 0.939 ± 0.103 (P ≤ 0.001), and the mean relative error of the LV myocardium volume is −0.567 ± 3.47%.

Figure 10

The axial views of patient #33 (abnormal) at different slices of gating phase 0 with segmentations of ground truth and proposed method. The black lines indicate the contours of endocardial and epicardial surface

Figure 11

Mean and STD of DSC and Hausdorff distance of contours between ground truth and proposed method among all 24 abnormal patients for each phase

Table 2 Mean ± STD of DSC and Hausdorff distance among all 24 abnormal patients

Interobserver Study

The interobserver study results on normal patients are shown in Figure 12. The average DSC and Hausdorff distance between the contours from the two observers are 0.890 and 10.99 mm, respectively; this interobserver discrepancy is larger than, but comparable to, the discrepancy between our method and the ground truth.

Figure 12

DSC and Hausdorff distance of contours between two observers among 3 patients for each phase

Discussion

In this study, we proposed a novel machine-learning-based method to segment the LV and measure the LV myocardium volume in gated MPS imaging. The average DSC and Hausdorff distance of the contours delineated by our method are > 0.9 and < 1 cm, respectively, and the results for abnormal patients are very similar to those for normal patients. In the normal group, the correlation coefficient of the LV myocardium volume between the ground truth and the proposed method is 0.910 ± 0.061 with statistical significance, and the mean relative error of the LV myocardium volume is −1.09 ± 3.66%. These results indicate the feasibility of the proposed method for accurately quantifying the changing LV myocardium volume during the cardiac cycle, and they demonstrate the potential of learning-based segmentation in gated MPS imaging for clinical use.

Segmentation of the LV in MPS imaging, as studied in this paper, is a critical step in clinical evaluations that quantify multiple LV contractile functional indices. In this study, we validated the LV myocardium volume measured by the proposed method against the ground truth. An accurate LV myocardium volume measurement can predict adverse cardiovascular events and premature death based on a well-established model,24 and provides prognostic information beyond traditional cardiovascular disease risk factors.25 The endocardial contours segmented for all gating phases could also be tracked to calculate regional endocardial wall motion by computing the displacement of the endocardial surface between end-diastole and end-systole. A high-performance segmentation method is therefore essential to avoid introducing errors at the first step of MPS image analysis.

Manual contours rely on the observer's experience and have been reported to show substantial intraobserver and interobserver variability and limited reproducibility.26 Manual contours from different observers may contain both systematic and random errors. Our learning-based method can mitigate random errors but cannot correct systematic errors introduced by the observers. In this study, we found that the contours currently used clinically have jagged, non-smooth boundaries (see Figures 2 and 3). The contours segmented by our method are smoother, which is more physically plausible given the real anatomical structures. In addition, our method provides comparable results in significantly less time: with a trained model, it takes around 10 seconds to delineate contours and measure volumes for all phases of an MPS study on an NVIDIA TITAN XP GPU. Moreover, our method requires no manual input parameters, correction, or intervention. Its speed and reproducibility make it promising for clinical use.

Overestimation of LV myocardium volume for small hearts and underestimation for large hearts are commonly seen in existing methods,17,18 which may lead to systematic errors in patient groups with small or large hearts. In our results, no such correlation between volume measurement error and volume size is observed in Figure 7. The linear correlation coefficient between volume error and volume size, averaged over all phases, is −0.222 (P = 0.238), indicating a weak correlation without statistical significance. This suggests that our method performs consistently regardless of the LV myocardium volume.

We compared the EF calculated from our segmentation results with that obtained from a commercial software package and found a fair correlation. Whereas the correlation for EDV is good (r = 0.914), ESV shows a larger discrepancy from the commercial software (r = 0.818). Thus, the difference in EF is mainly attributable to the difference in ESV. This may be explained by the different methodology, including various post-processing steps, that the commercial software uses to determine ESV, which leads to a larger discrepancy from the results of our method. Studies have shown that the correlation of EFs between two commercial software packages using different methods can be around 0.800.27 Thus, the commercial software results should be considered a benchmark rather than a gold standard in this study.

Note that this study does not aim to demonstrate the absolute accuracy of the contours output by the proposed method by comparing them with the patients' true myocardial contours, which are never available. Instead, we showed a high correlation between the output contours and the training contours, which in this study were manually delineated by a single observer. A similarly high correlation would be expected if the training contours came from another experienced observer, since the method is not designed for a specific observer. Thus, if the training dataset is closer to the true patient contours (e.g., consensus contours from multiple observers), the results of our method would also be closer to the true contours. In other words, our method generates contours of similar quality to the training contours, and the quality of the training dataset directly determines the quality of the output.

In this study, we proposed a novel method for automatic MPS segmentation and demonstrated its feasibility with 32 normal patients and 24 abnormal patients. This training/testing dataset includes a moderate number of patients with anatomical variations and pathological abnormalities. Future studies will involve a comprehensive evaluation with a larger patient population with diverse demographics and pathological abnormalities. Different testing and training datasets (including normal and abnormal cases) from different observers and institutions would be valuable for further evaluating the clinical utility of our method. Moreover, this study validated the proposed method by quantifying the shape similarity of the contours. Small differences from the ground truth were observed, and their potential clinical impact (e.g., on functional indices) needs to be understood. Further investigation of the diagnostic accuracy of the proposed method in detecting and localizing coronary artery disease would therefore be of great interest for clinical use.

Conclusion

We proposed a learning-based method to automatically segment the LV and measure the LV myocardium volume in gated MPS imaging. This method could benefit gated MPS imaging by providing high-quality automatic quantification of multiple LV contractile functional indices without manual intervention. The proposed method was evaluated on 32 normal patients and 24 abnormal patients. The results demonstrate the feasibility of the proposed method for contouring with accuracy comparable to that of contours based on physician experience.

New Knowledge Gained

A learning-based method has been proposed to automatically segment the LV and measure the myocardium volume in gated MPS imaging. The results demonstrate the feasibility of the proposed method for contouring with accuracy comparable to that of contours based on physician experience.