Introduction

Transcatheter aortic valve replacement (TAVR) has become the therapy of choice for a large proportion of patients with severe symptomatic aortic valve stenosis [1]. Multiple device sizes are available to accommodate different anatomies. The standard of care for sizing TAVR devices is based on CT aortic annulus planimetry [2,3,4]. The methodology requires a multiplanar reconstruction of the aortic root to measure the area and perimeter of the aortic valve annulus. The aortic valve annulus is defined as the virtual endocardial ring of tissue measured in a plane defined by the nadirs of the three cusps of the valve [5]. Given that the aortic root plane lies at an oblique angle, the operator typically uses a software package capable of 3D multiplanar reconstruction to sequentially rotate and translate reference planes until an appropriate view through the valve is achieved [2, 6]. The process is time-consuming and requires significant training to achieve a reproducible annular plane. Given that the field is rapidly expanding, institutional-level expertise in CT analysis may be lacking. Automated image analysis may deliver a much-needed service for an efficient workflow and for quality assurance. Several manufacturers have developed software packages to streamline the annotation of CT datasets for TAVR planning.

Convolutional neural networks (CNN) have recently garnered much interest in the field of computer vision [7,8,9]. A significant research effort has been directed to their application in medical imaging [10,11,12]. They have been used successfully as both classification and regression models. Most computer vision algorithms suggested in the literature are aimed at 2D images. In the case of medical images, however, the information contained within 3D datasets is important when analyzing anatomical features [13]. Indeed, a feature may be readily identified only within its 3D environment. This is the case for the aortic valve annulus.

Herein, we propose an automated fully tridimensional recursive multiresolution CNN regression model to determine the location and orientation of the aortic valve plane in a volumetric cardiac CT dataset.

Related work

The automated landmarking and analysis of the aortic valve and its surrounding structures have been the topic of multiple studies. A statistical analysis of a 2D region-growing algorithm applied to coronary artery segmentation on CT images was described in [14]. Model-based methods for aortic root evaluation provided landmark position errors of 0.4 mm to 2.28 mm [14,15,16,17]. These methods rely on the availability of a 3D geometric model of the heart structure, which is highly nontrivial to generate. Similar methods have also been applied to C-arm CT images to provide optimal fluoroscopic implantation angulations [18]. Classic image segmentation techniques such as 3D normalized cuts have been applied to automatically extract the aortic root annulus [19, 20].

Machine learning has been applied to mitral valve analysis on echocardiographic images to automatically measure the mitral valve anteroposterior and intercommissural diameters in a small number of patients [21]. Machine learning has also been applied to modeling of the aortic valve with a mesh mean discrepancy of 1.57 mm, also in a small number of patients [22]. Extraction of aortic valve landmarks using colonial walk, a computationally efficient regression tree-based machine learning algorithm, was recently published [23]; the number of CT image volumes used in that study was limited to 71. A recent article addressed landmarking of several aortic root features using neural networks trained on 198 image volumes and achieved an accuracy level of approximately 2.0 mm for several structures [24]. Multiscale deep learning approaches have been successfully applied to 3D landmarking of CT datasets [25].

Classical, non-machine-learning techniques generally rely on extensive a priori knowledge of the aortic valve geometry. In contrast, machine learning approaches, including the work presented here, rely on large amounts of data and virtually no a priori anatomical knowledge. Because they embed no heuristics specific to the aortic valve, they are readily adaptable to different anatomical structures provided enough labeled data are available. Furthermore, the availability of high-performance open-source neural network libraries reduces the implementation complexity of many machine learning methods.

Methods

The aortic valve planimetry problem

The first and most crucial step in the analysis of a cardiac CT scan for the purpose of TAVR prosthesis sizing is to obtain an oblique multiplanar reconstruction at the level of the aortic valve annulus. In practice, the operator translates the volume in a visualization software package until the valve is centered; they then successively rotate the reconstruction plane until a view displaying the nadirs of the three coronary cusps is obtained. This process takes on average 1–5 min depending on the expertise of the operator. Figure 1 demonstrates three multiplanar reconstructions parallel to the aortic valve annular plane at different levels. Pane (d) is at the level of the aortic valve annulus, which has been traced in yellow. The accuracy of the plane selection is important for the measurement of the aortic valve annulus area, perimeter and diameter, which are crucial for TAVR device sizing. Excessive oversizing may result in annular rupture and coronary ostial obstruction, while undersizing may result in device embolization and paravalvular leakage. These complications have been linked to increased mortality [26]. Furthermore, planimetry of the aortic valve annulus provides the optimal fluoroscopic implantation views used during the procedure [27].

Fig. 1

Oblique multiplanar reconstruction of the aortic valve. a Long-axis view, bd three cuts parallel to the aortic valve annular plane. The location of the short-axis slices is indicated in (a): green is (b), orange (c) and blue (d). The aortic valve annulus is shown in yellow in pane (d)

The aortic valve annular plane is fully described by a plane unit normal vector \( \left\{ {\varvec{n} \in R^{3} : \left\| \varvec{n} \right\| = 1} \right\} \) and the position of a point on the plane \( \left\{ {\varvec{x}_{0} \in R^{3} } \right\} \). The plane is thus defined as the set \( P = \left\{ {\varvec{x} \in R^{3} : \varvec{n} \cdot \left( {\varvec{x} - \varvec{x}_{0} } \right) = 0} \right\} \). We use the aortic valve centroid as the point \( \varvec{x}_{0} \).
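As a concrete illustration, this parameterization and its associated signed distance translate into a few lines of NumPy; the values of the unit normal and centroid below are hypothetical:

```python
import numpy as np

def plane_signed_distance(x, n, x0):
    """Signed distance from point x to the plane P = {x : n . (x - x0) = 0}."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)  # enforce the unit-normal constraint ||n|| = 1
    return float(np.dot(np.asarray(x, dtype=float) - x0, n))

# Hypothetical values for the unit normal n and the annulus centroid x0.
n = np.array([0.0, 0.0, 1.0])
x0 = np.array([10.0, 20.0, 30.0])

# Points on the annular plane have zero signed distance.
print(plane_signed_distance([11.0, 22.0, 30.0], n, x0))  # 0.0
```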

The concept of recursive multiresolution CNN for feature localization

Fundamentally, a CNN formulates a regression or classification task as a deep neural network of convolutions. Most layers of a CNN correspond to sets of kernel filters. During inference, each filter is convolved with the activations of the previous layer. As the depth of the CNN increases, more and more complex features are described. This framework is particularly powerful when applied to spatial datasets such as images.

For the specific problem of 3D feature localization, one can use the information obtained from a low-resolution CNN as a provisional feature position. This provisional position is expected to be inaccurate, but it can be used to generate a recentered sampling of the 3D dataset with the same grid dimensions but a smaller voxel spacing and, thus, a smaller field of view. If the feature is contained within the new field of view, this process can be applied recursively to achieve arbitrarily high position accuracy. This is what we refer to as a recursive multiresolution CNN.

Formally, we define the recurrence relation for the provisional position \( \hat{\varvec{x}}_{i} \in R^{3} \) at step i as

$$ \hat{\varvec{x}}_{i} = C_{i} \left( {S_{i} \left( {\varvec{\mu};\hat{\varvec{x}}_{i - 1} ;s_{i} } \right)} \right), $$

where \( C_{i} :R^{n \times n \times n} \to R^{3} \) represents the CNN inference at step i; \( S_{i} :R^{l \times m \times o} \to R^{n \times n \times n} \) is the trilinear interpolation of image volume \( \varvec{\mu} \in R^{l \times m \times o} \) over a regular n × n × n grid with isotropic voxel spacing \( s_{i} \in R \), centered at image position \( \hat{\varvec{x}}_{i - 1} \in R^{3} \). For simplicity, we omit the formal definitions of \( C_{i} \) and \( S_{i} \).

The choice of the initial provisional position \( \hat{\varvec{x}}_{0} \) is set to the center of the image volume \( \varvec{\mu} \), and \( s_{0} \) is set such that the field of view of the \( S_{0} \) interpolation covers the entire image volume to ensure that the feature of interest is contained. The field of view at step i is defined as the region

$$ F_{i} = \left\{ {\varvec{x} \in R^{3} :\left\| {\varvec{x} - \hat{\varvec{x}}_{i - 1} } \right\|_{\infty } \le ns_{i} /2} \right\}. $$

The sequence of voxel spacings \( s_{i} \) determines the rate of reduction in the size of the field of view. In order for the accuracy to improve at each step, we generally require

$$ s_{i} < s_{i - 1} , $$

i.e., that the field of view decreases and that the interpolated image has a greater resolution at each step. Furthermore, the reduction in the voxel spacing must not be so great as to exclude the defining regions of the image for the feature of interest. Specifically, \( \varvec{x}_{0} \in F_{i} \) must hold at all steps, where \( \varvec{x}_{0} \) is the actual position of the feature of interest.

In the work presented here, we used \( s_{i} = rs_{i - 1} \) with \( r = \frac{1}{2} \). In other words, the field of view and voxel spacing are halved at each recursive step. We also selected the total number of steps such that the final voxel spacing is as close as possible to the voxel spacing of the original image volume. We selected a final voxel spacing of \( s_{4} = 0.5\, {\text{mm}} \), a maximum of i = 4 recursive steps and an interpolation grid of 64 × 64 × 64 \( (n = 64) \). The algorithm defined by these design choices is depicted graphically in Fig. 2.
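Under these design choices, the recursion and spacing schedule can be sketched as follows. This is a simplified illustration, not the authors' implementation: `resample` stands in for \( S_{i} \) (with spacing expressed in voxel units for simplicity), and the trained CNNs \( C_{i} \) are assumed to be supplied as callables returning a 3D position.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def spacing_schedule(s_final=0.5, steps=4, r=0.5):
    """Voxel spacings s_1..s_steps with s_i = r * s_{i-1}, ending at s_final (mm)."""
    return [s_final / r ** (steps - i) for i in range(1, steps + 1)]

def resample(volume, center, spacing, n=64):
    """S_i: trilinear interpolation on an n^3 grid with isotropic `spacing`
    (in voxel units here, for simplicity), centered at `center`."""
    offsets = (np.arange(n) - n / 2) * spacing
    axes = [center[d] + offsets for d in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"))
    return map_coordinates(volume, grid.reshape(3, -1), order=1).reshape(n, n, n)

def localize(volume, cnns, spacings, n=64):
    """Recursive multiresolution localization: refine the provisional
    position x_hat at progressively finer voxel spacings."""
    x_hat = np.array(volume.shape, dtype=float) / 2.0  # x_hat_0: volume center
    for C_i, s_i in zip(cnns, spacings):
        patch = resample(volume, center=x_hat, spacing=s_i, n=n)  # S_i
        x_hat = C_i(patch)  # C_i returns a refined 3D position
    return x_hat

print(spacing_schedule())  # [4.0, 2.0, 1.0, 0.5]
```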

Fig. 2

Architecture of the proposed recursive multiresolution CNN

An important point to note regarding this architecture is that it enables efficient serving of the trained CNNs for inference over the network. Indeed, the interpolation and recentering step \( S_{i} \) can be performed on the client side, while the inference step \( C_{i} \) can be performed on the server side as depicted in Fig. 2.

This framework does not specify the architecture of the set of CNNs \( \left\{ {C_{i} } \right\} \). In the current work, we used the convolutional neural network defined in Fig. 3 to perform the regression. An identical neural network architecture is used at all resolution steps.

Fig. 3

Generic regression CNN used at each step of the recursive multiresolution architecture of the localization algorithm. Node tensors are shown as rectangles. The same CNN is also used for the orientation neural network. Each layer labeled CONV is a 3D convolution operation with kernel size 3 × 3 × 3 and uses a rectified linear unit (ReLU) activation function. The layers labeled FC are fully connected dense layers with 512 nodes for FC 512 and three nodes for FC 3. Output tensor dimensions are indicated after each operation
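A modern tf.keras re-expression of this generic regression CNN might look as follows; the 3 × 3 × 3 ReLU convolutions and the FC 512/FC 3 head follow the figure, while the filter counts and the max-pooling downsampling are assumptions (the original was implemented as a TensorFlow 1.13 Estimator):

```python
import tensorflow as tf

def make_regression_cnn(n=64):
    """Sketch of the generic 3D regression CNN of Fig. 3.
    The 3x3x3 ReLU convolutions, FC-512 and FC-3 layers follow the paper;
    the filter counts and max-pooling downsampling are assumptions."""
    inputs = tf.keras.Input(shape=(n, n, n, 1))
    x = inputs
    for filters in (16, 32, 64, 128):  # assumed filter progression
        x = tf.keras.layers.Conv3D(filters, kernel_size=3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)  # BN before ReLU, per the training setup
        x = tf.keras.layers.ReLU()(x)
        x = tf.keras.layers.MaxPool3D(pool_size=2)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(512, activation="relu")(x)  # FC 512
    outputs = tf.keras.layers.Dense(3)(x)                 # FC 3: position or normal
    return tf.keras.Model(inputs, outputs)
```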

CNN for feature orientation

The framework described in the previous section focuses on the localization of the aortic valve annulus centroid. We also require the orientation of the valve annular plane. We formulated a single CNN, with an architecture identical to that described in Fig. 3, to infer the three components of the normal vector. Note that inference of the valve orientation is not amenable to a recursive CNN formulation as it is for the localization problem. Our implementation used a voxel spacing of 0.75 mm and an interpolation grid of 64 × 64 × 64. It uses the valve position inferred at the last step of the localization algorithm as the center of the recentered and resampled image volume.

Tomographic data

We used multiphase ECG-gated datasets from 94 patients with severe symptomatic degenerative aortic stenosis who were referred to the McGill University Health Centre for TAVR candidacy evaluation. The Research Ethics Board approved this study. Clinical characteristics are described in Table 1. The dataset is representative of patients undergoing CT-based sizing of a TAVR device. We excluded patients with a prior surgical valve replacement. The CT scanner used for data collection was a Discovery CT750 HD scanner (GE Healthcare, Chicago, USA). The slice thickness was 0.625 mm and the spacing between slices was 0.625 mm. The in-plane pixel spacing was 0.88 mm. Retrospective ECG-gating was applied to yield five to ten image volumes at cardiac phases from 5% to 95% of the R–R interval. An additional volume at 35% was also produced by our TAVR protocol. An ungated volume was also acquired to plan vascular access; we included it in the analysis if it covered the aortic root. The volumetric datasets were anonymized to remove all identifying information prior to image analysis.

Table 1 Clinical characteristics of patients included in training and evaluation datasets

A single operator experienced with aortic valve planimetry manually segmented the aortic valve annulus with a closed 3D cubic spline for all 1007 CT image volumes. The operator processed images from different phases individually in ascending order. The oblique multiplanar reconstruction method was applied starting from standard axial, sagittal and coronal views for each image volume. The technique used was previously described in detail in [2]. A visualization software developed in-house and capable of 3D multiplanar reconstruction was used for the labeling process.

Implementation

The convolutional neural networks were implemented using TensorFlow (Google, USA) version 1.13 with the Python 3.7.2 API and an NVIDIA CUDA 10.1 backend. The Python code defining the generic regression CNN of Fig. 3 as a TensorFlow Estimator, including the specification of the training operations, comprises 100 lines of code. Training and evaluation were performed on a workstation with dual NVIDIA GTX 1080 Ti graphics cards (NVIDIA, USA), each with 11 GB of memory. When running inference locally, each CNN produces predictions in 195 ms; the entire recursive localization algorithm executes in 780 ms.

Training and data augmentation

We used K-fold cross-validation for training and evaluation with K = 9. The image volumes were thus separated randomly into nine groups, each with 10 or 11 patients. Image volumes from any one patient were assigned to only one group. Nine separate training datasets were created by assembling eight of the groups, leaving in each case the remaining group as an evaluation dataset. Nine separate CNNs were trained at each resolution level of the localization CNN, in addition to nine separate orientation CNNs, for a total of 45 trained CNNs.
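A patient-grouped split of this kind can be sketched with scikit-learn's `GroupKFold`; the patient assignment below is synthetic and for illustration only:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Synthetic bookkeeping: one entry per image volume, tagged with its patient.
rng = np.random.default_rng(0)
patient_ids = rng.integers(0, 94, size=1007)  # hypothetical patient assignment
volume_idx = np.arange(1007)

# Nine folds, grouped by patient, so that all volumes from any one patient
# fall into exactly one evaluation group.
for train_idx, eval_idx in GroupKFold(n_splits=9).split(volume_idx, groups=patient_ids):
    assert set(patient_ids[train_idx]).isdisjoint(patient_ids[eval_idx])
```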

Training was performed using the TensorFlow (Google, USA) framework version 1.13. The pre-built Adam optimizer was used, and batch normalization was applied after each convolution layer and prior to the rectified linear unit (ReLU) activation function. The learning rate hyperparameter was optimized manually.

A batch size of 18 was used, the maximum allowed by our GPU memory. Each batch was generated by randomly selecting 18 distinct image volumes, applying data augmentation and resampling each image on a \( 64 \times 64 \times 64 \) isotropic grid. Data augmentation was applied over a total of seven random degrees of freedom:

  • 3D translation within a range of 32 times the target voxel spacing (3 degrees of freedom)

  • 3D rotation about 3 Euler angles for the full angular range, i.e., from − 90° to 90° in cranio-caudal direction, − 180° to 180° in right- and left-anterior oblique and − 180° to 180° about the axial direction (3 degrees of freedom)

  • Scaling of voxel spacing by a factor from 0.8 to 1.2 (1 degree of freedom)
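The seven degrees of freedom above can be sampled as in the sketch below; reading "within a range of 32 times the target voxel spacing" as a symmetric ±16-voxel range is our assumption:

```python
import numpy as np

def sample_augmentation(rng, voxel_spacing_mm):
    """Sample the seven random augmentation degrees of freedom."""
    return {
        # 3 DOF: translation within 32x the target voxel spacing (read as +/-16)
        "translation_mm": rng.uniform(-16, 16, size=3) * voxel_spacing_mm,
        # 3 DOF: Euler angles over the full angular range
        "euler_deg": np.array([
            rng.uniform(-90, 90),    # cranio-caudal
            rng.uniform(-180, 180),  # right-/left-anterior oblique
            rng.uniform(-180, 180),  # axial
        ]),
        # 1 DOF: voxel-spacing scaling
        "scale": rng.uniform(0.8, 1.2),
    }

params = sample_augmentation(np.random.default_rng(42), voxel_spacing_mm=0.5)
```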

A total of 3500 epochs of training were applied. Each CNN was trained in approximately 12.4 h. K-fold cross-validation required a total of 558 h of training.

Evaluation metrics

The accuracy of the four localization CNNs was evaluated using two metrics: the 3D error and the out-of-plane error. The 3D error \( e_{3D} \) for an estimated position \( \hat{\varvec{x}} \in R^{3} \) is defined as:

$$ e_{3D} \left( {\hat{\varvec{x}}} \right) = \left\| {\hat{\varvec{x}} - \varvec{x}_{0} } \right\|_{2} , $$

where \( \varvec{x}_{0} \in R^{3} \) is the labeled position of the aortic valve annulus centroid. The out-of-plane error \( e_{\text{OOP}} \) is defined as:

$$ e_{\text{OOP}} \left( {\hat{\varvec{x}}} \right) = \left| {\left( {\hat{\varvec{x}} - \varvec{x}_{0} } \right) \cdot \varvec{n}} \right|, $$

where \( \varvec{n} \in R^{3} \) is the unit normal vector of the labeled aortic valve annular plane. These metrics are expressed in millimeters.

The accuracy of the orientation CNN was evaluated using the 3D angular error \( e_{\text{angular}} \) defined as:

$$ e_{\text{angular}} \left( {\hat{\varvec{n}}} \right) = \arccos \left( {\hat{\varvec{n}} \cdot \varvec{n}} \right) ,$$

where \( \hat{\varvec{n}} \in R^{3} \) is the estimated unit normal vector. The 3D angular error is expressed in degrees.
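The three error metrics translate directly into NumPy:

```python
import numpy as np

def e_3d(x_hat, x0):
    """3D error (mm): Euclidean distance between estimated and labeled centroids."""
    return float(np.linalg.norm(np.subtract(x_hat, x0)))

def e_oop(x_hat, x0, n):
    """Out-of-plane error (mm): component of the error along the labeled normal n."""
    n = np.asarray(n, dtype=float)
    return float(abs(np.dot(np.subtract(x_hat, x0), n / np.linalg.norm(n))))

def e_angular(n_hat, n):
    """3D angular error (degrees) between estimated and labeled unit normals."""
    c = np.dot(n_hat, n) / (np.linalg.norm(n_hat) * np.linalg.norm(n))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

print(e_3d([3.0, 4.0, 0.0], [0.0, 0.0, 0.0]))       # 5.0
print(e_angular([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 90.0
```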

The error metrics were evaluated for each CNN on both the evaluation and training data to assess the presence of overfitting and the generalizability of the model. The error metrics were also correlated with the aortic valve area and the cardiac phase of each image volume to evaluate the robustness of the algorithm relative to these variables. Evaluating the dependence on the aortic valve area is important because the field of view of the resampled volume is halved at each recursive step of the multiresolution algorithm, and the defining features of the structure of interest must remain within the volume; should this not be the case, the accuracy metrics may become dependent on valve size. Evaluating the dependence on the cardiac phase is also crucial because the aortic valve changes configuration significantly between systole and diastole, and all phases may be relevant for TAVR planning [2].

The ultimate goal of the proposed method is to simplify the measurement of the aortic valve annulus dimensions. Errors in the plane position and orientation are expected to generate inaccuracies in the measured annulus area and perimeter. To study this effect, we randomly selected one image volume per patient (n = 94). We generated oblique multiplanar reconstructions at the location and angulation inferred by the algorithm for the fold in which the patient belonged to the evaluation set. We then traced the annulus manually using a 3D cubic spline to assess the error in the measured annulus area and perimeter.

Statistical analysis

The statistical analysis was performed using the SciPy library (SciPy Developers), version 1.2.1. Error values are expressed as mean ± standard deviation. The Pearson correlation coefficient was used. Statistical tests used p ≤ 0.05 as the significance level.
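As an illustration of the correlation test (on synthetic data, not the study's measurements):

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic example: per-patient annulus areas and localization errors.
rng = np.random.default_rng(0)
area_mm2 = rng.uniform(300.0, 700.0, size=94)
error_mm = 1.0 + 0.002 * area_mm2 + rng.normal(0.0, 0.3, size=94)

# Pearson correlation coefficient and two-sided p-value.
r, p = pearsonr(area_mm2, error_mm)
significant = p <= 0.05  # significance level used in this study
```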

Results and discussion

Aortic valve annulus centroid localization

After training was completed at each of the resolution steps, we applied the recursive localization CNNs to the entire training and evaluation datasets without data augmentation. Figure 4 illustrates the evaluation metrics in a representative example with localization errors in the 50th percentile.

Fig. 4

Orthogonal slices through the resampled datasets at each resolution used for the localization algorithm. The first row (ad) presents axial slices; the second row (eh), coronal slices; and the third row (il), sagittal slices. Images in each column have a different voxel spacing: 4 mm in the first column, 2 mm in the second column, 1 mm in the third column and 0.5 mm in the fourth column. The blue point is the labeled aortic valve annulus centroid and the dotted line the aortic valve annular plane. The yellow point is the provisional centroid position inferred by the CNN at that resolution level. The red line is the 3D error and the green line is the out-of-plane error

Table 2 reports the mean 3D and out-of-plane errors at each resolution step. Note that both metrics improve at each step of the algorithm. The errors are slightly greater for the evaluation datasets, as expected given that the neural networks were not trained on them. For unseen image volumes in patients with severe aortic stenosis, the error in the 3D centroid localization is (1.4 ± 0.8) mm and the out-of-plane error is (0.9 ± 0.8) mm. Note that the localization error is approximately 5% of the annulus systolic excursion, measured at (32.1 ± 35.2) mm in 3D and (16.5 ± 15.9) mm out-of-plane.

Table 2 Mean measurement error for aortic valve position inferred by CNN for both training and evaluation datasets

Histograms of the absolute error metrics at each resolution step for both the training and evaluation datasets are presented in Fig. 5. We observe that the distributions differ for the 3D and out-of-plane errors: the peak of the distribution is near zero for the out-of-plane error but not for the 3D error. This is expected, as the 3D error is the norm of a three-component error vector; even with unbiased components, its mode lies away from zero, whereas the one-dimensional out-of-plane error is unbiased and can peak at zero.

Fig. 5

Histogram of the absolute inference error achieved with the localization recursive CNN applied to the training and evaluation datasets. Each column corresponds to a different resolution step from 4 mm on the left to 0.5 mm on the right. The evaluation dataset is plotted in blue, and the training dataset in orange

Aortic valve annular plane orientation

After training of the orientation CNN, we applied it to the entire training and evaluation datasets without data augmentation. We used the inferred aortic valve annulus centroid from the recursive localization CNN as the center of the image volume. Figure 6 illustrates the difference in the angulation and position of the image centroid in three representative patients near the 5th, 50th and 95th percentiles of angular error.

Fig. 6

Representative CT slices in three patients at the 5th percentile of angulation error (a)–(e), 50th percentile (f)–(j) and 95th percentile (k)–(o). The first column shows an axial slice; the second column, a coronal slice; the third column, a sagittal slice; the fourth column, an oblique multiplanar reconstruction at the level of the manually labeled aortic valve annulus (cyan); and the last column, at the level of the inferred aortic valve annulus (yellow)

The mean angular orientation error is presented in Table 3. For unseen image volumes in patients with severe aortic stenosis, the angular error was measured at (6.4 ± 4.0)°, which is approximately 53% of the systolic annular plane deflection (12.1 ± 5.0)°.

Table 3 Angular error for the training and evaluation datasets

Figure 7 presents histograms of the angular error for both the training and evaluation datasets. The 95th percentile error was 8.2° for the training dataset and 13.4° for the evaluation dataset. For the evaluation dataset, we observe that in 84.6% of cases, the angular error was within 10° of the labeled angular plane. The gold standard used here is a trained expert human observer. A published study reported that trained expert human observers disagreed by less than 10° in 84.3% of cases [6]. While we cannot reach a strong conclusion given that the datasets differ, we believe that the accuracy of our algorithm approaches published expert-level accuracy.

Fig. 7

Histogram of the angular inference error achieved with the orientation CNN applied to the training and evaluation datasets

Correlation of error with cardiac phase and aortic valve annulus area

We stratified the image volumes by the cardiac phase at which they were acquired. This stratification did not reveal a statistically significant linear correlation between the error metrics and the cardiac phase, as shown in Table 4.

Table 4 Correlation of accuracy metrics relative to cardiac phase and aortic valve area

We measured the projected area of the 3D cubic spline manually traced at the level of the aortic valve annulus during the labeling of each image volume. We studied the dependence of the error metrics relative to the labeled aortic valve annulus area. The gold standard used here is thus a trained expert human observer. We noted a trend toward higher errors at greater aortic valve areas. This observation is supported by the positive, statistically significant correlation between the 3D centroid error and the aortic valve area as reported in Table 4. This is a consequence of the progressively smaller field of view \( F_{i} \) of the resampled image volume at each resolution step of the algorithm. Note that despite the slightly greater inaccuracy at larger aortic valve annuli for the CNN at 0.5 mm voxel spacing, the accuracy in the inferred centroid position improved by more than 50% from the voxel size of 1 mm to 0.5 mm as shown in Table 2.

The angular error of the orientation CNN did not demonstrate a significant correlation with the aortic valve annulus area as reported in Table 4 and Fig. 7. This is likely a consequence of the voxel size selected at 0.75 mm, corresponding to a field-of-view dimension of 48 mm, which covers the entire aortic valve annulus even in larger anatomies.

Impact of the inferred aortic valve annulus plane on dimension measurements

The errors in the measured aortic valve annulus area and perimeter when measured in the inferred plane location and orientation are presented in Table 5. Note that on average the relative error in the valve area was (4.73 ± 5.32)% and the relative error in the valve perimeter was (2.46 ± 2.94)%. We noted a positive linear correlation between the error in plane position and orientation and the error in measured area and perimeter (Fig. 8).

Table 5 Error in the aortic annulus dimensions measured in the predicted plane position and orientation
Fig. 8

Errors in the annulus area and perimeter measured in the plane predicted by the proposed algorithm as a function of the error in the angular orientation and 3D centroid distance of the predicted plane. The red line represents a linear regression relative to these quantities. Each data point corresponds to a different patient (n = 94)

Limitations and future directions

The dataset included only 94 patients, for a total of 1007 CT image volumes, which remains limited. The patients included in the study had degenerative aortic valve stenosis. We excluded patients with a prior surgical valve replacement, as well as congenitally abnormal valves such as bicuspid or quadricuspid valves. It would be interesting to extend our dataset to include cases with these pathologies and to study how image quality affects the results.

The method presented in this manuscript does not directly provide a measurement of the aortic valve annulus size. Using the method presented herein, one may generate a multiplanar reconstruction at the level of the aortic valve annulus; the sizing problem then reduces to a 2D image segmentation, which has been addressed extensively in the literature [28]. In a recently published manuscript, the area and perimeter of the aortic valve annulus were measured accurately by a 2D convolutional neural network applied to 2D multiplanar reconstructions at the level of the aortic valve annulus [29]. An alternative approach would be to formulate the problem directly as a regression yielding the area and perimeter from a CNN. We aim to study these methods in follow-up work.

Another important aspect is that the design of the method does not rely on a priori information specific to the aortic valve annulus but on a large amount of labeled CT datasets. We hypothesize that the method could be readily generalized to other anatomical structures, cardiac or otherwise.

Conclusion

Herein, we provided implementation details of a recursive multiresolution CNN algorithm to infer the location and orientation of the aortic valve annular plane in CT image volumes. Despite using fully tridimensional convolutions, the algorithm remains computationally efficient. We trained and evaluated the algorithm on 1007 ECG-gated CT image volumes and observed a performance that approached expert-level accuracy. The algorithm includes no heuristics specific to the aortic valve and may thus be generalized to other anatomical features in the future.