Introduction

Left ventricular (LV) geometric patterns provide clues in the diagnosis, management, and monitoring of cardiovascular diseases. Two-dimensional (2D) transthoracic echocardiography is an accessible and reliable imaging modality routinely used for the measurement of LV dimensions and wall thickness, which has been shown to predict prognosis in various cardiomyopathies [1,2,3,4,5]. The diagnostic accuracy of transthoracic echocardiogram interpretation by novice interpreters is low, so accurate assessment of LV geometry using echocardiography may be limited by the need for an experienced interpreter [6,7,8]. Further, these measurements show significant interobserver and intraobserver variability [9]. Automation of echocardiographic interpretation using deep learning (DL) can potentially allow non-experts to assess LV dimensions and wall thickness with an accuracy comparable to that of experts while eliminating intraobserver variability, thus improving reproducibility [10]. As predictions made by DL models are subject to noise and inference errors, it is important to be able to represent uncertainty in the model’s predictions [11]. Previous DL models measuring LV dimensions lacked uncertainty quantification and required manual input of individual end-diastolic frames for data analysis, which limited their real-life applicability [12,13,14]. The aim of this study is to determine the accuracy and reproducibility of DL-derived LV dimensions and wall thickness with incorporation of prediction uncertainty.

Methods

Study population

All transthoracic echocardiography studies of consecutive patients ≥ 18 years old from 01/01/2017 to 12/31/2021 at a single tertiary care center with annotations for the measurement of LV dimensions that were verified by level 3 echocardiographers were included in the development of the model. The echocardiographic cines were split in 80:10:10 proportions for training, validation, and testing respectively. The study was approved by the University of British Columbia Clinical Research Ethics Board.

Two-dimensional transthoracic echocardiography and preprocessing

Transthoracic echocardiogram studies were performed by certified academic sonographers using standard commercially available ultrasound systems (E9 [GE Healthcare, Milwaukee, WI], EPIQ [Philips Medical Imaging, Andover, MA], and iE33 [Philips Medical Imaging, Andover, MA]) and recorded on an image and information system (Syngo Dynamics [Siemens Medical Solutions, Ann Arbor, MI]). The distribution of the machines used for data acquisition was E9 (7%), iE33 (43%), and EPIQ (50%). Echocardiographic images were acquired, and measurements were performed offline, in accordance with the recommendations of the American Society of Echocardiography [15]. End-diastolic frames were identified visually as the frames with the largest LV cavity. Left ventricular internal diameter (LVID) was measured at end-diastole in the LV minor axis plane at the level of the mitral valve leaflet tips. The anteroseptal and inferolateral wall thicknesses were measured at end-diastole to obtain the interventricular septal thickness (IVS) and posterior wall thickness (PWT) respectively. LV mass was calculated using the Cube formula. All measurements were verified by a level 3 echocardiographer at our center. All annotations, on-screen texts, and identifications were removed, and the parasternal long axis (PLAX) videos were converted to multidimensional numeric arrays of pixels, which were resized to 224 × 224 pixels; the aspect ratio was maintained by padding with zero-valued pixels while keeping the ultrasound beam centered.
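The padding and resizing step can be illustrated with a minimal sketch. It assumes a single grayscale frame, symmetric zero-padding to a square (a stand-in for keeping the beam centered), and nearest-neighbor resampling; the interpolation method and the function names `pad_to_square`, `resize_nearest`, and `preprocess` are illustrative assumptions, not details reported in this study.

```python
import numpy as np


def pad_to_square(frame: np.ndarray) -> np.ndarray:
    """Zero-pad a 2D frame to a square so the aspect ratio is preserved."""
    h, w = frame.shape
    side = max(h, w)
    top = (side - h) // 2    # symmetric padding keeps the image centered
    left = (side - w) // 2
    out = np.zeros((side, side), dtype=frame.dtype)
    out[top:top + h, left:left + w] = frame
    return out


def resize_nearest(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize a square frame to size x size with nearest-neighbor sampling.

    Assumption: the interpolation method is not stated in the text.
    """
    side = frame.shape[0]
    idx = np.arange(size) * side // size  # source index for each target pixel
    return frame[np.ix_(idx, idx)]


def preprocess(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Pad to square, then downsample to the network input resolution."""
    return resize_nearest(pad_to_square(frame), size)


# Hypothetical 600 x 800 frame filled with ones, so padding is visible as zeros.
frame = np.ones((600, 800), dtype=np.uint8)
out = preprocess(frame)
print(out.shape)
```

In this example the padded rows at the top and bottom of the output are zero, while the central rows carry the original image content.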

Deep learning model development and training

We developed and trained our convolutional neural network model using the open-source library Keras with a TensorFlow backend in Python. We used a U-net architecture with skip connections. The downstream path (first half of the U-net) consisted of four sets of two 2D-convolutional layers with three max-pooling layers in between, and the upstream path (second half of the U-net) consisted of two sets of two 2D-convolutional layers, with upsampling at the beginning of each set, followed by two output heads, each consisting of another upsampling layer, two 2D-convolutional layers, and a fully connected layer with a sigmoid activation function to generate the binary segmentation map for that output head. The number of convolutional kernels started at 8 and increased to 64 by the end of the downstream section, and decreased from 32 to 8 in the upstream section. All 2D convolutional kernels had a kernel size of 3 × 3 pixels. The model was optimized for image segmentation to perform landmark detection. The model first identified the correct end-diastolic frames in the cine loops, then used the end-diastolic frames to train, end-to-end, a two-foci network. One focus was responsible for finding a pair of landmarks at the septal endocardial border and the inferolateral wall endocardial border below the level of the mitral valve leaflet tips. The other focus was responsible for finding a pair of landmarks identifying the right ventricular endocardial border and the pericardium for estimation of IVS and PWT respectively (Fig. 1).

The model selected the correct end-diastolic frames with the largest LVID to identify key landmarks for measuring the LV dimensions in each cine. LVID was measured for all the frames of a PLAX video, and the frame with the largest LVID measurement was used as the end-diastolic frame for further measurement (Fig. 2). Using manually labeled annotations as a reference, the model was trained to automatically measure the LVID, IVS, PWT, and LV mass. Each frame was processed individually, and no temporal information was used. Measurements of LVID, IVS, and PWT were done as a postprocessing step.
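The frame-selection rule described here can be sketched as follows. The sketch assumes that a pair of landmark coordinates has already been extracted from the segmentation maps for each frame, and that frames without a usable prediction are marked with NaN; the pixel spacing, the coordinates, and the function names are hypothetical.

```python
import numpy as np


def lvid_per_frame(septal_pts, inferolateral_pts, mm_per_pixel):
    """Euclidean distance between the paired landmarks of each frame, in mm."""
    septal = np.asarray(septal_pts, dtype=float)
    inferolateral = np.asarray(inferolateral_pts, dtype=float)
    return np.linalg.norm(septal - inferolateral, axis=1) * mm_per_pixel


def select_end_diastolic_frame(lvid_mm):
    """Index of the frame with the largest LVID, ignoring NaN frames."""
    lvid_mm = np.asarray(lvid_mm, dtype=float)
    if np.all(np.isnan(lvid_mm)):
        raise ValueError("no frame has a valid LVID prediction")
    return int(np.nanargmax(lvid_mm))


# Hypothetical (row, col) landmarks for a four-frame cine.
septal = [(100, 110), (98, 110), (96, 110), (99, 110)]
inferolateral = [(150, 110), (156, 110), (160, 110), (152, 110)]

lvid = lvid_per_frame(septal, inferolateral, mm_per_pixel=0.8)
ed_index = select_end_diastolic_frame(lvid)
print(ed_index, lvid[ed_index])
```

Because each frame is handled independently, the selection reduces to a maximum over per-frame distances, which matches the statement that no temporal information was used.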

Fig. 1 Architecture of the deep neural network framework

Fig. 2 Frame-by-frame assessment of left ventricular internal diameter (LVID) in all frames of a parasternal long axis video using the deep neural network model

The model was validated with 10% of the total echocardiographic cines, which were also used to calibrate the prediction uncertainty. We then evaluated prediction uncertainty in all cines of the test set. Prediction uncertainty was presented as a binary result: an echocardiographic cine was discarded if the model identified > 2 landmark locations per focus, was unable to generate predictions in five consecutive frames around the end-diastolic frame, or produced an uncertainty with a Z-score > 1 relative to our validation set. The accuracy of the model was evaluated with a test set from the remaining 10% of the echocardiographic cines, and the mean relative error and R² scores were calculated.
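The three rejection criteria can be written as a small decision function. This is a sketch under stated assumptions: the per-cine uncertainty is taken to be a single scalar (the text does not say how it is computed), the Z-score is computed against the mean and sample standard deviation of the validation-set uncertainties, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def longest_missing_run(has_prediction):
    """Length of the longest run of consecutive frames without a prediction."""
    longest = current = 0
    for ok in has_prediction:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def should_discard(n_landmarks_per_focus, has_prediction, uncertainty,
                   val_uncertainties, max_landmarks=2, max_missing=5,
                   z_threshold=1.0):
    """Return True if a cine should be rejected as too uncertain."""
    # Criterion 1: more than two landmark locations in any focus.
    if any(n > max_landmarks for n in n_landmarks_per_focus):
        return True
    # Criterion 2: five consecutive frames around end-diastole without output.
    if longest_missing_run(has_prediction) >= max_missing:
        return True
    # Criterion 3: uncertainty Z-score above 1 relative to the validation set.
    z = (uncertainty - mean(val_uncertainties)) / stdev(val_uncertainties)
    return z > z_threshold


# Hypothetical validation-set uncertainties used for calibration.
val = [0.10, 0.12, 0.14, 0.16, 0.18]
window_ok = [True] * 9

print(should_discard([2, 2], window_ok, 0.15, val))   # low uncertainty
print(should_discard([2, 2], window_ok, 0.20, val))   # high uncertainty
```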

The model was trained for 100 epochs, meaning that each training sample was seen by the model 100 times. To prevent overfitting to, or memorizing, the training data, each image was augmented with random rotation (up to 10 degrees), horizontal and vertical shift (by 10% of the image resolution), and zooming in (to up to 90% of the image size). Each batch of input data at each training iteration consisted of 16 end-diastolic frames from 16 randomly selected echo cines. To optimize the model, Dice loss and binary cross-entropy loss were used as the error measures, with equal weight, and were applied to both output heads. The Adam optimizer with a learning rate of 0.001 was used.
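The combined objective can be illustrated numerically with NumPy. The sketch below assumes that "equal weight" means a plain sum of the two terms with weight 1 each, and that the losses of the two output heads are added; it is not the training code used in this study, and the smoothing constant `eps` is an assumption.

```python
import numpy as np


def dice_loss(y_true, y_pred, eps=1e-7):
    """1 - Dice coefficient between a binary mask and predicted probabilities."""
    intersection = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)


def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


def head_loss(y_true, y_pred):
    """Equal-weight sum of Dice and cross-entropy for one output head."""
    return dice_loss(y_true, y_pred) + bce_loss(y_true, y_pred)


def total_loss(targets, predictions):
    """Sum of the per-head losses over both output heads."""
    return sum(head_loss(t, p) for t, p in zip(targets, predictions))


# Hypothetical 4 x 4 landmark mask and two candidate predictions.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
good = mask.copy()              # perfect prediction
poor = np.full((4, 4), 0.5)     # uninformative prediction

print(total_loss([mask, mask], [good, good]))
print(total_loss([mask, mask], [poor, poor]))
```

The Dice term is sensitive to overlap on the small landmark regions, while the cross-entropy term penalizes every pixel, which is one common reason for combining the two.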

Statistical analysis

Statistical analyses were performed using SAS v9.4 (SAS Institute, Cary, North Carolina). Continuous variables were expressed as mean ± standard deviation. Categorical variables were expressed as numbers (percentages). Data were analyzed using two-way ANOVA, the chi-square test, and Student’s t-test. A p-value of < 0.05 was considered significant.

Results

A total of 30,080 unique transthoracic echocardiographic cines from the parasternal long-axis (PLAX) view over a 5-year period were used to train, validate, and test a convolutional neural network model to automatically assess LV dimensions. A total of 24,013 unique studies were used to train the model, and the model was validated with 3,014 echocardiographic cines, which were also used to calibrate the prediction uncertainty. The accuracy of the model was evaluated with a test set of 3,053 echocardiographic cines (Fig. 3). The training, validation, and test sets comprised mutually exclusive groups of patients.

Fig. 3 Echocardiographic cines distribution for the model development

In our test set of 3,053 echocardiographic cines, a total of 240 echocardiographic cines were discarded due to high uncertainty. In the remaining 2,813 echocardiographic cines, our model automatically measured the LVID, IVS, PWT, and LV mass with a mean percent error of 5.40%, 11.73%, 12.76%, and 13.93%, and with a mean absolute error of 2.4 mm, 1.1 mm, 1.2 mm, and 20.8 g respectively (Figs. 4 and 5). The R² of the model for the LVID, IVS, PWT, and LV mass was 0.88, 0.63, 0.50, and 0.87 respectively. The accuracy of the model on all measurements improved when cines flagged by prediction uncertainty were excluded, and analysis of the rejected cases demonstrated poor model prediction (Table 1).
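For reference, the three summary metrics reported here can be computed as in the sketch below. It assumes that "mean percent error" denotes the mean absolute error relative to the reference measurement, and that R² is the usual coefficient of determination; the measurement values are made up for illustration.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Mean of |prediction - reference| in the measurement's own unit."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def mean_percent_error(y_true, y_pred):
    """Mean of |prediction - reference| / reference, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true) * 100.0)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical reference and predicted LVID values in mm.
reference = [40.0, 50.0, 60.0]
predicted = [42.0, 50.0, 57.0]

mae = mean_absolute_error(reference, predicted)
mpe = mean_percent_error(reference, predicted)
r2 = r2_score(reference, predicted)
print(round(mae, 3), round(mpe, 3), round(r2, 3))
```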

Table 1 Comparison of model measurements based on uncertainty prediction

Fig. 4 Scatter plot and regression of predicted left ventricular dimension measurement using deep neural network models

Fig. 5 Bland-Altman plots for the mean percent error

Our model had a mean percentage error similar to that of state-of-the-art models in landmark detection of LVID, IVS, and PWT (Table 2). Model predictions for IVS and PWT were less accurate than those for LV mass and LVID, especially with very large LV wall thickness. Our model predictions were more accurate than, or at least similar to, human interobserver variability between two independent expert readers using manual measurements (Table 3) [6]. On average, the model generated an output of 270 frames/second, processing each frame in 3.7 ms.

Table 2 Comparison of current study model to state-of-the-art model in landmark detection of left ventricular wall dimensions
Table 3 Comparison of current study model to interobserver variability between two expert readers using manual measurement

We assessed the accuracy of our model in classifying the LV geometry pattern based on the 2015 ASE/EACVI guideline (Fig. 6) [15]. The model categorized the LV geometry pattern with F1 scores of 0.73, 0.58, 0.62, and 0.44 (normal, concentric remodeling, eccentric hypertrophy, and concentric hypertrophy respectively).
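As an illustration of how the three linear measurements map to a geometry pattern, the sketch below applies the Cube formula for LV mass and the commonly cited guideline cut-offs for the linear method (relative wall thickness > 0.42; LV mass index > 95 g/m² in women and > 115 g/m² in men). The exact thresholds, the source of body surface area, and the function names are assumptions here, since this section does not report them.

```python
def lv_mass_cube(lvid_mm, ivs_mm, pwt_mm):
    """LV mass in grams from the Cube formula; inputs in mm, converted to cm."""
    lvid, ivs, pwt = lvid_mm / 10.0, ivs_mm / 10.0, pwt_mm / 10.0
    return 0.8 * 1.04 * ((ivs + lvid + pwt) ** 3 - lvid ** 3) + 0.6


def relative_wall_thickness(lvid_mm, pwt_mm):
    """RWT = 2 x posterior wall thickness / LV internal diameter."""
    return 2.0 * pwt_mm / lvid_mm


def classify_geometry(lvid_mm, ivs_mm, pwt_mm, bsa_m2, sex):
    """Assign one of the four LV geometry patterns.

    Assumed cut-offs: RWT > 0.42; LV mass index > 95 g/m2 (female)
    or > 115 g/m2 (male).
    """
    mass_index = lv_mass_cube(lvid_mm, ivs_mm, pwt_mm) / bsa_m2
    hypertrophy = mass_index > (95.0 if sex == "female" else 115.0)
    thick_walls = relative_wall_thickness(lvid_mm, pwt_mm) > 0.42
    if hypertrophy:
        return "concentric hypertrophy" if thick_walls else "eccentric hypertrophy"
    return "concentric remodeling" if thick_walls else "normal"


# Hypothetical patients (measurements in mm, body surface area in m2).
print(round(lv_mass_cube(48, 9, 9), 1))
print(classify_geometry(48, 9, 9, bsa_m2=1.9, sex="male"))
print(classify_geometry(42, 14, 13, bsa_m2=1.7, sex="female"))
```

Because LV mass depends on the cube of the summed dimensions, small errors in IVS or PWT are amplified in the mass estimate, which in turn affects the assigned pattern.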

Fig. 6 Confusion matrix demonstrating the accuracy of the classification models in predicting the left ventricular geometry pattern based on the 2015 ASE/EACVI guidelines

Discussion

To the best of our knowledge, the dataset of 2D transthoracic echocardiography studies used for the development of this DL model for assessing LV dimension and wall thickness represented the largest of all datasets that have been used for similar development. The principal findings of this study are as follows: (1) the novel DL model was able to accurately assess the LV wall dimension; (2) the model was able to automatically estimate the LV wall thickness without manual selection of end-diastolic frames; and (3) DL automated measurements of IVS and PWT were less accurate with greater wall thickness.

Our DL model for automating LV dimensions from PLAX views compares favorably with the current state-of-the-art models. In practice, LV thickness is measured using an end-diastolic frame that is identified manually by selecting the frame with the largest LV cavity. Previous DL models have required users to manually select the end-diastolic frames to be input into the model, which limits the real-life applicability of automated LV wall dimension assessment. Other DL models have been trained to select end-systolic and end-diastolic frames automatically, but have not been used to measure and report LV wall thickness [16,17,18,19,20,21,22]. The mean percentage error ranged between 5.40 and 13.93% for our model, which is well within the range of typical interobserver variability between expert clinicians, while eliminating intraobserver variability [9, 23, 24]. Our model builds on previous work in LV wall dimension measurement by bringing a truly autonomous model one step closer. Although our study selected PLAX videos specifically for assessment, our group has demonstrated that DL models can accurately classify 15 standard views in two-dimensional echocardiography, and that automated selection of PLAX videos is feasible [25, 26].

Similar to previous work, our model was less accurate in estimating IVS and PWT when compared to LVID, especially in cases of greater wall thickness [13]. Several considerations are important. First, the measurements could have been overestimated by including the RV trabeculation, papillary muscles, or the sigmoid septum; however, our model generally underestimated rather than overestimated the wall thickness. It is possible that our model measured the septum at the tapered section, underestimating its true thickness. In addition, the model had fewer images of markedly thickened walls to train on. One way of improving the accuracy of the model would be to include more images with very thick LV walls in the training set and to set a spatial constraint on the landmark detection to avoid the end of the septum where the tapering may occur.

This study was the first to incorporate prediction uncertainty in a DL model for assessing LV wall dimensions. Uncertainty quantification in artificial intelligence is important, as the model is able to convey to the users the level of confidence for its predictions and users can be apprised of any uncertainty of its predictions. However, there are no universally accepted parameters for the level of uncertainty in measurement of LV wall dimension by a DL model. Further studies are needed to demonstrate optimal uncertainty quantification for the measurement of LV dimensions.

This study adds to the growing body of research demonstrating the feasibility of DL models for complex echocardiographic assessments [27]. DL models can assess the quality of transthoracic echocardiograms, identify PLAX views from other standardized views, and automatically assess the LV wall from PLAX videos, but no single DL model has yet been developed that can comprehensively complete all these tasks. Automation of LV wall dimension assessment from an undifferentiated set of echocardiographic videos would be of interest in future studies, as would assessing the utility of DL models in the diagnosis and management of various cardiomyopathies.

Study limitations

We acknowledge several limitations in our study. We used the manual measurements by trained sonographers as the ground truth labels when developing our model, and these are susceptible to human error. Our model was trained on echocardiograms done at a single tertiary care center, limiting its generalizability. However, we included a large set of consecutive echocardiograms done in both inpatient and outpatient echocardiography labs at our tertiary care center, which includes a diverse set of populations and pathologies. We only included echocardiographic studies of good quality for LV dimension and wall thickness. However, poor-quality echocardiographic studies are unlikely to yield accurate LV wall dimensions, and our model’s prediction uncertainty would likely have discarded those studies. There is also uncertainty in how the model will perform with certain anatomic variations, such as basal septal hypertrophy or apical hypertrophy.

Conclusion

In this pilot study, we developed a novel DL model for automating the measurement of LV dimensions and wall thickness by 2D echocardiography. The model performed well for measurements of LVID and LV mass, and moderately for IVS and PWT. The accuracy appeared reduced with greater LV wall thickness.