1 Introduction

Atrial fibrillation (AF) causes a 5-fold increase in the risk of ischemic stroke and accounts for approximately 15% of all strokes in the United States [7]. In patients with non-valvular AF, thrombi form inside the left atrial appendage (LAA) in around 70% to 90% of cases [17]. The LAA is a complex tubular structure, with high inter-patient variability, originating from the left atrium (LA). Studies have shown a correlation between LAA morphology and the risk of ischemic stroke [4, 8]. Di Biase et al. [4] reported that the so-called Chicken Wing morphology is associated with a lower risk of stroke than non-Chicken Wing morphologies. Several studies have focused on describing the varying LAA morphology in terms of LAA length, width, orifice (ostium) size, and number of lobes. In a study based on 220 LAA obtained from necropsy studies, Ernt et al. [5] reported LAA volumes ranging from 770 to 19,270 \(\mathrm {mm}^3\), minor orifice diameters ranging from 5 to 27 mm, major orifice diameters between 10 and 40 mm, and LAA lengths between 16 and 51 mm.

The aim of this work was to quantitatively describe LAA shape variation and clustering using a statistical shape model. We trained a point distribution model (PDM) on LAA surfaces reconstructed from multidetector computed tomography (CT) images and then combined the trained PDM with unsupervised clustering methods to examine the natural clustering of the LAA shapes.

2 Data and Preprocessing

The LAA surfaces are reconstructed from CT images provided by the Department of Radiology, Rigshospitalet, University of Copenhagen. The data are acquired as part of the Copenhagen General Population Study [12], where participants are offered a research cardiac computed tomography angiography (CCTA) examination [6]. Participants are excluded from the examination if they, among other things, suffer from AF. The CCTA examinations are performed on a 320-detector CT scanner (Aquilion One, Toshiba Medical Systems) with the following settings: gantry rotation time 350 ms, detector collimation \(0.5 \times 320\), X-ray tube voltage 100–120 kV, and X-ray tube current 280–500 mA. The acquired CT images have a matrix size of \(512 \times 512 \times 560\) and a voxel size of \(0.5 \times 0.5 \times 0.25\) mm.

One hundred and five CT images with high contrast are randomly selected from the database (see Fig. 1a for an example CT image). The raw CT volumes are manually cropped, using Osirix, to contain only the tracer-enhanced regions with the LAA. After cropping, the CT volumes are blurred with a Gaussian filter kernel with a standard deviation of 0.5 mm, and the iso-surface of the inner part of the LAA is computed using the Marching Cubes algorithm [11] with a manually set iso-level in the range 150–250 Hounsfield Units. The selected iso-level varies between images because the amount of tracer in the LAA varies. Image blurring and surface reconstruction are conducted using 3D Slicer [1]. A reconstructed LAA surface from the example CT image is shown in Fig. 1b.

Fig. 1. (a) Slice from raw cardiac computed tomography (CT) image. (b) Left atrial appendage (LAA) surface reconstructed from the CT image shown in (a), where the red marks are manually placed landmarks. HU: Hounsfield Units (Color figure online)

3 Methods

The first goal of this work is to build a statistical shape model [3] to quantitatively describe the shape variation of the LAA. This model is created from a training set containing N Procrustes-aligned shapes; shapes in the training set are represented as a series of corresponding points.

3.1 Point Correspondence

Point correspondences between LAA surfaces are determined by registering a source surface \(\mathcal {S}\) to each target surface \(\mathcal {T}\) in the training set, such that each vertex is positioned on the same anatomical structure in both \(\mathcal {S}\) and \(\mathcal {T}\). Initially, \(\mathcal {S}\) is aligned to \(\mathcal {T}\) with a similarity transform by registering four manually placed landmarks equally distributed around the LAA orifice (two of the four landmarks are visible as the red marks in Fig. 1b). The alignment is then fine-tuned by an iterative closest point (ICP) alignment [16]. The aligned source is denoted \(\mathcal {S}_{ICP}\). The surface registration of \(\mathcal {S}_{ICP}\) and \(\mathcal {T}\) is performed using a non-rigid volumetric registration algorithm. To use the volumetric registration algorithm, \(\mathcal {S}_{ICP}\) and \(\mathcal {T}\) must be represented as volumes. We represent \(\mathcal {S}_{ICP}\) and \(\mathcal {T}\) as signed distance fields (SDF), where each voxel value is equal to the signed Euclidean distance to the surface [13, 15].
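The initial landmark-based similarity alignment can be illustrated with a minimal sketch. It assumes a closed-form least-squares solution (Umeyama's method), which is one common choice and not necessarily the one used here; the function names `similarity_from_landmarks` and `apply_similarity` and the landmark coordinates are hypothetical.

```python
import numpy as np

def similarity_from_landmarks(src, dst):
    """Least-squares scale s, rotation R and translation t with dst ~ s * R @ src + t.

    Assumed closed-form (Umeyama) solution; src and dst are (n, 3) landmark arrays.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    # Cross-covariance between the centred landmark sets
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a rotation, not a reflection
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * (R @ mu_s)
    return s, R, t

def apply_similarity(points, s, R, t):
    """Apply the similarity transform to an (n, 3) array of points."""
    return s * np.asarray(points, dtype=float) @ R.T + t

# Hypothetical example: four orifice landmarks on the source and target
source_landmarks = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                             [-1.0, 0.0, 0.2], [0.0, -1.0, 0.1]])
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
target_landmarks = 1.5 * source_landmarks @ R_true.T + np.array([2.0, -1.0, 0.5])

s, R, t = similarity_from_landmarks(source_landmarks, target_landmarks)
aligned = apply_similarity(source_landmarks, s, R, t)
```

The same transform would then be applied to every vertex of the source surface before the ICP refinement.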

The non-rigid volumetric registration is conducted by solving the optimization given by:

$$\begin{aligned} \hat{\mathbf{T }}_\mu = \underset{\mathbf{T _\mu }}{\arg \min }\left( \mathcal {C}\left( \mathbf T _\mu ;I_F,I_M\right) \right) \end{aligned}$$
(1)

Here \(\mathcal {C}\) is a cost function, \(I_F\) is the fixed volume and \(I_M\) is the moving volume, where \(I_F\) and \(I_M\) are the SDF representations of \(\mathcal {S}_{ICP}\) and \(\mathcal {T}\), respectively. \(\mathbf {T_\mu }\) is the non-rigid volumetric transformation that transforms \(I_M\) to \(I_F\). The transformation is parameterised by a parameter vector \(\varvec{\mu }\). In this work, we use a multi-level B-spline transformation with five resolution levels. The cost function to be minimized is:

$$\begin{aligned} \mathcal {C} = \omega _1MSD\left( \mu ;I_F,I_M\right) + \omega _2\mathcal {P}_{CP}(\mathbf {x},\mathbf {y}) + \omega _3\mathcal {P}_{BE}\left( \mathbf {\mu }\right) , \end{aligned}$$
(2)

where MSD is the mean squared voxel value difference similarity measure, \(\mathcal {P}_{CP}(\mathbf {x},\mathbf {y})\) penalizes large distances between landmarks and \(\mathcal {P}_{BE}\left( \mathbf {\mu }\right) \) is the bending energy penalty term. The weights \(\omega _1 = 1\), \(\omega _2 = 0.15\) and \(\omega _3 = 2\) are selected using a grid search. The optimal transformation parameters are found using adaptive stochastic gradient descent [9] as optimizer, with 2048 random samples per iteration for a maximum of 500 iterations, as implemented in the elastix library [10]. The transformation estimated between \(I_F\) and \(I_M\) is applied to \(\mathcal {S}_{ICP}\), yielding the transformed surface \(\mathcal {S}_T\).
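As a rough illustration of how these settings might map onto an elastix parameter file, the sketch below renders a Python dictionary into elastix's `(Key value)` syntax. The parameter names are standard elastix settings, but the actual parameter file used in this work is not given, so the mapping of the three cost terms to these metric components is an assumption, and `render_elastix_parameters` is a hypothetical helper.

```python
def render_elastix_parameters(params):
    """Format a dict as elastix parameter-file text, one '(Key value ...)' per line."""
    def fmt(value):
        # elastix expects strings in double quotes and numbers as plain text
        return f'"{value}"' if isinstance(value, str) else str(value)

    lines = []
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        joined = " ".join(fmt(v) for v in values)
        lines.append(f"({key} {joined})")
    return "\n".join(lines)

# Assumed mapping of Eq. (2) to elastix components (not taken from the paper)
params = {
    "Registration": "MultiMetricMultiResolutionRegistration",
    "Transform": "BSplineTransform",
    "NumberOfResolutions": 5,
    "Metric": [
        "AdvancedMeanSquares",                         # MSD term
        "CorrespondingPointsEuclideanDistanceMetric",  # landmark penalty
        "TransformBendingEnergyPenalty",               # bending energy penalty
    ],
    "Metric0Weight": 1.0,
    "Metric1Weight": 0.15,
    "Metric2Weight": 2.0,
    "Optimizer": "AdaptiveStochasticGradientDescent",
    "NumberOfSpatialSamples": 2048,
    "MaximumNumberOfIterations": 500,
}

parameter_text = render_elastix_parameters(params)
print(parameter_text)
```

A complete parameter file would need further settings (image sampler, interpolators, B-spline grid spacing) that are not specified in the text and are therefore left out.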

Since the volumetric registration is conducted on the SDF, it is not guaranteed that the zero-level iso-surfaces fit perfectly after the registration. This problem is solved using an approach originally described in [14], where vertices in \(\mathcal {S}_T\) are propagated to \(\mathcal {T}\) using Markov Random Field regularization of the correspondence vector field. After the vertices in \(\mathcal {S}_T\) have been propagated to \(\mathcal {T}\), we obtain a point correspondence surface \(\mathcal {S}_{COR}\), where each vertex corresponds to a vertex in \(\mathcal {T}\). The set of surfaces with point correspondence is used to construct a point distribution model using Procrustes alignment and principal component analysis (PCA) as described in [3].
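The PCA step of the point distribution model can be sketched as follows. This is a minimal illustration on random stand-in data, assuming the shapes are already Procrustes-aligned and flattened to vectors, and that the eigenvectors are stored as the columns of `P`; the function names are hypothetical.

```python
import numpy as np

def train_pdm(shapes, t):
    """Train a PCA shape model on an (N, 3n) array of aligned, flattened shapes.

    Returns the mean shape, the (3n, t) matrix P of the first t eigenvectors
    (as columns) and all eigenvalues of the sample covariance matrix.
    """
    mean = shapes.mean(axis=0)
    centred = shapes - mean
    # The right singular vectors of the centred data are the covariance eigenvectors
    _, sing, Vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = sing ** 2 / (len(shapes) - 1)
    return mean, Vt[:t].T, eigvals

def to_loadings(x, mean, P):
    """Project a shape onto the model: b = P^T (x - mean)."""
    return P.T @ (x - mean)

def from_loadings(b, mean, P):
    """Reconstruct a shape from its loadings: x = mean + P b."""
    return mean + P @ b

# Stand-in data: 20 random "shapes" with 10 vertices each (30 coordinates)
rng = np.random.default_rng(0)
shapes = rng.normal(size=(20, 30))

mean, P, eigvals = train_pdm(shapes, t=5)
b = to_loadings(shapes[0], mean, P)
explained = eigvals[:5].sum() / eigvals.sum()
print(b.shape, round(float(explained), 3))
```

With only the first t modes retained, `from_loadings(b, mean, P)` gives an approximation of the input shape; the reconstruction is exact only when all non-zero modes are kept.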

3.2 Shape Clustering

To examine the natural shape clusters formed by our data set, we use the trained point distribution model to represent the surfaces by their PCA loadings and use the loadings to identify shape clusters. The PCA loadings \(\mathbf {b}\) of a given surface are determined by:

$$\begin{aligned} \mathbf {b} = \mathbf {P}^{T}(\mathbf {x}' - \bar{\mathbf {x}}) , \end{aligned}$$
(3)

where \(\mathbf {x}'\) is the input surface, \(\bar{\mathbf {x}}\) is the Procrustes average shape of the N aligned \(\mathcal {S}_{COR}\) and \(\mathbf {P}\) is the matrix whose columns are the first t eigenvectors. We use the PCA loadings to estimate the natural number of shape clusters by examining the log-likelihood (LLH) computed from multivariate Gaussian mixture models (GMM) fitted to the loadings. The probability density function of a GMM can be written as:

$$\begin{aligned} p(\mathbf {x}) = \sum _{k=1}^K\pi _k\mathcal {N}\left( \mathbf {x}|\varvec{\mu }_k,\varvec{\varSigma }_k\right) , \end{aligned}$$
(4)

where \(\mathbf {x}\) is the vector of loadings, \(\pi _k\) is the mixing coefficient of component k, K is the number of mixture components and \(\mathcal {N}\left( \mathbf {x}|\varvec{\mu }_k,\varvec{\varSigma }_k\right) \) is the multivariate Gaussian distribution with mean \(\varvec{\mu }_k\) and covariance matrix \(\varvec{\varSigma }_k\). From Eq. (4), the LLH function of a set of loadings \(\mathbf {X} = \{\mathbf {x}_1,\dots ,\mathbf {x}_N\}\) is given by [2]:

$$\begin{aligned} \ln p(\mathbf {X}|\varvec{\pi },\varvec{\mu },\varvec{\varSigma }) = \sum _{i = 1}^N\ln \left( \sum _{k = 1}^K \pi _k\mathcal {N}(\mathbf {x}_i|{\varvec{\mu }}_k,{\varvec{\varSigma }}_k) \right) \end{aligned}$$
(5)
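The log-likelihood in Eq. (5) can be evaluated numerically as in the sketch below. It is a minimal illustration, not the implementation used in this work; the function names are hypothetical, and the inner sum is computed with the log-sum-exp trick to avoid underflow.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X, shape (N,)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    # Solve L z = (x - mean) so that z.z is the squared Mahalanobis distance
    z = np.linalg.solve(L, (X - mean).T)
    maha = (z ** 2).sum(axis=0)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def gmm_log_likelihood(X, weights, means, covs):
    """Eq. (5): sum over points of the log of the weighted component densities."""
    log_terms = np.stack([np.log(w) + log_gaussian(X, m, c)
                          for w, m, c in zip(weights, means, covs)])  # (K, N)
    top = log_terms.max(axis=0)
    per_point = top + np.log(np.exp(log_terms - top).sum(axis=0))
    return float(per_point.sum())

# Toy example: one point at the origin under a standard 2-D Gaussian
X = np.zeros((1, 2))
llh_single = gmm_log_likelihood(X, [1.0], [np.zeros(2)], [np.eye(2)])

# Splitting the same Gaussian into two identical components gives the same LLH
llh_split = gmm_log_likelihood(X, [0.5, 0.5],
                               [np.zeros(2), np.zeros(2)],
                               [np.eye(2), np.eye(2)])
```

For the single standard Gaussian, the value at the origin is \(-\ln (2\pi )\), which gives a quick sanity check of the implementation.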

To avoid over-fitting, the number of shape clusters is determined using two-level cross-validation. The first level performs leave-one-out cross-validation. Here the data are divided into \(N-1\) training shapes and one test shape. The training set is used to train a GMM with K mixture components, while the test shape is used to validate the trained GMM, using the LLH as quality metric. This procedure is repeated until all N shapes have been used as the test shape, after which the mean test LLH is computed from the N test LLH values. The second cross-validation level iterates through \(K = 1\dots 10\) mixture components, where the first level is conducted for every K. The number of shape clusters is set to the number of mixture components that results in the highest mean test LLH.

To identify the shape appearance of the naturally formed clusters, a new GMM is trained on the entire data set, with the number of mixture components equal to the estimated number of shape clusters. We can then use the model to randomly sample PCA loadings within the different shape clusters and generate synthetic shapes based on the loadings by:

$$\begin{aligned} \mathbf x = \bar{\mathbf{x }} + \mathbf {Pb} \end{aligned}$$
(6)

The synthetic shapes can be visualized to identify the different shape appearance of each cluster.
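Sampling loadings from one mixture component and mapping them back to shapes with Eq. (6) can be sketched as follows. The function name and the toy model values are hypothetical; in practice `mean_shape` and `P` would come from the trained point distribution model and `mu_k`, `sigma_k` from the fitted GMM.

```python
import numpy as np

def sample_cluster_shapes(mean_shape, P, mu_k, sigma_k, n_samples, rng):
    """Draw loadings b ~ N(mu_k, sigma_k) and return shapes x = mean_shape + P b.

    mean_shape: (3n,), P: (3n, t) with eigenvectors as columns,
    mu_k: (t,), sigma_k: (t, t). Returns an (n_samples, 3n) array.
    """
    b = rng.multivariate_normal(mu_k, sigma_k, size=n_samples)  # (n_samples, t)
    return mean_shape + b @ P.T

# Toy model: 4 vertices (12 coordinates) and t = 2 modes
rng = np.random.default_rng(1)
mean_shape = np.zeros(12)
P = np.eye(12)[:, :2]
mu_k = np.array([1.0, -2.0])
sigma_k = np.array([[0.5, 0.1], [0.1, 0.3]])

synthetic = sample_cluster_shapes(mean_shape, P, mu_k, sigma_k, n_samples=4, rng=rng)
vertices = synthetic[0].reshape(-1, 3)  # one shape as an (n, 3) vertex array
```

Combining the reshaped vertex array with the template's triangle connectivity gives a surface that can be rendered.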

The GMMs are fitted to the training data by estimating a set of model parameters, \(\varvec{\pi }\), \(\varvec{\mu }\), and \(\varvec{\varSigma }\), that maximize the LLH function. In this work, we estimate the parameters with the Expectation Maximization algorithm, using 100 random initializations and keeping the set of model parameters with the highest training LLH.
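The model selection procedure, EM fitting with random restarts inside a leave-one-out loop over K, can be sketched as below. This is a simplified illustration on synthetic 2-D data, not the implementation used in this work: the EM runs for a fixed number of iterations, a small regularization term `reg` is added to the covariances, and the number of restarts and the range of K are reduced to keep the example fast. All function names are hypothetical.

```python
import numpy as np

def log_prob(X, w, mu, cov):
    """log(pi_k N(x_i | mu_k, Sigma_k)) for every point and component, shape (N, K)."""
    N, d = X.shape
    out = np.empty((N, len(w)))
    for k in range(len(w)):
        L = np.linalg.cholesky(cov[k])
        z = np.linalg.solve(L, (X - mu[k]).T)
        out[:, k] = (np.log(w[k]) - np.log(np.diag(L)).sum()
                     - 0.5 * (d * np.log(2.0 * np.pi) + (z ** 2).sum(axis=0)))
    return out

def logsumexp(a):
    """Row-wise log(sum(exp(a))) computed stably, shape (N,)."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def fit_gmm(X, K, rng, n_init=3, n_iter=30, reg=1e-6):
    """EM with n_init random restarts; keeps the fit with the highest training LLH."""
    N, d = X.shape
    best = None
    for _ in range(n_init):
        mu = X[rng.choice(N, K, replace=False)].copy()   # random initial means
        cov = np.array([np.cov(X.T) + reg * np.eye(d)] * K)
        w = np.full(K, 1.0 / K)
        for _ in range(n_iter):
            lp = log_prob(X, w, mu, cov)
            resp = np.exp(lp - logsumexp(lp)[:, None])   # E-step: responsibilities
            Nk = resp.sum(axis=0) + 1e-12                # M-step: update parameters
            w = Nk / Nk.sum()
            mu = (resp.T @ X) / Nk[:, None]
            for k in range(K):
                diff = X - mu[k]
                cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
        llh = logsumexp(log_prob(X, w, mu, cov)).sum()
        if best is None or llh > best[0]:
            best = (llh, w, mu, cov)
    return best[1], best[2], best[3]

def loo_mean_test_llh(X, K, rng, **fit_kwargs):
    """First cross-validation level: mean LLH of each held-out point."""
    scores = []
    for i in range(len(X)):
        w, mu, cov = fit_gmm(np.delete(X, i, axis=0), K, rng, **fit_kwargs)
        scores.append(logsumexp(log_prob(X[i:i + 1], w, mu, cov))[0])
    return float(np.mean(scores))

# Synthetic stand-in for the PCA loadings: two well-separated 2-D clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(20, 2)),
               rng.normal([20.0, 0.0], 1.0, size=(20, 2))])

# Second cross-validation level: repeat the leave-one-out loop for each K
scores = {K: loo_mean_test_llh(X, K, rng) for K in (1, 2, 3)}
best_K = max(scores, key=scores.get)
print(scores, best_K)
```

On real data, the loop would run over \(K = 1\dots 10\) with more restarts, and the selected K would be used to refit a final GMM on all shapes.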

4 Results

The point correspondence framework is applied to our 105 reconstructed LAA surfaces, using the template surface shown in Fig. 2a as source. The template is the average shape of the N Procrustes-aligned \(\mathcal {S}_{COR}\) obtained from an initial registration round, in which \(\mathcal {S}\) is selected randomly from the pool of LAA surfaces and registered to each \(\mathcal {T}\).

We are able to determine point correspondences for the majority of the target surfaces (103 out of 105), with a median root mean square (RMS) distance between \(\mathcal {T}\) and \(\mathcal {S}_{COR}\) of 0.6 mm and a 75th percentile of 0.9 mm. The surfaces with RMS distance equal to the median and the 75th percentile are shown in Fig. 2b and c, respectively. The figure shows \(\mathcal {T}\), where the color scale indicates the distance between \(\mathcal {T}\) and \(\mathcal {S}_{COR}\). It is seen that \(\mathcal {S}_{COR}\) matches \(\mathcal {T}\) over most of the surface, but that the point correspondence framework is not able to find point correspondences in the most distal lobes of the LAA. A visual analysis of all \(\mathcal {S}_{COR}\) shows that two of the surfaces have poor point correspondence; these are excluded from the training set, leaving 103 surfaces for the rest of the analysis.

Fig. 2. (a) Template surface determined as the average shape of the N Procrustes-aligned point correspondence surfaces (\(\mathcal {S}_{COR}\)). (b) and (c) Target surfaces, where the color scale indicates the distance to \(\mathcal {S}_{COR}\). (Color figure online)

The point distribution model is trained on the 103 Procrustes-aligned \(\mathcal {S}_{COR}\), and we choose to represent the shapes using their first five PCA loadings. The first five PCA loadings are used because the remaining 98 PCA loadings each describe only a small fraction (less than 5%) of the total shape variation in the studied data. Ten GMMs, with \(K = 1\dots 10\) mixture components, are trained on the PCA loadings, and the test LLH is computed for each GMM using cross-validation. The mean test LLH and mean training LLH are shown in Fig. 3 for each validated GMM. According to the test LLH, a GMM with two mixture components achieves the best validation performance. This suggests that the studied set of LAA shapes most likely forms two different shape clusters.

Fig. 3. Training and testing log-likelihood (LLH) after training Gaussian mixture models (GMMs) on 103 PCA loadings using two-level leave-one-out cross-validation.

To identify the shape appearance of the clusters, we train a new GMM, with two mixture components, on the entire data set. We generate four synthetic shapes from each of mixture components one and two of the new GMM by sampling PCA loadings; the shapes are shown in Figs. 4 and 5. Figure 4 shows that surfaces sampled from cluster one have similar LAA morphology, with an obvious bend in the primary lobe, a particular characteristic of Chicken Wing morphologies. The variability within the cluster in terms of LAA orifice characteristics and volumes can also be appreciated. In contrast, surfaces sampled from cluster two, illustrated in Fig. 5, do not present a bend in the primary lobe, but rather a wider primary lobe with several secondary lobes. These characteristics are typical of non-Chicken Wing LAA morphologies such as Cauliflower.

Fig. 4. Four synthetic shapes generated by sampling PCA loadings from mixture component one of a Gaussian mixture model (GMM) with two components.

Fig. 5. Four synthetic shapes generated by sampling PCA loadings from mixture component two of a Gaussian mixture model (GMM) with two components.

5 Conclusion

In this work, we have presented a full framework for the extraction and quantification of shape clusters of left atrial appendages and demonstrated that the two primary shape clusters broadly correspond to the main LAA morphological categories in standard clinical classification: Chicken Wing and non-Chicken Wing LAA shapes. The framework enables future statistical inference on the relation between LAA shape characteristics and stroke risk.