Introduction

Despite the success of total knee arthroplasty (TKA), the reported overall rate of dissatisfaction is usually in the 20% range, with functional satisfaction being even lower [1]. One important reason for the lack of functional satisfaction after TKA is the inability of the implant to replicate the natural movement of the individual’s knee joint. The knee moves differently before and after the implant is placed, and there is significant movement pattern variability among patients [2]. Contributing to these changes is the considerable variability in distal femoral shape [3,4,5]. Patient dissatisfaction may therefore be partially related to a mismatch between the preoperative shape of the distal femur and its shape postoperatively, due to either the shape of the femoral component or its positioning [3]. This mismatch can have detrimental functional implications for the joint and consequently cause an awkward ‘feel’ during use. It can also adversely affect the ligament tensions, which in turn affect range of motion, patellar tracking, stability, and wear.

Historically, the objective of TKA was only to relieve the pain of severe osteoarthritis (OA), whereas functional expectations were low. With younger and more active patients receiving TKA, functional expectations are higher and relief of pain alone is no longer adequate [6]. Surgical navigation (i.e., computer-assisted surgery) for TKA has been shown to improve coronal plane precision in TKA [7] and in a recent meta-analysis to improve functional outcomes in the short term [8]. However, the goal of surgical navigation to this point in time has been to improve the precision of positioning the components only in a standardized way, as with traditional instrumentation, which does not take into account variations in anatomy of individual distal femurs and proximal tibias [4]. Patient-specific implant positioning may be a strategy to improve function postoperatively, and early results from systems using custom instrumentation to fit the femoral component to the patient’s native anatomy based on preoperative imaging have resulted in improved functional outcome scores [9]. However, these techniques rely on preoperative cross-sectional imaging and fabrication of custom cutting jigs.

A technique that would allow for quick feedback on morphologic concordance between the preoperative femur and the implanted one with no additional imaging requirement, for intraoperative use and postoperative analysis, could potentially offer a pathway to optimize function and satisfaction, if first validated against clinical data.

Statistical shape models (SSMs) can generate various plausible shapes of a certain anatomical structure by adjusting a sparse deformation model, the so-called shape modes [10]. By adjusting the shape modes, an SSM can be fit to given sample points yielding a complete model of the respective anatomy. There have been many data sources described in the literature including surgical navigation data [11,12,13,14], biplanar fluoroscopy [15, 16], ultrasound [17], or 3D reconstruction from 2D radiographs [18]. However, none of these works have validated the accuracy of a statistical shape model fit to clinically acquired point cloud data against gold-standard 3D image data. A successful reconstruction would provide an alternative to additional medical imaging.

Fig. 1

Fitting an SSM to surgical navigation point cloud data. Comparing the result to a surface reconstructed from manually segmented MRI data

The objective of the present study was therefore to develop a method to fit an SSM of a distal femur to the respective point cloud data collected during routine navigated total knee arthroplasty and to validate it against MRI data for the same subjects; two different SSM fitting procedures were developed and compared.

Materials and methods

Subjects

A total of 10 patients (4 males, 6 females), all having had severe radiographic and clinical OA of the knee, underwent navigated TKA. The average age of the patients was 63 (range 46–71) and average BMI 36.6 kg/m\(^2\) (range 26–48.3 kg/m\(^2\)). Average preoperative alignment from 3-foot standing X-rays was 10\(^{\circ }\) of varus (range 2\(^{\circ }\)–16\(^{\circ }\)). All patients also had an MRI scan within 2 months preoperatively as part of a previous clinical study protocol. Both studies received institutional review board approval, and informed consent was obtained. Although MRI data are not the most accurate method of validation (e.g., in comparison with cadaveric studies), they have the important advantage of being easily clinically available, non-radiating, and sensitive to relevant soft tissue structures such as cartilage. All personal identifiers were removed prior to data analysis to ensure patient anonymity and confidentiality.

MRI imaging and ground truth generation

MR examinations were performed for each patient using a 1.5 Tesla system (GE Healthcare, Chicago, Illinois, USA), and sagittal PDw turbo spin-echo images were acquired (TR/TE 3000/25 ms, flip angle \(90^{\circ }\), slice thickness 2 mm, in-plane resolution 0.3125 × 0.3125 mm). The distal femora including the femoral cartilage were manually segmented by expert users and checked by an experienced clinician (DW) to reconstruct 3D surfaces using the dedicated 3D geometry reconstruction and visualization software AmiraZIBEdition (Zuse Institute Berlin, Germany) [19]. The reconstructed femoral surface triangulations consist of 11,986 vertices with a maximum edge length of about 3 mm, which was accurate enough to serve as a control for comparison of our SSM approximations.

Point cloud acquisition

The Stryker® Precision Total Knee Arthroplasty Navigation System (Stryker® Corporation, Kalamazoo, MI, USA) was used for all TKA cases. During the TKA procedure, following capsulotomy, but prior to ligamentous or bony dissection, two infrared segment trackers were used as fixed reference points by attaching them to the femur and tibia using bicortical anchoring pins. A digitization tool with an affixed tracker was used to record the three-dimensional position of anatomical landmarks, axes on the femur, and surfaces, including the anterior cortex (106–241 points), distal medial and lateral femoral condyles (50 points each), posterior medial and lateral femoral condyles (35 points each), lateral and medial epicondyles, and the trochlear center. The accuracy of the camera system has previously been validated at 0.058 mm with estimated clinical accuracy of 0.22 mm [20]. Postoperatively, the saved point cloud data files from each patient were extracted from the operating room and exported to MATLAB® (The MathWorks, USA) for further analysis.

Distal femur SSM fitting

In order to determine the shape and pose (position and orientation) of the distal femur as accurately as possible, the shape and pose of an SSM of the distal femoral bone was adjusted to optimally match the given point cloud data. Throughout this section, we refer to the following surface distance measures (with vector valued components marked in bold face):

$$\begin{aligned}&\hbox {Mean surface distance} = \frac{1}{N}\sum _{i=1}^{N} \min _{{\varvec{x}}\in {\varvec{S}}} \Vert {\varvec{y}}_i -{\varvec{x}} \Vert _2,\\&\hbox {Root mean square (RMS) surface distance} = \sqrt{\frac{1}{N}\sum _{i=1}^{N} \left( \min _{{\varvec{x}}\in {\varvec{S}}} \Vert {\varvec{y}}_i -{\varvec{x}} \Vert _2 \right) ^{2}},\\&\hbox {Maximum surface distance} = \max _{{\varvec{y}}_i } \min _{{\varvec{x}}\in {\varvec{S}}} \Vert {\varvec{y}}_i -{\varvec{x}} \Vert _2 , \end{aligned}$$

where \({\varvec{S}}\) is an arbitrary surface and \({\varvec{y}}_1, \ldots , {\varvec{y}}_N \) are point coordinates in \({\mathbb {R}}^{3}\), representing either vertices of the surface model or point cloud elements, depending on whether a distance between two surfaces or between a surface and a point cloud is measured.
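The three measures can be illustrated with a minimal Python sketch. It makes a simplifying assumption that is not part of the study: the surface \({\varvec{S}}\) is represented only by its vertices, so the inner minimum is a closest-vertex search rather than a true point-to-triangle distance. The function names and the toy coordinates are hypothetical.

```python
import math

def closest_distance(point, vertices):
    """Euclidean distance from `point` to the nearest vertex in `vertices`.

    Assumption: the surface is approximated by its vertex set only.
    """
    return min(math.dist(point, v) for v in vertices)

def surface_distances(points, vertices):
    """Return (mean, RMS, maximum) of the closest-vertex distances."""
    d = [closest_distance(p, vertices) for p in points]
    n = len(d)
    mean = sum(d) / n
    rms = math.sqrt(sum(x * x for x in d) / n)
    return mean, rms, max(d)

# Toy data (hypothetical): a "surface" sampled at three vertices
# and two query points y_1, y_2.
surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
points = [(0.0, 0.0, 3.0), (1.0, 0.0, 4.0)]
mean_d, rms_d, max_d = surface_distances(points, surface)
```

For a densely triangulated surface the closest-vertex approximation slightly overestimates the true distances; an exact evaluation would project each point onto the nearest triangle.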

The iterative closest point (ICP) [21] algorithm is often used to adjust an SSM to sparse points. However, since Gaussian mixture models (GMM) outperformed ICP in a recent study [22], we assessed the ability of fitting an SSM to navigation point cloud data for both algorithms. The resulting SSM fit was compared to the ground truth surface reconstructed from MRI data (Fig. 1).

To approximate the pathological femoral shapes for the 10 patients with severe OA, we used an SSM consisting of 184 training cases, including male and female subjects with and without varus/valgus knee malalignment. The creation of the shape model and the establishment of correspondence between shapes were described in detail previously [23]. The explained variance by the number of shape modes is shown in Fig. 2. Sixty shape modes explain approximately 98.9% of variability. The patients enrolled in the current study were not used to develop the SSM. To test the approximative power of the SSM, prior to fitting to the point cloud data, the SSM was first adjusted to match the individual patients’ anatomies from the MRI data (rather than the point cloud data) as closely as possible. For the 10 patients of this study, the averaged mean surface distance between SSM reconstruction and ground truth segmentation was 0.54 mm ± 0.2 mm (Table 1). Thus, the SSM was capable of accurately representing the patients’ anatomy. An example is shown in Fig. 3 illustrating the approximation error for case 3 (mean surface distance of 0.60 mm).

Fig. 2

Percentage of total variance explained with respect to the number of shape modes for the used SSM with 184 training shapes
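How such an explained-variance curve translates into a choice of mode count can be sketched as follows. This is an illustrative sketch only: the eigenvalue spectrum below is hypothetical and is not the spectrum of the SSM used in this study, and the function name is an assumption.

```python
def modes_needed(eigenvalues, target_fraction):
    """Smallest number of leading modes whose eigenvalues sum to at least
    `target_fraction` of the total variance.

    Assumes `eigenvalues` is sorted in descending order.
    """
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= target_fraction:
            return k
    return len(eigenvalues)

# Hypothetical spectrum for illustration, NOT the values of the study's SSM.
toy_eigenvalues = [50.0, 25.0, 15.0, 6.0, 3.0, 1.0]
k = modes_needed(toy_eigenvalues, 0.95)  # cumulative fractions: 0.50, 0.75, 0.90, 0.96, ...
```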

Table 1 Mean surface distances (± standard deviation) for each case for the SSM fitted to the ground truth segmentations (rather than the point cloud data) to test the ability of the SSM to approximate the true shape

Fig. 3

Surface distance with isolines (1, 3 and 5 mm) between the ground truth MRI surface of case 3 and an SSM consisting of 184 training instances fitted to that surface, showing that the patient-specific shape could be well-captured, before attempting to fit to the point cloud data. (The scale to 6 is provided for consistency with Fig. 5; in this case, no isolines were >1 mm)

Initial alignment of SSM and point cloud data

Since SSM and navigation point cloud data require an initial alignment, independently of the method used, a suitable transformation had to be found. To cover differences in position, orientation and scale, a similarity transformation with uniform scaling was calculated by minimizing the distance between three digitized anatomical landmarks (lateral and medial epicondyles and the femoral center) and the corresponding landmarks defined on the SSM (Fig. 4).

Fig. 4

a Initial alignment of the SSM to three anatomical landmarks (red) and adjusted SSM (b)

Adjusting the SSM to point cloud data via an ICP approach

After initial alignment, an iterative optimization was carried out to adjust the SSM. For each point of the acquired cloud, a closest vertex in the SSM is detected in every step. The closest vertices are collected in \({\varvec{v}}\) and their 3D distance vectors to the point cloud in \(\Delta {{\varvec{v}}}\). Vertices of the SSM without matching point cloud data are not considered within the SSM fitting procedure. To minimize the distance (in mm) between the SSM and the point cloud, the following problem was solved by adjusting SSM modes \({\varvec{p}}_k \) through the real coefficients \(b_k\) and a rigid transformation with uniform scale T in an alternating fashion

$$\begin{aligned} \mathop {\min }\nolimits _{b,T} \left( {\left| \left| \left( {\varvec{v}+\Delta {{\varvec{v}}}} \right) -T\left( {{\bar{{\varvec{v}}}} +\mathop \sum \nolimits _k b_k {\varvec{p}}_k} \right) \right| \right| ^{2}+{\upgamma }\cdot \mathop \sum \nolimits _k \frac{b_k ^{2}}{\lambda _k }} \right) , \end{aligned}$$

with the mean shape’s vertices \(\bar{{\varvec{v}}}\) and the eigenvalues \(\lambda _k \) of the SSM as well as a factor controlling regularization \(\gamma \in {\mathbb {R}}_0^+ \). This regularization penalizes shapes that are far away from the mean shape and thus favors configurations that are in a normal range of anatomical variation.
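For a fixed transformation, the shape-coefficient step of this alternating scheme has a closed-form solution, which may help to see how \(\gamma \) acts. The following is a sketch under assumed notation that does not appear in the text above: write \(T({\varvec{x}}) = sR{\varvec{x}}+{\varvec{t}}\), let \(P\) be the matrix whose columns are the modes \({\varvec{p}}_k\) restricted to the matched vertices, let \(A = sRP\) (with \(R\) applied to each vertex), \({\varvec{r}} = ({\varvec{v}}+\Delta {\varvec{v}}) - T(\bar{{\varvec{v}}})\), and \(\Lambda = \mathrm {diag}(\lambda _k)\). Whether the implementation used in this study solves the step in exactly this way is not stated.

$$\begin{aligned} E(b)&= \Vert {\varvec{r}} - Ab \Vert ^{2} + \gamma \, b^{\top }\Lambda ^{-1} b,\\ \nabla _b E&= -2A^{\top }({\varvec{r}} - Ab) + 2\gamma \Lambda ^{-1} b = 0,\\ b^{*}&= \left( A^{\top }A + \gamma \Lambda ^{-1} \right) ^{-1} A^{\top } {\varvec{r}}. \end{aligned}$$

For \(\gamma \rightarrow 0\) this reduces to an unregularized least-squares fit, whereas for large \(\gamma \) the coefficients \(b_k\) are driven toward zero, i.e., toward the mean shape, with modes of small eigenvalue \(\lambda _k\) being damped most strongly.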

As a compromise between accuracy and robustness, and in view of the sparsity of the navigation data, only the first third (the most significant modes) of the 183 SSM modes was adjusted (approx. 98.9% of variability). The RMS surface distance between the SSM and the point cloud data was calculated and served as a stopping criterion. The choice of regularization is important for the fitting accuracy. A sensitivity study with respect to this parameter showed that \(\gamma <0.05\) yields an anatomically incorrect fit, whereas \(\gamma >2\) regularizes too strongly toward the mean shape and hardly adjusts the SSM at all. Tests showed that regularization with \(\gamma =0.5\) improved the mean results slightly, by approx. 0.05 mm, compared to other parameters between 0.1 and 1.

Table 2 Average number of landmarks (± standard deviation) used for sparse fitting

Adjusting the SSM to point cloud data via GMM

In a recent study, a GMM-based algorithm outperformed the ICP algorithm for fitting an SSM to a sparse set of points [22]. That algorithm adjusted the modes of an SSM with a probabilistic approach considering anisotropic covariances oriented according to the surface normal, but without simultaneously optimizing the transformation. The expected amount of anisotropy is chosen with the parameter \(\eta \), usually in the range of 2–8 (cf. [22]). In this study, we used \(\eta = 2\).

We deployed the aforementioned method to adjust the shape modes of our SSM. To optimize both transformation and shape modes, in the GMM-based method, the initial alignment and the adjustment of shape modes were executed in an iterative manner. Acknowledging the sparsity of the point cloud, we brought in a strong regularization toward the mean shape. Due to implementation details, this was achieved through scaling of the SSM eigenvalues by a factor of 0.025 before applying the GMM-based fitting procedure in order to match the SSM to the sparse data. The GMM approach uses regularization similar to the ICP approach, and thus, this scaling is similar to a regularization factor \(\gamma =40\) in the aforementioned ICP setup. However, the remaining objective functions of ICP and GMM cannot be compared directly. Again, we evaluated the sensitivity of fitting accuracy to the choice of regularization. Reasonable regularization was found for \(\gamma \) in the interval from 20 to 100. The chosen factor \(\gamma =40\) improved the mean results slightly by 0.02 mm compared to parameters in this interval.
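The correspondence between the eigenvalue scaling and the regularization factor can be made explicit with a short worked equation. It assumes that the GMM regularizer has the same form \(\sum _k b_k^{2}/\lambda _k\) as in the ICP objective, which the text describes only as "similar". Replacing each eigenvalue \(\lambda _k\) by \(c\,\lambda _k\) with \(c = 0.025\) gives

$$\begin{aligned} \sum _k \frac{b_k^{2}}{c\,\lambda _k} = \frac{1}{c}\sum _k \frac{b_k^{2}}{\lambda _k} = \frac{1}{0.025}\sum _k \frac{b_k^{2}}{\lambda _k} = 40\sum _k \frac{b_k^{2}}{\lambda _k}, \end{aligned}$$

so that, under this assumption, the interval of \(\gamma \) from 20 to 100 corresponds to eigenvalue scaling factors \(c = 1/\gamma \) between 0.05 and 0.01.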

Validation

The final SSM fits were further analyzed in AmiraZIBEdition. The quality of the generated models was evaluated visually (Fig. 5) and quantitatively with the help of mean surface distance, RMS surface distance, and maximum surface distance between the fitted SSM and the ground truth surface. Additionally, to evaluate the robustness of the proposed methods to reduced point cloud size, we performed a series of tests on randomly reduced landmark sets. Fitting was done on 100, 75, 50, 25, 15, 5, and 0% (all conditions used a minimum of the three points used to set initial position) of the original number of landmarks. Points were removed from each area of the femur (anterior cortex, distal medial and lateral femoral condyles, posterior medial and lateral femoral condyles) independently. Three landmarks (trochlear center, both epicondyles) that are crucial for the fitting process were not considered for reduction and thus were included in every landmark set used for testing. To take randomness into account, we performed five tests for every reduction level. The average number of landmarks per reduction level is given in Table 2.
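The reduction protocol described above can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study: the function name, the synthetic coordinates, the region sizes (taken from the lower end of the digitized ranges), and the rounding of the per-region count are all assumptions.

```python
import random

def reduce_landmarks(regions, keep_fraction, fixed, rng):
    """Randomly subsample each region independently; always keep `fixed`.

    Assumption: the number of points kept per region is the rounded
    product of `keep_fraction` and the region size.
    """
    reduced = list(fixed)
    for pts in regions.values():
        n_keep = round(keep_fraction * len(pts))
        reduced.extend(rng.sample(pts, n_keep))
    return reduced

rng = random.Random(0)  # seeded so that the sketch is reproducible

def fake_points(n):
    """Synthetic 3D points standing in for digitized landmarks."""
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n)]

# Hypothetical point cloud with one list of points per femoral region.
regions = {
    "anterior_cortex": fake_points(106),
    "distal_medial": fake_points(50),
    "distal_lateral": fake_points(50),
    "posterior_medial": fake_points(35),
    "posterior_lateral": fake_points(35),
}
# Trochlear center and both epicondyles are never removed.
fixed = fake_points(3)

levels = [1.0, 0.75, 0.5, 0.25, 0.15, 0.05, 0.0]
# Five random draws per reduction level.
landmark_sets = {
    level: [reduce_landmarks(regions, level, fixed, rng) for _ in range(5)]
    for level in levels
}
```

Sampling each region separately keeps the relative coverage of the anterior cortex and the condyles roughly constant across reduction levels, which a single draw over the pooled point cloud would not guarantee.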

Fig. 5

Mean surface distance between the SSM fitted to the point clouds and the ground truth surfaces for the two methods. The boxplot shows the mean value ± standard deviation, with whisker bars at the minimal and maximal values

Results

Using the SSM-generated femurs that were fit to the navigation data and then compared to the MRI ground truth, the ICP-based approach performed slightly better on average in mean and RMS error than the GMM approach, but the differences were not significant (\({p}>0.01\); Table 3; Fig. 5). The maximum errors were similar for both methods and occurred in clinically less relevant areas such as the intercondylar notch and the superomedial edge of the articular surface (Fig. 6). Also, there was typically a larger error in areas with osteophytes (Fig. 6). Case 3 had a mean surface distance of 1.20 mm (ICP) and 1.37 mm (GMM), which is close to the mean surface distance averaged over all 10 cases (1.19 mm for ICP and 1.38 mm for GMM). Since case 3 in this way represents a typical result, its surface distance is shown in Fig. 6.

Table 3 Surface distance measures between the SSM fitted to the point clouds and the ground truth segmentations

Fig. 6

Surface distance with isolines (1, 3 and 5 mm) between the ground truth surface and the ICP-based fitted SSM to the point cloud for case 3. (Mean surface distance \(=\) 1.20 mm)

Both methods achieved good results in the experiments with a reduced number of landmarks (Fig. 7). We performed pairwise t tests between the results using all landmarks (100%) and the results using a reduced number of landmarks (75, 50, 25, 15, 5 and 0%) for ICP and GMM, respectively. We found significant differences between 100% and 5% (ICP, GMM: \({p} < 0.001\)) and between 100 and 0% (ICP, GMM: \({p} < 0.001\)).
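A comparison of this kind can be sketched with the Python standard library. The sketch assumes a paired design (the same cases evaluated at two landmark levels), which the text does not state explicitly, and the error values are hypothetical, not results of this study. Only the t statistic and the degrees of freedom are computed; obtaining the p value requires the t distribution, for which a statistics library routine such as SciPy's `scipy.stats.ttest_rel` could be used.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples `a` and `b`."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical mean surface distances (mm) for four cases, NOT study data.
errors_reduced = [2.0, 3.0, 4.0, 5.0]  # e.g., fit with few landmarks
errors_full = [1.0, 1.0, 1.0, 1.0]     # e.g., fit with all landmarks
t_stat, dof = paired_t_statistic(errors_reduced, errors_full)
```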

Fig. 7

Averaged mean (left) and RMS (right) surface distances between the SSM fit to the point clouds and the ground truth surfaces for the two methods. For each reduction level (apart from 100 and 0%), five runs were performed. Boxplots show the mean value ± standard deviation, with whisker bars at the minimal and maximal values

For the 10 patients, the average computing time for reconstructing the anatomy was approx. 20 s (ICP) and approx. 25 s (GMM) on a standard personal computer.

Discussion

This study evaluated the accuracy of fitting an SSM to digitized surgical navigation data. To our knowledge, this is the first time an SSM fit to clinical surgical navigation data has been validated against MRI data on the same patient. Two approaches for SSM adjustment were evaluated. For both methods, the maximum errors occurred in less clinically relevant areas, and the mean errors appear acceptable, given that postoperative prosthesis vs preoperative bone distances of up to 6 mm have been reported even in well-functioning patients [2, 3, 5]. The error within the results stayed acceptable even for a reduced number of landmarks, showing the robustness of using an SSM for surface reconstruction.

SSMs have been explored for use in navigated orthopedic surgery previously [11, 13, 14]. They have been used to develop models of the proximal femur using sparse data obtained from femoral heads [11] but were only validated against surgical point cloud data obtained under optimal circumstances from dry cadaveric bone. This technique has also been used on the distal femur using sparse data collected during navigated TKA [12]; however, only the distance from the shape model to the nearest point in the point cloud used to create it was calculated. The technique was not validated against an alternate method of model generation (CT, MRI), leading to uncertainty regarding the accuracy of the technique in areas not digitized. This technique has been applied to ACL surgery [14] with similar limitations in the validation.

Alternative methods of obtaining intraoperative femoral morphology using an SSM fit with biplanar fluoroscopic images have also been used [15], with a high degree of accuracy. However, while this technique may provide superior accuracy, in the case of knee arthroplasty, the intraoperative time and effort taken to perform the fluoroscopic examination is impractical.

One of the major limitations of this work is the accuracy of the MRI segmentation. Although we used this as our ground truth, there can be relatively large errors associated with using this technique [24]. However, MRI is an excellent image modality for the assessment and segmentation of cartilage. The anatomical landmarks digitized during routine navigated total knee arthroplasty were partly acquired on the medial and lateral condyles and are therefore likely to be located on cartilage instead of bone, which justifies the use of MRI as the imaging modality. While other methods could possibly provide a more accurate ground truth, a strong feature of the present work was the use of real patient data under standard clinical conditions. Another limitation was the registration of the point cloud data to the MRI data. As this is an imageless navigation system, we had to rely on a few standardized points (medial and lateral epicondyles and femoral center) to set the initial position of the model. Due to the small number of subjects in this study, the data for the males and females were not analyzed separately; future analyses may reveal differences between these groups, in which case the accuracy could potentially be improved even further.

Both methods of SSM fitting would be fast enough to allow for intraoperative feedback with respect to computation time. However, the implementations were not optimized for run time yet. Note that for a given point cloud the SSM fitting is deterministic for both approaches and uniquely determined.

We attempted to restrict the comparison to areas of clinical interest (e.g., the femoral condyles and trochlea); however, because the transition between condyle and osteophyte could not be determined predictably, the entire distal femur was used despite the relatively large errors this produced.

As TKA evolves, a patient-specific approach will be demanded by patients. Patient-specific implant positioning may be a strategy to improve function postoperatively, whereby surgical navigation allows the surgeon to place the components with high precision in any position [7].

The results of this study show that even with the relatively sparse dataset available from routine navigated TKA, the SSM can provide a reasonably accurate approximation of the distal femur. These models can be used retrospectively to compare native anatomy with implant positioning, providing valuable insight into patient function and satisfaction. If clinical significance is proven through a larger study (navigation data has been acquired on over 1000 patients to date, in addition to other clinical and functional data), these models could be incorporated into a surgical navigation unit, providing a surgeon with accurate real-time feedback on the exact concordance of the proposed femoral component positioning with the native anatomy without any additional imaging or intraoperative steps. This could allow for optimization of implant selection and position for a given patient and potentially improve patient satisfaction and function.