Introduction

Pedicle screw fixation is a spinal fusion technique in which screws are implanted into the vertebral pedicles to act as anchor points for rods that restrict movement between those vertebrae [21, 27]. Fusion is a common treatment for a variety of spinal conditions, including lumbar stenosis, spondylolisthesis, degenerative disk disease, and disk herniation [1, 4, 19]. Although fusion can be performed at any spinal level, the majority of cases in this study are in the lumbar region. The number of lumbar spinal fusion (LSF) cases is increasing annually, with over two million people having undergone an LSF between 2004 and 2015 [19]. The prevalence of LSF was estimated to be 79.8 per 100,000 individuals [19].

The conventional method for pedicle screw insertion is the freehand method, often with intraoperative fluoroscopy guidance [30]. The primary outcome measure for pedicle screw insertion is placement accuracy. Placement accuracy is conventionally measured using a grading scale that rates the implantations based on the amount of screw deviation outside of the pedicle. There are numerous grading scales, including Gertzbein and Robbins [7], Rampersaud [23], and Youkilis [31]. A standard metric for acceptance of screw placement is less than 2 mm outside of the pedicle, as measured in the medial–lateral direction [7, 25]. Measuring screw placement accuracy in this manner relies on subjective judgment and does not quantify how far the screw lies from the intended, ideal location for that patient.

In an effort to improve placement accuracy and clinical outcomes, including operating room time, radiation exposure, and length of hospital stay, surgical robots were developed to assist in spinal fusion surgery. A variety of surgical robots are currently on the market, including Renaissance [8, 12], Mazor X [8, 12], ROSA [8, 15], TINAVI [16], and ExcelsiusGPS [30]. A body of literature compares robot-assisted screw placement to the freehand method, with debate as to whether robotic assistance actually leads to increased accuracy [4, 6, 16, 17, 32]. A review by Ghasem et al. included 12 studies that compared robot-guided surgery to the freehand method: 10 studies demonstrated an increase in placement accuracy when robotic assistance was used, one showed no difference between the methods, and one showed worse accuracy with robotic guidance [8]. However, procedures that use robotic assistance have been shown to decrease length of hospital stay [10, 13] and radiation exposure [10, 14, 16, 24] compared to those without. These reductions benefit both the patients undergoing the procedure and the hospital staff, and are associated with reduced cost.

Previous studies have compared robot-assisted procedures to conventional methods by analyzing screw placement accuracy using the aforementioned classifications [3, 11, 22, 25, 26, 28]. The largest of these studies evaluated robotic guidance of 3131 pedicle screws in 593 patients over a 4-year period [3]. Although this was a large multicenter study across 14 locations, the criteria used for clinical acceptance of placement varied across locations and surgeons, so implants could not be directly compared. Three studies have quantified robotic accuracy by comparing implanted screws directly to the target locations, but they analyzed only entry and exit point deviation or angular deviation in axial and sagittal views, not the deviation in the pedicle region where clinical grading scales measure accuracy [3, 28]. To the authors’ knowledge, no studies have used automated measurements to remove human input and bias from the measurement process. The process of fusing preoperative images with intra- or postoperative images, a necessary step in comparing implanted screws to the planned locations, involves manual alignment, the uncertainty of which has not previously been quantified.

The objective of this research is to measure pedicle screw placement accuracy using a novel automated measurement system that directly compares the final implanted screw location to the planned target location in all three anatomical views. A second objective is to quantify the uncertainty associated with the fusion process of aligning preoperative and intra- or postoperative scans. This system was used to quantify accuracy of a robot-assisted pedicle screw insertion procedure using the Mazor X Stealth Edition robotic-guidance system in a large cohort of 122 patients with a total of 500 screws implanted across four surgical centers.

Materials and Methods

Patient Inclusion and Demographics

A total of 122 patients were included in this study, with 529 pedicle screws implanted. Of the 529 screws, 500 were included in the analysis and 29 were excluded because of poor visibility of the implanted screws in the postoperative scans; all metrics for an implant were excluded if the scan resolution in the sagittal or coronal plane was too low to properly differentiate between screw and bone. Of the screws analyzed, 420 were in the lumbar spine region, 70 in the sacral, and 10 in the thoracic. A total of 115 patients had 3 or fewer vertebrae fused together, and the remaining 7 patients had 4 or more vertebrae included in their fusions. Of the 122 patients, 72 were female and 50 were male. The mean age of the patients was 62 ± 12 years. The mean body mass index (BMI) of the patients was 30.0 ± 5.6, and 13 patients were current smokers. Patient clinical diagnoses included 44 patients with spondylolisthesis, 37 with spinal stenosis, 7 with flat back deformity, 7 with lumbar instability, 5 with spondylolysis, 2 with retrolisthesis, 1 each with scoliosis, recurrent disk herniation, recurrent synovial facet cyst, and pseudoarthritis, and 16 with a combination of the above conditions. These patients underwent surgery at four surgical centers, with a single surgeon operating at each center. The minimum number of screws implanted at any given center was 84. This resulted in a statistical power of 0.92–0.99 for all metrics except perpendicular deviation in the axial plane, which had a power of 0.56. All patients included were treated consecutively at each center from July 2018 to December 2020. Of the total cases, 108 were minimally invasive (88.5%), and the other 14 were open procedures.
As this was a retrospective study where all data were collected as part of standard patient care and these data were anonymized at their respective centers before inclusion in this work, this study was granted exempt status by the Boise State University Institutional Review Board.

Surgical Procedure

All patients received a preoperative computed tomography (CT) scan, which the surgeon used to plan pedicle screw placement in the navigation software (Mazor, version 4.0 and 4.2; Medtronic, Dublin, Ireland). On the day of surgery, the patient is placed in a prone position. An O-Arm is used to take a fluoroscopy scan that registers the patient’s current position, and the position of the robotic arm relative to their anatomy, with the scan used for the preoperative plan (Fig. 1a). The robotic arm is then moved to the necessary position for the pre-planned screw trajectory. The robotic end effector is used as a guide while the surgeon inserts the screw (Fig. 1b). The screw placements are verified either intraoperatively using an O-Arm scan or postoperatively using a CT scan. The scans used in this study to measure placement accuracy comprise intraoperative O-Arm images for 90 patients (375 screws) and postoperative CT images, taken between 10 and 17 months after surgery, for 32 patients (125 screws).

Fig. 1

a Registration of the robotic platform in the operating room. AP and oblique intraoperative X-ray images are taken of the patient’s bony anatomy and the amber-colored frame attached to the robot arm positioned over the patient’s body. These images establish the patient’s anatomy and relate it back to the preoperative scan used to plan the screw placements. b Placement of percutaneous screws through the robotic-end effector with real-time navigation on the guidance system screen

Screw Placement Accuracy

Deviation from the intended screw location was determined in all three anatomical planes. The metrics measured to determine placement accuracy are medial–lateral (ML) and superior–inferior (SI) deviation in the pedicle region, perpendicular deviation and angular deviation in the axial plane, and perpendicular deviation and angular deviation in the sagittal plane (Fig. 2). These metrics are measured between the target screw location from the preoperative plan and the actual location of the implanted screw as seen on post-implantation scans.

Fig. 2

Metrics used to determine pedicle screw placement accuracy. All measures are determined as the deviation between the planned target screw location (red lines and dots) and implanted screw location (blue lines and dots). a Superior–inferior (SI) and medial–lateral (ML) deviation in the pedicle region measured in the coronal plane. Perpendicular deviation in the b axial and d sagittal planes from the base of the screw tulip to the implanted screw trajectory. Angular deviation in the c axial and e sagittal planes measured as the angle between the trajectories of the planned and implanted screw locations

An algorithm was developed in MATLAB 2020b (The Mathworks, Inc., Natick, MA) to automate the measurement of screw placement accuracy. This algorithm was adapted from a previously published approach to automatically quantify electrode placement accuracy after deep brain stimulation surgery in patients with Parkinson’s disease [29]. It uses image-processing tools to locate the target screw location and the implanted screw and then quantifies placement accuracy. Color filtering is used to locate the planned screw locations in the images. The implanted screws are found using a contour map based upon the grayscale values of the intra- or postoperative image. Because all measurements are taken in the pixel space of the image, all distance measurements are converted from pixels to millimeters. The ML and SI deviations in the pedicle region are measured as the horizontal and vertical distances, respectively, between the center of the target screw location and the center of the implanted screw (Fig. 2a). The center locations are determined in the coronal view at the smallest diameter of the pedicle. The perpendicular deviations in the axial and sagittal planes are measured as the perpendicular distance from the posterior of the planned screw shank at the base of the tulip to the trajectory along the shank of the implanted screw (Fig. 2b, d). The angular deviations in the axial and sagittal planes are the angles between the trajectory of the target screw location and the trajectory along the shank of the implanted screw (Fig. 2c, e).
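The geometric definitions above can be illustrated with a short sketch. This is not the authors' MATLAB implementation; it is a minimal Python illustration that assumes the screw centers and trajectory points have already been located in the image, that coordinates are given as (x, y) pixel pairs, and that the image has a single isotropic pixel spacing. The function names and the `mm_per_px` parameter are hypothetical.

```python
import math


def px_to_mm(value_px, mm_per_px):
    """Convert a pixel-space length to millimeters (assumes isotropic spacing)."""
    return value_px * mm_per_px


def pedicle_deviation(target_px, implant_px, mm_per_px):
    """ML (horizontal) and SI (vertical) distance between screw centers, in mm."""
    ml = px_to_mm(abs(implant_px[0] - target_px[0]), mm_per_px)
    si = px_to_mm(abs(implant_px[1] - target_px[1]), mm_per_px)
    return ml, si


def perpendicular_deviation(point_px, line_a_px, line_b_px, mm_per_px):
    """Perpendicular distance (mm) from a planned point to the implanted
    trajectory, taken as the line through line_a_px and line_b_px."""
    qx, qy = point_px
    ax, ay = line_a_px
    bx, by = line_b_px
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("trajectory points must be distinct")
    # The 2D cross-product magnitude divided by the line length is the distance.
    cross = abs(dx * (qy - ay) - dy * (qx - ax))
    return px_to_mm(cross / length, mm_per_px)


def angular_deviation(planned_dir, implanted_dir):
    """Unsigned angle in degrees (0 to 90) between two trajectory directions."""
    dot = planned_dir[0] * implanted_dir[0] + planned_dir[1] * implanted_dir[1]
    norm = math.hypot(*planned_dir) * math.hypot(*implanted_dir)
    if norm == 0:
        raise ValueError("direction vectors must be non-zero")
    # abs() treats trajectories as undirected lines; min() guards rounding error.
    return math.degrees(math.acos(min(1.0, abs(dot) / norm)))
```

For example, with a hypothetical spacing of 0.5 mm per pixel, centers at (10, 10) and (13, 14) pixels give an ML deviation of 1.5 mm and an SI deviation of 2.0 mm. Anisotropic images would need a separate spacing for each axis.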

Measurement Uncertainty

To compare the location of the implanted screws to the target screw locations, the post- or intraoperative scan, showing the implanted screws, must be fused to the preoperative CT scan containing the target location. This involves aligning pre- and intra- or postoperative scans in all three anatomical planes (Fig. 3). The fusion process was completed in the Mazor robotic software (RND version 4.2) and began with an initial alignment by the software registration algorithm. Manual adjustment, specifically rotation and translation in six degrees of freedom, was then applied until the spinous processes, transverse processes, base of vertebral body, and spinal canal were properly aligned. Fusions were performed by two evaluators, with all fusions for a given center completed by the same evaluator.

Fig. 3

Fusion of intra- or postoperative images to preoperative CT scans in the a axial and c sagittal views. The postoperative image showing the implanted screw locations is displayed inside of the red circle. The planned locations for the screws, with the left implant shown in yellow and the right implant shown in blue, are overlain on the postoperative image in the b axial and d sagittal views

The fusion of the preoperative and intra- or postoperative scans is the only part of the measurement process that requires human input and could therefore introduce variance into the calculated screw placement accuracies. To quantify the uncertainty associated with the fusion process, a subset of 40 implants (10 from each center) was fused by both evaluators. The fusion process maps the planned screw location from the preoperative image onto the scans showing the implanted screws. When this is performed independently by both evaluators, the target appears in a slightly different location on the intra- or postoperative scan for each evaluator. The difference between the two mapped targets is the uncertainty associated with the fusion process. This uncertainty was calculated for the ML and SI deviations in the pedicle region and angular deviations in the axial and sagittal planes.
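As a rough illustration of this calculation, the sketch below summarizes the difference between the two evaluators' mapped targets for one metric as a mean and standard deviation. It is written in Python rather than the MATLAB used in the study, and it assumes that the absolute difference is taken per implant and that the sample standard deviation is reported; neither detail is stated in the text. The function name is hypothetical.

```python
from statistics import mean, stdev


def fusion_uncertainty(targets_eval1, targets_eval2):
    """Mean and sample SD of the per-implant absolute difference between the
    target values mapped by two evaluators (same implants, same order)."""
    if len(targets_eval1) != len(targets_eval2):
        raise ValueError("both evaluators must map the same implants")
    if len(targets_eval1) < 2:
        raise ValueError("at least two implants are needed for a standard deviation")
    # Assumption: uncertainty is the unsigned difference for each implant.
    diffs = [abs(a - b) for a, b in zip(targets_eval1, targets_eval2)]
    return mean(diffs), stdev(diffs)
```

The inputs would be, for example, the ML coordinate of the mapped target in mm for each of the 40 implants, once from each evaluator.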

To measure the effect this fusion uncertainty had on the overall screw placement accuracy values, the interobserver variability of the final placement accuracy values was calculated. The same subset of 40 implants used to calculate the fusion uncertainty was used. Each implant was evaluated using the automated measurement system for all six screw placement accuracy metrics. The resulting placement values for the two evaluators were compared to test for statistically significant differences.

Manual Measurement Comparison

The automated measurement process eliminates human variance and bias in measurement. To assess the benefit of an automated approach, the same six screw placement accuracy measures described previously were manually and independently measured by two evaluators. The manual measurements were performed on a subset of 40 implants (10 from each center). The metrics found to be significantly different within this subset had a statistical power between 0.87 and 0.99. Each evaluator followed the same set of step-by-step instructions for each metric. The measurements were taken after the evaluators completed tutorials on the software and were confident using the necessary tools. The manual measurements were compared to each other as well as to the automated placement values.

Grading Scale Placement Accuracy

The Gertzbein and Robbins criteria were used to grade screw placement accuracy using conventional methods to highlight the difference between the measurement system presented in this study and traditional accuracy measures [7]. All measurements and classifications were performed by an independent radiologist. Placements were given a grade of A through E with the following criteria: (A) screw is fully within the pedicle, (B) 2 mm or less deviation outside of the pedicle, (C) greater than 2 and up to 4 mm deviation outside of the pedicle, (D) greater than 4 and up to 6 mm deviation outside of the pedicle, and (E) greater than 6 mm deviation outside of the pedicle.
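The grade boundaries listed above can be written as a small lookup. The sketch below is only an illustration in Python of the thresholds as stated in this section; in the study the grades were assigned by a radiologist, and the function name and the `breach_mm` input (the measured deviation outside the pedicle, in mm) are hypothetical.

```python
def gertzbein_robbins_grade(breach_mm):
    """Map a deviation outside the pedicle (mm) to a grade from A to E."""
    if breach_mm < 0:
        raise ValueError("breach distance cannot be negative")
    if breach_mm == 0:
        return "A"  # screw fully within the pedicle
    if breach_mm <= 2:
        return "B"  # 2 mm or less outside the pedicle
    if breach_mm <= 4:
        return "C"  # greater than 2 and up to 4 mm
    if breach_mm <= 6:
        return "D"  # greater than 4 and up to 6 mm
    return "E"      # greater than 6 mm
```

Grades A and B together correspond to the 2 mm acceptance threshold described in the Introduction.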

Statistical Metrics

Statistical comparisons between manual measurements, interobserver reliability, and left and right sides were quantified using a paired t test. The effects of center, spinal region, and type of procedure were evaluated using a one-way ANOVA. A p value below 0.05 was considered statistically significant. All accuracy values given are mean ± one standard deviation.
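For reference, the paired t test reduces to a one-sample test on the per-implant differences. The sketch below computes only the test statistic and degrees of freedom using the Python standard library; it is an illustration, not the software used in the study, and the function name is hypothetical. Obtaining the p value additionally requires the t distribution with the returned degrees of freedom, which is left to a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

Here `first` and `second` would hold one metric measured on the same implants by two evaluators, or by one evaluator and the automated system.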

Results

A total of 500 pedicle screws were analyzed, of which 420 were in the lumbar spine region, 10 in the thoracic, and 70 in the sacral. The screw placement accuracies based on spinal region are shown in Table 1. The mean ML deviation in the pedicle region was 1.75 ± 1.36 mm, and 333 screws (66.6%) had a deviation less than or equal to 2 mm. Of the total screws, 123 and 377 were implanted with a deviation in the medial and lateral directions, respectively. The mean SI deviation in the pedicle region was 1.52 ± 1.26 mm and 370 screws (74.0%) had a deviation less than or equal to 2 mm. The deviation occurred in the superior direction in 141 screws and in the inferior direction in 359 screws. In the axial plane, the mean perpendicular deviation was 2.00 ± 1.54 mm and the angular deviation was 2.40° ± 2.07°. In the sagittal plane, the mean perpendicular deviation was 2.16 ± 1.74 mm and the angular deviation was 3.88° ± 3.43°.

Table 1 Screw placement accuracy values based on spinal region (mean ± SD)

The uncertainty of the measurement process associated with the fusion step was calculated on a subset of screws that included 10 from each of the four centers. The resulting uncertainties in the ML and SI deviations in the pedicle region were 0.67 ± 0.81 mm and 1.45 ± 2.00 mm, respectively. The uncertainty associated with angular deviation was 1.69° ± 1.22° in the axial plane and 1.85° ± 1.66° in the sagittal plane. The potential effects of the uncertainty in the measurement process can be seen in Fig. 4.

Fig. 4

a Target screw location (red) in relation to the implanted screw (outlined in blue) when looking from the coronal plane into the pedicle region with the average ML and SI deviation for the entire cohort shown. b Fusion uncertainty (dashed red) associated with the portion of the measurement process that involves fusing the preoperative CT to the intra- or postoperative scan. c One standard deviation (green) of the ML and SI measurements of the entire dataset. The area inside of the green-dashed oval accounts for all variability in the measurement process

From this same subset of patients, the screw placement accuracies were calculated for each evaluator using the automated measurement system to quantify any interobserver variability occurring during the fusion process. The results show no statistically significant differences between evaluators in any of the six metrics, although the angular deviation in the sagittal plane approached significance (p = 0.053). This shows that different evaluators performing the fusions do not significantly change the overall screw placement accuracy results, but the additional uncertainty the fusion process adds to the measurements should be considered.

The screw placement accuracies were compared for differences between left and right-side implants, center, spinal region, and procedure type. There was a significant difference between left and right screw implants in the SI deviation in the pedicle region and perpendicular deviation in the sagittal plane. There was a significant difference between the four centers in all metrics except the perpendicular deviation in the axial plane. The SI deviation in the pedicle region and perpendicular deviation in the sagittal plane are the two metrics that had significant differences between the spinal regions implanted. There were significant differences in the ML deviation and the perpendicular and angular deviations in the sagittal plane between percutaneous and open procedures (Table 2).

Table 2 Screw placement accuracy values (mean ± SD) for percutaneous and open cases

The accuracy values for the manual measurements and the corresponding automated values for the subset of 40 implants are in Table 3. There was a statistically significant difference between evaluator 1 and both evaluator 2 and the automated measurements in the ML deviation in the pedicle region. There was a statistically significant difference between evaluator 2 and the automated measurements in the SI deviation in the pedicle region.

Table 3 Manual measurement screw placement accuracy values (mean ± SD)

The grading classifications for the 500 implanted screws were 356 A, 130 B, 8 C, 3 D, and 3 E. A total of 486 screws (97.2%) were within the clinically acceptable range, with a deviation less than or equal to 2 mm outside of the pedicle region. For the 144 screws not graded as an A, the primary direction of the breach was medial in 22.2% of cases, lateral in 37.5%, superior in 22.2%, and inferior in 18.1%.
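The acceptance rate follows directly from the grade counts, as the short Python check below shows. The counts are those reported above; the variable and function names are illustrative only.

```python
# Grade counts as reported for the 500 analyzed screws.
grade_counts = {"A": 356, "B": 130, "C": 8, "D": 3, "E": 3}


def acceptable_fraction(counts, acceptable=("A", "B")):
    """Fraction of screws whose grade is in the acceptable set."""
    total = sum(counts.values())
    return sum(counts[grade] for grade in acceptable) / total


total_screws = sum(grade_counts.values())          # all graded screws
breached_screws = total_screws - grade_counts["A"]  # screws not graded A
acceptable_pct = 100 * acceptable_fraction(grade_counts)

print(total_screws, breached_screws, round(acceptable_pct, 1))
```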

Discussion

The screw placement accuracies detailed in this study were calculated using an automated measurement system that analyzes screw accuracy relative to the planned target location for multiple metrics in all anatomical views. The key difference between the measurement system presented here and conventional grading scales is that grading scales measure how far the screw lies outside of the pedicle, but not how much it deviates from the planned, optimal location for that specific patient. The two measures are not directly comparable, and a deviation over 2 mm using the automated measurement system does not directly equate to a C or worse rating according to the grading scale (Fig. 5). A placement that is clinically acceptable according to the conventional grading scale can still deviate substantially from the planned location and, therefore, might not be the ideal placement for that patient. This is further demonstrated by the 66.6% of implants with an ML deviation from plan less than or equal to the clinically accepted threshold of 2 mm, compared to the 97.2% of placements deemed acceptable by the Gertzbein and Robbins classification [7].

Fig. 5

Measurement differences between accuracy of the implanted screw (blue) in relation to the planned location (red) versus conventional grading scale metrics. The pedicle edge (green-dashed line) was used to judge placement outside of the pedicle region. a Categorized as A using grading scale but has a ML deviation of 3.21 mm away from the planned target location. b Grading scale category B with a ML deviation of 7.38 mm from the planned location. c Grading scale category C and a ML deviation from the target trajectory of 5.14 mm

Previous studies of screw accuracy, whether using a grading scale or comparing directly to the planned screw location, have relied on manual measurements, whereas this study used an automated measurement algorithm. The automated algorithm removes human variance after the fusion step, which is required for all comparisons of implanted locations to robotic preoperative plans. The benefit of eliminating human input is illustrated by the significant differences between the manual and automated measurements of the ML and SI deviations in the pedicle region (Table 3), which are the most clinically relevant metrics. The automated measurement system can also quantify large cohorts more easily and consistently.

A potential source of variability in accuracy measurements is the deviation that occurs during fusion of the preoperative plan to the intra- or postoperative scan. Fusion is the only manual part of the measurement process, so the uncertainty was quantified to better understand the limits of screw placement accuracies (Fig. 4). The effect of the fusion uncertainty was tested on a subset of 40 implants and was shown not to produce a significant difference in the final placement accuracy values. This variability could explain why the accuracy values in the ML and SI directions within the pedicle region are greater than the robotic system trajectory accuracy of 1.5 mm [20]. The navigation camera used with the guidance system has a spatial accuracy of 2 mm [20], which also adds variance to the accuracy quantified in this study because the camera was assumed to be in the correct orientation.

Previous studies have compared the accuracy of implanted screws to the robotic preoperative plan [3, 28]. One study measured entry point deviation and axial and lateral angular deviation on 178 screws in 63 patients [28]. The average angles measured in this study for the angular deviations in the axial and sagittal planes were higher than those reported previously [28] (2.40° compared to 2.2° and 4.21° compared to 2.9°), but this difference did not have a direct impact on improper screw placement within the pedicle. A second study performed the analysis on 646 screws in 139 patients but only measured deviation in the axial and sagittal planes based upon entry and exit point deviation [3]. The study presented here also includes the ML and SI deviation in the pedicle region, which are key clinical metrics.

There were statistically significant differences in multiple metrics between the left- and right-side implants on a single vertebra, and between spinal regions, procedure types, and centers. The difference in accuracy between implants on the same vertebra could be caused by artifact from the first screw when looking at intraoperative images. Differences between spinal regions could be due to the ease of access to specific vertebrae and the angles necessary to accurately implant the screws. A previous study found no significant difference between deviations in the thoracic, lumbar, and sacral regions [28], which is not the case in this study, but there were substantially more implants in the lumbar region than in the other two spinal regions, particularly the thoracic. The implants in the thoracic region were only statistically different from the sacral implants for the SI deviation metric, but additional screws would be needed to confirm the significance of this difference. The higher accuracy seen in open procedures compared to percutaneous procedures could be explained by the increased visibility of an open procedure, as well as by the screw passing through less tissue, since additional tissue could slightly change the angle at which the screw is implanted into the vertebra.

Accuracy differences between centers can be attributed to a variety of factors. These include the length of time using the robot, because a long learning curve has been established for robot-guided procedures [2, 9], and variability in the cases performed between centers, including the spinal region implanted. The difference between centers can also be attributed to the difference in imaging used for the accuracy measurements. One of the four centers used postoperative CT imaging that was taken approximately one year after surgery, while the other three used intraoperative O-Arm images from the day of surgery. It has been shown that screw loosening is a common complication after spinal surgery that can occur in anywhere from 1 to 60% of cases depending on the bone density of the patient [5]. Loosening was quantified for the 32 patients (125 screws) with postoperative CT images based upon the presence of a radiolucent zone around the implanted screws [18]. It was found that 4.8% had a radiolucent zone of less than 1 mm, 1.6% had a radiolucent zone of greater than 1 mm, and 93.6% had no sign of loosening. The effect of bone mineral density on loosening could not be determined as it was not collected for these patients. The average placement accuracy of the 375 implants with intraoperative images, excluding those with postoperative CT scans, was 1.63 ± 1.19 mm in the ML direction and 1.39 ± 1.18 mm in the SI direction. It is unknown whether the variation in placement accuracy between intraoperative O-Arm imaging and postoperative CT scans is due to the difference in imaging modalities or the length of time that passed after surgery until the CT was taken. An additional difference between the centers is that one used both divergent (medial-to-lateral) and convergent (lateral-to-medial) approaches while the other three used only convergent approaches. Regardless, the divergent and convergent approaches had the same percentage of implants that breached the pedicle.

This study was limited by minor manual input during the fusion process of overlaying the preoperative plan onto the intra- or postoperative scan, which trended towards having interobserver variability in the sagittal plane. This could be due in part to the variability of the intra- or postoperative images since some centers took intraoperative O-Arm images and others used postoperative CT scans. Future studies should involve automating the image fusion process and using the same imaging modality taken at consistent times to reduce the number of factors that can impede placement accuracy. In addition, the sample sizes in the thoracic and sacral spinal regions were limited, and future work should include larger cohorts to verify the differences observed here between regions. The accuracy values were also not related to any complications in the operating room or clinical outcomes of the patient postoperatively, as these data were not available, but could be included in future analyses of screw-to-plan accuracy.

This study used a novel automated measurement system to analyze the accuracy of the Mazor X Stealth Edition robotic-guidance system using six metrics that assess screw placement in all three anatomical views. These metrics were determined by directly comparing the final implanted screw to the planned, ideal location for that patient, in contrast to conventional grading scales, which rely on subjective judgment and assess deviation only in the pedicle. Implementing an automated measurement algorithm ensured measurement consistency across centers and regions. The uncertainty associated with the alignment of preoperative and intra- or postoperative scans has been quantified and can be used as a metric when analyzing placement accuracy values. This was demonstrated across four surgical centers in 500 implanted screws.