Background

Conventional thyroidectomy using Kocher’s transcervical incision is the standard operative method, with low morbidity and mortality in a variety of benign and malignant thyroid diseases [1]. However, this technique leaves a scar on the neck that causes patients psychological distress. Therefore, many surgeons have tried to reduce or remove this scar by minimizing the incision or using an extra-cervical approach. Various operative methods using endoscopic or robotic systems have recently been introduced for thyroidectomy [2], offering good aesthetic results and less scarring and psychological distress compared to conventional thyroidectomy [3, 4].

In 2000, the da Vinci® surgical robot system (Intuitive Surgical, Sunnyvale, CA, USA) was approved by the US Food and Drug Administration (FDA) for surgical procedures. This system allows the surgeon to perform an operation by manipulating the robot arms from a seated position at a console, without needing to hold and manipulate an endoscope or other surgical instruments. The system provides a magnified three-dimensional (3D) operative field with motion filtering and, most importantly, the endo-wrist technology improves ergonomics and makes it easier for the surgeon to perform complex operative procedures [5]. As robotic systems have evolved, reducing tremor and improving fine motor control of surgical instruments, they have been used to develop remote-access, minimally invasive approaches to thyroidectomy without neck scarring [6, 7].

Several studies have reported that the technical and oncological outcomes of robotic thyroidectomy are comparable to those of open thyroidectomy [8], but very limited data describe the functional outcomes of robotic thyroidectomy apart from its cosmetic effects. Although previous studies have evaluated postoperative voice symptoms [9, 10], functional recovery outcomes after robotic thyroidectomy have not been thoroughly evaluated. Moreover, no study has determined whether recovery over time differs according to the surgical method. The aim of this study was to identify whether robotic total thyroidectomy (RTT) differs from open total thyroidectomy (OTT) in surgical, oncological, and functional outcomes over time.

Methods

Patient selection

Initially, 796 patients who underwent total thyroidectomy by three high-volume surgeons using both robotic and open methods at Haeundae Paik Hospital between July 2010 and December 2015 were reviewed. This retrospective cohort study was approved by the institutional review board of Inje University Haeundae Paik Hospital.

For patients with differentiated thyroid cancer (DTC), the inclusion criteria for robotic thyroidectomy at our institute at that time were: tumor size less than 4 cm, even with clinically suspicious extrathyroidal extension (ETE; cT3); no lateral lymph node metastasis (up to cN1a); no other organ invasion; and no distant metastatic disease (cM0). The indications for total thyroidectomy were: tumor size larger than 1 cm, with suspicion of ETE (cT3) or other organ invasion (cT4) in preoperative imaging studies, clinically apparent lymph node metastasis (cN1), or cytological/histological confirmation or high suspicion of distant metastatic disease in imaging findings (cM1). For patients with tumors in both thyroid lobes, a personal history of radiation therapy to the head and neck, or familial differentiated thyroid cancer, total thyroidectomy was always performed with informed consent.

In total, 796 patients were assessed for eligibility and 178 patients who were ineligible for analysis were excluded based on the following criteria: benign disease, non-differentiated thyroid cancer, advanced stage unsuitable for robotic surgery (≥T4, N1b, or M1), endoscopic operation, or combined with other operations. Of the 618 patients who were included in this study, 495 patients in the open group and 123 in the robotic group were included for propensity score matching analysis (Fig. 1).

Fig. 1 Flow diagram of patient selection

Operative procedure

The key surgical techniques of bilateral axillo-breast approach (BABA) robotic thyroidectomy used in this study have been described in a previous report [11]. The patient’s position and anesthesia method were the same for both operative methods. The key procedures in RTT are the same as those in OTT, except for flap dissection before robot docking.

Data collection

A Web-based thyroid database maintained in the thyroid center at Haeundae Paik Hospital was used to collect data prospectively on patients who had undergone thyroidectomy since 2010.

The surgical outcomes included surgical, oncological, and functional safety. Pathological, clinical, and complication results were collected to evaluate surgical safety. Operation time and intraoperative bleeding amount were recorded by the anesthesiologist. Operation-related complications such as bleeding, seroma, chyle leakage, and infection were recorded during the hospital stay and in the outpatient clinic.

The results of radioactive iodine (RAI) treatment and follow-up data were analyzed to evaluate oncological safety. We evaluated oncological safety using successful remnant ablation rates and response-to-therapy reclassification for the patients with total thyroidectomy according to the 2015 American Thyroid Association (ATA) management guidelines [12]. Successful remnant ablation was defined as undetectable stimulated serum thyroglobulin (sTg < 0.15 ng/mL) in the absence of interfering thyroglobulin antibodies (Tg-Ab < 60 U/mL) via radioimmunoassay methods, with or without confirmatory nuclear or other imaging studies. An alternative definition was used in cases in which Tg-Ab was present in the absence of visible RAI uptake on a subsequent diagnostic RAI scan. The concept of four response-to-therapy categories analyzed here was described by Tuttle et al. [13]. According to the 2015 ATA guidelines, an excellent response is defined as having no clinical, biochemical, or structural evidence of disease. Biochemical incomplete response is defined as abnormal sTg or rising Tg-Ab levels in the absence of localizable disease. Structural incomplete response is defined as persistent or newly identified loco-regional or distant metastasis. Indeterminate response is defined as nonspecific biochemical or structural findings that cannot be confidently classified as either benign or malignant. This includes patients with stable or declining Tg-Ab levels without definitive structural evidence of disease.

Recurrent laryngeal nerve (RLN) palsy and postoperative hypoparathyroidism were analyzed to evaluate functional safety. Postoperative hypoparathyroidism was defined as the presence of both hypocalcemic symptoms/signs and a low intact parathyroid hormone (iPTH) level (< 15 pg/mL). Postoperative hypocalcemic symptoms included tingling sensation, numbness, and tetany of the hands, feet, or perioral area. Hypocalcemic signs included Chvostek’s or Trousseau’s signs. Recovery from postoperative hypoparathyroidism was defined as both resolution of symptoms/signs without calcium or active vitamin D supplementation and normalization of the iPTH level. RLN function was assessed in all patients who underwent thyroidectomy by indirect laryngoscopy preoperatively and 2 weeks postoperatively, and patients were then followed at 1 and 3 months and every 6 months thereafter until recovery. RLN palsy was defined as paralysis or paresis of the vocal cord.

Statistical analysis

Comparisons of clinical and pathological characteristics between the OTT and RTT groups were conducted using Student’s t test for continuous data and the chi-square test or Fisher’s exact test for categorical data. Kaplan–Meier curve estimates using log-rank statistics were analyzed to compare the recovery of hypoparathyroidism and RLN function.
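As a rough illustration of how a Kaplan–Meier estimate treats recovery as a time-to-event outcome, the sketch below computes the estimated fraction of patients who have not yet recovered and reads off a median recovery time. It is only a minimal sketch: the function names and the toy cohort are hypothetical, the numbers are not study data, and the log-rank comparison between groups is not implemented here. Patients without an observed recovery (for example, permanent palsy) are treated as censored.

```python
from collections import Counter
from fractions import Fraction


def kaplan_meier(times, recovered):
    """Kaplan-Meier estimate of the fraction of patients NOT yet recovered.

    times     -- follow-up time in days for each patient
    recovered -- True if recovery was observed at that time,
                 False if the patient was censored (no recovery observed)
    Returns a list of (day, fraction_still_unrecovered) at each recovery day.
    """
    recoveries = Counter(t for t, r in zip(times, recovered) if r)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    still_unrecovered = Fraction(1)
    curve = []
    for day in sorted(leaving):
        d = recoveries.get(day, 0)
        if d:
            # Multiply by the conditional probability of not recovering on this day.
            still_unrecovered *= Fraction(at_risk - d, at_risk)
            curve.append((day, still_unrecovered))
        at_risk -= leaving[day]
    return curve


def median_recovery_time(curve):
    """First day on which at most half of the patients remain unrecovered."""
    for day, fraction in curve:
        if fraction <= Fraction(1, 2):
            return day
    return None  # median not reached during follow-up


# Hypothetical toy cohort (NOT study data): days to recovery or censoring.
toy_times = [30, 60, 90, 120, 200, 365]
toy_recovered = [True, True, True, False, True, False]

toy_curve = kaplan_meier(toy_times, toy_recovered)
print(median_recovery_time(toy_curve))
```

Exact fractions are used so that the comparison against one half is not affected by floating-point rounding.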

To remove the effects of selection bias resulting from non-randomized treatment assignment, propensity score matching analysis adjusted for seven clinicopathological characteristics (sex, age, body mass index, extent of central node dissection, tumor size, extrathyroidal extension, and thyroiditis) was conducted with a maximal 2:1 matching ratio. The validated predictors that may influence oncologic safety (sex, age, tumor size, and extrathyroidal extension) [14] and technical safety (obesity and thyroiditis) [15] were adjusted. Assuming a mean difference in parathyroid function recovery time of 10 days and a standard deviation of 30 days based on our data, a statistical power of 80%, a significance level (alpha) of 0.05, a two-sided t test, and a 1:2 allocation ratio of groups, a total of 107 patients for the RTT group and 214 patients for the OTT group would be required. Performance of 1:2 propensity score matching included as many patients as possible to achieve the highest statistical power, while maintaining an absolute standardized difference within 10%. In 1:1 matching, only 246 patients were included in the analysis. In 1:3/1:4 matching, the absolute standardized difference was over 10%, resulting in poor comparability. Within propensity score matching, covariates in the two groups were similarly distributed without the loss of a large number of observations; thus, propensity score matching controlled more than 90% of the bias due to the covariates used to estimate the score and increased the minimum necessary number of observations in the sample [16]. An absolute standardized difference of <10% suggested inconsequential residual bias and enabled a high level of causal inference (Fig. 2).
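The sample-size reasoning above can be approximated with the standard normal-approximation formula for comparing two means with unequal allocation. The sketch below is an assumption about the general approach, not the software or exact method used in the study, and the function name is hypothetical. With the stated inputs it gives 106 and 212 patients, slightly below the reported 107 and 214; a gap of this size is what one would expect if the reported figures include a t-distribution correction.

```python
from math import ceil
from statistics import NormalDist


def two_sample_sizes(delta, sd, alpha=0.05, power=0.80, ratio=2.0):
    """Normal-approximation sample sizes for a two-sided comparison of means.

    delta -- assumed mean difference (days)
    sd    -- assumed common standard deviation (days)
    ratio -- size of the larger group divided by the smaller (OTT:RTT = 2:1)
    Returns (n_smaller_group, n_larger_group).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    n_small = (1 + 1 / ratio) * ((z_alpha + z_beta) * sd / delta) ** 2
    n_small = ceil(n_small)
    return n_small, ceil(ratio * n_small)


# Inputs taken from the text: 10-day difference, 30-day SD, 80% power, alpha 0.05.
n_rtt, n_ott = two_sample_sizes(delta=10, sd=30)
print(n_rtt, n_ott)
```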

Fig. 2 Absolute standardized differences in baseline characteristics before and after propensity score matching. BMI body mass index, CND central node dissection, ETE extrathyroidal extension

All statistical analyses were performed using SPSS ver. 23.0 (IBM Corp., Armonk, NY, USA) and R software ver. 3.4.0 (R Development Core Team, Vienna, Austria) for propensity score matching. All reported P values were two-sided. A P value < 0.05 was considered statistically significant.

Results

Demographics and clinical characteristics before and after propensity score matching

Propensity score matching yielded 246 patients in the OTT group and 123 matched patients in the RTT group. Before matching, patients in the RTT group were younger (40.8 ± 8.6 vs. 51.5 ± 11.5 years; P < 0.001), predominantly female (94.3% vs. 87.7%; P = 0.035), and had a lower BMI (23.0 ± 3.2 vs. 24.8 ± 3.9 kg/m²; P < 0.001). Tumor size was smaller in the RTT group than in the OTT group (0.78 ± 0.43 vs. 0.96 ± 0.74 cm; P < 0.012).
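To make the 10% balance threshold concrete, the sketch below computes absolute standardized differences from the pre-matching means and standard deviations reported above. The formula is the commonly used definition for continuous covariates (difference in means divided by the root of the averaged variances); the text does not state which formula the authors used, so this is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def abs_std_diff(mean_a, sd_a, mean_b, sd_b):
    """Absolute standardized difference (in percent) for a continuous covariate."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd * 100


# Pre-matching values from the text: (RTT mean, RTT SD, OTT mean, OTT SD).
before_matching = {
    "age (years)": (40.8, 8.6, 51.5, 11.5),
    "BMI (kg/m2)": (23.0, 3.2, 24.8, 3.9),
    "tumor size (cm)": (0.78, 0.43, 0.96, 0.74),
}

std_diffs = {name: abs_std_diff(*values) for name, values in before_matching.items()}
for name, value in std_diffs.items():
    print(f"{name}: {value:.1f}%")
```

Under this definition all three covariates are far above 10% before matching, which is consistent with the need for matching described in the Methods.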

After matching, there were no differences in demographic or clinical characteristics between the two groups, and the standardized differences indicated a high degree of similarity in the distribution of covariates between the two groups (Table 1).

Table 1 Baseline characteristics of patients before and after propensity score matching

Comparison of the surgical safety after propensity score matching

Most of the histological results indicated papillary thyroid cancer in both groups (98.8% vs. 98.4%; P = 0.345). The numbers of retrieved lymph nodes and metastatic lymph nodes resected in the OTT and RTT groups were similar (8.23 ± 3.87 vs. 7.53 ± 4.08; P = 0.087 and 1.20 ± 2.19 vs. 0.95 ± 1.80; P = 0.270, respectively). Pathologic T and N stages and positive resection margin rates were similar in the two groups. There were no differences in laboratory tests in terms of the postoperative day 1 calcium, ionized calcium, or iPTH levels. While the mean operation time was longer in the RTT group (OTT vs. RTT: 123.51 ± 32.63 vs. 198.39 ± 37.93 min; P < 0.001), the mean intraoperative bleeding amounts and postoperative hospitalization periods were similar in the two groups.

Postoperative hypoparathyroidism was observed in 35.8% and 31.7% of OTT and RTT patients (P = 0.438), and RLN palsy was observed in 13.0% versus 11.4% (P = 0.656), respectively. The permanent hypoparathyroidism and RLN palsy rates were 2.8% versus 1.6% (P = 0.723) and 3.3% versus 0.8% (P = 0.283), respectively. Although the incidence of these most important complications was lower in the RTT group, the difference was not statistically significant. In the OTT group, two cases (0.8%) of bleeding and three cases (1.2%) of seroma occurred postoperatively, whereas only two cases (1.6%) of seroma were observed in the RTT group. Postoperative hemorrhages were resolved by reoperation, and seromas were resolved by repeated aspiration (Table 2).

Table 2 Comparison of the surgical results between patients treated with open and robot surgery after propensity score matching

Comparison of the oncologic safety after propensity score matching

Among the 369 patients treated with OTT or RTT and assessed for eligibility for RAI treatment, 177 (72.0%) OTT patients and 83 (67.5%) RTT patients underwent RAI ablation (P = 0.375) for remnant ablation or adjuvant therapy and the OTT and RTT groups were administered similar mean I-131 activity doses of 123.50 ± 28.41 versus 122.89 ± 25.78 mCi (P = 0.868), respectively.

At therapy initiation, there were no significant differences in mean TSH levels (74.87 ± 19.45 vs. 72.95 ± 21.22 μIU/mL; P = 0.472), sTg (1.08 ± 1.62 vs. 1.17 ± 3.37 ng/mL; P = 0.756), or Tg-Ab (65.59 ± 261.74 vs. 33.02 ± 92.11 ng/mL; P = 0.270) between OTT and RTT patients, respectively. The percentage of sTg below 1 ng/mL was higher in the RTT group (66.7% vs. 74.7%; P = 0.191), although the difference was not significant. There were no significant differences in RAI uptake on post-treatment whole-body scan (P = 0.683). The percentage of successful remnant ablation was higher in the RTT group than in the OTT group (55.9% vs. 67.5%; P = 0.077), but this difference was not significant.

On subsequent diagnostic scans at 6–12 months after RAI therapy, successful remnant ablation rates were 97.5% versus 98.7% (P > 0.999) in the OTT and RTT groups, respectively. According to the response-to-therapy restaging system classification, the RTT group had a superior response rate (83.0% vs. 86.7%; P = 0.600), but this difference was not significant (Table 3).

Table 3 Comparison of the oncologic results between patients treated with open and robot surgery at the time of post-therapy and subsequent diagnostic scan

Comparison of the functional safety after propensity score matching

The recovery times for temporary hypoparathyroidism ranged from 12 to 449 days in the OTT group and from 13 to 296 days in the RTT group. The recovery times for RLN palsy ranged from 39 to 357 days in the OTT group and from 35 to 181 days in the RTT group. The median recovery times of hypoparathyroidism differed significantly between the OTT and RTT patients [100 ± 16.20 (95% CI: 68.242–131.768) vs. 88 ± 33.09 (95% CI: 23.148–152.852) days; P = 0.044]. Additionally, the median recovery time of laryngeal nerve function was significantly shorter in the RTT group than in the OTT group [87 ± 32.40 (95% CI: 23.489–150.511) vs. 118 ± 49.50 (95% CI: 20.985–215.015) days; P = 0.002] (Fig. 3).

Fig. 3 Kaplan–Meier estimates of the relationship between the recovery of functions and surgical method (open vs. robot): a hypoparathyroidism (P = 0.044) and b recurrent laryngeal nerve (RLN) palsy (P = 0.002)

Discussion

Although the da Vinci® surgical robot system has several disadvantages, such as high cost, long operation times, and lack of tactile sensation [17], it has been widely used in operations that require precise surgical maneuvers because of its significant advantages, such as good geometric accuracy, stable and untiring performance, motion scaling, and diverse sensors for control [18]. The system provides a high degree of ergonomics to surgeons and is especially useful in procedures such as thyroid or prostate surgery in deep and narrow working spaces. The high cost is expected to fall at the end of the patent period with the appearance of new robotic device companies. Operating times may decrease with the accumulation of surgical experience. The lack of tactile sensation is overcome by visual adaptation in experienced surgeons and will also be improved by further evolution of robotics engineering. The limitations of robotic surgery are expected to decline rapidly as surgical experience accumulates and robotics technology develops.

The hospital costs of the 369 patients were obtained from the hospital data processing department of our institution. These costs included those related to hospitalization (i.e., injections and medication, blood tests and imaging studies, admission charges, etc.), anesthesia, and equipment (i.e., robotic instruments, open instruments, sutures, hemostatic agents, anti-adhesive agents, etc.). When the exchange rate was set at $1 to ₩1100, RTT was found to be 17.6, 60.8, and 345.5% more expensive than OTT in terms of hospitalization, anesthesia, and equipment costs, respectively. A comparison of the total cost revealed an approximately 2.9-fold cost advantage of OTT: the mean cost was $9736.43 ± 1052.75 for the 123 consecutive patients undergoing RTT and $3326.44 ± 1032.11 for the 246 consecutive patients undergoing OTT. However, RTT cost three times as much as OTT in 2010 ($9002.42 ± 330.93 vs. $3001.37 ± 1804.42) and only 2.3 times as much in 2015 ($10,838.69 ± 1682.88 vs. $4613.49 ± 700.94), and the gap has been steadily declining ever since.
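The fold differences quoted above follow directly from the reported mean total costs. The short sketch below reproduces that arithmetic; the variable names are illustrative only, and the dollar figures are the means given in the text.

```python
# Mean total cost per patient in US dollars, as reported in the text: (RTT, OTT).
mean_cost_usd = {
    "overall": (9736.43, 3326.44),
    "2010": (9002.42, 3001.37),
    "2015": (10838.69, 4613.49),
}

# How many times more expensive RTT was than OTT in each period.
fold_difference = {
    period: rtt / ott for period, (rtt, ott) in mean_cost_usd.items()
}

for period, fold in fold_difference.items():
    print(f"{period}: RTT cost {fold:.1f} times as much as OTT")
```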

Since BABA robotic thyroidectomy was introduced, it has been widely adopted for the surgical treatment of a variety of thyroid diseases, from benign to malignant, because of the distinct advantages of the method. Significantly improved cosmetic results after BABA robotic thyroidectomy compared to open thyroidectomy have been well demonstrated in a previous study [19]. Because the key procedures of BABA robotic thyroidectomy after making a flap are the same as those of traditional open thyroidectomy, inexperienced users who are familiar with open surgery can easily learn the robotic method. Additionally, as the anatomical view in BABA robotic surgery is similar to that of open surgery, it is easy to perform a total thyroidectomy. Previous studies on BABA RTT reported no significant technical or oncological differences from OTT in terms of surgical completeness [20, 21]. As a result, there has been a tendency to expand the indications for the BABA method to include advanced thyroid carcinoma [22, 23]. In our institution, if the disease is not at an advanced stage (≥T4, N1b, or M1), robotic surgery can be indicated regardless of tumor size, ETE, or central lymph node metastasis (≤T3, N1a, or M0).

RLN palsy and hypoparathyroidism are common and serious complications of thyroidectomy. These complications result in not only physical symptoms but also a reduced quality of life [24, 25]. Preservation of the RLN and parathyroid gland (PTG) is the most important procedure in thyroid surgery. However, maintaining not only the structure but also the function of the RLN and PTG can be challenging due to their resemblance to perithyroidal tissues, such as the thyroid vessels, connective tissues, fatty tissues, and lymph nodes. The RLN and PTG can also be damaged by the tumor itself. Regardless of cancer invasion, they can also be damaged technically by direct ligation, dissection, indirect traction, compression, or thermal injury. To avoid technical injury to the RLN and PTG during thyroidectomy, it is important to differentiate these structures from the perithyroidal tissues and dissect via highly precise manipulation. A well-organized surgical procedure coordinated with the operative assistant is required. The prevalence of hypoparathyroidism and RLN palsy after thyroidectomy is influenced by preoperative disease characteristics, the extent of resection, and surgeon experience [26, 27]. As in our study, other studies did not report any significant differences in RLN palsy or hypoparathyroidism rates according to the surgical methods [28, 29]. As a result, robotic thyroidectomy has been criticized for its longer operation time and high cost without definite superiority to open surgery, except in cosmesis.

However, this study is the first to report faster recovery of parathyroid and laryngeal function after robotic thyroidectomy within the context of a well-controlled comparative design. Similarly, robot-assisted radical prostatectomy has been reported to improve functional outcomes in urinary continence recovery [30]. The faster recovery appears to be a direct advantage of robotic surgery, attributable to the delicate and highly precise manipulation of the robotic system without the aid of assistants. A robotic system with four arms enables surgeons to perform a procedure alone and may prevent unpredictable traction or compression injuries caused by assistants. Magnified 3D imaging may prevent direct or indirect injuries by distinguishing the RLN and parathyroid gland from perithyroidal tissues. Moreover, it is easier to preserve the RLN and the vessels of the parathyroid glands because the magnified operative view is more parallel to the RLN and inferior thyroid artery.

This study had several limitations. First, reported recovery times were only an approximation of actual recovery times because we followed patients at regular intervals rather than continuously. Thus, it may be more accurate to speak of a recovery period rather than a recovery time. A review of the literature shows that the recovery time of RLN palsy varies widely. Recovery from RLN palsy may take 2 years, although most cases resolve within a year [31]. The time to recovery of function could therefore be influenced by outliers. After removing the effects of eight cases with delayed recovery of more than 1 year, no change in the median recovery time was observed. However, the P value for recovery from hypoparathyroidism decreased from 0.044 to 0.019 (Fig. 4). In addition, there was a significant difference in recovery time after excluding patients with a permanent loss of parathyroid and laryngeal function. The median parathyroid and laryngeal function recovery times were shorter in the RTT group compared to the OTT group [82 ± 32.23 (95% CI: 18.835–135.165) vs. 98 ± 5.85 (95% CI: 86.535–109.465) days; P = 0.011 and 87 ± 26.90 (95% CI: 34.283–139.717) vs. 103 ± 6.736 (95% CI: 89.799–116.203) days; P = 0.003] (Fig. 5).

Fig. 4 Kaplan–Meier estimates of the relationship between the recovery of functions and surgical method (open vs. robot) after removing the effect of outliers: a hypoparathyroidism (P = 0.019) and b recurrent laryngeal nerve (RLN) palsy (P = 0.002)

Fig. 5 Kaplan–Meier estimates of the relationship between recovery of functions and the surgical method (open vs. robot) after excluding patients with a permanent loss of parathyroid and laryngeal function: a hypoparathyroidism (P = 0.011) and b recurrent laryngeal nerve (RLN) palsy (P = 0.003)

Second, the surgeries included in this study were performed by three different surgeons. Therefore, there may have been individual differences in the operative methods and results. However, the operations were performed in our institution by the same surgeons for both robotic and open surgery. The surgeons completed the formal robot fellowship course over 1 year in the same institution, under the same supervisor using the same methods. Therefore, although there may have been individual differences, it can be assumed that there were no differences in the surgical methods.

Third, these surgeons had relatively little experience (7 years or less), although they were high-volume surgeons who performed more than 100 surgeries a year. As surgical experience accumulates, the surgical outcomes of both methods may improve and converge. Therefore, the faster recovery times associated with robotic thyroidectomy may reflect not an inherent advantage of robotic surgery but the steep learning curve associated with robotic surgery [32].

As far as we are aware, there are no systematic reports on the learning curve of BABA robotic surgery. According to a previous study, the majority of robotic surgeons considered that robotic thyroidectomy required fewer than 50 cases to overcome the learning curve, while open thyroidectomy required more than 50 cases (81.3% vs. 56.3%, P = 0.023) [33]. Based on our experience, the use of magnified 3D imaging alongside robotic systems may help surgeons understand the small and complex anatomy surrounding the thyroid. Moreover, because the robotic device has short pivot points from the finger to the acting point compared to open or endoscopic surgical instruments, it may offer the most realistic movements, similar to those of the fingers. For inexperienced surgeons who have just completed formal education in both methods, BABA robotic surgery may be used as not only an alternative option but also the preferred method for quality control in thyroid surgery. To reach a definitive conclusion about the superiority of robotic thyroidectomy in terms of parathyroid and RLN function recovery, large-scale analyses of operative experience may be necessary.

Conclusion

There were no significant differences in surgical outcomes, in terms of surgical safety or oncological safety, between the OTT and RTT groups, except in mean operation times. However, the recovery times of hypoparathyroidism and RLN function were significantly shorter in RTT patients than in OTT patients. The delicate and highly precise manipulation of the robotic system, and the magnified 3D surgical view that is more parallel to the RLN and inferior thyroid artery, seem to offer slight advantages, especially for inexperienced surgeons, in terms of parathyroid and RLN function recovery. To reach a definitive conclusion about the superiority of robotic thyroidectomy in terms of parathyroid and RLN function recovery, further well-controlled studies may be necessary.