Introduction

Surgery is the standard therapy for many thyroid diseases. Resections of the thyroid gland are frequently performed in general surgery. In the United States, the surgical volume reaches 80,000 thyroidectomies per year [1].

Due to the high vascularisation of the thyroid gland, hemostasis is key to limiting morbidity and mortality in thyroid surgery. Besides bleeding, major potential sources of postoperative morbidity following thyroid resections are dysphonia and dysphagia owing to unilateral or bilateral recurrent and/or superior laryngeal nerve injury, hypocalcaemia due to parathyroid ischemia or unintended deprivation, postsurgical haemorrhage and neck hematoma [2, 3]. Other less frequent surgical complications include wound infection, postoperative pain due to brachial plexus stretching and unsatisfactory cosmetic results [4]. The risk of perioperative mortality or major disability after surgery is low: the rates of peri-operative mortality and haemorrhage are <1 % and about 1 %, respectively, and the nerve palsy rates range between 2 and 6 % [5].

Although minimally invasive or video-assisted surgery is employed increasingly, the open surgical approach is still the standard of care. In recent years, new energised vessel sealing systems such as electrothermal bipolar-activated devices (e.g. LigaSure®, LS) or ultrasonic systems (e.g. UltraCision® or Harmonic Focus® devices, HS) have been applied to thyroid surgery with the aim of reducing blood loss, operating time and length of skin incision. The LS creates a seal using a combination of pressure and electrothermal energy to change the vessel wall structure and obliterate the lumen. A characteristic specific to the LS is the possibility to modulate the quantity of energy by applying appropriate pressure to the grip [6]. The HS uses mechanical energy (ultrasound) to simultaneously cut and seal vessels by denaturing and coagulating collagen fibres. Both types of devices have been tested in a number of randomised controlled trials (RCTs) since 2000.

Most of these individual RCTs were powered to detect a decrease in operation time as the primary endpoint. A meta-analysis allows the magnitude of the effect on operation time to be quantified more precisely and all three techniques to be compared with each other. Moreover, it may provide sufficient statistical power for detecting differences in safety outcomes. Although several reviews have evaluated the potential superiority of energised devices in thyroid surgery [7–13], each has limitations (e.g. regarding the comparisons made and the completeness of evidence) that made an updated systematic review of the literature desirable.

The aim of the ENERCON systematic review and meta-analysis was to conduct a three-way comparison between energised vessel sealing systems (HS and LS) and conventional “clamp-and-tie” or traditional electrosurgical methods to evaluate operation time and post-operative complications in thyroid surgery.

Methods

This systematic review and meta-analysis was performed in accordance with the “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” statement [14].

Search strategy

A systematic literature search was conducted independently by three authors (P.C., K.Go. and K.Gr.) according to the standards of the Cochrane Collaboration. The following databases were searched: MEDLINE (via PubMed), Cochrane Library, EMBASE and ISI Web of Science. No language or time-period restrictions were applied. For MEDLINE, the Cochrane highly sensitive search strategy for RCTs was employed [15]. The search was carried out on 25 July 2012. An update on 12 December 2012 yielded no new studies. Other sources searched for relevant trials were reference lists of previous systematic reviews and included trials, journal homepages and publications citing included trials. Experts in the field of thyroid surgery were contacted to ensure that all relevant studies were included. The detailed search strategy used in MEDLINE is presented in Table 1.

Table 1 Search strategy in MEDLINE

Eligibility criteria

Publications related to trials that met the following criteria were eligible for inclusion: RCTs, without any language restriction, comparing at least two of the following hemostasis techniques in open partial and/or total thyroidectomy: ultrasonic systems (UltraCision®, Harmonic Ace®, Harmonic Focus® or related systems, HS), electrothermal bipolar-activated vessel sealing systems (LigaSure® Precise, LF1212 or unspecified LigaSure instruments, LS) or conventional techniques for hemostasis (CH). Conventional hemostasis was defined as knotting and tying the vessels, applying vascular clips or using traditional electrosurgery. Trials evaluating methods for minimally invasive or video-assisted surgery were excluded. Eligibility was assessed for each publication by two authors (out of P.C., K.Go. and K.Gr.). Any disagreement was resolved by discussion.

Data extraction

Data were extracted using a standardised electronic extraction sheet (available upon request).

The primary outcome parameter was total operation time, defined as the time from skin incision to skin closure. Secondary outcome parameters were as follows: postoperative mortality, intraoperative blood loss, weight of specimen, length of hospital stay, transitory and definitive laryngeal palsy (within or over 30 days after operation as has been used in most of the included articles), transient and persistent hypocalcaemia (within or over 90 days after operation), amount of drainage fluid (millilitres) 24 h after operation, rates of: hematoma/seroma (combined), reoperations and wound infections, as well as postoperative pain after 24 h (visual analogue scale) and any information on the cosmetic result.

The following baseline characteristics were recorded: study name, publication year, journal reference, country, funding, study design (number of participating centres, treatment arms, study duration, randomization, allocation concealment, blinding), participants (main inclusion and exclusion criteria, sample size; baseline data such as age, gender, body mass index, American Society of Anaesthesiologists (ASA) classification, thyroid function, type of thyroid disease, concomitant treatment with non-steroidal anti-inflammatory drugs or anticoagulants), interventions (intervention groups, type of surgery, surgical experience). In case of missing data regarding the primary outcome, corresponding authors were contacted via mail by P.C. and K.G. to gather further information.

If a study generated multiple publications, data were extracted from the most comprehensive. Additional publications were used to complete this information.

Quality assessment

The methodological quality of the included studies was assessed using the Cochrane Risk of Bias tool [16]. Random sequence generation was considered adequate if the allocation process was determined by a chance process and unpredictable (e.g. drawing techniques, random-number tables or computerised random number generation). Allocation concealment was judged to be satisfactory whenever clinicians and participants were unaware of upcoming assignments (e.g. centralised allocation, sealed envelopes). Recognising the difficulties of blinding the surgical team, performance bias was considered low if patients were blinded to the treatment arm. Most outcomes were considered objective, not requiring blinding of outcome assessment to avoid detection bias. Detection bias was considered low in subjective outcomes (pain, cosmetic satisfaction) if assessors were blinded to the treatment arm. Analyses were considered adequate if all recruited patients were analysed in the group to which they were originally allocated, regardless of the treatment received (intention-to-treat principle (ITT)). Outcome data were considered complete (low attrition bias) if analysis was performed according to the ITT principle or if there were explicitly no withdrawals or patients lost to follow-up. Selective reporting was assessed by comparison of reported endpoints with abstracts and previously published study protocols and by whether variance measures were reported for each outcome. Other important sources of bias considered were the role of funding and the relative experience of the operating surgeons in each group.

Statistical analysis

For the primary endpoint “total operation time,” we used a Bayesian random effects model for multiple treatment comparison with minimally informative prior distributions. It preserves the comparison of randomised treatments within each trial while combining all available comparisons between treatments and accounts for multiple comparisons within a trial when there are more than two treatment arms [17]. The model included random effects at the level of trials to account for variation between trials due to clinical heterogeneity. Whenever possible, we used results of intention-to-treat analysis including all randomised patients. Pooled effect sizes were estimated from the mean or the median, as specified, of the posterior distribution. A positive effect size of one treatment versus another indicates a benefit in operation time of the latter treatment. The 95 % credible intervals were estimated from the 2.5th and 97.5th centiles of the posterior distribution. Credible intervals in Bayesian statistics can be interpreted in a similar way to conventional confidence intervals in frequentist statistics.
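The way point estimates and credible intervals are read off a posterior distribution can be sketched as follows. This is a minimal illustration only: the function name `summarise_posterior` and the simulated draws are hypothetical and stand in for the MCMC output of the actual model, which is not reproduced here.

```python
import random
import statistics


def summarise_posterior(draws, level=0.95):
    """Summarise posterior draws of one treatment effect.

    Returns (mean, median, lower, upper), where lower and upper bound the
    equal-tailed credible interval at the requested level.
    """
    # Cut the draws into 1000 equal-probability slices; cut point k
    # (1-based) is the k/1000 quantile.
    cuts = statistics.quantiles(draws, n=1000)
    tail = round((1.0 - level) / 2.0 * 1000)  # 25 for a 95 % interval
    lower = cuts[tail - 1]          # 2.5th centile for level=0.95
    upper = cuts[1000 - tail - 1]   # 97.5th centile for level=0.95
    return statistics.mean(draws), statistics.median(draws), lower, upper


# Hypothetical posterior draws (assumed values, not trial data).
rng = random.Random(2012)
example_draws = [rng.gauss(22.0, 1.2) for _ in range(20000)]
mean, median, lower, upper = summarise_posterior(example_draws)
print(f"mean={mean:.2f} median={median:.2f} 95% CrI=({lower:.2f}, {upper:.2f})")
```

By construction, about 95 % of the draws fall between the two returned bounds.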

In the multiple treatment comparison model, heterogeneity was estimated from the median standard deviation between trials (τ) observed in the posterior distribution, with a uniform prior distribution (τ ~ Unif(0, 5)). For all trial baselines and treatment effects, vague priors (Normal(0, 10,000)) were used. The total residual deviance and the deviance information criterion (DIC) were given as goodness-of-fit measures. The consistency assumption was tested by the method of Bucher et al. [18], comparing the effect size from direct comparisons within randomised trials with the effect size from indirect comparisons between randomised trials with one intervention in common. All Bayesian analyses were based on 150,000 iterations, of which the first 50,000 were discarded as burn-in period. Convergence of Markov chains was checked by the Brooks–Gelman–Rubin statistic.
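The logic of an adjusted indirect comparison and the resulting consistency check can be sketched as below. This is a simplified illustration under stated assumptions: effects are mean differences relative to a common comparator A, the estimates are treated as independent and approximately normal, and the function names and example numbers are hypothetical rather than taken from the analysis.

```python
import math


def indirect_estimate(d_ab, se_ab, d_ac, se_ac):
    """Indirect effect of C relative to B via the common comparator A.

    d_ab and d_ac are the effects of B and C relative to A; their
    variances add because the two sets of trials are independent.
    """
    return d_ac - d_ab, math.sqrt(se_ab ** 2 + se_ac ** 2)


def inconsistency_test(d_direct, se_direct, d_indirect, se_indirect):
    """Two-sided z-test for direct versus indirect evidence."""
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p_value


# Hypothetical inputs in minutes (assumed values for illustration only).
d_ind, se_ind = indirect_estimate(d_ab=-20.0, se_ab=2.0, d_ac=-12.0, se_ac=3.0)
diff, z, p = inconsistency_test(d_direct=3.0, se_direct=2.5,
                                d_indirect=d_ind, se_indirect=se_ind)
print(f"indirect={d_ind:.1f} (SE {se_ind:.2f}), z={z:.2f}, p={p:.3f}")
```

A small p-value indicates that direct and indirect evidence disagree by more than chance would explain, which is the situation in which both a consistency and an inconsistency model are worth reporting.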

Finally, we performed pairwise meta-analyses with random effects at the level of trials for primary and secondary endpoints. Each pairwise meta-analytical comparison was restricted to the corresponding trial results irrespective of whether a third treatment arm was investigated. For the primary endpoint, subgroup analyses by type of thyroidectomy (total, partial or total, partial), by sponsor (industry-sponsored, investigator-initiated, unclear sponsor) and by surgical experience (balanced, high risk for unbalance, unclear experience) were added.

For continuous endpoints, the effect size per trial was calculated as the mean difference (MD) between treatment groups and pooled as the weighted mean difference (WMD) with 95 % confidence interval using the inverse variance method. If estimates for mean and standard deviation (SD) were not reported for continuous data, we used the methods of Hozo et al. [19] to convert median and range estimates into mean and SD. For dichotomous secondary endpoints, the risk difference (RD) was chosen as the effect size measure per trial because of sparse data and pooled by the Mantel-Haenszel method [15]. All results were investigated for clinical and statistical heterogeneity. Clinical heterogeneity was explained where appropriate and possible. Statistical heterogeneity was explored by the I² statistic.
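Inverse variance pooling with random effects and the I² statistic can be sketched as follows. The sketch assumes the DerSimonian-Laird estimator for the between-trial variance, which is a common choice for random-effects models but is an assumption here, not a statement of the exact software settings used. The function name and the example inputs are hypothetical.

```python
import math


def pool_random_effects(effects, std_errors):
    """Pool per-trial mean differences with inverse variance weights.

    Returns (pooled, ci_lower, ci_upper, tau2, i2), where tau2 is the
    DerSimonian-Laird between-trial variance and i2 is I² in percent.
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in std_errors]            # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-trial variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return (pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled,
            tau2, i2)


# Hypothetical mean differences in minutes with their standard errors.
result = pool_random_effects([25.0, 18.0, 31.0, 12.0], [3.0, 2.5, 4.0, 2.0])
print("WMD=%.1f, 95%% CI [%.1f, %.1f], tau2=%.1f, I2=%.0f%%" % result)
```

When all trials report the same effect, Q is zero, so both τ² and I² are zero and the random-effects result coincides with the fixed-effect result.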

We used WinBUGS (Version 1.4, MRC Biostatistics Unit 2003, Cambridge, UK) for multiple treatment comparisons and RevMan (Version 5.1, Nordic Cochrane Centre 2011, Cochrane Collaboration, Copenhagen) for pairwise meta-analyses.

Results

The electronic and manual literature searches identified 116 potentially eligible titles and abstracts. The study flow and reasons for exclusion are detailed in Fig. 1. Overall, 39 publications with the results of 35 trials were included in the qualitative and quantitative data synthesis.

Fig. 1 Number of abstracts and articles identified and evaluated

The clinical characteristics of included trials are presented in Table 2. All were published between 2000 and 2012. One multicentre trial was retrieved [42], and in two cases, a university centre with a satellite hospital was involved [30, 51]. Trials were performed worldwide, with an emphasis on Italy (14 trials). The majority of publications were in English, four in Italian [27, 33, 49, 55], and one each in German [50] and French [21].

Table 2 Clinical characteristics of included trials

Out of 4,061 randomised patients, the percentage of females ranged from 59 to 94 % and the mean age from 40 to 56 years (Table 3). Fifteen trials were limited to patients with benign disease; eighteen trials included patients with both benign and malignant thyroid diseases, and two trials included only patients with papillary carcinoma. Twenty trials dealt only with total thyroidectomies, and in three trials, this was combined with central neck or lymph node dissection (CND). Only partial thyroidectomies were performed in one trial. In the remaining 11 trials, both total and partial thyroidectomies were performed. Information on body mass index, ASA status, thyroid function and use of non-steroidal anti-inflammatory drugs and anticoagulants was not provided to a sufficient extent for further data analysis.

Table 3 Baseline characteristics of included patients

Risk of bias

The quality of included studies varied in terms of sample size, allocation concealment and blinding as well as of other sources of bias (Fig. 2, Electronic supplementary material Figure 1 and Electronic supplementary material Table 1). Overall, the quality of reporting was low, with the majority of studies revealing few methodological details.

Fig. 2 Quality assessment using the Cochrane risk of bias tool

Less than a third reported adequate methods of allocation concealment. Blinding of participants was not stated in 26/35 trials; the remainder reported patient blinding. In only one study [37], the observer was also blinded for outcome assessment. The risk of detection bias was considered low for endpoints other than pain scores and cosmetic satisfaction because of their objectivity.

Surgical experience was described in the majority of studies (27/35, 77 %). In 21 studies, a limited number of experienced surgeons performed all procedures, so that bias from this source is likely to be low. Junior surgeons participated in six trials. In two of these [20, 50], randomisation was stratified by surgical experience, and the resulting intervention groups were balanced with respect to senior and junior surgeons. In the other four, risk of bias is high because the junior surgeons may have tended to perform the conventional procedure more often. Finally, no information on surgical experience was provided in eight trials. The influence of both funding source and surgical experience on the primary outcome was investigated in subgroup analyses as detailed below.

Primary outcome: operation time

Operation time was generally defined as the time from first skin incision to skin closure.

Network meta-analysis for primary outcome

Out of 35 trials, 34 trials were included in the network-analysis for the primary endpoint. Marchesi [24] had to be excluded because of a dichotomisation of operation time. Three trials had a three-arm parallel group design (CH vs. HS vs. LS, 303 patients), the remainder a two-arm parallel group design (21 trials: CH vs. HS, 2,371 patients; 7 trials: CH vs. LS, 730 patients; 3 trials: HS vs. LS, 471 patients) (Fig. 3).

Fig. 3 Evidence network for primary endpoint ‘operation time’. CH = conventional hemostasis, HS = harmonic scalpel, LS = LigaSure

The results of the multiple treatment comparison are presented in Table 4. Heterogeneity between trials was high in relation to the mean treatment effects (τ = 4.91 between-trial standard deviation in the consistency model, τ = 4.87 in the inconsistency model). In other words, most of the study effects (95 %) lie within twice the between-trial standard deviation, here around ±9.8 min, of each estimated treatment effect given in Table 4. The Bucher test for inconsistency revealed a significant difference between the direct comparison HS vs. LS and the indirect comparison HS vs. LS derived from the comparisons CH vs. HS and CH vs. LS (p = 0.013). In consequence, the results of a consistency model [59] as well as an inconsistency model [60] are given. In contrast to the consistency model, the inconsistency model statistically takes into account a possible discrepancy between the direct and the indirect estimated treatment effect of HS vs. LS.
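The ±9.8 min range can be made explicit with a short calculation. It assumes, as is usual for random-effects models but not stated in detail here, that the trial-specific effects θ_i are normally distributed around the mean treatment effect d with standard deviation τ. The worked line uses the consistency-model estimate for CH versus HS reported below (22.26 min).

```latex
\theta_i \sim \mathcal{N}(d, \tau^2)
\quad\Longrightarrow\quad
\Pr\bigl(d - 1.96\,\tau \le \theta_i \le d + 1.96\,\tau\bigr) \approx 0.95,
\qquad
2\tau = 2 \times 4.91 = 9.82 \approx 9.8\ \text{min}

% Illustration for CH versus HS in the consistency model:
d \pm 2\tau = 22.26 \pm 9.82
\;\Rightarrow\; [12.44,\ 32.08]\ \text{min}
```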

Table 4 Primary outcome: operation time [in minutes]

The mean treatment effects of CH versus HS and CH versus LS were concordant in both models, with a reduction in operation time of around 22 min under HS and of around 13 min under LS in comparison to CH (CH versus HS, 22.26 with 95 % credible interval 19.87 to 24.65 in the consistency model, 22.7 with 95 % credible interval 20.23 to 25.17 in the inconsistency model; CH versus LS, 13.84 with 95 % credible interval 10.27 to 17.39 in the consistency model, 12.18 with 95 % credible interval 8.029 to 16.32 in the inconsistency model).

The mean treatment effect of HS versus LS differed substantially between the two models. On average, a thyroidectomy under HS took around 8 min less than under LS according to the consistency model and only around 2 min less according to the inconsistency model (HS versus LS, −8.42 with 95 % credible interval −12.14 to −4.73 in the consistency model, −2.45 with 95 % credible interval −9.484 to 4.595 in the inconsistency model). The total residual deviance and the DIC as measures of model fit gave no hint of which model is more appropriate.

Pairwise meta-analysis for primary outcome

Comparison of CH with HS

All trials comparing CH with HS (2,573 patients) demonstrated a significant reduction of operation time with HS. The pooled estimate was 23.6 min (95 % CI, [19.5, 27.6]; P < 0.001; 24 studies, I² = 89 %; Fig. 4), in agreement with the results of the network meta-analysis. The mean difference between the treatment arms ranged from 6.7 to 47.3 min across studies.

Fig. 4 Meta-analysis for the primary outcome ‘operation time’; RCTs directly comparing conventional hemostasis (CH) to harmonic scalpel (HS) in thyroidectomies

In subgroup analyses for total thyroidectomies, the mean reduction of operation time was 24.9 min (95 % CI, [19.9, 29.9]; 20 trials, I² = 91 %), in trials with partial or total thyroidectomies 18.6 min (95 % CI, [9.8, 27.4]; 4 trials, I² = 67 %) and in partial thyroidectomies 17.4 min (95 % CI, [14.1, 20.7]; 3 trials, I² = 0 %). A sensitivity analysis excluding three-arm studies provided similar results.

A subgroup analysis by sponsor revealed that industry-sponsored trials had a significantly lower mean difference in operation time than investigator-initiated trials (16.8 vs. 41.9 min, P < 0.001 for subgroup differences, Fig. 5).

Fig. 5 Sensitivity analysis for the primary outcome ‘operation time’ by sponsor (CH vs. HS)

When the primary outcome was analysed in subgroups of low, high or unclear risk of imbalances in surgical experience, no differences were detected (P = 0.400 for subgroup differences, Fig. 6); the use of HS led to shorter operation times in all subgroups.

Fig. 6 Sensitivity analysis for the primary outcome ‘operation time’ by surgical experience (CH vs. HS)

Comparison of CH with LS

There were 882 patients in the intervention groups comparing CH with LS. The meta-analysis provided a significant reduction in operation time by 13.0 min (95 % CI, [6.3, 19.7]; P < 0.001; 10 trials, I² = 87 %; Fig. 7). The mean difference between the CH and LS groups ranged from −11.0 min to +32.4 min in individual studies.

Fig. 7 Meta-analysis for the primary outcome ‘operation time’; RCTs directly comparing conventional hemostasis (CH) to LigaSure (LS) in thyroidectomies

In the subgroup for total thyroidectomies, five included studies contributed to a computed mean difference of 16.4 min (95 % CI, [7.1, 25.6]; P < 0.001; 5 trials, I² = 91 %). Subgroup analysis for partial thyroidectomies was not possible as the subgroup was composed of only one study [52]. Our analysis of its data showed a significant reduction of operation time by 15.0 min (95 % CI, [8.3, 21.7]; Fig. 7). Interestingly, in the remaining studies, in which partial and total thyroidectomies were not differentiated, the calculated reduction in operation time was below 10 min and not statistically significant.

A sensitivity analysis excluding three-arm studies did not substantially affect the results. The sponsor was unclear in the majority of trials for this outcome (7/10), so that no subgroup analysis could be performed by sponsor. The saving of operation time by using LS was significantly lower in trials with a low risk of unbalanced surgical experience than in the others (low risk, 5.4 [0.2, 10.6], 6 trials, I² = 65 %; high risk, 15.0 [8.3, 21.7], 1 trial; unclear risk, 30.2 [24.0, 36.3], 3 trials, I² = 0 %; P < 0.001 for subgroup differences; Fig. 8). However, because only a single study with high risk of bias due to surgical experience was included, the validity of this result remains unclear.

Fig. 8 Meta-analysis for the primary outcome ‘operation time’ by surgical experience (CH vs. LS)

Comparison of HS with LS

The comparison of HS with LS comprised 673 patients in the relevant intervention groups.

The meta-analysis provided a reduction in operation time by 9.3 min when using HS (95 % CI, [−17.8, −0.8]; P = 0.032; 6 studies, I² = 91 %; Fig. 9).

Fig. 9 Meta-analysis for the primary outcome ‘operation time’; RCTs directly comparing harmonic scalpel (HS) to LigaSure (LS) in thyroidectomies

A sensitivity analysis was performed excluding the Dionigi trial [57], which was the only study employing the new LigaSure instrument LF1212. When this major source of clinical heterogeneity was removed, both the overall result and the “total thyroidectomy” subgroup results were statistically significant: for total thyroidectomies, the use of the LS was slower by 7.1 min than that of the HS (95 % CI, [−11.1, −3.0]; P < 0.001; 2 studies, I² = 0 %), and overall, the HS led to an operation time reduction of 12.3 min (95 % CI, [−20.4, −4.2]; P = 0.003; 5 studies, I² = 82 %). A sensitivity analysis excluding three-arm studies was not possible due to the low remaining number of trials, nor was a subgroup analysis for surgical experience because 5/6 trials were performed by experienced surgeons.

Several pooled results for the pairwise comparisons of the primary outcome have to be interpreted with caution because their statistical heterogeneity is very high (I² up to 91 %). Nevertheless, there is no change in the direction of the treatment effects for the comparison of CH versus HS (Fig. 4); for only two out of ten trials, there is a change in direction for the comparison CH versus LS (Fig. 7), and for only one out of six trials, there is a change in direction for the comparison HS versus LS (Fig. 9).

Publication bias

Electronic supplementary material Figures 2, 3 and 4 present the corresponding contour-enhanced funnel plots plotting the WMD against the precision of the study (standard error of the WMD) for pairwise meta-analysis. The shaded areas in the funnel plots indicate different levels of statistical significance (<0.01, <0.05, <0.1) as an aid to differentiating asymmetry due to publication bias from that due to other bias factors. Egger’s test revealed significant asymmetry only for the comparison of conventional techniques versus harmonic scalpel, stipulating a significance level of 10 % (CH vs. HS, P = 0.055; CH vs. LS, P = 0.308; HS vs. LS, number of trials too small to test for small study effects). No non-significant trials can be found in the area without shading in the contour-enhanced funnel plot, and therefore, it can be assumed that the asymmetry is caused by publication bias based on statistical significance. The trim-and-fill method yielded pooled weighted mean differences that did not contradict the results given above (CH vs. HS, MD 18.0 with 95 % CI 13.8 to 22.5; CH vs. LS, MD 9.1 with 95 % CI 2.3 to 15.8; HS vs. LS, no trim-and-fill method available).

Secondary endpoints

The pairwise meta-analytical results for all secondary endpoints are presented in Table 5 for the comparisons CH vs. HS, CH vs. LS and HS vs. LS.

Table 5 Meta-analysis of the secondary outcome measures for RCTs

Mortality

No deaths were reported in any included RCT.

Intraoperative blood loss

The mean intraoperative blood loss was generally measured by weighing the gauzes [21, 26, 27, 30, 32, 38], by measuring the fluids collected intra-operatively via suction (Pons 2009), or by a combination of both [53]. The method used was not defined in Voutilainen 2000.

Mean intraoperative blood loss ranged from 21 to 268 mL across studies. The pooled value was lower by 28.5 mL for HS compared with CH (P < 0.001), but the differences were not significant for LS compared with CH (P = 0.520) or for HS compared with LS (P = 0.448). In a sensitivity analysis comparing HS to CH and excluding three-arm studies (Pons 2009, Sartori 2008), the magnitude of the effect was slightly larger (34.9 mL; 95 % CI [17.4, 52.3]; P < 0.001; I² = 84 %).

Statistical heterogeneity was high for all three comparisons.

Weight of specimen

Wherever possible, the weight of the thyroid tissue surgically removed was recorded and analysed in order to assess any bias arising in the respective treatment groups. The weighted mean specimen weight amounted to 56 g across published reports. There were no differences between the intervention groups (P = 0.754 for CH vs. HS in 9 studies, 0.566 for CH vs. LS in 2 studies and no data available for HS vs. LS).

Length of hospital stay

The weighted average for the length of hospital stay was 2.7 days for conventional hemostasis procedures. It was lower by 0.28 days for the HS compared with the CH group and did not vary within the other two comparisons. There was a high degree of heterogeneity in this outcome. When three-arm studies were excluded, the result did not change.

Transitory and definitive recurrent laryngeal nerve palsy

Transitory and definitive recurrent laryngeal nerve palsy were common complications, with incidences throughout the CH groups of 4 in 100 patients (66/1,702) and 1 in 100 patients (8/552), respectively. No difference in risk was observed in the three intervention groups, and the results were highly homogeneous (I² = 0 % for each comparison).

Transient and persistent hypocalcaemia

A variety of definitions of postoperative hypocalcaemia are in use [61], and the lack of standardisation will introduce bias for this endpoint. Examples of definitions of postoperative hypocalcaemia in the publications include “numbness in the lips and hands, Chvostek or Trousseau sign, carpopedal spasm”, “ionized calcium <8.2 mg/dL [or 1.14 mmol/L], associated with a positive Chvostek sign or patient complaint of paresthesia”, or “need for calcium substitution”. In many papers, postoperative hypocalcaemia was not clearly defined. We pooled results only for trials reporting clinically symptomatic, rather than biochemical, hypocalcaemia.

Transient hypocalcaemia was very common, with close to two in ten patients affected (272/1,560, 17.4 %). However, only about a tenth of these led to persistent hypocalcaemia (14/552, 2.5 %). The meta-analytical data point to a marginally higher risk of transient hypocalcaemia with CH compared with HS (RD 0.04; 95 % CI [−0.00, 0.07]; P = 0.066; 24 studies, I² = 62 %). In a sensitivity analysis excluding three-arm studies, this result became statistically significant (RD 0.05; 95 % CI [0.01, 0.09]; P = 0.025; 21 studies, I² = 61 %).

Still, the combined data from five studies with serum calcium measurements detected no difference for CH compared with HS (WMD −0.004 mmol/L).

No differences in the rate of persistent hypocalcaemia were detectable.
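How a pooled risk difference such as those reported for hypocalcaemia is formed can be sketched as follows. This is a minimal illustration of the Mantel-Haenszel point estimate only: it omits the confidence interval, the function name is hypothetical, and the event counts are invented for illustration rather than taken from the included trials.

```python
def mantel_haenszel_rd(trials):
    """Mantel-Haenszel pooled risk difference (group 1 minus group 2).

    Each trial is a tuple (events_1, n_1, events_2, n_2). Trials with
    zero events in both arms contribute weight but no difference, which
    is why the risk difference remains usable with sparse data.
    """
    numerator = 0.0
    denominator = 0.0
    for events_1, n_1, events_2, n_2 in trials:
        total = n_1 + n_2
        numerator += (events_1 * n_2 - events_2 * n_1) / total
        denominator += n_1 * n_2 / total   # Mantel-Haenszel weight
    return numerator / denominator


# Hypothetical counts: (events CH, n CH, events HS, n HS) per trial.
example_trials = [
    (12, 60, 9, 60),
    (0, 50, 0, 50),     # double-zero trial still contributes weight
    (20, 100, 14, 100),
]
print(f"pooled RD = {mantel_haenszel_rd(example_trials):.3f}")
```

For a single trial, the estimate reduces to the simple difference of the two observed event proportions.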

Postoperative bleeding

The amount of postoperative bleeding as defined by the volume of fluids in the suction drainages after 24 h was significantly higher for CH compared with HS by 11.2 mL (endpoint reported in 15 studies; no major difference when excluding three-arm studies), whereas no differences were observed in the other comparisons. However, this did not translate to an increased need for re-operation due to bleeding for this comparison (RD 0.001; 95 % CI [−0.01, 0.01]; nine studies).

Hematoma/seroma

Hematoma and seroma were relatively common, with 4 in 100 patients in the CH groups affected (40/1,013 patients, 3.9 %) and balanced throughout intervention groups.

Re-operation

About 1 in 100 patients in the combined CH groups required re-operation (10/1,167, 0.9 %), in many cases due to post-operative bleeding. Again, the risk did not differ among treatment groups.

Wound infection

Wound infection was an uncommon complication (1/746 CH patients, 0.13 %), and no differences were observed for the three hemostasis methods.

Postoperative pain

Overall, six studies comparing CH versus HS assessed 24-h postoperative pain by a ten-point visual analogue scale. Because of the clinical heterogeneity of these studies, particularly the diversity of pain medication given, as well as the relative mildness of postoperative pain following this type of surgical intervention, the data for this endpoint were not pooled. For the other comparisons, data available were insufficient for pooling.

Cosmetic satisfaction

Cosmetic satisfaction was analysed only by Kilic et al. [29]; therefore, pooling was not possible. Kilic et al. explored patient-reported cosmetic satisfaction on a ten-item visual analogue scale and found a marginally higher cosmetic satisfaction when patients were operated on with the HS compared with CH (7.8 ± 1.1 vs. 7.2 ± 1.3, P = 0.04). However, no patient blinding was reported, and the results might be biased.

Three authors reported the incision length as a cosmetic outcome. The results were not pooled due to their low number. Frazzetta et al. found no difference between CH and HS (5.5 vs. 5.5 cm mean incision length) [27], nor did Lombardi (4.07 ± 0.77 vs. 3.96 ± 0.75 cm) [34]. Dionigi et al. [57] reported the incision length measured with a ruler on the day of discharge. This parameter was lower for the new LF1212 compared with HS (4.4 ± 1.1 vs. 5.4 ± 1.2 cm; P < 0.05) but may be subject to bias because the surgeons performing the incision were not blinded to the haemostatic technique employed [57].

Discussion

This meta-analysis has demonstrated that using energised vessel-sealing systems can significantly reduce operation time. Additionally, the use of HS was associated with several small-scale benefits, i.e. reduced intra- and postoperative blood loss, reduced rates of transient hypocalcaemia and postoperative pain as well as a reduced duration of hospital stay. While these improvements were marginal and not observed for the use of LS, the conventional technique was not superior in any outcome investigated. In particular, the clinically important safety outcomes of recurrent nerve palsy and rates of clinically symptomatic hypocalcaemia were not negatively affected by using any of the energised vessel sealing systems. Nevertheless, the detected differences between the devices could be due to the different spectrum of use the devices offer. While the LS offers more time-consuming multiple-sealing approaches at the same vessel, the HS divides the tissue at the time of coagulation. Harmonic and LigaSure devices have evolved over the years. In particular, the Harmonic has moved from CS 14 to Focus, while the LigaSure Precise now has a blade, which has led to considerable improvement in operating times. Moreover, information on other time-consuming procedures during thyroid surgery, such as neuromonitoring, is not fully reported in the analysed papers.

In contrast to our results, energised vessel sealing systems have been described as a potential source of heat-related iatrogenic injury to adjacent structures, especially to the recurrent laryngeal nerve (RLN) [62]. In our analysis, the percentage of recurrent nerve palsies was low (95 % CI 3.4–5.7 % for transient recurrent nerve palsies), which is probably partly due to the experience of the surgeons involved in the trials. Recurrent nerve palsies were homogeneously distributed in all three groups. The meta-analysis clearly demonstrates that these energised devices are not a source of RLN injury.

Given that complications in thyroid surgery are rare, the included trials are underpowered to state a clear benefit in terms of safety of these sealing devices. Thus, future trials within this context should be powered on the basis of existing systematic reviews. Since the clinical relevance of operating time as a primary outcome can be questioned from a patient-centred point of view, upcoming trials should focus on patient-reported voice quality, in addition to preservation of the nerves and the parathyroid vasculature, rather than on operating time.
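To illustrate why rare safety outcomes require large trials, the following minimal sketch applies the standard normal-approximation sample size formula for comparing two proportions. The input values are assumptions chosen for illustration only: a control-group rate of 5 % (within the range of transient palsy rates reported above) and a hypothetical halving of that rate with the new device. Neither value is an effect estimate from this review, and the function name is ours.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p_control vs. p_treatment.

    Normal approximation for two independent proportions, two-sided test,
    equal allocation, no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Illustrative assumptions only (not results of this review):
# 5 % event rate with the conventional technique, 2.5 % with the device.
n = n_per_group(0.05, 0.025)
print(f"Patients required per arm: {n}")
```

Because the required sample size grows with the inverse square of the absolute difference, smaller assumed differences in complication rates increase the required number of patients steeply.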

Apart from that, reduction of operation time is surely a significant benefit for surgical practice. However, the saving of operation time has to be set in relation to higher material costs. Since personnel and material costs differ from country to country, every single institution has to evaluate the potential benefit of employing these devices.

Besides the included RCTs, our literature search revealed six systematic reviews with meta-analysis investigating haemostatic techniques in thyroid surgery [7–12]. In addition, one Cochrane Review protocol (Tam 2010) was found, aiming to investigate haemostasis using the Ligasure system or harmonic scalpel versus conventional vessel ligation. The protocol was registered on 25 February 2010, but no results have been published yet. However, the literature search in all of these reviews was not comprehensive, and with one exception [10], all investigated only pairwise comparisons. Yao included four randomised and five non-randomised trials; Zhang included one trial in the analysis that was clearly not randomised [63]. A recently published network meta-analysis [13] included all trials published until June 2012. In the analysis performed by Garas, a publication on minimally invasive thyroidectomies [64] was included, and although the inclusion criteria excluded devices other than the Ligasure, a publication using the Starion© vessel sealing system was included [65]. Moreover, the primary endpoints of the meta-analysis are the rate of persistent hypoparathyroidism and the rate of persistent recurrent nerve palsies. However, the authors do not stratify the data according to the extent of surgery (partial versus complete thyroidectomy). Thus, a major source of bias is introduced, since after a hemithyroidectomy the possibility of persistent hypocalcaemia is very low. Furthermore, other relevant surgical outcomes are not reported (i.e. transient hypocalcaemia and transient recurrent nerve palsy, weight of the specimen, wound-infection rates, reoperation rates, cosmetic result and postoperative pain).

Owing to the incomplete coverage of the relevant literature and the lack of a well-founded interpretation of the results, the review’s results should be read with caution. However, despite incomplete evidence, the conclusions of most of the reviews are broadly in accordance with our results regarding a significant reduction of operating time in favour of the energised devices over conventional haemostatic techniques for partial and total thyroidectomies.

It was not possible to assess in any trial whether surgical experience was equally distributed among groups. A sensitivity analysis for surgical experience, however, did not show any significant difference. Voutilainen et al. provided an estimate of the amount of bias that could be caused by differences in surgical experience among the intervention groups [20]. They observed that the gain of the HS could amount to 1.66 times the actual observed gain in the trial if the consultant endocrine surgeon exclusively uses the new method and the senior residents employ the conventional technique alone.

Owing to the wide range of included trial participants, covering patients of both sexes, with typical age ranges, benign and malignant pathologies, partial and total thyroidectomies and geographic locations spread over four continents, the external validity of these findings is expected to be high. Moreover, the included sample size should be adequate to allow generalisability of these results.

Conclusion

This study provides a quantitative three-way comparison of CH with HS and LS in thyroid surgery. The results showed a significant reduction of operation time with HS and LS compared with CH and a marginal benefit of HS for several safety outcomes. Postoperative morbidity is not affected by employing energised devices.

Moreover, the results of our review may be useful for high-volume centres performing many thyroidectomies a day, whereas a time saving of 23 min is hardly relevant for institutions with a low operation volume.