Since the first clinical description of hybrid transvaginal cholecystectomy in 2007 [1], the concept of natural orifice translumenal endoscopic surgery (NOTES) has gained widespread publicity. Theoretical advantages of NOTES include decreased post-operative pain and morbidity, such as wound infection and incisional abdominal wall hernias, as well as improved cosmesis. The idea of performing scarless surgery has appealed to many clinicians as the ultimate step in the evolution of minimally invasive abdominal surgery; however, these techniques have not yet reached widespread clinical adoption. This is partly related to concerns regarding disinfection [2] and reliable closure of any potential enterotomy for transgastric or transcolonic NOTES [3], and partly due to the fact that the technological evolution of surgical tools and platforms has yet to catch up with our ideological innovation. Despite this, even complex abdominal interventions such as NOTES pancreatectomy have been shown to be feasible in the animal model [4], and pure NOTES procedures in humans (with no trans-abdominal assistance), such as appendectomy [5] and ventral hernia repair [6], have been reported from specialist centers.

Due to the established safety of colpotomy as an access route, transvaginal hybrid cholecystectomy is currently the most commonly performed clinical application. Publications of large patient series and case registries [7–9] have suggested transvaginal hybrid NOTES cholecystectomy (TVC) to be a safe procedure when performed by appropriately specialized minimally invasive surgeons in selected centers. There are now a number of randomized controlled trials and comparative studies comparing conventional laparoscopic cholecystectomy (CLC) with TVC. The aim of this systematic review and meta-analysis is to compare the safety and clinical outcomes of CLC with those of TVC for the treatment of benign gallstone disease.

Methods

An electronic search was performed using the Embase, MEDLINE, Web of Science and Cochrane Library (Issue 1, 2014) databases from 2000 to 2014. The search terms ‘laparoscopy’, ‘cholecystectomy’, ‘transvaginal’, ‘natural orifice translumenal endoscopic surgery’ and ‘NOTES’, and the medical subject headings (MeSH) terms ‘Laparoscopy’ (MeSH), ‘cholecystectomy’ (MeSH) and ‘natural orifice endoscopic surgery’ (MeSH) were used in combination with the Boolean operators AND or OR. Two authors performed electronic searches independently in January 2014. The reference lists of articles obtained were also searched to identify further relevant citations. Finally, the search included the Current Controlled Trials Register (http://www.controlled-trials.com) and the Cochrane Database of Controlled Trials. Abstracts of the citations identified by the search were then scrutinized by two of the authors to determine eligibility for inclusion in the pooled analysis.

Publications were included if they were randomized controlled trials or comparative studies in which patients underwent either transvaginal or conventional multiport laparoscopic cholecystectomy and reported at least one of the outcome measures identified below. Studies were excluded if they were non-comparative or compared transvaginal cholecystectomy (TVC) with single-incision laparoscopic cholecystectomy without reference to conventional multiport cholecystectomy (CLC). Standard three- or four-port laparoscopic cholecystectomy was used as the comparator arm in the majority of studies, with ports most commonly placed in the epigastrium, right upper quadrant, right middle quadrant and infra-umbilical region. Transvaginal cholecystectomy was most commonly performed as a hybrid procedure, with an infra-umbilical camera port placed for safety to ensure no intra-peritoneal injury during insertion of the transvaginal port and for deployment of the laparoscopic clip applicator.

Primary outcome measures were total postoperative complications, bile duct injury, and Clavien–Dindo grade II and III complications. Secondary outcome measures were operative time (min), length of hospital stay (days), pain score on postoperative days 1 and 3, and time to return to normal activity (days).

Statistical analysis

Data from eligible trials were entered into a computerized spreadsheet for analysis. Statistical analysis was performed using RevMan 5.2 (Review Manager version 5.2). Pooled odds ratios (POR) were calculated for the effect of transvaginal cholecystectomy (TVC) on discrete variables: total complications, incidence of bile duct injury, and Clavien–Dindo grade II and III complications. Weighted mean differences (WMD) were calculated for the effect of TVC on continuous variables: operative time, length of hospital stay, postoperative pain score on days 1 and 3, and time to return to normal activity. All pooled outcome measures were determined using a random-effects model as described by DerSimonian and Laird [10]. Heterogeneity among trials was assessed by means of the I² statistic. Statistical significance was assigned when the P value was <0.05.
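The pooling described above can be illustrated with a minimal Python sketch of inverse-variance DerSimonian and Laird random-effects pooling of log odds ratios, together with the I² statistic. This is not the RevMan implementation: the function names, the 0.5 continuity correction for zero cells and the example counts are assumptions made purely for illustration, and RevMan's weighting and zero-cell handling may differ.

```python
import math


def log_odds_ratio(events_t, total_t, events_c, total_c):
    """Return (log OR, variance) for one study's 2x2 table.

    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    a, b = events_t, total_t - events_t
    c, d = events_c, total_c - events_c
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau^2, I^2 in percent)."""
    w = [1 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical counts (events, total) for TVC and CLC; NOT data from this review.
studies = [(2, 50, 4, 50), (1, 40, 2, 45), (3, 60, 3, 55)]
effects, variances = zip(*(log_odds_ratio(*s) for s in studies))
pooled, se, tau2, i2 = dersimonian_laird(effects, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"POR = {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f}), I2 = {i2:.0f}%")
```

The same `dersimonian_laird` function applies to weighted mean differences if each study's mean difference and its variance are passed in place of the log odds ratios.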

Results

The literature search identified 14 publications [11–24] that met the inclusion criteria for this pooled analysis. Figure 1 shows the preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram for the literature search. In total, 1,145 cholecystectomy operations were included, 530 by TVC and 615 by CLC. Table 1 describes the method of TVC employed in each included study. Patient demographic data and indications for cholecystectomy are described in Table 2 where available. Tables 3 and 4 outline the main outcomes from each study.

Fig. 1

PRISMA flowchart: systematic search and selection strategy

Table 1 Surgical technique employed for transvaginal cholecystectomy and study design
Table 2 Patient demographics and indication for cholecystectomy from each study
Table 3 Primary outcome measures from each study included
Table 4 Secondary outcome measures from each study included

Primary outcome measures

Total postoperative complications (Fig. 2)

Thirteen studies reported the incidence of postoperative complications [11–22, 24]. In total, 5.6 % of patients in the TVC group developed a postoperative complication compared with 7.5 % in the CLC group. Pooled analysis showed no significant difference between the groups in the incidence of postoperative complications (POR = 0.68; 95 % CI 0.40–1.14; P = 0.14). There was no evidence of statistical heterogeneity (I² = 0 %).

Fig. 2

Forest plot showing no significant difference between the groups in the incidence of postoperative complications

Bile duct injury (Fig. 3)

Thirteen studies reported the incidence of bile duct injury [11–22, 24]. The incidence of bile duct injury was 0.6 % in the TVC group and 0.4 % in the CLC group. Pooled analysis showed no significant difference between the groups in the incidence of bile duct injury (POR = 1.33; 95 % CI 0.31–5.66; P = 0.70). There was no evidence of statistical heterogeneity (I² = 0 %).

Fig. 3

Forest plot showing no significant difference between the groups in the incidence of bile duct injury

Clavien–Dindo grade II complications (Fig. 4)

Thirteen studies classified complications by the Clavien–Dindo grading system [11–22, 24]. The incidence of Clavien–Dindo grade II complications was 0.4 % in the TVC group and 1.4 % in the CLC group. Pooled analysis showed no significant difference between the groups in the incidence of Clavien–Dindo grade II complications (POR = 0.48; 95 % CI 0.14–1.60; P = 0.23). There was no evidence of statistical heterogeneity (I² = 0 %).

Fig. 4

Forest plot showing no significant difference between the groups in the incidence of Clavien–Dindo grade II complications

Clavien–Dindo grade III complications (Fig. 5)

Thirteen studies classified complications by the Clavien–Dindo grading system [11–22, 24]. The incidence of Clavien–Dindo grade III complications was 1.5 % in the TVC group and 2.4 % in the CLC group. Pooled analysis showed no significant difference between the groups in the incidence of Clavien–Dindo grade III complications (POR = 0.63; 95 % CI 0.24–1.65; P = 0.34). There was no evidence of statistical heterogeneity (I² = 0 %).

Fig. 5

Forest plot showing no significant difference between the groups in the incidence of Clavien–Dindo grade III complications

Secondary outcome measures

Operative time (Fig. 6)

All 14 studies [11–24] reported the operative time for each procedure; however, three studies [15, 18, 24] did not report standard deviations and were therefore excluded from the pooled analysis. Pooled analysis showed that the operative time was significantly increased in the TVC group compared with the CLC group (WMD = 14.81 min; 95 % CI 8.58–21.04; P < 0.00001). There was evidence of significant statistical heterogeneity (I² = 100 %).

Fig. 6

Forest plot showing that the operative time was significantly increased in the TVC group compared with the CLC group (WMD = 14.81 min; 95 % CI 8.58–21.04; P < 0.00001)

Length of hospital stay (Fig. 7)

Six studies reported the average length of hospital stay with standard deviations and were included in the pooled analysis [11, 14, 15, 17, 19, 22]. Pooled analysis demonstrated no significant difference between the groups in length of hospital stay (WMD = −0.14 days; 95 % CI −0.45 to 0.18; P = 0.40). There was evidence of significant statistical heterogeneity (I² = 69 %).

Fig. 7

Forest plot demonstrating no significant difference between the groups in the length of hospital stay

Time to return to normal activities (Fig. 8)

Four studies reported the average time taken to return to normal activity [13, 14, 20, 21]. Pooled analysis demonstrated that the time to return to normal activities was significantly reduced in the TVC group (WMD = −4.86 days; 95 % CI −9.33 to −0.39; P = 0.03). There was evidence of significant statistical heterogeneity (I² = 98 %).

Fig. 8

Forest plot demonstrating that the time to return to normal activities was significantly reduced in the TVC group (WMD = −4.86 days; 95 % CI −9.33 to −0.39; P = 0.03)

Postoperative pain score on day 1 (Fig. 9)

Five studies reported the postoperative pain score on day 1 [11, 12, 16, 20, 21]. Pooled analysis demonstrated a non-significant reduction in postoperative pain on day 1 in the TVC group (WMD = −0.80; 95 % CI −1.60 to 0.01; P = 0.05). There was evidence of significant statistical heterogeneity (I² = 90 %).

Fig. 9

Forest plot demonstrating a non-significant reduction in postoperative pain on day 1 in the TVC group (WMD = −0.80; 95 % CI −1.60 to 0.01; P = 0.05)

Postoperative pain score on day 3 (Fig. 10)

Four studies reported the postoperative pain score on day 3 [11, 12, 16, 21]. Pooled analysis demonstrated a non-significant reduction in postoperative pain on day 3 in the TVC group (WMD = −0.89; 95 % CI −1.77 to −0.01; P = 0.05). There was evidence of significant statistical heterogeneity (I² = 93 %).

Fig. 10

Forest plot demonstrating a non-significant reduction in postoperative pain on day 3 in the TVC group (WMD = −0.89; 95 % CI −1.77 to −0.01; P = 0.05)
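Because the pain-score results sit at the P = 0.05 threshold, it can help to see how a two-sided P value relates to a reported estimate and its 95 % confidence interval. The sketch below back-calculates an approximate standard error, z statistic and P value under a normal approximation; the function name is hypothetical, and because the published figures are rounded to two decimal places, the output is only a rough consistency check and not a re-analysis of the data.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate (SE, z, two-sided P) from an estimate and its 95 % CI.

    Assumes a symmetric, normal-theory interval on the reported scale.
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


# Rounded values as reported for the pain scores on days 1 and 3.
for label, wmd, lo, hi in [("day 1", -0.80, -1.60, 0.01),
                           ("day 3", -0.89, -1.77, -0.01)]:
    se, z, p = p_from_ci(wmd, lo, hi)
    print(f"{label}: SE ~ {se:.3f}, z ~ {z:.2f}, P ~ {p:.3f}")
```

With inputs rounded this coarsely, a back-calculated P value just above or just below 0.05 is compatible with a reported P = 0.05, which is one reason borderline results such as these are best read alongside their confidence intervals.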

Discussion

Whilst it is clear that complex NOTES operations remain too technically challenging using currently available operating platforms, hybrid procedures have gained popularity with the aim of capitalizing on some of the benefits of decreased invasiveness. Although large case series have demonstrated TVC to be relatively safe, there has been a lack of high-powered comparative studies. With the introduction of novel surgical techniques, safety has to remain of paramount importance and any procedures have to be performed under ethically approved trial protocols by adequately trained surgeons. Importantly, therefore, this meta-analysis demonstrated no significant differences in post-operative complications (including Clavien–Dindo grade II and III complications) or in the rate of bile duct injury between CLC and TVC. It is important to recognize that the incidence of bile duct injury is low in CLC and that very large numbers of patients would be required to demonstrate a statistical difference; nevertheless, it is reassuring to see no significant difference between the groups in this meta-analysis. As can be seen from Table 1, there were only six reported conversions from TVC to CLC, and five conversions from CLC to an open procedure. Naturally, if there is doubt regarding safety during TVC, the procedure should at least be converted to CLC; however, the low level of conversion in the TVC group suggests that the technique is feasible and not associated with too high a level of technical difficulty in selected patient cohorts.

Despite evidence of significant statistical heterogeneity, the operative time was shown to be significantly greater in the TVC group than in the CLC group. This is likely to be related to the increased number of surgical steps associated with transvaginal cholecystectomy; however, there will also be a learning curve for surgeons performing TVC because, whilst the operative steps for cholecystectomy remain the same, the instrument ergonomics differ. As with the introduction of any novel technique, the operative time is likely to decrease as surgical workflow is streamlined. Furthermore, there was a decreased time to return to normal activities in the TVC group, although again there was significant statistical heterogeneity amongst the studies, and there was a non-significant reduction in post-operative pain on days 1 and 3. These results clearly need to be interpreted with caution; however, pain is widely known to correlate with return to normal activities, and it is also the authors’ observation from a local case series that the vaginal incision is associated with very limited post-operative discomfort.

Although pooled analysis showed no difference in length of stay, it is important to consider that local clinical governance and reimbursement arrangements may be a significant contributing factor to this, which is exemplified by the lack of ambulatory cholecystectomy procedures in Germany, largely driven by financial incentives. This may therefore not be an optimal clinical end-point particularly when pooling results from several countries.

Due to the novelty of the TVC technique and the lack of operative standardization, there is some heterogeneity between the studies in relation to operative techniques and trans-abdominal assistance, as demonstrated in Table 1; however, the majority of studies used a 5-mm umbilical incision for initial laparoscopic visualization and deployment of a laparoscopic clip applicator. This step is important as flexible endoscopic clips have yet to prove reliable for closure of the cystic duct.

Colpotomy for peritoneal access has been shown to be safe in large case series in the gynaecology literature, with no significant sequelae for sexual function [25]. Several studies in this meta-analysis reported no dyspareunia or difference in return to sexual activity between the TVC and CLC groups [11, 14, 18, 22, 24]. This was objectively evaluated through item 26 of the GIQOL questionnaire [11] as well as the German version of the Female Sexual Function Index (FSFI-d) [14], with no differences shown between the groups. In one study [20], evaluation using a sexual function questionnaire showed worse sexual function at 3 months post-operatively in the TVC group in 2 of 7 domains; however, this was thought to be related to the fact that 86 % of patients in the TVC group were not sexually active compared with 0 % in the CLC group.

Quality of life was objectively evaluated in three studies [11, 13, 20]. Brochert et al. [11] showed no difference in SF36 and GIQOL scores between the groups and Santos et al. [20] showed no difference in SF36 scores; however, Bulian et al. [13] showed a significantly better GIQOL score in the TVC group. Because the long-term morbidity of the intervention is low, long-term quality of life is unlikely to differ significantly between the groups; however, trials comparing short-term quality of life perceptions are currently lacking.

One clear advantage of TVC is cosmetic appearance. This was reported in three studies [13, 14, 22], all showing significantly better perception of cosmesis in the TVC group, and van den Boezem et al. [22] reported significantly higher scores for both the body image and cosmetic subscales of a validated body image questionnaire.

There are some limitations to this meta-analysis. There is heterogeneity in the design, and therefore the quality, of the comparative studies that were included, as indicated in Table 1. Although there was one prospective, double-blind randomized controlled trial [12] and two prospective non-blinded randomized trials [13, 18], the other studies were prospective cohort or case-control studies, which inherently introduce the possibility of selection bias and provide a lower level of evidence. Furthermore, these trials were conducted in centers of excellence in minimally invasive surgery by very skilled surgeons who often have many years of experience in laparoscopic surgery and have undertaken significant simulator, bench-top and animal training prior to embarking on the first human studies, which is likely to affect the applicability of the results. Importantly, the majority of studies included in this pooled analysis excluded patients with cholecystitis and focused on patients with benign non-inflamed gallstone disease. Therefore, the applicability of a transvaginal technique to cholecystectomy in more advanced gallstone disease with inflammation remains to be determined and is an important area for future assessment. Additionally, quantification of the effect of medical co-morbidities upon the outcomes of this meta-analysis would ideally be presented as a meta-regression; however, it was not possible to provide this type of analysis due to heterogeneity in the description and quantification of medical co-morbidities within the studies included.

Despite these limitations, these data suggest that TVC is safe in selected patient groups when performed by appropriately skilled surgeons, with a morbidity and complication profile similar to that of CLC. Furthermore, TVC may be associated with a decrease in post-operative pain and a faster return to normal activities. Further standardization of the operative technique will aid in the training and adoption of TVC and allow for more meaningful comparisons. Due to the paucity of high-quality data, there is a need for larger multi-center randomized controlled trials to elicit the true potential advantages of this technique.