Introduction

New surgical technology has enabled a shift towards minimally invasive surgery such as laparoscopy and, more recently, robotics. These techniques have benefits for patient outcomes [1, 2], with robotic surgery (RS) demonstrating lower rates of intraoperative and post-operative complications than laparoscopic surgery (LS) or open surgery (OS) [3]. RS has also built a reputation for improved precision and physical comfort for surgeons [4], which has led to its increased use across surgical disciplines. However, some studies have not shown a clear advantage of RS over LS for perioperative [3] or post-operative outcomes [5]. Furthermore, adopting robotic technology is associated with high hospital costs [3].

Surgeon comfort is frequently cited to justify the use of RS over LS and OS. Most of the literature focuses on patient outcomes, but an increasing number of studies are examining the cognitive and ergonomic challenges faced by surgeons using different surgical modalities. Studies have shown that LS is limited by decreased range of movement, reduced dexterity and two-dimensional views [6], whereas for RS, the three-dimensional optics and the comfort of being seated [7] have been shown to be associated with reduced muscular workload in the shoulder and neck regions, as well as reduced perceived exertion [8]. A recent meta-analysis [9] comparing muscle activation between LS and RS suggests that RS is ergonomically superior, with lower muscle activation.

Conceptually, society benefits from surgical methods that provide better patient outcomes while reducing the physical and mental workload for surgeons. While previous reviews and meta-analyses [9,10,11] have examined the physical ergonomics of RS, this systematic review aims to provide a more comprehensive understanding of the comparative literature on the physical and mental impact on surgeons of RS compared with LS or OS.

Methods

Search strategy and data source

This systematic review was conducted in accordance with the PRISMA-P guidelines [12]. A literature search was conducted using Medline, PubMed, the Cochrane database, Embase and PsycINFO. Medical Subject Headings (MeSH) terms and text words from the MEDLINE search strategy were adapted to the indexing of the other databases to capture the physical or mental demands that RS, LS and/or OS place on surgeons, in order to identify peer-reviewed articles (Supplementary Tables 1 and 2). In addition, the reference lists of each article were searched manually.

Study eligibility criteria

The inclusion criteria for this systematic review were: (1) original studies with a comparative design between RS and LS or OS reporting physical or mental outcomes, (2) published in English between database inception and December 2019, and (3) use of the da Vinci robotic surgical system. Non-comparative studies were excluded.

Selection process and data extraction

Two authors (LSP, FYP) independently screened the titles and abstracts of all search results and classified relevant articles according to the eligibility criteria. The studies selected by each reviewer were then compared, and any discrepancies were resolved by a third reviewer (JH). Full-text review and data extraction for the final 30 studies were divided between the two reviewers (LSP, FYP), and a summary of the data was recorded in a shared database. Any queries or issues were discussed between the two reviewers, with advice from JH as necessary.

Methodological quality and reporting of results

The quality of all included articles was assessed independently by the two authors using a modified version of the Newcastle–Ottawa risk of bias tool developed by Herzog et al. [13] (Supplementary Fig. 1). The modified tool evaluates each study in three categories: (1) selection (maximum score of 5), (2) comparability (maximum score of 2) and (3) outcome (maximum score of 3). The three category scores were summed to give a maximum total score of 10. The results of all included studies were synthesized according to their physical or mental impact on surgeons.
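The scoring rule above is simple enough to state as code. The sketch below is illustrative only: the `NosRating` class and its field names are hypothetical, and it merely checks each category against the maxima described in this section (5, 2 and 3) before summing to a total out of 10. It does not encode the individual items of the Herzog et al. tool.

```python
from dataclasses import dataclass

# Maximum points per category in the modified Newcastle-Ottawa tool described above.
MAX_SCORES = {"selection": 5, "comparability": 2, "outcome": 3}
MAX_TOTAL = sum(MAX_SCORES.values())  # 10


@dataclass
class NosRating:
    """Category scores awarded to one study (hypothetical structure)."""
    selection: int
    comparability: int
    outcome: int

    def total(self) -> int:
        # Reject any category score outside its permitted range before summing.
        for category, maximum in MAX_SCORES.items():
            value = getattr(self, category)
            if not 0 <= value <= maximum:
                raise ValueError(f"{category} score {value} is outside 0-{maximum}")
        return self.selection + self.comparability + self.outcome


# Illustrative scores only, not taken from any included study.
example = NosRating(selection=4, comparability=2, outcome=2)
print(f"{example.total()}/{MAX_TOTAL}")  # 8/10
```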

Results

Search results

A total of 6,563 articles were identified (Fig. 1), and seven additional articles were identified by checking the references of relevant articles. After duplicate papers were removed, the remaining 5,179 abstracts were screened, yielding 71 articles. These 71 full-text articles were assessed against the predetermined inclusion and exclusion criteria, and 30 studies were included in the qualitative synthesis. The remaining 41 studies were excluded because they used a different robotic system (n = 8), reported no outcome measure of interest (n = 12), or were not comparative (n = 21).
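As a consistency check on the flow counts reported above, the full-text stage can be reconciled in a few lines. This minimal sketch uses only the figures stated in this section; the variable names are hypothetical, and the identification and de-duplication stages are not checked because the number of duplicates is not reported here.

```python
# Full-text stage counts as reported in this section.
full_text_assessed = 71
included = 30
excluded_by_reason = {
    "different robotic system": 8,
    "no outcome measure of interest": 12,
    "non-comparative study": 21,
}

excluded = sum(excluded_by_reason.values())
# Every full-text article should be either included or excluded for a stated reason.
if included + excluded != full_text_assessed:
    raise ValueError("full-text counts do not reconcile")
print(f"{included} included + {excluded} excluded = {full_text_assessed} assessed")
```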

Fig. 1

PRISMA flowchart of literature search and selection of included studies

Study characteristics

The characteristics of the included studies are summarized in Tables 1 and 2. The number of participants in the 30 included studies ranged from 1 to 117. Twenty-five studies compared RS with LS, one compared RS with OS, and four compared all three techniques. The included studies covered a wide range of surgical settings (Tables 1 and 2), including simulation tasks and general, gynaecological, urological and thyroid surgery. Simulated surgical tasks were the most common setting (N = 14).

Table 1 Summary of characteristics and findings of studies examining physical demand
Table 2 Summary of characteristics and findings of studies examining mental demand

Physical and mental load assessment tools

Physical and mental impacts were examined in 24 (Table 1) and 19 studies (Table 2), respectively. Of these, 13 studies examined both the physical and mental impact of RS on surgeons. Examples of the tools and the number of studies that used each of these tools to compare the physical and mental impact of RS, LS, and/or OS on surgeons are outlined in Supplementary Tables 3 and 4.
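Because 13 studies appear in both tables, the counts of 24 and 19 overlap. A short inclusion–exclusion calculation (a sketch; the function name is hypothetical) confirms that they account for exactly the 30 included studies, 11 of which assessed physical impact only and 6 mental impact only.

```python
def distinct_studies(physical: int, mental: int, both: int) -> int:
    """Inclusion-exclusion: |P or M| = |P| + |M| - |P and M|."""
    if not 0 <= both <= min(physical, mental):
        raise ValueError("overlap cannot exceed either group")
    return physical + mental - both


physical, mental, both = 24, 19, 13  # counts reported for Tables 1 and 2
total = distinct_studies(physical, mental, both)
physical_only = physical - both
mental_only = mental - both
print(total, physical_only, mental_only)  # 30 11 6
```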

Various types of measures and tools were used to assess the physical (Supplementary Table 3) and mental impact (Supplementary Table 4). Twelve studies measured physical impact using quantitative tools, while 15 studies used subjective tools such as self-reported questionnaires or visual analogue scales. In contrast, mental strain was mostly measured using subjective questionnaires (N = 18), while only five studies used quantitative measures such as cortisol levels and cardiovascular responses to stress. The most commonly used tool was surface electromyography (EMG), an objective measure of physical stress, which was used by nine studies. The next most common tool, used by eight studies, was the NASA Task Load Index (NASA-TLX), a multidimensional subjective visual analogue rating scale that assesses both mental and physical workload.

Risk of bias

The modified Newcastle–Ottawa scores of the 30 studies ranged from 5 to 9, with a mean score of 7.5, suggesting a moderate risk of bias (Table 3). Because the tool is a modified version of the Newcastle–Ottawa Scale for cross-sectional studies [13], there were no predetermined threshold scores defining a “good” quality study. None of the 30 studies justified their sample size. Five studies did not select a sample representative of the target population, and 15 studies lost points because ascertainment of exposure was based on self-report. All studies used appropriate and clearly described statistical tests to analyse the data.

Table 3 Risk of Bias scored by the modified Newcastle–Ottawa Tool

Study findings

A summary of the study findings is shown separately for physical and mental demand in Tables 1 and 2, respectively.

Physical demand. A total of 19 studies compared RS with LS, one study compared RS with OS, and four studies compared all three surgical modalities for physical impact on surgeons. Among these studies, eight favoured RS [4, 14,15,16,17,18,19,20], one showed a trend towards favouring RS [21], and one favoured RS over LS but showed no difference from OS [22]. The largest group of studies (N = 10) showed mixed results [7, 23,24,25,26,27,28,29,30,31], while only three showed no difference between the surgical modalities [32,33,34] and one had inconclusive results [35].
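The finding categories above should partition the 24 studies that assessed physical demand. The sketch below, which uses the reference numbers exactly as cited in this paragraph, checks that no study is counted under two findings and that the categories sum to 24. The dictionary labels are shorthand introduced for illustration.

```python
# Reference numbers grouped by physical-demand finding, as cited in the paragraph above.
findings = {
    "favoured RS": [4, 14, 15, 16, 17, 18, 19, 20],
    "trend towards RS": [21],
    "RS better than LS, no difference from OS": [22],
    "mixed": [7, 23, 24, 25, 26, 27, 28, 29, 30, 31],
    "no difference": [32, 33, 34],
    "inconclusive": [35],
}

all_refs = [ref for refs in findings.values() for ref in refs]
# The categories should be mutually exclusive: no study cited under two findings.
if len(all_refs) != len(set(all_refs)):
    raise ValueError("a study appears in more than one category")
print(len(all_refs))  # 24, the number of studies assessing physical demand
```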

EMG was the most common tool used to measure physical demand, and the studies using it either favoured RS over LS [15, 16, 19] or produced mixed results [7, 23, 25, 26, 28], as physical demand depended heavily on which muscle was measured. Compared with RS, LS was associated with less activation of the trapezius muscle [7, 23] but higher activation of the arm muscles [7, 23, 25]. In contrast, one study [28] showed higher activation of the trapezius, anterior deltoid and flexor carpi radialis in RS, with no significant differences between the two techniques in the activation of the other muscle groups measured.

Another common tool was the NASA-TLX, a subjective measure of workload, which showed either that laparoscopic surgery was more physically demanding than robotic surgery [4, 7, 15, 18, 27] or that there were no significant differences between the two [26, 30]. Interestingly, two studies [26, 27] compared the physical demand of robotic and laparoscopic surgery between novice and expert surgeons. Mendes et al. [27] found that both novice and expert surgeons reported less physical demand with robotic surgery on the NASA-TLX, whereas Zarate Rodrigues et al. [26] found that novices rated robotic surgery as less physically demanding than laparoscopic surgery, while experts reported no significant difference between the two techniques.

One study [34] comparing RS with OS found no significant difference between the two techniques in physical activity levels measured by accelerometers. The four studies comparing all three types of surgery [20,21,22, 31] showed the least physical discomfort, or a trend towards the least physical discomfort, with RS, although one study recorded the greatest lower back pain with RS compared with OS or LS [31].

Mental demand. A total of 17 studies compared RS with LS and two studies compared all three surgical modalities for mental impact on surgeons. Of these, seven studies [20, 27, 33, 35,36,37,38] showed mixed results, six [4, 14, 17, 39,40,41] favoured RS for mental outcomes, and five [7, 15, 18, 28, 30] showed no difference. One study [31] comparing all three modalities showed the greatest mental demand in OS but did not report any statistical differences between RS and LS.

Mental demand was mostly assessed with subjective measures such as self-report questionnaires, most commonly the NASA-TLX [7, 15, 18, 20, 27, 30, 31]. These studies showed no differences in mental demand between the surgical techniques, except in young surgeons [27] and in one study that reported less mental demand with RS than with LS [20]. Studies using physiological measures of mental stress [14, 33, 38] all favoured robotic surgery, with the exception of mean arterial pressure and cortisol levels in one study [38].

Discussion

This systematic review aimed to synthesize the current literature on the physical and mental demand of robotic surgery on surgeons compared with laparoscopic and/or open surgery. Although systematic reviews examining the impact of robotic surgery on patient outcomes have been published [15], the benefits for surgeons have yet to be critically assessed. One recent systematic review [10] appraised musculoskeletal pain in surgeons performing robotic surgery, and a more recent meta-analysis [9] compared muscle activation between robotic and laparoscopic surgery using EMG; however, no review to date has evaluated both the physical and mental impact of robotic surgery on surgeons.

Mixed results were the most common finding among the included studies. Although many studies showed a general trend towards favouring robotic surgery, this review reveals high heterogeneity in study size and methodology, surgical specialties, procedures and techniques, as well as in the measures used to evaluate physical and mental demand. The results of this systematic review therefore need to be interpreted with caution: the inconsistency of the findings and the variability of outcome measures make interpretation and generalization challenging.

One of the major contributors to this heterogeneity was the range of tools used to measure physical and mental impact. Seven different subjective scales and six different objective tools were used to measure physical workload alone, and an even greater variety of tools was used to measure mental workload. This variety precluded a meta-analysis.

The NASA-TLX is a widely used subjective measure of workload in human factors research and is increasingly used in surgical research [42]. It consists of general questions on the subjective experience of physical and mental workload, without specifying locations on the body or aspects of cognitive demand, respectively. It therefore measures a general impression of physical and mental demand rather than specific areas of discomfort that could be compared between surgical techniques. Lawson et al. [24] used a different questionnaire, the Body Part Discomfort Questionnaire, a more targeted survey of physical discomfort, and reported less discomfort in the upper back and extremities but more discomfort in the neck and trunk for robotic compared with laparoscopic surgery. The physical comfort surgeons experience may therefore differ by body part, which the NASA-TLX fails to capture. Despite this limitation, the NASA-TLX was one of the most commonly used tools. The Surgery Task Load Index is more specific to surgery but was used in only two studies, both by the same research group [33, 37].
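To illustrate why the NASA-TLX yields only a general impression, the sketch below computes a workload score from its six subscales (mental, physical and temporal demand, performance, effort and frustration). Physical demand enters as a single 0 to 100 rating with no body location attached. Both the unweighted ("raw") mean and the weighted score derived from 15 pairwise comparisons are shown; which variant the included studies used is not stated in this review, and the ratings and choices are hypothetical.

```python
from itertools import combinations

# The six NASA-TLX subscales; each is rated on a 0-100 scale.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")
# Every pair of subscales is compared once: C(6, 2) = 15 comparisons.
PAIRS = list(combinations(SUBSCALES, 2))


def raw_tlx(ratings: dict[str, float]) -> float:
    """Unweighted ("raw") TLX: the mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: dict[str, float], choices: list[str]) -> float:
    """Weighted TLX: each rating is multiplied by how often its subscale was
    chosen in the pairwise comparisons (0-5), and the sum is divided by 15."""
    if len(choices) != len(PAIRS):
        raise ValueError(f"expected {len(PAIRS)} pairwise choices")
    for pair, pick in zip(PAIRS, choices):
        if pick not in pair:
            raise ValueError(f"{pick!r} is not one of {pair}")
    weights = {s: choices.count(s) for s in SUBSCALES}
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / len(PAIRS)


# Hypothetical ratings for one surgeon after one procedure.
ratings = {"mental": 60, "physical": 40, "temporal": 50,
           "performance": 30, "effort": 55, "frustration": 20}
# Hypothetical choices: the first subscale of each pair is always picked.
choices = [first for first, _ in PAIRS]

print(raw_tlx(ratings))                          # 42.5
print(round(weighted_tlx(ratings, choices), 1))  # 48.3
```

With these example choices, mental demand receives a weight of 5 and frustration a weight of 0, so the weighted score differs from the raw mean; scores computed under different variants are therefore not directly comparable.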

Surface EMG, an objective measure of muscle activity in a specific body region, was the most common tool used to assess physical workload in the studies in this review. Some studies using surface EMG [15, 16, 19] favoured robotic surgery over laparoscopic or open surgery. However, most of the studies using EMG [7, 23, 25, 26, 28] showed mixed results, in which any reduction in muscle activation depended on the muscle being measured. This is consistent with the recent meta-analysis by Hislop et al. [9], which found that the biceps were the only muscle group to show consistently lower muscle activation in robotic surgery.
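The review does not specify how the included studies processed their EMG signals. A common approach, assumed here, is to summarise each muscle's signal as a root-mean-square (RMS) amplitude normalised to that muscle's maximum voluntary contraction (%MVC). Because the value is computed muscle by muscle, one technique can show lower activation in one muscle and higher activation in another, which is the pattern behind the mixed results above. The signals below are hypothetical and far shorter than real recordings.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a (pre-filtered) EMG segment."""
    if not samples:
        raise ValueError("empty signal")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def percent_mvc(task: list[float], mvc: list[float]) -> float:
    """Task RMS as a percentage of the RMS recorded during a maximum
    voluntary contraction of the same muscle."""
    return 100.0 * rms(task) / rms(mvc)


# Hypothetical millivolt samples for a single muscle.
task_signal = [0.125, -0.125, 0.125, -0.125]
mvc_signal = [0.5, -0.5, 0.5, -0.5]
print(percent_mvc(task_signal, mvc_signal))  # 25.0
```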

Other studies have used tools such as the Rapid Upper Limb Assessment (RULA), accelerometers, grip dynamometers, single-leg stance tests and cardiovascular measurements to examine the physical stress of surgical techniques on surgeons (Supplementary Table 3). Because there is no “gold standard” tool for measuring physical stress in surgeons, several different tools have been used, which makes comparison of results across studies difficult. Moreover, higher values on these objective measures may indicate more movement or muscle strain, but they do not necessarily mean that the surgeon subjectively experiences greater physical strain. Only nine studies [7, 14, 15, 24, 26,27,28, 31, 32] in this review used both subjective and objective measures to correlate physical workload findings. It would be informative for future studies to examine correlations between these objective measures of physical stress and subjective measurement tools.
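One way future studies could relate an objective measure to a subjective one is a rank correlation across participants. The sketch below assumes Spearman's rho, chosen because rating scales such as the NASA-TLX are ordinal; it averages tied ranks and then applies Pearson's formula to the ranks. The per-surgeon EMG (%MVC) and NASA-TLX physical-demand values are hypothetical, and a real analysis would also need a significance test and an adequate sample size.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five surgeons.
emg_percent_mvc = [12.0, 18.5, 9.0, 22.0, 15.0]
tlx_physical = [40, 55, 30, 70, 35]
print(round(spearman(emg_percent_mvc, tlx_physical), 2))  # 0.9
```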

Nineteen studies examined the mental impact of robotic surgery on surgeons, most commonly using self-reported rating scales, while only five studies [14, 33, 37, 38, 41] used objective measures of mental stress. Various validated self-reported scales (Supplementary Table 4), including visual analogue scales [29, 32], were used to evaluate mental stress, fatigue, frustration or mental effort in surgeons; such scales may be subject to personal bias and preference for a particular surgical technique. Furthermore, the variety of tools used makes it difficult to interpret the results across studies.

Physiological measures such as cardiovascular measures, cortisol and skin conductance have been used to measure stress objectively. Although a few of these studies favoured robotic surgery, showing lower mental effort and more adaptive cardiovascular responses to stressful conditions, each study used a different calculation, which makes interpretation across studies challenging. Notably, these studies used heart rate variability as a measure of mental strain, on the assumption that changes in it closely reflect sympathetic activation. However, cardiovascular measures may be influenced by various factors, and the validity of such physiological measures of stress remains unclear. More studies that apply the same physiological measure in the same way in larger samples are therefore needed to interpret these data better.
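The point that each study used a different calculation can be illustrated with heart rate variability. The review does not state which indices the included studies computed; the sketch below assumes two standard time-domain indices, SDNN (standard deviation of RR intervals) and RMSSD (root mean square of successive differences), and applies both to the same hypothetical series of RR intervals. The two indices return different values from a single recording, so results reported with different indices cannot be compared directly.

```python
import math
import statistics


def sdnn(rr_ms: list[float]) -> float:
    """SDNN: sample standard deviation of RR intervals (ms); overall variability."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms: list[float]) -> float:
    """RMSSD: root mean square of successive RR differences (ms); beat-to-beat variability."""
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate(rr_ms: list[float]) -> float:
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    return 60_000 / statistics.mean(rr_ms)


# Hypothetical RR intervals (ms) from a short segment of one recording.
rr = [800, 810, 790, 820, 780, 800]
print(round(sdnn(rr), 1))    # 14.1
print(round(rmssd(rr), 1))   # 26.1
print(mean_heart_rate(rr))   # 75.0
```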

Comparative surgical studies are undoubtedly challenging because of various patient, surgeon, environmental and skill factors. This is reflected in this systematic review, which demonstrated high heterogeneity in study size, methodology, surgical specialties, surgical expertise (from medical students to experienced surgeons) and tools used. In addition to the methodological weaknesses of the included studies, nearly half of them (14 of 30) used simulations, which may underestimate both the physical and mental stress experienced by surgeons during real surgery. Even among the simulation studies, there was a wide range of outcome measures, expertise levels and types of simulation, which resulted in significant heterogeneity and prevented meta-analysis and consistent interpretation of the results. Certain surgical specialties may benefit more from robotic techniques than others. For example, because of its technical difficulty, laparoscopic prostatectomy has been performed by relatively few surgeons, whereas many perform robotic prostatectomy. This review is specific to the da Vinci robotic system, and its findings cannot be generalized to other platforms.

Further studies are needed to better understand how different surgical approaches affect surgeons' physical and mental load during surgery. Over a surgeon's career, physical pain and fatigue may increase the risk of complications and errors during surgery [7]. The level of physical pain and fatigue may vary with the surgeon's expertise and the setting of the surgery, both of which need to be ascertained in future studies. Additionally, mental stress and mental wellbeing can affect the efficiency, productivity and longevity of a surgeon's career [40]; reducing surgeons' mental load may therefore have economic benefits, as training a surgeon is costly.

Conclusion

This systematic review identified 30 studies that examined the physical and/or mental impact of the da Vinci robotic surgical system on surgeons compared with laparoscopic and/or open surgical techniques. Mixed physical and mental outcomes between the three surgical modalities were the most common finding. This is most likely due to the high heterogeneity in methodology and measurement tools in the included studies, which makes comparison of results between studies challenging. Overall, the available evidence on the physical and mental demands of the different surgical approaches is of relatively low quality, and it is not possible to state definitively that robotic surgery causes less physical or mental fatigue. Studies of long-term outcomes are needed to better understand the differences in the cognitive and physical demands on surgeons between the three surgical modalities and their impact over time.