Abstract
The next generation of surgical robotics is poised to disrupt healthcare systems worldwide, requiring new frameworks for evaluation. However, evaluating surgical robots during development is challenging because of their complex, evolving nature, their potential for wider system disruption and their integration with complementary technologies such as artificial intelligence. Comparative clinical studies require attention to intervention context, learning curves and standardized outcomes. Long-term monitoring needs to transition toward collaborative, transparent and inclusive consortiums for real-world data collection. Here, the Idea, Development, Exploration, Assessment and Long-term monitoring (IDEAL) Robotics Colloquium proposes recommendations for evaluation during development, comparative study and clinical monitoring of surgical robots, providing practical guidance for developers, clinicians, patients and healthcare systems. Multiple perspectives are considered, including economics, surgical training, human factors, ethics, patient perspectives and sustainability. Further work is needed on standardized metrics, health economic assessment models and global applicability of recommendations.
Main
Surgical robots may be on the brink of achieving their fundamentally disruptive potential1. Since the first surgical robot was introduced in 1985 (the PUMA560, tasked with performing a computed tomography-guided brain biopsy2), the field of robotic surgery has expanded in size and scope, offering the potential for enhanced surgical precision, telesurgery and increasingly complex autonomous function. Technological advances in robotic control systems and artificial intelligence (AI) make it likely that the next generation of surgical robots will transform the surgical technology landscape, previously monopolized by a limited number of approved devices such as Intuitive’s da Vinci1,3,4.
This proliferation of robotic platforms poses important challenges for their safe and ethical clinical translation1,3,5—challenges that extend beyond the operating room and encompass wider considerations within healthcare and society4,6. The scope of the evaluation challenge is too broad for existing methodological templates5,7, but current circumstances create a brief window of opportunity to develop a structured framework capable of guiding the evaluation of surgical robots across their development and translation8.
Conducting high-quality surgical research is difficult owing to the nature of surgical innovation9,10, which has historically led to a methodologically weak approach. Specific problems have included a lack of robust early-stage studies providing transparent and timely reporting of iterative development, and subsequent comparative studies failing to address variations in surgical technique and indications, operator learning curves and lack of equipoise1,5. Evaluating surgical robots is subject to all of these challenges, and adds the need to address unique ethical considerations, profound questions about economic value and sustainability, major impacts on the host healthcare system, and the increasing integration of AI into robotic systems11.
Robotic surgery, like most innovative surgical technology, is often introduced without the stepwise testing process routinely used in medical therapeutics12. Surgical innovation is traditionally evaluated through initial small case series documenting feasibility, followed by adoption (which may be fast or slow) based largely on non-comparative retrospective evidence of potential benefits to the patient. Robotics manufacturers engage in active campaigns to promote their products with physicians and directly or indirectly with patients. Uncertainty, desire to improve and personal biases can lead to innovation without rigorous evaluation, with consequent risks to patient safety. Therefore, frameworks to ensure proper evaluation of patient safety are essential13.
The IDEAL framework provides a structured evaluation pathway for surgical innovation and devices, from needs analysis and preclinical testing, to long-term studies of widespread use9,14,15 (Fig. 1). However, the breadth of the evaluation problem of surgical robotics goes beyond both IDEAL and the boundaries of classical evidence-based medicine, with solutions requiring a diverse array of stakeholders to tackle all the aspects that need consideration. The IDEAL Robotics Colloquium was established to make proposals for a comprehensive practical guide for evaluation of surgical robots, using the existing IDEAL study stages as a template (Fig. 2).
In this paper, we present a systematic analysis of the evaluation life cycle of surgical robots, in three parts. First, we dissect the preclinical and early clinical study of the safety and feasibility of new robotic concepts (IDEAL stages 0, 1 and 2a). Next, we review the pivotal phase when the effectiveness of robotic interventions is studied on a larger scale, and compared against current best practice (IDEAL stages 2b and 3). Finally, we consider IDEAL stage 4, when the robot has been widely adopted, shifting focus to long-term monitoring of performance in real-world settings. This analysis results in a list of stage-specific recommendations for systematic evaluation of robots in surgery.
Methodology
An international interdisciplinary consensus process was completed in several stages. First, seven distinct virtual panels with expertise relevant to important aspects of the challenges to robotic surgery evaluation were devised by the three lead authors (H.J.M., P.T.R. and P.M.). These panels considered AI, technical evaluation, clinical evaluation, human factors, health economics, ethics and surgical training. Patient representatives were included in each panel.
Panel leaders with relevant expertise were selected from the IDEAL council, and were asked to invite 8–12 experts from multiple disciplines (including surgeons, engineers, economists, statisticians, device regulators, patient representatives, ethicists, digital health experts, patient safety experts, system engineers, social scientists, philosophers and education experts) to join their respective panels. Experts from diverse professional and geographical backgrounds were invited, and were chosen based on leadership roles in relevant organizations (university, hospital, societal and industrial) and/or accomplishments relevant to robotic surgery development and evaluation. The recruitment and facilitation of these panels and the general strategy for their function were developed in partnership with the Royal College of Surgeons of England and the National Institute for Health and Care Research, and considered the views of industry. To avoid bias by association, one co-author conducted MEDLINE searches for publications relevant to each panel, and identified additional potential members, who were invited to join the panel, ensuring that each panel had at least one such member.
Each panel participated in a series of semistructured virtual meetings (chaired by respective panel leaders) at which the key challenges for each panel domain were discussed. The degree to which the current IDEAL framework addressed these challenges was also discussed, and further recommendations to address these challenges were proposed. Each panel therefore produced a report across each stage of the IDEAL framework to summarize the outputs of these meetings. Panel reports were then synthesized by an internationally diverse core writing group that included experts in sustainability (A.V.), global health (R.B.), device regulation (T.M.) and medical statistics (D.S.), who were independent from the colloquium panels. A full list of authors, panel members and industry collaborators is found at the end of the paper. To improve usability, the final recommendations were considered from the perspective of key surgical robotics stakeholders: the device developer, clinician, patient and wider healthcare ecosystems16 (Fig. 3).
Recommendations according to each of these perspectives were grouped together for IDEAL stages 0, 1 and 2a, which cover preclinical development and early clinical evaluation; for IDEAL stages 2b and 3, which cover comparative assessment; and for IDEAL stage 4, which covers long-term monitoring and technological evolution.
The IDEAL recommendations are based on three principles: (1) the use of the most rigorous and appropriate methodology to address the key questions at each stage in the intervention’s life cycle; (2) adherence to the fundamental principles of medical ethics (beneficence, non-maleficence, autonomy and justice); and (3) maximum feasible transparency in reporting evaluation outcomes. These principles have allowed the development of coherent proposals for evaluation across a very broad range of complex therapeutic interventions, but they inevitably lead to some recommendations that may not be feasible in many current contexts. In reporting our recommendations, we have indicated our recognition of this by prefacing certain recommendations with ‘in principle’, or by qualifying them by explicitly mentioning their conditional feasibility.
Preclinical development and early clinical evaluation (IDEAL stages 0, 1 and 2a)
An innovative device must first be deemed safe, feasible and acceptable for its successful translation. This is achieved through preclinical evaluation (IDEAL stage 0) to assess safety and feasibility, first-in-human study (IDEAL stage 1) and prospective development (IDEAL stage 2a) ahead of further collaborative evaluation and comparative assessment. Studies in this phase currently suffer from design flaws, severe reporting bias and methodological heterogeneity, which IDEAL aims to reduce16. This stage also commonly encompasses critical progression points such as regulatory approval and financing. The key challenges and recommendations of this early developmental stage are considered below, and summarized in Box 1.
Device perspective in IDEAL stages 0–2a
Key challenges
The complex and rapidly evolving nature of surgical robots poses unique challenges to assessing their safety and effectiveness3. Current assessment domains are usually driven by regulatory requirements. In the United Kingdom and European Union, this requires a demonstration of overall safety and performance; in the United States, a reasonable assurance of safety and effectiveness is required17,18. However, the implementation of these requirements varies among national regulators, and is subject to complex procedural rules and variable decision-making both within and between bodies. Requirements are also influenced by wider geopolitical, economic and legal factors19,20,21,22,23. Although international harmonized standards exist, they focus on technical aspects of device assessment, such as software or electrical safety assessments, rather than clinical metrics19,20,21,22. The nature and quality of scientific evidence developed for device safety, performance and effectiveness may therefore be vastly different for similar systems, being largely defined, verified, and validated internally by each company. Without recording of iterative systematic modification and assessment, key domains may be overlooked, particularly during prototyping and when changes are made during early clinical studies.
As complementary technologies develop, they will increasingly be integrated into surgical robotic systems13. The most impactful of these technologies will likely be AI—boosting function and increasing the autonomy of robotic systems through integration of sensory inputs, learned computational reasoning and adaptive behavior8,24,25,26,27. However, autonomous systems currently have no ‘common sense’ and so would not necessarily stop an obviously unsafe action if a specific scenario had not been ‘learned’ by the algorithm. The integration of AI also adds a further layer of complexity to device development, calibration and evaluation8,24,25,26,27. AI-integrated functions have the potential for rapid self-updating, requiring monitoring and understanding of risk and failure modes, including data drift. Isolating the dynamic AI components of the robotic system for assessment may be difficult, and assessment frameworks need to address this problem. A recent review on intraoperative AI applications for robotic surgery found that all identified publications reported on preclinical development only, and were heterogeneous in their evaluation approach, highlighting the need for a robust evaluation framework for early integration of AI into clinical practice8.
Recommendations
With these challenges in mind, this Colloquium proposes the following recommendations for early-stage evaluation of surgical robotics from a device perspective. When assessing the performance of a robotic system, technical metrics alone are acceptable in earlier studies (stage 0); however, a clinical outcomes-based approach should be used as the primary focus of assessment as early as is feasible28.
For early technical assessment of robotic systems, a standardized checklist should be used to summarize performance, safety and usability for each released version. Assessment should be systematic and transparent, including details of system latency, motion accuracy, instrument safety, operation under load, reliability, internal fault recognition and online security. Metrics and measurement instruments for each of these domains require further definition. For each domain, performance benchmarks and areas of concern should be clearly stated and shared.
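As a rough illustration, the per-version checklist described above could be captured as a small structured record. The sketch below is hypothetical: the domain names follow the list in the text, but the class name, field names and pass/fail representation are assumptions made for illustration, not a published IDEAL instrument.

```python
from dataclasses import dataclass, field

# Domains taken from the text; metrics for each still require formal definition.
DOMAINS = (
    "system latency",
    "motion accuracy",
    "instrument safety",
    "operation under load",
    "reliability",
    "internal fault recognition",
    "online security",
)


@dataclass
class TechnicalChecklist:
    """Hypothetical per-version summary of early technical assessment."""

    device_version: str
    # Maps domain -> (benchmark met?, free-text note on areas of concern)
    results: dict = field(default_factory=dict)

    def record(self, domain: str, benchmark_met: bool, note: str = "") -> None:
        if domain not in DOMAINS:
            raise ValueError(f"Unknown domain: {domain}")
        self.results[domain] = (benchmark_met, note)

    def missing_domains(self) -> list:
        """Domains not yet assessed for this device version."""
        return [d for d in DOMAINS if d not in self.results]

    def concerns(self) -> list:
        """Domains assessed but not meeting their stated benchmark."""
        return [d for d, (ok, _) in self.results.items() if not ok]
```

Keeping one record per released version makes it straightforward to see which domains remain unassessed and which have been flagged before a version moves to clinical study.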
Building on the relative risk assessment approach used in IDEAL-D preclinical device assessment, the proportionate evaluation of autonomous surgical robots should be guided by their classification along two main axes, autonomy level and risk, before proceeding to clinical studies16,24. Yang et al. describe six autonomy levels: no autonomy, robot assistance, task autonomy, conditional autonomy, high autonomy and full autonomy. The preclinical evidence requirements should be guided by a failure modes and effects analysis approach to risk stratification, based on the likelihood and severity of device failure in each cell of the risk/autonomy matrix29. Therefore, before clinical study, a high-risk, full-autonomy device would require more extensive preclinical evidence than a low-risk device with no or low (that is, task-only) autonomy16,24.
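The two-axis classification can be sketched in code as follows. This is a minimal illustration only: the six autonomy levels come from the text, but the 1 to 5 rating scales, the score threshold and the tier labels are assumptions chosen for the example, and real cut-offs would need expert consensus.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Six autonomy levels named in the text (after Yang et al.)."""

    NO_AUTONOMY = 0
    ROBOT_ASSISTANCE = 1
    TASK_AUTONOMY = 2
    CONDITIONAL_AUTONOMY = 3
    HIGH_AUTONOMY = 4
    FULL_AUTONOMY = 5


def failure_risk(likelihood: int, severity: int) -> int:
    """FMEA-style risk score: likelihood x severity, each rated 1-5 (assumed scale)."""
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("likelihood and severity must be in 1..5")
    return likelihood * severity


def evidence_tier(autonomy: Autonomy, likelihood: int, severity: int) -> str:
    """Map a cell of the risk/autonomy matrix to a hypothetical evidence tier."""
    risk = failure_risk(likelihood, severity)
    # Illustrative thresholds only, not taken from any published standard.
    high_risk = risk >= 15
    high_autonomy = autonomy >= Autonomy.CONDITIONAL_AUTONOMY
    if high_risk and high_autonomy:
        return "extensive"
    if high_risk or high_autonomy:
        return "intermediate"
    return "baseline"
```

For example, `evidence_tier(Autonomy.FULL_AUTONOMY, 4, 5)` falls in the most demanding tier, whereas a low-risk device with no autonomy falls in the baseline tier, mirroring the contrast drawn in the text.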
For the evaluation of AI-integrated robots, the preclinical (stage 0) testing should begin with stand-alone evaluation of the autonomous component and hardware separately, followed by in silico and simulator-based assessment of the two integrated into a functional unit in realistic tasks. Later stages (stage 1 and beyond) should study the performance of the AI algorithms within systems (with the hardware components of that system version) in a clinical context—using clinical outcomes where feasible. Reporting guidelines, such as DECIDE-AI, should be used to guide early clinical evaluation30.
The maturation of the system from in silico to in vitro and in vivo versions should revolve around addressing identified clinical unmet needs and should be described with clear identification of the prototype version. This should include documentation of iterative changes to the procedure, device and patient selection, and describe simulation studies in detail. This information should be recorded prospectively, and a log should be accessible to regulators. In systems with AI integration, the AI component is particularly susceptible to rapid iterations, and therefore changes to input data, algorithm code and model testing should be reported.
Clinician perspective in IDEAL stages 0–2a
Key challenges
From the perspective of clinicians, the introduction of a robotic device within a clinical team is a multifaceted challenge. Investigation of robot interaction with humans (that is, the surgical team) is crucial, particularly in the domains of usability, trust and failure analysis13,31,32,33. This is particularly pertinent with respect to the integration of AI, which could alter responsibility and liability paradigms34. Understanding systems modeling is important, as a device is never integrated into a ‘static’ system—the act of introducing it will change both the behaviors of the surgical team and the way they think about their work within the operating room. Surgical team trust must also be considered in the evaluation of these systems—especially in systems with an autonomous component, which current assessment frameworks do not recognize or evaluate24,35. Human factors and ergonomics approaches will be important in developing solutions to these largely unexplored problems. Recent projects such as the Trustworthy Autonomous Systems Hub and Responsible AI UK will aim to bring standardization and regulation to this rapidly evolving field.
Recommendations
The human–device interface and team–device integration in the operating room should be included in the intervention development and description. In principle, this process starts with robotic development, which should utilize user-centered design and involve input from surgical team members.
Robot assessment should include a human factors-based evaluation of team communication (including communication with the robot), intuitiveness of visual displays, control interface usability, feedback mechanisms (for example, haptic and auditory) and ease of integration with existing workflows. Human factors assessment of system integration should ideally include directly observed user situational awareness, user workload (mental and physical), task analysis in device use, operational challenges and potential safety-related issues36. Formal qualitative research to study robot user opinion and perceptions may be helpful; the ongoing REINFORCE initiative may provide a framework for this37,38,39,40. Human reliability assessments should be used to stratify potential risks and hazards across a wide variety of surgical expertise (that is, consultants and trainees, those with previous robotic or minimally invasive surgery expertise)41.
Surgeons’ trust in any AI autonomous function and its evolution should be evaluated initially in simulated situations. This should include monitoring for frequency of, and reasons for, surgical team members taking over control of the robotic system, alongside independent observation and qualitative assessment. Surgical robotic incorporation of AI poses ethical challenges, including fair distribution of risk and benefits for patients and clinicians. Integration of ethical considerations should occur across key domains for every study stage by addressing the key issues of minimizing harm, ensuring autonomy and consent and optimizing justice (for example, in terms of differential access to treatment). Conflicts of interest should be openly addressed35. In principle, a standard process for determining responsibility for errors when AI is integrated should be adopted with suitable expert advice and should be publicly accessible.
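One simple way to monitor the frequency of, and reasons for, manual takeover in simulation is to log each event and summarize it per hour of autonomous operation. The sketch below is an assumption about how such a log might be tallied; the function name, the free-text reason labels and the per-hour normalization are all hypothetical.

```python
from collections import Counter


def summarize_takeovers(reasons, autonomous_minutes):
    """Summarize manual takeovers during autonomous operation.

    reasons: list of strings, one recorded reason per takeover event.
    autonomous_minutes: total time the system ran autonomously.
    Returns (takeovers per hour, reasons ranked by frequency).
    """
    if autonomous_minutes <= 0:
        raise ValueError("autonomous_minutes must be positive")
    rate_per_hour = len(reasons) / (autonomous_minutes / 60)
    return rate_per_hour, Counter(reasons).most_common()
```

Counts of this kind would complement, rather than replace, the independent observation and qualitative assessment recommended above.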
Patient perspective in IDEAL stages 0–2a
Key challenges
From a patient perspective, as robotic systems grow in complexity they become increasingly difficult to understand and trust42,43. Patients invited to participate in early clinical studies will rarely be able to understand the risks and limitations of the technology, compare these against other treatment options (including other robots) or be aware of potential vested interests of the investigators and healthcare system. The nature of the early IDEAL stages means patient numbers and operating team experience are limited, resulting in the evaluation of interventions at early stages of the learning curve, with resultant implications for both clinicians and patients44. Surgical teams may not know all the risks, or how learning curves may increase the overall risk to patients involved at this early stage (relative to that which later patients experience)44. Provisions to minimize harm and ensure truly informed consent are ethical requirements in this phase of surgical robot evaluation35,43,45,46.
Recommendations
Active patient and public involvement is desirable to ensure a patient-centered research design from the outset, and formal qualitative research assessing patient perceptions, understanding of the robotic system and trust in the intervention may be very informative. Patient information sheets for both research and surgical consent purposes should be developed with input from patient groups. Crucially, informed consent in early clinical studies (that is, stages 1 and 2a) should acknowledge a potentially increased uncertainty of benefit and risk of harm in early cases, as with all new device introductions. Information should include details of previous studies; known risks and the possibility of unknown risks; dependence level on the surgical robot and mitigation plans for system failure; level of AI-system autonomy and protocols for the takeover of control; transparency regarding surgical team experience with the system; and any potential conflicts of interest.
System perspective in IDEAL stages 0–2a
Key challenges
When considering the impact of surgical robots in health systems, societal cost must be considered. Currently, health economic assessments are not standard components of early evaluation frameworks for devices, as illustrated by the lack of guidelines from The Professional Society for Health Economics and Outcomes Research (ISPOR) for this stage. Early health economic evaluations are heterogeneous and often unsatisfactory. Economic evaluations at this stage act as exploratory tools to assist decision-making about pursuing further development, and to provide insights into future cost-effectiveness, particularly for complex interventions47. The current deficit in early economic evaluation extends to encompass related gaps in the evaluation of the environmental sustainability and global applicability of surgical robots48. Early and systematic use of unmet needs analyses, health economic analyses and sustainability analyses can and should serve a vital role in guiding the efficient onward development of devices and avoiding waste49.
Recommendations
Unmet needs analyses and early economic models should routinely be considered before moving into definitive studies42. Examples include headroom analyses to provide early estimates of cost-effectiveness, and economic burden studies to advise on high-priority disease targets. These could provide pilot metrics for expenditure (including time, money, human resource and technical resource) and costs of altered downstream care. Iterative exploratory decision-analytic modeling could inform robotic development as part of the early health technology assessment process47,50.
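A headroom analysis is commonly framed as the maximum additional cost per patient at which a new technology could still be cost-effective, given the most optimistic plausible health gain. The sketch below assumes that framing; the function name, parameters and the example figures in the usage note are illustrative and not drawn from the source.

```python
def headroom(max_qaly_gain, wtp_per_qaly, downstream_savings=0.0):
    """Maximum extra cost per patient compatible with cost-effectiveness.

    max_qaly_gain: most optimistic plausible QALY gain per patient.
    wtp_per_qaly: willingness-to-pay threshold per QALY (payer-specific).
    downstream_savings: expected savings in later care per patient.
    """
    if max_qaly_gain < 0 or wtp_per_qaly < 0:
        raise ValueError("QALY gain and threshold must be non-negative")
    return max_qaly_gain * wtp_per_qaly + downstream_savings
```

For instance, an assumed best-case gain of 0.05 QALYs at a threshold of 20,000 per QALY gives a headroom of 1,000 per patient; if the robot's expected incremental cost per case clearly exceeds this, further development may be hard to justify on economic grounds.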
Value of research studies should identify surgical robots that are unlikely to be successfully implemented into the health system, permitting decision-making on halting research and investment into technologies unlikely to be adopted. Reverse engineering and frugal or alternative surgical robot design (such as handheld platforms) could be explored to reduce cost, improve eventual accessibility across healthcare systems and boost the potential for global health impact51,52.
Sustainability metrics should be recorded during preclinical (device-only) and clinical (device within system) evaluations48. Assessment should integrate a complete life cycle assessment model53. This includes recording resources required to build, run and maintain each device version, along with device design (for example, careful material selection, modular system design, reusable parts) from preclinical stages onwards. Interoperability, parts replacement and maintenance by local teams, especially in low-resource settings, should be considered at the earliest design stages.
Comparative evaluation (IDEAL stages 2b and 3)
Once a stable version of an effective and safe robot has been developed, a comparative evaluation with the current surgical standard should follow. Expert consensus is needed on the nature of the patients and procedures to be studied in trials, and on markers of adequate procedure quality, to avoid bias due to learning curves or wide variations in performance. Evidence from collaborative prospective cohort studies (IDEAL stage 2b) in a range of potentially appropriate settings and indications can provide this, and thereby facilitate definitive randomized comparative studies against an appropriate control group (IDEAL stage 3).
The importance of adequate comparative evaluation before adoption was recently illustrated by the US Food and Drug Administration warning against the use of robotic surgery for the treatment of breast and cervical cancers54. The recommendation on cervical cancer was based on the results of a prospective randomized trial, and a population-based study comparing open versus minimally invasive surgery (including robotic surgery) showing worse disease-free survival and overall survival in patients who underwent minimally invasive surgery55,56. A breakdown of adverse events across all robotic surgeries recorded by the US Food and Drug Administration includes 2,000 events that involved injury to the patient, 17,000 events due to malfunction of specific robots and 294 fatalities57. It is not clear how many of these events could have been avoided by more rigorous evaluation at an earlier stage, but it is undeniable that omitting such evaluation reduces our capacity to limit harm. The key challenges and recommendations of this comparative evaluation stage are considered below, and summarized in Box 2.
Device perspective in IDEAL stages 2b and 3
Key challenges
Surgical robots offer great potential technical advantages including improved precision, dexterity, improved ergonomics, and teleoperation, but they demand new or different resources from healthcare systems (for example, surgical team training, audit and maintenance)1,58. Few definitive high-quality comparative trials have been published, and from these the evidence of benefits of robot-assisted surgery over comparable minimally invasive surgical approaches has been inconclusive55,56,59,60,61. The literature reveals methodological limitations, such as poor reporting of outcome measures, a lack of agreed core outcomes sets, incomplete efficacy or effectiveness assessment, and variable reporting of safety58,62. The rapid evolution of robots poses major problems for evaluation, with newer AI-enabled systems threatening to render current studies outdated before their completion—demanding innovative, iterative evaluation strategies, such as implementation trials63. This uncertainty complicates decision-making about when, how and if a definitive randomized clinical trial should be performed within the evaluation cycle of the surgical robot. Some of the technology incorporated in newer surgical robots could itself provide next-generation evaluation measures, such as computer vision, a domain of AI applied to operative videos with procedural analytics64,65,66,67. However, such outcome measures must themselves be robustly validated before clinical implementation, and their relation to clinical outcomes fully understood.
Recommendations
The comparative stage poses numerous device-related challenges. The benefits and risks of a surgical robot should be documented through well-designed prospective evaluations, capturing clearly defined safety and effectiveness outcomes (including patient-reported outcomes) relevant to a given procedure, surgical speciality or patient population. These studies should proceed in a stepwise fashion according to the IDEAL recommendations, considering and adopting seamless designs for efficiency, where plausible.
Measured outcomes must include well-defined clinical outcomes (ideally from existing consensus core outcome sets), technical outcomes (including those derived from robotic kinematic and haptic sensors), patient-reported outcomes (such as quality-of-life indicators) and wider outcomes that reflect potential robotic disruption (ergonomic benefits, impacts on accessibility to surgery) where relevant. Next-generation outcomes and measures, such as those derived from robotic kinematic and haptic sensors and from video data, should be reported where relevant, but should be robustly validated and their associations with clinical outcomes determined.
Randomized controlled trials will serve as the default choice for thorough comparative studies of robotic surgery where preliminary studies suggest a potentially important clinical or economic benefit. Planned prospective implementation trials should be considered only where randomized trials are considered impossible. However, for procedures where a robot system has previously established its superiority over non-robotic surgery in technically similar contexts, and no substantial change in the level of risk is expected, further randomized trials for every new procedure may be unnecessary. In this situation, a collaborative prospective cohort study (IDEAL stage 2b), a prospective implementation trial or a prospective registry is ethically necessary to ensure that a meaningful evaluation of effectiveness and safety is performed as indicated by existing decision-support algorithms68.
In principle, public preregistration of protocols and analytic intent is recommended for all studies, with any post hoc changes recorded. Protocols should specify defined data dictionaries, data recording by independent observers, with independent validation, and calculations of interobserver reliability. Data collection and analysis should be sensitive to, and protected against, conflicts of interest and related biases. The privacy and security implications of capturing, storing and using data from robotic devices should also be considered.
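Interobserver reliability for categorical ratings is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The text does not prescribe a particular statistic, so the choice of kappa here is an assumption, and the sketch covers only the two-rater case.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items categorically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; acceptable thresholds would need to be specified in the protocol.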
During evaluation, changes to the technology or procedure may result in unexpected outcomes, which could warrant reevaluation of the robotic surgery at the current or an earlier IDEAL stage. Thresholds for this kind of action should be established in advance, considering trends in outcome data suggesting changes in risk levels, indications for use or device performance. As a guiding principle, major changes in risk should warrant a return to earlier IDEAL stages. An independent expert panel should be involved and work with regulators in making these decisions, including which IDEAL stage study is required.
In cases where a robotic system can perform a procedure that achieves a physiological, clinical or functional effect that was not previously possible, there may be no reasonable comparator. Independent ethical advice should be sought to determine whether control groups for a randomized trial are acceptable, depending on the nature of the presumed benefits and anticipated risks of the procedure and the outcome data available. Where the clinical outcomes of the novel robotic approach are clearly unachievable by other means, randomization of participants may be unethical. Alternate designs to study effectiveness and safety, where possible, should be sought.
Clinician perspective in IDEAL stages 2b and 3
Key challenges
Human factors and ergonomics analysis is crucial during the clinical translation of surgical robots to ensure they are usable, and can efficiently integrate into complex teams and workflows33,69,70. Concerns about the occupational consequences of surgery have led to an interest in ergonomic innovation in surgery, and improved ergonomics is a purported benefit of surgical robotics, but the evidence base is conflicted and of limited quality70. As surgeons gain experience with the robot, their operative skills are expected to improve, described by a learning curve71. Surgeon experience and learning curves are an important source of potential variation and bias in comparative surgical robot trials, with high-quality trials incorporating their effects into analysis72,73,74,75,76. A reliable measure of the learning curve can only be achieved by analysis of meaningful measures of operation quality and patient outcomes77. Learning curve evaluation is important for fair comparative analysis, and for planning and implementing training programs for the surgical team32,78,79. Effective, standardized team training is essential for comparative evaluation and clinical translation, but there is no consensus on developing mandatory training program requirements71,78,80,81.
Recommendations
Human factors should be considered, and behavior change scientists should be consulted during the evaluation of surgical robots to examine hypotheses generated in earlier IDEAL stages—evaluating features such as workflow, variations in system use, ergonomic risk assessment, data collection capabilities of the device, teamwork, nontechnical skills and workspace analysis.
Analyzing learning curves is essential in evaluating new technologies, including surgical robots. Large prospective cohorts (IDEAL 2b studies) offer the first opportunity to capture real-world learning curves for surgical robots, and should be used to study their complexity and improve our tools for evaluating them. Metrics gathered from direct supervision, objectively defined criteria, cadaver laboratories and simulator training should be standardized and used for assessment of real-world learning curves. The performance plateau should be continuously monitored to detect changes over time, studying the effects of factors that may influence surgical performance such as casemix, team changes, and changes in the surgical environment. Criteria for the minimal acceptable level of plateau performance should be agreed for surgeons to practice independently or take part in definitive pivotal comparative studies with the robot, using objective measures of procedural quality. Statistical exploration of learning effects (such as sensitivity analysis or extensions of the primary analysis82) should be included in trial protocols to identify and adjust for learning curve bias. Training mechanisms should be audited for impact and iteratively improved to meet user needs37. Programs should directly attempt to track the learning curves seen in surgical robotics training and investigate techniques such as mentoring approaches to shorten them or minimize any effect on patients. Training courses should be validated by evidence of correlation between course evaluations and clinical performance. Institutional clinical governance policies should require the development and use of consistent criteria pertaining to surgeon training and outcomes to monitor continued learning.
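As one way to make the learning-curve recommendations above concrete, the following minimal sketch builds a cumulative sum (CUSUM) curve of case outcomes against a prespecified acceptable failure rate. CUSUM is a commonly used technique for surgical learning curves, but it is an illustrative choice here and not a method prescribed by the Colloquium. The function names, the 20% acceptable rate and the case data are hypothetical.

```python
from typing import List, Sequence


def cusum_curve(failures: Sequence[int], acceptable_rate: float) -> List[float]:
    """Running sum of (observed failure - acceptable failure rate), case by case.

    The curve rises when a case fails the quality criterion and falls
    slightly with every case that meets it.
    """
    if not 0.0 < acceptable_rate < 1.0:
        raise ValueError("acceptable_rate must be between 0 and 1")
    running = 0.0
    curve = []
    for failed in failures:
        running += failed - acceptable_rate
        curve.append(running)
    return curve


def peak_case(curve: Sequence[float]) -> int:
    """1-based case number at which the CUSUM curve reaches its maximum.

    After this case the cumulative failures no longer exceed the acceptable
    rate, so the peak is a crude marker of the end of the learning phase.
    """
    if not curve:
        raise ValueError("curve is empty")
    return max(range(len(curve)), key=lambda i: curve[i]) + 1


# Hypothetical consecutive cases: 1 = predefined quality criterion not met.
cases = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = cusum_curve(cases, acceptable_rate=0.2)
print(peak_case(curve))
```

In practice the outcome definition and acceptable rate would need to be agreed in advance using objective measures of procedural quality, and a risk-adjusted variant would be needed where casemix changes over time.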
In the case of autonomous systems, learning curves will likely be linked to the evolution of trust in the AI application. Proxies for clinicians’ trust in the autonomous components (such as instances of use or if manual override is required) should be studied and presented with learning curve analysis.
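A minimal sketch of how such trust proxies might be summarized alongside a learning curve is given below. It assumes each case is logged with two flags (whether the autonomous function was engaged, and whether the surgeon manually overrode it); the record structure, field names and block size are hypothetical and not taken from any existing registry.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AutonomousCase:
    """One case in which an autonomous function was available (hypothetical record)."""
    feature_used: bool      # the surgeon chose to engage the autonomous function
    manual_override: bool   # the surgeon took back control after engaging it


def trust_proxies(cases: Sequence[AutonomousCase], block_size: int) -> List[dict]:
    """Summarize use and override rates over consecutive blocks of cases."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    summary = []
    for start in range(0, len(cases), block_size):
        block = cases[start:start + block_size]
        used = [c for c in block if c.feature_used]
        overrides = sum(c.manual_override for c in used)
        summary.append({
            "first_case": start + 1,
            "use_rate": len(used) / len(block),
            # Override rate is undefined when the function was never engaged.
            "override_rate": overrides / len(used) if used else None,
        })
    return summary


# Hypothetical log of four consecutive cases, summarized in blocks of two.
log = [
    AutonomousCase(feature_used=False, manual_override=False),
    AutonomousCase(feature_used=True, manual_override=True),
    AutonomousCase(feature_used=True, manual_override=False),
    AutonomousCase(feature_used=True, manual_override=False),
]
blocks = trust_proxies(log, block_size=2)
```

Reporting these rates over the same case sequence as the learning curve would allow the two to be read together; whether they are valid proxies for trust would itself need study.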
Patient perspective in IDEAL stages 2b and 3
Key challenges
Patient acceptability is increasingly important when implementing healthcare interventions, but this is difficult to define or assess in relation to surgical robots16,83. Acceptability is important for IDEAL stage 2b/3 studies as patients must provide fully informed consent to studies before enrollment. Very few patients have a comprehensive understanding of what a surgical robot is or does, of the current evidence about the potential and proven risks and benefits of a surgical robot or of the degree of autonomy of robots during surgery84. Patient perceptions of likely benefit or harm may be affected by media ‘spin’ or by industry and marketing psychology44. This may affect patient preference for one treatment over another42, and this could contribute to the challenges of randomization or trial recruitment. Therefore, it is important that patients are provided with a clear, accurate nontechnical explanation of the evidence on the established benefits, known risks and gaps in knowledge about robotic surgery in their specific context, and protected from potential bias from developers and robotic enthusiasts.
Recommendations
Although no universal definition of patient acceptability exists for surgical robots, it should be considered as including (1) patient perception (personal and societal views, the degree of trust within a patient–doctor relationship and wider system), (2) patient understanding (procedure, risk, equipoise and device) and (3) patient consent (informed consent, full disclosure of conflicts of interest).
The consent process should not be contaminated by surgeon bias or patient misinformation. Potential alternatives to the traditional consent process include using research nurses or computer decision-support programs. Surgeons involved in the process of consent for robotic surgery trials should undergo training to minimize unconscious bias85.
Patients involved in IDEAL stage 2b or 3 studies should be informed about their surgeon’s current level of experience with the proposed robotic platform and procedure, encompassing information on both local outcomes and complications for robotic and alternative (that is, standard-of-care) procedures. If an accurate assessment of the learning curve is available, this should be disclosed.
System perspective in IDEAL stages 2b and 3
Key challenges
A broad systems perspective is needed during comparative surgical robotic evaluation38. Surgical robots must be economically viable, and the cost of purchase, maintenance and repairs fully evaluated11,13. Increasing attention is being paid to the environmental impact of surgery; thus, the impact of robotics should be measured and justified in terms of global Net Zero initiatives53,86,87. Adoption of single-use robotic tools presents a concern in this regard.
Existing efforts reporting on the resource use, greenhouse gas emissions and material footprints associated with robots require extension to provide impact comparisons with existing technologies53. The Lancet Commission on Global Surgery highlighted the huge global unmet need for timely and effective surgical services, particularly in low-income settings88. An in-depth understanding of each surgical ecosystem will be needed before a decision on integration of robotics49,88,89. This includes understanding challenges such as inconsistent access to electricity, clean water, operating rooms and certified surgeons, equipment sterilization procedures, maintenance of equipment and inconsistent funding, which may render robotic surgery infeasible. In resource-poor settings, there is a clear opportunity cost of introducing surgical robots, which may squander scarce resources, and be impossible to maintain, resulting in net harm and perpetuating healthcare inequality81. From an ethical viewpoint, it is important to consider the impact of robotics on access to care for relatively disadvantaged populations in all healthcare systems90.
Recommendations
Analysis of healthcare costs associated with robotic intervention and control treatments should be routinely included in comparative surgical robotic studies. Economic studies should include clinically and system-relevant outcomes over a sufficient length of follow-up to compare a surgical robot to current surgical practice. Established international frameworks such as those published by ISPOR should be used to evaluate health economics and outcomes research91,92. Decision-analytic modeling should be used in IDEAL stage 2b studies. IDEAL stage 3 studies should incorporate formal economic evaluations providing trial-based cost-effectiveness analyses that follow established reporting guidelines, such as the Consolidated Health Economic Evaluation Reporting Standards93.
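For readers less familiar with trial-based cost-effectiveness analysis, the sketch below shows the two standard summary quantities: the incremental cost-effectiveness ratio (ICER) and the net monetary benefit at a given willingness-to-pay threshold. The function names, costs, effects and threshold are hypothetical placeholders, and a real evaluation would also need discounting, uncertainty analysis and adherence to the reporting standards cited above.

```python
def icer(cost_robot: float, cost_control: float,
         effect_robot: float, effect_control: float) -> float:
    """Incremental cost per additional unit of effect (for example, per QALY)."""
    delta_cost = cost_robot - cost_control
    delta_effect = effect_robot - effect_control
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return delta_cost / delta_effect


def net_monetary_benefit(cost_robot: float, cost_control: float,
                         effect_robot: float, effect_control: float,
                         willingness_to_pay: float) -> float:
    """Incremental effect valued at the threshold, minus incremental cost.

    A positive value means the robotic arm is cost-effective at that threshold.
    Unlike the ICER, this stays interpretable when the effect difference
    is negative or close to zero.
    """
    delta_cost = cost_robot - cost_control
    delta_effect = effect_robot - effect_control
    return willingness_to_pay * delta_effect - delta_cost


# Hypothetical per-patient mean costs and QALYs from the two trial arms.
ratio = icer(12000.0, 10000.0, 8.5, 8.0)
benefit = net_monetary_benefit(12000.0, 10000.0, 8.5, 8.0, willingness_to_pay=20000.0)
```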
Although the Colloquium acknowledges the substantial barriers to implementing robotic surgery programs in low-resource settings, it is possible that future advances may reduce these. Therefore, stakeholders from low-income countries with an interest in robotic surgery should be encouraged to join discussions and provide insights into how robotic surgery might become more feasible and beneficial in such settings once its value in higher-income settings is established.
To delineate whether a surgical robot would result in net health benefits while remaining cost-effective in low-income settings, a rigorous modeling approach can be applied. This should include metrics on robot effectiveness and safety; health economic and sustainability analysis; and specific capacity metrics for the target healthcare environment—such as basic infrastructure (including energy and information technology services), healthcare infrastructure, necessary human resources (entire surgical team), medical supplies, critical care capacity and healthcare funding. The goal of this process is to estimate the robot’s impact within lower-resourced ecosystems, determining an environment’s readiness for downstream robot integration.
A modeling approach can also be applied to identify major risks to fair distribution of benefits within higher-income contexts. If modeling reveals concerns regarding equity of access, safety, cost-effectiveness or readiness, then a plan for local capacity building should be developed and its implementation monitored before robots are introduced. Efforts to uphold fairness by increasing access to successful innovation internationally should be supported by nongovernmental organizations, governments and the robotic industry, and by existing infrastructure, such as the SAFROS project to address current inequities in access to safe surgery.
The sustainability and economic evaluation of a surgical robot should include a complete life cycle assessment considering how the surgical robot changes practice in relation to the surgical procedure, manufacture and maintenance, type and amount of waste generated, and reusable and single-use items. Any projected increase in carbon footprint compared with continuing with non-robotic surgery should be assessed, minimized and offset where possible (for example, switching from consumable to reusable components) and should be justified in terms of other quantifiable benefits (for example, improved patient outcomes, and downstream economic and environmental benefits).
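The comparison described above can be sketched as a per-procedure footprint in which the embodied emissions of capital equipment are spread over its expected lifetime caseload. This is a deliberately simplified illustration, not a full life cycle assessment method: the class, its fields and every number below are invented placeholders, not measured data.

```python
from dataclasses import dataclass


@dataclass
class ProcedureFootprint:
    """Emissions in kg CO2e; all fields and values are illustrative only."""
    equipment_manufacture: float      # embodied emissions of capital equipment
    procedures_over_lifetime: int     # expected cases over the equipment's life
    energy_per_case: float
    single_use_items_per_case: float
    reprocessing_per_case: float      # sterilization of reusable items

    def per_case(self) -> float:
        """Amortized capital emissions plus the recurring per-case emissions."""
        if self.procedures_over_lifetime <= 0:
            raise ValueError("procedures_over_lifetime must be positive")
        amortized = self.equipment_manufacture / self.procedures_over_lifetime
        return (amortized + self.energy_per_case
                + self.single_use_items_per_case + self.reprocessing_per_case)


# Placeholder inputs chosen only to show the arithmetic.
robotic = ProcedureFootprint(10000.0, 2000, 4.0, 20.0, 3.0)
conventional = ProcedureFootprint(1000.0, 2000, 2.0, 10.0, 2.5)
projected_increase = robotic.per_case() - conventional.per_case()
```

Separating single-use from reprocessed items in this way makes it possible to see how far a switch from consumable to reusable components would reduce the projected increase.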
Long-term monitoring and technological evolution (IDEAL stage 4)
Following comparative evaluation and widespread adoption, the focus shifts to long-term monitoring of performance in real-world settings. Registries are the predominant methodology in this stage of evaluation11, but ownership and curation of robotic registries by commercial groups can introduce risks of bias and lack of transparency. Other prospective methods of long-term study, such as observational cohort studies, have limitations including fragmentation, maintenance costs and lack of comparability. In an increasingly digitalized healthcare landscape, real-world datasets (RWDs) leveraging data collected for clinical care or administrative purposes have become important potential data sources for the evaluation of health interventions94. However, valid studies based on RWDs need standards to guide their design and reporting, and safeguards for privacy and data security. Expanding on the IDEAL framework, targeted recommendations specific to IDEAL stage 4 study designs are needed to inform their methodologies and analytics. The key challenges and recommendations of this long-term monitoring stage are considered below, and summarized in Box 3.
Device perspective in IDEAL stage 4
Key challenges
Long-term monitoring of a surgical robot’s real-world performance is critical for the safety, evolution and longevity of a device. This could best be achieved by device developers working with regulators, providers, insurers and other stakeholders to create international surveillance systems7. The developers of surgical robots have a duty to ensure that patients and scientific evaluators have the best possible evidence to fulfill the ethical requirements for autonomy and non-maleficence, respectively, and this needs comprehensive, unbiased outcome data from real-world settings. Many existing device monitoring systems are criticized as passive and inconsistent, underreporting incidents and therefore lagging behind analogous systems for drug monitoring95,96,97. Given the current lack of incentives to evaluate, it is unsurprising that existing evidence on devices is weak, and efforts to curate data are fragmented, reducing comparability and scope for analysis98,99.
Manufacturers, hospitals and insurers curate and maintain datasets, but have few incentives to make them widely accessible, while commercial, and sometimes regulatory, issues also inhibit full disclosure of clinical and technical data95. Registries are currently the predominant methodology for long-term monitoring of robotic surgical interventions, but these are generally in-house datasets focused on a single robotic system, and usually lack independent validation and/or have limited access to real-world data15. Efforts to link datasets to facilitate better analysis of larger groups are limited in their impact and capacity, partly by regulatory issues around data sharing. Stakeholder collaboration at all levels (individual, organizational, system, international) is required to generate high-quality data, as seen with the US national device and evaluation system MDEpiNet, which acts as a registry network for specific surgical devices100. To give a full picture, evaluation systems for surgical robots need to go further, supplementing standard outcome measures (that is, effectiveness, safety and economic outcomes) with complementary datasets, including machine-generated activity data, data from human factors analyses and data to monitor the dynamic nature of AI incorporated into surgical robotics101.
Recommendations
In principle, best practice should be followed using established design and reporting guidelines, and prospectively collected high-quality data102. Integration of RWDs should be encouraged if quality can be assured. Data should be collected and analyzed by groups independent from those producing it. The roles and conflicts of interest of those producing and curating data should be transparent and available.
Datasets should include, but not be limited to, patient population demographics, disease characteristics, device characteristics, device indications, type of setting, clinical outcomes, economic outcomes, low-level technical outcomes, technical failures, adverse events, changes in device capabilities and dedicated metrics monitoring AI-system evolution. Reporting of technical failures (including software failures) and patient safety incidents should be mandatory, supported by national regulators and independent of device manufacturers. Rapidly generated, scalable datasets should be developed for widely adopted innovations. Collection and analysis should be fully automated, with harmonized coding language and core reporting and outcome measures.
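The data items listed above could be expressed as a harmonized record with an automated completeness check, as in the minimal sketch below. The class and field names are hypothetical and simply mirror the list in the text; they are not drawn from any existing registry schema or coding standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RegistryRecord:
    """One procedure-level record; None means the item was not recorded."""
    patient_demographics: Optional[dict] = None
    disease_characteristics: Optional[dict] = None
    device_characteristics: Optional[dict] = None
    device_indication: Optional[str] = None
    setting_type: Optional[str] = None
    clinical_outcomes: Optional[dict] = None
    economic_outcomes: Optional[dict] = None
    technical_outcomes: Optional[dict] = None
    technical_failures: Optional[List[str]] = None   # includes software failures
    adverse_events: Optional[List[str]] = None
    capability_changes: Optional[List[str]] = None
    ai_evolution_metrics: Optional[dict] = None


def missing_fields(record: RegistryRecord) -> List[str]:
    """Names of items that were not recorded.

    An empty list or dict counts as recorded (for example, "no failures
    occurred"), which is different from the item being absent.
    """
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical partially completed record.
record = RegistryRecord(device_indication="example indication", technical_failures=[])
missing = missing_fields(record)
```

Distinguishing "not recorded" from "recorded, none occurred" matters for mandatory failure reporting, because the two are otherwise indistinguishable in analysis.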
Regulatory, political and commercial barriers may limit the feasibility of optimal sharing of real-world data. In principle, international collaborative approaches are recommended to produce homogeneous and comparable datasets, with data-sharing agreements giving data access to all stakeholders. Governance of linked datasets should ensure open access to facilitate observational research. Governments, insurers, hospitals and professional associations all have potential roles in this.
Statistical analyses of real-world data should be transparent in their methods, and show how they account for confounding factors, sources of bias and missing data. Analyses should be made accessible according to the FAIR (findability, accessibility, interoperability and reuse) principles103.
AI-enabled and autonomous systems require particular attention. The initial use and indication of use should be clearly stated, and metrics for long-term monitoring of performance and safety established from the outset of clinical use. Performance should be evaluated at regular intervals, with more frequent evaluations of rapidly changing systems. Changes in indication of use, the level of autonomy of the system or performance drift, which might increase the level of risk, will require detailed evaluation. Changes in machine behavior during the period should be described, with analysis of how the algorithm has changed where this is possible.
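One simple way to operationalize evaluation at regular intervals is to compare performance in consecutive evaluation windows against a baseline fixed at the outset of clinical use. The sketch below assumes a binary per-case success measure and a prespecified tolerance; the function names, window size, baseline and data are hypothetical, and real monitoring would likely need risk adjustment and formal statistical control limits.

```python
from typing import List, Sequence


def window_rates(successes: Sequence[int], window: int) -> List[float]:
    """Success rate in consecutive, non-overlapping windows (complete windows only)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(successes[i:i + window]) / window
        for i in range(0, len(successes) - window + 1, window)
    ]


def drifted_windows(rates: Sequence[float], baseline: float,
                    tolerance: float) -> List[int]:
    """Indices of windows whose rate fell more than `tolerance` below baseline."""
    return [i for i, rate in enumerate(rates) if baseline - rate > tolerance]


# Hypothetical per-case results for an autonomous function (1 = task succeeded).
results = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1]
rates = window_rates(results, window=4)
flagged = drifted_windows(rates, baseline=1.0, tolerance=0.3)
```

A flagged window would be a prompt for the detailed evaluation described above, not a conclusion in itself; shorter windows could be used for rapidly changing systems.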
Clinician perspective in IDEAL stage 4
Key challenges
The long-term integration of surgical robots into health systems relies on their adoption by clinicians. The principal challenges from this perspective arise from training, credentialing and determining accountability for adverse outcomes (particularly in the context of robot autonomy and AI). Even technologies that demonstrate safety and efficacy experimentally pose risk to patients in untrained hands, and inadequate training prolongs learning curves, particularly during the long-term study stage, as devices are adopted by new surgical teams101,104,105,106,107. Research attempting to elucidate learning curves associated with surgical robots remains sparse but appears to be developing. While standardized robotic training programs exist for well-established surgical robots, such as the da Vinci, most robotic surgery training remains inconsistent and non-standardized, particularly for novel robots108,109.
There are efforts to address these challenges, such as the multi-institutional validation and assessment of training modalities in robotic surgery (the MARS project), but the optimal strategies for training robotic surgeons are unclear109. Ongoing certification and credentialing based on a regular reexamination of skills is not currently required for robotic surgery, which contrasts with practice in comparable high-risk industries involving complex technologies (for example, aviation)71. Determining accountability for, and analyzing the causation of, adverse events during surgery will be more complicated in a robotic future34,110,111. Communication difficulties due to altered spatial relationships in the operating room, telesurgery, input from company technical experts and, in future, increasing machine AI autonomy all have the potential to diffuse responsibility for decisions33,112,113. Effective monitoring will require routine recording, storing and analysis of granular data including technical, video, audio and IT data streams, which may be needed in the analysis of adverse events, aligning surgery with other high-risk, high-technology processes114.
Recommendations
Novel training methods should undergo evaluation using appropriate frameworks for determining validity (for example, Messick’s framework115). They should specify the aims of the training and use an appropriate educational paradigm. These studies should inform standardized training programs, which receive oversight from recognized accrediting bodies and are independent from industry partners. Where validated methods exist, surgeons using a robotic system should undergo regular revalidation with holistic assessments of performance through assessment of technical and nontechnical skills. Novel methodologies including automated performance metrics, AI-driven credentialing and operative video assessment should be adopted if validated. Ongoing credentialing and revalidation should include assessments of skills necessary to operate the device, but also the availability of skills in techniques needed to safely manage emergencies using alternative approaches, whether by the same surgeon or another.
A human factors expert should be included in the analysis of all serious adverse events involving a surgical robot. Adverse events/errors should be analyzed using data including technical, usability, interface and system integration failures. Governance for robotic surgery, particularly where AI systems with autonomy are involved, needs to evolve so that it can determine appropriate responsibility for monitoring, accountability for adverse events and responsibility for implementing improvements. This will require collaboration between legislators, healthcare organizations, professional bodies and industry.
Processes for monitoring the unplanned evolution of aspects of machine-learning-enabled AI should be iteratively reviewed, as human experience of this activity is in its infancy. Reevaluation of processes and algorithms should take place at regular intervals, and whenever evolving aspects (for example, level of autonomy or drift in target population) cause substantial changes in performance.
Patient perspective in IDEAL stage 4
Key challenges
As with all IDEAL stages, patients are the most important stakeholder when evaluating surgical robotics, as the recipients of both benefits and harms. Patient perceptions are influenced by exposure to the views and agendas of other stakeholders, for example, manufacturer marketing and clinician enthusiasm. However, patients have limited access to scientific evidence, which may be further restricted due to regulatory/approval processes. They may be falsely reassured that a robotic system is well established and safe, without specific evidence for the indication it is being offered for (procedure creep). They are unlikely to be cognizant of iterative changes to a surgical robot, rendering it different to the device upon which initial evidence was generated (device creep), making it important that this type of information is explicitly mentioned during the consent process.
Recommendations
Comprehensive robotic surgery registries and/or systems for extracting reliable information from existing real-world data sources should be made accessible and understandable to patients by providing lay language explanations of their outputs.
Current data should inform the consenting process; evidence referred to when seeking informed consent must relate to the indication and not simply to the device, since robotic systems may be used for many different procedures (procedure creep). Informed consent by patients should routinely seek general consent for future use of anonymized data for research and safety surveillance to maximize the value of health data.
Finally, where mechanisms to facilitate this exist, public and patient involvement should inform the design of IDEAL stage 4 studies and outcome measures to ensure they remain patient centered.
System perspective in IDEAL stage 4
Key challenges
The evaluation of the wider systems impact of robotic systems needs to continue in the long term, to track the cost-effectiveness and sustainability of their integration into healthcare systems with varying resources and capacity. Health economic analyses need to be iteratively updated with real-world data, and should remain free from restriction or private interests to maintain transparency116. Costs will be impacted by learning curves, technical errors, system failures, dynamic pricing and other factors. This means that real-world data, including health data, administrative claims data and prospective observational studies, are essential in modeling the true value of robotic systems in IDEAL stage 4. Potential access and equity issues accompanying these high-cost investments must be considered117, meaning resource allocation requires justification in terms of its place among competing choices. Ethically, providers must consider the benefits of robots against wider health system needs, and rationally allocate limited resources to high-priority issues.
Similarly, strong arguments are needed in favor of robotic surgery to counterbalance environmental impacts seen through life cycle assessments. This issue makes an argument for innovators to adopt sustainable practices in the development, implementation and maintenance of robots. Innovators should measure and minimize environmental harms of robots, ideally through open, transparent datasets, such as the HealthcareLCA repository118, fostering collaborative investigation of their impact in real-world settings. Outside experimental evaluation settings, complex interventions enter complex adaptive systems, with potentially unforeseen ‘emergent’ consequences. True performance will only be revealed in real-world settings and must be monitored to avoid unrecognized gradual decline in safety or effectiveness. This demands the development of monitoring infrastructure, processes and governance.
Recommendations
Cost-effectiveness analyses using decision-analytic modeling of real-world data should evaluate robotic systems by indication, and provide comparable analyses openly available to all stakeholders. These should use validated outcome metrics and comply with ISPOR guidance. In principle, regular reviews of robotic surgery cost-effectiveness should include an assessment of changes in organizational configurations and their influence on processes/outcomes, where the necessary resources are available.
National and international discussion forums involving clinicians, patient advocacy groups, industry, policymakers, ethicists and economists are needed to consider the potential effects of robotics on equity of healthcare access, and to explore models that might justify use in low-income settings. Advice from public health experts, policymakers, ethicists and climate scientists should be considered in discussions of how robotic surgery platform design, development and use could be made more sustainable.
In principle, complete life cycle assessments of surgical robotics should incorporate a broad range of parameters, be guided by environmental experts, produce data without restriction and contribute toward living open-access data repositories. Moreover, complete life cycle assessments of surgical robotics should be iteratively updated against existing care standards in real-world settings to monitor and minimize their environmental impacts through quality improvement. These recommendations will require further development of collaborations, datasets and resources.
Conclusion
The next generation of surgical robotics is poised to transform healthcare systems around the world. Whether this will result in substantial patient and societal benefit depends critically on whether innovation is guided by appropriate evaluation. This Colloquium has provided key recommendations for the evaluation of surgical robots across their developmental life cycle, mapped to the IDEAL evaluation framework.
Our analysis presents practical recommendations to guide robotics developers, clinicians, patients and wider systems as we enter the next era of surgical robotics. For all stages of evaluation, all stakeholders should be considered at the outset, including the surgical team (human factors analysis and training), patients (acceptability and rigorous ethical assessment) and the wider system (economic and sustainability evaluation). Further work is needed to establish standardized metrics for technical and clinical outcomes, refine health economic assessment models and assess the global representativeness of these recommendations.
No framework that deals with such a broad range of evaluation challenges can hope to avoid conflicts between recommendations, or situations where recommendations may appear disproportionate to the problem addressed. Such dilemmas with IDEAL recommendations are usually easily resolved by referral to the underlying principles mentioned in Methodology. The breadth of the subject also raises the question of which evaluation recommendations are relevant in the context of which particular studies. Clearly, incorporating all possible aspects in any single study would be infeasible and unnecessary, but sensible judgment, involving discussion where necessary with relevant subject experts, should allow this guidance to be of practical use to clinicians, robotic engineers, patients and other stakeholders in the development of robotic surgery.
Change history
31 January 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41591-024-02836-8
References
Lee, N. Robotic surgery: where are we now? Lancet 384, 1417 (2014).
Kwoh, Y. S., Hou, J., Jonckheere, E. A. & Hayati, S. A robot with improved absolute positioning accuracy for CT guided stereotactic brain surgery. IEEE Trans. Biomed. Eng. 35, 153–160 (1988).
Peters, B. S., Armijo, P. R., Krause, C., Choudhury, S. A. & Oleynikov, D. Review of emerging surgical robotic technology. Surg. Endosc. 32, 1636–1655 (2018).
Maynou, L., Pearson, G., McGuire, A. & Serra-Sastre, V. The diffusion of robotic surgery: examining technology use in the English NHS. Health Policy 126, 325–336 (2022).
The Lancet. Robotic surgery evaluation: 10 years too late. Lancet 388, 1026 (2016).
Christensen, C. M., Baumann, H., Ruggles, R. & Sadtler, T. M. Disruptive innovation for social change. Harv. Bus. Rev. 84, 94–101 (2006).
Tan, W. S., Ta, A. & Kelly, J. D. Robotic surgery: getting the evidence right. Med J. Aust. 217, 391–393 (2022).
Vasey, B. et al. Intraoperative applications of artificial intelligence in robotic surgery: a scoping review of current development stages and levels of autonomy. Ann. Surg. https://doi.org/10.1097/SLA.0000000000005700 (2023).
Ergina, P. L., Barkun, J. S., McCulloch, P., Cook, J. A. & Altman, D. G. IDEAL framework for surgical innovation 2: observational studies in the exploration and assessment stages. BMJ 346, f3011 (2013).
Hirst, A. et al. No surgical innovation without evaluation: evolution and further development of the IDEAL framework and recommendations. Ann. Surg. 269, 211–220 (2019).
NIHR. REINFORCE: a real-world, in-situ, evaluation of the introduction and scale-up of robot-assisted surgical services in the NHS. ARC https://arc-nenc.nihr.ac.uk/projects/reinforce-a-real-world-in-situ-evaluation-of-the-introduction-and-scale-up-of-robot-assisted-surgical-services-in-the-nhs/
Sheetz, K. H., Claflin, J. & Dimick, J. B. Trends in the adoption of robotic surgery for common surgical procedures. JAMA Netw. Open 3, e1918911 (2020).
Future of Surgery Commission Group. The Commission on the Future of Surgery. https://www.rcseng.ac.uk/standards-and-research/future-of-surgery/ (2018).
McCulloch, P., Cook, J. A., Altman, D. G., Heneghan, C. & Diener, M. K. IDEAL framework for surgical innovation 1: the idea and development stages. BMJ 346, f3012 (2013).
Cook, J. A. et al. IDEAL framework for surgical innovation 3: randomised controlled trials in the assessment stage and evaluations in the long term study stage. BMJ 346, f2820 (2013).
Marcus, H. J. et al. IDEAL-D framework for device innovation: a consensus statement on the preclinical stage. Ann. Surg. https://doi.org/10.1097/SLA.0000000000004907 (2021).
UK Statutory Instruments, UK Government. The Medical Devices Regulations 2002. 2002 no. 618 (King’s Printer of Acts of Parliament).
Official Journal of the European Union. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC. vol. 117 (2017).
International Electrotechnical Commission. Amendment 1 - medical electrical equipment - part 1-2: general requirements for basic safety and essential performance - collateral standard: electromagnetic disturbances - requirements and tests. IEC 60601-1-2:2014/AMD1 (2020).
International Organization for Standardization. Quality management and corresponding general aspects for products with a health purpose including medical devices. IEC 62304:2006.
International Organization for Standardization. Medical electrical equipment Part 2-77: Particular requirements for the basic safety and essential performance of robotically assisted surgical equipment. IEC 80601-2-77:2019.
International Organization for Standardization. Medical devices—quality management systems—requirements for regulatory purposes. ISO 13485:2016.
Foote, S. B. Managing the Medical Arms Race: Innovation and Public policy in the Medical Device Industry (Univ. California Press, 1992).
Yang, G. -Z. et al. Medical robotics—regulatory, ethical, and legal considerations for increasing levels of autonomy. Sci. Robot 2, eaam8638 (2017).
Andras, I. et al. Artificial intelligence and robotics: a combination that is changing the operating room. World J. Urol. 38, 2359–2366 (2020).
Bhandari, M., Zeffiro, T. & Reddiboina, M. Artificial intelligence and robotic surgery: current perspective and future directions. Curr. Opin. Urol. 30, 48–54 (2020).
Panesar, S. et al. Artificial intelligence and the future of surgical robotics. Ann. Surg. 270, 223–226 (2019).
Hung, A. J. et al. Development and validation of objective performance metrics for robot-assisted radical prostatectomy: a pilot study. J. Urol. 199, 296–304 (2018).
Ashley, L., Armitage, G., Neary, M. & Hollingsworth, G. A practical guide to failure mode and effects analysis in health care: making the most of the team and its meetings. Jt. Comm. J. Qual. Patient Saf. 36, 351–358 (2010).
Vasey, B. et al. Reporting guideline for the early stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. BMJ 377, e070904 (2022).
Schreyer, J. et al. RAS-NOTECHS: validity and reliability of a tool for measuring non-technical skills in robotic-assisted surgery settings. Surg. Endosc. 36, 1916–1926 (2022).
Raison, N. et al. Development and validation of a tool for non-technical skills evaluation in robotic surgery-the ICARS system. Surg. Endosc. 31, 5403–5410 (2017).
Catchpole, K. et al. Human factors integration in robotic surgery. Hum. Factors https://doi.org/10.1177/00187208211068946 (2022).
O’Sullivan, S. et al. Legal, regulatory, and ethical frameworks for development of standards in artificial intelligence (AI) and autonomous robotic surgery. Int. J. Med. Robot. 15, e1968 (2019).
Rogers, W. A., Hutchison, K. & McNair, A. Ethical issues across the IDEAL stages of surgical innovation. Ann. Surg. 269, 229–233 (2019).
Møller, L. et al. Identifying curriculum content for operating room nurses involved in robotic-assisted surgery: a Delphi study. Surg. Endosc. https://doi.org/10.1007/s00464-022-09751-4 (2022).
Lawrie, L. et al. Current issues and future considerations for the wider implementation of robotic-assisted surgery: a qualitative study. BMJ Open 12, e067427 (2022).
Lawrie, L. et al. Barriers and enablers to the effective implementation of robotic assisted surgery. PLoS ONE 17, e0273696 (2022).
Woudstra, K., Reuzel, R., Rovers, M. & Tummers, M. An overview of stakeholders, methods, topics, and challenges in participatory approaches used in the development of medical devices: a scoping review. Int. J. Health Policy Manag. 12, 6839 (2022).
van der Wilt, G. J., Gerhardus, A. & Oortwijn, W. Toward integration in the context of health technology assessment: the need for evaluative frameworks. Int. J. Technol. Assess. Health Care 33, 586–590 (2017).
Health and Safety Executive. Review of Human Reliability Assessment Methods; https://www.hse.gov.uk/research/rrpdf/rr679.pdf (2009).
Boys, J. A. et al. Public perceptions on robotic surgery, hospitals with robots, and surgeons that use them. Surg. Endosc. 30, 1310–1316 (2016).
Johnson, J. & Rogers, W. Innovative surgery: the ethical challenges. J. Med. Ethics 38, 9–12 (2012).
Angelos, P. Ethics and surgical innovation: challenges to the professionalism of surgeons. Int. J. Surg. 11, S2–S5 (2013).
Hutchison, K., Rogers, W., Eyers, A. & Lotz, M. Getting clearer about surgical innovation: a new definition and a new tool to support responsible practice. Ann. Surg. 262, 949–954 (2015).
Hofmann, B., Droste, S., Oortwijn, W., Cleemput, I. & Sacchini, D. Harmonization of ethics in health technology assessment: a revision of the Socratic approach. Int. J. Technol. Assess. Health Care 30, 3–9 (2014).
Partington, A. & Karnon, J. It’s not the model, it’s the way you use it: exploratory early health economics amid complexity comment on ‘problems and promises of health technologies: the role of early health economic modelling’. Int. J. Health Policy Manag. 10, 36–38 (2020).
Rizan, C. et al. The carbon footprint of surgical operations: a systematic review. Ann. Surg. 272, 986–995 (2020).
Sullivan, R. et al. Global cancer surgery: delivering safe, affordable, and timely cancer surgery. Lancet Oncol. 16, 1193–1224 (2015).
Grutters, J. P. C. et al. Problems and promises of health technologies: the role of early health economic modeling. Int. J. Health Policy Manag. 8, 575–582 (2019).
Bolton, W. S. et al. Disseminating technology in global surgery. Br. J. Surg. 106, e34–e43 (2019).
Payne, C. J. & Yang, G. -Z. Hand-held medical robots. Ann. Biomed. Eng. 42, 1594–1605 (2014).
Papadopoulou, A., Kumar, N. S., Vanhoestenberghe, A. & Francis, N. K. Environmental sustainability in robotic and laparoscopic surgery: systematic review. Br. J. Surg. 109, 921–932 (2022).
Micha, J. P., Rettenmaier, M. A., Bohart, R. D. & Goldstein, B. H. Robotic-assisted surgery for the treatment of breast and cervical cancers. JSLS 26, e2022.00014 (2022).
Ramirez, P. T. et al. Minimally invasive versus abdominal radical hysterectomy for cervical cancer. N. Engl. J. Med. 379, 1895–1904 (2018).
Nitecki, R. et al. Survival after minimally invasive vs open radical hysterectomy for early-stage cervical cancer: a systematic review and meta-analysis. JAMA Oncol. 6, 1019–1027 (2020).
US Food and Drug Administration. MAUDE - Manufacturer and User Facility Device Experience; https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
Hu, Y. & Strong, V. E. Robotic surgery and oncologic outcomes. JAMA Oncol. 6, 1537–1539 (2020).
Yaxley, J. W. et al. Robot-assisted laparoscopic prostatectomy versus open radical retropubic prostatectomy: early outcomes from a randomised controlled phase 3 study. Lancet 388, 1057–1066 (2016).
Parekh, D. J. et al. Robot-assisted radical cystectomy versus open radical cystectomy in patients with bladder cancer (RAZOR): an open-label, randomised, phase 3, non-inferiority trial. Lancet 391, 2525–2536 (2018).
Feng, Q. et al. Robotic versus laparoscopic surgery for middle and low rectal cancer (REAL): short-term outcomes of a multicentre randomised controlled trial. Lancet Gastroenterol. Hepatol. 7, 991–1004 (2022).
Garfjeld Roberts, P. et al. Research quality and transparency, outcome measurement and evidence for safety and effectiveness in robot-assisted surgery: systematic review. BJS Open 4, 1084–1099 (2020).
Wolfenden, L. et al. Designing and undertaking randomised implementation trials: guide for researchers. BMJ 372, m3721 (2021).
Khan, D. Z. et al. Automated operative workflow analysis of endoscopic pituitary surgery using machine learning: development and preclinical evaluation (IDEAL stage 0). J. Neurosurg. 1–8 (2021).
van Amsterdam, B., Clarkson, M. J. & Stoyanov, D. Gesture recognition in robotic surgery: a review. IEEE Trans. Biomed. Eng. 68, 2021–2035 (2021).
Kiyasseh, D. et al. A vision transformer for decoding surgeon activity from surgical videos. Nat. Biomed. Eng. 7, 780–796 (2023).
Chen, J. et al. Use of automated performance metrics to measure surgeon performance during robotic vesicourethral anastomosis and methodical development of a training tutorial. J. Urol. 200, 895–902 (2018).
Páez, A. et al. Beyond the RCT: when are randomized trials unnecessary for new therapeutic devices, and what should we do instead? Ann. Surg. 275, 324–331 (2022).
Shouhed, D., Gewertz, B., Wiegmann, D. & Catchpole, K. Integrating human factors research and surgery: a review. Arch. Surg. 147, 1141–1146 (2012).
Wee, I. J. Y., Kuo, L. J. & Ngu, J. C. A systematic review of the true benefit of robotic surgery: ergonomics. Int. J. Med. Robot. 16, e2113 (2020).
Collins, J. W. & Wisz, P. Training in robotic surgery, replicating the airline industry. How far have we come? World J. Urol. 38, 1645–1651 (2020).
Jayne, D. et al. Effect of robotic-assisted vs conventional laparoscopic surgery on risk of conversion to open laparotomy among patients undergoing resection for rectal cancer: the ROLARR randomized clinical trial. JAMA 318, 1569–1580 (2017).
Johnson, B., Sorokin, I., Singla, N., Roehrborn, C. & Gahan, J. C. Determining the learning curve for robot-assisted simple prostatectomy in surgeons familiar with robotic surgery. J. Endourol. 32, 865–870 (2018).
Pernar, L. I. M. et al. An appraisal of the learning curve in robotic general surgery. Surg. Endosc. 31, 4583–4596 (2017).
Vilallonga, R. et al. The initial learning curve for robot-assisted sleeve gastrectomy: a surgeon’s experience while introducing the robotic technology in a bariatric surgery department. Minim. Invasive Surg. 2012, 347131 (2012).
Wijburg, C. J. et al. Learning curve analysis for intracorporeal robot-assisted radical cystectomy: results from the EAU Robotic Urology Section Scientific Working Group. Eur. Urol. Open Sci. 39, 55–61 (2022).
Kirkpatrick, D. L. Techniques for evaluating training programs. Train. Dev. J. 33, 78–92 (1979).
Sridhar, A. N., Briggs, T. P., Kelly, J. D. & Nathan, S. Training in robotic surgery—an overview. Curr. Urol. Rep. 18, 58 (2017).
Skjold-Ødegaard, B. & Søreide, K. Competency-based surgical training and entrusted professional activities—perfect match or a Procrustean bed? Ann. Surg. 273, e173–e175 (2021).
Carpenter, B. T. & Sundaram, C. P. Training the next generation of surgeons in robotic surgery. Robot Surg. 4, 39–44 (2017).
Mark Knab, L. et al. Evolution of a novel robotic training curriculum in a complex general surgical oncology fellowship. Ann. Surg. Oncol. 25, 3445–3452 (2018).
Corrigan, N. et al. Exploring and adjusting for potential learning effects in ROLARR: a randomised controlled trial comparing robotic-assisted vs. standard laparoscopic surgery for rectal cancer resection. Trials 19, 339 (2018).
Torrent-Sellens, J., Jiménez-Zarco, A. I. & Saigí-Rubió, F. Do people trust in robot-assisted surgery? Evidence from Europe. Int. J. Environ. Res. Public Health 18, 12519 (2021).
Buabbas, A. J., Aldousari, S. & Shehab, A. A. An exploratory study of public’ awareness about robotics-assisted surgery in Kuwait. BMC Med. Inform. Decis. Mak. 20, 140 (2020).
Rooshenas, L. et al. The QuinteT Recruitment Intervention supported five randomized trials to recruit to target: a mixed-methods evaluation. J. Clin. Epidemiol. 106, 108–120 (2019).
Salas, R. N., Maibach, E., Pencheon, D., Watts, N. & Frumkin, H. A pathway to net zero emissions for healthcare. BMJ 371, m3785 (2020).
Rasheed, F. N. et al. Decarbonising healthcare in low and middle income countries: potential pathways to net zero emissions. BMJ 375, n1284 (2021).
Meara, J. G. et al. Global Surgery 2030: evidence and solutions for achieving health, welfare, and economic development. Lancet 386, 569–624 (2015).
Garas, G. et al. Surgical innovation in the era of global surgery: a network analysis. Ann. Surg. 271, 868–874 (2020).
Hutchison, K., Johnson, J. & Carter, D. Justice and surgical innovation: the case of robotic prostatectomy. Bioethics 30, 536–546 (2016).
Caro, J. J., Briggs, A. H., Siebert, U. & Kuntz, K. M. Modeling good research practices—overview: a report of the ISPOR-SMDM modeling good research practices task force-1. Value Health 15, 796–803 (2012).
Ramsey, S. D. et al. Cost-effectiveness analysis alongside clinical trials II—an ISPOR good research practices task force report. Value Health 18, 161–172 (2015).
Husereau, D. et al. Consolidated Health Economic Evaluation Reporting Standards 2022 (CHEERS 2022) statement: updated reporting guidance for health economic evaluations. BMC Med. 20, 23 (2022).
Dreyer, N. A. Strengthening evidence-based medicine with real-world evidence. Lancet Healthy Longev. 3, e641–e642 (2022).
Kramer, D. B., Xu, S. & Kesselheim, A. S. How does medical device regulation perform in the United States and the European Union? A systematic review. PLoS Med. 9, e1001276 (2012).
Cooper, M. A., Ibrahim, A., Lyu, H. & Makary, M. A. Underreporting of robotic surgery complications. J. Healthc. Qual. 37, 133–138 (2015).
Rajan, P. V., Kramer, D. B. & Kesselheim, A. S. Medical device postapproval safety monitoring: where does the United States stand? Circ. Cardiovasc. Qual. Outcomes 8, 124–131 (2015).
Cipriani, A. et al. Generating comparative evidence on new drugs and devices after approval. Lancet 395, 998–1010 (2020).
Huot, L., Decullier, E., Maes-Beny, K. & Chapuis, F. R. Medical device assessment: scientific evidence examined by the French national agency for health—a descriptive study. BMC Public Health 12, 585 (2012).
Sedrakyan, A. et al. Advancing the real-world evidence for medical devices through coordinated registry networks. BMJ Surg. Interv. Health Technol. 4, e000123 (2022).
Ficuciello, F., Tamburrini, G., Arezzo, A., Villani, L. & Siciliano, B. Autonomy in surgical robots and its meaningful human control. Paladyn J. Behav. Robot. 10, 30–43 (2019).
Bilbro, N. A. et al. The IDEAL reporting guidelines: a Delphi consensus statement stage specific recommendations for reporting the evaluation of surgical innovation. Ann. Surg. 273, 82–85 (2021).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
van Workum, F. et al. Learning curve and associated morbidity of minimally invasive esophagectomy: a retrospective multicenter study. Ann. Surg. 269, 88–94 (2019).
Oshikiri, T. et al. Short-term outcomes and one surgeon’s learning curve for thoracoscopic esophagectomy performed with the patient in the prone position. Surg. Today 47, 313–319 (2017).
Zeuschner, P. et al. Three different learning curves have an independent impact on perioperative outcomes after robotic partial nephrectomy: a comparative analysis. Ann. Surg. Oncol. 28, 1254–1261 (2021).
Le Morvan, P. & Stock, B. Medical learning curves and the Kantian ideal. J. Med. Ethics 31, 513–518 (2005).
Dixon, F. & Keeler, B. Robotic surgery: training, competence assessment and credentialing. Bulletin 102, 302–306 (2020).
Chen, R. et al. A comprehensive review of robotic surgery curriculum and training for residents, fellows, and postgraduate surgical education. Surg. Endosc. 34, 361–367 (2020).
Jamjoom, A. A. B. et al. Autonomous surgical robotic systems and the liability dilemma. Front. Surg. 9, 1015367 (2022).
van Wynsberghe, A. in Robotics, AI and Humanity: Science, Ethics and Policy (eds. J. von Braun et al.) 239–249 (Springer International Publishing, 2021).
Catchpole, K. et al. Safety, efficiency and learning curves in robotic surgery: a human factors analysis. Surg. Endosc. 30, 3749–3761 (2016).
Poulsen, J. L., Bruun, B., Oestergaard, D. & Spanager, L. Factors affecting workflow in robot-assisted surgery: a scoping review. Surg. Endosc. 36, 8713–8725 (2022).
van Dalen, A. S. H. M. et al. Analyzing and discussing human factors affecting surgical patient safety using innovative technology: creating a safer operating culture. J. Patient Saf. 18, 617–623 (2022).
Messick, S. Validity. in Educational Measurement 3rd edn (ed. Linn, R. L.) 13–104 (American Council on Education and Macmillan, 1989).
Bai, F. et al. More work is needed on cost-utility analyses of robotic-assisted surgery. J. Evid. Based Med. 15, 77–96 (2022).
Schneider, M. A. et al. Inequalities in access to minimally invasive general surgery: a comprehensive nationwide analysis across 20 years. Surg. Endosc. 35, 6227–6243 (2021).
Drew, J., Christie, S. D., Rainham, D. & Rizan, C. HealthcareLCA: an open-access living database of health-care environmental impact assessments. Lancet Planet. Health 6, e1000–e1012 (2022).
Roodbeen, S. X. et al. Evolution of transanal total mesorectal excision according to the IDEAL framework. BMJ Surg. Interv. Health Technol. 1, e000004 (2019).
Morrisey, Z. S. et al. Transition to robotic total knee arthroplasty with kinematic alignment is associated with a short learning curve and similar acute-period functional recoveries. Cureus 15, e38872 (2023).
Kelkar, D. S., Kurlekar, U., Stevens, L., Wagholikar, G. D. & Slack, M. An early prospective clinical study to evaluate the safety and performance of the Versius Surgical System in robot-assisted cholecystectomy. Ann. Surg. 277, 9–17 (2023).
Bell, S. W. et al. Improved accuracy of component positioning with robotic-assisted unicompartmental knee arthroplasty: data from a prospective, randomized controlled study. J. Bone Joint Surg. Am. 98, 627–635 (2016).
Acknowledgements
We thank all colloquium panel members and experts who took part in the semistructured meetings to devise this framework. This work was supported by the IDEAL Collaboration.
Author information
Contributions
H.J.M., P.M. and P.T.R. were responsible for the conceptualization of the paper and coordinated the work with the support of D.Z.K., H.L.H., J.G.H. and S.C.W. All authors contributed to the original draft, as well as reviewing and editing the manuscript. All authors reviewed and approved the submitted version of this manuscript.
Ethics declarations
Competing interests
H.J.M. is supported by grants from the Wellcome (203145Z/16/Z) EPSRC (NS/A000050/1) Centre for Interventional and Surgical Sciences, National Institute for Health Research (NIHR) Biomedical Research Centre at University College London and the National Brain Appeal. H.J.M. also declares stocks in Panda Surgical, and has previously received consulting fees from Intuitive Ventures. D.Z.K. is supported by an NIHR Academic Clinical Fellowship and a Cancer Research UK Predoctoral Fellowship. J.G.H. is supported by an NIHR Academic Clinical Fellowship. D.J.B. holds a Royal College of Surgeons chair supported by the Rosetrees trust and declares institutional research grant funding from the NIHR. K.C. is financially supported by the Agency for Healthcare Research and Quality (grant no. R01 HS26491-01). A.C. is supported by NIHR funding from NIHR Health Technology Assessment, PGfAR, PHR, RfPB and i4i programs. K.H. is supported by a Discovery Project grant from the Australian Research Council (grant no. DP200100883) and an Australian Research Council Discovery Early Career Researcher Award (grant no. DE200101301). T.M. has acted as an unpaid advisory board member of Pumpinheart, and previously as senior medical officer in medical devices at the Health Products Regulatory Authority, Ireland, and previous co-chair of the Clinical Investigation and Evaluation Working Group of the European Commission. D. Stoyanov is supported by grants from the Wellcome Trust, UKRI, European Commission and RAEng and employment from Digital Surgery (Medtronic) and declares stocks in Odin Vision (Olympus) and Panda Surgical. M.R. is supported by grants from Siemens Healthineers and the Healthcare Institute in the Netherlands. P.D. is supported by grants from the UKRI, The Urology Foundation and the Recordati Foundation and declares consulting fees from Proximie, MysteryVibe and Jiva.AI. D.N. is the Chief Technology Officer of Moon Surgical. D. Stocken is supported by grants from NIHR, CRUK, YCR and BHF. D. Stocken declares travel funding from RCSEng and AntiCancer Fund Belgium for conference attendance. G.S. is supported by a grant from the European Union Framework Programme (Horizon 2020). B.V. is supported by the Lord Florey scholarship (PhD scholarship) from the Berrow Foundation. P.M. receives grants from Medtronic and CMR (unrestricted educational grants to Oxford University for the IDEAL Collaboration) and the Oxford Biomedical Research Centre. J.C. is an associate medical director at CMR Surgical and receives a part-time salary from them.
Peer review
Peer review information
Nature Medicine thanks Mani Menon, Xiaolong Liu, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Karen O’Leary, in collaboration with the Nature Medicine team.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Marcus, H.J., Ramirez, P.T., Khan, D.Z. et al. The IDEAL framework for surgical robotics: development, comparative evaluation and long-term monitoring. Nat Med 30, 61–75 (2024). https://doi.org/10.1038/s41591-023-02732-7
DOI: https://doi.org/10.1038/s41591-023-02732-7
- Springer Nature America, Inc.