Surgery, especially minimally invasive surgery, requires specific psychomotor skills like visuospatial perception, fine motor skills, and bimanual coordination. Furthermore, surgeons need knowledge about different procedures, anatomy, physiology, and cognitive skills for surgical planning, error prevention and recognition to deal with perioperative complications [1].

Additionally, surgeons need non-technical skills, including clinical decision-making, communication, and interpersonal skills, for treating patients and working with other healthcare professionals.

With the introduction of more technically advanced procedures such as laparoscopy and robotic-assisted surgery (RAS) and stricter work-time restrictions, methods for assessing surgical competency are needed to ensure surgeons possess the necessary skills and improve patient safety [2,3,4]. Neuroimaging can provide valuable insights into neural changes associated with motor learning [5], and it has been proposed that neuroimaging can be a helpful tool to advance the understanding of the cognitive processes needed to acquire a surgical skill. Furthermore, it has been suggested that neuroimaging can help identify and explain possible differences between novice and expert surgeons in non-technical skills such as decision-making or situational awareness [6].

It has been suggested that in some types of operations, such as RAS, cognitive assessment of surgeons may aid in defining the levels of expertise when performing complex surgical tasks and could be an adjunct to the traditional ways of assessing surgical competency [2] such as procedural logs, written and oral examinations and objective assessments such as task-specific checklists, global rating scales, and simulator metrics. These traditional measures have been criticized for offering only insight into some aspects of surgical competency [3, 7,8,9].

Simulation-based training provides an alternative to the traditional learning method in the operating theatre as it allows trainees to train without direct consequences to the patient. Additionally, simulation training, such as virtual reality simulation training, can provide immediate feedback and monitor and assess the progression of the trainees’ surgical skills, although these assessments are limited to temporal, motion-based, and outcome-based measurements [10,11,12]. Such assessments are useful but not representative of all underlying aspects of a “skilled performance”. Features that distinguish an expert surgeon from a novice surgeon have not yet been fully identified, and reliable measures of surgical expertise and its development have yet to be established [13].

In this systematic review, we aimed to examine whether neuroimaging could be used to assess changes in cortical activation when surgeons use technical and non-technical surgical skills. We wanted to identify brain areas of interest and evaluate possible uses of neuroimaging, e.g., for assessing competency or the effect of training and potential gaps in the current knowledge.

Materials and methods

Eligibility criteria

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14] in completing this systematic review. The protocol was registered in PROSPERO (Record ID: 293025).

We included studies examining surgical technical and non-technical skills while using a brain imaging technique to investigate, directly or indirectly, brain function associated with surgical performance. Technical skills were surgical procedural skills, e.g., laparoscopic instrument handling or knot-tying and suturing, and non-technical skills were cognitive skills, e.g., situational awareness or decision-making.

The search concluded on July 1st, 2023, and no restrictions were placed on publication date before that day. Randomized controlled trials (RCTs), case–control studies, and observational cohort or cross-sectional studies were included. Studies were excluded if they were not available in English or in full text.

Search strategy

A systematic search of MEDLINE/PubMed, EMBASE and Web of Science was conducted by reformulating the research questions into a searchable query using the PICO principles and Boolean operators. A research librarian at the University of Copenhagen assisted in the creation of the search string. The search string was tailored to PubMed and modified to work in the other databases.

First, we searched using only MeSH terms in PubMed, followed by a MeSH-term and free-text search (Table 1). In EMBASE, a similar search strategy was conducted, while the search in Web of Science only consisted of a free-text search. Lastly, we manually searched for additional studies in the references of included studies and studies citing the included studies.

Table 1 Search string used in PubMed

Study selection

The relevant studies were added to Covidence (www.covidence.org, Melbourne, Australia) and duplicates were removed. First, two reviewers (AGA, ACR) independently assessed the title and abstract of articles for eligibility based on inclusion criteria. After the initial screening, a secondary screening was conducted. In the secondary screening, the full text was read, and data were extracted by the same two reviewers (AGA, ACR). If any disagreement occurred during this process between the two reviewers, the final decision was made by a third reviewer (FB).

Data extraction, synthesis, and analysis

Both reviewers extracted the following data: general information (authors, publication year, title), aim, study design, brain imaging modality (fMRI, fNIRS, EEG etc.), number of participants, participants’ characteristics/surgical experience (if mentioned), inclusion criteria, exclusion criteria, type of technical or non-technical surgical skill, environment (if mentioned), primary outcomes, brain activation patterns and brain activation localizations.

Studies were categorized by the type of brain imaging used.

Quality assessment was done by AGA and ACR independently. Methodological quality was assessed using the Medical Education Research Study Quality Instrument (MERSQI) and Newcastle-Ottawa Scale-Education (NOS-E) checklists [15], as a standardized quality assessment for studies using neuroimaging in medical education has not been developed. The checklists were not used to determine whether a study was to be included, as the studies were very heterogeneous and the scores varied depending on the study design.

We used Microsoft Excel for Mac 2022 (Microsoft Corp, Redmond, WA) to manage the extracted data. We did not conduct a meta-analysis due to the heterogeneity of the included studies but performed a qualitative data synthesis.

Results

Study selection and characteristics

A total of 5406 studies were imported for screening in Covidence. Of these, 5273 came from the database search, and 133 came from studies citing included studies and reference cross-searching. We removed 75 duplicate references. In the title and abstract screening, 5260 studies were excluded because they either did not use neuroimaging as a skill assessment or did not assess surgical technical or non-technical skills. We evaluated 71 studies for full-text eligibility, and 38 studies were included in the final synthesis. See Fig. 1 for the PRISMA diagram and exclusion reasons.
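The screening counts above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers reported in the text; the variable names are illustrative, and the number of full-text exclusions is derived here rather than quoted from the review.

```python
# Counts as reported in the study selection paragraph.
from_databases = 5273
from_citation_search = 133
duplicates = 75
excluded_title_abstract = 5260
included = 38

# Records imported into Covidence.
imported = from_databases + from_citation_search

# Records left after duplicate removal, then after title/abstract screening.
screened = imported - duplicates
full_text_assessed = screened - excluded_title_abstract

# Derived value (not stated in the text): studies excluded at full-text stage.
excluded_full_text = full_text_assessed - included

print("imported:", imported)
print("screened after duplicate removal:", screened)
print("assessed in full text:", full_text_assessed)
print("excluded at full text (derived):", excluded_full_text)
```

The derived totals agree with the 5406 imported records and the 71 full-text assessments stated in the text.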

Fig. 1 PRISMA 2020 flow diagram for new systematic reviews

Neuroimaging modalities utilized in the included articles were EEG (n = 14), fNIRS (n = 16), fMRI (n = 6), and PET-CT (n = 1), see Tables 2, 3, and 4. Of the 16 articles using fNIRS, two by Leff et al. used the same population but with different outcome measures, and one of them used a second study group (group b) for comparison. The study population was only counted once. One article by Walia et al. incorporated both EEG and fNIRS.

Table 2 Main findings of included EEG articles
Table 3 Main findings of included fNIRS articles
Table 4 Main findings of included fMRI and PET-CT articles

A total of 782 participants were described in the studies, and Table 5 shows the participant characteristics and the skills examined.

Table 5 Study, participant, and skill characteristics

NOS-E scores ranged from 1 to 6 with a mean of 3.9 (SD = 1.7), see Table 6, while MERSQI scores ranged from 10 to 14 with a mean of 12 (SD = 1.1), see Table 7. Inter-rater reliability showed strong agreement, with a Cohen's κ of 0.80 [16].
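For readers unfamiliar with the agreement statistic, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below is illustrative only. The ratings are hypothetical and are not the reviewers' data, and the function name is an assumption made for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten items (not data from this review).
reviewer_1 = ["high", "high", "low", "low", "high", "low", "low", "low", "high", "low"]
reviewer_2 = ["high", "high", "low", "low", "low", "low", "low", "low", "high", "low"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this made-up example the raters disagree on one of ten items, which gives an observed agreement of 0.90 and a chance agreement of 0.54.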

Table 6 NOS-E
Table 7 MERSQI

Technical skill assessment

Neuroimaging has the possibility of being an objective assessment tool, with a range of different studies finding the same areas of interest and further exploring models such as neural networks explicitly for assessing surgical skills. Twenty-eight studies investigated neuroimaging for technical skill assessment. In all modalities, EEG, fNIRS, fMRI and PET-CT, the Prefrontal Cortex (PFC) [4, 17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33], Supplementary Motor Area (SMA) [4, 19, 20, 23,24,25, 30, 33,34,35,36,37], and the Primary Motor Area (M1) [4, 19, 20, 23, 25, 27, 34,35,36, 38, 39] were the predominantly investigated areas of interest. Some studies investigated a broader part of the brain, such as the frontal cortex, which includes the PFC, SMA, and M1 [36, 40], while EEG studies tended to focus on frequency over specific locations [2, 32, 34, 41,42,43] and fMRI tended to distinguish the areas in greater detail [24,25,26,27,28, 39].

Thirteen studies across neuroimaging modalities compared novices with experts [2, 19,20,21, 31, 33, 35, 37, 39, 40, 44,45,46]. Eight studies found significantly higher or increased activation in the prefrontal cortex in novices [19,20,21, 31, 33, 40, 44, 45], one study found a non-significant increase in activation in novices [2], and another study found higher activation across the frontal lobe [46].

Non-technical skill assessment

Sixteen studies investigated the use of neuroimaging for the assessment of non-technical skills. EEG was the most used neuroimaging modality in non-technical skill assessment, identifying engagement [2, 47], stress [31, 46], distractibility [41, 42], and identifying surgical complexity [2, 31, 32, 36, 43, 48]. One study showed a significant difference between audio and audio-visual conditions in utilizing auditory display when training image-guided neurosurgery, where EEG could be viewed as a marker for greater cognitive load [49].

Other neuroimaging modalities also investigated non-technical performance. Decision-making in surgeons and trainees was examined using fNIRS, which showed ventromedial activation trends solely among novices and not in experts [50], and the same modality was used to establish correlations between the subjective experience of cognitive load and perceived workload and prefrontal activation [44, 51]. This correlation was further used to investigate the impact of time pressure on prefrontal activation and technical performance [22]. The effect of the operative platform and the complexity of surgery was investigated using fNIRS, which found a greater prefrontal change in oxy-Hb in robotic-assisted compared to conventional laparoscopic surgery [30], and using EEG, which found a significant increase in inter-hemispheric coherence in the beta band (and, less robustly, in the upper alpha band) for surgeons using the robotic device compared with both the resting condition and the laparoscopic modality [36]. Likewise, both primary surgeons and assistants had significantly higher prefrontal activity in the beta band when performing more complex surgical procedures, in this case multiport laparoscopic surgery versus laparo-endoscopic single-site surgery. However, within the same surgical procedure, beta activity did not differentiate between surgical roles.

Discussion

In this systematic review, we found that the use of neuroimaging for surgical skill assessment is quickly evolving. As seen in Fig. 2, there has been a surge in publications after 2018. Although the studies used different methods, they identified the same brain areas of interest and similar patterns differentiating novices from experts. Neuroimaging is a promising field, as it may in the future be used to objectively assess surgical competence [2, 4, 19, 20, 23, 29, 32, 34, 35, 39, 41, 42, 45], investigate differences in brain activation patterns between novices and experts [2, 19,20,21, 31, 33, 35, 39, 40, 44,45,46, 50], identify the early learning phases and help individualize training programs [17,18,19, 21, 23, 24, 26,27,28, 34, 38, 39, 43, 52, 53], and explore non-technical aspects of surgical education such as cognitive load [22, 30, 36, 44, 47,48,49, 51, 53], intraoperative stress [31, 46], distractibility [41, 42], and decision-making [50].

Fig. 2 Number of articles published by year

Brain areas of interest in assessment

Studies investigating brain areas involved in surgical skills used different methods, and consequently the distinction between functional areas was poorly defined across studies. However, certain areas of interest were predominantly investigated and found to be important for surgeons and surgeons-to-be (Fig. 3). Across studies, these were the PFC, which is responsible for consolidation of motor learning, motor planning, and decision-making; the SMA, which is responsible for controlling movement; M1, which is responsible for motor function; and the primary somatosensory area (SSA) and visual areas (VA), which are responsible for somatosensory input and coordination and for visual input and sensory integration, respectively.

Fig. 3 Brain regions investigated and findings with references. M1 Primary Motor Cortex, S1 Primary Sensory Cortex, SMA Supplementary Motor Area, PFC Prefrontal Cortex, PMC Premotor Cortex

Studies using fNIRS examined brain activity by measuring cortical hemodynamic activity in the form of oxy- and deoxyhemoglobin levels. Some studies that investigated hemodynamic patterns in novices compared with experts before and after training showed greater activation in the PFC for novices at the beginning of the surgical training process compared to later [17,18,19] or compared to experts [19,20,21, 44, 45], and lower activation in the primary motor areas and SMA [19, 20]. Others found increased oxy-Hb levels in the frontal cortex after training in novices with no endoscopic surgical experience, with one study showing no change in surgical experts after training [40], while two other studies showed that senior residents demonstrated a significantly greater change in oxy-Hb in the right dorsomedial PFC during self-paced conditions [22] and in the bilateral PFC in a time-paced condition [51] compared to junior or intermediate residents.

Performance time and left and middle PFC and left lateral M1 brain region activity were significantly correlated, while performance error showed a significant negative correlation to middle PFC and SMA brain region activation measured with fNIRS.

EEG monitoring showed consistent recruitment of frontal, motor, and prefrontal areas during laparoscopic skills training. There was a significant correlation between performance level and motor-cognitive integration while activity was in the beta frequency band [34]. In concordance with the fNIRS studies, one study showed a significant decrease of theta band activity in the frontal area across training sessions and a significant increase in the alpha band over parietal regions [38].
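The band-wise findings above rest on estimating how much EEG signal power falls within a given frequency range. The following is a minimal sketch of that idea, assuming commonly used band edges (theta 4 to 8 Hz, alpha 8 to 13 Hz, beta 13 to 30 Hz) that individual studies may define differently. The signal is synthetic, the function name is illustrative, and the reviewed studies typically use dedicated toolboxes with windowing and artifact rejection rather than a raw discrete Fourier transform.

```python
import cmath
import math

# Assumed band edges in Hz; individual studies may use other definitions.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_power(signal, fs, low, high):
    """Sum of squared, length-normalised DFT magnitudes for low <= f < high."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        if low <= freq < high:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power += abs(coeff) ** 2 / n**2
    return power


# Synthetic 2-second "recording" sampled at 128 Hz: a weak 6 Hz (theta)
# component plus a stronger 20 Hz (beta) component. Not real EEG data.
fs = 128
samples = [
    1.0 * math.sin(2 * math.pi * 6 * t / fs) + 2.0 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(2 * fs)
]
powers = {name: band_power(samples, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
for name, value in powers.items():
    print(name, round(value, 3))
```

Comparing such band powers across sessions or electrode sites is the basic operation behind statements such as a decrease in frontal theta activity across training.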

Functional MRI was able to describe in detail the brain areas of interest and the activation involved in the training of technical skills [24,25,26,27,28, 39], with the left PFC, the SMA, the primary somatosensory area (SSA), and the occipital visual areas (VA) being of particular interest. Novice surgeons in general had higher activation in the left precentral gyrus and insula and the right precuneus and inferior occipital gyrus [39], the Brodmann Area of the SMA [25], and the left ventral premotor cortex [26, 27]. The left PFC showed the biggest contribution to the motor skill level in a convolutional network model for predicting surgical performance level, and removing any prefrontal regions led to reduced accuracy [29]. During robotic suturing tasks, greater prefrontal activation was identified compared with laparoscopic suturing [30].

Duty et al. [35] performed the only study comparing novices and experts using PET-CT. Novice subjects had significantly increased blood flow in the left precentral gyrus and insula and the right precuneus and inferior occipital gyrus, while the experts showed deactivation in the same areas.

Activation varied more in students than in experts, who showed smaller fluctuations in cortical activity [21, 52].

When researching non-technical skills, the PFC again emerged as the most frequently identified brain area, with higher activation correlating with higher cognitive load [30, 31, 44, 46,47,48,49, 53]. However, in two fNIRS studies, diminished PFC activation correlated with higher perceived workload [51] and temporal stress [22]. EEG was the most frequently used modality for assessing non-technical competencies, and some studies compared activity in the different frequency bands over the prefrontal cortex as a proxy for cognitive load, intraoperative stress, and distraction. The reported findings correlated with both self-reported workload and performance [36, 42, 46, 48, 53]. Although NASA-TLX did not always correlate with performance in one study [53], EEG showed promise as an objective and verifiable measurement of workload.

Using neuroimaging for the assessment of technical skills

Six studies investigated the feasibility of using neuroimaging for surgical skill assessment, with two investigating the validity evidence of novel classification systems.

Gao et al. [29] investigated a Convolutional Neural Network based on fNIRS which, using 550–600 observations, could predict pass/fail scores in a pattern cutting task with an AUC of 0.91. Nemani et al. [4] found that fNIRS could successfully distinguish untrained subjects from physical or virtual simulator trained subjects with misclassification errors of 2.7% and 9.1%, compared to traditional metrics that had misclassification errors ranging from 20 to 41%. They also found that functional connectivity changes based on WCO and WPCO metrics corresponded to surgical motor skill proficiency.
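The two performance measures quoted above, AUC and misclassification error, can be illustrated with a small example. The sketch below is not the method of either cited study. The labels and classifier scores are hypothetical, the 0.5 decision threshold is an assumption, and the AUC is computed with the rank-based (Mann-Whitney) formulation, i.e. the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


def misclassification_error(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded prediction differs from the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(p != label for p, label in zip(predictions, labels))
    return wrong / len(labels)


# Hypothetical data: 1 = "pass" / trained, 0 = "fail" / untrained.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

example_auc = auc(labels, scores)
example_error = misclassification_error(labels, scores)
print(example_auc, example_error)
```

AUC summarises ranking quality across all possible thresholds, whereas the misclassification error depends on the single threshold chosen, which is why the two measures are not directly interchangeable when comparing studies.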

One study by Guru et al. [2] using both EEG and fNIRS showed that the time spent in each activity state was significantly affected by the participant’s skill level while also showing a significant difference in HbO signal over the left PFC and SMA in novices and experts. This method was able to both investigate activated brain areas and focus attenuation, concluding that novices had a widespread stimulus-driven hemodynamic activation without the same focusing effect seen in experts [37].

One fMRI study used a novel assessment method in identifying experience level by using blood oxygen level-dependent signal changes [29], and EEG was also used to investigate experience level [2, 31, 46].

As the PFC and SMA emerged as the most researched areas of interest, these were also investigated for assessing skills. The methods mostly consisted of measuring changes in activation, applying neural networks, or measuring activity states. Eight studies [19,20,21, 31, 33, 40, 44, 45] found significantly lower activation or a decrease in activation in experts compared to novices in the PFC, and two studies found a significant decrease in PFC activation after training compared with untrained controls [4, 18]. The PFC is important in the cognitive recognition and analysis of motor planning, “thinking about doing”, while the PMC and SMA are necessary for coordinating bimanual tasks [54]. Experts do not require as much cognitive planning for movements because they have performed them hundreds of times.

Limitations

Most studies were observational or non-randomized, with only six randomized controlled studies, five randomized cross-over studies, and two validity investigation studies. The studies were very heterogeneous in design, control, quality, and surgical skill assessment tools, making it difficult to compare them.

Many studies had small sample sizes with limited statistical power. Some were conducted with specific surgical populations, such as urology residents or orthopedic surgeons, under realistic operating conditions, whereas others were conducted with medical students or unspecified untrained individuals in laboratories or scanners.

Most studies did not assess the long-term impact of training on the brain nor the long-term impact of different assessment methods on surgical outcomes, instead focusing on comparing novices and experts or pre- and posttest investigations.

There is a risk of reporting bias as studies with positive results are more likely to be published, and unpublished negative results could skew the consensus towards neuroimaging being more useful.

Future perspectives

The mapping of brain areas during training has great potential in directing surgical training or even assisting in this process, as shown in a study by Galvin et al. using transcranial direct-current stimulation during the acquisition of laparoscopic surgical skills [55].

Increasing our understanding of functional brain networks in learning can aid in developing individualized training programs, as suggested by Shafiei et al., who used specific neural activity measured with EEG and eye gaze tracking to predict performance and learning rate for various laparoscopic surgical tasks [56].

Armstrong et al. [57] conducted a pilot study that, building on existing knowledge of the correlation between neural activity and future motor errors, showed that EEG can predict errors intraoperatively. The pilot study found that specific neural signatures predicted technical errors in laparoscopic surgery.

In this rapidly evolving field, this knowledge can further other research areas, such as the use of artificial intelligence [58] or have implications for intraoperative feedback [57], for example, by incorporating neuroimaging in a machine learning model to provide objective skill classification models and feedback in surgical training [59].

While there is potential in using neuroimaging, only a few studies touch upon the cost of introducing neuroimaging in surgical skill training. With the introduction of simulation into surgical skill training, studies sought to highlight the cost-effectiveness and the need to regularly assess the feasibility but also found that the implementation cost can be a substantial investment [60, 61].

Future research in this area should address the abovementioned limitations and focus on developing more standardized and reliable neuroimaging-based assessments of surgical skills both in the learning process and the implications for intraoperative real-time feedback. Research is also needed to investigate the long-term impacts of training on the brain.

Conclusion

Using neuroimaging to assess surgical skills is a quickly evolving research area with the potential for evaluating technical and non-technical surgical competencies. Different neuroimaging modalities have shown similar brain activation patterns, with the prefrontal cortex, supplementary motor area, and primary motor area being the most frequently investigated areas. There is not yet agreement on which areas and activation patterns are relevant when assessing surgical skills, but the prefrontal cortex shows the greatest potential in novice-expert comparisons and in investigating cognitive load and performance, and it shows significant activation differences when comparing novices with experts and untrained with trained subjects. Therefore, we recommend that research focus on the PFC when using neuroimaging to assess surgical skills.

Overall, this systematic review highlights the potential of neuroimaging in surgical skill assessment, but further research is needed to establish whether it can be a useful assessment tool in surgical education.