1 Introduction

Segmenting surgical instruments and tissue poses a significant challenge in robotic surgery, as it plays a vital role in instrument tracking and position estimation within surgical scenes. Nonetheless, current deep learning models often have limited generalization capacity because they are tailored to specific surgical sites. Consequently, developing generalist models that can effectively adapt to various surgical scenes and segmentation objectives is crucial to advancing the field of robotic surgery [18]. Recently, segmentation foundation models have made great progress in natural image segmentation. The segment anything model (SAM) [14], trained on more than one billion masks, exhibits remarkable proficiency in generating precise object masks from various prompts such as bounding boxes and points, and stands as the pioneering and most renowned foundation model for segmentation. However, several works have revealed that SAM can fail on common medical image segmentation tasks [4, 6, 8, 16]. This is not surprising, since SAM's training dataset primarily comprises natural images. It therefore raises the question of how to harness SAM's strong feature extraction capability for medical imaging tasks. Med SAM Adapter [22] utilizes medical-specific domain knowledge to improve the segmentation model through a simple yet effective adaptation technique. SAMed [23] applies a low-rank-based finetuning strategy to the SAM image encoder, together with the prompt encoder and mask decoder, on a medical image segmentation dataset.

However, evaluating the performance of SAM in the context of surgical scenes remains an insufficiently explored area with potential for further investigation. This study uses two publicly available robotic surgery datasets to assess SAM's generalizability under different settings, including bounding-box and point prompts. Moreover, we investigate fine-tuning SAM through Low-rank Adaptation (LoRA) to examine its capability to predict masks for different classes without prompts. Additionally, we analyze SAM's robustness by assessing its performance on synthetic surgery datasets containing various levels of corruption and perturbation.

2 Experimental Settings

Datasets. We have employed two classical datasets in endoscopic surgical instrument segmentation, i.e., EndoVis17 [2] and EndoVis18 [1]. For the EndoVis17 dataset, unlike previous works [5, 13, 20] which conduct 4-fold cross-validation for training and testing on the 8 \(\times \) 225-frame released training data, we report SAM’s performance directly on all eight sequences (1–8). For the EndoVis18 dataset, we follow the dataset split in ISINet [5], where sequences 2, 5, 9, and 15 are utilized for evaluation.

Prompts. The original EndoVis datasets [1, 2] do not provide bounding-box or point annotations. We have labeled the datasets with bounding boxes for each instrument, associated with the corresponding class information. For the single-point prompt, we obtain the center of each instrument mask by computing the moments of its contour, as sketched below. Since SAM [14] only predicts binary segmentation masks, for instrument-wise segmentation the output instrument labels are inherited from the input prompts.
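
As a concrete illustration, the following sketch derives such a point prompt with OpenCV; the helper name and the assumption that the largest external contour represents the instrument are ours.

```python
import cv2
import numpy as np

def mask_center_point(mask: np.ndarray):
    """Derive a single foreground point prompt from a binary instrument mask
    via the moments of its (largest) contour."""
    mask = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    cnt = max(contours, key=cv2.contourArea)      # assume the largest contour is the instrument
    m = cv2.moments(cnt)
    if m["m00"] == 0:
        return None
    cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
    # SAM expects point_coords of shape (N, 2) and point_labels with 1 = foreground.
    return np.array([[cx, cy]]), np.array([1])
```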

Metrics. The IoU and Dice metrics from the EndoVis17 [2] challenge are used. Specifically, only the classes present in a frame are considered in the calculation for instrument segmentation.
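
For reference, a minimal sketch of the per-frame IoU under this protocol is given below, assuming a class counts as present when it appears in the ground truth of that frame; the official challenge toolkit may differ in details.

```python
import numpy as np

def challenge_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Per-frame IoU where only classes present in the ground truth contribute."""
    ious = []
    for c in range(1, num_classes + 1):          # class 0 is background
        gt_c = gt == c
        if not gt_c.any():                       # skip classes absent from this frame
            continue
        pred_c = pred == c
        union = np.logical_or(pred_c, gt_c).sum()
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious)) if ious else float("nan")
```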

Comparison Methods. We include several classical and recent methods, namely the vanilla UNet [17], TernausNet [20], MF-TAPNet [13], Islam et al. [10], Wang et al. [21], ST-MTL [11], S-MTL [19], AP-MTL [12], ISINet [5], TraSeTR [24], and S3Net [3], for surgical binary and instrument-wise segmentation. The ViT-H-based SAM [14] is employed in all our investigations except for the finetuning experiments. Note that an absolutely fair comparison is not possible, because the existing methods do not require prompts during inference.

Table 1. Quantitative comparison of binary and instrument segmentation on EndoVis17 and EndoVis18 datasets. The best and runner-up results are shown in bold and underlined.

3 Surgical Instrument Segmentation with Prompts

Implementation. With bounding boxes and single points as prompts, we feed the images to SAM [14] to obtain the predicted binary masks for the target objects. Because SAM [14] cannot provide consistent categorical information, we instead use the class information from the bounding boxes directly. In this way, we derive instrument-wise segmentation while bypassing possible misclassification errors, an essential factor affecting instrument-wise segmentation accuracy.
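
A minimal sketch of this prompted inference with the official segment_anything package is shown below; the checkpoint path, frame path, box coordinates, and class id are placeholders.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")   # placeholder path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)       # placeholder frame
predictor.set_image(image)

box = np.array([120, 80, 480, 360])              # [x0, y0, x1, y1] for one instrument
masks, scores, _ = predictor.predict(box=box, multimask_output=False)

# SAM returns only a binary mask; the instrument class is inherited from the
# class information attached to the bounding-box prompt.
instrument_map = np.zeros(image.shape[:2], dtype=np.uint8)
instrument_map[masks[0]] = 3                     # e.g., class id 3 from the box annotation
```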

Results and Analysis. As shown in Table 1, with bounding boxes as prompts, SAM [14] outperforms previous unprompted supervised methods in binary and instrument-wise segmentation on both datasets. However, with single points as prompts, SAM's [14] performance degrades substantially, indicating its limited ability to segment surgical instruments from weak prompts and revealing that its performance relies heavily on prompt quality. For complicated surgical scenes, SAM [14] still struggles to produce accurate segmentation results, as shown in columns (a) to (l) of Fig. 1. Typical challenges, including shadows (a), motion blur (d), occlusion (b, g, h), light reflection (c), insufficient light (j, l), over-brightness (e), ambiguous suturing thread (f), instrument wrist (i), and irregular instrument pose (k), all lead to unsatisfactory segmentation results.

Fig. 1. Qualitative results of SAM on various challenging frames. Red rectangles highlight the typical challenging regions which cause unsatisfactory predictions. (Color figure online)

Table 2. Quantitative results on various corrupted EndoVis18 validation data.

4 Robustness Under Data Corruption

Implementation. Referring to the robustness evaluation benchmark [7], we evaluate SAM [14] under 18 types of data corruption at 5 severity levels with box prompts, following the official implementations. Note that Elastic Transformation is omitted to avoid inconsistency between the input image and the associated masks. The adopted corruptions fall into four distinct categories: Noise, Blur, Weather, and Digital.
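
One way to generate the corrupted frames is via the imagecorruptions package that implements this benchmark; the sketch below, with the corruption list written out by us, illustrates the procedure.

```python
import numpy as np
from imagecorruptions import corrupt             # implements the benchmark of [7]

# 18 corruption types (Elastic Transform omitted); names follow the package.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise", "speckle_noise",
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur", "gaussian_blur",
    "snow", "frost", "fog", "brightness", "contrast",
    "pixelate", "jpeg_compression", "spatter", "saturate",
]

def corrupted_versions(image: np.ndarray):
    """Yield (name, severity, corrupted frame) for all 18 x 5 corruption settings."""
    for name in CORRUPTIONS:
        for severity in range(1, 6):             # severity levels 1-5
            yield name, severity, corrupt(image, corruption_name=name, severity=severity)
```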

Fig. 2. Qualitative results of SAM under 18 data corruptions of level-5 severity.

Results and Analysis. As depicted in Table 2, SAM's [14] performance degrades progressively as the severity of data corruption increases. The extent of the degradation depends on the nature of the corruption, yet in most scenarios SAM's performance diminishes significantly. Notably, JPEG Compression and Gaussian Noise have the greatest impact on segmentation performance, whereas Brightness has a negligible effect. Figure 2 presents one exemplar frame in its original state alongside its corrupted versions at severity level 5. We can observe that SAM [14] suffers significant performance degradation in most cases.

5 Automatic Surgical Scene Segmentation

Implementation. Without prompts, SAM [14] can also perform automatic mask generation (AMG) for the entire image. For a naive investigation of automatic surgical scene segmentation, we use the default parameters from the official implementation without further tuning. The color of each segmented mask is randomly assigned because SAM [14] only generates a binary mask for each object.
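
A minimal sketch of this default AMG pipeline with the official segment_anything package follows; the checkpoint and image paths are placeholders.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")   # placeholder path
mask_generator = SamAutomaticMaskGenerator(sam)  # default AMG parameters, no tuning

image = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2RGB)       # placeholder frame
masks = mask_generator.generate(image)

# Each entry holds a binary 'segmentation' mask plus metadata ('area', 'bbox',
# 'predicted_iou', ...); no semantic label is attached, hence the random colors.
masks = sorted(masks, key=lambda m: m["area"], reverse=True)
```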

Results and Analysis. As shown in Fig. 3, in surgical scene segmentation on the EndoVis18 [1] data, SAM [14] produces promising results on simple scenes such as columns (a) and (f). However, it encounters difficulties in more complicated scenes, struggling to recognize the articulated parts of an instrument as a single entity and to group discrete tissue structures into interconnected units. As a foundation model, SAM [14] still lacks comprehensive awareness of objects' semantics, especially in downstream domains like surgical scenes.

Fig. 3. Unprompted automatic mask generation for surgical scene segmentation.

6 Parameter-Efficient Finetuning with Low-Rank Adaptation

With the rapid emergence of foundational and large AI models, utilizing pretrained models effectively and efficiently for downstream tasks has attracted increasing research interest. Although SAM [14] shows decent segmentation performance with prompts and can cluster objects in surgical scenes, we seek to finetune and adapt it to the traditional unprompted multi-class segmentation pipeline: taking a single image as input and predicting its segmentation mask with categorical labels.

Fig. 4. Overall architecture of our SurgicalSAM.

Implementation. To efficiently finetune SAM [14] and enable it to support multi-class segmentation without relying on prompts, we utilize the strategy of Low-rank Adaptation (LoRA) [9] and adapt the original mask decoder to output categorical labels. Taking inspiration from SAMed [23], we implement a modified architecture as shown in Fig. 4, whereby the pretrained SAM image encoder keeps its weights \(W_{enc}\) frozen during finetuning while additional light-weight LoRA layers are incorporated and updated. In this way, we not only leverage the exceptional feature extraction ability of the original SAM encoder, but also gradually capture surgical data representations and store the domain-specific knowledge in the LoRA layers in a parameter-efficient manner. We denote this modified architecture as “SurgicalSAM”. With an input image x, we derive the image embedding \(h_{image}\) following

$$\begin{aligned} h_{image} = W_{enc}x + \varDelta W x, \end{aligned}$$
(1)

where \(\varDelta W\) is the weight update matrix of the LoRA layers. We decompose \(\varDelta W\) into two smaller matrices, \(\varDelta W = W_A W_B\), where \(W_A\) and \(W_B\) are \(A \times r\) and \(r \times B\) dimensional matrices, respectively, and r is a hyper-parameter specifying the rank of the low-rank adaptation. To balance model complexity, adaptability, and the risk of underfitting or overfitting, we empirically set the rank r of \(W_A\) and \(W_B\) in the LoRA layers to 4.
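
A minimal PyTorch sketch of such a LoRA layer is given below; in practice the low-rank factors are injected into the attention projections of the frozen ViT encoder as in SAMed [23], and the class name and initialization details here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W_enc x + (x W_A) W_B as in Eq. (1), with the pretrained weights frozen."""
    def __init__(self, base: nn.Linear, r: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():         # freeze W_enc
            p.requires_grad = False
        self.W_A = nn.Parameter(torch.empty(base.in_features, r).normal_(std=0.02))
        self.W_B = nn.Parameter(torch.zeros(r, base.out_features))  # Delta W = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.W_A) @ self.W_B
```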

Table 3. Quantitative evaluation of SurgicalSAM under data corruption.
Fig. 5. Qualitative comparison of our SurgicalSAM with the original SAM.

During unprompted automatic mask generation (AMG), the original SAM uses fixed default embeddings \(h_{default}\) for the prompt encoder with weights \(W_{prompt}\). We adopt this strategy and update the lightweight prompt encoder during finetuning, as shown in Fig. 4. In addition, we modify the segmentation head of the mask decoder \(W_{dec}\) to produce a prediction for each semantic class. In contrast to the ambiguity-aware binary predictions of SAM's original mask decoder, the modified decoder predicts each semantic class of \(\hat{y}\) in a deterministic manner; in other words, it is capable of semantic segmentation beyond binary segmentation (Fig. 5).

We adopt the training split of the EndoVis18 dataset for finetuning and test on the validation split, consistent with the other works reported in Table 1. Following SAMed [23], we adopt a combination of the Cross Entropy loss \(L_{CE}\) and Dice loss \(L_{Dice}\), which can be expressed as

$$\begin{aligned} L = \lambda L_{Dice} + (1-\lambda ) L_{CE}, \end{aligned}$$
(2)

where \(\lambda \) is a weighting coefficient balancing the effects of the two losses; we empirically set \(\lambda \) to 0.8 in our experiments. Due to resource constraints, we utilize the ViT_b version of SAM and finetune on two RTX3090 GPUs. Training runs for a maximum of 160 epochs with a batch size of 12 and an initial learning rate of 0.001. To stabilize the finetuning process, we apply warmup for the first 250 iterations, followed by exponential learning rate decay. Random flip, rotation, and crop are applied to augment the training images and avoid overfitting. The images are resized to \(512 \times 512\) as model inputs. Besides, we use the AdamW [15] optimizer with a weight decay of 0.1 to update the model parameters.
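
For reference, a minimal sketch of the combined loss in Eq. (2) is given below; the Dice formulation is our own and SAMed's implementation may differ in details.

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits: torch.Tensor, target: torch.Tensor,
                 num_classes: int, lam: float = 0.8, eps: float = 1e-5) -> torch.Tensor:
    """L = lam * L_Dice + (1 - lam) * L_CE for logits (B, C, H, W) and labels (B, H, W)."""
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2 * inter + eps) / (denom + eps)).mean()
    return lam * dice + (1.0 - lam) * ce
```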

Results and Analysis. After this naive finetuning, SurgicalSAM can manage instrument-wise segmentation without relying on prompts. With further tuning of hyper-parameters such as the learning rate, batch size, and optimizer, SurgicalSAM achieves a \({\textbf {71.38}}\%\) mIoU score on the validation split of the EndoVis18 dataset, which is on par with the state-of-the-art models in Table 1. Since the other methods in Table 1 utilize temporal and optical flow information as a supplement [5] or conduct multi-task optimization [3, 24], the results of our image-only, single-task SurgicalSAM are promising. Besides, the encoder backbone we finetuned is the smallest ViT_b due to limited computational resources; we believe the largest ViT_h backbone can yield much better performance. Compared with the original SAM, the new architecture is of great practical significance as it achieves semantic-level automatic segmentation. Moreover, the additionally trained parameters amount to only 18.28 MB, demonstrating the efficiency of our finetuning strategy.

Furthermore, we evaluate the robustness of SurgicalSAM under data corruption using the EndoVis18 validation dataset. As shown in Table 3, the model's performance degrades significantly under various forms of data corruption, particularly Blur.

7 Conclusion

In this study, we explore the robustness and zero-shot generalizability of SAM [14] in the field of robotic surgery on two robotic instrument segmentation datasets from the MICCAI EndoVis 2017 and 2018 challenges. Extensive empirical results suggest that SAM [14] is deficient in segmenting entire instruments with point-based prompts and in unprompted settings, as clearly shown in Fig. 1 and Fig. 3. This implies that SAM [14] cannot capture surgical scenes precisely despite its surprising zero-shot generalization ability. Besides, it struggles to accurately predict certain parts of the instrument mask when instruments overlap or when given only a point-based prompt. It also fails to identify instruments in complex surgical scenarios involving blood, reflection, blur, and shade. Moreover, we extensively evaluate the robustness of SAM [14] under a wide range of data corruptions. As indicated by Table 2 and Fig. 2, SAM [14] encounters significant performance degradation in many scenarios. To shed light on adapting SAM for surgical tasks, we fine-tune SAM using LoRA. Our fine-tuned SAM, i.e., SurgicalSAM, demonstrates class-wise mask prediction without any prompt.

As a foundational segmentation model, SAM [14] shows remarkable generalization capability in robotic surgical segmentation, yet it still suffers performance degradation due to downstream domain shift, data corruptions, perturbations, and complex scenes. To further improve its generalization capability and robustness, a broad spectrum of evaluations and extensions remains to be explored and developed.