1 Introduction

Gastric cancer is a prevalent disease that poses a serious threat to human health [1]. According to estimates by the International Agency for Research on Cancer (WHO), around 1.09 million individuals worldwide were diagnosed with gastric cancer in 2020, with 769,000 deaths attributed to the disease, ranking it as the sixth most prevalent malignancy and the fourth most deadly worldwide. China accounts for 36.42\(\%\) of new gastric cancer cases and 37.58\(\%\) of deaths worldwide, underscoring the urgency of prevention and treatment. However, gastric cancer is frequently difficult to detect at an early stage because most patients lack specific symptoms, which is one reason for the current low rate of early diagnosis and treatment. In addition, gastric polyps can bleed at the surface due to the corrosive effects of gastric acid, leading to symptoms such as anemia in later stages. If polyps are detected early and managed with regular follow-up or direct surgical treatment, patient survival can be significantly improved. Therefore, effective prevention and treatment measures should be taken to reduce the harm caused by gastric cancer.

The main method for detecting polyps is gastroscopy screening, which can be divided into traditional manual screening and assisted screening based on deep learning technology. Traditional manual screening has clear limitations. Because polyps vary irregularly in texture [2, 3], shape [4, 5], size [5], color [6, 7], and other characteristics, manual screening is not only time-consuming but also requires doctors to have specialized knowledge. Moreover, even experienced experts can miss or misdetect some polyps owing to factors such as fatigue [8] and the characteristics of the polyps themselves [9]. Therefore, developing computer-aided diagnosis technology to assist doctors in polyp detection is of great significance.

Thanks to significant breakthroughs in GPU computing power, researchers have recently devoted considerable effort to computer vision. Compared with conventional manual screening, deep learning-based approaches can help doctors focus their attention on suspected polyps rather than spending time on a massive number of normal images. Deep learning is therefore now widely applied in medical imaging [10, 11], with the aim of improving the precision of polyp detection and reducing the risk of missed and erroneous diagnoses, providing doctors with more comprehensive, reliable, and efficient diagnostic tools.

In 2020, Deeba et al. [12] introduced a computer-assisted algorithm for detecting polyps in both colonoscopy and wireless capsule endoscopy (WCE). The algorithm involved several key components, including image enhancement, the generation of saliency maps, and the extraction of histogram of oriented gradients (HOG) features, all of which played a crucial role in the final classification. By amplifying clinically significant features and reducing the number of search windows through saliency-based selection, the algorithm improved detection efficiency. In 2021, Qadir et al. [13] created an F-CNN-based colon polyp detection system that used a two-dimensional Gaussian mask instead of a binary mask, allowing it to detect flat and small polyps with blurred boundaries between polyp and background, thereby reducing the rate of missed detections. In 2021, Taş et al. [14] suggested a preprocessing approach that applied a super-resolution convolutional neural network (SRCNN) to enhance the resolution of colonoscopy images prior to polyp localization; this improved both the recall and the accuracy of the model compared with the low-resolution case. In 2021, Chen et al. [15] improved the saliency of the polyp area by enhancing the contrast of the input image through the separation of foreground and background. The enhanced data were fed into an improved deep residual convolutional neural network combined with ensemble learning for automatic colon polyp detection; by adding attention modules, the network could focus on useful feature channels and suppress invalid ones, greatly improving detection precision. In 2021, Cao et al. [16] proposed a gastric polyp detection network based on YOLOv3 that incorporated a feature extraction and fusion module. The network exploited the semantic information of both high-level and low-level feature maps, improving the detection of small polyps and achieving a recall of 86.2\(\%\). In 2022, Nisha et al. [17] proposed a dual-path convolutional neural network (DP-CNN) that combined image enhancement, the DP-CNN structure, and a sigmoid classifier to detect polyps, successfully distinguishing polyp from non-polyp patches in colonoscopy images while keeping complexity low with fewer learnable parameters. In 2022, Hu et al. [18] proposed NeutSS-PLP, a novel approach for extracting polyp regions from colonoscopy images. The method combines neutrosophic uncertainty theory with saliency detection to identify and suppress specular reflections in colonoscopy images more accurately, and introduces two-level short connections into the saliency detection network to extract multi-level, multi-scale features for better polyp region extraction.

Several effective strategies have been proposed for the conventional polyp detection problem and have performed well in terms of accuracy, recall, and feasibility. However, because polyp targets are irregular, low in resolution, and carry limited feature information, conventional detection models often miss or falsely detect such small targets. This study therefore optimized the YOLOv5 model for small polyp detection in the following respects. First, to address the loss of information on small polyps, a new network was developed by adding a small target detection head and utilizing the Swin Transformer to increase the network's sensitivity to small targets, thereby improving small polyp detection. Second, to fully exploit information across scales, the new network integrated the ASFF module, extended here to four detection heads. Additionally, to weaken the influence of non-target regions of the image on the model output, a plug-and-play Res-PATM attention module was proposed based on the PATM module. The proposed PATM-YOLO algorithm achieved 91.3\(\%\) precision and 86.6\(\%\) recall on the constructed dataset and 95.6\(\%\) precision and 90.8\(\%\) recall on the public polyp dataset SUN, outperforming the compared algorithms on both datasets. These results demonstrate the effectiveness of the proposed PATM-YOLO algorithm in detecting small polyps.

2 Material and methods

2.1 Dataset

In this study, the polyp-related parts of the collected datasets [23, 24] were extracted and combined into a new dataset in order to test the detection capability of the model on richer data. The constructed dataset consists of 1,759 images, most of which contain small polyps, and can support detection tasks for small polyps. Table 1 displays the distribution of the images: 1,127 were used for training the network model, 281 for validation, and 351 for testing the model's performance. To further validate the model's effectiveness, experiments were also conducted on the publicly accessible polyp dataset SUN [25] using the same techniques. SUN is a public polyp detection dataset containing 49,136 images with polyp information collected from 100 patients and divided into 100 parts by patient. To ensure experimental objectivity, these images were randomly partitioned into training, validation, and test sets: 31,448 images containing polyp information were used for training, 7,862 for validation, and the remaining 9,826 for testing.
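For concreteness, the random partition described above can be sketched as follows. This is a minimal illustration assuming the SUN images are available as a flat directory of JPEG files (the directory layout and file extension are assumptions, not part of the original dataset description); note that 31,448 + 7,862 + 9,826 = 49,136, i.e., roughly a 64/16/20 split.

import random
from pathlib import Path

def split_dataset(image_dir, train_n=31448, val_n=7862, seed=0):
    # Collect all image paths; sorting first makes the shuffle reproducible.
    paths = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)          # fixed seed for reproducibility
    train = paths[:train_n]                     # 31,448 training images
    val = paths[train_n:train_n + val_n]        # 7,862 validation images
    test = paths[train_n + val_n:]              # remaining 9,826 test images
    return train, val, test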

Table 1 Details regarding the public datasets employed in this study

2.2 The improved ultra-small target detection head

Because the dataset contains a high proportion of small polyps with significant irregularity in shape, texture, and size, YOLOv5 does not perform optimally in detecting them. To address this issue, this study added a detection head for small targets to the model's head to counteract the missed detections that YOLOv5 produces on small polyps [19]. This approach enhances the model's detection accuracy for small polyps without compromising its ability to detect polyps of other sizes.
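A minimal sketch of the idea is shown below: a YOLOv5-style head extended from three to four detection scales, where the additional stride-4 (P2) branch serves as the small-target head. The channel counts and feature shapes are illustrative assumptions rather than the exact PATM-YOLO configuration.

import torch
import torch.nn as nn

class MultiScaleHead(nn.Module):
    def __init__(self, channels=(64, 128, 256, 512), num_anchors=3, num_classes=1):
        super().__init__()
        out_ch = num_anchors * (num_classes + 5)   # box (4) + objectness (1) + classes
        # one 1x1 prediction conv per pyramid level: P2, P3, P4, P5
        self.preds = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in channels)

    def forward(self, feats):                      # feats: [P2, P3, P4, P5]
        return [pred(f) for pred, f in zip(self.preds, feats)]

# Feature maps for a 640 x 640 input at strides 4, 8, 16, 32;
# the stride-4 P2 map is the extra ultra-small-target branch.
feats = [torch.randn(1, c, 640 // s, 640 // s)
         for c, s in zip((64, 128, 256, 512), (4, 8, 16, 32))]
outs = MultiScaleHead()(feats)                     # four prediction maps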

2.3 Improved PATM attention module

Attention mechanisms are currently widely used in the vision field. Inspired by this, this study introduces the PATM attention module, which combines the advantages of the smaller inductive bias and simpler architecture of MLPs, to strengthen the network model's attention to valid targets and suppress attention to non-target areas [20], based on the following principle:

The PATM attention module characterizes a token as a wave function that possesses phase and amplitude, defined as follows:

$$\begin{aligned} \tilde{Z}_p = |Z_p| \odot e^{i\theta_p}, \quad p=1,2,\ldots,m, \end{aligned}$$
(1)

where \( i \) is the imaginary unit satisfying \( i^2 = -1 \), \( |\cdot| \) is the absolute value operator, and \( \odot \) is the element-wise product. The amplitude \( |Z_p| \) is the real-valued feature of each token, \( e^{i\theta_p} \) is a periodic function, and the phase \( \theta_p \) corresponds to the current position of the token within a wave period. The phase term \( \theta_p \) affects how different tokens are summed during aggregation.

The amplitude information \( Z_p \) and phase information \( \theta_p \) are computed from the given input features using Formula 2 and Formula 3, respectively.

$$\begin{aligned} Z_p = \text{Channel-FC}(X_p, W^c) = W^c X_p, \quad p=1,2,\ldots,m, \end{aligned}$$
(2)
$$\begin{aligned} \theta _p = \Theta (X_p,W^\theta ), \end{aligned}$$
(3)

where \( W^c \) is a learnable weight of the Channel-FC operation and \( W^\theta \) denotes the learnable parameters of the phase estimation function \( \Theta \).

Fig. 1 Schematic diagram of the Res-PATM principle

As Formula 1 is expressed in the complex domain, Formula 4 is needed to expand it into its real and imaginary parts.

$$\begin{aligned} \tilde{Z}_p = |Z_p| \odot \cos\theta_p + i|Z_p| \odot \sin\theta_p, \quad p=1,2,\ldots,m, \end{aligned}$$
(4)

In the above formula, the real and imaginary parts of the complex-valued tokens are represented by two vectors, respectively. Different tokens \( \tilde{Z}_p \) are then merged using the Token-FC operation, i.e.:

$$\begin{aligned} \tilde{O}_p = \text{Token-FC}(\tilde{Z}, W^t)_p = \sum_q W_{pq}^t \odot \tilde{Z}_q, \quad p=1,2,\ldots,m, \end{aligned}$$
(5)

where \( \tilde{Z}=[\tilde{Z}_1, \tilde{Z}_2,\ldots,\tilde{Z}_m] \) denotes all the wave-like tokens in one layer. In Formula 5, the interaction between tokens takes both their amplitude and phase information into account, and the resulting output \( \tilde{O}_p \) is a complex-valued combination of the features. Following the common quantum measurement approach of projecting a quantum state, characterized by a complex-valued representation, onto an observable real value, the real-valued output \( O_p \) is obtained by weighting and summing the real and imaginary parts of \( \tilde{O}_p \) with learnable parameters. Combining this with Formula 5, the output \( O_p \) is:

$$\begin{aligned} O_p = \sum_q \left( W_{pq}^t Z_q \odot \cos\theta_q + W_{pq}^i Z_q \odot \sin\theta_q \right), \quad p=1,2,\ldots,m, \end{aligned}$$
(6)

where \( W^t \) and \( W^i \) are weights with learnable parameters. In the above formula, the phase \( \theta_q \) adjusts itself dynamically according to the semantic content of the input, so that, in addition to the fixed weights, the phase also modulates the aggregation of different tokens.
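The step from Formulas 4 and 5 to Formula 6 can be made explicit: substituting the expansion of Formula 4 into the aggregation of Formula 5 gives

$$\begin{aligned} \tilde{O}_p = \sum_q W_{pq}^t \odot \tilde{Z}_q = \sum_q W_{pq}^t Z_q \odot \cos\theta_q + i\sum_q W_{pq}^t Z_q \odot \sin\theta_q, \end{aligned}$$

and projecting onto a real value, with the imaginary branch given its own learnable weight \( W^i \) in place of \( W^t \), yields exactly Formula 6.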

As shown in Fig. 1, given the input features, the PATM attention module generates amplitude and phase information using Formula 2 and the phase estimation function in Formula 3, respectively. The features are expanded into wave-like tokens using Formula 4 and aggregated with Formula 6 to obtain the complex-valued representation. The output features are then transformed with another Channel-FC to increase representational power and fed into bottleneck residual modules to deepen the network, thereby improving the detection capability for small targets. This study refers to the improved module as Res-PATM.
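The following PyTorch sketch illustrates the Res-PATM structure described above. It is written from the formulas in this section rather than from any official implementation: the Channel-FC operations are realized as 1x1 convolutions, the Token-FC weights \( W^t \) and \( W^i \) are approximated by depthwise convolutions over neighboring tokens, and two bottleneck residual modules follow the phase-aware mixing, matching the optimum found in Fig. 2. All layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class PATM(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.amp = nn.Conv2d(dim, dim, 1)      # Channel-FC for the amplitude Z_p (Formula 2)
        self.phase = nn.Conv2d(dim, dim, 1)    # phase estimation Theta (Formula 3)
        # Token-FC weights W^t and W^i for the cos/sin branches (Formula 6),
        # approximated here by depthwise convolutions over neighboring tokens
        self.token_re = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.token_im = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)     # output Channel-FC

    def forward(self, x):
        z, theta = self.amp(x), self.phase(x)
        # real/imaginary expansion (Formula 4) followed by aggregation (Formula 6)
        o = self.token_re(z * torch.cos(theta)) + self.token_im(z * torch.sin(theta))
        return self.proj(o)

class Bottleneck(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim // 2, 1), nn.SiLU(),
            nn.Conv2d(dim // 2, dim, 3, padding=1), nn.SiLU(),
        )

    def forward(self, x):
        return x + self.body(x)                # residual shortcut

class ResPATM(nn.Module):
    """PATM followed by two bottleneck residual modules (the Fig. 2 optimum)."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(PATM(dim), Bottleneck(dim), Bottleneck(dim))

    def forward(self, x):
        return self.block(x)

y = ResPATM(128)(torch.randn(1, 128, 80, 80))  # shape-preserving: (1, 128, 80, 80)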

Fig. 2 Comparison of experimental results of the four different methods

To investigate how the improved PATM attention module should be combined with the bottleneck residual module, this study compared the four ways of adding them shown in Fig. 2. These experiments aimed to maximize the detection results of the network and determine the optimal number of bottleneck residual modules to use in the PATM attention module.

The experimental findings presented in Fig. 2 indicate that the optimal performance of the network is achieved when two bottleneck residual modules are incorporated into the PATM attention module, allowing it to concentrate on more relevant information.

2.4 Determining the location of the Swin Transformer

The Swin Transformer-v2 architecture benefits from the shifted-window operation, which restricts attention to a local window and reduces computational cost. Additionally, the patch merging operation increases the receptive field and yields multi-scale features [21]. Swin Transformer-v2 also keeps activation amplitudes controllable by applying layer normalization after each residual branch. Inspired by these methods, this study replaced some of the original Cross Stage Partial (CSP) modules in YOLOv5 with CSP modules based on the Swin Transformer-v2 architecture (Swin-CSP). The schematic diagram of the Swin-CSP module is shown in Fig. 3.
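As a rough illustration of the Swin-CSP idea, the sketch below combines a CSP-style channel split with a single windowed self-attention block using Swin-v2-style post-normalization. It is deliberately simplified (one un-shifted window stage, no relative position bias, illustrative sizes); the actual module follows the Swin Transformer-v2 block in Fig. 3.

import torch
import torch.nn as nn

class WindowAttentionBlock(nn.Module):
    def __init__(self, dim, window=8, heads=4):    # dim assumed divisible by heads
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)             # post-norm, as in Swin-v2
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (B, C, H, W); H, W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # partition into non-overlapping w x w windows -> (B * nWindows, w*w, C)
        t = x.view(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(-1, w * w, C)
        t = t + self.norm1(self.attn(t, t, t, need_weights=False)[0])
        t = t + self.norm2(self.mlp(t))
        # reverse the window partition back to (B, C, H, W)
        t = t.view(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(B, C, H, W)

class SwinCSP(nn.Module):
    def __init__(self, dim):                       # dim assumed divisible by 8
        super().__init__()
        self.split1 = nn.Conv2d(dim, dim // 2, 1)
        self.split2 = nn.Conv2d(dim, dim // 2, 1)
        self.swin = WindowAttentionBlock(dim // 2)
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        a = self.swin(self.split1(x))              # transformer branch
        b = self.split2(x)                         # shortcut branch
        return self.fuse(torch.cat([a, b], dim=1))

y = SwinCSP(256)(torch.randn(1, 256, 64, 64))      # shape-preserving: (1, 256, 64, 64)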

Fig. 3 Illustrations related to Swin Transformer-v2: (a) the schematic diagram of the Swin Transformer-v2 block; (b) the Swin-CSP module

Table 2 Experimental results of optimal placement positions

To further verify the optimal placement of Swin Transformer modules in the network, this study conducted experimental comparisons of the optimal placement positions, as shown in Table 2.

The results presented in Table 2 demonstrate that the detection model performs best when one CSP module in the backbone network and all CSP modules in the neck are replaced, extracting more effective features than the other network structures. This configuration was adopted in the PATM-YOLO model: as depicted in Fig. 5, only one CSP module in the backbone was replaced, whereas all CSP modules in the neck were replaced.

2.5 The improved ASFF module

The original structure of YOLOv5 is affected by the irregular variation in polyp size and shape, so polyps of different sizes differ in detection difficulty. To reduce these fluctuations, this study introduced the ASFF module [22]. The fundamental idea of ASFF is to let the network adaptively learn spatial fusion weights for features at different scales; its implementation can be divided into two parts, feature size unification and scale fusion, as follows:

Unified Feature Size: Because the feature layers in the network head have different resolutions and channel numbers, yet must eventually be summed as in Formula 7, it is crucial that each layer has uniform channel numbers and feature map size. This is achieved by first using a regular convolution to equalize the channel numbers, and then applying level-specific upsampling or downsampling to equalize the feature map size. As shown in Fig. 4, the blue lines indicate downsampling operations and the red lines indicate upsampling operations used to increase resolution.

Fig. 4 Framework of the ASFF module

As shown in Formula 7, the scale fusion operation is illustrated for the l-th level. After dividing the feature layers of different resolutions into levels, the same-resolution feature maps obtained from the other three levels after the feature size unification operation (i.e., up- or downsampling) are weighted and summed to obtain the final features.

$$\begin{aligned} y_{ij}^l = \alpha_{ij}^l \cdot x_{ij}^{1 \rightarrow l} + \beta_{ij}^l \cdot x_{ij}^{2 \rightarrow l} + \gamma_{ij}^l \cdot x_{ij}^{3 \rightarrow l} + \delta_{ij}^l \cdot x_{ij}^{4 \rightarrow l} \end{aligned}$$
(7)

where \( \alpha_{ij}^l \), \( \beta_{ij}^l \), \( \gamma_{ij}^l \), and \( \delta_{ij}^l \) are the spatial fusion weights of the corresponding feature maps relative to the level-l feature map, shared across channels. Note that \( \alpha_{ij}^l \), \( \beta_{ij}^l \), \( \gamma_{ij}^l \), and \( \delta_{ij}^l \) are subject to the constraints \( \alpha_{ij}^l+\beta_{ij}^l+\gamma_{ij}^l+\delta_{ij}^l=1 \) and \( \alpha_{ij}^l, \beta_{ij}^l, \gamma_{ij}^l, \delta_{ij}^l \in [0,1] \) and are defined as follows:

$$\begin{aligned} \alpha _{ij}^l=\frac{e^{\lambda _{\alpha _{ij}}^l}}{e^{\lambda _{\alpha _{ij}}^l}+e^{\lambda _{\beta _{ij}}^l}+e^{\lambda _{\gamma _{ij}}^l}+e^{\lambda _{\delta _{ij}}^l}} \end{aligned}$$
(8)

where \( \alpha_{ij}^l \), \( \beta_{ij}^l \), \( \gamma_{ij}^l \), and \( \delta_{ij}^l \) are defined by the softmax function with \( \lambda_{\alpha_{ij}}^l \), \( \lambda_{\beta_{ij}}^l \), \( \lambda_{\gamma_{ij}}^l \), and \( \lambda_{\delta_{ij}}^l \) as the control parameters.
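A compact sketch of the four-level fusion in Formulas 7 and 8 is given below. It assumes the four input feature maps have already been unified in resolution and channel width; the per-level 1x1 convolutions that produce the weight logits, and the hidden width of 16, are illustrative assumptions.

import torch
import torch.nn as nn

class ASFF4(nn.Module):
    def __init__(self, channels, hidden=16):
        super().__init__()
        # one 1x1 conv per level compresses features into weight-logit maps
        self.logits = nn.ModuleList(nn.Conv2d(channels, hidden, 1) for _ in range(4))
        self.to_weight = nn.Conv2d(4 * hidden, 4, 1)   # lambda_alpha .. lambda_delta

    def forward(self, xs):                             # xs: four (B, C, H, W) maps, same shape
        lam = self.to_weight(torch.cat([g(x) for g, x in zip(self.logits, xs)], dim=1))
        w = torch.softmax(lam, dim=1)                  # Formula 8: weights sum to 1, each in [0, 1]
        # Formula 7: pixel-wise weighted sum, weights shared across channels
        return sum(w[:, i:i + 1] * xs[i] for i in range(4))

fused = ASFF4(256)([torch.randn(1, 256, 40, 40) for _ in range(4)])  # (1, 256, 40, 40)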

2.6 The PATM-YOLO algorithm

Based on the above enhancements, the present study proposes an improved YOLOv5-based algorithm, Phase-Aware Token Module based YOLOv5 (PATM-YOLO), dedicated to reducing the missed detection of dense, small polyps caused by the loss of information on small polyps, the unevenness of polyp texture and size, and the complexity of the detection background. Figure 5 depicts the structural diagram of the PATM-YOLO algorithm.

Fig. 5 Structural diagram of the PATM-YOLO algorithm

3 Results

3.1 Implementation details

3.1.1 Training setting

In this paper, the experimental setup utilizes the Ubuntu 20.04 operating system. The central processing unit (CPU) is a 24 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz, while the graphics processing unit (GPU) is an RTX 3090 with 24 GB of memory. The experimental environment is configured with the Python 3.8.1 scripting language, the PyTorch 1.10.0 deep learning framework, and the CUDA 11.3 GPU acceleration library.

The key training parameters for the experiments were configured as follows: input image dimensions were set to 640 by 640 pixels, the initial learning rate was established at 0.01, learning rate momentum was assigned a value of 0.937, and the weight decay coefficient was set to 0.0005. The training process spanned 100 epochs with a batch size of 64.
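For reference, the key hyperparameters above can be collected into a single configuration object; the names below mirror common YOLOv5-style hyperparameter files, and any setting not listed is assumed to take the framework default.

train_cfg = {
    "imgsz": 640,            # input image size (640 x 640)
    "epochs": 100,
    "batch_size": 64,
    "lr0": 0.01,             # initial learning rate
    "momentum": 0.937,       # SGD momentum for the learning rate schedule
    "weight_decay": 0.0005,  # weight decay coefficient
}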

3.1.2 Evaluation metrics

In this research, the effectiveness of the network models before and after improvement in detecting dense and small polyp images on the constructed dataset was assessed under identical experimental conditions. Differences in experimental outcomes were compared to assess network performance, i.e., the occurrence of missed and false detections. Three main metrics were chosen: precision, recall, and mean average precision (mAP), calculated as follows:

$$\begin{aligned} Precision=\frac{TP}{TP+FP}, \end{aligned}$$
(9)
$$\begin{aligned} Recall=\frac{TP}{TP+FN}, \end{aligned}$$
(10)
$$\begin{aligned} mAP@0.5=\frac{\sum _{i=1}^N AP_i}{N}, \end{aligned}$$
(11)

In the formulas above, TP denotes true positives (correct detections), FP denotes false positives, and FN denotes false negatives (missed detections). These metrics are frequently employed to assess the overall detection performance of a target detection network model.
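The metrics can be sketched in a few lines of Python. This is an illustration of Formulas 9-11, not the exact evaluation code used in the experiments; the AP computation here uses simple trapezoidal integration of a precision-recall curve, whereas production evaluations typically interpolate the curve first.

import numpy as np

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0      # Formula 9

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0      # Formula 10

def map_at_05(pr_curves):
    """pr_curves: per-class lists of (recall, precision) points at IoU 0.5."""
    aps = []
    for curve in pr_curves:
        r, p = zip(*sorted(curve))                 # sort points by recall
        aps.append(np.trapz(p, r))                 # AP: area under the PR curve
    return float(np.mean(aps))                     # Formula 11: mean over N classes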

3.2 Ablation experiment

To further verify the impact of the proposed modules and optimizations on the detection algorithm in polyp detection tasks, this study conducted a set of ablation experiments. Based on the YOLOv5s network, adding the Swin Transformer network produced YOLOv5s-a, adding the ASFF module produced YOLOv5s-b, adding the PATM attention module produced YOLOv5s-c, and adding the small target detection head produced YOLOv5s-d. The network with all modules added to the YOLOv5s baseline is the proposed PATM-YOLO network. The results of the ablation experiments are shown in Table 3. Compared with the YOLOv5 network, separately adding each module not only yielded a minimum improvement of 1.8\(\%\) in precision but also an increase of at least 1.1\(\%\) in both recall and mAP@0.5, underscoring the feasibility of the enhancements.

Table 3 Ablation experiments of PATM-YOLO

3.3 Experimental comparison between YOLOv5 algorithm and improved algorithms

In this section, the PATM-YOLO algorithm is compared with the YOLOv5 series algorithms. To objectively demonstrate performance on the dataset, all existing YOLOv5 variants were considered, including YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Table 4 shows the performance of the PATM-YOLO algorithm and the YOLOv5 series algorithms on the constructed dataset.

Table 4 Comparison of experimental results between the PATM-YOLO and YOLOv5 series networks

Fig. 6 Comparison of different performance parameters between the PATM-YOLO and YOLOv5 series algorithms

Figure 6 displays a comparison of the performance parameters of the PATM-YOLO algorithm and the YOLOv5 series algorithms, revealing that PATM-YOLO exhibits superior precision, recall, and mAP@0.5. From Table 4, PATM-YOLO attains a precision of 91.3\(\%\), a recall of 86.6\(\%\), and an mAP@0.5 of 92\(\%\) in the polyp detection experiment, an improvement of 8.5\(\%\), 3.3\(\%\), and 4.7\(\%\), respectively, over the YOLOv5s baseline network model. The PATM-YOLO algorithm thus holds a clear advantage on the constructed dataset.

For specific detection tasks involving dense and small targets, the detection performance of the YOLOv5 and PATM-YOLO network models is shown in Figs. 7 and 8. Figure 7 corresponds to dense polyp images with three targets in the original image, of which the original network detected two but missed one, while the improved PATM-YOLO network detected all polyp targets. For the small polyp images in Fig. 8, the original network exhibited missed detections, while the improved network detected all targets.

Fig. 7 Comparison of detection results for dense polyp images: (a) PATM-YOLO; (b) YOLOv5

Fig. 8 Comparison of detection results for small polyp images: (a) PATM-YOLO; (b) YOLOv5

3.4 Comparison of the PATM-YOLO algorithm with other algorithms

To further validate the effectiveness of the PATM-YOLO model, comparative experiments were conducted with other algorithms while keeping the configuration environment and initial hyperparameters as consistent as possible.

Table 5 shows the experimental results of the PATM-YOLO algorithm and the other algorithms on the dataset constructed in this paper. As shown in the table, with an input size of 640 × 640 the PATM-YOLO algorithm achieves outstanding detection performance, surpassing the other algorithms in the polyp detection task.

Table 5 Comparison between the PATM-YOLO algorithm and other algorithms

3.5 Testing of the PATM-YOLO algorithm on the SUN dataset

This section presents the experiments of the proposed PATM-YOLO algorithm on the public polyp dataset SUN. To ensure the fairness of the polyp detection experiments, the same parameter settings as in the previous experiments were used.

Fig. 9 The validation results of model training: (a) mAP@0.5; (b) Recall; (c) Precision

Table 6 The comparison results between PATM-YOLO and other algorithms on the SUN dataset

Figure 9 shows the training process and validation results of YOLOv5, YOLOv7, YOLOv8, and PATM-YOLO. As shown in the figure, the red line representing YOLOv8 has lower recall, precision, and mAP@0.5 than the other three detection algorithms. The proposed PATM-YOLO algorithm reaches higher precision and recall in a shorter time than the baseline YOLOv5 network. Furthermore, although PATM-YOLO and YOLOv7 show similar precision and recall during training, PATM-YOLO outperforms YOLOv7 on the test set; combined with Table 6, PATM-YOLO shows an increase of 0.4\(\%\) in precision and 0.8\(\%\) in recall on the SUN dataset.

It can be observed from Table 6 that the PATM-YOLO algorithm outperformed the other detection networks on the public SUN dataset by a significant margin in terms of recall, making it more suitable for the polyp detection task [26, 27].

4 Conclusions

This study proposes a new method for detecting small polyps in images, called PATM-YOLO, which addresses the missed detection of small polyps. In terms of network architecture, a detection head for small targets is first constructed, followed by an attention mechanism that captures richer information and limits the influence of background areas on the target. Second, the Swin Transformer structure is employed to augment the network's feature extraction capacity. Finally, the ASFF module is incorporated into the network to enhance the integration of multi-scale features and enrich the network's feature diversity. The PATM-YOLO algorithm outperformed the other YOLOv5 series algorithms, with a precision of 91.3\(\%\), a recall of 86.6\(\%\), and an mAP@0.5 of 92\(\%\) on the constructed dataset; it also achieved a precision of 95.6\(\%\) and a recall of 90.8\(\%\) on the public SUN dataset, making it more suitable for polyp detection tasks. The study shows that the PATM-YOLO algorithm can improve polyp detection performance. Regarding computational requirements, the PATM-YOLO algorithm still needs improvement; reducing its computational cost while maintaining detection accuracy, and improving its detection performance to facilitate deployment on resource-constrained devices, will be the focus of our future work.