Abstract
Deep learning techniques have made significant advancements in computer vision. The YOLO algorithm, a representative single-stage detection approach, has demonstrated remarkable results in detecting ship targets in SAR images. We introduce an enhanced ship target detection model for SAR images, utilizing an improved YOLOv7 object detection network. We incorporate the coordinate attention mechanism into the network to enable automatic detection and localization of ship targets within SAR images. To enhance the robustness and positioning accuracy of the detection network, we replace the CIoU regression loss in YOLOv7 with the SIoU loss, reducing the complexity of the loss function. Additionally, we integrate rotating target detection technology into the network to mitigate the impact of target overlap on detection results. Comprehensive experiments conducted on the Capella Space synthetic aperture radar datasets validate that the proposed methodology achieves superior performance on multiple evaluation metrics, including precision, recall, and mean average precision.
1 Introduction
Ship target detection based on remote sensing images has become an increasingly important research focus for coastal countries, driven by advancements in remote sensing technology. Synthetic Aperture Radar (SAR) is an active microwave remote sensing system with several advantages: it operates effectively in all weather conditions and at any time of day, and it is not influenced by natural factors such as extreme weather or light intensity [1]. The adaptability of SAR to the variability of oceanic climates makes it well-suited for comprehensive real-time ship target detection activities [2]. Consequently, the utilization of SAR images for ship detection has emerged as a prominent and research-intensive area within the field of target detection [3].
Ship target detection techniques utilizing SAR images can be categorized into two main groups: traditional detection methodologies and deep learning-based detection procedures [4]. Traditional approaches to ship target detection in SAR images typically involve eliminating regions unlikely to contain the target from large scene images and selecting potential regions of interest. However, the selected regions often include numerous false alarms due to noise, necessitating further target identification and classification. Additionally, traditional detection methods generally rely on gray statistics, edge information, and other fuzzy edge detection algorithms [5]. When applied to complex backgrounds such as offshore terminals, these traditional methods encounter challenges such as missed detections and high false alarm rates. Furthermore, the feature representations employed by traditional ship target detection techniques are typically designed manually. The interpretation of SAR images is highly dependent on the professional expertise and work experience of relevant personnel, leaving the detection and recognition algorithms weak in robustness and generalization ability [6].
The rapid advancement of computing power, coupled with the emergence of artificial intelligence technologies, has significantly enhanced the training efficiency of deep learning approaches. This advancement enables efficient processing of multi-dimensional data and demonstrates substantial application potential in the fields of computer vision and object detection [7]. Deep learning, a subset of machine learning, involves constructing deep neural network models with multiple hidden layers to learn multi-layered feature information from targets. The process entails deep-level feature transformation and extraction from original image data, leading to the abstract representation of targets in high-level features and facilitating target detection [8]. Deep learning techniques excel at extracting deep features through continuous training, demonstrating robust adaptive learning capabilities. Given the hierarchical structure and extensive parameterization of deep learning models, they are well-suited for adapting to large-scale data environments [9]. By integrating feature extraction and classification into a unified framework and leveraging data-driven feature learning, deep learning approaches effectively address the limitations of manual feature design inherent in traditional SAR target detection methods, which are time-consuming, labor-intensive, and challenging to adapt to complex environments. At present, deep learning methods have been widely adopted in the field of image processing, consistently delivering exceptional performance [10]. Therefore, integrating deep learning technology into ship target detection tasks utilizing SAR images holds meaningful research significance and is instrumental in advancing the development of SAR target detection technology.
This article introduces a ship target detection architecture for SAR images by integrating an advanced and robust deep learning-based object detection network model. We enhance and optimize the existing model structure to align with the characteristics of ship targets in the images, aiming to improve both the accuracy and efficiency of ship target detection. The enhancements include the implementation of a novel attention mechanism model, the refinement of the loss function of the network architecture, and the integration of rotating target detection technology with the circular smooth label algorithm. We design experiments utilizing public SAR image datasets to validate the feasibility and effectiveness of the proposed architecture.
The main contributions of this paper can be summarized as follows:
1. This paper incorporates the coordinate attention mechanism and develops an architecture based on the YOLOv7 object detection network. By integrating the coordinate attention mechanism, the object detection network concentrates on important regions and features from SAR images, thereby enhancing the performance of the ship target detection task. The coordinate attention mechanism supports the network in identifying tiny ship targets within SAR images and enables the object detection network to be versatile across diverse complex scenarios.
2. The SCYLLA Intersection over Union (SIoU) loss is employed to replace the original Complete Intersection over Union (CIoU) regression loss in the YOLOv7 network model. The adjustment reduces the complexity of the loss function and elevates detection accuracy while mitigating false positives and the likelihood of missed targets, hence enhancing the robustness and positioning accuracy of the network. The SIoU loss accounts for the scale factor of the target during IoU calculation, which minimizes regression biases and improves target localization accuracy, making the network well-suited for the ship target detection task.
3. Due to the challenges associated with precise feature extraction from SAR images, false detections and missed detections often occur when dealing with densely distributed and complex scenes, such as ships in nearshore areas. This research implements the rotating box detection method based on the circular smooth label algorithm to achieve accurate positioning. The angle prediction is transformed into a high-precision classification task, addressing the boundary discontinuity problem and enhancing detection performance. The integration of the circular smooth label algorithm enables the object detection network to analyze the shape and structure of the corresponding ship target, consequently minimizing label boundary ambiguities. The enhancement improves the capability of the network to generalize to ship targets across diverse scenes, scales, and poses.
The remainder of the paper is organized as follows: Section 2 introduces generic object detection methodologies and recent advancements in ship detection techniques. Section 3 provides a detailed description of the designed network model architecture. Section 4 presents the comparative experiment, offering a comprehensive analysis of the experimental results. Potential areas for improvement and future insights are discussed in Section 5, followed by a summary of the paper in Section 6.
2 Related works
Traditional ship target detection approaches based on SAR images have largely relied on the concept of semi-automatic target detection, with numerous studies conducted in this area [11]. Ai et al. introduced the Constant False Alarm Rate (CFAR) algorithm, incorporating two parameters, and subsequently developed a CFAR algorithm based on the K-distribution [12]. Most CFAR algorithms analyze SAR images pixel by pixel by employing local sliding windows. However, the aforementioned procedure involves multiple calculations for each pixel, leading to low computational efficiency [13]. To mitigate the challenges posed by traditional imaging techniques, which frequently generate strong background clutter and high sidelobe interferences, Xu et al. employed machine learning approaches. They introduced a target-centric Bayesian compressive sensing imaging method, complemented by a region-adaptive extractor, which enhances radar image object perception tasks [14]. Nasrabadi et al. proposed a method that employs entropy, concave wavelet transform, and template matching from information theory to detect ship targets [15]. Additionally, Guo et al. developed an algorithm for SAR image target detection based on feature extraction [16]. Despite their contributions, traditional techniques exhibit multiple limitations, including strong dependence on manual intervention, low generalization ability, suboptimal detection accuracy, and extended detection times. Moreover, methods relying on image texture feature extraction necessitate manual design interventions, making the entire process complex and time-consuming, and rendering timely detection difficult to guarantee [17].
With the advent of Convolutional Neural Networks (CNNs) and the proliferation of artificial intelligence technologies, SAR ship object detection techniques based on deep learning approaches have rapidly evolved and demonstrated impressive detection performance. Various advanced SAR ship object detection methods based on deep learning technologies can be broadly categorized into two types: two-stage detection models and single-stage detection models [18]. The two-stage detection model initially employs selective search or a region proposal network to generate and extract suggested regions from the input image. Subsequently, it utilizes the features of suggested regions to predict object categories and perform regression classification. For instance, Liu et al. developed a two-stage ship detection algorithm in SAR images based on Regional Convolutional Neural Networks (R-CNN) [19], while Lin et al. introduced a two-stage Faster R-CNN for ship detection in SAR images [20]. Xu et al. identified that prevailing deep learning-based SAR ship detection methods predominantly focus on single-polarization SAR images, overlooking the potential of dual-polarization characteristics. To overcome the limitation, they introduced a group-wise feature enhancement-and-fusion network. The network incorporates dual-polarization feature enrichment, aiming to enhance the accuracy of dual-polarization SAR ship detection [21]. In contrast, the single-stage detection model simplifies the object detection problem by treating it as a regression problem. It eliminates the regional proposal stage and utilizes a single convolutional neural network to directly predict the category probabilities and position coordinates of various objects. Compared to the two-stage detection model, the single-stage model streamlines the entire workflow, leading to higher recognition rates.
Among various single-stage detection models, the You Only Look Once (YOLO) series techniques stand out for their speed, particularly excelling in recognizing relatively small targets [22]. As a result, the YOLO series has gained widespread adoption in SAR ship detection tasks.
Numerous scholars have delved into SAR ship target detection methodologies based on the YOLO model architecture. Gao et al. applied a regression-based approach to establish a deep separation convolutional network utilizing the YOLOv4 model. They incorporated channel and spatial attention mechanisms to enhance ship detection accuracy in SAR images [23]. Similarly, Guo et al. introduced an improved YOLOv5 detection method to address the multi-scale challenges of ship target detection in complex scenes [24]. Xu et al. utilized the YOLOv5 algorithm as the foundation and introduced a streamlined onboard SAR ship detector named Lite-YOLOv5. The variant minimizes the model size, reduces the computational overhead, and achieves onboard ship detection without compromising accuracy [25]. While these studies utilized conventional horizontal label boxes for detection, which require fewer parameters and simplify the model training process [26], they face limitations in complex scenes. Horizontal label boxes often encompass redundant background information, complicating classification and leading to inaccurate target representation. To address the aforementioned challenges, rotating target detection offers a viable solution. Rotation boxes eliminate overlap during object detection, enable precise target identification as well as localization amidst complex backgrounds, and largely exclude background information from the detection box, reducing its influence on object classification. Sun et al. incorporated rotating target detection into SAR ship target detection by designing a circular smooth label algorithm and integrating it into the YOLOv5 detection network, achieving precise ship target positioning [27]. Despite these advancements, most rotating target detection models in the existing literature are based on YOLOv4 or YOLOv5. Notably, the YOLOv7 algorithm represents a more recent innovation within the YOLO series.
YOLOv7 introduces an updated network architecture and auxiliary detection for preliminary result screening, enhancing both computational efficiency and detection accuracy [28]. While retaining the dynamic tag allocation strategy from previous versions, YOLOv7 further improves computational efficiency and detection accuracy, making it a promising candidate for SAR ship target detection [29].
To address the aforementioned challenges and enhance the efficiency as well as the accuracy of ship target detection in SAR images, in this research, we make improvements based on the YOLOv7 detection algorithm framework. We optimize the detection network by tailoring the loss function to align with the characteristics of ship target detection. Inspired by the methodology proposed by Hou et al. [30], we incorporate the coordinate attention mechanism module to bolster ship target detection performance and effectiveness in SAR images. Additionally, we integrate rotating target detection technology into the model to mitigate the impact of target overlap on detection results. To validate the efficacy of the proposed model, we compare it with the aforementioned state-of-the-art techniques.
3 Methodology
3.1 Detection network
The YOLOv7 network model we constructed primarily consists of Input, Backbone, Head, and Prediction modules. The structure of the model is depicted in Fig. 1.
The Input module first scales SAR images to a standardized pixel size to align with the input size requirements of the network architecture. Following a series of preprocessing steps, including data augmentation, the images are forwarded to the Backbone module. The Backbone module comprises multiple BConv convolutional layers, Extended Efficient Layer Aggregation Network (E-ELAN) layers, and Max-Pooling Convolutional (MPConv) layers [31]. The BConv layer consists of a convolutional layer, a Batch Normalization (BN) layer, and an activation function. It serves to extract image features across various scales [32]. The E-ELAN layer architecture is an enhancement of the original ELAN structure. While retaining the transition layer structure from the original ELAN design, E-ELAN introduces diverse feature learning by guiding different feature-set blocks. Through mechanisms like expand, shuffle, and merge cardinality, E-ELAN enhances network learning capabilities without disrupting the original gradient flow [33]. Lastly, the MPConv convolutional layer broadens the receptive field of the current feature layer. Subsequently, it combines the expanded feature information with the output from standard convolution processing to bolster the generalization capabilities of the network [34].
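As an illustrative sketch (not the authors' exact implementation), a BConv layer combining convolution, batch normalization, and an activation might look as follows in PyTorch; the SiLU activation is an assumption, since the text only specifies "an activation function":

```python
import torch
import torch.nn as nn

class BConv(nn.Module):
    """Conv + BatchNorm + activation block, as described for the Backbone.
    The SiLU activation is an assumption; the text only says 'an activation function'."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        padding = kernel_size // 2  # 'same' padding for odd kernel sizes
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 512, 512)   # a cropped 512x512 SAR image tensor
y = BConv(3, 32, kernel_size=3, stride=2)(x)
print(y.shape)  # torch.Size([1, 32, 256, 256])
```

Stacking such blocks with different strides yields the multi-scale feature maps the Backbone passes onward.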
The Backbone module extracts multiple features from the processed images. The extracted features are then fused utilizing the concat operation within the Head module to generate features of varying sizes. The Head module adopts a Path Aggregation Feature Pyramid Network (PAFPN) architecture, facilitating efficient feature fusion across different levels by introducing a bottom-up path that smoothly transfers information from the base to the top [35]. Within the Head module, the architecture incorporates the Spatial Pyramid Pooling Cross Stage Partial Convolution (SPPCSPC) structure. The SPPCSPC structure enhances the perceptual field of the network by integrating a Cross Stage Partial (CSP) structure into the standard Spatial Pyramid Pooling (SPP). Additionally, it features a substantial residual edge to aid in optimization and feature extraction. By integrating multiple MaxPool operations in parallel with a sequence of convolutions, the design mitigates image distortion from processing operations and addresses the issue of redundant feature extraction in the CNN model [36]. The abbreviation CAM stands for Coordinate Attention Mechanism. In this research, three coordinate attention mechanisms are integrated into the Head module, positioned before the prediction head. This design aims to capture crucial feature representations essential for the downstream object detection task. The interior structure of each CAM is displayed in Fig. 1. The placement of the CAM within the Head module of the architecture is inspired by approaches proposed by Liu et al. [37] and Raj et al. [38]. While several researchers have integrated the attention mechanism into the Backbone module of the object detection network, the experimental results consistently demonstrate similar outcomes [39]. A detailed explanation of the inner components of the attention mechanism is provided in the following section.
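The parallel MaxPool idea inside SPPCSPC can be illustrated with a simplified sketch; the full SPPCSPC additionally wraps this in CSP-style split-and-merge convolutions and a residual edge, which are omitted here, and the pool sizes are common defaults rather than values fixed by the text:

```python
import torch
import torch.nn as nn

class SimpleSPP(nn.Module):
    """Simplified sketch of the parallel MaxPool branches used inside SPPCSPC.
    Stride-1 pooling with 'same' padding keeps the spatial size, so the
    branches can be concatenated with the original features."""
    def __init__(self, channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        # 1x1 conv fuses the original features with the pooled branches
        self.fuse = nn.Conv2d(channels * (len(pool_sizes) + 1), channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

x = torch.randn(1, 64, 32, 32)
y = SimpleSPP(64)(x)
print(y.shape)  # torch.Size([1, 64, 32, 32])
```

Each branch pools over a different receptive field, which is how the structure enlarges the perceptual field without changing the feature-map resolution.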
Subsequently, the fused features are directed to the Prediction module. The module adjusts the channel count for features of different scales from the PAFPN output employing RepVGG blocks (REP). It then employs convolution for predicting confidence scores, categories, and anchor frames [40]. Compared to its predecessors, the YOLOv7 detection network enhances feature extraction capabilities, striking a commendable balance between detection efficiency and accuracy. Figure 1 illustrates the core enhancements introduced in the research, including the coordinate attention mechanism, the SIoU loss, and the rotating target detection technology. The rotational target detection branch is integrated into the multi-tasking pipeline of the prediction component within the object detection network. The rotation detection branch utilizes the circular smooth label algorithm to predict output results. The placement of the rotating target detection technology within the overarching object detection network architecture is based on the approach proposed by Zhang et al. [41].
3.2 Attention mechanism
YOLOv7 demonstrates exceptional performance but generates a substantial volume of feature information, which can lead to information overload, necessitating a focused approach within the network, particularly on the object regions. In addressing the aforementioned challenge, attention mechanisms, which are widely employed in deep learning techniques and computer vision-relevant tasks, play a pivotal role. Attention mechanisms guide the model to emphasize specific information and locations crucial to the task, effectively reducing attention to less relevant data and mitigating information overload. The targeted focus enhances both efficiency and accuracy [42]. To bolster the precision of the detection network without introducing significant computational overhead, we incorporate a flexible coordinate attention mechanism. The architecture and flow of the coordinate attention mechanism are illustrated in Fig. 2.
The input feature graph X represents the output of the preceding convolutional layer and has dimensions \(C\times H\times W\), where C is the number of channels, H denotes the height, and W signifies the width. Average pooling with kernels of size (H, 1) and (1, W) is used to encode information from each channel along the horizontal and vertical dimensions, respectively, yielding the output \(z_c^h\) of the \(c\)-th channel at height h and the output \(z_c^w\) of the \(c\)-th channel at width w.
The formulas are displayed as follows:

\[z_c^h(h) = \frac{1}{W}\sum _{0 \le i < W} x_c(h, i), \qquad z_c^w(w) = \frac{1}{H}\sum _{0 \le j < H} x_c(j, w)\]
The two aforementioned transformations aggregate features along two spatial directions and subsequently cascade the resulting feature graphs \(z^h\) and \(z^w\). A convolution operation \(F_1\) with a kernel size of 1 is then applied to produce the intermediate feature graph f, capturing spatial information in both horizontal and vertical directions. The formula for this operation is as follows:

\[f = \delta \left( F_1\left( \left[ z^h, z^w\right] \right) \right) \]

where \(\delta \) denotes a non-linear activation function.
The intermediate feature graph f is partitioned into two separate tensors, \(f^h\) and \(f^w\), along the spatial dimension. Subsequently, these feature graphs \(f^h\) and \(f^w\) are expanded to match the channel count of the input X using two convolution operations, \(F_h\) and \(F_w\), each with a kernel size of 1. The formulas for these operations are below.

\[g^h = \sigma \left( F_h\left( f^h\right) \right) , \qquad g^w = \sigma \left( F_w\left( f^w\right) \right) \]
In the aforementioned equations, the operation \(\sigma \) represents the Sigmoid activation function. The operation scales the output to a range between 0 and 1, indicating the level of importance. \(g^h\) and \(g^w\) serve as attention weights. The final output formula is displayed in the equation as follows:

\[y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)\]
Consequently, the detection network can effectively focus on the relevant channels and spatial coordinates. The attention mechanism is integrated into the BConv convolutional layer of the Backbone module and the CatConv convolutional layer of the Head module. Therefore, the detection network can extract features from the target areas of interest more effectively, thereby improving the efficiency of model training.
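The coordinate attention computation described in this section can be sketched in PyTorch roughly as follows; the reduction ratio, the intermediate ReLU, and the batch normalization follow the original design by Hou et al. [30] and are assumptions where the text does not specify them:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Sketch of the coordinate attention mechanism following Hou et al. [30].
    The reduction ratio and intermediate activation are assumptions."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over W -> (C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over H -> (C, 1, W)
        self.f1 = nn.Sequential(                        # shared 1x1 conv F_1
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU()
        )
        self.f_h = nn.Conv2d(mid, channels, 1)          # F_h
        self.f_w = nn.Conv2d(mid, channels, 1)          # F_w

    def forward(self, x):
        n, c, h, w = x.shape
        zh = self.pool_h(x)                         # (n, c, h, 1)
        zw = self.pool_w(x).permute(0, 1, 3, 2)     # (n, c, w, 1)
        f = self.f1(torch.cat([zh, zw], dim=2))     # cascade along the spatial dim
        fh, fw = torch.split(f, [h, w], dim=2)
        gh = torch.sigmoid(self.f_h(fh))                      # (n, c, h, 1)
        gw = torch.sigmoid(self.f_w(fw.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * gh * gw   # reweight each position by both attention maps

x = torch.randn(2, 64, 16, 16)
y = CoordAtt(64)(x)
print(y.shape)  # torch.Size([2, 64, 16, 16])
```

Because the module preserves the input shape, it can be dropped before a prediction head without changing the surrounding architecture.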
3.3 Loss function
The loss function for the YOLOv7 detection network comprises three components: the localization loss, the confidence loss, and the classification loss. The overall loss is calculated as the weighted sum of these three individual losses:

\[L = \lambda _1 L_{loc} + \lambda _2 L_{conf} + \lambda _3 L_{cls}\]

where \(\lambda _1\), \(\lambda _2\), and \(\lambda _3\) are the weighting coefficients. Both the confidence loss and the classification loss utilize the BCEWithLogits loss function, while the localization loss is computed using the Complete Intersection over Union (CIoU) regression loss.
The CIoU regression loss is defined as

\[L_{CIoU} = 1 - IoU + \frac{\rho ^2\left( b, b_{gt}\right) }{c^2} + \alpha v, \qquad v = \frac{4}{\pi ^2}\left( \arctan \frac{w_{gt}}{h_{gt}} - \arctan \frac{w}{h}\right) ^2\]

where \(\rho (\cdot )\) denotes the Euclidean distance between the center points of the two boxes.
In the equations provided, b represents the predicted box and \(b_{gt}\) stands for the ground-truth box. c denotes the diagonal distance of the smallest enclosing region that can encompass both the predicted and ground-truth boxes. \(\alpha \) is the equilibrium parameter, and v measures the consistency of aspect ratios between the predicted and ground-truth boxes. When the aspect ratio of the predicted box matches that of the ground-truth box (v is 0), the penalty term for the aspect ratio becomes ineffective, destabilizing the CIoU loss function. To address this issue, we employ the SIoU loss function, proposed by Gevorgyan [43], as a substitute in our object detection network. The SIoU loss function integrates angle cost considerations, redefining the distance cost based on this angle cost and reducing the complexity of the loss function. The parameters associated with the SIoU loss function are illustrated in Fig. 3.
The SIoU regression loss function consists of four parts: the angle cost, the distance cost, the shape cost, and the IoU cost, written respectively as

\[\Lambda = 1 - 2\sin ^2\left( \arcsin \left( \sin \alpha \right) - \frac{\pi }{4}\right) \]

\[\Delta = \sum _{t=x,y}\left( 1 - e^{-\gamma \rho _t}\right) , \qquad \gamma = 2 - \Lambda \]

\[\Omega = \sum _{t=w,h}\left( 1 - e^{-\omega _t}\right) ^{\theta }\]

\[L_{IoU} = 1 - IoU\]

where \(\rho _x\) and \(\rho _y\) are the normalized center-point offsets between the two boxes along each axis, and \(\omega _w = |w - w_{gt}|/\max (w, w_{gt})\) and \(\omega _h = |h - h_{gt}|/\max (h, h_{gt})\) measure the relative differences in width and height. The overall regression loss combines these terms as

\[L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega }{2}\]
In the equations provided, \(\Lambda \) denotes the angle loss, \(\Delta \) represents the distance loss, \(\Omega \) signifies the shape loss, and the IoU stands for the Intersection over Union loss. Additionally, the distance loss calculation considers the loss of angles associated with the two boxes. The variable \(\theta \) is adjustable, determining the weight the network assigns to the shape loss. The angle loss integrated into SIoU primarily facilitates the calculation of the distance loss between the two boxes. During the initial stages of model training, the predicted box and the ground-truth box often do not intersect. Incorporating the angle loss accelerates the computation of the distance between these boxes, enabling quicker convergence of their distances. When the angle \(\alpha \) exceeds 45 degrees, the term \(\beta \) is utilized in the formula to replace \(\alpha \). It allows the network model to initially align the center point of the predicted box with that of the ground-truth box. Subsequently, the predicted box is guided to approach the ground-truth box along the relevant axis.
With the inclusion of the angle cost, the loss function achieves a more comprehensive representation. The addition reduces the likelihood of the penalty term equating to zero. As a result, the convergence stability of the loss function is enhanced, leading to improved regression accuracy and a consequent reduction in prediction errors.
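A rough PyTorch sketch of the SIoU regression loss described above, under our reading of Gevorgyan [43]; the exact normalizations (offsets divided by the enclosing-box sides) are assumptions, so this is an illustration of the structure rather than a reference implementation:

```python
import math
import torch

def siou_loss(pred, target, theta=4, eps=1e-7):
    """Sketch of the SIoU loss. Boxes are (cx, cy, w, h); theta weights the
    shape cost as described in Sect. 3.3. Normalization details are assumptions."""
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = target.unbind(-1)

    # IoU cost
    iw = (torch.min(px + pw / 2, gx + gw / 2) - torch.max(px - pw / 2, gx - gw / 2)).clamp(0)
    ih = (torch.min(py + ph / 2, gy + gh / 2) - torch.max(py - ph / 2, gy - gh / 2)).clamp(0)
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # angle cost: if alpha > 45 degrees, switch to beta = 90 - alpha
    cw, ch = (gx - px).abs(), (gy - py).abs()
    sigma = torch.sqrt(cw ** 2 + ch ** 2) + eps
    sin_a = ch / sigma
    sin_a = torch.where(sin_a > math.sin(math.pi / 4), cw / sigma, sin_a)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_a) - math.pi / 4) ** 2

    # distance cost over the enclosing box, modulated by the angle cost
    ew = torch.max(px + pw / 2, gx + gw / 2) - torch.min(px - pw / 2, gx - gw / 2)
    eh = torch.max(py + ph / 2, gy + gh / 2) - torch.min(py - ph / 2, gy - gh / 2)
    gamma = 2 - angle
    dist = (1 - torch.exp(-gamma * (cw / (ew + eps)) ** 2)) + \
           (1 - torch.exp(-gamma * (ch / (eh + eps)) ** 2))

    # shape cost: relative width/height differences
    ww = (pw - gw).abs() / torch.max(pw, gw)
    wh = (ph - gh).abs() / torch.max(ph, gh)
    shape = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta

    return 1 - iou + (dist + shape) / 2

pred = torch.tensor([[10.0, 10.0, 4.0, 4.0]])
gt = torch.tensor([[10.0, 10.0, 4.0, 4.0]])
loss = siou_loss(pred, gt)
print(loss)  # identical boxes -> loss close to 0
```

For identical boxes every cost term vanishes, and misaligned boxes are penalized first along the dominant axis, which is the behavior the angle cost is designed to encourage.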
3.4 Rotating target detection
In recent years, rotating object detection technology has gained significant traction, particularly in text and image detection tasks. The technology proves especially effective when detecting targets that are densely distributed and exhibit a certain tilt angle [44]. In SAR images, ship targets often manifest at specific tilt angles, particularly in areas with densely arranged nearshore wharves. Utilizing rotating target detection can mitigate the impact of overlapping bounding boxes on detection outcomes.
While current rotation detection techniques have demonstrated promising results, they still encounter certain challenges. One significant issue is the boundary discontinuity problem stemming from angle regression. To address the aforementioned problem, this research employs the circular smooth label algorithm proposed by Yang et al., which treats the angle as a classification problem [45]. We innovatively integrate the circular smooth label algorithm with the object detection network to enhance the performance of the architecture in ship target detection tasks in SAR images. The challenges associated with regression-based angle prediction primarily revolve around two issues: the periodicity of the angle and the exchangeability of the boundary. The angular periodicity problem arises from the cyclic nature of angle parameters, whereas the boundary exchangeability problem is predominantly tied to the definition of the bounding box [46]. The core issue leading to boundary discontinuity is that ideal prediction results can diverge beyond the predefined angle range. The divergence results in a significant spike in losses at the boundaries, complicating the regression of bounding boxes [47].
To address the boundary issue, the circular smooth label algorithm redefines the angle problem from its original regression format to a classification format. The transformation effectively resolves angular boundary challenges and synergizes well with the long-side definition method. Within the circular smooth label algorithm, the defined angles are segmented for better clarity and efficacy. A comparative analysis of angular classification methods is illustrated in Fig. 4.
As depicted in Fig. 4, the circular smooth label classification method employs circular label encoding characterized by periodicity. The assigned label values exhibit smooth transitions within a specified tolerance range. It ensures label continuity at boundaries, eliminating arbitrary accuracy errors stemming from the periodic nature of the circular smooth label classification. When the window function is represented by a pulse function or when its radius is relatively small, the one-hot label technique aligns with the circular smooth label classification methodology. The specific formulation of the circular smooth label classification algorithm is presented in the following equation:

\[\mathrm {CSL}(x) = \left\{ \begin{array}{ll} g(x), &{} \theta - r< x < \theta + r\\ 0, &{} \text {otherwise} \end{array}\right. \]
In the given equation, g(x) denotes the window function, r signifies the radius of this window function, and \(\theta \) stands for the angle of the current bounding box. The ideal window function g(x) should satisfy the requirements delineated in the following equations. The first requirement is periodicity:

\[g(x) = g(x + kT), \qquad k \in \mathbb {Z}\]
where \(T=180/\omega \) signifies the number of bins into which the angle is partitioned, with a default value set at 180.
The window function should also be symmetric:

\[g(\theta + \varepsilon ) = g(\theta - \varepsilon )\]

In the equation, \(\theta \) is the center of the symmetry.
Finally, the window function should attain its maximum at the center and decrease away from it:

\[g(\theta ) = 1, \qquad 0 \le g(x) \le 1\]

\[g(x_1) \ge g(x_2) \quad \text {if} \quad |x_1 - \theta | \le |x_2 - \theta |\]

The last equation describes a monotonically non-increasing trend from the center point toward both sides. The aforementioned equations, introduced by Yang et al. [45], demonstrate the four essential properties of an ideal window function g(x). These pivotal properties include periodicity, as indicated in Eq. 17; symmetry, as denoted in Eq. 18; maximum, as represented in Eq. 19; and monotonicity, as demonstrated in Eq. 20.
Given that, the label value remains continuous at the boundary without arbitrary accuracy errors stemming from the periodicity of the circular smooth label algorithm. In addition, when the window function is a pulse function or when the radius of the window function is relatively small, the one-hot label or vanilla classification equates to the circular smooth label algorithm [48]. During prediction, the angle is recovered by selecting the angle bin with the highest classification score and mapping it back to the corresponding rotation angle of the bounding box.
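The circular smooth label encoding can be sketched as follows; the Gaussian window and the radius of 6 are common choices in CSL implementations rather than values fixed by the text, and `num_bins` corresponds to T with \(\omega = 1\):

```python
import numpy as np

def circular_smooth_label(angle_bin, num_bins=180, radius=6):
    """Sketch of circular smooth label encoding (Yang et al. [45]).
    A Gaussian window (an assumed but common choice) is centred on the angle
    bin and wrapped circularly, so labels stay continuous at the boundary."""
    bins = np.arange(num_bins)
    # circular distance from every bin to the target angle bin
    d = np.minimum(np.abs(bins - angle_bin), num_bins - np.abs(bins - angle_bin))
    label = np.exp(-(d ** 2) / (2 * radius ** 2))
    label[d > radius] = 0.0   # zero outside the tolerance window
    return label

label = circular_smooth_label(2)   # an angle near the 0/179 boundary
print(label[2])     # 1.0 at the centre bin
print(label[177])   # non-zero: the window wraps across the boundary
```

The wrap-around at bin 177 is exactly the property that removes the loss spike at angular boundaries; a one-hot label would assign zero there despite the angles being nearly identical.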
4 Experiments
4.1 Experimental setup and evaluation metrics
To assess the performance of the designed ship detection network, experiments are conducted utilizing the Capella Open SAR dataset. The dataset comprises 995 images predominantly featuring two types of scenes: far-sea and nearshore scenes. The SAR images have a ground range resolution of 0.73 m, a range resolution of 0.48 m, and an azimuth resolution of 0.5 m. The image dimensions are 21000 \(\times \) 21000 pixels. For training the detection network, images are cropped to a size of 512 \(\times \) 512 pixels. The model is trained across two NVIDIA GeForce RTX 3090 graphics cards. The experiments are implemented employing PyTorch 1.12, with a batch size set to 24. The Adam optimizer is employed with a learning rate of 0.00125 and a cosine annealing schedule for training over 100 epochs. To evaluate the performance of the experiments, precision, recall, F-measure, and mean average precision (Mean AP) are used as evaluation metrics.
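For reference, the precision, recall, and F-measure metrics named above reduce to simple counts of true positives, false positives, and false negatives; a minimal sketch (the example counts are hypothetical, not results from the paper):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F-measure from detection counts:
    tp = correctly detected ships, fp = false alarms, fn = missed ships."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# hypothetical example: 80 correct detections, 10 false alarms, 20 missed ships
p, r, f = detection_metrics(80, 10, 20)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.889 0.8 0.842
```

Mean average precision additionally averages the area under the precision-recall curve over confidence thresholds (and over classes), which is why it is reported separately from the single-threshold metrics above.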
4.2 Comparative experiments
We conduct a comparative analysis on the Capella Open SAR dataset, contrasting the proposed approach with several advanced object detection methods, including the two-stage target detector Faster R-CNN [20], the one-stage target detector YOLOv3 with multi-target tracking (YOLOv3-MT) [49], YOLOv4 with attention mechanism (YOLOv4-AM) [23], and YOLOv5s-CBAM-BiFPN [24]. Additionally, the original YOLOv7-Tiny detection network without any enhancements is also evaluated [50]. In the aforementioned baseline approaches, Faster R-CNN, YOLOv4-AM, and YOLOv5s-CBAM-BiFPN are specifically tailored for ship target detection in SAR images. In Table 1, the highest value for each column is bolded, while the second-highest value is underlined. Our developed method achieved the top mean average precision while maintaining a high running speed. A qualitative comparison of the methods is illustrated in Fig. 5. As observed in Table 1, the proposed improved YOLOv7 detection network has significantly enhanced the mean average precision of SAR images compared to classical target detectors.
From the time consumption data presented in Table 1 for each baseline approach and the proposed enhanced object detection framework, it is evident that as the YOLO backbone is upgraded across versions, inference time decreases, reflecting faster inference speed [50]. YOLO performs detection in a single feedforward pass of the network [33]; its efficiency and speed stem from this single-pass methodology. In contrast, Faster R-CNN adopts a two-stage detection process [16, 20]: candidate regions are generated first and then classified. Because of this two-stage pipeline, Faster R-CNN cannot match the speed of YOLO. With the integration of the coordinate attention mechanism and the circular smooth label algorithm, the time consumption of the proposed framework is marginally higher than that of the unmodified YOLOv7-Tiny network. Nonetheless, given the improvements across multiple evaluation metrics, the slight increase in inference time is justifiable. It is worth noting that actual inference time may vary with experimental settings.
To validate the effectiveness of the proposed object detection network for ship targets in SAR images, this research conducted experiments utilizing the publicly available Official-SSDD dataset [51]. The SSDD benchmark is notably the first publicly accessible dataset extensively employed by numerous researchers in the SAR ship detection community. The most recent version, termed the Official-SSDD dataset, is employed for the experiments. The SSDD benchmark comprises 1100 SAR images sourced from RADARSAT-2, TerraSAR-X, and Sentinel-1 satellites [52]. The SAR images exhibit resolutions ranging from 1 to 10 m and encompass radar polarizations such as VV, VH, HH, and HV. Specifically, the dataset contains 920 training samples and 180 testing samples in this research [53]. The experimental results are indicated in Table 2.
Analysis of the experimental results reveals that the proposed enhanced object detection network for ship targets outperforms advanced baseline competitors on the publicly available Official-SSDD benchmark dataset across all evaluation metrics in SAR images, achieving a margin of approximately 2% over the second-best results. The outcomes underscore the effectiveness of the designed enhanced object detection network. While there is a slight increase in computational time compared to the top-performing and second-best techniques, the notable enhancements across all evaluation metrics justify the time consumption expenditure.
4.3 Rotating frame detection
In the aforementioned experiments, misdetections and missed targets are partly attributable to the relatively dense nearshore targets. To address this issue, we implement a fused rotating-frame target detection network, reconstructing the architecture of the existing object detection network with the circular smooth label algorithm. During training, the batch size is set to 10 and the Adam optimization algorithm is employed for gradient descent. The initial learning rate is set to 0.01, with the cyclic learning rate also set to 0.01, and training runs for 500 iterations. The final values of the loss components are \(AngleLoss=0.1956\), \(BoxLoss=0.0758\), and \(ObjectLoss=0.0302\).
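The circular smooth label encoding underlying this angle-classification scheme can be sketched as follows; the `num_bins` and Gaussian window `radius` values below are illustrative choices, not necessarily our exact configuration:

```python
import numpy as np

def circular_smooth_label(angle_deg, num_bins=180, radius=6):
    """Encode a box orientation as a circular smooth label (CSL):
    a truncated Gaussian window centred on the true angle bin and
    wrapped circularly, so angles near the 0/180-degree boundary are
    treated as neighbours rather than as distant classes."""
    bins = np.arange(num_bins)
    center = int(round(angle_deg)) % num_bins
    # circular distance between every bin and the true-angle bin
    d = np.minimum(np.abs(bins - center), num_bins - np.abs(bins - center))
    label = np.exp(-(d ** 2) / (2 * radius ** 2))  # Gaussian window
    label[d > radius] = 0.0                        # truncate outside the window
    return label
```

The angle branch of the detector is then trained against this soft target instead of a one-hot label, so a prediction one bin away from the truth is only lightly penalized.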
We select 265 images featuring nearshore scenes from the Umbra Open dataset for the experiments. The nearshore target environment is inherently more complex, leading to increased detection challenges. Consequently, the accuracy of detecting nearshore ship targets significantly influences the overall detection performance. In contrast, in far-sea scenarios, various deep learning methodologies exhibit comparable ship target detection results, with most techniques yielding satisfactory outcomes. When focusing on nearshore ship target detection, however, the rotating target detection network architecture proposed in this study demonstrates clear advantages over other algorithms, accurately detecting targets amidst complex environments. A comparative analysis of the detection results obtained by different deep learning techniques on the dataset is presented in Fig. 6.
For an intuitive comparison of the object detection performance of various deep learning techniques in nearshore target environments, we present visual comparative results utilizing selected SAR images from the experimental dataset across seven object detection algorithms, as depicted in Fig. 7. The visual comparisons underscore the enhancements achieved through updates in the detection network backbone and the incorporation of the rotating target detection technique.
Figure 8 illustrates the partial detection results of the fusion rotating frame target detection network on the dataset. The boundary frames accurately encircle the targets across various scenarios. Whether it is a small-scale target in the far sea or a large-scale target in the near sea, the angle category is effectively predicted in the angle classification process presented in this research, allowing for the precise selection of the optimal boundary frame.
5 Discussion
The study of SAR ship target detection technology holds significant application value, especially in marine resource detection. As high-resolution SAR systems advance, SAR images now encompass more detailed information, laying a robust foundation for precise ship target detection in oceanic environments [54]. In this paper, we propose an enhanced SAR ship target detection model, constructed upon the existing deep learning target detection network architectures. Addressing challenges arising from the diverse and densely packed nature of ship targets in SAR images, we incorporate an attention mechanism architecture. Additionally, we refine the loss function of the original target detection network and integrate a rotating target detection algorithm. The aforementioned enhancements enable the detection of ship targets of varying sizes and significantly reduce the probability of missing ship targets, especially in dense nearshore scenes.
In SAR images, ship targets of various sizes coexist and are often densely arranged, making the detection of small-sized ships particularly challenging. While this research achieves high accuracy in ship target detection utilizing a deep learning approach, several challenges and limitations remain, suggesting areas for future research and improvement.
One significant challenge arises when SAR images are affected by substantial clutter interference during the imaging process. Such interference can degrade image quality, leading to issues like ghosting and incomplete ship target contours. Consequently, the performance of deep learning-based ship target detection in SAR images may suffer, sometimes resulting in false detections where partial structures of incomplete ships are misinterpreted as whole targets. Addressing this challenge is crucial for enhancing the accurate detection of fragmented or incomplete ship targets.
Another challenge emerges when ship targets occupy a relatively small proportion of large-amplitude SAR images. In such scenarios, deep learning-based detection methods may miss these smaller targets. To mitigate the issue, manual image cropping and segmentation are often required to enlarge the target size for subsequent detection. Future research could focus on optimizing the existing models to improve the detection of small-sized ship targets, thereby reducing the dependency on manual interventions.
Moreover, most existing deep learning-based SAR ship detection technologies primarily utilize the amplitude information of SAR images, overlooking the rich phase information inherent in SAR imaging. Unlike optical images, SAR imaging relies on the scattering characteristics of electromagnetic waves, which contain both amplitude and valuable phase information [55].
Incorporating the phase information as an additional input to the detection network could potentially enhance target detection and recognition capabilities, warranting further investigation and study.
6 Conclusion
This research presents an optimized object detection network, built on an enhanced version of the YOLOv7 algorithm and tailored specifically for detecting ship targets in SAR images. To bolster the detection capabilities of the network, we implement several pivotal components. Firstly, we incorporate the coordinate attention mechanism into the object detection network. The mechanism lets the network concentrate on crucial regions and features, improving the accuracy of the detection task; given the sparse presence of ship targets in SAR images, it helps the network identify and pinpoint tiny targets, making it suitable for the subsequent ship target detection tasks. In addition, the coordinate attention mechanism strengthens the generalization capability of the model across targets of varying scales, shapes, and orientations, rendering it versatile across diverse complex scenarios. Secondly, we replace the conventional CIoU regression loss with the SIoU loss in the object detection network. The substitution aims to raise detection accuracy while reducing false positives and missed targets, bolstering the reliability of the network. The SIoU loss adeptly handles variations in object shape and scale, enhancing the robustness of the model in complex scenes and under occlusion. Because ship targets in SAR images vary considerably in scale, the SIoU loss accounts for the scale factor of the target during IoU calculation; this adjustment minimizes regression biases and improves target localization accuracy.
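A minimal single-box sketch of the SIoU loss, following Gevorgyan's formulation with angle, distance, and shape costs, is given below. The shape-cost exponent `theta = 4` is the value suggested in that formulation, and the helper itself is illustrative; our training code operates on batched tensors rather than individual boxes:

```python
import numpy as np

def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """SIoU bounding-box regression loss for two (cx, cy, w, h) boxes:
    1 - IoU plus averaged distance and shape costs, where the distance
    cost is modulated by an angle cost on the line between box centers."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, target

    # IoU of the two axis-aligned boxes
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Angle cost: maximal when the center line sits at 45 degrees
    sigma = np.hypot(gx - px, gy - py)              # center distance
    sin_alpha = np.clip(abs(gy - py) / (sigma + eps), 0.0, 1.0)
    angle_cost = 1 - 2 * np.sin(np.arcsin(sin_alpha) - np.pi / 4) ** 2

    # Distance cost, attenuated via gamma = 2 - angle_cost
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2 - angle_cost
    rho_x = ((gx - px) / (cw + eps)) ** 2
    rho_y = ((gy - py) / (ch + eps)) ** 2
    dist_cost = (1 - np.exp(-gamma * rho_x)) + (1 - np.exp(-gamma * rho_y))

    # Shape cost: relative width/height mismatch
    ww = abs(pw - gw) / (max(pw, gw) + eps)
    wh = abs(ph - gh) / (max(ph, gh) + eps)
    shape_cost = (1 - np.exp(-ww)) ** theta + (1 - np.exp(-wh)) ** theta

    return 1 - iou + (dist_cost + shape_cost) / 2
```

Relative to CIoU, the angle cost first steers the regression toward the nearest coordinate axis of the target, which is what simplifies the loss landscape as described above.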
Thirdly, recognizing the challenges posed by complex SAR images featuring densely packed ship targets in nearshore regions, we integrate rotating object detection technology into our framework and incorporate the circular smooth label algorithm to enhance the detection and recognition of closely spaced ship targets. The approach addresses the loss of detection and recognition performance caused by model errors and oversights: conventional label approaches can yield imprecise bounding boxes when the shape, pose, or occlusion of the target varies, whereas the circular smooth label algorithm lets the detection network adapt to the shape and structure of the target, minimizing label boundary ambiguities. With the circular smooth label algorithm, the network acquires a generalized and robust feature representation of ship targets, strengthening its ability to generalize across diverse scenes, scales, and poses. Through rigorous experimentation on public SAR datasets, the proposed architecture demonstrates superior accuracy without significantly compromising speed. Specifically, it outperforms the second-best baseline competitor by approximately 2% in both precision and recall and by around 1.5% in mean average precision (Mean AP). These results underscore the efficiency and effectiveness of the designed approach for ship target detection in SAR images.
The enhanced object detection network proposed in this research is not without limitations. Firstly, the developed technique focuses primarily on ship target detection in SAR images; future work will therefore extend the network to broader categories of images, such as ship target detection in high-resolution remote sensing satellite images. Secondly, given the rapid advancements in deep learning and its widespread application in object detection, it is worth exploring and incorporating the latest object detection networks for the downstream ship target detection task. Finally, while integrating advanced enhancement modules into the object detection network can improve algorithm performance, the associated increase in processing time remains an inevitable limitation. Balancing improved detection performance with model efficiency is a crucial challenge for future research.
Supplementary information
Not applicable.
Availability of data and materials
The authors will supply the relevant data in response to reasonable requests.
Code availability
The authors will supply the relevant source code in response to reasonable requests.
References
Mondini AC, Guzzetti F, Chang K-T, Monserrat O, Martha TR, Manconi A (2021) Landslide failures detection and mapping using synthetic aperture radar: Past, present and future. Earth Sci Rev 216:103574
Hamidi E, Peter BG, Muñoz DF, Moftakhari H, Moradkhani H (2023) Fast flood extent monitoring with SAR change detection using Google Earth Engine. IEEE Trans Geosci Remote Sens 61:1–19
Zhou Z, Chen J, Huang Z, Lv J, Song J, Luo H, Wu B, Li Y, Diniz PS (2023) HRLE-SARDet: a lightweight SAR target detection algorithm based on hybrid representation learning enhancement. IEEE Trans Geosci Remote Sens 61:1–22
Wang X, Liu J, Liu X, Liu Z, Khalaf OI, Ji J, Ouyang Q (2022) Ship feature recognition methods for deep learning in complex marine environments. Complex Intell Syst 8(5):3881–3897
Cui Z, Wang X, Liu N, Cao Z, Yang J (2020) Ship detection in large-scale SAR images via spatial shuffle-group enhance attention. IEEE Trans Geosci Remote Sens 59(1):379–391
Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng 27:1071–1092
Xu Y, Liu X, Cao X, Huang C, Liu E, Qian S, Liu X, Wu Y, Dong F, Qiu C-W (2021) Artificial intelligence: a powerful paradigm for scientific research. The Innovation 2(4):100179
Singh N, Sabrol H (2021) Convolutional neural networks: an extensive arena of deep learning. A comprehensive study. Arch Comput Methods Eng 28(7):4755–4780
Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2(3):160
Wang P, Fan E, Wang P (2021) Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recogn Lett 141:61–67
Sun L, Chen J, Feng D, Xing M (2021) The recognition framework of deep kernel learning for enclosed remote sensing objects. IEEE Access 9:95585–95596
Ai J, Tian R, Luo Q, Jin J, Tang B (2019) Multi-scale rotation-invariant Haar-like feature integrated CNN-based ship detection algorithm of multiple-target environment in SAR imagery. IEEE Trans Geosci Remote Sens 57(12):10070–10087
Chen S, Li X (2019) A new CFAR algorithm based on variable window for ship target detection in SAR images. SIViP 13(4):779–786
Xu Y, Zhang X, Wei S, Shi J, Zeng T, Zhang T (2023) A target-oriented Bayesian compressive sensing imaging method with region-adaptive extractor for mmW automotive radar. IEEE Trans Geosci Remote Sens
Nasrabadi NM (2019) DeepTarget: an automatic target recognition using deep convolutional neural networks. IEEE Trans Aerosp Electron Syst 55(6):2687–2697
Guo Y, Du L, Lyu G (2021) SAR target detection based on domain adaptive Faster R-CNN with small training data size. Remote Sensing 13(21):4202
Wei X, Zhang S, Qi Q, Fu H, Qiu T, Zhou A (2021) Predicting malignancy and benign thyroid nodule using multi-scale feature fusion and deep learning. Pattern Recognit Image Anal 31:830–841
Yasir M, Jianhua W, Mingming X, Hui S, Zhe Z, Shanwei L, Colak ATI, Hossain MS (2023) Ship detection based on deep learning using SAR imagery: a systematic literature review. Soft Comput 27(1):63–84
Liu L, Chen G, Pan Z, Lei B, An Q (2018) Inshore ship detection in SAR images based on deep neural networks. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 25–28. IEEE
Lin Z, Ji K, Leng X, Kuang G (2018) Squeeze and excitation rank Faster R-CNN for ship detection in SAR images. IEEE Geosci Remote Sens Lett 16(5):751–755
Xu X, Zhang X, Shao Z, Shi J, Wei S, Zhang T, Zeng T (2022) A group-wise feature enhancement-and-fusion network with dual-polarization feature enrichment for SAR ship detection. Remote Sensing 14(20):5276
Ma P, Li C, Rahaman MM, Yao Y, Zhang J, Zou S, Zhao X, Grzegorzek M (2023) A state-of-the-art survey of object detection techniques in microorganism image analysis: from classical methods to deep learning approaches. Artif Intell Rev 56(2):1627–1698
Gao Y, Wu Z, Ren M, Wu C (2022) Improved YOLOv4 based on attention mechanism for ship detection in SAR images. IEEE Access 10:23785–23797
Guo Y, Chen S, Zhan R, Wang W, Zhang J (2022) SAR ship detection based on YOLOv5 using CBAM and BiFPN. In: IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium, pp. 2147–2150. IEEE
Xu X, Zhang X, Zhang T (2022) Lite-YOLOv5: a lightweight deep learning detector for on-board ship detection in large-scene Sentinel-1 SAR images. Remote Sensing 14(4):1018
Yang L, Liu Y, Yu H, Fang X, Song L, Li D, Chen Y (2021) Computer vision models in intelligent aquaculture with emphasis on fish detection and behavior analysis: a review. Arch Comput Methods Eng 28:2785–2816
Sun Z, Lei Y, Leng X, Xiong B, Ji K (2022) An improved oriented ship detection method in high-resolution SAR image based on YOLOv5. In: 2022 Photonics & Electromagnetics Research Symposium (PIERS), pp. 647–653. IEEE
Li S, Li M, Li R, He C, Zhang L (2023) One-to-few label assignment for end-to-end dense detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7350–7359
Madjidi H, Laroussi T (2023) Approximate MLE based automatic bilateral censoring CFAR ship detection for complex scenes of log-normal sea clutter in SAR imagery. Digit Signal Process 136:103972
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722
Lai Y, Ma R, Chen Y, Wan T, Jiao R, He H (2023) A pineapple target detection method in a field environment based on improved YOLOv7. Appl Sci 13(4):2691
Subedi S, Bist R, Yang X, Chai L (2023) Tracking floor eggs with machine vision in cage-free hen houses. Poult Sci 102(6):102637
Wang C-Y, Bochkovskiy A, Liao H-YM (2023) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475
Yuan B, Sun Z, Pei L, Li W, Hu Y, Mohammed A-S (2023) Airfield concrete pavement joint detection network based on dual-modal feature fusion. Autom Constr 151:104868
Lee S-H, Bae S-H (2023) AFI-GAN: improving feature interpolation of feature pyramid networks via adversarial training for object detection. Pattern Recogn 138:109365
Mishra A, Gupta P, Tewari P (2022) Global U-Net with amalgamation of inception model and improved kernel variation for MRI brain image segmentation. Multimed Tools Appl 81(16):23339–23354
Liu K, Peng L, Tang S (2023) Underwater object detection using TC-YOLO with attention mechanisms. Sensors 23(5):2567
Raj GD, Prabadevi B (2023) Steel strip quality assurance with YOLOv7-CSF: a coordinate attention and SIoU fusion approach. IEEE Access 11:129493–129506
Cao L, Zheng X, Fang L (2023) The semantic segmentation of standing tree images based on the YOLO v7 deep learning algorithm. Electronics 12(4):929
Zhu L, Lee F, Cai J, Yu H, Chen Q (2022) An improved feature pyramid network for object detection. Neurocomputing 483:127–139
Zhang R, Xie C, Deng L (2023) A fine-grained object detection model for aerial images based on YOLOv5 deep neural network. Chin J Electron 32(1):51–63
Tang F, Yang F, Tian X (2023) Long-distance person detection based on YOLOv7. Electronics 12(6):1502
Gevorgyan Z (2022) SIoU loss: more powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740
Liu Z, Cai Y, Wang H, Chen L, Gao H, Jia Y, Li Y (2021) Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions. IEEE Trans Intell Transp Syst 23(7):6640–6653
Yang X, Yan J (2020) Arbitrary-oriented object detection with circular smooth label. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pp. 677–694. Springer
Jiao Y, Zhu Q, He H, Zhao T, Wang H (2022) Rotating target detection based on lightweight network. In: PRICAI 2022: Trends in Artificial Intelligence: 19th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2022, Shanghai, China, November 10–13, 2022, Proceedings, Part III, pp. 619–630. Springer
Yang X, Yan J (2022) On the arbitrary-oriented object detection: classification based approaches revisited. Int J Comput Vision 130(5):1340–1365
Wang K, Liu M (2022) YOLOv3-MT: a YOLOv3 using multi-target tracking for vehicle visual detection. Appl Intell 52(2):2070–2091
Wang C-Y, Bochkovskiy A, Liao H-YM (2023) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475
Zhang T, Zhang X, Li J, Xu X, Wang B, Zhan X, Xu Y, Ke X, Zeng T, Su H (2021) SAR ship detection dataset (SSDD): official release and comprehensive data analysis. Remote Sensing 13(18):3690
Zhang T, Zhang X, Shao Z (2023) Saliency-guided attention-based feature pyramid network for ship detection in SAR images. In: IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pp. 4950–4953. IEEE
Gong Y, Zhang Z, Wen J, Lan G, Xiao S (2023) Small ship detection of SAR images based on optimized feature pyramid and sample augmentation. IEEE J Sel Top Appl Earth Obs Remote Sens
Hou X, Ao W, Song Q, Lai J, Wang H, Xu F (2020) FUSAR-Ship: building a high-resolution SAR-AIS matchup dataset of Gaofen-3 for ship detection and recognition. Sci China Inf Sci 63:1–19
Zhang L, Dong H, Zou B (2019) Efficiently utilizing complex-valued PolSAR image data via a multi-task deep learning framework. ISPRS J Photogramm Remote Sens 157:59–72
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments. This article has been supported by the National Natural Science Foundation of China (61941113) and Science and Technology on Information System Engineering Laboratory (No: 05202104).
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
HZ and ZW were involved in conceptualization; ZW helped with methodology, validation, resources, and project administration; and HZ was involved in software, formal analysis, investigation, data curation, writing—original draft preparation, writing—review and editing, and visualization. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
We would like to confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. All the authors give their consent to publish the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zou, H., Wang, Z. An enhanced object detection network for ship target detection in SAR images. J Supercomput 80, 17377–17399 (2024). https://doi.org/10.1007/s11227-024-06136-3
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-024-06136-3