1 Introduction

Ship recognition and classification algorithms for ocean areas are important for enhancing maritime safety and security [12, 17]. The goal of vessel/ship classification is to recognize the type of ship in a given image with as much detail as possible. Beyond the typical recognition challenges caused by partial occlusions and variations in scale [38], ships are especially difficult to recognize because their appearance is strongly affected by external factors such as weather conditions (e.g., cloudy, sunny), viewing geometry, and sea state [29]. Moreover, wide within-class variation in some types of vessels makes vessel classification even more complicated and challenging [2].

In recent years, numerous efforts have been devoted to the problem of vessel recognition in optical remote sensing imagery, yielding many diverse solutions [10, 11, 17]. In [15, 26], several well-studied feature extraction methods from facial recognition were applied to a small set of ship images. These include global/holistic features: Principal Component Analysis (PCA) [25], Linear Discriminant Analysis (LDA) [4], Independent Component Analysis (ICA) [3], Random Projections (Rand) [42], and Multilinear PCA (MPCA) [25]; and local features: hierarchical multiscale Local Binary Pattern (HMLBP) [14] and Histogram of Oriented Gradients (HOG) [9]. HMLBP operates on multiple gray-scale texture scales to obtain scale-invariant local spatial structure and texture information of the image. HOG is essentially an image descriptor that represents the image by the distribution of local intensity gradients or edge directions, capturing the local object appearance and shape in the image. Building on the earlier work in [15], Rainey et al. [35] explored the efficacy of several object recognition algorithms at classifying ships and other ocean vessels in commercial panchromatic satellite imagery.

In addition to the above-mentioned methods, the Bag of (visual) Words (BOVW) [19], one of the most popular approaches in image analysis and classification applications, has also been applied to ship classification. The BOVW, which is inspired by the bag-of-words representation used in text classification tasks, represents an image as a histogram of frequencies of a set of local descriptors such as the Scale-Invariant Feature Transform (SIFT) [24]. To increase the robustness of the feature, Rainey et al. [5] replaced SIFT descriptors in the BOVW model with Pyramid of Histograms of visual Words (PHOW [34]) descriptors, which are extracted on a dense grid of key points and provide greater accuracy on resized non-degraded data. An algorithm for vessel classification that combined BOW histograms with sparse representation based classification (SRC) was proposed in [36]. The original BOW and BOW with ℓ2-normalized term frequency (TF) weighting schemes were studied in [28] for the vessel type classification task in the maritime domain. In [1], Local Binary Patterns (LBP [21]), which offer high distinctiveness and low computational complexity, were employed.

According to the way features are extracted, these approaches can usually be partitioned into two categories: global (holistic) and local. Global features describe the image using a lower-dimensional space or histogram and form the feature space from the entire image; they are therefore easy to implement and computationally cheap, but their classification performance is limited. Local features use a set of local descriptors to characterize the ship image; however, when the shapes of two classes are relatively similar, local features are not discriminative enough to distinguish between them. Although the aforementioned feature extraction approaches (i.e., local features such as BOVW, and global features such as MPCA) have achieved satisfactory performance in vessel classification, each type of feature has its own advantages and limitations. Moreover, different features capture different information about ships, and a certain feature may be suitable only for one specific pattern. To this end, a global feature, which is complementary to the local features, is utilized to represent more discriminative information of a class.

Inspired by this observation, a multiple features learning (MFL) framework is proposed for ship classification in this work. The proposed method employs three types of features to simultaneously capture the global structure and local details of various ships. We employ the Gabor-based multi-scale completed local binary pattern (MS-CLBP [7]), which fuses the benefits of the Gabor wavelet and the CLBP descriptor, to extract a global feature from images; we treat it as the first type of feature in our work. This global feature captures scale-invariant and orientation-invariant spatial structure and texture information from the entire image. A local feature extraction method based on patch-based MS-CLBP and the Fisher vector (FV [37]) is used to obtain a local representation of ship images, which is considered the second type of feature. Patch-based MS-CLBP with FV has demonstrated great success in image classification [16]. The BOVW, a classical feature extraction method in the object recognition domain, is also employed in the proposed algorithm. We further combine BOVW with spatial pyramid matching (SPM [20]) as the third type of feature, which overcomes the orderless nature of the bag-of-features representation and enhances the spatial order structure of local features. The FV and BOVW are both visual coding models, but the FV employs a soft assignment strategy while the BOVW employs a hard assignment strategy. In the FV model, we use the MS-CLBP operator to extract local texture information from image patches. The BOVW model utilizes the SIFT descriptor to roughly represent edge information in an image patch. Note that both the FV and BOVW are employed in order to obtain more comprehensive local features of ship images.

In this paper, we simultaneously extract global and local features to comprehensively capture the characteristics of ships. After feature extraction, fusion strategies are applied for final classification. Fusion strategies mainly comprise feature-level and decision-level fusion, and both are investigated in the proposed classification framework. Feature-level fusion combines different feature vectors into a single feature vector. Decision-level fusion, which operates on the probability outputs of each individual classification pipeline and combines the distinct decisions into a final one, can be divided into two types: “hard” fusion at the class-label level and “soft” fusion at the probability level. Hard fusion (e.g., the majority voting (MV) rule [47]) may lead to coarse results. In this paper, we choose a “soft” fusion scheme, namely the logarithmic opinion pool (LOGP [22]), and verify its effectiveness by comparing it with the hard fusion method (i.e., MV) in the experimental analysis.

There are two main contributions of this work. First, a multiple features learning (MFL) method using three types of features, including both global and local features, is proposed; feature-level and decision-level fusion are both investigated in the proposed classification framework. Second, Gabor-based MS-CLBP is employed to extract a global feature that compensates for the local features, thereby taking full advantage of the complementary nature of global and local features. The proposed method is extensively evaluated using three public optical remote sensing ship image datasets. The experimental results verify the effectiveness of our proposed method as compared to some state-of-the-art algorithms.

The remainder of the paper is organized as follows. Section 2 describes the proposed approach, including the extraction of the three types of features and the feature-level as well as decision-level soft fusion strategies. Section 3 introduces the experimental datasets (i.e., BCCT200-resize [35], the original BCCT200 [35], and VAIS [48]). Section 4 reports the experimental results and provides some analysis. Finally, Section 5 draws several concluding remarks.

2 Proposed classification framework

The flowchart of the proposed method is illustrated in Fig. 1. There are two fusion strategies in the proposed classification framework, i.e., feature-level fusion and decision-level fusion, which are illustrated in Fig. 1a and b, respectively.

Fig. 1

Flowchart of the proposed multiple features learning framework: a feature-level fusion, b decision-level fusion

As shown in Fig. 1a, we first extract three kinds of features from the input image. The first is the Gabor-based MS-CLBP: we use multi-orientation (e.g., π/8, π/4, π/2, etc.) Gabor filters to obtain multiple Gabor feature images, on which the MS-CLBP is then employed. The second is a local feature extracted by patch-based MS-CLBP and FV: we partition the image and its multi-scale versions into dense patches and use the CLBP descriptor to extract a set of local patch descriptors, which FV encoding then maps into a discriminative representation. The third is the combination of BOVW and SPM: we apply the dense SIFT descriptor by partitioning an image into small patches, employ k-means clustering to generate the codebook that represents the visual words of the BOVW, and further employ SPM to compute the local features, which enhances the spatial order structure of local features. Finally, we employ the typical support vector machine (SVM) classifier [6] to obtain the final classification performance. Details of the feature extraction are presented in Section 2.1.

The decision-level fusion framework is shown in Fig. 1b. For each input image, the three kinds of features are extracted separately. Then, for each of the three types of features, the typical SVM classifier is employed to calculate probability estimates. Finally, the proposed method merges the outputs of the individual classification pipelines using decision-level soft fusion (i.e., LOGP) to obtain the final classification result.

2.1 Feature extraction

2.1.1 Gabor-based MS-CLBP

Inspired by the success of Gabor filters and LBP in computer vision applications, we employ the Gabor-based MS-CLBP as the first type of feature in the proposed framework to extract global features of ship images. The feature extraction process is summarized in Algorithm 1.

Algorithm 1 (Gabor-based MS-CLBP global feature extraction)

A Gabor wavelet [8, 30] transform can be viewed as an orientation-dependent bandpass filter. Its impulse response is defined by a sinusoidal wave multiplied by a Gaussian function. In the 2-D (x, y) coordinate system, a Gabor filter, which comprises a real component and an imaginary component, can be defined as,

$$ {G}_{\varepsilon ,\theta ,\psi ,\sigma ,\gamma }(x,y)=\exp \left( -\frac{{x^{\prime}}^{2}+{{\gamma }^{2}}{y^{\prime}}^{2}}{2{{\sigma }^{2}}} \right)\exp \left( i\left( 2\pi \frac{x^{\prime}}{\varepsilon }+\psi \right) \right), $$
(1)

where \( x^{\prime }=x\cos \theta +y\sin \theta \) and \( y^{\prime }=-x\sin \theta +y\cos \theta \). Here, ε is the wavelength of the sinusoidal factor, 𝜃 is the orientation separation angle (e.g., π/8, π/4, π/2, etc.) of the Gabor kernels, ψ represents the phase offset, σ is the standard deviation of the Gaussian envelope, and γ is the spatial aspect ratio specifying the ellipticity of the support of the Gabor function. Note that ψ = 0 and ψ = π/2 return the real and imaginary parts of the Gabor filter, respectively. The parameter σ is determined by ε and the spatial frequency bandwidth bw as,

$$ \sigma =\frac{\varepsilon }{\pi }\sqrt{\frac{\ln 2}{2}}\frac{{{2}^{bw}}+1}{{{2}^{bw}}-1} . $$
(2)
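
To make the construction concrete, the following minimal NumPy sketch builds a Gabor kernel directly from (1) and (2); the kernel size and the example wavelength are illustrative assumptions rather than values prescribed by the paper.

```python
import numpy as np

def gabor_kernel(eps, theta, psi=0.0, gamma=0.5, bw=5, size=31):
    """Gabor kernel per Eqs. (1)-(2): eps is the wavelength, theta the
    orientation, psi the phase offset, gamma the spatial aspect ratio,
    and bw the spatial frequency bandwidth that determines sigma."""
    sigma = (eps / np.pi) * np.sqrt(np.log(2) / 2) * (2**bw + 1) / (2**bw - 1)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_p = x * np.cos(theta) + y * np.sin(theta)    # x'
    y_p = -x * np.sin(theta) + y * np.cos(theta)   # y'
    envelope = np.exp(-(x_p**2 + (gamma * y_p)**2) / (2 * sigma**2))
    carrier = np.exp(1j * (2 * np.pi * x_p / eps + psi))
    return envelope * carrier  # real part: psi = 0; imaginary part: psi = pi/2

# eight orientations, as used in Section 4.1; eps = 8 is an illustrative choice
kernels = [gabor_kernel(eps=8.0, theta=k * np.pi / 8) for k in range(8)]
```

Convolving the input image with the real part of each kernel yields the multi-orientation Gabor feature images on which the MS-CLBP then operates.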

The LBP [27] is a texture operator that characterizes the spatial structure of local image texture, and it has been widely employed in object recognition (e.g., texture classification, face recognition, etc.). Consider a pixel in the image whose gray value is denoted as g_c, with neighboring pixels equally spaced on a circle of radius r. The LBP code of g_c, in decimal form, is computed by comparing it with its neighbors,

$$\begin{array}{@{}rcl@{}} LBP_{m,r}\left( g_{c} \right)&=&\sum\limits_{p=0}^{m-1}{s\left( g_{p}-g_{c} \right)2^{p}}=\sum\limits_{p=0}^{m-1}{s\left( d_{p} \right)2^{p}}, \\ s(d_{p})&=&\left\{ \begin{array}{cc} 1, & d_{p}\ge 0 \\ 0, & d_{p}<0 \end{array} \right. \end{array} $$
(3)

where g_p is the gray value of the p-th neighbor, d_p = (g_p − g_c) represents the difference between each neighbor and the center pixel, r is the radius of the circle, and m is the total number of involved neighbors. If the coordinate of g_c is (0, 0), the coordinates of g_p are \((-r\sin (2\pi {p}/{m}),r\cos (2\pi {p}/{m}))\). After the LBP-coded image is generated by computing the LBP values of all pixels in the image, a histogram is calculated to represent the texture image. Nevertheless, the LBP only uses the sign information of d_p while ignoring the magnitude information.

In the improved CLBP [13], the sign and magnitude components, i.e., CLBP-Sign (CLBP_S) and CLBP-Magnitude (CLBP_M), are complementary. CLBP_S is equivalent to the traditional LBP operator, and the CLBP_M operator is expressed as,

$$\begin{array}{@{}rcl@{}} CLBP\_M_{m,r}\left( g_{c} \right)&=&\sum\limits_{p=0}^{m-1}{f\left( \left| g_{p}-g_{c} \right|,\eta \right)2^{p}}=\sum\limits_{p=0}^{m-1}{f\left( \left| d_{p} \right|,\eta \right)2^{p}}, \\ f\left(\left| d_{p} \right|,\eta \right)&=&\left\{ \begin{array}{cc} 1, & \left| d_{p} \right|\ge \eta \\ 0, & \left| d_{p} \right|<\eta \end{array} \right. \end{array} $$
(4)

where η is a threshold that is usually set to the mean value of \(\left | {{d}_{p}} \right |\) over the whole image. The combined CLBP describes both the spatial structure and the local contrast information by mapping CLBP_S and CLBP_M into histograms. To make the feature scale invariant, a multi-scale representation of CLBP, termed MS-CLBP, is considered. In our work, the MS-CLBP representation is formed by concatenating the CLBP_S and CLBP_M histogram features extracted at different scales, which are obtained by down-sampling the original image using bicubic interpolation [7]. An example of the implementation of a 3-scale CLBP operator is illustrated in Fig. 2.
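
As a rough illustration of the operator, the sketch below computes CLBP_S and CLBP_M histograms (Eqs. (3)-(4)) and concatenates them across scales. It is a simplified sketch under stated assumptions: neighbors are sampled at the nearest pixel rather than with bilinear interpolation, full 2^m-bin histograms are used rather than the compact mapping that yields the paper's 216-dimensional feature, and cubic-spline resampling stands in for bicubic down-sampling.

```python
import numpy as np
from scipy.ndimage import zoom

def clbp_histograms(img, m=10, r=1):
    """CLBP_S (Eq. 3, identical to LBP) and CLBP_M (Eq. 4) histograms,
    computed densely over the interior pixels of a gray-scale image."""
    h, w = img.shape
    angles = 2 * np.pi * np.arange(m) / m
    dy = np.rint(-r * np.sin(angles)).astype(int)  # neighbor offsets on the circle
    dx = np.rint(r * np.cos(angles)).astype(int)
    center = img[r:h - r, r:w - r].astype(float)
    s_codes = np.zeros(center.shape, dtype=np.int64)
    mags = []
    for p in range(m):
        neigh = img[r + dy[p]:h - r + dy[p], r + dx[p]:w - r + dx[p]].astype(float)
        mags.append(np.abs(neigh - center))                       # |d_p|
        s_codes += ((neigh - center) >= 0).astype(np.int64) << p  # sign code
    mags = np.stack(mags)
    eta = mags.mean()                                 # threshold: mean of |d_p|
    m_codes = ((mags >= eta).astype(np.int64)
               << np.arange(m)[:, None, None]).sum(axis=0)        # magnitude code
    hist_s = np.bincount(s_codes.ravel(), minlength=2**m)
    hist_m = np.bincount(m_codes.ravel(), minlength=2**m)
    return np.concatenate([hist_s, hist_m]).astype(float)

def ms_clbp(img, scales=(1, 1/2, 1/3), m=10, r=1):
    """Multi-scale CLBP: concatenate histograms of down-sampled versions."""
    return np.concatenate([clbp_histograms(zoom(img, s, order=3), m, r)
                           for s in scales])
```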

Fig. 2

An example of a 3-scale CLBP operator (m = 10, r = 1): a scale = 1, b scale = 1/2, c scale = 1/3

2.1.2 Patch-based MS-CLBP and FV

As noted in [16], an image representation based on patch-based MS-CLBP and FV extracts local image features well and has achieved strong performance in remote sensing image scene classification. In this work, we first use a regular grid to partition an entire image into B × B overlapped patches. For simplicity, the overlap between two patches is half of the patch size (i.e., B/2) in both the horizontal and vertical directions. For each patch, we employ the second implementation of MS-CLBP to capture the spatial pattern and the contrast of the local image texture. If M patches are extracted from the multi-scale images, we obtain a set of local patch descriptors that form a feature matrix denoted by H = [h_1, h_2, ..., h_M], where h_i is the CLBP histogram feature vector extracted from patch i, as sketched below.
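
A minimal sketch of this dense patch extraction, assuming the clbp_histograms helper from the previous sketch, might look as follows; the patch size and the (m, r) values follow Section 4.1.

```python
import numpy as np

def dense_patches(img, B=28):
    """Partition an image into B x B patches with B/2 overlap in both
    the horizontal and vertical directions, as described above."""
    step = B // 2
    H, W = img.shape
    return [img[i:i + B, j:j + B]
            for i in range(0, H - B + 1, step)
            for j in range(0, W - B + 1, step)]

# feature matrix H = [h_1, ..., h_M]: one CLBP histogram per patch
# H = np.stack([clbp_histograms(p, m=6, r=3) for p in dense_patches(img)])
```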

After local feature extraction, FV encoding is used to encode the local patch descriptors into a discriminative representation. Let X = {x_i, i = 1, ⋯, N} be the set of local patch descriptors extracted from an image. A Gaussian mixture model (GMM), whose probability density function is denoted as u_λ with parameter λ, is trained on the training images using Maximum Likelihood (ML) estimation [23]. The image can then be characterized by the gradient vector,

$$ \mathbf{G}_{\mathbf{\lambda} }^{\mathbf{X}}=\frac{1}{N}{{\nabla }_{\mathbf{\lambda} }}\log {{u}_{\mathbf{\lambda} }}(\mathbf{X}) . $$
(5)

The gradient of the log-likelihood describes the direction along which the parameters should be adjusted to best fit the data. The GMM u_λ can be represented as

$$ {{u}_{\mathbf{\lambda} }}(\mathbf{x})=\sum\limits_{t=1}^{T}{{{\omega }_{t}}{{u}_{t}}(\mathbf{x})} . $$
(6)

We denote λ = {ω_t, μ_t, Σ_t, t = 1, ..., T}, where ω_t, μ_t and Σ_t are the mixture weight, mean vector, and covariance matrix of Gaussian u_t, and T is the number of Gaussians in the GMM. Under an independence assumption, the covariance matrices are diagonal, i.e., \({{\Sigma }_{t}}=\text {diag}({\sigma _{t}^{2}})\). We make use of the Fisher kernel (FK) of [18] to measure the similarity between two samples x and y; it is defined as,

$$ \mathbf{S}(\mathbf{x},\mathbf{y})=\mathbf{G}{{_{\mathbf{\lambda} }^{\mathbf{x}^{\prime}}}}\mathbf{F}_{\mathbf{\lambda} }^{-1}\mathbf{G}_{\mathbf{\lambda} }^{\mathbf{y}}, $$
(7)

where F_λ is the Fisher information matrix of u_λ: \({{\mathbf {F}}_{\mathbf {\lambda } }}={{\mathbf {E}}}[\nabla _{\mathbf {\lambda } }^{{}}\log {{u}_{\mathbf {\lambda } }}(\mathbf {x}){{\nabla }_{\mathbf {\lambda } }}\log {{u}_{\mathbf {\lambda } }}{{(\mathbf {x})}^{\prime }}]\). Following the diagonal closed-form approximation of [31], the FK can be rewritten as a dot product of normalized vectors \({{\mathbb {G}}_{\mathbf {\lambda } }}\), with

$$ \mathbb{G}_{\mathbf{\lambda} }^{\mathbf{x}}=\mathbf{F}_{\mathbf{\lambda} }^{-1/2}\mathbf{G}_{\mathbf{\lambda} }^{\mathbf{x}} . $$
(8)

Let γ_i(t) be the occupancy probability, i.e., the probability that descriptor x_i is generated by the Gaussian u_t,

$$ {{\gamma }_{i}}(t)=\frac{{{\omega }_{t}}{{u}_{t}}({{\mathbf{x}}_{i}})}{\sum\limits_{j=1}^{T}{{{\omega }_{j}}{{u}_{j}}({{\mathbf{x}}_{i}})}}. $$
(9)

Let \(\mathbb {G}_{\mu ,t}^{\mathbf {X}}\) (resp. \(\mathbb {G}_{\sigma ,t}^{\mathbf {X}}\)) be the gradient with respect to the mean μ_t (resp. standard deviation σ_t) of Gaussian t. Mathematical derivation leads to,

$$ \mathbb{G}_{\mu ,t}^{\mathbf{X}}=\frac{1}{N\sqrt{{{\omega }_{t}}}}\sum\limits_{i=1}^{N}{{{\gamma }_{i}}(t)\left( \frac{{{\mathbf{x}}_{i}}-{{\mu }_{t}}}{{{\sigma }_{t}}}\right)}, $$
(10)

and

$$ \mathbb{G}_{\sigma ,t}^{\mathbf{X}}=\frac{1}{N\sqrt{2{{\omega }_{t}}}}\sum\limits_{i=1}^{N}{{{\gamma }_{i}}(t)\left[\frac{{{({{\mathbf{x}}_{i}}-{{\mu }_{t}})}^{2}}}{{\sigma }_{t}^{2}}-1\right]} . $$
(11)

The final gradient vector is the concatenation of the \(\mathbb {G}_{\mu ,t}^{\mathbf {X}}\) and \(\mathbb {G}_{\sigma ,t}^{\mathbf {X}}\) vectors for t = 1, ..., T; since each is D-dimensional, the dimensionality of the FV is 2D × T, where D denotes the dimensionality of the local descriptors.

Each training image yields a feature matrix of local patch descriptors produced by the patch-based MS-CLBP operator. All the feature matrices of the training data are then used to estimate the GMM parameters via Maximum Likelihood (ML) estimation. After that, the FV is utilized to generate the final feature representation. The feature extraction process is summarized in Algorithm 2, and Fig. 3 illustrates the detailed procedure for generating the FV of training images. For a testing image, we use the patch-based MS-CLBP feature extraction method shown in Fig. 3 to obtain local descriptors; the Fisher kernel representation, based on the GMM estimated from the training data, is then used to obtain the final FV feature. The detailed process for generating the FV of a testing image is shown in Fig. 4.
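
A compact scikit-learn sketch of the FV encoding defined by Eqs. (9)-(11) is given below; the power and L2 normalization steps at the end are common FV post-processing choices rather than details stated in this paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# GMM trained on local patch descriptors pooled from all training images:
# gmm = GaussianMixture(n_components=20, covariance_type='diag').fit(train_desc)

def fisher_vector(X, gmm):
    """Fisher vector of descriptor set X (N x D) w.r.t. a diagonal GMM,
    concatenating the mean and std-dev gradients of Eqs. (10)-(11)."""
    N, _ = X.shape
    gamma = gmm.predict_proba(X)          # occupancy probabilities (Eq. 9), N x T
    w, mu = gmm.weights_, gmm.means_
    sigma = np.sqrt(gmm.covariances_)     # T x D standard deviations
    G_mu, G_sigma = [], []
    for t in range(gmm.n_components):
        diff = (X - mu[t]) / sigma[t]
        G_mu.append(gamma[:, t] @ diff / (N * np.sqrt(w[t])))                 # Eq. 10
        G_sigma.append(gamma[:, t] @ (diff**2 - 1) / (N * np.sqrt(2 * w[t])))  # Eq. 11
    fv = np.concatenate(G_mu + G_sigma)   # dimensionality 2 * D * T
    fv = np.sign(fv) * np.sqrt(np.abs(fv))          # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)        # L2 normalization
```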

Algorithm 2 (patch-based MS-CLBP and FV local feature extraction)
Fig. 3

Fisher vector representation of the training images

Fig. 4

Fisher vector representation of a testing image

2.1.3 BOVW and SPM

In recent years, the BOVW model, a classical local feature representation method, has demonstrated outstanding performance in several object recognition tasks such as face recognition. Since the traditional BOVW model ignores spatial and structural information, we attach SPM to the BOVW framework, as SPM can capture the spatial arrangement of an image. The BOVW feature extraction process is summarized in Algorithm 3, and the overall description of the combined BOVW and SPM feature is summarized in Algorithm 4.

First of all, we partition the image into small blocks using a regular grid and extract a local descriptor with the SIFT operator on each block, whose center is viewed as a key point. An entire image is partitioned into ϖ × ϖ overlapped blocks, where the overlap between two blocks is half of the block size (i.e., ϖ/2) in both the horizontal and vertical directions.

In doing so, we obtain a set of local descriptors. Then, k-means clustering is employed to generate the codebook that represents the visual words of the BOVW. After that, we encode the patches against the codebook using vector quantization and calculate the frequency histogram. The SPM is further employed to enhance the spatial order structure of the local features. In the SPM framework, an image is partitioned into 2^l × 2^l segments at different scales (e.g., l = 0, 1, 2), and a BOVW histogram is calculated within each segment. The final feature representation of the image is the concatenation of all the histograms. An example of the implementation of the SPM is illustrated in Fig. 5, and a code sketch follows below.
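
The following sketch outlines the encoding stage under stated assumptions: the dense SIFT descriptors and their key-point positions are assumed to come from an external extractor (e.g., OpenCV or VLFeat), and the per-level weighting of the original SPM kernel is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans

# codebook from randomly sampled training blocks (size 1024, per Section 4.1):
# kmeans = KMeans(n_clusters=1024).fit(sampled_training_descriptors)

def spm_encode(descriptors, positions, kmeans, img_shape, levels=(0, 1, 2)):
    """BOVW + SPM: vector-quantize local descriptors against the k-means
    codebook, then concatenate per-cell histograms over a 2^l x 2^l grid
    for each pyramid level l."""
    words = kmeans.predict(descriptors)   # hard assignment to visual words
    K = kmeans.n_clusters
    H, W = img_shape
    feats = []
    for l in levels:
        cells = 2 ** l
        for cy in range(cells):
            for cx in range(cells):
                in_cell = ((positions[:, 0] * cells // H == cy) &
                           (positions[:, 1] * cells // W == cx))
                hist = np.bincount(words[in_cell], minlength=K).astype(float)
                feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)
```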

Algorithm 3 (BOVW feature extraction)
Algorithm 4 (combined BOVW and SPM feature representation)
Fig. 5

Example of a three-level spatial pyramid matching (l = 0, 1, 2)

2.2 Feature-level fusion

After feature extraction, feature-level fusion and decision-level fusion are both investigated for the final classification. Figure 1a illustrates the feature-level fusion [43] employed in this work. For multiple feature learning, the Gabor-based MS-CLBP extracts a spatial structure and texture feature; the local feature extraction method based on patch-based MS-CLBP and FV is a visual coding model with a soft assignment strategy; and the other local feature extraction method, combining BOVW and SPM, is a visual coding model with a hard assignment strategy. For various classification tasks, these features have their own advantages and disadvantages, and it is difficult to determine which one is always optimal [21]. Thus, we use a feature-level fusion strategy that fuses the three types of features by straightforwardly stacking the feature vectors into a composite one. The specific implementation of the feature fusion process is presented in Algorithm 5. To align the scales of the feature values, feature normalization before stacking is a necessary preprocessing step.

Algorithm 5 (feature-level fusion)
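
A minimal sketch of this stacking scheme is shown below; L2 normalization is one reasonable choice for the normalization step, since the paper does not specify the exact scheme.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

def fuse_features(global_feat, fv_feat, bovw_feat):
    """Feature-level fusion: normalize each feature vector to a comparable
    scale, then stack the three into one composite vector (Algorithm 5)."""
    parts = [normalize(f.reshape(1, -1)).ravel()   # L2-normalize each feature
             for f in (global_feat, fv_feat, bovw_feat)]
    return np.concatenate(parts)

# X_train = np.stack([fuse_features(g, f, b) for g, f, b in train_triples])
# clf = SVC(kernel='linear', probability=True).fit(X_train, y_train)
```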

2.3 Decision-level fusion

Decision-level fusion [32] merges the results of each individual classification pipeline and combines the distinct classification results into a final decision, which can improve on the accuracy of a single classifier using one type of features. Score-level fusion [39, 45, 46] is a special case of decision-level fusion and is equivalent to its soft variant. The goal here is to use score-level fusion to combine the posterior-probability estimates provided by each individual classification pipeline. In this paper, we employ the LOGP soft fusion rule, also known as the product rule [44], in the score-level fusion scheme. The process is illustrated in Fig. 1b.

Let \({{p}_{i}}({{y}_{k}}\left | \mathbf {x} \right .)\) be the conditional class probability of the i-th classifier for class k (0 < k ≤ C), where C is the number of classes and y_k indicates the k-th class to which a sample x may belong. The LOGP rule [33] uses the individual conditional class probabilities of each classifier to estimate a global membership function \(\mathbf {P}({{y}_{k}}\left | \mathbf {x} \right .)\), which is a weighted product of these output probabilities. The final class label is given by,

$$ {{y}^{*}}=\arg \underset{k=1, ...,C}{\max}\, \mathbf{P}({{y}_{k}}\left| \mathbf{x} \right.), $$
(12)

where the global membership function is defined as,

$$ \mathbf{P}({{y}_{k}}\left| \mathbf{x} \right.) = \underset{i=1}{\overset{3}{\prod}}\,{{p}_{i}}{{({{y}_{k}}\left| \mathbf{x} \right.)}^{{{\alpha}_{i}}}}, $$
(13)

or

$$ \log \mathbf{P}({{y}_{k}}\left| \mathbf{x} \right.) = \sum\limits_{i=1}^{3}{{{\alpha}_{i}}\log {{p}_{i}}({{y}_{k}}\left| \mathbf{x} \right.)}, $$
(14)

where \(\{{{\alpha }_{i}}\}_{i=1}^{3}\) are the classifier weights, here distributed uniformly over all of the classifiers. The overall decision-level fusion procedure is summarized in Algorithm 6 and sketched in code below.

Algorithm 6 (decision-level fusion)
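
The fusion rule itself reduces to a few lines; the sketch below implements Eqs. (12)-(14) with uniform weights, where the three probability matrices are assumed to come from the three per-feature SVM pipelines.

```python
import numpy as np

def logp_fusion(probas, alphas=None):
    """LOGP soft fusion (Eqs. 12-14): probas is a list of N x C posterior
    matrices, one per classifier; alphas are the classifier weights."""
    n = len(probas)
    alphas = np.full(n, 1.0 / n) if alphas is None else np.asarray(alphas)
    eps = 1e-12                                   # guard against log(0)
    log_P = sum(a * np.log(p + eps) for a, p in zip(alphas, probas))
    return np.argmax(log_P, axis=1)               # y* = argmax_k P(y_k | x)

# probas = [clf.predict_proba(X_test_i) for clf, X_test_i in zip(clfs, feats)]
# y_pred = logp_fusion(probas)
```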

3 Experimental datasets

To evaluate the efficacy of the proposed MFL approach for ship classification, we conduct extensive experiments on optical image datasets. In the experiments, the LIBSVM library is employed for classification, since it provides a posterior-probability estimate for each type of features.

The first dataset is an overhead satellite scene dataset referred to as BCCT200-resize, detailed in [35]. It consists of small grey-scale ship images chipped out of larger electro-optical satellite images by the RAPIER Ship Detection System. The chips have been pre-processed so that they are rotated and aligned to uniform dimensions and orientation. The dataset contains 4 ship categories (i.e., barges, cargo ships, container ships, and tankers), each with 200 images of size 150 × 300 pixels. Examples of each class are shown in Fig. 6.

Fig. 6

Example images from each class of the BCCT200-resize data: a barge, b cargo ship, c container ship, and d tanker

The second dataset is the original BCCT200 dataset, in which the images are unprocessed and display ships under various orientations and resolutions; such variations make the dataset more challenging. The dataset includes the same classes: barges, container ships, cargo ships, and tankers, with 200 images of non-uniform size collected per class. Example images of each class are shown in Fig. 7.

Fig. 7

Example images from each class of the original BCCT200 data

To facilitate a fair comparison, we follow the same experimental setup as reported in [35] for the above two datasets. Five-fold cross-validation is performed, in which the dataset is randomly partitioned into five equal and disjoint subsets, each containing 40 images per ship class. For each run, a different subset is held out for testing and the remaining four are used for training. The classification accuracy is averaged over the five cross-validation evaluations, as sketched below.
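
For reference, this protocol corresponds to the following scikit-learn sketch, where X and y stand for the fused feature matrix and the class labels; the linear kernel is an illustrative choice.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accs = []
for train_idx, test_idx in skf.split(X, y):      # X: features, y: labels
    clf = SVC(kernel='linear').fit(X[train_idx], y[train_idx])
    accs.append(clf.score(X[test_idx], y[test_idx]))  # fold accuracy
print(f"mean accuracy over 5 folds: {np.mean(accs):.4f}")
```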

The third dataset used in our experiments, referred to as VAIS [48], is the world's first publicly available dataset of paired visible and infrared ship imagery. The dataset consists of 2865 images (1623 visible and 1242 IR), of which 1088 are corresponding pairs. It includes 6 coarse-grained categories: merchant ships, sailing ships, medium passenger ships, medium "other" ships, tugboats, and small boats. The area of the visible bounding boxes ranges from 644 to 6,350,890 pixels, with a mean of 181,319 pixels and a median of 13,064 pixels. Example bounding-box images are shown in Fig. 8.

Fig. 8

Five visible image samples from each of the main categories of the VAIS data

The dataset is partitioned into "official" train and test splits: all images of each named ship were greedily assigned to one partition or the other, resulting in 539 image pairs and 334 singletons for training, and 549 image pairs and 358 singletons for testing. In our experiments, we only use the visible ship imagery. Following the same image pre-processing as the deep convolutional neural network (CNN) algorithm in [48], we resize each crop to a uniform size using bicubic interpolation. Note that the original visible images in the VAIS dataset are color images; they are converted from the RGB color space to the YCbCr color space, and the Y component (luminance) is used for ship classification.
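
Extracting the Y component amounts to the standard ITU-R BT.601 luma transform, as in this small sketch:

```python
import numpy as np

def luminance(rgb):
    """Y component of YCbCr (ITU-R BT.601 luma) from an H x W x 3 RGB image."""
    rgb = rgb.astype(float)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```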

4 Experimental results and analysis

4.1 Parameters setting

First of all, we investigate the optimal parameters for each type of features. For the first one (i.e., the Gabor-based MS-CLBP), following [30], the orientations 𝜃 of the Gabor filters are set to \(\left [ 0, \frac {\pi }{8}, \frac {\pi }{4}, \frac {3\pi }{8}, \frac {\pi }{2}, \frac {5\pi }{8}, \frac {3\pi }{4}, \frac {7\pi }{8} \right ]\) and the spatial frequency bandwidth is set to 5 for all experimental datasets. The result of Gabor filtering of a sample image is shown in Fig. 9. We then estimate the optimal parameter pair (m, r) for the CLBP operator: in this experiment, we vary the CLBP parameters while fixing all others. The classification results are listed in Tables 1, 2 and 3 for the three experimental datasets, respectively.

Fig. 9

Example of Gabor filtering: a input image, b–i the 8 Gabor filtered images corresponding to 8 different orientations

Table 1 Classification accuracy (%) of the first feature with different parameters (m, r) for the BCCT200-resize dataset
Table 2 Classification accuracy (%) of the first feature with different parameters (m, r) for the original BCCT200 dataset

The number of neighbors (i.e., m) and the radius (i.e., r) directly impact the dimensionality of the CLBP histogram features; for example, a larger m increases the feature dimensionality and the computational complexity. Based on the results in Tables 1, 2 and 3, we choose \(\left (m, r \right )=\left (10, 6 \right )\) for the BCCT200-resize dataset, \(\left (m, r \right )=\left (12, 7 \right )\) for the original BCCT200 dataset, and \(\left (m, r \right )=\left (10, 5 \right )\) for the VAIS dataset, in terms of both classification accuracy and computational complexity. With these settings, the dimensionality of the CLBP features (CLBP_S and CLBP_M histograms concatenated) for both the BCCT200-resize dataset and the VAIS dataset is 216.

Table 3 Classification accuracy (%) of the first feature with different parameters (m, r) for the VAIS dataset

For the second type of features (i.e., patch-based MS-CLBP and FV), since this feature is extracted from dense local patches, the patch size (B × B) is studied. Varying the patch size B results in a varying number of local patches, so the number of Gaussians in the GMM (i.e., K) is studied simultaneously. In the experiment, B is chosen from a reasonable range, and Fig. 10 illustrates the classification results for varying patch sizes B and different numbers of Gaussians K for the three experimental datasets. The optimal parameters B and K are determined from the tuning results in Fig. 10: the optimal patch size B is 28 and the number of Gaussians K is 20 for the BCCT200-resize dataset; for the original BCCT200 dataset, the optimal patch size B is 22 and the number of Gaussians K is 25; for the VAIS dataset, the optimal patch size B and the number of Gaussians K are both set to 20. For the MS-CLBP parameters, the scale is set to \(1/\left [ 1:3 \right ]\) and the parameter pair is empirically chosen as (m, r) = (6, 3).

Fig. 10

Parameter tuning of B and K for the second feature using the three experimental datasets: a the BCCT200-resize, b the original BCCT200, c the VAIS

For the third type of features (i.e., the BOVW and SPM), the block size (i.e., ϖ × ϖ) is studied for the dense SIFT descriptors. The experimental results with varying block sizes for the three experimental datasets are illustrated in Fig. 11. The optimal block size ϖ is 24 for both the BCCT200-resize dataset and the original BCCT200 dataset, while ϖ = 26 achieves the best classification performance for the VAIS dataset. For k-means clustering, blocks are randomly selected and the codebook size is set to 1024.

Fig. 11

Classification accuracy (%) versus varying block sizes for the three experimental datasets

4.2 Classification results and analysis

The proposed MFL strategy is compared with some state-of-the-art methods under the same experimental setup to verify its effectiveness. For the BCCT200-resize and original BCCT200 datasets, five-fold cross-validation is performed: the dataset is randomly partitioned into five equal and disjoint subsets; for each run a different subset is held out for testing and the remaining four (i.e., 80% of the images from each class) are used for training, and the classification accuracy is averaged over the five evaluations. For the VAIS dataset, the "official" train and test splits are used. Specifically, the proposed method is compared with the feature fusion method that combines the Gabor feature and the MS-CLBP feature; the improvement of the proposed method over Gabor + MS-CLBP and the other existing methods verifies that our multiple feature learning method is more effective. The comparison results for the BCCT200-resize, original BCCT200 and VAIS datasets are shown in Tables 4, 5 and 6, respectively. The proposed method achieves superior classification performance over the other existing methods, which demonstrates the effectiveness of the proposed MFL for vessel classification. In particular, the MFL gains about 10% higher overall accuracy than the deep learning method on the BCCT200-resize dataset, and about 4% higher overall accuracy than the Convolutional Neural Network (CNN) on the VAIS dataset. Therefore, the proposed approach, which combines multiple hand-crafted features, has clear advantages. Moreover, besides the typical SVM, we also employ another classifier (i.e., the extreme learning machine, ELM [21]) under the MFL framework. Tables 4, 5 and 6 show that the feature-level strategy is preferable for the VAIS dataset while the decision-level strategy is preferable for the BCCT200-resize and BCCT200 datasets. For different ship image datasets, the decision-level and feature-level strategies have their own advantages and disadvantages, and it is hard to determine which one is always optimal.

Table 4 Comparison of classification accuracy (%) with some state-of-the-art methods for the BCCT200-resize dataset
Table 5 Comparison of classification accuracy (%) with some state-of-the-art methods for the original BCCT200 dataset
Table 6 Comparison of classification accuracy (%) with some state-of-the-art methods for the VAIS dataset

In addition, to demonstrate the enhanced discriminative power of the multiple features fusion strategy, we compare the classification performance of the proposed multiple feature learning approach with that of methods using each individual feature in this framework with the LIBSVM classifier. The per-class accuracy of the aforementioned methods is reported in Tables 7, 8 and 9. It is apparent that the proposed MFL outperforms all the individual feature based approaches. For the BCCT200-resize dataset, the global feature representation method, i.e., Gabor + MS-CLBP, achieves the highest accuracy for the cargo category. The first local feature representation method, i.e., MS-CLBP + FV, achieves the maximum accuracy for the container category in the original BCCT200 dataset. For the VAIS dataset, the second local feature representation method, i.e., BOVW + SPM, achieves the maximum accuracy for the medium passenger category. Nevertheless, the proposed MFL obtains superior accuracy for the other classes and also the highest overall accuracy on all three experimental datasets.

Table 7 Classification accuracy (%) per class for the BCCT200-resize dataset
Table 8 Classification accuracy (%) per class for the original BCCT200 dataset
Table 9 Classification accuracy (%) per class for the VAIS dataset

The confusion matrix of the proposed MFL with the feature-level fusion strategy for the BCCT200-resize dataset is listed in Table 10, and that with the decision-level fusion strategy is listed in Table 11; both are averaged over the five cross-validation evaluations. The major confusion occurs between class 2 (i.e., cargo) and class 3 (i.e., container), since some cargo images are similar to container images. Tables 12 and 13 show the confusion matrices of feature-level fusion and decision-level fusion for the original BCCT200 dataset, respectively; here the major confusion occurs among class 3 (i.e., container), class 4 (i.e., tanker), and class 2 (i.e., cargo). Tables 14 and 15 display the confusion matrices of feature-level fusion and decision-level fusion for the VAIS dataset, respectively. The major confusion occurs among class 2 (i.e., sailing), class 1 (i.e., merchant), class 3 (i.e., medium passenger), and class 5 (i.e., tug), or between class 6 (i.e., small) and class 3 (i.e., medium passenger). The sailing category contains small ships with sails up, small ships with sails down, and large ships with sails down, and some ships with sails down resemble merchant ships. The small-boat category includes speedboats, jet-skis, and smaller and larger pleasure boats, so some small boats and medium passenger ships are highly similar.

Table 10 Confusion matrix of the proposed MFL (feature-level fusion) for the BCCT200-resize dataset
Table 11 Confusion matrix of the proposed MFL (decision-level fusion) for the BCCT200-resize dataset
Table 12 Confusion matrix of the proposed MFL (feature-level fusion) for the original BCCT200 dataset
Table 13 Confusion matrix of the proposed MFL (decision-level fusion) for the original BCCT200 dataset
Table 14 Confusion matrix of the proposed MFL (feature-level fusion) for the VAIS dataset
Table 15 Confusion matrix of the proposed MFL (decision-level fusion) for the VAIS dataset

To evaluate the effectiveness of the decision-level fusion strategy (i.e., LOGP) of the proposed framework, we compare it with the popular hard fusion rule of majority voting [21]. Table 16 compares the classification results of the two fusion schemes (i.e., LOGP and MV) on the experimental datasets. As is evident from the results, the soft fusion strategy provides slightly better performance than the hard MV rule, because fusion at the posterior-probability level provides more flexibility than hard vote counting, especially in multi-class classification tasks.

Table 16 Classification accuracy (%) of the proposed MFL using different fusion rules

5 Conclusion

In this paper, we proposed a multiple features learning (MFL) classification framework for ship classification in optical remote sensing imagery. The framework combines multiple features: the Gabor-based MS-CLBP, patch-based MS-CLBP with the Fisher vector, and the combination of BOVW and SPM. These global and local features are complementary, and their combination provides a powerful and comprehensive representation of ship images. The proposed method proved more discriminative than any of the individual feature based approaches. Experimental results on the optical vessel datasets verified that the proposed decision-level soft fusion classification method consistently achieves superior classification performance over other state-of-the-art algorithms.

Given the recent tremendous success of deep learning techniques, especially CNNs, in image classification, a combination of deep learning features and hand-crafted features for ship classification will be investigated in our future work.