
1 Introduction

Oceans hold 97% of the earth’s water, produce more than half of the oxygen we breathe and absorb a large share of the carbon in our environment. Maintaining these and other oceanic ecosystem services requires maintenance of critical marine habitats. Important among these are seagrass meadows and coral reefs, which are central to marine food webs, habitat provision and nutrient cycling [29]. These habitats are under pressure: dredging, for example, physically removes benthic marine species such as seagrasses, burying them and reducing the light necessary for photosynthesis [3]. Tourism, shipping, urbanization and other human interventions are damaging coral colonies, with 19% of the world’s coral reefs destroyed by 2011 and a further 75% threatened [4]. Monitoring is an important aspect of any robust effort to manage these destructive impacts, but it can be an arduous task. Marine optical imaging technology offers enormous potential to make monitoring more efficient in terms of both cost and time.

Many marine management strategies incorporate remote sensing and tracking of marine habitats and species. In recent years, the use of digital cameras, autonomous underwater vehicles (AUVs) and unmanned underwater vehicles (UUVs) has led to an exponential increase in the availability of underwater imagery [9]. The Integrated Marine Observing System (IMOS) collects millions of images of coral reefs around Australia, but fewer than 5% undergo expert marine analysis; for the National Oceanic and Atmospheric Administration, the rate is even lower, at only 1–2% [1]. Automated analysis of marine digital data has therefore become a research priority. Deep learning, the current state-of-the-art machine learning technology, offers potentially unprecedented opportunities for detecting and classifying underwater objects [12].

Traditional classification solutions have relied on low-level, manually designed features: Gabor filters and Local Binary Patterns (LBP) for face and texture classification, and hand-crafted descriptors such as the Scale Invariant Feature Transform (SIFT) and Histograms of Oriented Gradients (HOG) for object recognition. For a specific task and dataset, carefully engineered hand-crafted features have achieved good performance, but many cannot be reused in a new setting without substantial redesign. Moreover, conventional machine learning tools such as Support Vector Machines (SVM), Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) saturate quickly as the volume of training data increases. Hinton et al. [5] proposed learning features with deep neural networks (DNNs) to address these shortcomings. To make sense of text, images, sound and other signals, deep learning transforms input data through more layers than shallow learning algorithms [19]. At each layer, the signal is transformed by a processing unit, such as an artificial neuron, whose parameters are ‘learned’ through training [20]. Deep learning is replacing hand-crafted features with efficient algorithms for feature learning and hierarchical feature extraction [21]; it attempts to build better representations of an observation (e.g. an image) and to learn these representations from large-scale data.
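As a concrete illustration of this layered transformation (not drawn from any of the surveyed papers), the following minimal PyTorch sketch stacks a few convolutional layers whose filter parameters are learned during training; the layer and channel sizes are arbitrary placeholders.

# A minimal sketch of hierarchical feature learning: each layer applies a
# learned filter bank and a non-linearity, so features are learned rather
# than hand-designed. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level edges/colors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level textures/parts
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # higher-level object cues
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)
classifier = nn.Linear(64, 10)                    # 10 is a placeholder class count

x = torch.randn(1, 3, 64, 64)                     # a dummy RGB image
logits = classifier(features(x).flatten(1))
print(logits.shape)                               # torch.Size([1, 10])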

Given large amounts of training data, large and deep networks have demonstrated excellent performance. For example, convolutional neural networks trained on ImageNet have achieved unprecedented accuracy in image classification [6]. Deep networks have been applied to object detection [7], image classification [6], face verification [22], and digit and traffic sign recognition [23], among other tasks. However, deep learning has not yet been widely applied to marine object detection and classification.

A survey of current deep learning approaches for marine object detection and classification would help researchers understand the challenges and explore more efficient possibilities. To the best of our knowledge, this paper is the first survey of such approaches.

The rest of the paper is organized as follows. Existing approaches for automated marine object detection in digital data are discussed in Sect. 2. Associated challenges, especially for seagrass identification, are outlined in Sect. 3, and conclusions are drawn in Sect. 4.

2 Approaches for Underwater Marine Object Detection

This section discusses the known machine learning approaches, especially those using deep neural networks, for digital marine data analysis, image annotation, object detection and classification. The approaches are categorized according to the object of detection. The features and classifiers used in each approach are summarized in Table 1 and discussed in the following sections.

Table 1. Summary of deep learning approaches for marine object detection

2.1 Deep Learning in Fish Detection and Classification

Before 2015, very few attempts were made to apply deep learning to fish recognition. Ravanbakhsh et al. [13] used Haar classifiers on shape features modelled with Principal Component Analysis (PCA). To balance accuracy and processing time in underwater fish detection, Spampinato et al. [15] used a moving-average algorithm. Both methods have limited ability to process large volumes of underwater imagery. Li et al. [8] first introduced deep convolutional networks for fish detection and recognition, using the Fast Region-based Convolutional Neural Network (Fast R-CNN) to detect fish efficiently and accurately. They also constructed a clean fish dataset of 24,272 images over 12 classes, a subset of the ImageCLEF training and test data. As illustrated in Fig. 1, they pre-trained an AlexNet (five convolutional layers and three fully connected layers) on a large auxiliary dataset (ILSVRC2012) using the open-source Caffe CNN library, then modified it so that Fast R-CNN could be adopted; the Fast R-CNN parameters were trained with stochastic gradient descent (SGD). Their experiments showed better performance, with higher mean average precision (mAP): on average 9.4% higher than the Deformable Parts Model (DPM). Table 2 compares the fish detection performance of their approach with other, non-deep-learning techniques.

Fig. 1. Architecture of fish detection and recognition using Fast R-CNN (adapted from [8]).

Table 2. Fish recognition accuracy comparison (adapted from [12])
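The exact Caffe-based Fast R-CNN pipeline of Li et al. [8] is not reproduced here; as a hedged sketch of the same transfer-learning idea, the following uses torchvision’s Faster R-CNN, a closely related detector, fine-tuned for 12 fish classes plus background. All hyper-parameters are illustrative.

# Hedged sketch: fine-tune a pre-trained detector for fish detection.
# This is a modern stand-in for the Fast R-CNN + AlexNet pipeline of [8],
# not the authors' implementation. Class count: 12 fish classes + background.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=13)

# SGD fine-tuning, as in the original work (hyper-parameters illustrative).
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=5e-4)

model.train()
images = [torch.rand(3, 480, 640)]                      # dummy image
targets = [{"boxes": torch.tensor([[50., 60., 200., 180.]]),
            "labels": torch.tensor([1])}]               # dummy fish box
loss_dict = model(images, targets)
sum(loss_dict.values()).backward()
optimizer.step()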

Villon et al. [25] evaluated the effectiveness of deep learning against a ground-truth dataset produced by the Fish4Knowledge project. They also compared deep learning for fish detection with a traditional pipeline combining HOG feature extraction and Support Vector Machine (SVM) classification. Their deep network, inspired by GoogLeNet [32], had nine inception modules (27 layers in total) with a softmax classifier.
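For reference, the traditional baseline Villon et al. [25] compared against can be sketched as follows; this is a generic HOG + SVM pipeline on dummy data, not the authors’ exact configuration.

# Sketch of a HOG + SVM baseline. File paths and label loading are omitted;
# only the feature/classifier pipeline is the point here.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def hog_features(image, size=(128, 128)):
    """Resize to a fixed shape, then compute a HOG descriptor."""
    image = resize(image, size)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# X_imgs: greyscale fish/background crops; y: their labels (dummy data here).
X_imgs = [np.random.rand(100, 120) for _ in range(20)]
y = np.random.randint(0, 2, size=20)

X = np.stack([hog_features(im) for im in X_imgs])
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:3]))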

2.2 Deep Learning in Plankton Classification

Plankton frequently form the foundation of aquatic food webs and are therefore routinely monitored as indicators of ecosystem condition. Conventional plankton monitoring and measurement systems are not adequate for large-scale studies. In 2015, the National Data Science Bowl [30], a data science competition, was held to classify plankton images with the support of the Hatfield Marine Science Center of Oregon State University. The winning team, a group of researchers led by Prof. Joni Dambre from Ghent University in Belgium, used a convolutional neural network. Although it is generally thought that deep learning requires enormous datasets, the winning classification accuracy was 81.52% on a dataset of only about 30,000 examples across 121 classes, with some classes having fewer than 20 examples in total. In the winning architecture, convolutions preserved the spatial size of their input (output feature maps matched the input maps) and pooling used overlapping windows of size 3 with stride 2. Starting from a fairly shallow six-layer model and gradually adding layers, the final network had 16 layers. To give the network the ability to apply the same feature extraction pipeline to the input from different angles, a cyclic pooling technique was used: the same stack of convolutional layers was applied to several rotated copies of the input and fed into a stack of dense layers, and the feature maps were pooled together at the top. The feature maps from the different orientations were then combined into one large stack, so the next layer was learned on this combined input with four times more filters than it would otherwise see. The operation that combines feature maps from different orientations was named a ‘roll’ (Fig. 2).

Fig. 2. Roll operation with cyclic pooling (adapted from [31]).
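A hedged sketch of the cyclic slicing and ‘roll’ operations described above, following the winning team’s write-up [31], is given below; this simplified version rotates the input by multiples of 90 degrees, applies a shared convolution, re-aligns the resulting maps and stacks them along the channel dimension.

# Simplified sketch of cyclic slicing and the 'roll' operation (after [31]).
import torch

def cyclic_slice(x):
    """Stack the four 90-degree rotations along the batch dimension."""
    return torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)], dim=0)

def cyclic_roll(feats, batch):
    """Undo each rotation and concatenate the four views along channels."""
    views = feats.split(batch, dim=0)
    aligned = [torch.rot90(v, -k, dims=(2, 3)) for k, v in enumerate(views)]
    return torch.cat(aligned, dim=1)   # 4x more channels for the next layer

x = torch.randn(8, 1, 32, 32)          # a batch of plankton crops
conv = torch.nn.Conv2d(1, 16, 3, padding=1)
rolled = cyclic_roll(conv(cyclic_slice(x)), batch=8)
print(rolled.shape)                    # torch.Size([8, 64, 32, 32])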

Using the same National Data Science Bowl dataset and also inspired by GoogLeNet, Py et al. [26] published another plankton classification approach. They developed an inception module with a convolutional layer to minimize distortion and maximize the extraction of image information, and improved utilization of computing resources was a hallmark of their network architecture. Data augmentation with rotational and translational (affine) transformations was applied to make the network invariant to these distortions. They divided their deep convolutional neural network into a feature part and a classifier part, but found that this design of the classifier part is prone to overfitting when the dataset is not large enough; replacing the last two fully connected layers with small convolutional kernels worked better for such datasets. Their model outperformed state-of-the-art models for particular image sizes [26].
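A minimal inception-style module of the kind Py et al. [26] build on is sketched below; the branch widths are illustrative assumptions, not the authors’ published configuration.

# Sketch of an inception-style block: parallel branches at several
# receptive-field sizes, concatenated along the channel dimension.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 24, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),
                                nn.Conv2d(8, 12, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 12, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

block = InceptionBlock(32)
print(block(torch.randn(1, 32, 28, 28)).shape)   # torch.Size([1, 64, 28, 28])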

Lee et al. [27] applied a deep network to plankton classification using a much larger dataset: the WHOI-Plankton dataset (developed by the Woods Hole Oceanographic Institution), with 3.4 million expert-labeled images in 103 classes. Their approach focused on solving the class imbalance problem of large datasets. To reduce bias from class imbalance, they chose the CIFAR-10 CNN model as a classifier; the architecture has three convolutional layers followed by two fully connected layers. The classifier was first pre-trained on class-normalized data and then re-trained on the original data, which helped reduce class-imbalance bias [27].
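The two-stage training schedule can be sketched as follows; a weighted sampler is one common way to realize the class normalization the authors describe, and the dataset here is a random placeholder.

# Hedged sketch of two-stage training against class imbalance (after [27]):
# pre-train on class-normalized batches, then re-train on the natural
# distribution. Dataset and model are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

X = torch.randn(1000, 3, 32, 32)
y = torch.randint(0, 103, (1000,))              # 103 plankton classes
dataset = TensorDataset(X, y)

# Stage 1: sample so every class is seen roughly equally often.
counts = torch.bincount(y, minlength=103).clamp(min=1).float()
weights = (1.0 / counts)[y]
balanced = DataLoader(dataset, batch_size=64,
                      sampler=WeightedRandomSampler(weights, len(weights)))

# Stage 2: plain loader with the original (imbalanced) class frequencies.
natural = DataLoader(dataset, batch_size=64, shuffle=True)

# train(model, balanced)   # pre-train on class-normalized data
# train(model, natural)    # then re-train on the original distribution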

Dai et al. [28] introduced a deep convolutional network solely for the classification of zooplankton. Their dataset consisted of 9,460 greyscale microscopic zooplankton images in 13 classes, captured with the ZooScan system. They proposed a new deep learning architecture for zooplankton classification, ZooplanktoNet, strongly inspired by AlexNet and VGGNet. After experimenting with different convolution configurations, they concluded that an 11-layer ZooplanktoNet gave the best performance. In a comparative experiment with other deep learning architectures, including AlexNet, CaffeNet, VGGNet and GoogLeNet, ZooplanktoNet performed best, with an accuracy of 93.7% [28].
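As an illustration of an 11-weight-layer network in the AlexNet/VGG style that ZooplanktoNet draws on, consider the following sketch (eight convolutional plus three fully connected layers); the channel widths, input size and layer split are assumptions, not the published configuration.

# Illustrative 11-weight-layer CNN for greyscale zooplankton crops.
import torch
import torch.nn as nn

def block(cin, cout, n):
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return layers + [nn.MaxPool2d(2)]

net = nn.Sequential(
    *block(1, 32, 2),     # greyscale ZooScan-style input
    *block(32, 64, 2),
    *block(64, 128, 2),
    *block(128, 128, 2),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 512), nn.ReLU(inplace=True),
    nn.Linear(512, 512), nn.ReLU(inplace=True),
    nn.Linear(512, 13),   # 13 zooplankton classes
)
print(net(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 13])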

2.3 Deep Learning in Coral Classification

The color, size, shape and texture of corals vary between classes; class boundaries are ambiguous and organic; and currents, algal blooms and plankton density can change the turbidity of the water and the availability of light, affecting image color. These challenges make conventional annotation techniques, such as bounding boxes, in situ analysis along line or point transects, whole-image labels or full segmentation, inappropriate [1, 16].

Shiela et al. [14] used Local Binary Patterns (LBP) for texture and Normalized Chromaticity Coordinates (NCC) for color, with a three-layer back-propagation neural network for classification. Beijbom et al. [1] first addressed automated annotation of coral reef survey images at large scale by introducing the Moorea Labeled Corals (MLC) dataset; they proposed a method based on color and texture descriptors over multiple scales that outperformed traditional methods for texture classification. Elawady et al. [24] used supervised Convolutional Neural Networks (CNNs) for coral classification, working on the Moorea Labeled Corals and Heriot-Watt University’s Atlantic Deep Sea datasets. They computed Phase Congruency (PC), Zero Component Analysis (ZCA) and Weber Local Descriptor (WLD) features and, alongside spatial color channels, considered shape and texture features for the input images [24].

To make conventional point-annotated marine data compatible with the input constraints of CNNs, Mahmood et al. [10] proposed a feature extraction scheme based on Spatial Pyramid Pooling (SPP), as shown in Fig. 3. They used deep features extracted from VGGNet [10] for coral classification and combined them with texton- and color-based hand-crafted features to improve classification performance. The block diagram of the combined approach is illustrated in Fig. 4.

Fig. 3. Local-SPP based feature extraction scheme from VGGNet for coral classification (adapted from [10]).

Fig. 4. CNN architecture combined with texton- and color-based hand-crafted features for coral identification and classification (adapted from [10]).
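The core SPP mechanism can be sketched as follows: pooling a convolutional feature map at several grid sizes yields a fixed-length vector regardless of patch size, which is what makes variable-sized patches around point annotations compatible with a fixed-size classifier. The pyramid levels below are illustrative, not the authors’ configuration; hand-crafted color and texture vectors can then simply be concatenated to this output before classification.

# Hedged sketch of spatial pyramid pooling (SPP).
import torch
import torch.nn.functional as F

def spp(feature_map, levels=(1, 2, 4)):
    """Pool a (N, C, H, W) map at several grid sizes and concatenate."""
    n, c = feature_map.shape[:2]
    pooled = [F.adaptive_max_pool2d(feature_map, L).view(n, c * L * L)
              for L in levels]
    return torch.cat(pooled, dim=1)    # fixed length: C * (1 + 4 + 16)

# Patches of different sizes yield the same feature dimensionality:
for size in (24, 37, 60):
    fmap = torch.randn(1, 512, size, size)   # e.g. a VGGNet conv5 map
    print(spp(fmap).shape)                   # torch.Size([1, 10752]) each time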

2.4 Deep Learning Opportunities for Seagrass Detection and Classification

Seagrasses are vital for the stabilization of sediment, the sequestration of carbon and the provision of food and habitat for numerous marine animals [7]. To improve understanding of the temporal and spatial patterns in seagrass species composition, reproductive phenology and abundance, and of the influence of commercialization and human interaction, it is important to monitor seagrass over ever larger areas.

In 2013, Teng et al. [17] performed binary classification of seagrass using hyperspectral images of seagrass habitats to separate tube worms from the rest of the seagrass surface. More specific work to quantify the presence of the seagrass Posidonia oceanica in Palma Bay was performed by Campos et al. [2], who used RGB imagery with a Logistic Model Tree (LMT) classifier and Laws’ texture energy measures; a grey-level co-occurrence matrix was used to identify differences in texture. Oguslu et al. [11] used sparse coding and a morphological filter to detect propeller scars in seafloor seagrass in shallow water, using panchromatic images captured by the WorldView-2 satellite. This approach was only effective along the shallow coastline and for detecting scars near the shoreline.

Presently, in a conventional digital imagery approach approved by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and Health, Safety and Environment (HSE) policies in Australia, images covering approximately 60 × 80 cm are taken with a digital camera every three seconds. The camera is normally attached to a frame towed behind a boat travelling at 1.5–3 knots, which ensures the images are spaced approximately 2–3 m apart. These images are then analyzed using PhotoGrid or TransectMeasure (®SeaGIS) software: a regular grid of 20 dots is superimposed (Fig. 5) and a human operator identifies the presence and species of seagrasses [18]. It typically takes a technician several hours to process the image data for a single 50 m transect of 25–50 images, and as most surveys cover several hundred metres of seabed, the analysis can require several days. Furthermore, technicians vary in their ability to detect seagrass within images. Deep learning approaches may increase efficiency and simultaneously remove observer bias from these analyses. However, to the best of our knowledge, no existing approach applies deep learning to digital images for seagrass detection. There is therefore a great opportunity to use deep neural networks to analyse the seabed and to detect and classify seagrass species. We will focus on this in our future work.

Fig. 5. A screenshot of the TransectMeasure software, used to analyze seagrass [18].
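To indicate how such grid-based annotations could feed a deep classifier, the following hypothetical sketch crops a fixed-size patch around each of the 20 grid points, so each patch can be paired with the species label the analyst assigns to that point; the grid layout and patch size are assumptions for illustration.

# Hypothetical sketch: extract fixed-size patches at 20 grid points so
# point annotations can supervise a CNN patch classifier.
import numpy as np

def grid_points(h, w, rows=4, cols=5):
    """A regular 4 x 5 = 20 point grid over an image of shape (h, w)."""
    ys = np.linspace(0, h, rows + 2, dtype=int)[1:-1]
    xs = np.linspace(0, w, cols + 2, dtype=int)[1:-1]
    return [(y, x) for y in ys for x in xs]

def extract_patches(image, half=32):
    """Crop (2*half) x (2*half) patches centred on each grid point."""
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    return [padded[y:y + 2 * half, x:x + 2 * half]
            for y, x in grid_points(*image.shape[:2])]

image = np.random.rand(600, 800, 3)          # one towed-camera frame
patches = extract_patches(image)
print(len(patches), patches[0].shape)        # 20 (64, 64, 3)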

3 Challenges

Visual content recognition is the central and most challenging problem in underwater imagery analysis. Intra-class variability causes the visual content to vary with viewpoint, scale, illumination and non-rigid deformation. For the detection and classification of seagrasses in particular, the boundaries between classes are much more ambiguous than for fish or corals, and visual content in digital images becomes more ambiguous as water depth increases.

4 Conclusion

In this paper, recent approaches for detecting and classifying underwater marine objects using deep learning have been discussed. Approaches were categorized according to the targets of detection, and the features and deep learning architectures used were summarized. Bringing the approaches to marine data analysis together in a single paper makes it easier to identify possibilities for future work based on deep neural networks. We found that considerable work has been done on coral detection and classification using deep learning, but none on seagrass, which is equally vital to the oceanic ecosystem. The effectiveness, accuracy and robustness of a detection and classification algorithm can be increased significantly when color- and texture-based features are combined, and coupling hand-crafted features with neural networks may yield better results for seagrass detection and classification. The opportunity therefore exists to develop an efficient and effective deep learning approach for underwater seagrass imagery, which will be the focus of our future work.