1 Introduction

In this paper we describe a system for the automated detection and video identification of coral growths using a marine robot. Our objective is to develop a fully autonomous system that can swim over coral reefs in open water, collect video data of live coral formations, and estimate coral abundance. The video is intended for examination by human specialists, but to perform its mission the system must be able both to remain resident over the reef surface and to recognize coral as it is encountered.

Coral reefs are delicate marine environments of immense ecological and socio-economic importance, and yet they are under substantial threat almost everywhere they occur. One preliminary step toward preserving these environments is to objectively record their presence, their change over time, and their health. Such records are critical not only to any remediation effort, but also for presenting a compelling case to lawmakers and law enforcement officials regarding the preservation of these ecosystems. While human divers are commonly deployed to observe reefs and measure their health, the requisite measurements must be performed using scuba gear under conditions that present a risk to the divers involved.

In the work reported here, we use a small, portable, highly maneuverable underwater vehicle that can swim over the surface of a coral reef, hover in place, navigate in confined spaces, and collect video data from multiple cameras operating simultaneously. In our current experimental configuration the vehicle is accompanied by a human supervisor, but our approach and target scenario do not require a human operator to be present while data is being collected. The vehicle is well suited to reef surveillance: it can be deployed manually by a single user either from shore or in the water, does not require an associated tender (ship), can maneuver even in very shallow water, and can land on a set of legs on sand or a reef surface with limited physical contact. Our approach to covering coral reefs requires the vehicle to be initialized over or near a reef. It can subsequently circumnavigate the reef and cover its interior using inertial navigation. In prior work we have also employed GPS data, acquired by allowing the vehicle to surface, to assist in the navigation task, but in this work navigation is accomplished while remaining underwater, at the expense of global localization. This paper does not focus on coverage and navigation, but rather on the system architecture, the nature of the data we collect, and our ability to detect and recognize living coral using this vehicle.

In this paper, we propose and evaluate two critical components of the visual processing pipeline used for both the guidance and the data collection of our vehicle. These operations are the classification of observed images as either containing live coral or not, and the subsequent segmentation of the live coral within the image. Several structured data sets used in our evaluation are described below and are available to the community.

2 Background

As coral health is an issue of worldwide importance, its monitoring has been studied by many authors, in both biology and intelligent systems. This section describes several of the most relevant contributions.

2.1 Coral Reef Biology and Reef Health

Coral reefs are majestic structures crucial to ecosystem functioning. They are home to roughly 25 % of the oceans’ inhabitants, and act as a nursery, feeding ground, and shelter for thousands of marine organisms [1]. To humans, they represent approximately US$30 billion annually in goods and services, and are the focus of many studies searching for novel biochemically active drug compounds [2]. Optimistic reports estimate that at the current rate, by 2050 some 75 % of the world’s remaining reefs will be critically threatened [3]; more pessimistic estimates predict that all of Earth’s coral reefs will be dead by the end of the century [4].

Some of the major driving forces behind coral decline worldwide include increasing water temperatures, ocean acidification, increase in frequency and intensity of coral diseases, and damage due to natural disasters such as hurricanes. Many anthropogenic activities are also causing direct harm to reefs, including the overfishing of essential herbivorous species of fish, increasing amounts of water pollution from terrestrial runoff, and increasing sedimentation from coastal construction [3]. Arrival of invasive species can further exacerbate the situation and lead to a dramatic decrease in reef diversity and health, such as the invasion of Indo-Pacific lionfish in the Caribbean Sea and of the crown-of-thorns seastar in Australia [5].

While little can be done on a regional scale about issues such as global warming and rising ocean temperatures, there is an increasing focus on local management and conservation of coral reefs [6]. One critical component of any successful conservation effort is the ability to assess whether a particular conservation strategy results in beneficial outcomes for the system in question. In order to protect what remains of the world’s coral reefs, it is essential that we design accurate and precise methods to assess reef health without undue risk to human participants. This will not only allow us to see when conservation efforts work, but will also help determine which reefs should be conservation priorities and provide evidence to policy makers and the general public that conservation efforts are necessary to preserve the well-being of coral reef ecosystems [7].

2.2 Robotic Reef Surveys

Several research groups have considered the use of autonomous underwater vehicles (AUVs) for data collection in marine environments, and even in coral reefs. Reefs are challenging environments since they are both valuable and physically delicate, and they have complex morphologies. A few vehicles have been developed that can make close approaches to the ocean floor, corals, or aquatic structures [8, 9]. This can be challenging due to several factors: (a) the propulsion systems may be unsafe to operate close to sensitive underwater environments; (b) otherwise “gentle” devices such as gliders have limited maneuverability; (c) it is difficult for humans to produce pre-planned trajectories since sensor feedback underwater is often poor, communications are difficult, and terrain models are rarely complete; (d) many propulsion systems are prone to disturbing bottom sediments, which reduces visibility.

The problem of designing and controlling stable AUVs has been studied by several authors [10, 11] on a variety of platforms. In prior work with the Aqua class of vehicles developed in our lab, we have demonstrated a combination of small size, low weight, and high maneuverability with diverse gaits [12, 13].

Several authors have also considered using towed or autonomous surface vehicles to perform visual data collection over marine environments [14], although in the context of coral reefs such an approach is feasible only for the shallowest reef structures and depends critically on very good visibility. Deep water AUVs have been used to map the ocean floor, inspect underwater structures, and measure species diversity [15].

Australia’s Integrated Marine Observation System (IMOS) is carrying out a project to deliver precisely navigated time series of seabed imagery and other variables at selected stations on Australia’s continental shelf [16]. They are using AUVs to make this endeavor scalable and cost-efficient.

In [17], the authors present a structure from motion framework aided by the navigation sensors for building 3D reconstructions of the ocean floor and demonstrate it on an AUV surveying over a coral reef. Their approach assumes the use of a calibrated camera and some drifting pose information (compass, depth sensor, DVL). They use the SeaBED AUV, an imaging platform designed for high resolution optical and acoustic sensing [18].

In previous work [19] we have developed a controller to allow our vehicle to autonomously move about over coral reef structures using visual feedback. In this paper we restrict our attention to the analysis of the data collected by such a system, and consider the sensing issues that arise.

2.3 Visual Coral Categorization

Our methodology has been inspired by the recent successes of biologically relevant visual data sets. For example, the Fish Task of the recent LifeCLEF contest [20] supported progress on detecting moving fish in video and on fish species identification through the release of nearly 20,000 carefully annotated images. The identification of coral using visually equipped AUVs has been studied previously [21]. While we share similar motivations with this work, we differ in deployment and algorithmic objectives. Nonetheless, the relationship is a motivation for the public release of our training and test images, which could facilitate comparisons. Additionally, Girdhar et al. [22] have demonstrated a system which modifies swimming behavior on-line to follow novel visual content.

3 The MRL Coral Identification Challenge

The first contribution of this paper is a robot-collected data set of visual images from environments proximal to a number of coral reefs. This data was collected by the Aqua swimming robot during a series of field deployments in the Caribbean, where the robot’s existing navigation technologies were exercised to cover each reef and its surroundings. Although our robot did not use vision to inform its navigation strategies during these trials, the images that it collected are representative of the challenge that faces a coral-seeking robot. Therefore, we have organized and annotated them to form two visual challenge tasks: live coral image classification and live coral segmentation. The remainder of this section describes the components of this effort.

3.1 Robotic Data Collection

As mentioned previously, robots require specialized hardware and capabilities in order to operate safely near coral formations. We utilized the Aqua robot [23], an amphibious hexapod that swims using the oscillations of its flippers. Aqua has been designed for use as a visual inspection device and is equipped with four cameras with a variety of properties: a forward-facing stereo pair with a narrow field of view (which allows recovery of depth), a front fish-eye camera (which captures a wider scene), and finally a \(45^{\circ }\) mirror (which allows the fourth camera to capture the ocean floor directly below the robot).

In order to achieve broad coverage of the underwater environment, our robot executed a coverage pattern repeatedly over the reef. We set the parameters of this motion by hand so that the robot would pass completely over the reef as well as an equal portion of the sandy surroundings. This gives our data set a roughly equal split between the coral images we target and less desirable content, which poses an interesting classification problem for the visual processing component.

Two attitude strategies were employed, each targeted to induce ideal viewpoints for a different sub-set of Aqua’s cameras. First, a flat-swimming maneuver controlled the robot to be aligned with gravity in both the roll and pitch rotational axes. With this attitude, the downward looking camera views the ocean bottom with an orthogonal viewpoint and the front fish-eye camera views the horizon at roughly half the image height. Second, we considered swimming with a downwards pitch of \(30^\circ \). This strategy allowed the narrow-view stereo pair to view the ocean bottom slightly in front of the robot. The depths observed at this angle would allow fixed-altitude operations, which are desirable in order to prevent accidental collisions with the coral.

The robot executed five data collection runs at four distinct reef locations (one reef was visited twice). We selected reefs within the Folkstone Marine Preserve and in Heron Bay, both of which are located on the western coast of St. James, Barbados. During each run, the robot covered an area of approximately 100 m\(^2\). Each reef location was an instance of the spur-and-groove coral formations that tend to present the widest range of diversity of coral species, and are thus ideal regions for collection of biologically relevant data.

Data Statistics

All of the videos are taken at 15 frames per second, with VGA resolution. The total size of visual data collected over the five collection runs is 104 gigabytes consisting of 164 min of video. Depth and IMU data are also recorded throughout.

3.2 Data Annotation

A marine biologist manually annotated the coral within a subset of the images we collected. The results of this annotation have been made available in a standardized format, and the data is being released publicly for the purposes of comparison of results and classifier training. As a variety of tasks can be considered, depending on the goals of the robot platform, we define two coral-related visual tasks and accompanying evaluation criteria. We continue by describing our annotation procedure.

Annotation for Image Classification

The first sub-task that we define is coral image classification: given an image, the system outputs whether the image contains live coral. To create training and testing data for this task, we extracted images at 5 s intervals from all of the videos taken by the downward-looking mirrored camera while the robot was swimming flat. Each image was then subdivided into four \(320\times 256\) quadrants to limit per-image diversity and ease labeling. The biologist assigned each of 3704 images to one of three categories:

  • Yes: There is live coral in the image

  • No: There is no live coral in the image

  • Reject: The image should be discarded because it is too difficult to tell whether there is live coral or not. This could be because the image is too blurry or the coral is too small to see clearly.

This provided us with 1087 Yes images, 2336 No images, and 281 Reject images. Figure 1 shows some examples of Yes and No images.

Fig. 1 Annotated images used for training a detector for the image classification task. The left two images are labeled as containing coral and the right two images are labeled as not containing coral

Annotation for Segmentation

Second, we define the coral segmentation task, in which the coral regions within an image must be identified through the creation of a coral mask. While some existing segmentation data sets contain pixel-wise ground truth, we lacked the resources to produce such detailed data. Instead, we manually annotated rectangular coral regions for each of the 1087 Yes images from our classification data set. Examples of the selected image regions are shown in Fig. 2. Rectangular regions introduce a small approximation error at region boundaries, but the task remains a reliable proxy for coral segmentation, as will be demonstrated in our results section.

Fig. 2 Positive training images cropped to contain only coral, which is useful for training a detector for the coral segmentation task

Annotation Statistics

The final annotated data set produced by our labeler was reduced in size from the raw robot footage due to the rejection of poor quality and ambiguous images. We separated the annotated data into a training set (416 positive examples and 701 negative examples) and a test set (492 positive examples and 1544 negative examples). The training set contains images from three data collection runs at three unique reefs, and the test set contains images taken from two data collection runs at the fourth reef location. Thus, there is no overlap between the training and test sets.

We have additionally defined evaluation protocols for the use of this data, following best-practices from existing challenges such as the ILSVRC [24]. Broadly, we measure performance on each binary categorization task as prediction accuracy, normalized by the data set size. For the categorization task this represents the number of images, and for segmentation this is measured in image area. Methods cannot be optimized directly on the test data set. Rather, parameters should be refined by splitting the training set into folds and then reporting the performance after a single run on the test set. This data is being released to the public alongside this paper and we will maintain a record of the best performing techniques over time as other authors attempt the task. We now continue by describing several baseline techniques that we have developed.
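As a concrete illustration of this protocol, the sketch below tunes parameters using cross-validation folds drawn only from the training set and then reports a single accuracy figure on the held-out test set. The helper names, the tuned parameter, and the use of scikit-learn are assumptions for illustration; only the protocol itself follows the text.

```python
# Minimal sketch of the evaluation protocol (hypothetical helper names; the
# choice of scikit-learn and of the tuned parameter are illustrative only).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate(train_X, train_y, test_X, test_y, candidate_C=(1.0, 10.0, 100.0)):
    # Parameter refinement: cross-validation folds drawn from the training set only.
    cv_scores = {C: cross_val_score(SVC(C=C), train_X, train_y, cv=5).mean()
                 for C in candidate_C}
    best_C = max(cv_scores, key=cv_scores.get)

    # A single run on the test set with the selected parameters.
    model = SVC(C=best_C).fit(train_X, train_y)
    predictions = model.predict(test_X)

    # Accuracy normalized by data set size: image count for classification;
    # for segmentation the same idea is applied over image area.
    accuracy = np.mean(predictions == test_y)
    return best_C, accuracy
```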

4 Method

Coral identification in the ocean shares many of the typical challenges that face terrestrial vision systems, as well as several challenges unique to this task. The lighting conditions in the shallow ocean include caustics caused by the water’s surface, inter-reflections and the absorption of low-frequency colors. This makes brightness invariance essential. The robot changes its orientation during the survey, which implies the need for orientation invariance. Small floating particles are ubiquitous in the underwater domain, causing an optical snow effect. Additionally, the appearance of the coral itself has a wide diversity and there are local variations between reef locations, so generalization must be the focus of learned methods.

In the face of these challenges, our approach to coral identification is to encode the visual data in robust feature representations that capture canonical appearance properties of coral, such as its color and texture, and to learn coral classifiers from training data on top of these features. We develop two processing streams: one for each of the visual tasks described above. Our classification process employs Gabor functions and global processing to compute aggregate statistics. Segmentation is achieved through local computations on sub-regions of the image. Each approach is described in detail in the remainder of this section.

4.1 Global Image Statistics for Coral Classification

The classification pipeline uses both global color and aggregate texture features in a classifier subsystem to learn from labeled example images and subsequently predict whether an image contains live coral. This subsystem computes two types of attributes over the entire (global) images to produce a characteristic feature vector. These vectors are then classified using a support vector machine (SVM) trained with our manually classified data. Figure 3 (top) illustrates the classification pipeline.

Fig. 3 Image processing pipeline: (top) Gabor-based classification; (bottom) LBP-based segmentation

Our method represents texture through the well-known Gabor transform. The Gabor function [25] is a sinusoid within a Gaussian envelope and has inspired a class of image filters particularly suited to describing texture [26]. Our method automatically selects a subset of Gabor wavelets from a large family, choosing those whose frequency and spatial support parameters optimize task performance, as measured by cross-validation on the training set.

Applying the filters results in a stack of transformed images, and we extract robust energy statistics from these in order to produce a vector suitable for classification. The amplitude histogram of each Gabor filter provides a characterization of the image content, including the presence of outlier objects. Order statistics can effectively characterize such a signal [27] and are robust to much of the noise present in our task. For this reason we characterized the energy distribution of each Gabor filter with several statistics: the mean energy, the variance of the energy distribution, and the energy at a specific set of percentiles of the cumulative distribution (the 5th, 20th, 80th and 95th percentiles). In order to capture color information, we additionally extracted the same robust statistics for the distribution of hue values observed in the image.
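A minimal sketch of this feature extraction follows, using scikit-image's Gabor filters. The specific filter frequencies and orientations shown are illustrative placeholders, not the bank selected by cross-validation in our experiments.

```python
import numpy as np
from skimage.color import rgb2gray, rgb2hsv
from skimage.filters import gabor

PERCENTILES = (5, 20, 80, 95)

def robust_stats(values):
    # Mean, variance, and selected percentiles of a distribution.
    return np.concatenate(([values.mean(), values.var()],
                           np.percentile(values, PERCENTILES)))

def global_features(image,
                    frequencies=(0.1, 0.2, 0.4),            # illustrative bank
                    thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    gray = rgb2gray(image)
    parts = []
    for f in frequencies:
        for theta in thetas:
            real, imag = gabor(gray, frequency=f, theta=theta)
            energy = np.hypot(real, imag).ravel()           # response amplitude
            parts.append(robust_stats(energy))
    hue = rgb2hsv(image)[..., 0].ravel()                     # hue distribution
    parts.append(robust_stats(hue))
    return np.concatenate(parts)                             # fixed-length vector
```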

The results of both the Gabor texture filters and the color summary were concatenated into a fixed-length vector. Depending on the number of active Gabor components, this representation had between 24 and several hundred dimensions. In order to reduce computation and simplify the learning, we performed principal components analysis on these vectors to find the subspace that captures 99.99 % of the variance.

The final step in this pipeline is to predict the label of an image (live coral or not). We learn an SVM from the training images described previously and apply the resulting learned model to make coral predictions on new images.
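The dimensionality reduction and classification steps might be assembled as in the following sketch, which assumes the `global_features` helper above and scikit-learn; the SVM settings shown are library defaults, not our tuned values, and the label convention is assumed for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_coral_classifier(train_images, train_labels):
    # Build one fixed-length descriptor per training image.
    X = np.array([global_features(img) for img in train_images])
    # PCA retains the subspace capturing 99.99 % of the variance, then an SVM
    # learns to separate live-coral images from the rest.
    model = make_pipeline(PCA(n_components=0.9999), SVC())
    model.fit(X, train_labels)
    return model

# Predicting a new image (1 = live coral present, 0 = no live coral):
# label = model.predict(global_features(new_image)[None, :])[0]
```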

Fig. 4 Local binary pattern neighbor sets for \((P=4,R=1)\), \((P=8,R=1)\) and \((P=12,R=2)\)

4.2 Local Binary Pattern Based Coral Segmentation

Our coral segmentation pipeline uses local binary patterns (LBPs) [28] and color information as image descriptors, and an SVM to detect whether small patches of the image correspond to live coral or not. Unlike the Gabor filters, which are applied globally, these features and the classifier are applied to small image patches, which allows fine-grained segmentation of coral regions. Figure 3 (bottom) illustrates the segmentation pipeline.

For a given pixel in the image, its LBP is computed by comparing its gray level \(g_c\) with that of a set of P samples in its neighborhood, \(g_p\) (\(p=1,2,\ldots ,P\)). These samples are evenly spaced along a circle with radius R pixels, centered at \(g_c\) (see Fig. 4). For any sample that does not fall exactly in the center of a pixel, its gray value is estimated by interpolation. The LBP is computed according to

\(\mathrm {LBP}_{P,R} = \sum _{p=1}^{P} s(g_p - g_c)\, 2^{p-1}\)    (1)

where \(s(\cdot )\) is the indicator function, with \(s(x) = 1\) if \(x \ge 0\) and \(s(x) = 0\) otherwise.

To achieve rotational invariance, Ojala et al. [28] proposed to label the LBPs according to their number of 0/1 transitions. LBPs with up to two transitions are called uniform and are assigned a label corresponding to the number of 1’s in the pattern. LBPs with more than two transitions are called nonuniform and are all assigned the label \(P+1\). Finally, the rotation-invariant LBP image descriptor is a \(P+2\) bin histogram of these labels computed across all pixels in the image: uniform patterns are assigned to unique bins, while nonuniform patterns all share a single bin. As color is also an important feature for coral segmentation, we append an eight-bin histogram of the hue values of the pixels in the image patch to the LBP histogram.
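A minimal sketch of this patch descriptor is shown below, using scikit-image's `local_binary_pattern` with the rotation-invariant uniform mapping, which yields exactly the \(P+2\) labels described above. The bin counts follow the text; the histogram normalization is an assumption.

```python
import numpy as np
from skimage.color import rgb2gray, rgb2hsv
from skimage.feature import local_binary_pattern

def patch_descriptor(patch, P=8, R=1):
    gray = rgb2gray(patch)
    # 'uniform' produces the rotation-invariant labels 0..P+1 described above.
    lbp = local_binary_pattern(gray, P, R, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
    # Eight-bin hue histogram appended for color information.
    hue = rgb2hsv(patch)[..., 0]
    hue_hist, _ = np.histogram(hue, bins=8, range=(0.0, 1.0), density=True)
    return np.concatenate([lbp_hist, hue_hist])   # (P + 2) + 8 dimensions
```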

At run time, the learned model segments an image by splitting it into patches of the same size as those used during training. Features are extracted from each patch and scored with the SVM, producing a coral segmentation mask that can be used to guide the robot during its mission.
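The run-time procedure can be sketched as a simple non-overlapping tiling; `patch_descriptor` and the trained `svm` are assumed to come from the steps above, and the tiling stride and label convention are illustrative choices.

```python
import numpy as np

def segment_coral(image, svm, patch_size=30):
    # Score non-overlapping patches and paint accepted ones into a binary mask.
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            feature = patch_descriptor(patch)[None, :]
            if svm.predict(feature)[0] == 1:       # 1 = live coral (assumed label)
                mask[y:y + patch_size, x:x + patch_size] = True
    return mask
```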

5 Experiments and Results

5.1 Global Coral Classification

Our global classifier was tested on the data sets above, using distinct training and testing sets collected over different reefs. We were able to achieve a net classifier accuracy of 89.9 % on balanced sets of images containing coral and not containing coral. This accuracy generally increased with the number of Gabor basis functions; however, since these are the primary source of computational cost, we are interested in a compromise between performance and the number of filters used. The trade-off between accuracy and the size of the filter bank is illustrated in Fig. 5. While using a bank of 24 or more filters provides maximal performance, the 80.6 % rate achieved with just 20 filters appears quite acceptable for our applications.

Fig. 5 Classification accuracy increases with both (left) the number of Gabor filters and (right) the number of PCA components, reflecting the trade-off between computational effort and performance

Fig. 6 Classification accuracy versus patch size (pixels)

Fig. 7 Samples of live coral segmentation. a Test set reef segmentation. b Live coral segmentation, with a false negative (right). c Live coral segmentation, with a false positive (left)

5.2 LBP-based Coral Segmentation

To study the effect on segmentation of varying the number of points and radius \((P, R)\) of the LBPs and the size of the patches, we performed a grid search on these parameters. Also, to optimize the performance of the SVM, we ran a grid search on the gamma, tolerance (tol), and regularization constant (C) parameters of the radial basis function (RBF) kernel classifier.
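The SVM portion of this search might look like the following scikit-learn sketch; the candidate grids are illustrative, and only the parameter names (gamma, tol, C) and the RBF kernel come from the text. The outer loop over \((P, R)\) and patch size simply repeats the search on features re-extracted with each setting.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_rbf_svm(train_features, train_labels):
    # Candidate values are placeholders; the selected values reported below are
    # gamma = 0.0001, tol = 2 and C = 10,000,000.
    param_grid = {
        "gamma": [1e-5, 1e-4, 1e-3],
        "tol": [0.1, 1.0, 2.0],
        "C": [1e5, 1e6, 1e7],
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(train_features, train_labels)
    return search.best_params_, search.best_score_
```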

The LBP parameters had very small impact on the accuracy of the classifier. We tested over the values \((P,R) =(8,1), (16,2), (16,3), (24,3), (32,5)\) and found that the difference in accuracy between them was less than \(2.1\,\%\), regardless of the patch size. Given such a small impact, we decided to use \((P=8,R=1)\) for the remaining experiments.

The patch size, on the other hand, had a much larger impact on the classification accuracy, which is illustrated in Fig. 6. The maximum classification accuracy achieved was 81.16 % with the RBF kernel parameters set to \(\gamma = 0.0001\), \(tol = 2\) and \(C=10{,}000{,}000\). The optimal patch size was found to be 30 pixels.

In Fig. 7, we present some examples of images from the test set with an overlay (in red) showing the segmented live coral. Figure 7a is a stitched image created from several consecutive frames from the original video. We observe that the segmentation pipeline correctly finds areas of the image with live coral. We also observe areas where the classifier has problems detecting coral, such as when the texture is uniform – with an example of a false negative shown in Fig. 7b. Likewise, live coral can be incorrectly detected when variations in texture (or shadows) match that of live coral – with an example of a false positive shown in Fig. 7c.

6 Discussion

We have described a robot-vision system for performing automated coral surveys of the sea floor. We learn coral predictors that are able to robustly detect live coral patches and segment them from the background, agreeing with the assessments of an experienced coral biologist with an accuracy of 80–90 %. These results are based on a data set of thousands of labeled images of only moderate quality, confounded by the typical phenomena that confront any diver or AUV. Our data set is being made available in conjunction with this submission.

In the future, we plan to study the disambiguation of other zooxanthellae-containing organisms from coral and the automated labeling of different coral subspecies. This will require suitably labeled training data, as well as more diverse raw data sets, potentially including active illumination. Additionally, we hope to integrate coral mapping into the navigation stack of our vehicle, as we have successfully done in the past with other vision-guided navigation methods [22]. The resulting system has the potential to perform autonomous longitudinal surveys, providing biologists with an easy, quick, and accurate way of monitoring reef health. Such methods are critical for understanding how these ecosystems respond to environmental disturbances, documenting the efficacy of novel coral reef conservation and restoration efforts, and convincing policy makers to enact stringent protection measures for coral reef ecosystems.