Introduction

Coastal zones represent a small area of the Earth’s oceans but are regarded as one of the most productive and diverse environments on the planet (Gray 1997). They are often those most vulnerable to threats due to resource exploitation, habitat destruction, pollution and susceptibility to a changing climate (Jackson 2008). Habitat maps that depict the distribution of marine natural resources are becoming a prerequisite for marine spatial planning, design and implementation of monitoring programs, and management of physical and biological resources (Baker and Harris 2012). However, our knowledge of the extent, geographical range and ecological functioning of benthic habitats remains relatively poor. This poses challenges for implementing strategies to safeguard our ocean systems, and our ability to detect change in benthic habitats in a rapidly changing climate (Wernberg et al. 2016). Often we set aside areas for protection with limited knowledge of their representativeness, raising questions on the robustness of spatial planning decisions that are based on limited information (Devillers et al. 2015). This is compounded by the fact that only 5–10% of the world’s seafloor has been mapped to resolutions at an appropriate scale for marine management (Sandwell et al. 2003; Wright 2003).

Multibeam echosounders (MBES) have become the system of choice for marine habitat mapping studies due to their ability to collect co-located, full coverage bathymetry and seafloor backscatter data. Whilst bathymetric LIDAR systems mounted on aerial platforms are increasingly used in coastal zones due to the rapidity of data acquisition over large areas, this approach lacks the sounding accuracy typically achieved with MBES data acquisition (Costa et al. 2009). The ability to collect high precision MBES data on relatively small platforms has opened new opportunities for seabed mapping and benthic habitat characterisation in shallow water environments. Ultra-high resolution MBES data provide new prospects for how we characterise marine geomorphometry (Lecours et al. 2016), with the potential to provide data at a resolution sufficiently high to gain information on individual biogenic features. For example, Montereale-Gavazzi et al. (2016) found that MBES backscatter data at resolutions of 0.05 and 0.2 m allowed identification of individual sponges and mapping of their distribution in the shallow heterogeneous Venice lagoon. They also identified a clear trade-off between the accuracy of model predictions and the type of features that can be mapped at such high resolution.

The development of approaches using seafloor backscatter data acquired by MBES as a means to remotely characterise the properties of the seafloor has received increasing attention from the research community (Lucieer et al. 2017). Despite the growing use of MBES backscatter data, standards for seabed backscatter acquisition, processing and classification are still under development, which creates challenges for data comparison across platforms and processing packages. In 2015 the Backscatter Working Group (BSWG) proposed the first set of guidelines for acquisition, processing and use of backscatter data (Lurton and Lamarche 2015) and provided recommendations for further development of backscatter acquisition systems and processing software. Acoustic backscatter is typically applied for sediment class discrimination (Diesing and Stephens 2015). The interpretation of backscatter to inform seabed type is typically done using acoustic facies, defined as “the characteristics and spatial organization of seafloor patches with common acoustic responses and the measurable characteristics of this response” (Lamarche and Lurton 2017). Backscatter is combined with other environmental variables, such as those derived from bathymetry, for habitat characterisation (Ierodiaconou et al. 2011; Rattray et al. 2009).

The concept of habitat lies at the core of ecological theory. In the field of benthic habitat mapping the definition of habitat has evolved to reflect the objectives and applications of the data (Dauvin et al. 2008). Definitions of habitat from earlier studies reflect the field’s origins in marine geology and geophysics. Acoustic geophysical tools including echosounders and, later, MBES allowed the delineation of meaningful geological facies based on the acoustic response of the seabed, supported by appropriate physical samples. Benthic taxa often exhibit strong links with seafloor geology; for example, macroalgae species are generally associated with hard reef. The biological component of habitat is therefore often inferred directly under the assumption that geologically defined substrate is the primary determinant of the species and community types that develop there (Greene et al. 1999). Subsequent adoption of this technology by ecologists for mapping from a biophysical perspective has given rise to an increasingly biocentric notion of habitat (Brown et al. 2011). This reflects both the needs of natural resource management agencies for primarily biological information and the recognition that many other physical, chemical and biological determinants are central to patterns of biological distribution (McArthur et al. 2010). Kostylev et al. (2001) defined habitat as: ‘a spatially defined area where the physical, chemical, and biological environment is distinctly different from the surrounding environment’. Knowledge of the relative contribution of MBES backscatter data, compared to other variables such as those derived from bathymetry, in differentiating between habitats is important for end users when developing classification approaches.

With the increasing volumes of MBES data becoming available, there is an urgent need to develop robust methods for mapping marine habitats to establish their geographical location, extent, and condition (Brown et al. 2011). Repeatability is particularly important as habitat mapping studies are undertaken to form a baseline for assessment, which implies the ability to undertake repeat surveys to monitor change through time (Montereale-Gavazzi et al. 2017; Rattray et al. 2013). Thus, habitat mapping products need to be created using repeatable methodologies where uncertainty in model outputs is quantified. Habitat classification generally involves the integration of seafloor structure information with biological ground-truth samples or observations. Uncertainty may compromise classification outputs due to propagation of errors in the repeatability of classification by observers (Rattray et al. 2014), artefacts associated with seabed mapping acquisition and processing (Lecours et al. 2016; Schimel et al. 2015b) and spatial mismatch between observations and acoustic data (Mitchell et al. 2017).

There are two different approaches commonly used to generate marine geomorphic variables for input to benthic habitat classification. Pixel based (PB) approaches involve the use of a neighbourhood analysis (typically 3 × 3) to generate derivative products for model input (e.g. slope, roughness measures). The classification approach involves assigning each image pixel either to a cluster that is later assigned to a benthic habitat class (unsupervised), or to a predefined benthic habitat class (supervised) based on statistical signatures derived from the pixel’s digital values, often across multiple spatial derivatives (Ierodiaconou et al. 2007, 2011; Rattray et al. 2009). The object based (OB) image analysis approach involves grouping spatially contiguous pixels with similar properties into “objects” in such a way that maximises both within-object homogeneity in terms of pixel values and between-object differences (Blaschke 2010). OB image analysis approaches have been increasingly and successfully applied to marine habitat mapping over the past decade (Diesing et al. 2014; Hasan et al. 2012b, 2014; Lacharité et al. 2017; Lucieer et al. 2013). Whilst both PB and OB approaches are now well developed in the benthic habitat mapping literature, few studies have systematically compared them in terms of map accuracy or importance of input variables in explaining patterns observed (Hasan et al. 2014).

We apply an ensemble learning classification integrating PB and OB analysis for mapping benthic habitats in a high use coastal embayment in south-east Australia. We combine ultra-high resolution MBES bathymetry and backscatter data with groundtruth data provided by a combination of autonomous underwater vehicle (AUV) surveys, benthic grabs and drop video sampling. We evaluate three classification approaches—PB, OB and a hybrid approach—and investigate both classification performance and variable importance for model outputs.

Methods

Study area

The study site is Refuge Cove, a small embayment within the Wilsons Promontory National Park, in the state of Victoria, Australia (39° 02′ 17.6″ S 146° 27′ 48.4″ E, Fig. 1). Refuge Cove is located within the shallow temperate sea of Bass Strait separating mainland Australia from Tasmania. With the Pacific Ocean in the east, and the Southern Ocean to the southwest, Bass Strait marks the confluence of the warm waters of the Eastern Australian Current and the colder waters of western Bass Strait from the South Australian Current, which is likely driving the high species richness and diversity observed in monitoring programs (Edmunds et al. 2012). Prevailing westerlies drive the west to east water currents observed in Bass Strait (James and Bone 2011). During the winter, winds are predominantly from the south west, driving a south westerly swell of up to 8 m in height (mean 2 m) on the west and south coast of the promontory (Kennedy et al. 2014). The cove, being on the east coast of the promontory, is protected from most swell directions, making it a unique and popular safe anchorage for vessels transiting through the southeast coast of Australia. Refuge Cove covers approximately 0.39 km² with the seafloor extending to 22 m deep at the entrance. The south side consists of large (2–3 m) granitic boulders, whilst the northern arm is characterised by sloping granitic bedrock with occasional cracks and overhangs, gently sloping to a sediment dominated seabed. The sediment-dominated parts of the seabed support filamentous algal mats and seagrass communities, whilst hard-bottom areas support diverse algal dominated assemblages (Edmunds et al. 2013).

MBES data acquisition and processing

The MBES data were acquired on 11 June 2013 using a Kongsberg Maritime EM2040C MBES, operated with Kongsberg Maritime’s acoustic data acquisition software SIS, and integrated with an Applanix POS MV WaveMaster, all fitted to Deakin University’s 9.2 m research vessel Yolla. Lines were run so as to ensure 100% overlap of adjacent lines. The MBES was operated at a constant frequency of 300 kHz, with a varying ping rate and pulse length (respectively up to 50 Hz and down to 0.025 ms) automatically adjusting to water depth, in high-density equidistant mode (400 soundings per ping) and with a constant sector coverage of ± 65° athwartships. Sound velocity in the water column was obtained from a profile captured at the start of the survey with a Valeport Monitor Sound Velocity Profiler, while sound velocity at the depth of the transducer was measured continuously during the survey with a Valeport mini SVS sensor. This information was combined by SIS to correct soundings in real time for variation of sound velocity in the water column. The POS MV WaveMaster measured the position of the vessel in Differential GNSS mode using GPS/GLONASS corrections received by radio from the Fugro MarineStar satellite positioning service. The POS MV WaveMaster also measured precise vessel motion data (roll, pitch, yaw, true heave), which were recorded and set aside for post-processing.

Bathymetry data processing was carried out using Applanix software POSPac Mobile Mapping Suite (MMS) and CARIS software HIPS & SIPS 8.1A. POSPac MMS was used to obtain a post-processed kinematic (PPK) solution of the vessel navigation (with a horizontal resolution greater than ± 0.2 m), motion and GPS modelled tides. This solution was imported into HIPS & SIPS to replace the real-time navigation data. Soundings were then manually cleaned, vertically referenced to the Lowest Astronomical Tide datum and gridded into a DEM using the CUBE algorithm at a resolution of 0.25 m (Fig. 1).

Fig. 1 Location of Refuge Cove within the Wilsons Promontory National Park, Victoria, Australia (top panels) and the site bathymetry in June 2013 (bottom panel)

Backscatter data processing was carried out in QPS software Fledermaus geocoder toolbox (FMGT 7.4.1). The backscatter “beam time series” data type was used as a data source, and all beams were kept (setting starting and cutoff beam angles as 0 and 90 degrees, respectively, in the “Adjust” settings panel). FMGT provides much freedom in setting the parameters of the processing but little explanation as to the algorithms being applied and the sequence in which they are implemented (Schimel et al. 2015a). Given this lack of information and as recommended by Lurton and Lamarche (2015) our backscatter data processing procedure favours consistency in parameters in order to ensure consistency of backscatter mosaics between surveys, to the potential detriment of the subjective quality of each individual mosaic. Thus for this dataset as for others, the processing parameters were kept as close to the FMGT defaults as possible. The “Pipeline” settings were all kept as default. The “Navigation” settings were kept to the default “Use adjacent lines within time window of 5” without any other setting enabled. FMGT operates the same standard geometric and radiometric corrections as prior iterations of Geocoder as described in the literature (Fonseca and Calder 2005), which includes a compensation for the built-in TVG as described by Hammerstad (2000) in the case of Kongsberg Maritime data. The “Adjust” settings were kept to default enabling of “Tx/Rx Power Gain Correction” (taken from runtime parameters datagrams) and default “Beam Pattern Correction”. The “Absorption” setting in the “Oceanography” panel was kept at its default value of 0 dB/km with absorption in the water-column suitably compensated using absorption coefficients in the raw data files. The “Sonar Default” settings were set to “automatic”, which means the software extracted the parameters necessary for radiometric correction (transmit power, frequency, pulse length, etc.) directly from the raw data files. 
After geometric and radiometric corrections, FMGT implements a standard “sliding window” method to correct for angular dependence, termed “AVG”. The settings we used for this correction were the “trend” algorithm—which considers the two sides of the swath separately—and a “window size” of “300” (the number of pings surrounding the data to be corrected). FMGT uses a reference angle for normalization as the average level between 20 and 60 degrees (Fonseca et al. 2009). Finally, the data for individual lines after corrections and AVG were all mosaicked together at a resolution of 0.25 m using a “Blend Mosaicking Style” algorithm with a parameter of 50%, a “dB Mean Filter Type”, and requesting to “Fill gaps using adjacency” (Fig. 2).

Fig. 2 MBES backscatter mosaic of Refuge Cove (resolution 0.25 m)

MBES derivatives

Pixel based derivatives

To further characterise local variation within the MBES data and delineate analogous regions of morphology, a suite of spatial derivatives was produced from the primary bathymetry digital terrain model (Table 1). These derivatives were selected for their expected influence over the distribution of biological assemblages in terms of exposure to wave energy and benthic currents (northness, eastness), susceptibility to sediment accumulation (slope), and complexity and surface area of reef structure (complexity, rugosity, maximum curvature). Derivative layers were selected based on their ability to produce accurate benthic habitat maps in previous studies in adjacent coastal waters (Ierodiaconou et al. 2007, 2011; Rattray et al. 2009, 2013; Young et al. 2015). For all analyses, a moving window with a kernel size of 3 × 3 pixels (0.75 m × 0.75 m) was used.
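As an illustration of how such neighbourhood derivatives can be computed, the following minimal Python sketch derives slope, northness and eastness from a 3 × 3 moving window. It assumes Horn-style weighted finite differences and a grid whose first row is the northern edge; the function name `terrain_derivatives` and the small test surface are hypothetical, and the sketch is not the software used to produce the layers in Table 1.

```python
import math

def terrain_derivatives(dem, cell_size=0.25):
    """Slope (degrees), northness and eastness for the interior cells of a grid.

    dem is a list of rows; row 0 is the northern edge and columns run west to
    east (an assumption of this sketch). Edge cells are left as None.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    northness = [[None] * cols for _ in range(rows)]
    eastness = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            nw, n, ne = dem[r - 1][c - 1], dem[r - 1][c], dem[r - 1][c + 1]
            w, e = dem[r][c - 1], dem[r][c + 1]
            sw, s, se = dem[r + 1][c - 1], dem[r + 1][c], dem[r + 1][c + 1]
            # Horn-style weighted differences across the 3 x 3 window.
            dz_east = ((ne + 2 * e + se) - (nw + 2 * w + sw)) / (8 * cell_size)
            dz_north = ((nw + 2 * n + ne) - (sw + 2 * s + se)) / (8 * cell_size)
            gradient = math.hypot(dz_east, dz_north)
            slope[r][c] = math.degrees(math.atan(gradient))
            if gradient == 0:
                northness[r][c] = eastness[r][c] = 0.0  # flat cell: no aspect
            else:
                # Aspect is the downslope direction, opposite to the gradient.
                northness[r][c] = -dz_north / gradient
                eastness[r][c] = -dz_east / gradient
    return slope, northness, eastness

# Hypothetical 3 x 3 surface rising 0.25 m per 0.25 m cell towards the east.
dem = [[0.0, 0.25, 0.5] for _ in range(3)]
slope, northness, eastness = terrain_derivatives(dem)
```

For this test surface the central cell has a 45° slope facing west, so eastness is -1 and northness is 0. Northness and eastness are the cosine and sine of aspect, which avoids the circular discontinuity of aspect expressed in degrees.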

Table 1 Spatial derivatives from MBES bathymetry

Object based derivatives

OB segmentation was carried out using the multi-resolution segmentation algorithm in software eCognition v9.0, which uses an optimisation procedure that locally minimises the average heterogeneity of image objects for a given resolution. Starting from an individual pixel (or existing image object), it consecutively merges pixels (or image objects) until a certain threshold, defined by the scale parameter, is reached. The scale parameter is an abstract term that determines the maximum allowable heterogeneity for the resulting image objects (see Appendix 1 Supporting material). We chose a scale parameter of 41 (mean object size 306 m²) for defining segments for classification following visual inspection of coherence in shapes and orientation of objects observed on the seafloor. The object heterogeneity, to which the scale parameter refers, is defined by the ‘composition of homogeneity’ criterion. This criterion defines the relative importance of ‘colour’ (pixel value in this case, e.g. backscatter digital number) versus shape of objects. If high weight is given to colour then the object boundaries will be predominantly determined by variations in colour of the image (e.g. backscatter strength). Furthermore, the shape criterion has contributions from smoothness and compactness, both of which can be weighted. A high value for smoothness will lead to smoother boundaries of the objects. High values of compactness will increase the overall compactness of image objects. We applied default values of 0.9 for colour, 0.1 for shape, 0.5 for smoothness and 0.5 for compactness. Segmentations were carried out on the primary acoustic products, bathymetry and backscatter. We also included rugosity as a measure of complexity due to its importance in defining benthic habitats in the region in previous studies.

Ground truth data

The ground-truth dataset for this study consisted of a combination of AUV video imaging, drop video camera and sediment samples positioned using DGPS (~ 1 m accuracy) (Fig. 3). High-definition video data were captured with a GoPro Hero 3 Black video camera mounted obliquely (45°) on the underside of an Ocean Server Inc Iver2-580-EP AUV. The AUV was pre-programmed to survey benthic transects at a height of 1.2 m above the seabed and a speed of 1.5 knots following a continuous transect divided into six pre-programmed missions. However, two missions were not completed due to entanglement in macroalgae on reef in the northern section of the cove. Transects were prioritised to target a range of habitats on sediment and reefs across depth gradients within the study site. Every second of video was matched to positional information and mission statistics recorded by the AUV micro-processor.

Fig. 3 Ground-truth data for this study. The video data used for training were classified by benthic habitat type and are shown colour coded. The coloured lines represent video data obtained by the AUV while the coloured dots represent video data obtained with the drop camera. The drop video data put aside for validation are shown as triangles

We assessed spatial autocorrelation (SA; see Appendix 2 Supporting material) using Moran’s I to inform a sampling design for the ground-truth dataset used for validation. Results from the SA analysis showed that there was significant correlation up to 50 m; therefore, we randomly generated sample localities spaced at least 50 m apart and stratified by the major acoustic facies of the site. These acoustic facies were defined by a cluster analysis run on the OB segments to group segments with similar acoustic characteristics and reduce the number of unique classifications. Some of the randomly generated sampling localities were adjusted to intersect with existing AUV tracks as long as they remained in the same cluster and maintained the minimum distance requirements.
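The statistic behind this sampling design can be sketched in a few lines of Python. The example below computes global Moran’s I with binary distance-band weights (pairs of observations within `max_dist` of each other receive a weight of 1). The weighting scheme, the function name `morans_i` and the toy transect are assumptions made for illustration; the analysis in Appendix 2 may have used a different formulation.

```python
import math

def morans_i(coords, values, max_dist):
    """Global Moran's I using binary weights for pairs within max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross_products += dev[i] * dev[j]
                weight_sum += 1.0
    if weight_sum == 0:
        raise ValueError("no pairs of observations fall within max_dist")
    return (n / weight_sum) * (cross_products / sum(d * d for d in dev))

# Hypothetical transect of six stations spaced 1 m apart.
coords = [(float(x), 0.0) for x in range(6)]
clustered = morans_i(coords, [1, 1, 1, 5, 5, 5], max_dist=1.0)    # 0.6
alternating = morans_i(coords, [1, 5, 1, 5, 1, 5], max_dist=1.0)  # -1.0
```

Values near +1 indicate that neighbouring observations are similar and values near -1 that they are dissimilar. Repeating the calculation over increasing distance bands gives a correlogram, from which a separation distance such as the 50 m used here can be read.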

A drop video camera survey was undertaken to provide: (1) an additional dataset for training in areas that were not covered by the AUV survey, and (2) an independent dataset for validation purposes at locations indicated by the sampling survey design described above. A total of 85 drops ranging from 1.9 to 22.1 m in depth were performed. Video footage was obtained using a Delta vision HD underwater video camera, and software Ashtec Mobil Mapper 10 was used to create shapefiles and log raw GPS data. Raw GPS data were later improved to a DGPS solution using base station data from VicMap’s Continuously Operating Reference Stations (CORS) system.

Benthic sediment samples were collected at 18 of the video stations using a small Van Veen grab (surface area sampled 260 cm²). Samples were first disaggregated using a 10% sodium hexametaphosphate solution, ultrasonically bathed, and sieved to remove gravel and very coarse sand (> 1.5 mm). The residual was then characterised using a Beckman Coulter LP 13320 laser particle sizer. The relationships between pixel and object based backscatter data and the mean and standard deviation of sediment grain size, as well as the proportion of gravel in each sample, were assessed using ordinary least squares regression in software R version 3.2.4.
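For readers unfamiliar with the regression step, the sketch below fits a simple ordinary least squares line and reports the coefficient of determination in Python. It is a generic illustration rather than the R code used in the study; the function name `ols_fit` and the four data points are hypothetical and carry no relation to the Refuge Cove samples.

```python
def ols_fit(x, y):
    """Return slope, intercept and R-squared of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical values: x could be mean grain size, y backscatter intensity.
grain_size = [1.0, 2.0, 3.0, 4.0]
backscatter = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = ols_fit(grain_size, backscatter)
```

Fitting the same function once to the per-pixel backscatter value and once to the object mean at each grab location allows the two R² values to be compared directly.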

After reviewing the video we interpreted 5 broad habitats that characterised the site with a description of the typical characteristics within each habitat below:

i) Macroalgae Dominated Reef (ALG): High relief, granitic bedrock and boulder reef from depths of 2–22 m was populated with diverse assemblages of brown, red and green macroalgal taxa that varied in composition and density with depth and exposure. Sessile invertebrates were evident on vertical walls and in fissures throughout the reef systems and in deeper areas were prevalent in the algal understorey.

ii) Filamentous Mat (FMAT): Fine sandy to muddy sand sediments in the sheltered and shallow southern arm of the bay were covered in extensive mats of filamentous microalgae and diatoms. This was interspersed with sparse to very sparse seagrass (Zostera sp.) shoots, often with evidence of bioturbation.

iii) No Visible Biota (NVB): Extensive areas of the bay consisting of fine to coarse sands and gravel with no visible epibiota.

iv) Seagrass (Amphibolis antarctica) (SGAM): Predominantly dense beds of the seagrass A. antarctica. These were characterised by low, woody root masses at the sediment interface which form distinctive mounded edges surrounding the beds.

v) Seagrass (Zostera sp.) (SGZ): Beds of seagrass characterised by the presence of Zostera sp. growing in fine sands and muddy sands with patchy distribution ranging from sparse to dense. Where visible through the canopy, sediments frequently showed evidence of bioturbation, with mounds and burrows in sheltered areas.

Statistical analysis

Modelling approach

In this study we implemented Random Forests models to predict class membership. Random Forests (RF) is an ensemble learning method that combines tree-type classifiers with bootstrap aggregation of multiple models based on subsets of the same training data (Breiman 2001). The approach reduces the inherent tendency of single decision tree classifiers to overfit their training sets by including the results of multiple trees, produced from random bootstrap samples of the training set (Cutler et al. 2007). An important property of RF is that the random selection of variables at each split minimises correlation between trees in the ensemble, making the ensemble less subject to potential biases associated with the training data. RF models have been successfully applied to mapping marine substrates (Diesing et al. 2014; Lucieer et al. 2013) and dominant biological communities (Che Hasan et al. 2014; Rattray et al. 2015).

RF models were implemented using the randomForest package in R (Liaw and Wiener 2002; R Development Core Team 2008) to predict habitat classes according to models trained with three different sets of predictor variables derived from MBES bathymetry and backscatter data:

  1. A pixel based (PB) model with uncorrelated predictors at 25 cm resolution

  2. An object based (OB) model trained with uncorrelated predictors which were derived using an OB segmentation approach

  3. A combined model trained with uncorrelated predictors from both PB and OB models.

Data preparation

Prior to analysis, object and pixel based derivatives were screened for outliers and normalised to a range of 0–1. Although normalisation is not required for RF classification as the approach is invariant to monotonic transformations of the input features, in this case it was used so that measures of feature importance could be displayed at an easily interpretable scale.

Although outputs of RF classifiers are considered robust to correlated predictors due to the random variable selection process, there is evidence to suggest that subsequent variable importance measures can be biased towards correlated variables (Strobl et al. 2006). In this study, we examined correlation between predictors using the Pearson product-moment correlation coefficient (Fig. 4). For each of our 3 models, we tested individual predictor performance using a recursive ‘leave one out’ procedure. Where correlation coefficients were greater than 0.6 we retained the best performing predictors for model inclusion. Uncorrelated predictors included in each model are detailed in Table 2.
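One way to implement this kind of screening is shown in the Python sketch below. Predictors are visited in order of their individual performance, and a predictor is kept only if its absolute Pearson correlation with every predictor already kept does not exceed the threshold. The greedy ordering, the use of the absolute value, the function names and the toy predictor values are all assumptions for illustration; the exact procedure used to build Table 2 is not specified beyond the 0.6 threshold.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen_predictors(columns, ranking, threshold=0.6):
    """Keep predictors in ranked order, dropping any that correlate with a kept one.

    columns maps predictor name to its values; ranking lists the names from
    best to worst individual performance (obtained separately).
    """
    kept = []
    for name in ranking:
        if all(abs(pearson_r(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictor values at five locations.
columns = {
    "bathy_pb": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bathy_ob_mean": [1.1, 2.0, 3.2, 3.9, 5.1],
    "rugosity_pb": [2.0, 1.0, 2.0, 1.0, 2.0],
}
kept = screen_predictors(columns, ["bathy_ob_mean", "bathy_pb", "rugosity_pb"])
```

Here `bathy_pb` is dropped because it is almost perfectly correlated with the better ranked `bathy_ob_mean`, while `rugosity_pb` is retained.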

Fig. 4 Correlation matrix containing all MBES predictors used in the study. Size and colour of the circles indicate the degree and direction (red-negative, blue-positive) of the relationship respectively. Suffixes denote per-pixel (PP) and object-based (OB) predictors

Table 2 Uncorrelated predictor variables retained for inclusion in each of the 3 models

Model training

The RF classifier was implemented for each of the three sets of predictors using the same training dataset. Prior to all analyses, the (pseudo) random number generator in R was seeded with an arbitrary value of 42 to ensure reproducibility of results. Tuning parameters mtry (the number of predictors randomly selected for each split) and ntree (the number of trees contained in each model) were evaluated using the caret package in R (Kuhn 2008). Each of the three models was repeatedly run across a range of values of mtry and ntree, and model performance was assessed at each iteration using k-fold cross-validation. Model tuning resulted in the use of 300 trees in each of the models, and mtry values of 3, 5, and 6 for the PB, OB and combined models respectively. Per-class variable importance in each of the 3 models was determined using the permutation importance measure, and partial dependence plots for the three highest ranked variables in the best performing model were created using the randomForest package in R (Liaw and Wiener 2002).

Model evaluation

Model accuracy was determined by comparing predicted classifications for each of the models against an independent test set of classified observations withheld from model training. As the error metrics for each model were derived from the same test set of observations and therefore were not independent, formal testing of between-model accuracy was carried out using a pairwise bootstrapping approach in the multiagree package in R (Vanbelle and Albert 2008) using 999 sampling iterations of the data. The approach, an extension of the resampling method proposed by McKenzie et al. (1996), draws repeated samples (with replacement) from the validation data and estimates the difference in the kappa coefficient of agreement (\(\widehat K\)) for pairs of models at each iteration. The test statistic is distributed as Hotelling’s T², under the null hypothesis that there is no difference between classifications (H0: \(\widehat K_1 = \widehat K_2\)) (Vanbelle and Albert 2008).
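The resampling idea can be illustrated with the short Python sketch below, which computes Cohen’s kappa and then bootstraps the paired difference in kappa between two classifications of the same validation points. It is a simplified illustration and does not reproduce the Hotelling’s T² test implemented in the multiagree package; the function names, the default seed and the handling of degenerate resamples are assumptions of this sketch.

```python
import random
from collections import Counter

def cohens_kappa(reference, predicted):
    """Cohen's kappa between reference labels and one set of predictions."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate sample with a single class: kappa undefined
    return (observed - expected) / (1.0 - expected)

def bootstrap_kappa_difference(reference, pred_a, pred_b, n_iter=999, seed=42):
    """Bootstrap distribution of kappa(A) - kappa(B) on shared validation points."""
    rng = random.Random(seed)
    n = len(reference)
    differences = []
    for _ in range(n_iter):
        # Resample observations with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        ref = [reference[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        differences.append(cohens_kappa(ref, a) - cohens_kappa(ref, b))
    return differences

# Hypothetical four-point example: three of four predictions agree.
kappa = cohens_kappa(["ALG", "ALG", "NVB", "NVB"], ["ALG", "ALG", "NVB", "ALG"])  # 0.5
```

Because both models are evaluated on the same resampled observations at each iteration, the dependence between their error estimates is preserved in the distribution of differences.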

Results

Mapping of sediment samples revealed that Refuge Cove was dominated by sand substrata, with increases in the proportion of mud and gravel in samples taken in the south and north-west, respectively (Fig. 5). The increased presence of mud in the samples coincides with the marine input of a small freshwater creek in the south of the cove. The ordinary least squares regression revealed that the mean and standard deviation of sediment grain size have a significant, moderately strong, and positive relationship with backscatter intensity values for both the pixel and object based derivatives (Fig. 6). A similar relationship was observed for the proportion of gravel in a sample, with higher backscatter values associated with larger proportions of gravel. In all three instances correlations (i.e. R²) were stronger for object based backscatter than for the pixel based alternative (Fig. 6). No significant relationships between backscatter datasets and proportion of mud or sand in a sample were observed. Whilst the sediment analysis was informative in defining relationships between sediment grain size and backscatter intensity, the modelling component focused on mapping the 5 broad habitats that characterised the site.

Fig. 5 Artificially illuminated backscatter mosaic of the Refuge Cove study site showing the distribution of gravel, sand and mud proportions for each of the grab sampling sites. Samples containing muddy sand at the southern end of the bay are located near the outflow of a small watercourse and coincide with seagrass habitat composed of Zostera sp.

Fig. 6 Ordinary least squares regression plots of object-based (OB) and per-pixel (PP) backscatter intensity against mean grain size (µm), standard deviation of grain size (µm), and proportion of gravel content of sediment samples (n = 18)

Benthic habitat maps were created for the Refuge Cove study site using habitat/environment relationships derived from three random forests models, each using either PB, OB or combined sets of MBES derived input features. Map accuracy was determined by comparing predicted classes against a spatially independent reference dataset that was not included in model training. Overall accuracies and Kappa statistics (\(\widehat K\)) for each of the classifications were generally good at 72.5% (\(\widehat K\) = 0.62), 78.5% (\(\widehat K\) = 0.70) and 83.6% (\(\widehat K\) = 0.78) for the PB, OB and combined classifications respectively (Figs. 7, 8). Bootstrap comparison of \(\widehat K\) between models revealed that the combined model performed significantly better than the OB model (T² = 0.92, p = 0.013, α = 0.05) and the PB model (T² = 5.31, p = 0.023, α = 0.05), but we found no significant difference between classifications from the OB and PB models (T² = 1.63, p = 0.204, α = 0.05).

Fig. 7 Error matrices for Pixel based (PB), Object based (OB) image analysis and a combined approach showing overall, User’s and Producer’s accuracies. Agreement charts on the left of the figure provide a visual representation of proportional class agreement. Total possible agreement is indicated by the larger rectangles. The black squares within represent the total agreement of each class with the model validation data and the relation of the dark squares to the diagonal line indicates marginal heterogeneity i.e. when the marginal totals are the same, the squares fall along the diagonal. Validation data used to construct the tables were derived from drop-video frames that were spatially independent of the model training data

Fig. 8 Classification maps from the (a) pixel based (PB), (b) object based (OB) and (c) hybrid classifications

Differences in class specific classification accuracies, as represented in the error matrices and associated agreement charts (Fig. 7), show similar patterns in accuracy and misclassifications for the macroalgae (ALG), filamentous mat (FMAT) and bare sediment (NVB) classes in each of the three models. Accuracy of the predicted seagrass classes (SGAM and SGZ), on the other hand, varied substantially. The best performing model in this respect was the combined model incorporating both object and pixel based predictors, with Producer’s accuracy (model sensitivity) of 85.7% for both the A. antarctica (SGAM) and Zostera sp. (SGZ) dominated seagrass classes. In comparison, Producer’s accuracies for the pixel and object based models for the SGAM class were 79.9 and 57.1%, and for the SGZ class only 0 and 33% respectively.
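The class-specific measures can be sketched in the same way. Producer’s accuracy divides the correctly classified samples of a class by the reference total for that class, and User’s accuracy divides them by the predicted total. The function name, the matrix orientation (rows = predicted, columns = reference) and the counts below are assumptions made for illustration only.

```python
def producers_users_accuracy(matrix, labels):
    """Per-class Producer's and User's accuracy from a square error matrix.

    Assumes rows are predicted classes and columns are reference classes.
    """
    k = len(labels)
    result = {}
    for c in range(k):
        correct = matrix[c][c]
        reference_total = sum(matrix[i][c] for i in range(k))  # column total
        predicted_total = sum(matrix[c])                       # row total
        result[labels[c]] = {
            # Producer's accuracy (sensitivity): share of reference samples found.
            "producers": correct / reference_total if reference_total else float("nan"),
            # User's accuracy: share of predictions for the class that are correct.
            "users": correct / predicted_total if predicted_total else float("nan"),
        }
    return result


# Hypothetical counts (illustrative only, not the study's error matrix).
labels = ["SGAM", "SGZ", "NVB"]
counts = [
    [8, 2, 0],
    [1, 6, 1],
    [1, 2, 19],
]
per_class = producers_users_accuracy(counts, labels)
for name, scores in per_class.items():
    print(name, round(scores["producers"], 3), round(scores["users"], 3))
```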

According to the benthic habitat map derived from the best performing combined model (Fig. 8), the Refuge Cove site is characterised by macroalgae dominated communities on the fringing boulder reefs to the north and south-west of the site. Extensive beds of the seagrass A. antarctica characterise the north-western arm of the bay, while shallow, muddy sands coincident with the outflow of a small creek in the south of the bay contain beds of sparse to medium Zostera sp. The remainder of the bay is characterised by sandy to gravelly unconsolidated sediments, which in shallow, sheltered areas are covered by a mixed filamentous mat of microalgae and diatoms. Observable differences between the habitat characterisations are relatively minor. Confusion between the seagrass classes, and between the NVB and FMAT classes, is evident in the ‘salt and pepper’ striping artefacts in the PB map, and in the OB map it led to an underestimation of seagrass cover when compared with the field observation data.

Class-specific variable importance was ranked from highest to lowest (Fig. 9) based on overall mean decrease in accuracy measures from the random forest models (Liaw and Wiener 2002). The measure indicates the decrease in accuracy resulting from the omission or permutation of a given variable in each classification tree, averaged over all trees. Higher values of the mean decrease in accuracy measure therefore represent greater importance of the variable in the classification process. The predictor variables exhibiting the greatest mean decrease in accuracy for each model were bathymetry, backscatter and rugosity (PB model); bathymetry (mean), backscatter (mean) and rugosity (sd) (OB model); and rugosity (sd), mean backscatter and bathymetry (combined model). Compared to rugosity (sd), which was the single most important variable in the OB and combined classifications, the rugosity variable in the pixel based model had lower importance to classification outcomes, despite the relatively high correlation (0.71) of the two variables. Rugosity (sd) was the most important predictor in classification of the ALG and SGAM classes, while bathymetry (PB) was important to differentiation of the FMAT and SGZ classes.
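The idea behind the mean decrease in accuracy measure can be sketched as follows. This is a simplified illustration rather than the random forest implementation itself: it permutes one predictor in a single evaluation set for one fixed classifier, whereas a random forest does this per tree on out-of-bag samples. The classifier `toy_predict`, the predictor names and all values are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Proportion of rows for which the predicted class matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one predictor."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in the chosen column.
        permuted = [row[:column] + [value] + row[column + 1:]
                    for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


def toy_predict(row):
    """Hypothetical rule: shallow pixels are seagrass, deeper ones bare."""
    return "seagrass" if row[0] < 10 else "bare"


# Columns: [depth in m, an uninformative noise variable]; values are made up.
rows = [[2.0, 0.3], [4.0, 0.9], [6.0, 0.1], [8.0, 0.7],
        [12.0, 0.5], [14.0, 0.2], [16.0, 0.8], [18.0, 0.4]]
labels = [toy_predict(row) for row in rows]

importance_depth = permutation_importance(toy_predict, rows, labels, column=0)
importance_noise = permutation_importance(toy_predict, rows, labels, column=1)
print(importance_depth, importance_noise)
```

In this toy case, shuffling the predictor the classifier relies on lowers accuracy, while shuffling the unused predictor leaves it unchanged. This is the sense in which higher values indicate greater importance.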

Fig. 9

Variable importance plots for each classification showing the relative contribution of variables for each benthic habitat per classifier

The relationship between predictor variables and model class selection was examined using partial dependence plots. Partial dependence plots for the three most important predictors (Fig. 10) represent the effects of each predictor variable on class response, while averaging out the influence of all other variables. Rugosity (sd) (OB) was selected for discriminating both of the seagrass classes (SGAM and SGZ) at lower values of rugosity and the ALG class at higher values of rugosity. This suggests that the variable captures both fine-scale morphological variation in the sediments associated with the seagrass classes, and also broader-scale seabed roughness in areas of medium to high profile algal dominated reef. Photosynthetic classes represented by SGAM, SGZ and FMAT were most often separated from deeper bare sediments (NVB) by the bathymetry (PB) variable. Likewise lower values of backscatter mean (OB) were influential in the model for the NVB and seagrass classes, whereas high values of backscatter tended to define reef which was almost universally occupied by the ALG class.
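A minimal sketch of how a partial dependence curve is obtained is given below. For each value on a grid, the predictor of interest is fixed at that value for every observation, and the class vote fraction is averaged across observations. The function `toy_votes` is a hypothetical stand-in for the forest's vote fraction for one class, and the predictor values are illustrative.

```python
def partial_dependence(vote_fraction, rows, column, grid):
    """Average class vote fraction with one predictor fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)      # copy so the input rows are not changed
            modified[column] = value  # fix the predictor of interest
            total += vote_fraction(modified)
        curve.append(total / len(rows))
    return curve


def toy_votes(row):
    """Hypothetical vote fraction for a seagrass class (not a fitted model)."""
    depth, rugosity_sd = row
    score = 0.0
    if depth < 10:
        score += 0.5
    if rugosity_sd < 0.3:
        score += 0.4
    return score


# Columns: [depth in m, rugosity sd]; values are made up.
observations = [[5.0, 0.1], [5.0, 0.6], [15.0, 0.2], [15.0, 0.7]]
depth_grid = [5.0, 15.0]
depth_curve = partial_dependence(toy_votes, observations, column=0, grid=depth_grid)
print(list(zip(depth_grid, depth_curve)))
```

Averaging over the observed values of the other predictors is what "averaging out" their influence means in practice.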

Fig. 10

Partial dependence plots for the three variables (Object based mean backscatter intensity; Pixel based bathymetry; Object-based standard deviation of rugosity) of highest importance in the hybrid classification approach for each habitat class. Each point on the partial dependence plot is the average vote percentage in favour of the class across all observations, given a fixed level of the predictor variable. That is, plots provide an indication of the relative importance of each predictor to selection of a given class over the predictor’s range

Discussion

Benthic habitats form the building blocks of coastal marine ecosystems. They contain a large proportion of diversity in our oceans and provide a range of ecosystem services, from supporting fisheries and providing coastal protection and physical and biological resources to regulating climate. The basic need for an understanding of the geology, land use and land cover in terrestrial environmental management has long been recognized. As we strive to improve the potential for sustainably managing marine coastal resources, there is a pressing need to fill similar knowledge gaps. Central to this approach is the development of robust classification techniques to derive reliable maps that accurately reflect the location and extents of habitats which are important for informing marine spatial planning and management. This remains a challenge with MBES sensors. Whilst terrestrial approaches using satellite data often work with calibrated and compensated data, this is uncommon with acoustic datasets, specifically in backscatter analysis (Lamarche and Lurton 2017). This poses a problem when developing time series for change analysis or when transferring relationships between acoustics and observations to other locations.

OB image analysis is gaining popularity for analyzing marine acoustic datasets for benthic habitat mapping (Diesing et al. 2014; Hasan et al. 2012a, 2014; Lucieer et al. 2013; Lucieer and Lamarche 2011). However, the comparison between existing PB and more recently adopted OB approaches has been limited. In this study we examined the relative merits of these two image based approaches to characterizing benthic habitats. We compared the accuracy and interpretability of Random Forests classifications derived using each approach for five broad benthic habitats in a small embayment in Eastern Victoria, Australia. We found that the OB approach achieved a higher overall accuracy than the PB approach; however, the difference was not statistically significant. Notably, the major difference between the model outputs is the ‘salt and pepper’ striping artefacts in the PB map, whereas the OB map provided more clearly defined habitat boundaries. We found that a model incorporating elements of both approaches proved to be significantly more accurate than OB or PB methods alone. The results suggest, at least for the present case-study, that the approaches are not mutually exclusive, and that the specific advantages of each approach can be incorporated into a single modeling exercise. By combining ecologically relevant layers derived using both approaches, we maintain the richness of the acoustic information whilst incorporating the segments informed by grouping of these pixels using OB approaches. The improved classification accuracy achieved indicates that combining the benefits associated with PB and OB image analysis in this study may hold relevance for benthic habitat mapping approaches elsewhere.

Whilst there will always be a need for expert interpretation, automated approaches pose many advantages in terms of their ability to extract information from highly dimensional data that would be difficult to visually interpret in a repeatable or timely fashion. In comparison to manual approaches, automated PB and OB classification approaches reduce subjectivity and allow for scalable work flows with limited impact on time and cost. As volumes of seabed mapping data available for habitat mapping applications increase with wider and less costly access to technology, there is a greater need for the development of repeatable and automated approaches that can take advantage of the benefits that “big data” can provide in terms of understanding how patterns of habitat distribution can be linked to the processes that drive them. Well established PB methods have their roots in satellite land cover mapping and in a benthic habitat mapping context have been widely used to derive secondary products from gridded sonar returns at a range of spatial scales (Diesing et al. 2016). PB methods have their advantages in terms of maintaining the richness of the acoustic derived signal in the two dimensional plane, thus preserving the inherent spatial scale of the data (25 cm in this study). However, they are also prone to capturing artefacts. This is especially relevant when we consider habitat mapping classes and how they are represented in high-resolution data now possible in shallow coastal waters where we may inherently capture elements of within class variability (Calvert et al. 2015; Micallef et al. 2012; Montereale-Gavazzi et al. 2016). The resolution we are now able to achieve is likely to be better than the positional accuracy that can be achieved with groundtruth systems (Rattray et al. 2014). 
Thus we may be introducing modelling uncertainty by propagating errors associated with positional error and the sensor field of view when developing acoustic signatures for training or when validating classification results. OB approaches may assist in better defining potential class boundaries when integrating with ground truth data (Mitchell et al. 2017). For example, in this study validation locations sampled using a drop video system were targeted at the centre of selected segments to limit the potential for confusion between class boundaries. Performance is also likely to be driven by the heterogeneity of the benthic habitats present. With the ability to collect sub-metre resolution data with MBES systems, we are at a point where the pixel may be smaller than the benthic habitat feature of interest, thus potentially capturing within class variation. With increasing resolution of the imagery captured by remote sensors, the problem is likely to become more prevalent in both terrestrial and aquatic systems (Blaschke et al. 2014). When classifying remote sensing data at very high resolution, incorporating the features that groups of pixels form, rather than treating each pixel in isolation, may provide advantages in terms of class discrimination.

OB image analysis approaches generalise the pixel-scale acoustic information into discrete segments that act as a defined minimum mapping unit (see review by Blaschke 2010). These segments have the additional advantage that summary statistics can be extracted for each segment (central tendency and variation). A distinct advantage is that segments are inherently linked to specific scales of features in the environment (Phinn et al. 2012). OB approaches provide new opportunities for exploring marine geomorphometry using seabed mapping data at multiple spatial scales (Lecours et al. 2016) that can in turn inform acoustic characteristics influencing habitat distributions or areas of high biodiversity significance. Object based segments also provide discrete regions from which to extract additional features from MBES data, such as angular response, which captures the variation in acoustic backscattering strength with the angle of incidence of the acoustic signal at the seafloor (Lurton 2010). Hasan et al. (2014) demonstrated how integrating angular response metrics for segmented features derived from the backscatter mosaic improved the predictive performance of habitat maps, with best results obtained by integrating bathymetry, mosaic and angular response features in the classification process. OB features derived from the backscatter mosaic in the current study reduce the speckle and nadir noise that is common in backscatter mosaics and likely contributes to uncertainty in the automated classification process. One of the advantages underlying the Random Forests model is the ability to handle highly dimensional data, allowing incorporation of a broad range of predictors with little or no impact on model robustness or performance.
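The extraction of per-segment summary statistics can be illustrated with a short sketch. It assumes that each pixel carries a value (for example backscatter intensity) and a segment identifier produced by a prior segmentation step. The function name, the use of the population standard deviation and the values are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def segment_statistics(pixel_values, segment_ids):
    """Mean and standard deviation of pixel values within each segment."""
    groups = defaultdict(list)
    for value, segment in zip(pixel_values, segment_ids):
        groups[segment].append(value)
    # Population sd is used here; a sample sd would be an equally valid choice.
    return {segment: {"mean": mean(values), "sd": pstdev(values)}
            for segment, values in groups.items()}


# Hypothetical pixel values and segment labels (illustrative only).
values = [1.0, 1.2, 0.8, 3.0, 5.0]
segments = [1, 1, 1, 2, 2]
stats = segment_statistics(values, segments)
for segment, summary in stats.items():
    print(segment, round(summary["mean"], 3), round(summary["sd"], 3))
```

Replacing individual pixel values with a segment mean is also what suppresses pixel-scale speckle, since noise within a segment is averaged.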

The Random Forests approach provides the means to explore variable importance across models and the relationship of individual variables with benthic habitats. In the present study, variables of primary importance to classification accuracy were rugosity (sd), pixel bathymetry and backscatter (OB mean). However, the relative importance varied for each habitat type across the three models. For example, rugosity (sd) was a variable of high importance for characterizing the distribution of the seagrass class A. antarctica. The partial dependence plots reveal that intermediate rugosity values have a high association with this class at the feature level. Feature rugosity (OB sd) appears to capture the fine-scale micro roughness in the imagery associated with the seagrass canopy and distinctive root complex, thereby differentiating it from adjacent reef and sediment habitat, which would be difficult using a pixel based approach alone. Feature rugosity (OB sd) was also a variable of primary importance in the classification of algal dominated reefs, with high values likely capturing the structure of the complex granitic reef supporting macroalgae dominated habitats. It appears that this variable captures two relationships that occur at different spatial scales relevant to the benthic habitats observed.

Bathymetry was a variable of high importance in explaining habitat distributions, in particular for the two seagrass classes and the filamentous algal mat class. This is not surprising as bathymetry acts as an indirect proxy of light availability, which limits the depth at which the basic requirements for photosynthesis can be met. In high energy environments depth is also a major factor in the exposure of benthic environments to wave action, acting as an important modifier of distributions of biological communities (Rattray et al. 2015). Pixel based bathymetry was the only pixel variable that was more important than its object-based counterparts. As MBES systems are principally designed for capturing bathymetry, survey artefacts are typically less apparent in bathymetry than in derivative products computed using analysis windows, especially in areas of limited relief.

Feature backscatter mean was also found to considerably improve class differentiation, especially for the no visible biota and microalgae classes, where this variable contributed considerably more than bathymetry. The fact that the feature backscatter mean has the appearance of a noise-reduced version of the backscatter mosaic probably also contributed to the success of this feature. In effect, the speckle and nadir noise commonly displayed in backscatter mosaics are likely to be responsible for errors in the classification process and, hence, to reduce the relevance of the backscatter mosaic to the classification.

One of the main issues experienced when creating benthic habitat maps is accounting for spatial autocorrelation (SA) in the accuracy assessment points, because errors in one location can positively or negatively affect errors in nearby locations (Campbell 1981). Neighboring points along transects are inherently spatially autocorrelated due to the clustering of habitats (Kendall et al. 2005). Although SA is not necessarily an issue for the data used to train the ensemble classifier in this study, it is important to ensure that the validation data are statistically independent. Through our assessment of SA in the classified video footage from the AUV tracks, we found significant SA at distances up to ~ 250 m. By using bathymetry and backscatter derivatives to account for some of the spatial variation, we were able to reduce the range of SA to 50 m in the validation dataset. Stratification of drop video samples across the cluster analysis results of the OB segments, with a minimum distance of 50 m between samples, allowed us to capture the acoustic variability across the site with observation data and remove any effects of SA.
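The minimum-distance constraint on validation samples can be illustrated with a simple greedy filter. This sketch covers only the 50 m separation rule and not the stratification across cluster analysis results. The function name and the coordinates (in metres, in a projected coordinate system) are hypothetical.

```python
import math


def thin_by_distance(points, min_distance):
    """Greedily keep points that lie at least min_distance from all kept points."""
    kept = []
    for x, y in points:
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical candidate sample locations in metres (illustrative only).
candidates = [(0, 0), (30, 0), (60, 0), (60, 40), (120, 0)]
selected = thin_by_distance(candidates, min_distance=50)
print(selected)
```

A greedy filter depends on the order of the candidates, so in practice the candidate list might be shuffled or ordered by stratum before thinning.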

Conclusion

This study highlights the potential of combining the discriminatory power of PB and OB image analysis approaches for benthic habitat mapping studies. We show that classification accuracy can be significantly improved using a combined approach rather than OB or PB methods alone. To our knowledge this is the first time PB and OB approaches have been integrated for benthic habitat mapping. These approaches are generally performed in isolation, and more testing is required to determine whether the benefits observed in this study have similar advantages for benthic habitat mapping studies elsewhere. We also highlight that bathymetry and backscatter data both contribute as important variables in modelling the distribution of the benthic habitats observed. The ensemble approach employed has flexibility in terms of variables that can be used in the modeling process. Whilst benthic habitat mapping studies generally focus on the use of seabed mapping products, there are clear opportunities to integrate variables such as MBES water column data and oceanographic variables into the classification process in future studies. Machine learning approaches provide a way forward, being capable of handling large data volumes and offering a framework for repeatable and objective classification approaches.