Introduction

Fisheries acoustics is a tool used by scientists and fishermen alike [1,2,3]. Fishermen often use active-acoustic instruments (e.g., echosounders) to support their search for high-quality fishing grounds, to determine bottom depth, and in some situations to derive information about the seabed structure [4]. Fisheries scientists commonly use active acoustics to estimate stock abundance and biomass of marine fish species [5,6,7,8]. Given the growing interest of fishers and scientists in active-acoustic technology, it is unsurprising that many modern fishing vessels are equipped with scientifically rated echosounders [1, 4, 9,10,11]. Such echosounders can be calibrated and are capable of recording raw acoustic data with a minimal amount of internal data manipulation [1, 4, 10]. Because fishing vessels spend long periods at sea and cover large distances, opportunistic recording of acoustic data by fishers can deliver new insights into the distribution and structure of the targeted fish species in the area. Such data can be used to complement data collected during existing scientific monitoring programmes and to improve the spatial and temporal effectiveness of these surveys [4]. Whereas dedicated scientific surveys provide good snapshots of the presence and abundance of fish within the surveyed region at the time of the survey, data collected on vessels of opportunity (e.g., fishing vessels) can give a broader picture on a wider temporal and spatial scale [4]. Deriving information such as density hotspots [12] or geostatistical indices [13,14,15] from these data can then support an informed decision process on surveying strategies.

In the higher latitudes (north and south), in temperate climate regimes, fisheries acoustics has become a standard tool for monitoring many pelagic fish stocks [8]. These fish species generally form single-species aggregations, making them ideal for acoustic monitoring. In contrast, in tropical ecosystems, fish often occur in mixed-species schools with a much higher species diversity and, as such, species-specific acoustic assessment is more difficult [16,17,18,19]. While post-processing techniques based on multi-frequency acoustic data are able to distinguish between different functional groups, such as phytoplankton, zooplankton, swimbladder and non-swimbladder fish, distinguishing between similar species in mixed schools is more challenging [17, 20, 21]. Although supervised (e.g., feed-forward neural networks [22, 23]) and unsupervised (e.g., clustering [16, 24], random forest classification [25] and self-organising maps [26, 27]) methods for acoustic target classification exist, their application remains limited. Most classification methods used in fisheries acoustics are empirical, for example, the use of classification feature libraries [28] or frequency response characteristics [17, 29]. These methods are both data-driven and dependent on expert judgement [30], and they generally require data collected by additional sampling methods [31]. Unsupervised or supervised modelling approaches to target classification are advantageous in situations where no, or limited, dedicated alternative sampling observations are available [16].

In moving towards ecosystem-based fisheries management (EBFM), the identification of different habitats and their association with different marine species is important [32, 33]. Fisheries acoustics is considered one of the main tools providing the basis for EBFM [19, 32, 33]. In addition to the detection of fish, active acoustics can be used to derive information about seabed characteristics [34,35,36,37]. Seabed properties have been shown to play an important role in the habitat description of demersal and semi-demersal fish species [38,39,40,41,42]. An enhanced understanding of the distribution of acoustic density hotspots and fish school characteristics, in conjunction with habitat characteristics, has the potential to improve the monitoring and management of mixed-species fisheries in tropical environments [43, 44]. Here, we use acoustic and catch information collected on a commercially operating trap fishing vessel to identify density hotspots, describe the acoustic diversity of fish schools and identify different acoustic seabed habitats within three fishing regions. Defined clusters of school and seabed descriptors were then linked to catch information to investigate the ecological meaning of the clusters.

Methods

Study Area

The Northern Demersal Scalefish Fishery (NDSF) is a mixed-demersal trap fishery which encompasses an area of 408,400 km\(^{2}\). In the northwest of Australia, off the coast of Broome, the NDSF extends to the shelf edge close to the Indonesian border. All data were collected on board FV Carolina M, a 15 m trap fishing boat, during normal commercial operations. In this study, we focussed on three fishing regions where simultaneously collected acoustic and biological information was available. Details on the total area of the three fishing regions, delimited by manually drawn polygons, and the period of data collection are given in Table 1, with additional information found in [11, 45]. Fishing regions were defined as areas where high densities of fishing and acoustic data were available. An overview of the locations of the three fishing regions and the acoustic recordings is given in Fig. 1.

Table 1 Summary of the three fishing regions
Fig. 1

Map of a part of the NDSF area off Broome in Western Australia. The location and extents of the three selected fishing regions are indicated by the three white polygons. Locations of acoustic recordings are shown as red dots

Biological Sampling

Catch information on all specimens was obtained using a GoPro Hero 3 camera mounted on each trap as it was hauled on board (see [11, 45] for details). The downwards-looking camera facilitated recording of all fish caught within the trap before the catch was split into commercially relevant species or returned to sea. The optical recordings of the catch facilitated counting of fish per species group. Length measurements of the individual fish were based on pixel counting, calibrated through the known, constant mesh size of the traps. Calibrated video recordings were corrected for the built-in wide-angle distortion, and subsequent processing was performed using a custom-built software package, FishVid [11, 45]. Specimens were categorised into nine groups [11, 45] (for simplicity referred to as species groups): Goldband snapper (Pristipomoides multidens), Red Emperor (Lutjanus sebae), Saddletail (Lutjanus malabaricus), Lutjanids (members of the Lutjanidae family, other than Saddletail, Red Emperor or Goldband snapper), Lethrinids (members of the Lethrinidae family), Rankin Cod (Epinephelus multinotatus), Cods (members of the Epinephelidae family other than Rankin Cod), Triggerfish (members of the Balistidae family) and a miscellaneous group containing all other species.
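The pixel-based length calibration can be illustrated with a minimal sketch. It assumes a simple linear scale derived from one mesh opening measured in the same, distortion-corrected image; the function name and the numbers in the usage note are hypothetical and are not taken from FishVid.

```python
def fish_length_mm(fish_pixels, mesh_pixels, mesh_size_mm):
    """Scale a fish length measured in pixels to millimetres.

    The known mesh size acts as the reference object: one mesh opening that
    spans `mesh_pixels` pixels in the image is `mesh_size_mm` long in reality.
    Assumes the image has already been corrected for lens distortion.
    """
    if mesh_pixels <= 0:
        raise ValueError("mesh_pixels must be positive")
    mm_per_pixel = mesh_size_mm / mesh_pixels
    return fish_pixels * mm_per_pixel
```

For example, with a hypothetical 50 mm mesh spanning 50 pixels, a fish spanning 400 pixels would be estimated at 400 mm.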

Acoustic Data Processing

All acoustic data were collected (Fig. 1) using calibrated [46] hull-mounted SIMRAD ES-70 split-beam echosounders operating at 38 and 120 kHz. All settings were the same as those used during normal commercial operations. A detailed description of these settings and the acoustic processing steps can be found in [11, 45]. All acoustic processing was conducted in Echoview 7.0 [43]. Ping geometry and times were matched for 38 and 120 kHz [47, 48]. Effects of impulse noise (mainly caused by non-synchronised echosounders), transient noise (mainly caused by poor weather conditions) and background noise (mainly caused by the vessel's engine and bad weather [49, 50]) were minimised through adaptation of filters described by Ryan et al. [50].

All fish observed in the catch possessed a swimbladder. This allowed for the application of a simple bi-frequency algorithm to differentiate between fish and fluid-like (e.g., plankton) targets [45, 51, 52]. Volume backscattering coefficients (\({s_v}\); m\(^{2}\) m\(^{-3})\) of acoustically detected fish schools were integrated (\(s_A\); m\(^{2}\) nmi\(^{-2})\) and averaged over 1 nmi by 10 m grid cells, starting at a depth of 10 m (outside the near-field) to 1 m above the seabed (to avoid inclusion of seabed echoes). Effects of the acoustic deadzone were compensated through the application of methods described by Ona and Mitson [53].
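The relationship between \(S_{v}\), \(s_v\) and \(s_A\) can be sketched for a single ping and depth layer. This is a minimal illustration using the standard definition \(s_A = 4\pi \, 1852^{2} \int s_v \, dz\); the function names and the assumption of equally thick samples are ours, and the gridding, school detection and deadzone correction described above are not reproduced.

```python
import math

NMI_IN_M = 1852.0  # metres per nautical mile


def sv_db_to_linear(sv_db):
    """Convert S_v (dB re 1 m^-1) to the linear coefficient s_v (m^2 m^-3)."""
    return 10.0 ** (sv_db / 10.0)


def nasc(sv_db_samples, sample_thickness_m):
    """Integrate s_v over a depth layer and scale to s_A (m^2 nmi^-2).

    Assumes all samples in the layer have the same thickness.
    """
    # Area backscattering coefficient (m^2 m^-2): sum of s_v times thickness.
    s_a = sum(sv_db_to_linear(v) for v in sv_db_samples) * sample_thickness_m
    return 4.0 * math.pi * NMI_IN_M ** 2 * s_a
```

Averaging the per-ping values over all pings in a 1 nmi by 10 m cell would then give the cell value.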

Acoustic Density Hotspots: A Geostatistical Approach

In this study, acoustic density hotspots (hereafter referred to as hotspots) are defined as areas with high concentrations of \(s_{A}\). These hotspots are considered as proxies for areas of higher than average fish densities. Generally, hotspots are identified through the application of subjective thresholds [54], which are mostly based on the cumulative distribution function of the data [55] or defined through a kernel [56]. Here, we apply a local, nonlinear rule for identifying hotspots [12]. Thresholds are based on the spatial relationship between data points above a cut-off value and those below the cut-off value [12]. Cut-off values are based on local transition probabilities, which can be temporally and spatially variable [12].

Table 2 Description of the eleven acoustic school descriptors used as inputs into the robust sparse k-means clustering algorithm used to define school clusters

Seven \(s_{A}\) cut-off values (0.01, 10, 50, 100, 200, 400 and 600 m\(^{2}\) nmi\(^{-2})\), ranked from one to seven, were used. For each of the \(s_{A}\) cut-off values, binary indicator sets (1 if above the \(s_{A}\) cut-off; 0 if below the \(s_{A}\) cut-off) and variograms were generated. The ratio between two indicator variograms of a lower first \(s_{A}\) cut-off value and a higher second \(s_{A}\) cut-off value provides the transition probability of moving from the indicator set defined by the lower \(s_{A}\) cut-off value into the indicator set defined by the higher \(s_{A}\) cut-off value. The variogram ratio is defined over the distance described within the variogram models. If the variogram ratio increases with distance, the indicator set defined by the higher \(s_{A}\) cut-off value tends to be positioned in the central part of the indicator set defined by the lower \(s_{A}\) cut-off value. If the variogram ratio is flat (pure nugget), the data points contained within the indicator set of the higher \(s_{A}\) cut-off value are randomly distributed within the indicator set of the lower \(s_{A}\) cut-off value, and the geometries of both indicator sets are spatially uncorrelated (no edge effect). If the \(s_{A}\) cut-off is high, the variograms tend to be described by a pure nugget effect (unstructured data), due to destructuration [57]. The \(s_{A}\) cut-off is defined as the lower value of the pair where no edge effect is observed and above which the residuals are ideally pure nugget. This \(s_{A}\) cut-off value becomes the top cut in the model described in detail by Rivoirard et al. [58]. Ordinary kriging was used to map the hotspot probabilities and indicator kriging was used to clearly differentiate the hotspots.
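The indicator sets and the variogram ratio can be sketched as follows. This is a simplified illustration under stated assumptions: the data lie on a regularly spaced one-dimensional transect, experimental (not modelled) variograms are used, and the ratio is taken as the higher cut-off variogram over the lower cut-off variogram. All function names are hypothetical, and the model of Rivoirard et al. [58] is not implemented.

```python
def indicator(values, cutoff):
    """Binary indicator set: 1 if the value is above the cut-off, else 0."""
    return [1 if v > cutoff else 0 for v in values]


def variogram_1d(z, max_lag):
    """Experimental semivariogram of regularly spaced 1-D data for lags 1..max_lag."""
    gamma = []
    for h in range(1, max_lag + 1):
        sq_diffs = [(z[i] - z[i + h]) ** 2 for i in range(len(z) - h)]
        gamma.append(0.5 * sum(sq_diffs) / len(sq_diffs))
    return gamma


def variogram_ratio(values, low_cut, high_cut, max_lag):
    """Ratio of the two indicator variograms at each lag (NaN where undefined)."""
    g_low = variogram_1d(indicator(values, low_cut), max_lag)
    g_high = variogram_1d(indicator(values, high_cut), max_lag)
    return [gh / gl if gl > 0 else float("nan") for gh, gl in zip(g_high, g_low)]
```

A ratio that stays flat across lags would correspond to the "no edge effect" case described above, whereas a ratio that rises with lag would point to an edge effect.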

Acoustic School Descriptors

A set of eleven school descriptors (six geometric, four energetic (for both frequencies 38 and 120 kHz) and one bathymetric) were extracted for each school from the acoustic data, largely following the methods described in [16] (Table 2). The geometric descriptors were mean height [height mean (m)], length [corrected length (m)], thickness, area [corrected area (m\(^{2}\))], perimeter [corrected perimeter (m)] and roundness of the school (Image compactness) (Table 2). Energetic descriptors were mean volume backscattering strength [\(S_{v}\) mean (dB re 1 m\(^{-1}\))], beam geometry corrected \(S_{v}\) mean [corrected MVBS (dB re 1 m\(^{-1}\))], maximum \(S_{v}\) [\(S_{v}\) max (dB re 1 m\(^{-1}\))] and skewness (skewness) (Table 2). The bathymetric descriptor was mean depth of the school [mean depth (m)] (Table 2). These descriptors were used to describe the characteristics of the fish schools and to categorise them into different clusters.

Clustering

Clustering of acoustically detected fish schools, using school descriptors, was applied to each of the three fishing regions separately, as they were spatially and temporally disconnected. Only schools observed at an altitude of less than 20 m from the seabed were considered. This allowed for improved comparison with biological information and omitted analysis of pelagic fish species. Schools observed during night-time were excluded from clustering as diurnal migration patterns have been observed within the study area [45]. Times of sunrise and sunset within the study area were around 06:00 and 18:00, respectively [11] at the time of data collection.

An unsupervised clustering algorithm, robust sparse k-means clustering (RSKC) [59], was used to label the different school clusters based on the energetic, geometric and bathymetric descriptors. Ideally, clustered groups should represent biologically meaningful categories, for example, group together species that have similar morphology and which show similarities in their aggregation structure [16]. RSKC is a relatively new clustering algorithm which combines the trimmed k-means [60] and the sparse k-means [61] algorithms, which are both derivatives of the k-means algorithm. The main strength of the RSKC, compared to other k-means algorithms, is its robustness against noisy data containing outliers. This is useful for acoustic data where sudden large variations are not uncommon. Maximisation of dissimilarities is based on the squared Euclidean distance, rather than the normal Euclidean distance, giving more weight to points at a greater distance [59]. The algorithm assumes that the dissimilarity between clusters is additive, depending on the contribution of each descriptor. Using the Lasso method [62], a weight w, constrained to a tuning parameter \(I_{1}\) [1:\(\surd N_{\mathrm{features}}\)] is attributed to each descriptor in order to maximise the separation of clusters. Here, \(I_{1}\) was kept at a maximum, giving non-zero weights to all descriptors.
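The effect of the Lasso-type bound on the descriptor weights can be sketched with the weight update used in sparse k-means: soft-threshold the per-descriptor between-cluster sums of squares and rescale to unit Euclidean norm, raising the threshold only if the sum of weights exceeds the bound. This is a minimal illustration of that single step, not the full RSKC procedure (no trimming, no cluster reassignment); the function names and input values are hypothetical.

```python
import math


def soft_threshold(values, delta):
    """Shrink each value towards zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalise(values):
    """Rescale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def sparse_weights(bcss, l1_bound, iterations=60):
    """Descriptor weights for given between-cluster sums of squares (bcss).

    The weights have unit L2 norm and a sum of at most l1_bound, where
    l1_bound lies between 1 and sqrt(number of descriptors).
    """
    a = [max(v, 0.0) for v in bcss]
    w = l2_normalise(a)
    if sum(w) <= l1_bound:          # bound inactive: no descriptor is zeroed
        return w
    lo, hi = 0.0, max(a)            # bisection on the threshold delta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(l2_normalise(soft_threshold(a, mid))) > l1_bound:
            lo = mid
        else:
            hi = mid
    return l2_normalise(soft_threshold(a, hi))
```

With the bound at its maximum, every descriptor with a positive between-cluster sum of squares keeps a non-zero weight, in line with the setting described above; with the bound at 1, essentially only the most discriminating descriptor remains.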

Any k-means algorithm requires the number of clusters to be defined a priori [24]. The number of clusters was selected using the “Clest” algorithm [59, 63]. The optimal number of clusters is based on the maximisation of the predictive power of the model, which is assessed on random subsamples of the data (random validation was executed 15 times). Validation of the predictive power is based on the Classification Error Rate (CER) [64]. The accepted optimal number of clusters is obtained by minimising the difference between the median CER for different numbers of clusters (CER\(_{\mathrm{Obs}})\) and the median expected CER under the null hypothesis, where the number of clusters equals one. The expected CER was computed using five different datasets generated through Monte Carlo sampling.
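The CER compares two partitions of the same observations. A minimal sketch follows, assuming the common pairwise definition (the proportion of observation pairs that are grouped together in one partition but apart in the other, i.e. one minus the Rand index); the function name is ours and the Clest resampling loop is not reproduced.

```python
from itertools import combinations


def classification_error_rate(labels_p, labels_q):
    """Proportion of observation pairs on which two partitions disagree.

    0 means identical partitions (label names do not matter); larger values
    mean poorer agreement.
    """
    n = len(labels_p)
    if n != len(labels_q) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    disagreements = 0
    n_pairs = 0
    for i, j in combinations(range(n), 2):
        together_p = labels_p[i] == labels_p[j]
        together_q = labels_q[i] == labels_q[j]
        if together_p != together_q:
            disagreements += 1
        n_pairs += 1
    return disagreements / n_pairs
```

Because only co-membership of pairs is compared, relabelling the clusters of either partition leaves the CER unchanged.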

The similarity between each school and the cluster it was attributed to was tested through revised silhouette values [65]. The revised silhouette plot and value represent a measure of cohesion, i.e., they describe how similar a school is to the cluster it is contained in compared to the other clusters. Revised silhouette values range from −1 to \(+\)1, where a value close to \(+\)1 indicates a high similarity with the cluster [65]. The influence of the different descriptors on the clustering was shown through Principal Component Analysis (PCA) [66]. The characteristics of the different clusters are illustrated through a parallel coordinate plot [67]. Indicator kriging was used to produce maps of the most dominant clusters within the three regions. Catch information was linked with school descriptor clusters through distance minimisation.
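The idea behind silhouette values can be illustrated with the classical silhouette, used here as a stand-in: the revised silhouette of [65] differs in how distances are computed, and its exact form is not reproduced. The sketch assumes Euclidean distances on already scaled descriptors; the function name is hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Classical silhouette value for each observation (range -1 to +1)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    result = []
    for i, (p, own) in enumerate(zip(points, labels)):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q)
                 for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[own]                      # cohesion: mean distance within own cluster
        if a is None:                           # singleton cluster: silhouette set to 0
            result.append(0.0)
            continue
        b = min(v for c, v in mean_dist.items() if c != own)  # nearest other cluster
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result
```

Values near +1 indicate observations that sit much closer to their own cluster than to any other, which is the interpretation used in the text.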

Habitat Description

To describe the habitat associated with the fish schools, seafloor characteristics were assessed using acoustic techniques. Seabed characteristics were determined based on the scattering properties of the first bottom echo on the recorded echograms. The seabed features exported from Echoview [68] were bottom depth, bottom roughness, first bottom length, maximum \(S_{v}\), bottom rise time, skewness and kurtosis (see below for details on each seabed feature). All seabed features, except for bottom depth, which was exported for every ping, were based on intervals of 15 pings to minimise measurement variability [35, 69].

Bottom depth was detected using the “best bottom candidate” algorithm in Echoview. The algorithm searches for peaks (i.e. the shallowest detections) within a ping window, containing eight pings in this study. If no peak is found, the average of the peaks in the surrounding ping windows is used. The maxima of the different retained peaks are summed, and the highest (i.e. shallowest) value is considered representative of the bottom response for that ping window. These points are connected to form the final bottom line, which is then shifted towards the transducer until the detected value at individual points drops below the discrimination level (−50 dB re 1 m\(^{-1})\). Finally, a backstep of 0.2 m is added.

The first bottom length refers to the total duration of the first bottom echo. Firstly, for a bottom echo to be considered valid, a minimum of three consecutive sample values above a given threshold (here −60 dB re 1 m\(^{-1})\) are required. These consecutive sample values determine the beginning of the first bottom echo (i.e. bottom depth). The end of the first bottom echo is then determined using a bottom echo threshold at 1 m (dB). The first of three consecutive sample values below this threshold indicates the end of the first bottom echo.
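The start and end rules for the first bottom echo can be sketched on a single ping of sample values. This is an illustration of the logic as described above, not Echoview's implementation; the function name is hypothetical, and the end threshold is left as a required argument because its value is not stated here.

```python
def first_bottom_echo(samples_db, start_threshold_db, end_threshold_db, run=3):
    """Locate the first bottom echo in one ping.

    Returns (start_index, end_index), or None if no valid echo is found.
    start_index: first sample of the first run of `run` consecutive samples
                 above start_threshold_db.
    end_index:   first sample of the first later run of `run` consecutive
                 samples below end_threshold_db (len(samples_db) if none).
    """
    n = len(samples_db)
    start = None
    for i in range(n - run + 1):
        if all(s > start_threshold_db for s in samples_db[i:i + run]):
            start = i
            break
    if start is None:
        return None
    # The end search begins after the run that established the start.
    for i in range(start + run, n - run + 1):
        if all(s < end_threshold_db for s in samples_db[i:i + run]):
            return start, i
    return start, n
```

Multiplying `end_index - start_index` by the sample interval would give the first bottom length in range or time units.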

Bottom roughness is determined from an integration of the tail energy of the first bottom echo [70, 71], as it is assumed that the energy contained in the first echo depends mainly on the bottom roughness [36]. A rough bottom will have a larger, more complex surface and therefore an increased integration interval (longer tail due to the delayed arrival of energy packets at the transducer face), resulting in an increased bottom acoustic roughness index. Smoother seabeds act more like acoustic mirrors, reflecting the incident energy directly to the transducer, resulting in a steep, sharp peak with a small tail or none at all.

Kurtosis and skewness describe the shape of the probability distribution of sample values. Kurtosis, sometimes referred to as “peakedness”, is a measure of the variability in sample values in the first bottom echo and is defined by the “tailedness” of the data. This means that the higher the kurtosis, the higher the proportion of variance which is explained by extreme deviations. Skewness describes the asymmetry, i.e. how unbalanced the sample values are, or how the distribution of sample values deviates from a normal distribution towards either tail.
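Both shape measures are standardised central moments. A minimal sketch follows, assuming the population (biased) moment estimators and non-excess kurtosis (a normal distribution gives 3); the exact estimator and sample domain used by the processing software are not stated here, so this is illustrative only.

```python
def skewness_and_kurtosis(samples):
    """Return (skewness, kurtosis) from the 2nd, 3rd and 4th central moments."""
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m3 = sum((v - mean) ** 3 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    if m2 == 0:
        raise ValueError("samples have zero variance")
    skewness = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2     # grows with the weight of extreme deviations
    return skewness, kurtosis
```

A single extreme sample in an otherwise uniform echo raises both values, which matches the interpretation of kurtosis as variance driven by extreme deviations.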

Maximum \(S_{v}\) is the maximum energy reflected by the bottom and can, to a certain extent, be considered as a proxy for density. For example, a dense substrate such as bedrock is likely to reflect more energy than a less dense substrate like sand and therefore have a higher maximum \(S_{v}\).

Bottom rise time is the rise time of the first bottom echo in the integration interval. Bottom rise time is mainly influenced by sudden drops or rises of the seabed.

Maps of the different seabed descriptors were generated using ordinary kriging. Seabed types were defined and classified using PCA with k-means clustering. The ideal number of clusters (k) was determined using a combination of the Hubert score [72], D-index [73] and Weighted Sum of Squares (WSS) [74]. For WSS, the number of clusters was determined through comparison of WSS against the number of clusters. The ideal number of clusters was located where WSS was minimised or where a larger number of clusters contributed very little to the minimisation of WSS [74]. For the Hubert score (correlation coefficient between two matrices) [72] and the D-index [73], a knee point that corresponds to a significant increase of the measurement value was identified. The final number of clusters was defined through the majority rule, where the most frequently detected ideal number of clusters was accepted as the final number. Maps of the resulting seabed clusters were produced using indicator kriging.
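The majority rule itself is simple to express. The sketch below assumes each criterion has already proposed one ideal number of clusters; the function name is ours, and raising an error on a tie is a design choice of this sketch rather than something specified above.

```python
from collections import Counter


def majority_rule(proposed_k):
    """Return the number of clusters proposed most often across criteria.

    proposed_k maps a criterion name to the number of clusters it suggests.
    """
    if not proposed_k:
        raise ValueError("no criteria supplied")
    ranked = Counter(proposed_k.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("tie between candidate numbers of clusters")
    return ranked[0][0]
```

For instance, with purely illustrative votes `{"Hubert": 3, "D-index": 3, "WSS": 4}`, the function returns 3.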

The clusters were related to biological information (derived from catch data) through indicator species analysis [75], where the nine species groups were treated as potential indicators. Significance of the relationship was tested through a permutation test [76]. Indicator values are a statistical tool used to define which species can be seen as indicators of a given cluster [76] or a group of habitat clusters [77]. A significant benefit of the indicator values approach is that it combines mean abundance and occurrence frequencies of a given species within a cluster [24, 76,77,78]. Indicator values are a combination of two components called A and B [24]. A is the probability that a given site belongs to the target cluster, given that the selected species group was detected there (also known as specificity or positive predictive value). B is the fidelity or sensitivity, which is the probability of encountering the species group within a given cluster [24]. In order to assess the validity of indicator species for a given cluster, proportion coverage was computed [75]. Proportion coverage or quantity coverage is the proportion of sites where one of the indicators is found [75].
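The two components can be sketched for one species group and one target cluster. The sketch assumes a group-equalised, abundance-based A (mean abundance in the target cluster divided by the sum of mean abundances over all clusters), an occurrence-based B, and an indicator value of \(\sqrt{A \times B}\); these conventions are assumptions and may differ from those of [75, 76]. The function name and the example data are hypothetical, and the permutation test is not included.

```python
import math


def indicator_components(abundance, cluster_of_site, target):
    """Return (A, B, indicator value) for one species group and one cluster.

    abundance:       abundance of the species group at each site (e.g. per trap)
    cluster_of_site: cluster label of each site
    target:          label of the cluster being tested
    """
    per_cluster = {}
    for value, label in zip(abundance, cluster_of_site):
        per_cluster.setdefault(label, []).append(value)
    if target not in per_cluster:
        raise ValueError("target cluster has no sites")
    means = {label: sum(v) / len(v) for label, v in per_cluster.items()}
    total = sum(means.values())
    # A (specificity): share of the summed mean abundance found in the target cluster.
    a = means[target] / total if total > 0 else 0.0
    # B (fidelity): proportion of target-cluster sites where the group occurs.
    target_sites = per_cluster[target]
    b = sum(1 for v in target_sites if v > 0) / len(target_sites)
    return a, b, math.sqrt(a * b)
```

A high A combined with a low B, as reported in the results, describes a species group that is largely restricted to a cluster but is caught at only some of the sites within it.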

Results

Acoustic Hotspots

The highest mean \(s_{A}\) of 73.6 m\(^{2}\) nmi\(^{-2}\) [standard deviation (SD) 228.24 m\(^{2}\) nmi\(^{-2}\)] and highest percentage of zero values (50.9%) were observed in Region 2. Lower mean \(s_{A}\) were observed in Region 1 (54.0 m\(^{2}\) nmi\(^{-2}\), SD 120.5 m\(^{2}\) nmi\(^{-2})\) and Region 3 (51.8 m\(^{2}\) nmi\(^{-2}\), SD 165.7 m\(^{2}\) nmi\(^{-2})\). Region 3 had a percentage of zero values (45.2%) comparable to Region 2, while this percentage was much lower in Region 1 (25.2%). Within Region 1 and Region 2 the \(s_{A}\) cut-off value was 100 m\(^{2}\) nmi\(^{-2}\) which corresponded to the fourth hotspot indicator value. In Region 3, the region with the lowest mean \(s_{A}\), the \(s_{A}\) cut-off value was detected at the third hotspot indicator value with an \(s_{A}\) cut-off of 50 m\(^{2}\) nmi\(^{-2}\). In all three regions about half of the total areas were identified as hotspots (44% in Region 1, 52% in Region 2 and 51% in Region 3). The hotspot indicator, in all regions, was the last structured indicator defined by the selected \(s_{A}\) cut-off. For higher \(s_{A}\) cut-offs, no structure was detected and the variograms were described by a pure nugget. Hotspots were patchily distributed in the three regions (Fig. 2). The central part of Region 1 contained the main density hotspot with smaller patches observed in the north and south of the region (Fig. 2a). In Region 2, hotspots were mainly concentrated in the south of the region while they were largely absent in the north (Fig. 2b). In Region 3 hotspots were distributed as patches throughout the region (Fig. 2c).

Fig. 2

Geostatistical hotspots of acoustic density within the three regions (a Region 1, b Region 2, c Region 3), with the probability maps on the left and the identified hotspots on the right (light: hotspot; dark: no hotspot). Relative \(s_{A}\) is indicated by the size of the black circles

Acoustic School Descriptors

The optimal number of clusters determined by the Clest algorithm was two in Regions 1 and 3, and five in Region 2 (Table 3). The median observed Classification Error Rate (CER\(_{\mathrm{Obs}})\) was low in all three regions (0.03, 0.12 and 0.08 in Regions 1, 2, and 3, respectively; Table 3). During clustering, the energetic descriptors were the most important for all clusters in all three regions (Fig. 3; Table 4).

Table 3 Results of the Clest algorithm used to identify the optimal number of school clusters (bold row), with k, number of clusters; \(d_{k}\), test statistic; CER\(_{\mathrm{obs}}\) and CER\(_{\mathrm{ref}}\), observational and reference Classification Error Rates respectively; p, probability of the absolute CER being higher than the CER under the null hypothesis
Fig. 3

Biplot of the first (PC1) and second (PC2) principal components of the school clusters with circles indicating the 68% confidence intervals of the clusters obtained from robust sparse k-means clustering for the three regions (a–c), with an indication of the pulling direction of the school descriptors, where mSv38, mean Sv at 38 kHz; mSv120, mean Sv at 120 kHz; SvMax38, maximum Sv at 38 kHz; SvMax120, maximum Sv at 120 kHz; MVBS38, corrected Sv at 38 kHz; MVBS120, corrected Sv at 120 kHz; H, height; S, skewness; L, length; R, image compactness; P, perimeter; T, thickness; A, area; D, depth

Table 4 Weights of the descriptors within the robust sparse k-means clustering (RSKM weights) defining the school clusters and contributions to the first (PC1) and second (PC2) components of the Principal Component Analysis
Fig. 4

Revised silhouette plot for the three regions (a–c), where each silhouette represents one cluster, composed of single lines, each representing a school. The y axis represents the revised silhouette value and the printed values are the mean revised silhouette value for each cluster

Fig. 5

Parallel coordinate plot of the descriptors considered in the robust sparse k-means clustering for the three regions (a–c), with scaled values on the y axis. Each thin line represents one school and the thick, coloured lines represent the scaled mean descriptor value of each cluster, where mSv38, mean Sv at 38 kHz; mSv120, mean Sv at 120 kHz; SvMax38, maximum Sv at 38 kHz; SvMax120, maximum Sv at 120 kHz; MVBS38, corrected Sv at 38 kHz; MVBS120, corrected Sv at 120 kHz; H, height; S, skewness; L, length; R, image compactness; P, perimeter; T, thickness; A, area; D, depth

The first two principal components of the PCA explained 63.9% of the variation contained within the data in Region 1, 66.0% in Region 2 and 66.2% in Region 3 (Fig. 3). Principal component 1 (PC1) accounted for 44.5% of the variance in Region 1, 50.9% in Region 2 and 44.3% in Region 3. PC1 was predominantly driven by the energetic descriptors (Table 4). Principal component 2 (PC2) explained 19.4% of the total variance in Region 1, 15.1% in Region 2 and 21.9% in Region 3 and was mainly influenced by geometric features (Table 4; Fig. 3). In all three regions, the clusters were mainly separated by PC1 and hence mostly determined by energetic features. High agreement was found in Regions 1 and 3 with average silhouette values of 0.83 and 0.71, respectively (Fig. 4). The highest agreement was found for Cluster 1 in Region 1, with a silhouette value of 0.89. In Region 2, moderate to high agreement was found, with an average silhouette value of 0.57. The highest agreement in Region 2 was found for Cluster 2, with a silhouette value of 0.70, and the lowest for Cluster 4 (0.44) (Fig. 4).

Given the high influence of the energetic descriptors on the cluster separation, the clusters can best be described by mean, corrected mean, max \(S_{v}\) and MVBS at 38 and 120 kHz (Fig. 5; Table 4). In Regions 1 and 3, clustering largely followed the same trends, with Cluster 1 being considered a high-energy cluster, while Cluster 2 was regarded as a moderate to low-energy cluster (Fig. 5). In Region 3, schools within Cluster 1 were generally found to occupy a much larger area, with a slightly greater thickness, a larger perimeter and a marginally greater height (Fig. 5; Table 5).

Table 5 Summary of the mean, and standard deviation (SD) of the considered descriptors within each cluster and region
Table 6 Total number of eligible acoustic schools and number of traps recorded in each region with the percentage of the area (% area) of dominance of the school clusters and percentage of traps (% traps) taken within each cluster for the three regions
Fig. 6

Indicator kriging maps of the occurrence of the school clusters in the three regions (a–c)

In Region 2, one high-energy cluster (Cluster 3) and one low-energy cluster (Cluster 2) could be identified (Fig. 5; Table 4). Clusters 1, 4 and 5 were considered moderate-energy clusters, with Cluster 1 mainly differentiated through higher energetic values at 120 kHz (Fig. 5; Table 4). Cluster 4 was mainly separated from the other clusters by its more elongated and larger schools, which had a larger perimeter (Fig. 5; Table 4).

The percentage of traps taken and the percentage of schools observed generally agreed well (Table 6). In Region 1, the low-energy Cluster 2 was dominant in terms of area coverage (76.8%) (Fig. 6), number of schools (73.6%) and number of traps (66.0%) (Table 6). In Region 2, almost half of the area was dominated by the high-energy Cluster 3 (49.2%) (Fig. 6), encompassing 19.4% of the observed schools and 37.7% of the recorded traps within this region (Table 6). In Region 3, the two clusters were more evenly spread, with 52.4% of the area covered by the high-energy Cluster 1 (Fig. 6) (55.6% of the traps, 41.1% of the schools) (Table 6).

Indicator species could be detected for some of the school clusters at a significance level of 5% (Table 7). No significant indicator species could be detected for Cluster 2 in Region 1 or for Clusters 4 and 5 in Region 2. In general, high A values were observed (up to 0.86), while B values remained low (<0.45 except for Cod in Cluster 1, Region 3, where \(B = 0.60\)) (Table 7). When only single species groups were considered, Triggerfish were the only indicator species group for Cluster 1 in Region 1. In Region 2, Lutjanids were detected as an indicator species for Cluster 1, Rankin Cod for Cluster 2 and Lethrinids for Cluster 3 (Table 7). In Region 3, Cod was detected as an indicator species for Cluster 1, while no single indicator species could be detected for Cluster 2 (Table 7). If a combination of up to three species groups was accepted, combinations of Miscellaneous, Goldband and Lutjanids with Triggerfish were detected as indicator species (Table 7). In Region 2, for Cluster 2, various combinations including Miscellaneous, Rankin Cod, Red Emperor, Cod and Triggerfish were obtained (Table 7). For Cluster 3, Lethrinids and/or Red Emperor, with or without Triggerfish, were accepted as indicator species groups. In Region 3, (1) a combination of Lutjanids, Miscellaneous and Cod was considered indicative of Cluster 1, while combinations of Goldband with (2) Red Emperor and Triggerfish; (3) Saddletail, Lutjanids and Triggerfish; (4) Red Emperor and Saddletail; or (5) Saddletail and Triggerfish were considered indicative of Cluster 2 (Table 7).

Table 7 Indicator species groups associated with the respective school cluster within the three selected regions with the corresponding indicator value (indval), the p value (p) and the indicator variable A and B
Fig. 7

Kriged bottom depth maps for the three regions (a–c)

Fig. 8

Kriged bottom roughness maps for the three regions (a–c)

Fig. 9

Kriged bottom kurtosis maps for the three regions (a–c)

Habitat Description

Region 1 was the deepest of the three regions throughout (120–130 m), while Region 2 was the shallowest (61–90 m) (Fig. 7; Table 1). In Region 1 the deepest parts were observed in the north, with depth gradually decreasing southwards and eastwards (Fig. 7). Within Region 2 the deeper areas were in the west, getting shallower towards the east, while in Region 3 the deepest part was found in the central area (Fig. 7b, c respectively). Regions 2 and 3 showed similar roughness indices, with values over 7 for the majority of the areas (Fig. 8b, c respectively). Region 3 contained only a small channel of less rough seabed in the central part (Fig. 8c). Region 1 appeared to be less rough, with maximum roughness values of around 7.5 (Fig. 8a). First bottom length and maximum \(S_{v}\) showed similar trends and distributions of patches to bottom roughness, with the lowest values of each seabed feature observed throughout Region 1. Maps for bottom skewness and kurtosis were almost identical to one another for the three regions (Fig. 9). As with the other seabed features, the lowest values for skewness and kurtosis were observed in Region 1, while similar values, with well-defined, patchy hotspots, were found in Regions 2 and 3 (Fig. 9).

Fig. 10 Biplot of the first (PC1) and second (PC2) principal components of the bottom clusters, with circles indicating the 68% confidence intervals of the clusters obtained from k-means clustering. Depth, depth; BRT, bottom rise time; K, kurtosis; S, skewness; FBL, first bottom length; SvMax, maximum \(S_{v}\); R, roughness

Fig. 11 Radial plot highlighting the mean value of each bottom descriptor, scaled around its own mean, for the three bottom clusters, identified by colours

According to the majority rule, the ideal number of clusters for the bottom classification within the three regions was three. The first two principal components (PC1 and PC2) of the PCA explained 73.9% of the variance (Fig. 10). PC1, which explained 46.6% of the variance, separated the three clusters, with some overlap between Cluster 1 and Cluster 2. PC2 increased the difference between Cluster 1 and Cluster 2 and removed the overlap between Cluster 2 and Cluster 3 (Fig. 10). Cluster 1 mainly occurred at greater depths, in areas with low roughness, first bottom length, bottom rise time, maximum \(S_{v}\), kurtosis and skewness (Fig. 11). In short, Cluster 1 describes a deep, smooth seabed. Cluster 2 was predominantly characterised by high levels of roughness and maximum \(S_{v}\), but low bottom rise time, skewness and kurtosis (Fig. 11). This suggests that Cluster 2 describes a rough bottom with low variance within the sample values. Cluster 3 contained high levels of all descriptors except for depth (Fig. 11). Cluster 3 therefore describes a rough, complex bottom with high inter-sample variance.
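
The bottom classification workflow described above (standardised descriptors, PCA, k-means with three clusters, per-cluster descriptor means) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the group means are hypothetical values chosen only to give three separable groups, and the plain Lloyd's k-means and SVD-based PCA written here are assumptions standing in for whichever routines were actually used.

```python
import numpy as np

# Descriptor abbreviations follow the Fig. 10 caption.
DESCRIPTORS = ["Depth", "BRT", "K", "S", "FBL", "SvMax", "R"]

def standardise(X):
    """Centre each descriptor on its own mean and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(Z):
    """PCA via SVD of centred data; returns scores and explained-variance ratios."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U * s, s**2 / np.sum(s**2)

def assign(Z, centroids):
    """Label each row with the index of its nearest centroid (Euclidean)."""
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm, initialised from k randomly chosen rows."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centroids)
        # Keep the old centroid if a cluster happens to become empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(Z, centroids), centroids

# Synthetic stand-in for the bottom descriptors (hypothetical values).
rng = np.random.default_rng(42)
group_means = np.array([
    [125.0, 1.0, 2.0, 0.5, 2.0, -30.0, 6.0],   # deep, smooth
    [80.0, 1.0, 2.0, 0.5, 4.0, -20.0, 8.0],    # rough, low variance
    [70.0, 3.0, 6.0, 2.0, 5.0, -18.0, 8.5],    # rough, complex
])
X = np.vstack([m + rng.normal(scale=0.5, size=(100, 7)) for m in group_means])

Z = standardise(X)
scores, explained = pca(Z)
labels, centroids = kmeans(Z, k=3)

# Mean of each standardised descriptor per cluster (the quantity a radial plot shows).
cluster_means = np.array([Z[labels == j].mean(axis=0) for j in range(3)])

print("Variance explained by PC1 + PC2:", round(float(explained[:2].sum()), 3))
for j, row in enumerate(cluster_means):
    print(f"Cluster {j + 1}:", dict(zip(DESCRIPTORS, np.round(row, 2))))
```

Standardising before the PCA keeps descriptors with large numeric ranges, such as depth, from dominating the components. The choice of three clusters is fixed here, whereas the text selects it by majority rule across several criteria.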

In Region 1, only Cluster 1 was found (Fig. 12a). Slightly more than half of the area of Region 2 was attributed to Cluster 2 (53.4%) and the remainder to Cluster 3 (46.6%) (Fig. 12b). Region 3 contained all three clusters, but was strongly dominated by Cluster 2 (85.6%), with some small areas identified as Cluster 3 (7.8%) and a smaller proportion of the area attributed to Cluster 1 (6.7%) (Fig. 12c).

If only one species group is considered as an indicator species group at a time, Goldband snapper and the Miscellaneous group were identified as indicator species groups for habitat Cluster 1. Four groups were considered indicator species groups in Cluster 2 (Lethrinids, Red Emperor, Cods and Lutjanids) and two indicator species groups were identified within Cluster 3 (Triggerfish and Rankin Cod). If combinations of up to three species groups are considered, 129 species group combinations present in the data could be tested. Sixty-nine combinations, each containing up to three species groups, were detected as significant indicator species groups for one of the three habitat clusters. A summary of the most relevant indicator species groups (\(A > 0.5\), \(B > 0.2\), indicator values >0.3, \(p < 0.01\)) can be found in Table 8.
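
To make the A and B components concrete, the sketch below computes them for presence-absence data, together with an indicator value taken as \(\sqrt{A \times B}\). It rests on several assumptions: the square-root form of the indicator value, the presence-absence definitions (A as the share of a group's occurrences that fall in the target cluster, B as the share of target-cluster sites where the group occurs), and the rule that a combination counts as present only where all of its groups were caught. The catch records and cluster labels are hypothetical, and the permutation test behind the p values is omitted. With the nine species groups named in this section, the number of combinations of one to three groups is 9 + 36 + 84 = 129, which agrees with the count reported above.

```python
from itertools import combinations
from math import sqrt

# The nine species groups named in the text.
GROUPS = ["Goldband", "Miscellaneous", "Lethrinids", "Red Emperor", "Cod",
          "Lutjanids", "Triggerfish", "Rankin Cod", "Saddletail"]

def candidate_combinations(groups, max_size=3):
    """All combinations of one up to max_size species groups."""
    return [c for k in range(1, max_size + 1) for c in combinations(groups, k)]

def joint_presence(catch, combo):
    """Assumed rule: a combination is present only where every member was caught."""
    return [all(values) for values in zip(*(catch[g] for g in combo))]

def indicator_components(present, in_cluster):
    """Return (A, B, sqrt(A * B)) for presence-absence data and one target cluster."""
    n_present = sum(1 for p in present if p)
    n_cluster = sum(1 for c in in_cluster if c)
    n_both = sum(1 for p, c in zip(present, in_cluster) if p and c)
    a = n_both / n_present if n_present else 0.0   # specificity
    b = n_both / n_cluster if n_cluster else 0.0   # fidelity
    return a, b, sqrt(a * b)

# Hypothetical data: habitat cluster of ten sites and catches of two groups.
cluster = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
catch = {
    "Goldband":    [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    "Triggerfish": [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
}
in_cluster_1 = [c == 1 for c in cluster]

n_candidates = len(candidate_combinations(GROUPS))
a_single, b_single, iv_single = indicator_components(catch["Goldband"], in_cluster_1)
a_combo, b_combo, iv_combo = indicator_components(
    joint_presence(catch, ("Goldband", "Triggerfish")), in_cluster_1)

print("Candidate combinations:", n_candidates)
print("Goldband, Cluster 1:", a_single, b_single, round(iv_single, 3))
print("Goldband + Triggerfish, Cluster 1:", a_combo, b_combo, round(iv_combo, 3))
```

In this toy example the combination has a higher A but a lower B than Goldband alone. Requiring joint presence makes a combination more specific to a cluster but rarer within it.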

Fig. 12 Indicator kriging maps of the three bottom clusters within the three regions

Table 8 Most relevant indicator species groups, composed of up to three species groups, with a probability (p) < 0.01, an indicator value (indval) >0.3, an A value >0.5 and a B value >0.2 for the three acoustic habitat clusters

Discussion

The concept of hotspots is an important part of conservation and spatial management strategies [54]. As acoustic data can be collected over large areas in short time periods, acoustics are a valuable method for detecting hotspots [12] within large survey areas [32]. Here, it was shown that acoustic density hotspots of fish aggregations could be identified within the three fishing regions examined. Acoustic hotspots are a relatively easy metric to extract from acoustic data. If such data were collected repeatedly (e.g., seasonally or annually) over very large areas, it would be possible to track differences in the spatial structure of fish in the area [12].

Almost half of the area in the three regions was classified as hotspots, which is unsurprising given that these are fishing regions directly targeted by the fishers. The distribution of acoustic densities revealed that, despite the small size of the three fishing regions, high-density areas were patchy rather than uniform. Such spatial structure indicates the presence of different habitat types, which are likely to host different organisms with their own habitat selection criteria [79, 80]. For all three regions, the area of detected hotspots increased for lower \(s_{A}\) cut-off values, an indication of the presence of lower \(s_{A}\) values over significant parts of the regions, i.e., the regions are not solely composed of zero or high \(s_{A}\) values. Region 1 contained the lowest percentage of zero values and the lowest percentage of hotspots (Fig. 2; Table 1). This suggests that fish schools in Region 1 are distributed more evenly. The main hotspot observed in the central part of the area coincided largely with the area of highest bottom kurtosis and roughness (Figs. 2, 8, 9). In contrast, Region 2 was characterised by the highest mean \(s_{A}\) values and the highest percentage of zero values. Hotspots in Region 2 were patchier than in Region 1 (Fig. 2). The seabed descriptor and hotspot maps show that the distribution of hotspots within Region 2 is largely influenced by high bottom roughness (Figs. 2, 8). This suggests that fish schools are concentrated around the more rugged areas of Regions 1 and 2 [81]. In coral reef ecosystems, areas of higher roughness are often linked to coral patches, which often have higher species richness and attract more individuals, resulting in increased biomass when compared to surrounding areas [82,83,84,85,86]. Hotspots in Region 3 were patchier than in Regions 1 and 2, but at the same time the differentiation between hotspot and non-hotspot areas was less pronounced in Region 3.
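
The reasoning that a hotspot area growing with lower \(s_{A}\) cut-offs implies intermediate \(s_{A}\) values can be illustrated with synthetic data. In the sketch below a hotspot is assumed to be any grid cell whose \(s_{A}\) exceeds the cut-off, which is a simplification and may differ from the hotspot definition used in this study. Both regions, their value distributions and the cut-offs are hypothetical.

```python
import numpy as np

def hotspot_fraction(s_a, cutoff):
    """Share of grid cells whose s_A exceeds the cut-off (assumed threshold definition)."""
    return float(np.mean(s_a > cutoff))

rng = np.random.default_rng(1)
n_cells = 10_000

# Hypothetical region made up only of zero and high s_A values.
bimodal = np.where(rng.random(n_cells) < 0.5, 0.0, 1000.0)

# Hypothetical region with 30% zeros and a continuous spread of non-zero values.
graded = np.where(rng.random(n_cells) < 0.3, 0.0,
                  rng.lognormal(mean=5.0, sigma=1.0, size=n_cells))

cutoffs = [800, 400, 200, 100, 50]   # decreasing cut-offs, arbitrary units
bimodal_fractions = [hotspot_fraction(bimodal, c) for c in cutoffs]
graded_fractions = [hotspot_fraction(graded, c) for c in cutoffs]

print("cut-offs:", cutoffs)
print("bimodal :", np.round(bimodal_fractions, 3))
print("graded  :", np.round(graded_fractions, 3))
```

In the region with only zero and high values, the hotspot share is the same for every cut-off. In the region with a continuous spread, it grows as the cut-off is lowered, which is the pattern reported for all three fishing regions.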

The patchiness of hotspots within such a small area can be seen as an indication of the complexity of the habitat [81], at scales smaller than those observed in the clusters. As catches in the NDSF contain a mixture of species, often with similar morphology, the use of traditional acoustic classification methods is not possible [16]. Furthermore, the verification of acoustic detections is hindered by the lack of dedicated alternative sampling evidence [8, 31]. Information obtained from traps typically contains a temporal and spatial lag relative to the collection of acoustic data. In general the traps are left to soak for a number of hours, while the fishing vessel rests several miles away so as not to induce avoidance behaviour [11, 45]. The present study is among the first to describe the acoustic diversity of fish schools within a tropical environment based purely on commercially collected acoustic data.

The acoustic geometric and energetic metrics used in this study are widely used [16, 22, 23, 87,88,89] to classify acoustic targets through ordination techniques or neural networks. As in [16], the energetic descriptors contributed more strongly to the classification than the geometric descriptors. Geometric features had less influence on the categorisation of the fish schools (Table 3). Geometric descriptors largely followed the trends of the energetic features, but differences were less pronounced (Table 5; Fig. 5). The lack of contribution from the shape-related descriptors may be linked to high levels of observed variance (Table 5; Fig. 5). This variability is likely caused by external (e.g., prey–predator interaction, vessel avoidance or fishing pressure) or internal factors (e.g., size distribution or life history traits) which influence the schooling behaviour of the fish [90, 91]. Furthermore, due to the relatively narrow beam width (38 kHz: \(9.6^{\circ}\); 120 kHz: \(7^{\circ}\)) of the echosounders used, on occasion only the edge of a fish school may have been detected [92, 93]. This error increases as depth decreases: because the sampled beam volume is smaller at shallower depths, there is an increased risk of missing parts of a school that occurs in the shallow part of the water column.
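
The effect of beam width and depth on the sampled area can be approximated with simple geometry. The sketch below treats the quoted beam widths as the full opening angle of an ideal cone pointing straight down, so the footprint diameter at range \(r\) is \(d = 2 r \tan(\theta / 2)\). This is an idealisation that ignores side lobes, transducer motion and the finite pulse length, and the three example depths are arbitrary.

```python
import math

def beam_footprint_diameter(range_m, beam_width_deg):
    """Diameter (m) of the circle covered by an ideal conical beam at a given range."""
    return 2.0 * range_m * math.tan(math.radians(beam_width_deg) / 2.0)

# Beam widths quoted in the text; the depths are example values only.
beam_widths_deg = {"38 kHz": 9.6, "120 kHz": 7.0}
example_depths_m = (20, 60, 120)

for frequency, width in beam_widths_deg.items():
    for depth in example_depths_m:
        d = beam_footprint_diameter(depth, width)
        print(f"{frequency} at {depth:>3} m: footprint diameter {d:5.2f} m")
```

Because the footprint scales linearly with range, a school of a given horizontal extent is more likely to be only partly insonified when it sits high in the water column than when it is close to the seabed.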

The stability of the school descriptor clustering results, given by the Clest algorithm, indicates the presence of patterns (mainly driven by energetic descriptors) that clearly distinguish the three clusters. These may reflect biological patterns, such as species composition [16, 59]. Even though significant indicator species groups could be detected for some of the school clusters, the relationships were generally less pronounced than for the habitats. Mainly high A values, indicating a strong association of the given species group with the cluster, could be detected, while B values generally remained low. Low B values are an indication that school compositions of the given clusters are highly variable. For example, in Region 1 for school Cluster 1, if Triggerfish are caught, there is an 86% chance that the surrounding area is dominated by schools of Cluster 1, but if a school classified as Cluster 1 is observed, there is only a 10% chance that Triggerfish will be caught. One exception to this pattern is Cod in Region 3. If schools of Cluster 1 are observed in Region 3, there is a 60% chance that Cod will be observed within the catch. Other noteworthy B values were observed for Lutjanids (0.45) and Cluster 1 in Region 2, as well as for Rankin Cod (0.41) and Cluster 2 in the same region. The lack of a clear relationship between the school clusters and the species group composition of the catch may be an artefact originating from the nature of the data. As previously stated, there is an element of spatial and temporal separation between the detection of fish schools on the echosounder images and the biological sampling process [11]. Despite this, it has been demonstrated that distinctive patterns in fish aggregations, mainly explained by energetic descriptors, could be detected (Table 5; Figs. 3, 4, 5). It is recommended that dedicated simultaneous sampling, such as optical recordings at depth, be undertaken to complement acoustic recordings [31, 94, 95].

The bottom descriptors divided the area into three clusters, within which Cluster 3 (Figs. 10, 11) was mainly driven by depth. Relating these clusters to meaningful physical measures requires the collection of high-resolution physical samples (e.g., grab samples) [34, 37]. It should be noted that an important component of many bottom classification algorithms is the information contained in the second reflection of the seabed [35, 69, 96]. In the present study, information on the second bottom echo, which is used to describe the hardness of the seabed, was unavailable. In broad terms, comparison with the maps presented in [97] suggests that Clusters 1 and 2 most likely represent sandy bottoms with different degrees of coral or rock cover.

The different bottom clusters could be related to indicator species groups. This is a strong indication of habitat selectivity among the different species groups. For Goldband snapper in particular, the high A and B values within Cluster 1 indicate that a location where Goldband can be found is likely to be classified within Cluster 1. Furthermore, if a location is categorised as Cluster 1, there is a >70% chance of encountering Goldband. In Cluster 2, indicator species groups were not as prominent, but Lutjanids, Lethrinids and Cod remained strong indicators, with indicator values as well as A and B values of around 0.5. Similarly, for locations classified within habitat Cluster 3, the chance of encountering Triggerfish is approximately 50%. Once more detailed habitat information has been collected, indicator species can be extracted relatively easily from commercial catch information. The availability of such metrics could help to quickly identify changes in distribution patterns which might be related to environmental or management changes [43, 44, 98].

We have shown that acoustic data collected on a small commercial fishing vessel, within a mixed-species environment, can be used to derive meaningful ecological metrics. The methods presented here can be applied to a large variety of environments, but are especially valuable in tropical ecosystems, where complementary dedicated sampling is often difficult to acquire and the structure of the environment is too complex for currently available acoustic processing techniques. All the metrics used in this study would benefit from long-term data collection programmes. If multi-seasonal or multi-annual information from within the same area were available, the metrics would be a useful tool for tracking changes in the ecosystem.