Abstract
This work presents a computational methodology able to automatically classify the echoes of two krill species recorded in the Ross sea employing scientific echo-sounder at three different frequencies (38, 120 and 200 kHz). The goal of classifying the gregarious species represents a time-consuming task and is accomplished by using differences and/or thresholds estimated on the energy features of the insonified targets. Conversely, our methodology takes into account energy, morphological and depth features of echo data, acquired at different frequencies. Internal validation indices of clustering were used to verify the ability of the clustering in recognizing the correct number of species. The proposed approach leads to the characterization of the two krill species (Euphausia superba and Euphausia crystallorophias), providing reliable indications about the species spatial distribution and relative abundance.
L. B. Giosuè and A. Salvatore—Equal contribution.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
In the last decades, fishery science widely used acoustic-based technique to obtain information about the spatial distribution and abundance of economically and ecologically important pelagic organisms characterized by aggregative behaviour. Such organisms usually live in groups, often referred to as school or shoals, and thus are easily detected by using acoustic methods. The use of scientific echo-sounder allowed to investigate large sea sectors in a relatively small amount of time, leading to a synoptic and spatially detailed view of the status of aquatic resources. Usually, acoustic data are recorded along specific routes following a parallel-transects survey design. Biological sampling is performed to identify the species inhabiting the water column thus partitioning the recorded echoes among the observed species. Even if the acquisition of acoustic data is a non-invasive procedure, the biological sampling is not, and the sampling effort strongly depends on several factors such as the number of species characterizing the considered ecosystem, the spatial overlap among species, and the possibility to discriminate among different species based on specific acoustic characteristics and/or the shape and structure of observed aggregations. In some complex operative scenarios or particularly vulnerable ecosystems, the possibility to discriminate among species utilizing semi-automatic classification procedures, thus avoiding or reducing the biological sampling effort, represents an important aspect. Recently, a number of scientific papers focused the attention on this topic [1,2,3,4]. Anyway, in mixed-species ecosystems, due to a number factors affecting the characteristics of observed echoes, it is difficult to develop a fully-automatic procedure matching echoes and species [5], and it is necessary to contextualize and validate the procedure according to a deep knowledge of the biology and behaviour of the target species. In this work, we tested the use of an unsupervised clustering algorithm (k-means), to partition the echoes recorded during a multi-purpose survey carried out in the Ross Sea (Southern Ocean) during 2016/2017 austral summer under the umbrella of the Italian National Antarctic Research Program. Acoustic data collected during the survey and relative to the upper water column stratum showed mainly the presence of two krill species, namely Euphausia superba (Dana, 1850) and Euphausia crystallorophias (Holt & Tattersall, 1906). In this context, it was evaluated if the performed classification confirmed some general features (related to the spatial distribution, relative biomass and energetic differences) reported in the literature, providing a way to obtain information about population status even in the case the biological sampling was missing or non-representative.
2 Materials and Methods
2.1 Acoustic Data: Acquisition and Processing
Acoustic data were collected in the period 05/01/2017–11/02/2017 during the XXXII Antarctic expedition on board of the R/V Italica under the Italian National Antarctic Research Program and in the framework of P-ROSE project (Plankton biodiversity and functioning of the Ross Sea ecosystems in a changing southern ocean). In particular, acoustic data were collected through EK60 scientific echo-sounder at three different frequencies (38 kHz, 120 kHz and 200 kHz) and calibrated following standard techniques [6]. Acoustic sampling followed an opportunistic strategy (Fig. 1), recording data among the sampling stations. A total of 2200 nmi were recorded. Acoustic row data were then processed through Echoview\(^{\copyright }\) software [7] to extract all the echoes related to aggregations of pelagic organisms. In the first step, the depth range for the analysis was defined. In particular, the region between 0 and 8.5 m depth was excluded, avoiding artefacts due to beam formation distance and noise due to cavitation and waves. Similarly, the echogram region related to depths higher than 350 m was removed due to the strong attenuation of signals at 120 kHz and 200 kHz. In a second step, background noise was removed by applying the algorithm proposed by De Robertis and Higginbottom [8]; all the echogram regions affected by another noise type (i.e. instrumental, waves, ice etc.) were identified and removed manually. Finally, working on the 120 kHz frequency, all the aggregations (schools) were identified using school detection module in Echoview\(^{\copyright }\). The school detection was applied on the 120 kHz as it was the reference frequency for krill species [9].
Once the school were identified, for each aggregation several parameters related to the energetic, geometric and positioning characteristics were extracted (Table 1).
In addition to the parameters computed by means of Echoview\(^{\copyright }\) software, four more parameters were computed, namely: the frequency response at 120 and 200 kHz (respectively computed as \(FR_{120\_38}=Nasc_{120}/Nasc_{38}\) and \(FR_{200\_38}=Nasc_{200}/Nasc_{38}\) [2]), the difference of MVBS at 120 and 38 kHz (\(\varDelta MVBS=MVBS_{120} - MVBS_{38}\)) [10], and the ratio between the school area and perimeter as an index of shape compactness (\(AP.ratio = Perimeter^2/Area\)). The resulting data matrix was characterized by 4482 rows and 35 columns.
2.2 Exploratory Analysis and Data Preparation
A preliminary data analysis was carried out to evaluate the presence of outliers and multicollinearity, as they can negatively impact on the clustering performance. The presence of outliers could lead to a bad clustering, while strongly correlated variables could over-weight a specific aspect in building the clusters. To reduce the effect of multicollinearity and to highlight the presence of multivariate outlier Principal Component Analysis (PCA) was carried out. In applying PCA it is important to scale the variable if they are expressed in different units, to avoid that a specific set of variables gain much importance only due to a scale problem. Furthermore, the presence of highly skewed distribution and/or non-linear relationships among variables could introduce distortion in the axes rotation thus leading to incorrect ordination. Before applying PCA all the variables characterized by highly skewed probability distribution were natural log-transformed (Table 2, variables marked by using the \(^\#\) symbol). The performance of transformation was checked both using a statistical test (Shapiro-Wilks) and by inspecting the qq-plot. Small deviations from normality were considered not impacting PCA results and were ignored. Subsequently, all the variables were scaled and centred. Based on the PCA results, only the principal components (PC) accounting for more than 80% of the total variance was retained and used for clustering.
2.3 Clustering
Let \(X=\{x_1,..,x_n\}\) a dataset, d a distance measure between element of X, and \(C_1,..C_k\) the k clusters found by a generic clustering algorithm. K-means clustering algorithm is one of the most used among the unsupervised clustering methods. The clustering procedure identify the clusters by minimizing the following function:
where \({c}_{i}\) is the mean (centroid) of elements belonging to cluster \(C_i\).
In this context, standardization is an important preprocessing step to avoid scale problems. Besides, the number of clusters must be defined a priori. In the present study, the correct k value is 2, as only two species were found in the echogram. Anyway, to test if the number of groups was an intrinsic property of the data matrix, or if sub-group could be found in terms of specific acoustic and morphological features, the number of clusters was validated employing validation indices. In particular, due to the lack of the true schools classification, internal indices were used. Internal validation indices are based on the concept of “good” cluster structure [11, 12]. In particular, in the present study, four validation indices were used, to verify if the number of clusters was correctly identified. All of them are based on the compactness and separation measures, i.e. the average distance between elements inside the same clusters and the average distance of elements belonging to different clusters. In the following, the used indices are formally defined.
The Silhouette index [13] combines the compactness and separation according to the following formula:
Assuming \(C_i\) as the cluster of \(n_i\) elements where x belongs, a(x) is the average distance between x and all other elements in \(C_i\) while b(x) is the average distance between x and all the elements belonging to all the clusters \(C_j\) with \(j \ne i\).
The Calinski-Harabasz index [14] evaluate the number of cluster according to the following:
In the case of the Dunn index [15] the clustering is evaluated based on the ratio between the separation and compactness:
Finally, the Hartigan index [16] takes into consideration the ratio between the compactness of two clustering solutions relative to \(k-1\) and k, in the following way:
For all the above-mentioned indices, the optimal number of clusters is the one maximizing the index value.
3 Results
The skewness index computed for each considered variables showed the presence of highly positive skewed variables. In order to reduce the degree of skewness, all the variables characterized by a skewness index higher than 2 were log-transformed. PCA highlighted the presence of strong patterns (Table 2); the first 5 PC’s accounted for about 83% of the total variance and were selected to be subjected to k-means clustering. In particular, the first and second PCs accounted for more than 50% of the total variance and the first PC was strongly related to energetic-related variables (Table 2), while the second one to geometric ones (Perimeter, Area, Length and AP.ratio). The remaining PCs were correlated to a lower number of variables all related to energetic aspects except the \(5^{th}\) PC that was found significantly correlated to the Height_mean only. In terms of outliers, plotting the observation in the PC spaces does not evidence the presence of erratic data points. Internal validation indices (Table 2) were computed on the first 5 PCs (accounting for most of the variance), by testing a vector of cluster numbers from 2 to 10 using the k-means algorithm with d corresponding to Euclidean distance. All considered validation indices highlighted k = 2 as the best solution (Table 3).
K-means partitioning was then applied considering the first 5 PCs and k = 2. Partitioning results identified 2367 observations as belonging to the cluster 1 and 2217 belonging to cluster 2. Plotting observations (categorized by cluster id) in the PCs space, highlighted that the clustering was mainly driven by the \(1^{th}\) PC (Fig. 2).
In particular, looking at variables correlation values of the \(1^{th}\) PC, the first cluster was characterized by lower energetic values and a more homogeneous internal structure than the second one. Finally, to evidence possible differences in the spatial distribution, the two clusters were plotted in the geographical space (Fig. 3).
4 Discussion
Euphausia superba and Euphausia crystallorophias are two key species in the Ross Sea trophic web. Due to their importance, several studies focused on their spatial distribution and abundance through acoustic methods [17,18,19]. Validation clustering indices successfully identified the correct number of clusters, providing the first indication of k-means performance. Also, according to literature, Euphausia superba is most abundant than Euphausia crystallorophias; Azzali et al. (2006) [17] evidenced a value of 8.6 for the ratio between the biomass of the former and one of the latter species. By considering the \(NASC_{120}\) as an abundance index, the above-mentioned ratio according to the obtained classification is 7.9 thus comparable to the one reported in the literature. Looking at spatial distribution, according to literature [18], obtained classification also evidenced the dominance of Euphausia superba in the northern sector of the study area (Fig. 3). Finally, it must be considered that in terms of acoustic properties the two species evidenced a clear separation when comparing, using a scatterplot, the \(Sv_{38}\) vs \(Sv_{120}\) [19]. This separation was also confirmed by the k-means classification results (Fig. 4).
Due to the agreement between the information reported in the literature and the ones obtained from the classification results, the application of the k-means algorithm could be considered as a stable, fast and reliable solution to extract the main features of the two population from acoustic data, allowing the extraction of summary indices about the population status despite the lack of biological sampling.
5 Conclusions
Unsupervised classification of aggregations detected during acoustic surveys represents a useful tool in the post-processing of large acoustic dataset. In the present work, k-means clustering was able to distinguish between the two considered krill species, recognizing the correct number of clusters and providing indications about the species spatial distribution and relative abundance coherent to the ones reported in the literature. We plan to use the same methodology by employing other clustering algorithms, such as the hierarchical ones, and other cluster validation indices, to improve the consensus about the clustering solution.
References
Fernandes, P.G.: Classification trees for species identification of fish-school echotraces. ICES J. Mar. Sci. 66, 1073–1080 (2009)
D’Elia, M., et al.: Analysis of backscatter properties and application of classification procedures for the identification of small pelagic fish species in the central Mediterranean. Fish. Res. 149, 33–42 (2014)
Fallon, N.G., Fielding, S., Fernandes, P.G.: Classification of Southern Ocean krill and icefish echoes using random forests. ICES J. Mar. Sci. 73, 1998–2008 (2016)
Aronica, S., et al.: Identifying small pelagic Mediterranean fish schools from acoustic and environmental data using optimized artificial neural networks. Ecol. Inform. 50, 149–161 (2019)
Campanella, F., Christopher, T.J.: Investigating acoustic diversity of fish aggregations in coral reef ecosystems from multifrequency fishery sonar surveys. Fish. Res. 181, 63–76 (2016)
Foote, K.G., Knudsen, H.P., Vestnes, G., MacLennan, D.N., Simmonds, E.J.: Calibration of acoustic instruments for fish density estimation: a practical guide. ICES Coop. Res. Rep. 144, 69 (1987)
Higginbottom, I., Pauly, T.J., Heatley, D.C.: Virtual echograms for visualization and post-processing of multiple-frequency echosounder data. In: Proceedings of the Fifth European Conference on Underwater Acoustics, pp. 1497–1502. Ecua (2000)
De Robertis, A., Higginbottom, I.: A post-processing technique to estimate the signal-to-noise ratio and remove echosounder background noise. ICES J. Mar. Sci. 64, 1282–1291 (2007)
Leonori, I., et al.: Krill distribution in relation to environmental parameters in mesoscale structures in the Ross Sea. J. Mar. Syst. 166, 159–171 (2017)
Watkins, J.L., Brierley, A.S.: Verification of the acoustic techniques used to identify Antarctic krill. ICES J. Mar. Sci. 59, 1326–1336 (2002)
Hassani, M., Seidl, T.: Using internal evaluation measures to validate the quality of diverse stream clustering algorithms. Vietnam J. Comput. Sci. 4(3), 171–183 (2016). https://doi.org/10.1007/s40595-016-0086-9
Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A.: NbClust: an R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 61, 1–36 (2014)
Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
Calinski, T., Harabasz, J.: A dendrite method for cluster analysis. Comm. Stat. 3, 1–27 (1974)
Dunn, J.C.: Well-separated clusters and optimal fuzzy partitions. J. Cybern. 4, 95–104 (1974)
Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975). ISBN 047135645X
Azzali, M., Leonori, I., De Felice, A., Russo, A.: Spatial-temporal relationships between two euphausiid species in the Ross Sea. Chem. Ecol. 22, 219–233 (2006)
Davis, L.B., Hofmann, E.E., Klinck, J.M., Pinones, A., Dinniman, M.S.: Distributions of krill and Antarctic silverfish and correlations with environmental variables in the western Ross Sea, Antarctica. Mar. Ecol. Progress Ser. 584, 45–65 (2017)
La, H.S., et al.: High density of ice krill (Euphausia crystallorophias) in the Amundsen sea coastal polynya, Antarctica. Deep Sea Res. 95, 75–84 (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fontana, I. et al. (2021). Unsupervised Classification of Acoustic Echoes from Two Krill Species in the Southern Ocean (Ross Sea). In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science(), vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_7
Download citation
DOI: https://doi.org/10.1007/978-3-030-68780-9_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9
eBook Packages: Computer ScienceComputer Science (R0)