1 Introduction

As the human footprint extends deeper into our oceans, information on the seafloor and associated biological communities is required for devising appropriate conservation actions to achieve national and international sustainability goals (e.g., Lundquist and Granek 2005; Davies et al. 2007; Micheli et al. 2013; Zampoukas et al. 2014; Henry and Roberts 2017; Danovaro et al. 2020; Manea et al. 2020). There is growing awareness that the mitigation of anthropic pressure on marine ecosystems (e.g., biodiversity loss, transformed food webs, and marine pollution) relies on a more efficient transfer of scientific knowledge to decision-makers (Cvitanovic et al. 2015).

The rapid development of underwater technologies and the concurrent acceleration in computing permit the gathering and handling of a huge quantity of data. For instance, remotely operated vehicles (ROVs) and autonomous underwater vehicles (AUVs) have played a pivotal role in the discovery, mapping, and detailed examination of ecosystems at depths that were unimaginable just decades ago (e.g., Cordes et al. 2007; Freiwald et al. 2009; Lundsten et al. 2010; Huvenne et al. 2011; Angeletti et al. 2014; Wynn et al. 2014; Correa et al. 2016; Vanreusel et al. 2016; Danovaro et al. 2017). Habitat mapping techniques are a powerful tool to collect raw information on marine benthic environments that is convertible to quantitative data and to date play a primary role in fulfilling the requirements of national and international directives and marine ecosystem management programs (e.g., Marine Strategy Framework Directive (MSFD) and OSPAR Convention). Typical applications include identifying priority habitats for conservation (e.g., Fosså et al. 2002; Grasmueck et al. 2006; Bongaerts et al. 2010; Howell et al. 2010; Fabri et al. 2014; Rengstorf et al. 2014; Taviani et al. 2017, 2019; IUCN 2019; Angeletti et al. 2020a; Chaniotis et al. 2020; Prampolini et al. 2020), tracking biological community status by providing species abundances and biodiversity indices (e.g., Norcross and Mueter 1999; Buhl-Mortensen et al. 2012; Ayma et al. 2016; Consoli et al. 2016; Trotter et al. 2019; Beccari et al. 2020), monitoring the efficacy of management interventions (fishery restricted areas (FRAs), marine protected areas (MPAs): Huvenne et al. 2016; Rowden et al. 2017; Innangi et al. 2019; Angeletti et al. 2020b among others), and reporting the overall environmental status of benthic ecosystems (e.g., Cánovas-Molina et al. 2016; Enrichetti et al. 2019; Fabri et al. 2019).
Visual methods for monitoring benthic marine ecosystems based on ROV (or AUV) video surveys provide a relatively high precision in estimating biodiversity and habitat percentage cover (Savini et al. 2014, 2017; Grinyó et al. 2016; Conti et al. 2019) and represent permanent records allowing the comparison of surveys through time and from different areas (e.g., Lundsten et al. 2010; Langenkämper et al. 2019).

2 Processing Techniques of Benthic Visual Surveys

The most common methods for quantitative benthic cover estimation involve manual point-based approaches (Foster et al. 1991; Meese and Tomich 1992; Leonard and Clark 1993; Carleton and Done 1995) and region-based percentage estimations (Meese and Tomich 1992; Garrabou et al. 2002; Teixidó et al. 2011; Pech et al. 2004; Guinda et al. 2014). Automatic and semi-automatic methods have been tested to speed up the analysis of benthic video recordings (Stokes and Deane 2009; Aguzzi et al. 2011), but their application is still labor-intensive or requires ad hoc instrumentation (Foglini et al. 2019; Robert et al. 2020). Some visual method applications need a certain degree of overlap among frames to ensure a complete seafloor representation (e.g., 3D reconstructions, Robert et al. 2020), while others avoid frame overlap to reduce analysis replications (Bo et al. 2014).

In the study of benthic habitats and biological communities, ROV video transects should be carried out along linear paths, navigating at constant speed and altitude from the seafloor (Huvenne et al. 2019). This is particularly important for monitoring purposes (e.g., MSFD program: Zampoukas et al. 2014), in order to guarantee a homogeneous representation of the investigated portion of the seafloor and allow the correct estimation of both habitat extents and community compositions (Eleftheriou and McIntyre 2005). However, ROV transect paths and navigation speeds may be altered by the need for higher detail, by the morphology of the investigated habitat, or by external factors (e.g., weather conditions, technical issues).

3 Frame-Based Video Subsamplings: A Methods Comparison

The plasticity of visual methods to study benthic habitats leaves the door open to a great variety of analytical techniques. However, the analysis of visual data remains challenging in terms of analytical time, often restricting the analysis to a limited subset of frames, extracted (often manually) at regular time intervals (e.g., Bo et al. 2014; Fabri et al. 2014; Cau et al. 2015).

Some major questions arise: does the video subsampling strategy influence the quality of results? What is an efficient compromise between analytical effort and results quality?

To explore the accuracy of frame-based methods, we compared the substrate cover estimates and the biological community taxonomical compositions obtained by the analyses of a subset of frames with those resulting from the analysis of the entire videos. We performed video subsamplings by extracting photograms at regular time (4, 10, and 30 s) or distance intervals (0.5, 1, and 3 m).

Three ROV dives were selected for this study from the MS16_II, MS17_II, and MS17_I oceanographic cruises carried out on R/V Minerva Uno (Table 1), in the framework of the Italian MSFD monitoring program. The video surveys explored three gentle-slope habitats along the Italian margin (Fig. 1): a coralligenous formation between 65 and 80 m on the Amendolara Seamount in the Ionian Sea (Figs. 1a, b and 2A, B; Angeletti et al. 2017), a mesophotic oyster reef off Santa Maria di Leuca in the Ionian Sea between 95 and 115 m (Figs. 1a, c and 2C, D; Castellan et al. 2019; Angeletti and Taviani 2020), and cold-water coral (CWC) mounds in the Corsica Channel located in the Tyrrhenian Sea at 400–430 m depth (Figs. 1a, d and 2E, F; Angeletti et al. 2020c).

Table 1 ROV dive metadata
Fig. 1 (a) Map illustrating the locations of the ROV surveys used in the study; CC Corsica Channel, AS Amendolara Seamount, SML Santa Maria di Leuca. (b, c, and d) Detailed maps showing the ROV tracks and the substrate mapped by analyzing the entire videos. Bold contour lines stand for 5 m depth intervals; thin lines refer to 2.5 m

Fig. 2 Examples of the different habitats surveyed. (A, B) Coralligenous formation at the Amendolara Seamount showing intense faunal cover dominated by several sponges among which Hexadella detritifera (h) is easily recognizable and scleractinian corals such as Phyllangia americana (p) and Filograna-Salmacina complex (f) are also common findings; bar = 20 cm. Close-up (B) of coralligenous formation dominated by the bryozoans Smittina cervicornis (s) and Hornera frondiculata (h); bar = 5 cm. (C, D) Mesophotic reef dominated by Neopycnodonte cochlear at Monopoli. Note the tiny nudibranch Hypselodoris tricolor (c) grazing on Neopycnodonte shells; bar = 3 cm. The large undetermined orange sponge represents the mega-epifauna at this site; bar = 10 cm. (E, F) Cold-water coral mound at Corsica Channel site showing the colonial scleractinian Madrepora oculata (m) characterizing this site; bar = 20 cm. (F) The octocoral Swiftia pallida (s) co-occurs at this site, while the echinoid Echinus melo (e) is grazing on M. oculata framework; bar = 20 cm

ROV dives were conducted using a Pollux III (Global Electric Italiana) equipped with a low-resolution CCD video camera for navigation and a high-resolution (2304 × 1296 pixels) video camera. The ROV was equipped with an underwater acoustic tracking system that provided position and depth at 1 s intervals. The ROV velocity along the tracks was calculated as the ratio between the distance separating consecutive tracked positions and the corresponding time gap. Three parallel laser beams (with 20 cm separation) were mounted on the ROV, providing a scale on the videos. Dive track points were smoothed utilizing Adelie Video (© Ifremer) and ArcGIS (© ESRI) software. The Adelie Video tool “points to line” was used to produce a line-format track of ROV dives.
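The velocity calculation described above can be sketched as follows. This is a minimal illustration, not the software workflow used in the study: it assumes the track fixes have already been projected to metric coordinates, and the function name, the tuple layout, and the sample fixes are hypothetical.

```python
import math

def segment_speeds(track):
    """Return the speed (m/s) between consecutive track fixes.

    `track` is a list of (time_s, easting_m, northing_m) tuples; this
    layout is an assumption made for illustration.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order fixes
        # distance between the two fixes divided by the time gap
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds

# Hypothetical fixes recorded at 1 s intervals (the ROV pauses at t = 1-2 s)
track = [(0, 0.0, 0.0), (1, 0.3, 0.4), (2, 0.3, 0.4), (3, 0.6, 0.8)]
speeds = segment_speeds(track)
```

Averaging such per-segment values over a dive would give a mean survey speed comparable in kind to those reported for each station.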

Video recordings were done maintaining ca. 2 m of altitude from the seafloor. In Station MS16_II_83, the mean survey speed was equal to 0.13 m/s, and in Station MS17_II_115, the average speed was 0.22 m/s, while in Station MS17_I_135, the ROV sailed at 0.21 m/s (Table 3).

The full-video analysis (hereafter “reference analysis”) was performed by extracting one frame every second. The substrate cover was obtained by recording the changes in dominant substrate type, i.e., when a component was >50% in the video frame (Fig. 1b–d). The seafloor was classified as “Hard” (geological or biological hard structures), “Mobile” (soft bottoms), or “NA” (bottom not visible). The substrate covering extension was calculated using ArcGIS software.
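As a rough sketch of how cover extents follow from such annotations, the snippet below sums along-track lengths per substrate class and converts them to percentages. It assumes the video has already been reduced to a list of track segments, each carrying the dominant class and its length; the study obtained extents in ArcGIS, so the function name and the sample segments are hypothetical.

```python
from collections import defaultdict

def cover_extent(segments):
    """Sum segment lengths per substrate class.

    `segments` is a list of (label, length_m) pairs, one per stretch of
    track with a constant dominant substrate. Returns a dict mapping each
    label to (length_m, percent_of_transect).
    """
    totals = defaultdict(float)
    for label, length_m in segments:
        totals[label] += length_m
    transect = sum(totals.values())
    if transect == 0:
        return {}
    return {
        label: (length, 100.0 * length / transect)
        for label, length in totals.items()
    }

# Hypothetical transect of 100 m
segments = [("Hard", 30.0), ("Mobile", 50.0), ("Hard", 10.0), ("NA", 10.0)]
extent = cover_extent(segments)
```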

Macro- and mega-benthic organisms were identified to the lowest possible taxonomic rank, counted and georeferenced by using Adelie Video software. Taxonomic classification followed the World Register of Marine Species database (WoRMS Editorial Board 2020). Finally, taxa unidentifiable at species level were categorized only as morpho-species or morphological categories (e.g., Angeletti et al. 2019; Santín et al. 2019 with references therein).

To test the efficiency of time-based (TB) subsampling methods, a frame every 4, 10, and 30 s was extracted using Adelie Video software. Frames were analyzed for taxonomical composition and substrate type following the methodology described above.

The intervals used for subsampling the videos with distance-based (DB) methods were selected to obtain a number of extracted frames similar to those based on time intervals, allowing the comparison among tested methods. A point every 0.5 m, 1 m, and 3 m was generated along the plan view of the ROV tracks using the “Generate points along line” tool in ArcGIS software. The generated points were paired with the ROV tracks by means of the “Spatial Join” tool (Match option: Intersect; Search Radius: 0.05 m) in order to obtain the UTC time for each generated point. Frames were then extracted from video recordings matching the UTC times and analyzed for taxonomical composition and substrate coverings following the methods described above.
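The difference between the two extraction strategies can be illustrated with a short sketch. It is not the ArcGIS and Adelie Video workflow described above: the spatial join is replaced here by a lookup of the first track fix at or beyond each target distance, timestamps are assumed to be integer seconds, and all names and sample data are hypothetical.

```python
import bisect
import math

def cumulative_distance(track):
    """Plan-view distance travelled (m) at each (time_s, x_m, y_m) fix."""
    dist = [0.0]
    for (_, x0, y0), (_, x1, y1) in zip(track, track[1:]):
        dist.append(dist[-1] + math.hypot(x1 - x0, y1 - y0))
    return dist

def time_based_times(track, interval_s):
    """Frame times when one frame is taken every `interval_s` seconds."""
    start, end = track[0][0], track[-1][0]
    return list(range(start, end + 1, interval_s))

def distance_based_times(track, interval_m):
    """Frame times when one frame is taken every `interval_m` metres."""
    dist = cumulative_distance(track)
    n_points = int(dist[-1] // interval_m)
    times = []
    for k in range(n_points + 1):
        # first fix at or beyond the target distance along the track
        i = bisect.bisect_left(dist, k * interval_m)
        times.append(track[min(i, len(track) - 1)][0])
    return times

# Hypothetical dive: 0.5 m/s for 4 s, a 4 s pause, then 1 m/s for 2 s
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4.0]
track = [(t, x, 0.0) for t, x in enumerate(xs)]

tb = time_based_times(track, 2)        # [0, 2, 4, 6, 8, 10]
db = distance_based_times(track, 1.0)  # [0, 2, 4, 9, 10]
```

In this toy dive the time-based selection places three frames (t = 4, 6, and 8 s) on the same spot where the vehicle pauses, whereas the distance-based selection keeps frames evenly spaced along the seafloor.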

For each ROV video, the substrate extents and the number of taxa obtained by each methodology were compared to those resulting from the reference analysis, and the percentage errors were calculated. The Kruskal-Wallis test and the post hoc Dunn’s test were used to assess the differences in the percentage errors among the sampling intervals (4, 10, and 30 s and 0.5, 1, and 3 m) and subsampling methods (TB and DB). Statistical analyses were performed by using R software (R Core Team 2013).
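The text does not spell out the error formula, so the sketch below assumes the usual definition, the absolute difference from the reference expressed as a percentage of the reference. The taxa counts are those reported in Sect. 3.1.2 for the coralligenous station on the Amendolara Seamount; the function and variable names are hypothetical.

```python
def percentage_error(estimate, reference):
    """Absolute deviation from the reference, as a percentage of it.

    Assumed definition; the chapter does not state the formula.
    """
    return 100.0 * abs(estimate - reference) / reference

reference_taxa = 50  # taxa found by the reference analysis
detected = {"4 s": 50, "10 s": 48, "30 s": 43, "0.5 m": 46, "1 m": 45, "3 m": 40}

errors = {
    method: percentage_error(n, reference_taxa)
    for method, n in detected.items()
}
```

With these counts the errors are the complements of the detection percentages given in the text (for example, 96% of taxa detected at 10 s corresponds to an error of 4%).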

With the aim of quantifying the number of overlapped frames, a unique serial ID number was assigned to frames extracted with the same technique showing a new section of seafloor. When adjacent frames duplicated portions of the seafloor (>70% of the frame), the same ID was allotted to those photograms. The ratio between the total number of frames and those presenting a unique ID allowed us to estimate the percentage of overlapping images.
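One way to turn these IDs into an overlap percentage is sketched below. It assumes that every frame beyond the first one sharing an ID counts as overlapped, which is an interpretation of the ratio described above rather than a formula stated in the text; the function name and the sample IDs are hypothetical.

```python
def overlap_percentage(frame_ids):
    """Percentage of frames that repeat an already-seen seafloor section.

    `frame_ids` holds one ID per extracted frame; frames showing the same
    portion of seafloor share an ID.
    """
    total = len(frame_ids)
    if total == 0:
        return 0.0
    unique = len(set(frame_ids))
    return 100.0 * (total - unique) / total

# Hypothetical subset of 10 frames covering 6 distinct seafloor sections
frame_ids = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]
overlap = overlap_percentage(frame_ids)
```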

3.1 Method Accuracy

3.1.1 Substrate Cover Extent

The reference analysis performed in Station MS16_II_83 revealed that “Hard” and “Mobile” substrate types almost equally composed the 647.7 m of explored seafloor, covering 44.9% (corresponding to 291.1 m) and 41.4% (268.3 m), respectively. The remaining 13.6% (88.3 m) of the transect was classified as “NA” (Fig. 3a; Table 2).

Fig. 3 (a–c) Bar plot showing the spatial cover extent of different substrate types calculated with the tested techniques. Dashed lines refer to extents calculated by analyzing the entire video footages and used as reference values. (d) Average percentage error in the estimation of substrate covering for each method. Error bars represent standard errors

Table 2 Comparative ability of time-based and distance-based video frame extraction methods to estimate substrate coverage and detect the taxonomic composition of biological communities in surveyed stations

In Station MS17_II_115, the reference analysis detected “Hard” substrate for 53.7% (481.5 m) and “Mobile” for 30.9% (276.7 m), while 15.4% (138.4 m) was assigned to “NA” (Fig. 3b and Table 2).

The longest ROV survey was Station MS17_I_135, with 1041.5 m of seafloor explored. The reference analysis classified 30% (312.6 m) of the transect as “Hard,” 52.1% (542.2 m) as “Mobile,” and 17.9% (186.7 m) as “NA” (Fig. 3c and Table 2).

The estimation of substrate cover performed by using TB methods reported markedly higher average percentage errors when compared to DB techniques. The “Hard” class reported percentage errors up to 1.82% ± 0.81 (SE), and the “Mobile” was incorrectly estimated with a maximum average error of 5.44% ± 3.03, while the “NA” was mainly underestimated with errors reaching 4.58% ± 2.24 with TB methods (Fig. 3d and Table 3).

Fig. 4 (a) Bar plot reporting the percentage of taxa identified with the tested techniques in each video recording. (b) Average percentage error in detecting taxa composition of surveyed biological communities. Error bars represent standard errors

Table 3 The average percentage and standard errors in the estimation of substrate coverage and community composition detection for each tested technique

On the contrary, DB methods showed average errors always below 0.15%. The Kruskal-Wallis test confirmed the observed differences between TB and DB method accuracy, reporting a p-value <0.01.

3.1.2 Taxonomic Composition

The reference analysis of Station MS16_II_83, exploring the coralligenous community of the Amendolara Seamount, led to the identification of n = 50 taxa (Table 4). All TB methods efficiently detected the taxonomical composition at this site, showing a performance decrease with wider subsampling time intervals (Fig. 4a and Table 2). The 4 s interval method extracted 1712 frames for taxonomical analysis (Table 2), which resulted in the identification of 100% of taxa (n = 50), with respect to the reference analysis. The lower number of photograms extracted by using 10 s and 30 s intervals (684 and 228, respectively) slightly reduced the taxa detection accuracy, with the 10 s method reporting 96% (n = 48) of total taxa and 86% (n = 43) identified by the 30 s interval selection. Although the DB methods selected about the same number of frames (Table 2), the percentages of detected taxa were lower when compared to time interval methods: 92% (n = 46) were identified with 0.5 m intervals, 90% (n = 45) by using 1 m intervals, and 80% (n = 40) with intervals of 3 m (Fig. 4a).

Table 4 List of taxa identified by analyzing the ROV videos

Station MS17_II_115 explored a mesophotic oyster reef habitat hosting a highly diverse biological community where the reference analysis identified n = 82 taxa. The 0.5 m method showed the highest accuracy, detecting n = 74 taxa (90%). The 10 s and 1 m methods reported similar results, identifying n = 65 (79%) and n = 64 taxa (78%), respectively, while the 3 m interval frame selections showed a higher accuracy (n = 55, 67%) when compared to those based on 30 s extractions (n = 49, 60%) (Fig. 4a).

Reference analysis of Station MS17_I_135 recorded n = 26 taxa surveying the CWC mounds. TB and DB methods showed similar performances (Fig. 4a and Table 2). The 0.5 m method recognized 88% (n = 23) of total taxa, while the 4 s method detected 96% (n = 25). The selection of frames every 1 m or 10 s gave similar results, reporting 22 (85%) and 21 (81%) taxa, and the efficiencies of 30 s and 3 m interval methods were equal (21 taxa each, 81%).

On average, the 4 s interval missed 7.29% ± 4.82 of total taxa, 15.97% ± 5.99 were not detected extracting frames at the 10 s interval, and the 30 s interval showed an error of 25.73% ± 7.98. DB methods reported lower accuracies: the 0.5 m method reported an error of 11.22% ± 1.98, while 17.14% ± 3.79 of total taxa were not identified using 1 m intervals, and the 3 m technique missed 25.31% ± 4.26 of the taxa (Fig. 4b).

Although no significant differences among sampling intervals and between TB and DB methods were detected by the Kruskal-Wallis test, the results showed that small extraction intervals and, thus, a larger number of frames extracted were more efficient in the detection of taxa composition.

3.1.3 The Influence of Survey Velocity

Maintenance of a regular velocity during visual surveys is among the major factors to guarantee a homogenous recording of the seafloor (Huvenne et al. 2019) and ensure the detection and identification of features of interest by operators. The ROV navigation velocity, however, may largely vary along the tracks in relation to technical issues (e.g., navigation against current) and the need for higher-detailed recordings. When using video subsampling techniques based on time intervals, the variation in ROV velocity may influence the distribution of frames along the transects, over-sampling where the ROV slows down and under-sampling where the vehicle velocity increases (Fig. 5). Frame density extracted with TB methods was different when compared to DB methods (Fig. 6). An irregular survey velocity along the transect could have unintended advantages: a higher number of frames displaying portions of seafloor characterized by highly dense communities populating hard bottoms, or hosting specimens that are more difficult to detect (such as infauna inhabiting mobile substrates), can allow a more precise description of the community composition. During visual surveys, specimens may not be clearly recorded, or may be visible but not easily identifiable, in a few frames. Extracting more frames displaying the same specimens could increase the probability of having clearer images, facilitating the taxonomical identification. The comparison between the accuracy of TB methods in the detection of the taxonomical community composition and the coefficient of variation of velocity (CV, used as a proxy of ROV slowdown at features of interest, Fig. 7a) suggests that the effect of speed variation on the taxonomical description may be related to the morphology of the habitat explored (e.g., Robert et al. 2020).
The highest errors were registered in survey MS17_II_115, which presented the lowest number of ROV slowdowns along the transect (lowest CV value) and the highest velocity. The accuracy shown by TB methods in Station MS16_II_83, thus, suggests that a lower speed and a higher number of slowdowns along the transects may facilitate the detection of the taxonomical composition of biological communities in situations of patchily distributed habitats such as coralligenous outcrops. On the contrary, a regular velocity along the survey transect may be sufficient to correctly identify the community composition when exploring large habitat extensions, as in the case of MS17_I_135.
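The coefficient of variation used here as a proxy for speed irregularity is the standard deviation divided by the mean. A minimal sketch follows, assuming the sample standard deviation (the text does not say which estimator was used); the function name and the two speed series are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical per-second speeds (m/s), both with a mean of 0.2 m/s
steady = [0.2, 0.2, 0.2, 0.2]
irregular = [0.05, 0.35, 0.10, 0.30]

cv_steady = coefficient_of_variation(steady)
cv_irregular = coefficient_of_variation(irregular)
```

Both series share the same mean speed, so the CV separates a dive with frequent slowdowns and speedups from a regular one even when the average velocities are identical.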

Fig. 5 Scatter plot showing the relationship between survey velocity and number of frames extracted with each method

Fig. 6 (a, b, c) The figure shows the ROV velocity variation and the spatial distribution of frames extracted with tested techniques along the analyzed ROV transects. Colored bars represent the different substrate types characterizing the survey transect. Hard substrate, dark gray bars; mobile substrate, light gray bars; NA, white bars with red borders. Color distributions refer to frame densities obtained with the TB methods, while dashed lines represent frame distributions from the DB methods

Fig. 7 (a) Scatter plot of the mean percentage error in detecting the taxonomical composition of biological communities resulting from the TB methods vs. the coefficient of variation of survey velocity. The latter was used as proxy of the variation of ROV velocity along the transect. (b) Scatter plot showing the significant positive correlation between the percentage of overlapped frames and percentage of taxa detected with each method. (c) Plot displaying the significant negative relationship between survey velocity and percentage of overlapped frames extracted with TB methods

Moreover, survey velocity plays an important role in the taxonomical identification accuracy of specimens by influencing the number of overlapped photograms. A larger number of the latter was, indeed, documented in the slower surveys (Fig. 7c), which reported the higher community composition detection accuracies (Pearson correlation coefficient: r = 0.86, Fig. 7b). Percentage overlap decreased with wider sampling intervals in both TB and DB methods, correlating with a decrease in the accuracy of community composition detection. Although having fixed spatial intervals between frames along the track, DB selections showed similar or even higher degrees of overlap when compared to TB methods (Table 2). In some segments of the survey, the ROV moved for a few meters, turning around features of interest to collect more detailed images. Therefore, even frames extracted with an interval of 3 m displayed the same portion of the seafloor, producing the higher number of overlapped frames observed. This may have contributed to the only slightly lower accuracies in community composition detection shown by DB methods when compared to TB values.
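For readers who wish to reproduce this kind of comparison, the Pearson correlation coefficient between overlap and detection percentages can be computed as sketched below. The function name is hypothetical and the paired values are invented for illustration only; they are not the study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented example: percentage of overlapped frames vs. taxa detected
overlap_pct = [15.0, 25.0, 40.0, 45.0, 57.0]
detected_pct = [70.0, 80.0, 78.0, 90.0, 95.0]

r = pearson_r(overlap_pct, detected_pct)
```

A value close to 1 indicates that methods with more overlapped frames also detect a larger share of taxa, which is the pattern described above.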

However, the survey speed and its variation along the transect may not have only positive or neutral consequences. TB methods showed lower accuracies in the estimation of substrate cover than DB methods. The coefficient of variation (CV) of speed was, indeed, positively correlated (r = 0.83) with the average percentage error in the substrate cover estimation reported by each tested interval (Fig. 8a). In patchily distributed habitats, performing the survey at high speed (as in Station MS17_II_115) or frequently varying the velocity along the transect (as in Station MS16_II_83) may hamper the correct recording of seafloor sections where the habitat changes, potentially preventing the accurate mapping of habitat boundaries. On the contrary, in situations of large habitat extensions (Station MS17_I_135), maintaining a regular velocity along the transect may ensure an accurate estimation of substrate cover, with a corresponding decrease in the accuracy when using wider sampling intervals. In MS16_II_83 and MS17_II_115 stations, however, the error in substrate extent detection shows a counter-intuitive trend, decreasing with wider sampling intervals (Table 3). The analysis of the coefficient of variation (CV) of the distances among frames, representing the variability of the distance between adjacent photograms, provides a potential explanation, showing a decrease with higher time intervals (Fig. 8b). In TB methods, the increase of sampling interval reduced the variation in the distance among the extracted frames, leading to a more homogenous distribution of photograms along the transect. The use of wider sampling intervals in stations MS16_II_83 and MS17_II_115 may thus have helped to reduce the negative influence of the survey speed on the estimation of substrate extents.

Fig. 8 (a) Scatter plot displaying the significant positive correlation between the mean percentage error in estimating the substrate covering extent reported in TB methods and the coefficient of variation of survey speed. (b) Bar plot showing the decrease of the variability of distance between adjacent frames with wider sampling TB intervals

3.2 Method Strengths and Weaknesses

The choice of the video frame extraction technique for the study of benthic marine ecosystems plays a pivotal role in governing the required analytic effort and, at the same time, in ensuring the high quality of results. Nevertheless, the selection of the most appropriate frame extraction technique is strongly linked with data collection modalities. Our results showed that variations in ROV speed during the survey influence subsampling methodologies based on time intervals. Alternation of ROV slowdowns and speedups can potentially influence the precise mapping of the spatial limits of the different substrate categories. The variation of survey velocity was, indeed, positively correlated with the error percentages in the estimations of substrate coverings, leading to an increased uncertainty of TB methods when dealing with habitat extent estimates. Maintaining a regular survey speed is of paramount importance in ensuring a high efficiency in the substrate cover mapping. However, in situations with large survey velocity fluctuations, the use of wider sampling intervals may reduce the negative influence of survey speed variations on the estimation of habitat extents.

On the contrary, DB techniques showed higher accuracy in the estimation of substrate cover extent compared to TB, suggesting that frame extractions based on distance intervals are not affected by ROV navigation speed. The maximum percentage error of 0.3% for DB methods (Table 2) ensures higher confidence in the estimation of substrate cover extents, promoting these techniques as the most appropriate for this purpose.

However, habitat coverage is just one of the applications of visual survey methods. The analysis of community taxonomical compositions is fundamental in the framework of monitoring plans and directives, serving as the foundation for the evaluation of ecosystem status and functioning (e.g., Di Camillo et al. 2013; Grinyó et al. 2016; Chaniotis et al. 2020). TB methods showed higher efficiencies in detecting the taxonomical composition of communities when compared to DB techniques extracting a similar number of frames.

An irregular survey speed along the track may lead to both a larger number of photograms and a higher number of overlapped frames extracted with TB methods in areas hosting highly dense communities, increasing the accuracy of these methods in the detection of the taxonomic composition. Consequently, the evidence provided suggests that TB methods represent the best approaches for the description of the taxonomical composition of communities, especially by using 4 s or 10 s intervals, which showed the lowest estimation errors.

However, larger frame subsets corresponded to higher taxa detection efficacy in both tested methodologies. But how much does it cost in terms of time?

On average, the 10 s and 1 m techniques missed 15.97% ± 5.99 and 17.14% ± 3.79 of total taxa from the analysis of 577 ± 54.45 and 760.33 ± 44.58 frames (Table 3), respectively, with overlapping degrees close to 40%. Methodologies with the narrower extraction intervals, 4 s and 0.5 m, showed higher accuracies in detecting the taxonomical composition of the communities (percentage errors: 7.29% ± 4.82 and 11.22% ± 1.98, respectively, Table 3), with overlapping degrees of ca. 57% and 1443.67 ± 136.71 and 1463 ± 54.06 extracted frames. Summing up, roughly doubling the number of frames and, thus, the analytical effort ensured a taxa identification error decrease of ca. 9% with the 4 s technique and ca. 6% with 0.5 m intervals. These increases in accuracy are crucial for monitoring and experimental purposes, providing precise information on species abundances and the detection of rare taxa. Therefore, when a complete reporting of community composition is not required, intermediate-width frame extraction intervals (i.e., 10 s and 1 m) strongly reduce the analytical effort in analyzing video surveys while guaranteeing a relatively small error in the taxa detection.
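The trade-off between analytical effort and accuracy can be checked directly from the averages quoted above. The sketch below uses only the mean frame counts and mean percentage errors reported in the text; the dictionary layout and function name are hypothetical.

```python
# Mean frame counts and mean percentage of missed taxa, as quoted in the text
reported = {
    "TB": {"wide": (577.0, 15.97), "narrow": (1443.67, 7.29)},    # 10 s vs. 4 s
    "DB": {"wide": (760.33, 17.14), "narrow": (1463.0, 11.22)},   # 1 m vs. 0.5 m
}

def effort_vs_gain(wide, narrow):
    """Return (frame ratio, drop in missed taxa in percentage points)."""
    frames_wide, error_wide = wide
    frames_narrow, error_narrow = narrow
    return frames_narrow / frames_wide, error_wide - error_narrow

summary = {
    name: effort_vs_gain(values["wide"], values["narrow"])
    for name, values in reported.items()
}
```

With these figures, moving from 10 s to 4 s multiplies the number of frames by about 2.5 for a gain of about 8.7 percentage points, while moving from 1 m to 0.5 m multiplies it by about 1.9 for a gain of about 5.9 percentage points.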

Nevertheless, the choice of technique for the analysis of benthic visual recordings collected with unmanned vehicles depends on the aims and the characteristics of the survey. Distance-based (DB) frame extraction methods provided a much higher efficiency in the estimation of the cover extent of the different substrate types, not being affected by vehicle speed variations during the sampling. On the contrary, the increase of frame density and overlapping degree at features of interest partially explains the higher performances in documenting the biological community composition shown by time-based (TB) methods.

The recommendations provided are not meant to be a “one-size-fits-all” solution.

For instance, mesophotic-to-deep habitats may occur on vertical or steeply sloping bottoms where the GPS tracking position may not change substantially along the transects. In these situations, obtaining a homogenous representation of the explored seafloor in the final frame subset by using DB intervals based on the plan view of the ROV track may be challenging. The application of DB methods to habitats on steeply sloping bottoms requires ad hoc techniques, such as the transect visualization and point generation along the track in 3D environments.
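The limitation of plan-view distances on steep bottoms can be seen in a small sketch that compares the along-track length computed with and without the depth component. This is only an illustration of the geometric issue, not a technique tested in this chapter; the function name and the sample wall transect are hypothetical.

```python
import math

def path_length(points, use_depth):
    """Length (m) of a track given as (x_m, y_m, depth_m) points.

    With `use_depth=False` only the plan-view (horizontal) displacement
    is counted, as when points are generated on a 2D map.
    """
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        dz = (z1 - z0) if use_depth else 0.0
        total += math.sqrt((x1 - x0) ** 2 + (y1 - y0) ** 2 + dz ** 2)
    return total

# Hypothetical transect descending a near-vertical wall from 100 to 104 m
wall = [(0.0, 0.0, 100.0), (0.1, 0.0, 102.0), (0.2, 0.0, 104.0)]

plan_view_m = path_length(wall, use_depth=False)  # about 0.2 m
along_wall_m = path_length(wall, use_depth=True)  # about 4.0 m
```

In this example a 1 m plan-view interval would yield a single frame for roughly 4 m of surveyed wall, which is why point generation along a 3D track is suggested for such habitats.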

The comparable number of frames extracted by both TB and DB low, intermediate, and wide intervals, coupled with the percentage uncertainties in estimating the substrate cover and the taxonomical composition of biological communities provided by the results reported in this chapter, provides the context from which to choose the most efficient techniques for the purposes of analysis (e.g., TB methods for taxonomical composition detection and DB for substrate covering estimation), ensuring the comparison of surveys performed in different areas or time windows.

3.3 Future Directions

The wide range of advantages offered by remotely operated and autonomous vehicles, such as the possibility of high-definition mapping of biological communities and habitats at previously inaccessible depths, together with the rapid technological developments in the field and their increasing availability, has enabled an increased use of these methods in the study and monitoring of benthic marine ecosystems. Visual recordings can provide information on substrate types, habitat architecture, and biological community composition, and also allow the relationships among organisms to be explored (Mueller et al. 2013). Despite the ease of collecting georeferenced images and videos by using underwater visual techniques, the analysis of images still typically requires manual processing by an expert in taxonomic identification. Therefore, new methods to process visual surveys faster are gaining prominence. In the last decade, the use of automatic and semi-automatic methods to analyze benthic video recordings has become more frequent: machine learning and deep learning techniques for automated feature detection (e.g., Stokes and Deane 2009; Aguzzi et al. 2011; Teixidó et al. 2011), photogrammetric habitat reconstructions for the study of spatial patterns of assemblages on vertical walls (e.g., Robert et al. 2020 among others), and hyperspectral imaging for the taxonomic identification of benthic megafauna (Johnsen et al. 2016; Dumke et al. 2018; Foglini et al. 2019) are just a few of the recently implemented techniques. Thanks to these new intelligent and adaptive methods, it can be expected that the volume of high-resolution seabed mapping data will increase rapidly in the near future, opening exciting opportunities for new insights in mesophotic-to-deep ecology and consolidating the integration between automatic methods and scientific knowledge.