Introduction

The scientific exploration of deep-sea environments poses continuously renewed challenges for underwater technology. Investigation and research address major societal questions such as biodiversity, global change, living resources, and mineral or fossil reservoirs, as well as the impact of human activity on our planet. Relying primarily on remotely operated deep-sea vehicles (ROVs), the success of underwater research missions depends on the technical capacity to navigate precisely, to provide reliable visual and spatial information, to carry out precise measurements, to collect representative samples of various natures (sediment, mineral, fauna, water, etc.), and to deploy stationary equipment on the seabed. The ROVs used by IFREMER for scientific research are shown in Fig. 1.

Fig. 1 State-of-the-art scientific vehicle systems: IFREMER's 6000 m rated Victor 6000 (upper) and the fibre-controlled hybrid ROV Ariane [1] (lower)

Highly relevant ecosystems and habitats are generally located in rugged terrain with strong slopes, such as canyons, cliffs, and deposits of mineral resources originating from seismic or volcanic activity, for example around highly vertical hydrothermal vents. As a result, the scientific investigator and the operator of deep-sea vehicles must obtain a broad and accurate understanding of the local topography and develop situational awareness of the local working environment in order to accomplish intervention tasks efficiently. Live video images from several cameras located at the front end of the vehicle represent today the main source of operator feedback. Real-time observation allows the operator and scientific end users alike to interpret the work environment, to identify objects of interest, and to perform tasks and manoeuvres. Direct observation through 2D images is, however, a limiting factor for the performance and efficiency of intervention tasks: the human operator must supervise the task execution from several two-dimensional viewpoints in order to infer a three-dimensional (3D) understanding of the scene. This approximate 3D interpretation is necessary even when performing simple tasks such as placing or handling objects (tools, instruments, samples, etc.) and estimating the position and shape of target samples, which is often required before collection.

The challenge is thus to move towards a concept of augmented perception that implements the idea of transporting humans into deep waters; this transport will ultimately take place by virtual means. Recent virtual or augmented reality techniques often involve linking known structural models to sensory data. In the present context of natural-environment investigation, however, 3D visual models are generally not available before the dive. In order to exploit emerging augmented reality techniques, a model of the environment must be reconstructed on the fly, and the robotic tasks must be set in perspective with the continuously updated model.

This paper is organised as follows: Section 2 presents recent advances in underwater augmented reality (UWAR), Section 3 develops the concept of virtual transportation into the deep sea, Section 4 presents the current developments at IFREMER towards intuitive and reliable UWAR applications, and Section 5 draws conclusions and perspectives.

Mixed Reality for Underwater Situational Awareness

Divers, ROV pilots, and oceanographic researchers may benefit from advances in mixed reality (MR) technologies [2] by using them as an enriched 3D perception tool to increase situational awareness. Its prime benefit being a better understanding of terrain topology and features, MR helps avoid collisions with the sea floor and improves navigation safety and intervention efficiency.

Virtual reality (VR) has been used offline to virtually visit underwater sites [3] and to train operators by simulating control interfaces to a simulated environment [4]; it has also been used for near real-time visualisation assistance to track objects in underwater construction and maintenance interventions [5] and for trajectory tracking [6]. Augmented reality [7] has been used in the context of underwater cultural heritage [8, 9]. Waterproof tablets have been designed on which guidance notes, navigation aids, and a virtual reconstruction of the archaeological site are displayed [9, 10]. At the current state, these applications rely on a prior environment model, as is available, e.g., in industrial construction sites. This is a key difference between most of these applications and oceanographic exploration.

In order to blend virtual information into the current user view, the virtual world needs to be registered with the real world. Thus, the 3D environment must either be previously known or reconstructed, at least locally. The 3D terrain model is built by processing and integrating information from several sensors, such as monocular [11] and stereoscopic cameras [9, 12••], sonars, structured light devices, laser profilers, and LiDAR systems [13].

If the site can be equipped with landmarks beforehand, AR markers are widely used as anchors to superimpose 3D virtual objects or information in the user view [8, 14, 15]. A combination of visual markers and real-time 3D reconstruction can be used to allow onshore operation of underwater robots through haptic and visual interfaces to a virtual world [16].

In the case of exploration of an unknown site, 3D sensors such as stereo rigs [9] or multibeam echo sounders provide local 3D information through point clouds and depth maps. This representation can be used to estimate the pose of a manipulator w.r.t. the environment, to display the relative distance between the end effector and the environment [12••], or to project the arm contours into the user video feedback [9].
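As a minimal sketch of how a stereo rig yields such local 3D information, the Python fragment below computes a dense disparity map with OpenCV's semi-global matcher, reprojects it to a metric point cloud, and reports the distance from a hypothetical end-effector position to the nearest terrain point. The reprojection matrix Q, the file names, and all numeric values are illustrative placeholders, not parameters of any actual vehicle.

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Dense disparity via semi-global block matching (OpenCV fixed-point output).
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=7)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# Q normally comes from cv2.stereoRectify; the principal point, focal length
# (700 px), and baseline (0.12 m) below are assumed values.
Q = np.array([[1.0, 0.0, 0.0, -320.0],
              [0.0, 1.0, 0.0, -240.0],
              [0.0, 0.0, 0.0,  700.0],
              [0.0, 0.0, 1.0 / 0.12, 0.0]])
points = cv2.reprojectImageTo3D(disparity, Q)   # H x W x 3, camera frame (m)
terrain = points[disparity > 0.0]               # keep valid measurements only

# Hypothetical end-effector position expressed in the camera frame (metres).
effector = np.array([0.1, 0.3, 1.2])
print(f"nearest terrain point: "
      f"{np.linalg.norm(terrain - effector, axis=1).min():.2f} m")
```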

As soon as several viewpoints are available, structure from motion (SfM) techniques [16, 17••] and simultaneous localisation and mapping (SLAM) [18] have proven their maturity in 3D environment modelling. They can use either monocular [19, 20] or stereo cameras [12••, 21•, 22,23,24,25]. Moreover, stereo visual SLAM tends to achieve more accurate ego-motion estimation [26]. This in turn supports the navigation process in cluttered environments and helps improve the consistency of large DTMs.
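For concreteness, the fragment below sketches the two-view geometry step at the heart of both SfM and visual SLAM: matching features between consecutive frames, estimating the essential matrix under RANSAC, and recovering the relative camera motion. The camera intrinsics K and the file names are assumed values.

```python
import cv2
import numpy as np

K = np.array([[700.0, 0.0, 320.0],     # assumed pinhole intrinsics
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# ORB keypoints matched with cross-checked Hamming brute force.
orb = cv2.ORB_create(4000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

p1 = np.float32([k1[m.queryIdx].pt for m in matches])
p2 = np.float32([k2[m.trainIdx].pt for m in matches])

# RANSAC rejects outliers, e.g. matches on drifting particles (marine snow).
E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
print("relative rotation:\n", R, "\ntranslation direction:", t.ravel())
```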

The Perception Leap: Travelling Virtually into the Deep Sea

The ongoing technological development effort in the field of remote perception specifically targets overcoming the limitations of exploring the underwater environment through traditional ROV cameras. Novel techniques based on augmented reality and 3D mapping are being introduced that will allow scientific and operational end users to benefit from a three-dimensional real-time representation of the underwater environment.

ROV pilots ensure the key interface role between the scientists' objectives, the intervention tool, and the environment. Following the tasks defined by the scientists, the pilot analyses the feasibility of the requested operations and estimates the best positioning of the ROV in order to carry out the manipulations successfully. In many respects, the quality of the scientific end result depends on the successful interaction among these key players, and developing tools to improve and optimise this interaction is a constantly pursued objective. Consequently, we may witness a progressive evolution of roles, where the scientist interacts more directly with the modelled environment in an augmented reality scenario while the ROV pilot provides backseat technical supervision of the task. The technical machine will become transparent, giving the scientist a sensory, perceptive, and gestural whole relative to an extended scene. The operator will play the role of a supervisor who guides the operation and monitors safety, providing advice on practical and operational aspects.

New remote operation techniques will enable virtual movements, observations, and actions that interface with the marine environment, as part of an approach that progressively imitates that of humans in terrestrial environments. Changes in piloting modes and in the performance of scientific operations will be based on several key functionalities associated with enhanced reality:

a) 3D visual model of the underwater environment

Depth measurements from acoustic sounders and optical tools (photogrammetry, laser profiling, and LiDAR) will be integrated into a high-resolution 3D model that incorporates environmental data for an entire dive and for groups of dives. The 3D model is fitted with the visual texture provided by photo and video optical imagery. These techniques are available today offline; real-time, fully automated availability will be possible in the short to medium term. The models can be viewed either on standard or 3D screens or in a virtual reality setting through VR goggles or a HoloLens. Since the 3D model is geo-referenced, dimensional analysis is directly achievable through graphic operations.
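Because the model is geo-referenced, a dimensional measurement reduces to vector arithmetic on picked vertices. A minimal example, with coordinates invented for illustration in a local UTM frame:

```python
import numpy as np

# Two vertices picked on the textured model (easting, northing, depth in m);
# the values are made up for illustration.
p1 = np.array([687432.1, 4856220.8, -1712.4])
p2 = np.array([687435.7, 4856219.2, -1709.9])
print(f"point-to-point distance: {np.linalg.norm(p2 - p1):.2f} m")
```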

b) Integration of multimodal data

The digital terrain model (DTM) will be enriched with dedicated layers representing data from specific sensors, acquired either online or during past dives. Currently used offline in 2D or 3D GIS tools, these discipline layers will provide precious assistance for exploring sites. Differential analysis between earlier and new data represents a particularly rich perspective (characterisation of changes in the environment over time).
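As a hedged illustration of such differential analysis, the sketch below subtracts two co-registered, geo-referenced elevation grids to quantify deposition and erosion between dives; the file names, grid spacing, and noise threshold are all assumptions.

```python
import numpy as np

cell_area = 0.1 * 0.1                      # assumed 10 cm grid spacing (m^2)
dtm_before = np.load("dive_2019_dtm.npy")  # co-registered elevation grids (m)
dtm_after = np.load("dive_2021_dtm.npy")

diff = dtm_after - dtm_before
changed = np.abs(diff) > 0.05              # assumed 5 cm sensor noise floor
deposited = diff[changed & (diff > 0)].sum() * cell_area
eroded = -diff[changed & (diff < 0)].sum() * cell_area
print(f"deposited: {deposited:.1f} m^3, eroded: {eroded:.1f} m^3")
```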

c) On-demand virtual viewpoint changes

Scientists will be able to navigate in a virtual environment without being limited by the actual field of view of the imaging sensor (see Fig. 2). Visualisation of the working spaces that can be reached by equipment (arms, probes, samplers, etc.) in the modelled environment will also make it possible to plan and guide interventions. It also provides the basis for new graphical ergonomics that move controls into the environment model.

d) Towards the performance of fully automated manipulation tasks

Fig. 2 Virtually transporting scientists into the deep sea: model of a hydrothermal vent (left) and exploration with a VR headset (right)

Scientists will plan interventions using virtual tools in the DTM. All operations will be facilitated by fully automated functions such as the virtual capture of manipulation targets (corals, rocks, etc.) and the automatic management of standard tools (corers, suction devices, probes, etc.).

e) Multiple-use models and shared exploration

The progressive construction of the model will be adapted for distribution via satellite link to terrestrial laboratories. Remote presence equipment will rely on immersive techniques applied to multidimensional models of the environment. Without being present at the place of operation, teams can participate in remote work (bringing complementary skills, etc.) in real or deferred time. Annotations and other points of interest will enable organising collective work through a shared interface.

Underwater Augmented Reality Developments at IFREMER

Several key functional aids for the vehicle operator and scientific end user are currently under development at IFREMER. Until now, video feedback from one or several cameras looking at the intervention site has been the main source of information operators use to control the manipulators and the vehicle. Through augmented reality, the operator can use an enriched representation of the environment, with data overlaid from several sources and processing algorithms, without losing focus on the operation. On top of that, 3D maps can be analysed collaboratively by scientists.

The acquisition and processing of optical data will feed two distinct uses. The first is in-dive real-time 3D processing, which provides a better understanding of the environment and assistance tools. The second use case occurs after the dive, with data post-processing for increased accuracy in large-scale environment modelling for scientific use.

In-dive processing blends virtual data into the local field of view as the vehicle explores the scene, and it progressively acquires data to shape a wider environment model until the work area has been covered sufficiently. As a major advantage, AR allows local 3D data to be used to improve the interaction between the vehicle manipulators and the scene. The techniques focus on merging video with analytic information, and display ergonomics come in as a key factor for operator acceptance (see Fig. 3).

Fig. 3 In-dive augmented perception for operator assistance in direct interaction with the environment

Post-processing uses data sets over a defined time frame; computation-intensive solutions can be employed in order to cover an extensive area of interest. The resulting 3D models, which have high resolution and accuracy, will represent a standard for post-expedition scientific investigation and are used in virtual reality concepts. The technical solutions for 3D model construction, as well as the scientific analysis of said models, must handle very large data sets comprising tens of thousands of high-resolution images (see Fig. 2).

The aim of the remote perception and intervention processes will be to merge the two above approaches into a single enhanced reality concept.

Underwater Optical Data Processing

On state-of-the-art scientific ROVs, the use of high-quality video or still images from monocular cameras is widespread, as optical images are an essential source of information for understanding and analysing the deep underwater environment.

However, in an underwater context, light propagation properties and disturbances in sea water, such as backscattering, diffusion, and colour and light attenuation, degrade the quality of image formation. Moreover, lighting is provided by artificial sources mounted on the vehicle; as a result, the shadows cast on the scene change as the robot moves. Colour attenuation causes features to appear differently when observed at varying distances. To cope with these issues, an image correction procedure is a prerequisite to any image-based reconstruction. Our computation process includes a pre-processing stage that corrects images for nonuniform illumination and colorimetric degradation [27].
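As a hedged stand-in for the correction stage of [27] (whose exact procedure is not detailed here), the sketch below applies a gray-world white balance against colour attenuation and CLAHE against the nonuniform vehicle lighting:

```python
import cv2
import numpy as np

img = cv2.imread("raw_frame.png").astype(np.float32)

# Gray-world assumption: scale each channel so its mean matches the global mean.
means = img.reshape(-1, 3).mean(axis=0)
img = np.clip(img * (means.mean() / means), 0, 255).astype(np.uint8)

# Equalise luminance locally to flatten the vehicle-light hotspot.
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
lab[:, :, 0] = clahe.apply(lab[:, :, 0])
corrected = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
cv2.imwrite("corrected_frame.png", corrected)
```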

Imaging surveys are carried out to cover a work area with sufficient overlap between successive image frames, allowing the scene to be reconstructed into a single large, geo-referenced model of the seafloor. This is the purpose of 3D optical mapping.

The gathered and corrected images can be used to solve the structure from motion problem [17••] in order to obtain a sparse 3D reconstruction. This reconstruction is progressively densified by spreading knowledge from the already triangulated points [28]. Finally, a mesh is constructed from the 3D point cloud [29], onto which texture from the initial image sequence is blended [30] at native resolution. Sensor fusion with the vehicle navigation data ensures the geo-referencing of the three-dimensional model.
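The meshing step can be illustrated with Open3D as a generic stand-in for the cited tools [28,29,30]: normals are estimated on the dense point cloud, then a Poisson surface is reconstructed. File names and parameter values are assumptions.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense_cloud.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

# Octree depth trades mesh resolution against memory.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=10)
o3d.io.write_triangle_mesh("seafloor_mesh.ply", mesh)
```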

In a complementary approach, the use of stereo sensors allows three-dimensional information to be obtained immediately in the camera's field of view. Stereovision photogrammetry provides a depth field directly associated with the image frame. The use of stereo rigs is hence envisaged with the aim of providing on-the-fly quantitative local information that aids piloting and intervention tasks.

Augmented Reality Tools

Persistent model perception and stereovision field-of-view depth computation are two alternative ways to produce 3D information about the scene. Augmented reality concepts can be applied to a persistent digital terrain model as well as to on-the-go 3D reconstruction.

Augmented reality tools can be implemented and deployed immediately in the local field of view (see Fig. 3). Scene maps may be augmented with information and interactive measurement tools that assist pilots and scientists in comprehending the site topography as well as in executing tasks. A similar approach can be exploited in post-processing, this time benefitting from the a posteriori integration of local point clouds into an extended environment model representation.

a) Topological functionalities

Depth maps are processed to compute online topological measurements such as bathymetric level curves, linear or curvilinear distances, areas, and slopes. The three-dimensional distance w.r.t. the environment is displayed as a colour gradient, yielding a primary perception of the scene's 3D structure.
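A minimal sketch of the gradient-colour distance display, assuming a metric depth map already registered to the camera frame (file names are placeholders):

```python
import cv2
import numpy as np

depth = np.load("depth_map.npy")           # metres, e.g. from the stereo rig
norm = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
overlay = cv2.applyColorMap(norm, cv2.COLORMAP_JET)

# Blend the colour-coded distance field with the pilot's video frame.
frame = cv2.imread("camera_frame.png")
blended = cv2.addWeighted(frame, 0.6, overlay, 0.4, 0.0)
cv2.imwrite("augmented_frame.png", blended)
```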

Level curves such as isobaths can help to better understand the slope of the terrain and its orientation w.r.t. the robot, as well as the size of objects. Level curves perpendicular to the camera optical axis are useful for understanding the diameter and the longitudinal profile of confined spaces such as underwater caves.
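Isobath extraction from such a depth grid can be sketched with scikit-image's marching-squares contour finder; the 0.5 m level spacing is an arbitrary choice:

```python
import numpy as np
from skimage import measure

depth = np.load("depth_map.npy")                   # metric depth grid (m)
levels = np.arange(depth.min(), depth.max(), 0.5)  # one isobath every 0.5 m

# Each level yields a list of (row, col) polylines that can be reprojected
# into the camera view for overlay.
isobaths = {lv: measure.find_contours(depth, lv) for lv in levels}
```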

Slopes can be calculated from the 3D information as the local average inclination of segmented areas of interest in the images. This is helpful during the exploration of very rugged terrain such as hydrothermal vents, canyons, and cliffs.
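A hedged sketch of this computation, taking the slope as the arctangent of the DTM gradient magnitude over an assumed 10 cm grid:

```python
import numpy as np

dtm = np.load("dtm_patch.npy")      # elevation over the segmented area (m)
dx = 0.1                            # assumed grid spacing (m)

gy, gx = np.gradient(dtm, dx)
slope_deg = np.degrees(np.arctan(np.hypot(gx, gy)))
mean_slope = slope_deg.mean()       # local average inclination of the patch
```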

The computation of areas, which are defined in the image as 2D non-convex polygons, is based on a local meshing of the terrain inside the polygon [31, 32].
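Assuming the local mesh is available as an array of 3D triangles, the enclosed area is the sum of triangle areas via the cross-product formula; this is a generic sketch, not the specific method of [31, 32]:

```python
import numpy as np

def mesh_area(triangles: np.ndarray) -> float:
    """Sum of 3D triangle areas; `triangles` has shape (M, 3, 3), in metres."""
    a = triangles[:, 1] - triangles[:, 0]
    b = triangles[:, 2] - triangles[:, 0]
    return float(0.5 * np.linalg.norm(np.cross(a, b), axis=1).sum())
```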

b) Overlay of scientific data

Physicochemical measurements collected during the dive using the scientific payload can be geo-referenced and displayed in real time, allowing a deeper understanding of the underlying phenomena that characterise the ecosystem's dynamics. Finally, annotations, comments, and short analyses made by scientists can be geo-referenced and attached to the terrain model. They can thus be shared with colleagues and used in further offline studies.

c) Assistance for ROV manoeuvres

The processed 3D model of the environment allows distances to be computed between selected points in the image and specific reference frames of the robot, such as the camera frame and the robot arm frame. The manipulator arm's reach over the environment can be calculated from the intersection of the manipulator envelope diagram with the local 3D model of the terrain. The projection of the end effector axis onto the modelled environment can also be visualised graphically, to help the operator decide on a suitable trajectory for controlling arm movement (in a Cartesian control frame, for instance) towards a predefined goal.
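The frame bookkeeping behind these aids can be sketched as follows: a point picked in the camera frame is transformed into the arm base frame and tested against a spherical approximation of the manipulator envelope. The 4x4 transform and the reach value are illustrative placeholders, not data for any actual manipulator.

```python
import numpy as np

# Camera pose expressed in the arm base frame (homogeneous transform, metres).
T_arm_cam = np.array([[1.0, 0.0, 0.0, 0.50],
                      [0.0, 1.0, 0.0, -0.20],
                      [0.0, 0.0, 1.0, 0.10],
                      [0.0, 0.0, 0.0, 1.0]])

def reachable(point_cam: np.ndarray, reach: float = 1.9) -> bool:
    """True if a camera-frame point lies inside a spherical arm envelope."""
    p = T_arm_cam @ np.append(point_cam, 1.0)
    return float(np.linalg.norm(p[:3])) <= reach

print(reachable(np.array([0.4, 0.1, 1.2])))
```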

Before positioning the vehicle on the seabed, the augmented reality display helps the pilots analyse the nature of the terrain. The same tools allow them to visualise the footprint of the vehicle on the seabed and the reachability diagram of the manipulator arm. These graphical representations make it easier to identify a suitable landing configuration with regard to the planned intervention task. This in turn minimises the potential impact on the environment (stirring up of sediment, contact with structures, etc.) caused by unnecessary repeated landing retries.

Post-processing Tools

a) 3D reconstruction for simplified non-expert use: Matisse

The demand for 3D reconstruction tools for the underwater seabed from non-expert users such as oceanographers and biologists has led to the design of a fully automated, user-friendly graphical software application called Matisse [33]. This software is based on several open source libraries, such as OpenMVG [34] and mvs-texturing [30], that have emerged to solve the large-scale offline 3D reconstruction problem with good accuracy. An example of a 3D geo-referenced textured model produced by Matisse is shown in Fig. 4. This kind of model provides a global representation of the site and accurate, measurable information about the environment in a way that goes far beyond the perception the user has during the actual dive.

Fig. 4 Example of a measurement in 3D Metrics on a geologic fault

b) Handling of large 3D models for geo-spatial analysis: 3D Metrics

Multiple 3D processing and visualisation software tools exist (CloudCompare, MeshLab, the Autodesk suite), some of which offer advanced analysis algorithms, yet none of them were fully adapted to the exploitation of scientific geo-referenced data. In order to handle geo-referenced 3D models, textured meshes, and point clouds, we developed a new software application called 3D Metrics. Simultaneous visualisation of multiple datasets, such as large-scale acoustic models and optical mapping, is a basic feature. Annotation functions, as well as geometric operators such as distance, surface, and slope estimation (see Fig. 4), provide the environment for scientific analysis. Models and measurement layers can also be ortho-projected and exported for use in classical 2D GIS software.

c) Virtual immersion in the digital terrain model

3D models can be used by end users to explore the 3D scene with a perception that is limited neither by the camera field of view nor by the range of the lighting. Virtual reality tools such as motion-sensing stereo headsets are used for collaborative virtual exploration as well as for demonstrations to the general public. The quantitative exploitation of 3D geo-referenced models through computer applications or VR immersive techniques has shown strong potential for remote work with the robot vehicle on the sea floor.

Conclusion and Outlook

This paper has provided an overview of current developments and applications in underwater augmented reality (UWAR) at IFREMER. In recent years, techniques to build textured 3D maps of the seabed from video or photo sequences have been successfully implemented. Three-dimensional models are computed by means of structure-from-motion (SfM) techniques from image sequences produced by high-end optical imaging devices. Geo-referencing is obtained by data fusion algorithms using the vehicle navigation data. The resulting high-resolution 3D maps are used by scientists for immersive revisits of the work sites and for quantitative geometric analysis of the site topography. The offline process is evolving towards an in-dive 3D perception of the environment, thanks to fast 3D point cloud computation with stereo cameras and real-time visual simultaneous localisation and mapping (visual SLAM) algorithms. Fast, online processing enables augmented reality image overlays on the operator's camera views, greatly enhancing the ability to perform remotely operated tasks in complex environments.

The applications presented in this article include examples of UWAR functions designed to improve intervention task efficiency and reduce potential disturbance of the environment. Scientists and pilots will be able to discuss and exchange more precisely, since reliable information aiding the decision-making process will be provided in real time.

A more accurate 3D perception of the environment will lead to greater automation of robotic tasks, with the pilot playing the role of a supervisor who guides the operation and monitors its safety. Techniques that have emerged in recent years, such as artificial intelligence (AI), should also be used for the shared virtual exploration of the environment model, combined with collaborative functions such as manual or automatic feature tagging for the characterisation and classification of seabed sediments and specimens, which further enhance the qualitative perception of the work site [35, 36].

Perspectives for future work concern semi-autonomous manoeuvring for positioning and grasping tasks [37]. The scientific user might then mimic the intervention in the virtual environment, with the ROV carrying out the task in the real scene.