Introduction

The ability of animals to find their way around the world underpins many fundamental biological processes, such as central place foraging, pollination and parental care. Navigational competence involves knowing routes and places or, in the case of path integration, the direction and distance of a goal location (Fig. 1a). Vision plays a dominant role at least in local, if not also in global, navigation, because the non-uniform light distribution in the celestial hemisphere provides a compass reference and thus cues to heading direction, while the terrestrial hemisphere provides cues to both heading direction and location. Celestial compass cues can be used over large distances of travel, because they lie at relative infinity, but they need to be time-compensated because of the rotation of the Earth. On a local scale, infinitely distant celestial cues cannot be used to pinpoint locations. In comparison, the compass cues provided by the terrestrial landmark panorama are geostationary but degrade with distance traveled, depending on the distance distribution of visual features in the scene. However, places in the natural world are uniquely defined by the view of the landmark panorama at these locations.

Fig. 1

Cues supporting navigation. a Navigation involves traveling along routes and pinpointing places. Cues involved in route guidance and place recognition are given in italics. b The navigational content of panoramic snapshots in the form of global image differences (root mean squared pixel differences), generating a rotational image difference function (rotIDF) in the case of image differences due to rotation (left) and a translational image difference function (transIDF) in the case of image differences due to translation (right). Modified from Zeil 2012

Celestial compass cues support path integration and the control of heading direction in migration (Heinze and Reppert 2011) and in straight-line navigation (el Jundi et al. 2016; Dacke et al. 2019), but in certain situations are supplemented or calibrated by reference to the magnetic field (Fleischmann et al. 2018; Dreyer et al. 2018). Equally, many insects pinpointing goals such as their nests, either through path integration or through visual guidance, are in addition guided by olfaction (Buehlmann et al. 2020) or by any other cue that uniquely defines the location of goals (Buehlmann et al. 2012a).

Here I review the navigational guidance provided by terrestrial landmark panoramas, how navigation-relevant visual information is acquired by insects and what constraints there may be on how this information is processed, stored and used by the insect brain. Throughout, I will be discussing global image difference functions as the most assumption-free way of describing and quantifying the navigational information provided by views of natural scenes.

The navigational information content of views

A panoramic snapshot taken at a place in the natural environment offers two pieces of navigation-relevant information: it provides directional or compass information and it provides location information by uniquely identifying the place. The snapshot provides compass information because a misalignment with the facing direction of the animal when the snapshot was memorized leads to an increase in the image difference between the original snapshot and the current view. Most importantly, image differences increase smoothly with the size of the misalignment, so that an agent sensitive to image differences can find the original snapshot orientation by gradient descent in image differences (or alignment matching, Collett et al. 2013a). The function of the image difference between a reference image and rotated views over the angle of rotation is called the rotational image difference function (rotIDF, Fig. 1b, Zeil et al. 2003).
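As a concrete illustration, a minimal Python sketch of the rotIDF is given below, assuming the panorama is stored as an equirectangular elevation x azimuth array of intensity values, so that a yaw rotation corresponds to a circular shift along the azimuth axis (array names and parameters are illustrative and not taken from the cited studies):

```python
import numpy as np

def rot_idf(reference, current=None):
    """Rotational image difference function (rotIDF).

    `reference` and `current` are 2-D arrays (elevation x azimuth) of a
    panoramic image covering 360 deg of azimuth, so a yaw rotation is a
    circular shift along axis 1. Returns the rotation angles (deg) and the
    root mean squared pixel difference at each rotation.
    """
    reference = np.asarray(reference, dtype=float)
    current = reference if current is None else np.asarray(current, dtype=float)
    n_az = reference.shape[1]
    angles = np.arange(n_az) * 360.0 / n_az
    idf = np.array([np.sqrt(np.mean((reference - np.roll(current, s, axis=1)) ** 2))
                    for s in range(n_az)])
    return angles, idf

# The rotation at which the rotIDF is minimal recovers the facing direction
# in which the snapshot was memorized:
# angles, idf = rot_idf(memorized_view, current_view)
# heading_estimate = angles[np.argmin(idf)]
```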

In any given scene, the depth of the rotIDF degrades with time and with distance from the reference location. The temporal degradation over different time scales is due to the movement of clouds, the movement of wind-driven vegetation and the movement of the sun, which together lead to changes in illumination and to the movement of shadows. An example of a scene on a windy day is shown in Fig. 2, demonstrating that in the dorsal visual field, 45° above the horizon, the depth of the rotIDF decreases within minutes as clouds move across the sky and subsequently remains flat, without a prominent minimum in a consistent compass direction. However, the more terrestrial features contribute to the view, by increasing the vertical visual field from the horizon to the zenith (0°–90°) or from below the horizon to the zenith (− 45° to 90°), the more robust the rotIDF becomes against temporal degradation (Fig. 2). At this particular location, with dense vegetation and on a windy and cloudy day, the rotIDF between the scene at time 0 and the scene 20 min later shows a clear minimum in the direction in which the reference image was recorded. The movement of vegetation and that of shadows do not appear to contribute much to the development of image differences over time (Fig. 2). The effects of such scene dynamics are further reduced by the way in which photoreceptor signals are processed by early stages of the visual system, for instance through local contrast normalization provided by lateral inhibition (Stürzl and Zeil 2007).
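As a hedged illustration of such early-visual processing, the sketch below implements a simple local contrast normalization as a stand-in for lateral inhibition; the filter shape and parameters are assumptions for illustration and are not those used by Stürzl and Zeil (2007):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(panorama, sigma=3.0, eps=1e-6):
    """Subtract a local mean and divide by the local standard deviation.

    Slow changes in illumination and moving shadows mainly affect local
    mean luminance, so normalizing each region by its own statistics
    reduces their contribution to image differences while preserving the
    landmark contours that carry navigational information.
    """
    panorama = np.asarray(panorama, dtype=float)
    local_mean = gaussian_filter(panorama, sigma, mode='wrap')
    local_var = gaussian_filter((panorama - local_mean) ** 2, sigma, mode='wrap')
    return (panorama - local_mean) / np.sqrt(local_var + eps)
```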

Fig. 2

The temporal degradation of the rotational image difference function (rotIDF). The image on top is the reference image at time = 0. The panels below show the shape of the rotIDF between the reference image and images captured at the same location over time, for different elevation segments of the scene as indicated in the panels. The blue line shows the minimum of the rotIDF. Images were captured with a Ricoh Theta S camera at 25 fps; IDFs were calculated with the Matlab (Mathworks) circshift function

While the dynamics of visual scenes pose potential problems for visual navigation by decreasing the reliability of information, such as heading direction, over time, the decrease in the depth of the rotIDF with distance from the reference location (Fig. 1b) actually constitutes a gain in information: the minima of the rotIDFs between the reference image and images taken at different distances from the reference location increase smoothly with distance, reflecting changes in views that are exclusively due to translation (Figs. 1b, 3). The fact that this translational image difference function (transIDF) is smooth in natural environments has important consequences for visual navigation (Zeil et al. 2003; Philippides et al. 2011). It means that a place in the natural world is uniquely defined by the view experienced and memorized at this place, so that an agent sensitive to image differences can return to that place by gradient descent operating on the transIDF value (Zeil et al. 2003).
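Under the same assumptions as the rotIDF sketch above (panoramas as elevation x azimuth arrays, rot_idf as defined earlier), the transIDF can be sketched as the best-aligned residual difference at each position along a transect:

```python
import numpy as np

def trans_idf(reference, views_along_transect):
    """Translational image difference function (transIDF).

    For each view captured at increasing distance from the reference
    location, record the minimum of the rotIDF against the reference,
    i.e. the image difference that remains after the best possible
    rotational alignment. In natural scenes this value rises smoothly
    with distance from the reference location.
    """
    return np.array([rot_idf(reference, view)[1].min()
                     for view in views_along_transect])

# A homing agent could descend this gradient: take a step and keep it only
# if the best-aligned difference to the memorized view has decreased.
```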

Fig. 3

How image differences develop with distance from a reference location. Images on top show the reference images at the northern (left) and southern (right) ends of a 60 m transect through an urban open woodland (center). Panels below show the rotIDF (red) and the transIDF (blue) over distance from the reference locations for different elevation slices as indicated in the panels. The center inset shows, for the whole image (− 45° to 90° elevation), the transIDFs for the northern reference (bottom) and the southern reference (top) and the depth of the rotIDFs at 10 m intervals. Note that the southern image, close to a large tree, generates a steeper gradient of the IDF, but smaller depths of the rotIDF over distance (smaller catchment area), compared with the northern reference image scene that contains more distant landmarks. Image acquisition and processing as for Fig. 2

Rotational and translational image difference functions thus can be used to travel along routes (Baddeley et al. 2011, 2012) and to pinpoint places (Zeil et al. 2003; Narendra et al. 2013). The move-and-compare or gradient descent process that leads to the recovery of heading direction (rotIDF) and to the return to a goal location (transIDF) is essentially a generalized form of the snapshot matching model put forward by Cartwright and Collett (1983, 1987). The difference is that the original snapshot model required the identification of individual discrete landmark objects in both the memorized and the currently experienced view to minimize mismatch, while there is no need for such image segmentation in complex natural scenes. This is because the same task can be performed by minimizing global image differences, via ‘alignment matching’ (rotIDF; Zeil et al. 2003; Collett et al. 2013a) and ‘positional image matching’ (transIDF; Zeil et al. 2003; Collett et al. 2013a).

The properties of image difference functions in natural scenes

A number of properties of image difference functions are worth noting: the depth and width of both rotIDFs and transIDFs depend on image contrast (Stürzl and Zeil 2007; Zahedi and Zeil 2018), on the spatial frequency spectrum of scenes (Zeil et al. 2003; Stürzl and Zeil 2007) and on contour orientation (Zahedi and Zeil 2018). As a corollary, image differences as experienced by the brain are also affected by the way in which photoreceptor signals are processed and encoded, although the basic information they provide remains unaffected. rotIDF and transIDF information can be recovered not only from pixel-based image comparisons, but also from Fourier- (Stürzl and Mallot 2006) or wavelet-transformed images (Meyer et al. 2020) or by tracking image features (e.g., Fleer and Möller 2017). Image difference functions share many properties with optic flow (Koenderink and van Doorn 1987): all visual features contribute to rotIDFs independent of their distance (as in rotational optic flow), while transIDFs depend on the distance distribution of objects (as in translational optic flow; Zeil et al. 2003; Stürzl and Zeil 2007). The range over which they offer navigational guidance (their catchments) depends on the distance distribution of visual objects in the environment and on their apparent size and relative contribution to the panoramic scene. For instance, the more distant features contribute to the scene along a route, the larger the distance over which the rotIDFs have a detectable minimum, and therefore the transIDF a detectable gradient (Fig. 3 left). Conversely, if the scene is dominated by a dense clutter of nearby objects, rotIDFs will degrade quickly with distance from the reference location, meaning that the catchment or depth of the transIDF will be small (Fig. 3 right). It would be important to consider these spatial constraints when simulating view-based navigation in synthetic environments (e.g., Baddeley et al. 2012; Ardin et al. 2016; see Wystrach et al. 2016).

The catchments of panoramic images are in fact volumes that define the three-dimensional space within which a gradient descent leads to the reference location (Fig. 4a, b; Zeil et al. 2003; Murray and Zeil 2017). These catchment volumes become larger the greater the height above ground at which a reference image is taken, mainly due to the increased distance of visual features as height above ground increases (Murray and Zeil 2017). Once heights well above the vegetation are reached, these catchment volumes will thus likely become very large (e.g., Gaffin et al. 2015), a property that should receive more attention when considering the navigational information available to insects (and birds) flying over distances of hundreds of meters and beyond.
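A crude way to map such catchment volumes in simulation is to test, for a grid of start positions, whether a discrete gradient descent in best-aligned image differences reaches the goal. The sketch below assumes a hypothetical render(position) function that returns the panoramic view at a 3-D position (for instance from a rendered habitat model) and reuses the rot_idf sketch from above; it is an illustrative procedure, not the one used by Murray and Zeil (2017):

```python
import numpy as np

def reaches_goal(render, goal_view, goal_pos, start, step=0.2,
                 max_steps=500, goal_radius=0.3):
    """Test whether `start` lies within the catchment of `goal_view`.

    Crude gradient descent: probe the six axis-aligned neighbour
    positions, move to the one with the lowest best-aligned image
    difference to the goal view, and stop when the walk either gets
    within `goal_radius` of `goal_pos` or runs into a local minimum
    elsewhere. `render(position)` is a hypothetical function returning
    the panoramic view at a 3-D position; `rot_idf` is sketched earlier.
    """
    pos = np.asarray(start, dtype=float)
    goal_pos = np.asarray(goal_pos, dtype=float)
    offsets = step * np.vstack([np.eye(3), -np.eye(3)])
    for _ in range(max_steps):
        if np.linalg.norm(pos - goal_pos) < goal_radius:
            return True
        here = rot_idf(goal_view, render(pos))[1].min()
        candidates = [pos + d for d in offsets]
        diffs = [rot_idf(goal_view, render(c))[1].min() for c in candidates]
        if min(diffs) >= here:
            return False      # stuck in a local minimum away from the goal
        pos = candidates[int(np.argmin(diffs))]
    return False
```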

Fig. 4

Using image difference functions to map the navigational information in different habitats. a Panoramic images were rendered on 3D grids within 3D models of a dense woodland scene (left) and an open grassy woodland (right). b The catchment volumes, color-coded by height above ground, of a panoramic snapshot taken 5 m above ground in the two scenes. Catchment volumes are defined as the volume of space from within which a gradient descent in image differences successfully reaches the reference location. a and b modified after Murray and Zeil 2017. c The horizontal extent of the catchment areas of nest-oriented snapshots (transIDF) close to two Myrmecia croslandi nests in an urban park. Color code units are sums of squared pixel differences. Red and blue lines show the homing paths of ants released at different compass bearings 10 m away from the nest. Red paths are by ants that were caught returning to the nest (zero-vector ants); blue paths are by ants caught at the base of their foraging trees (marked with yellow stars), which therefore had path integration information but did not use it (full-vector ants). Modified after Stürzl et al. 2015

Global image differences can thus be used to map the visual information that is in principle available in natural navigation environments (Fig. 4c), either by capturing panoramic images directly (Zeil et al. 2003; Narendra et al. 2013; Müller et al. 2018) or by rendering panoramic images in 3D models of natural environments (Stürzl et al. 2015; Murray and Zeil 2017). Drones could be used more systematically in future for mapping navigational information in three dimensions and for reconstructing what navigating insects see (Müller et al. 2018; Polster et al. 2019; Paffhausen et al. 2021), in particular when they are learning the location of their nests (e.g., Stürzl et al. 2015, 2016) and when they explore new environments for the first time (e.g., Degen et al. 2015; Osborne et al. 2013; Woodgate et al. 2016).

Image representation and the information content of views

The properties of image difference functions have a number of interesting consequences for how views may be processed and stored in (insect) brains. First, resolution is not an issue, and representing views in low resolution may actually be an advantage, because the width of IDFs becomes larger, meaning the catchment area becomes wider, when scenes are low-pass filtered (Fig. 5a, b; Stürzl et al. 2015; Wystrach et al. 2016). At least the directional information provided by panoramic views (i.e., the rotIDF) can be recovered even from a 1° wide, low-pass filtered strip of the scene (Fig. 5c) acting like a barcode, as long as the full panorama is covered (Wystrach et al. 2016). Because of this, very coarse and sparsely distributed filters, including sparse motion signal distributions (Zanker and Zeil 2005), can be used to store views and subsequently determine whether a currently experienced view is familiar based on the minimum of the rotIDF (Fig. 5d, e; Baddeley et al. 2011, 2012). Such coarse and sparsely distributed filters have been found in the Drosophila central complex (Seelig and Jayaraman 2013) and have been shown to be involved in place learning (Ofstad et al. 2011) and in determining heading direction relative to the landmark panorama (Seelig and Jayaraman 2015); their activity can represent both the rotIDF and the transIDF (Fig. 5f; Dewar et al. 2015).
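The barcode idea can be sketched by reducing the panorama to a narrow, low-pass filtered azimuthal strip before computing the rotIDF; the strip position and blur width below are illustrative assumptions, not the parameters used by Wystrach et al. (2016):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def horizon_barcode(panorama, horizon_row, strip_height=1, blur_sigma=10.0):
    """Reduce a panorama to a narrow, low-pass filtered azimuthal strip.

    `panorama` is an elevation x azimuth array and `horizon_row` the row
    index of the horizon. The strip is blurred only along azimuth, with
    wrap-around because azimuth is circular. Its rotIDF (see the earlier
    sketch) can still show a clear minimum as long as all 360 deg of
    azimuth are covered.
    """
    strip = np.asarray(panorama, dtype=float)[horizon_row:horizon_row + strip_height, :]
    return gaussian_filter(strip, sigma=(0.0, blur_sigma), mode='wrap')

# angles, idf = rot_idf(horizon_barcode(memorized_view, row),
#                       horizon_barcode(current_view, row))
```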

Fig. 5

The information provided by panoramic views does not depend on image representation. a A panoramic view seen at different spatial resolutions. b The rotational image difference functions (rotIDF) for each of the views shown in a. c The rotIDFs for a 1° wide horizontal strip of a panoramic scene and of its low-pass filtered versions. d The rotIDF of a scene filtered by a bank of coarse Haar-like feature detectors (see Baddeley et al. 2011); Zanker and Zeil, unpublished data. e The rotIDF of a scene represented by the motion signal distribution generated by a brief rotation (Zanker and Zeil, unpublished data; see Zanker and Zeil 2005). f A filter bank of coarse filters modeled by Dewar et al. (2015) after ring-neuron receptive fields characterized by Seelig and Jayaraman (2013) in Drosophila

Besides the coarse filters discovered in Drosophila, it is presently not known how views are represented in the brains of insects, in particular of central place foragers such as ants, bees and wasps. As pointed out earlier, views may be represented in the local or global spatial frequency domain (Stürzl and Mallot 2006; Meyer et al. 2020; Stone et al. 2018; Sun et al. 2020) or by feature extraction (e.g., Fleer and Möller 2017). Activities in any of the diverse filter banks present in the medulla and lobula complex of insects—orientation, wavelet, color or motion filters—can potentially represent the navigational information provided by the visual panorama. Indeed, there is evidence that ground-nesting wasps (Zeil 1993b), ground-nesting bees (Brünnert et al. 1994), honeybees (Lehrer and Collett 1994) and bumblebees (Dittmar et al. 2010) can make use of motion parallax cues to determine the distance of landmarks close to the nest, suggesting that they generate, memorize and use motion signal distributions (Zeil 1993b; Dittmar 2011). In addition, pre-processing of views is likely to involve spectral or spectral contrast processing, in particular UV-green contrast, which renders views invariant to illumination changes and provides high contrast between the landmark panorama and the sky (Möller 2002; Kollmeier et al. 2007). UV contrast is crucial for ants determining heading direction from the rotIDF (Schultheiss et al. 2016) and makes place recognition in outdoor robotics experiments robust against changes in illumination (Stone et al. 2016).
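A minimal sketch of such spectral pre-processing is a normalized UV-green opponency image; the channel names below are hypothetical receptor images and the formula is a generic opponency measure, not the specific encoding proposed in the cited studies:

```python
import numpy as np

def uv_green_contrast(uv, green, eps=1e-6):
    """Normalized UV-green opponency.

    Sky pixels score high and terrestrial features score low, largely
    independently of overall illumination, so the skyline separating the
    landmark panorama from the sky becomes a stable, high-contrast
    feature. `uv` and `green` are hypothetical receptor-channel images of
    the same shape.
    """
    uv = np.asarray(uv, dtype=float)
    green = np.asarray(green, dtype=float)
    return (uv - green) / (uv + green + eps)
```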

Acquiring views

At the beginning of their foraging careers, individuals of central place foraging ants, bees and wasps engage in a series of excursions around the nest and in the wider environment that have been called learning flights, learning walks and orientation or exploration flights (reviewed in Zeil et al. 1996; Collett and Zeil 2018; Zeil and Fleischmann 2019). These learning routines are also performed by experienced foragers after the visual appearance of the nest environment has been modified or after a returning insect has been forced to search for the nest by a local disturbance, such as small objects displaced by wind or covers placed over the nest entrance by researchers (e.g., Zeil 1993a).

In the vicinity of the nest, the insects pivot around the nest entrance, systematically experiencing views across the nest location from different compass directions. The goal anchor of these pivoting movements is provided by path integration in the case of ants (Müller and Wehner 2010) and by visual tracking and potentially also path integration in bees and wasps (Zeil 1993b; Samet et al. 2014; Schulte et al. 2019). While bees and wasps keep the nest entrance in the left or the right visual field on alternate loops in opposite directions around the nest, ants walk in one direction at a time and alternate between looking in the direction of the nest and looking in the opposite direction (Fig. 6a). Both learning walks and learning flights have a distinct and very regular spatio-temporal organization, with segments of pure translation approximately perpendicular to the home vector direction alternating with saccadic gaze changes, which in the case of the learning flights of ground-nesting wasps keep the retinal position of the goal at about 45° in the lateral visual field (Zeil 1993a; Zeil et al. 1996; Stürzl et al. 2016), or in the frontal visual field in the case of social wasps and bumblebees (Fig. 6b; Collett 1995; Collett et al. 2013b). In the case of learning walks, segments of translation alternate with head and body rotations, first toward the nest, followed by a 180° rotation in the opposite direction (Fig. 6a; Jayatilaka et al. 2018; Zeil and Fleischmann 2019). It is interesting to note that learning flights are 5D events, taking place in x, y, z, t and gaze direction, while learning walks are 4D events, taking place in x, y, t and gaze direction. Considering that ants operating in 4D have evolved from wasps operating in 5D, it is a challenge to understand how they coped with the transformation from 5D to 4D.

Fig. 6

The acquisition of views. a A learning walk segment of a Myrmecia croslandi ant returning to the nest. Arrows indicate the gaze directions of the ant every 40 ms, as determined by recording head orientation. Elongated colored arrows show instances where the ant looks into the nest direction (red) and into the opposite direction (blue). Modified after Jayatilaka et al. 2018. b The learning flight of a social wasp at a feeder. Open circles mark the position of the head every 40 ms and the lines attached to them the orientation of the wasp’s longitudinal body axis. Elongated red arrows mark instances where the wasp looks at the feeder. Modified after Collett and Lehrer 1993. c View of a 3D model of the nest environment of a Cerceris wasp that was generated from a series of reconstructed learning flight views (see Stürzl et al. 2015) processed with Pix4Dmapper (Pix4D, Lausanne, Switzerland)

The behaviors of insects performing learning walks and learning flights give the distinct impression of a systematic sampling of the visual scene around the nest, so that early naturalists described them as ‘locality studies’ (Peckham and Peckham 1905) that allow insects to gather information about the nest location relative to the visual scene. Many observations and experiments have shown that these learning routines are about place information: for instance, learning flights and walks are also made after foragers have discovered a new food location (e.g., ants: Nicholson et al. 1999, Müller and Wehner 2010; social wasps: Collett and Lehrer 1993, Collett 1995; honeybees: Lehrer and Collett 1994; bumblebees: Robert et al. 2017, 2018); when honeybee hives are transported to new locations, apiarists have to make sure that foragers perform learning flights, because otherwise the bees would end up in the old location (Wolf 1926); the search for the nest of homing ground-nesting wasps can be shifted by shifting the pivoting point of their preceding learning flights with a movable and visually high-contrast collar around the nest entrance (Zeil 1993a); and homing with the aid of visual landmarks gradually improves with the number and range covered by successive learning walks (Fleischmann et al. 2016; Deeti and Cheng 2021).

What do insects learn during learning flights and learning walks? For naïve foragers, which are exposed to the environmental light field for the first time, learning walks and learning flights serve to calibrate their celestial compass systems (Grob et al. 2019). But in addition, there are clear opportunities to associate the visual appearance of the nest (in the case of flying insects) and the current state of the home vector as computed by the path integration system with the panoramic scene from different compass directions (Müller and Wehner 2010; Graham et al. 2010). During learning walks, ants repeatedly turn toward and away from the nest direction and thus would be able to associate the landmark panorama with the current length and direction of the home vector both when gaze direction and home vector are aligned and when they are not aligned. The choreography of these learning routines generates additional information: the linear motion parallax generated by the translational movements of the insects causes close objects to stand out against a stationary distant background (Boeddeker et al. 2010, 2015; Braun et al. 2012; Lobecke et al. 2018), while the pivoting parallax created either by smooth counterturning or by integrating view changes across path segments will emphasize objects close to the nest at the pivoting center against background features that move at the pivoting speed across the visual field (Zeil 1993a, b; Zeil et al. 1996; Voss and Zeil 1998; Riabinina et al. 2014; Doussot et al. 2021). If views and view changes are combined, learning flights, and probably also learning walks, would potentially allow the insects to build 3D models of their nest environment. This can be shown by feeding reconstructed learning flight views of ground-nesting wasps into camera-based modeling software (Pix4Dmapper, Pix4D, Lausanne, Switzerland; Stürzl et al. 2015), which results in a local 3D model of the immediate nest environment (Fig. 6c; see also Baddeley et al. 2009). The distinct vertical height oscillations during learning flights (Zeil 1993a; Lobecke et al. 2018) may play a role in distance-scaling not only the insects’ odometer, but also such allocentric representations (Bergantin et al. 2021; Doussot et al. 2021).

It is worth noting that the action-dependent sensory activity patterns generated during learning flights and learning walks automatically select the dominant navigational cues and the mechanisms that can subsequently be used for homing: in a featureless environment, such as the salt flats inhabited by some Cataglyphis ants, any rotation of the insect will lead to changes in the activity patterns of receptors sensitive to the magnetic field and of photoreceptors in the dorsal visual field (including the ocelli), in particular those sensitive to the plane of polarization of light. The pattern of celestial light does not change when the insect translates, so that the rotIDFs of particular views do not deteriorate with distance from their reference locations. As an ant moves further and further away from the nest, looking back in the nest direction from different distances but with similar bearings, she will experience the same depth of the rotIDF as she turns, independent of distance. In such a landscape, there is no transIDF gradient, because the scene does not change with translation. If, however, there is visual structure in the landscape, with dense visual features provided by nearby objects, then views will change as the insect translates while moving further and further away from the nest. This goes some way toward explaining why ants inhabiting landmark-rich environments rely less and less on path integration, depending on the navigational information provided by their habitat (Narendra 2007; Buehlmann et al. 2011; Cheng et al. 2012; Cheung et al. 2012; Narendra et al. 2013).

In landmark-rich habitats, ants travel and memorize idiosyncratic routes and can recapitulate them with stunning accuracy (Wehner et al. 1996; Kohler and Wehner 2005; Mangan and Webb 2012). This ability can be modeled by assuming that the insects learn views along the route and can retrace their steps by monitoring the familiarity (based on the rotIDF) of currently experienced views (e.g., Baddeley et al. 2011, 2012; Ardin et al. 2016). However, the rules governing the acquisition of route views are not known. It would make sense if acquisition were driven by view changes rather than by temporal or spatial sampling, as has been assumed in some simulation studies (e.g., Baddeley et al. 2012; Differt and Stürzl 2021). There is no point in learning if the scene does not change. Modeling indeed suggests that this may be the automatic result of feeding views through an associative network in which only new features lead to changes in the network (Antoine Wystrach, personal communication).
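A minimal sketch of such familiarity-based route guidance, under the same image-format assumptions as the earlier sketches, scans candidate headings and picks the one at which the current view best matches any stored route view; the scanning resolution and memory format are illustrative assumptions, in the spirit of, but not identical to, the cited models:

```python
import numpy as np

def most_familiar_heading(current_view, route_memories, n_headings=72):
    """Familiarity-based heading choice.

    Rotates the current panoramic view to each of `n_headings` candidate
    directions and returns the heading (deg) at which the lowest RMS
    pixel difference against any stored route view is found, together
    with that familiarity value. Views and memories are elevation x
    azimuth arrays, as in the earlier sketches.
    """
    current_view = np.asarray(current_view, dtype=float)
    n_az = current_view.shape[1]
    best_heading, best_diff = None, np.inf
    for k in range(n_headings):
        shift = int(round(k * n_az / n_headings))
        rotated = np.roll(current_view, shift, axis=1)
        diff = min(np.sqrt(np.mean((rotated - np.asarray(m, dtype=float)) ** 2))
                   for m in route_memories)
        if diff < best_diff:
            best_heading, best_diff = 360.0 * k / n_headings, diff
    return best_heading, best_diff
```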

Using views for navigation

If insects memorize views during learning flights, learning walks, exploration flights or when traveling along new routes, they would be expected to use them on their return along routes or when pinpointing goals by some kind of move-and-compare or gradient descent strategy, as was originally suggested by Cartwright and Collett (1983, 1987) and shown to work in principle with panoramic images in complex natural scenes by Zeil et al. (2003). These alignment and position matching strategies (Collett et al. 2013a) will differ depending on the degrees of freedom of motility available to animals (Dale and Collett 2001): pedestrian ants, for instance, move more or less in a 2D world; they can rotate but, at a constant orientation, can only move forward. Flying insects, in contrast, have more translational degrees of freedom, being able to move sideways and up and down without having to change their orientation. Matching strategies will also depend on the format in which insects store images: some current models of route following in ants implement alignment matching through rotational scanning (to detect the rotIDF minima; Baddeley et al. 2011, 2012), while others assume image representations that are rotation invariant and therefore do not require physical rotation for familiarity detection (Kodzhabashev and Mangan 2015; Stone et al. 2018; Differt and Stürzl 2021; Sun et al. 2020). Whether or not insects are to some degree immune to rotational misalignments between current and remembered views will also determine whether there is a need for tight control of head roll and pitch during visual navigation (Boeddeker and Hemmi 2010; Ardin et al. 2015; Raderschall et al. 2016; Doussot et al. 2021).
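One way to see how a rotation-invariant representation can avoid physical scanning is via the azimuthal amplitude spectrum: because a yaw rotation of a panoramic image is a circular shift along azimuth, the magnitudes of the Fourier coefficients of each image row do not change with rotation. The sketch below is a generic illustration of this idea, not the specific representation used in the models cited above:

```python
import numpy as np

def rotation_invariant_signature(panorama, n_coeffs=16):
    """Rotation-invariant view signature.

    A yaw rotation of an elevation x azimuth panorama is a circular
    shift along axis 1, which leaves the magnitudes of the row-wise
    Fourier coefficients unchanged. Comparing such signatures therefore
    allows view familiarity to be assessed without physically rotating.
    """
    spectrum = np.abs(np.fft.rfft(np.asarray(panorama, dtype=float), axis=1))
    signature = spectrum[:, 1:n_coeffs + 1]   # drop the DC component
    return signature / (np.linalg.norm(signature) + 1e-9)

# familiarity = np.linalg.norm(rotation_invariant_signature(current_view)
#                              - rotation_invariant_signature(memorized_view))
```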

So what do we know about how insects behave when traveling along routes and when pinpointing places? Students of ant navigation have repeatedly noted the scanning movements the insects perform (e.g., Wystrach et al. 2014, 2019, 2020), but their dynamics and scene dependence have rarely been analyzed closely. Wood ants, Formica rufa L., perform large scanning movements when approaching a feeder in an experimental arena with high-contrast patterns associated with the feeder and correct the mismatch between memorized views and the current scene with large saccades (Lent et al. 2010, 2013). Both M. croslandi (Murray et al. 2020; Clement et al. 2022) and the meat ant Iridomyrmex purpureus (Clement et al. 2022) show regular gaze oscillations when running on a trackball outdoors. The amplitudes of their scanning movements change depending on whether they see a familiar or unfamiliar scene, or whether they are placed over the nest (Murray et al. 2020; Clement et al. 2022). It has been suggested that this modulation reflects the interaction between attractive nest-directed view memories and repellent memories of views directed away from the nest, determining a ‘directional drive’ (Murray et al. 2020; Le Moël and Wystrach 2020). The scanning behavior of ants during their learning walks suggests that ants may learn such attractive and repellent views (Jayatilaka et al. 2018; Zeil and Fleischmann 2019).

The scanning behaviors are particularly noticeable in the first foraging trips of naïve ants and when a familiar panorama or route is altered (Wystrach et al. 2014; Islam et al. 2021). Scanning is also observed when experienced foragers are released at a site they have never visited, but which is within the catchment areas of views they know: they look around briefly and then in most cases head off in the direction of their nest (Narendra et al. 2013; Zeil et al. 2014). The ability of both Jack jumper ants (M. croslandi) and desert ants (Cataglyphis velox) to home backwards when loaded with heavy prey items critically depends on their frequent turning around to face in the home direction (Schwarz et al. 2017, 2020). Lastly, and interestingly, night-active bull ants descending from trees engage in yaw, roll and pitch scanning when getting their nest bearing from the local panorama (Freas et al. 2018).

All these observations indicate that ants need to scan the scene to obtain navigational guidance from a comparison between memorized views and what they currently see. What happens during scanning does not seem to be straightforward ‘alignment matching’ (Collett et al. 2013a), as has been observed in experimental arenas where ants correct their heading direction when feeder-associated high-contrast patterns are moved (Lent et al. 2010) or where ants on trackballs realign themselves quickly after large instantaneous rotations of a natural panorama (Fig. 7a; Kócsi et al. 2020). Under natural conditions, the scanning directions in both M. croslandi and M. bagoti are not clearly related to the rotIDF as a measure of familiarity (Zeil et al. 2014; Wystrach et al. 2014). However, scanning amplitudes are modulated by uncertainty: when ants are confronted with an unfamiliar scene, their scanning amplitudes increase, presumably because the scene does not match any memorized view (Murray et al. 2020; Clement et al. 2022). But scanning amplitudes also increase as the ants come close to the nest (Fig. 7b, c) or are tethered on a trackball above the nest location (Murray et al. 2020), presumably because ants close to the nest experience good matches in all heading directions.

Fig. 7

Using views. a Left: The path of a Myrmecia midas ant walking on a trackball inside a panoramic LED arena displaying a familiar view along her foraging corridor. The ant changes heading and gaze direction in response to instantaneous 90° rotations of the panorama (marked by blue dots), as can also be seen in the time course of gaze direction on the right. 15 s before and after the rotation are marked red and blue, respectively, in both path and gaze direction plots. Modified from Kócsi et al. 2020. b The path (top) and gaze directions (bottom) of a Myrmecia croslandi forager returning to the nest, shown when she is ca 2.9 m (left) and ca 0.5 m (right) away from the nest. c Time series and probability densities of gaze and body axis directions and angular velocities for the same two path segments. Ants were filmed at 25 fps with a Sony FDR-AX100E camcorder. d During their learning flights, Cerceris wasps view the nest in the lateral visual field to the left or right while pivoting along alternating clockwise and anti-clockwise arcs around the nest (left). Upon returning, they change flight and gaze direction to the left or to the right when encountering nest-right or nest-left views, respectively. Modified from Stürzl et al. 2016

The degree to which vision also guides the final approach to the nest in ants remains unclear. Simulations using natural scenes and neurally inspired acquisition, storage and recall models (e.g., Le Moël and Wystrach 2020) get agents to within 1 m of a goal location, but pinpointing it requires centimeter accuracy. To solve this task, at least some species of ants are guided by tactile and olfactory cues (Seidl and Wehner 2006; Buehlmann et al. 2012b, 2020).

As far as flying insects are concerned, much work has gone into understanding the relationship between learning flights and return flights at nests and at feeding sites of bees (Hempel de Ibarra et al. 2009; Dittmar et al. 2010; Dittmar 2011; Braun et al. 2012; Philippides et al. 2013; Robert et al. 2017, 2018) and wasps (Zeil 1993a, b; Collett 1995; Stürzl et al. 2016). During their return to a goal, honeybees (Boeddeker et al. 2010; Dittmar et al. 2010), bumblebees (Collett et al. 2013a, b; Hempel de Ibarra et al. 2009; Robert et al. 2018) and wasps (Zeil 1993b; Collett 1995; Stürzl et al. 2016) tend to face in similar directions as during their learning flights. Movement patterns are similar between learning and return flights (Zeil 1993b; Philippides et al. 2013) and, when looked at in detail, return flights have a saccadic structure: rapid gaze changes, which may indicate alignment matching, alternate with sideways movements (Zeil 1993b; Boeddeker et al. 2010; Braun et al. 2012). Such sideways movements generate motion parallax information (Mertes et al. 2014) and possibly help to match image motion patterns experienced during learning flights (Zeil 1993b; Dittmar et al. 2010; Dittmar 2011), but can also be seen as a way to probe translational image difference functions (Zeil et al. 2003; Doussot et al. 2020). Landmarks close to the nest are kept in retinal positions similar to those experienced during learning flights (Zeil 1993b; Collett et al. 2013b) and in some cases appear to serve as beacons: social wasps, for instance, head toward a feeder-defining landmark and subsequently turn so that the retinal position of the landmark is the same as that experienced during their learning flights (Collett 1995). Ground-nesting wasps, as they pivot around the nest entrance during learning flights, keep the nest entrance in the left or right visual field, depending on pivoting direction. Upon returning to the nest, they move left or right when they encounter learning flight views associated with the nest-right or nest-left condition (Fig. 7d; Stürzl et al. 2016).

The homing task differs between pedestrian ants and flying insects, not only because of different movement constraints, notably the ability of flying insects to translate in directions other than the gaze direction (Dale and Collett 2001), but also because of differences in the amount of visual clutter. Ants, with their visual systems close to the ground, have to deal with a lot of visual clutter from objects and vegetation that are easily shifted by wind, rain and big-hoofed animals, and that obscure the more distant panorama and thus offer unreliable features for visual navigation. Flying insects, such as ground-nesting wasps and bees, may be better able to deal with this ground-plane clutter because they have a ‘bird’s-eye’ view of the scene around their nests and a less obstructed view of the wider landmark panorama. These differences would need to be considered when trying to understand the functional significance of the regular and conspicuous rotational scanning movements made by navigating ants along routes and when pinpointing the nest, and of the rotational and translational scanning observed in flying insects approaching a goal (e.g., Zeil 1993b; Boeddeker et al. 2010; Boeddeker and Hemmi 2010; Collett et al. 2013b).

Outlook

We have grown used to considering panoramic images and the information they provide by studying them in un-warped, rectangular forms, because they are computationally convenient. What we would actually need to do, to get an impression of what insect memory centers are confronted with, is to consider scenes projected onto the unit sphere, sampled with the sampling array of the animals we are concerned with, and filtered through their spectral and polarization sensitivities and through their orientation and motion filters. While we now have new ways of mapping the sampling array of compound eyes (Rigosi et al. 2021; Bagheri et al. 2020) and methods that are able to render scenes in the way they would be represented by a compound eye (Stürzl et al. 2010, 2015; Millward et al. 2022), the problem remains that we do not know crucial compound eye parameters for most of the animals we study. Further, we continue to be ignorant about many aspects of insect navigation behavior, in particular beyond the range of our cameras. It would be important to know, for instance, how learning flights turn into exploration flights, and to be able to record flight height, fine-grained flight behavior and gaze directions during these flights. Navigation is an experience-dependent process and it would, therefore, seem crucial to monitor in detail the foraging careers of individual insects, with a view to identifying the opportunities insects have to gather navigation-relevant information and how this shapes the way in which they subsequently use that information. Lastly, we continue to have little insight into the neural dynamics of freely behaving insects under the complex natural conditions in which they operate. In short, while the recent progress in understanding the behavioral, computational and neural basis of insect navigation is exhilarating, we should not underestimate—but also cherish the opportunities offered by—the extent of our ignorance.