Introduction

The underwater acoustic environment is a noisy, complex world, and the seas and oceans around Australia are no exception. From the early publications of Moulton [1] and Cato [2], the reports of different sounds from waters around Australia have grown, whether of biological (mammals [3, 4], fish [5, 6], invertebrates [7,8,9]), geophysical [10] or anthropogenic origin [11,12,13,14]. The marine soundscape is not static, but fluctuates on various overlapping time scales. Antarctic ice calving events, for example, are typically seasonal, as are broad patterns in wind and rain [15,16,17]. Anthropogenic activity, in particular near shore, is often more prevalent in calmer weather. For example, recreational boating increases in the summer, while tropical offshore operations are reduced in the monsoon season.

Fig. 1

Map of CMST recordings taken around Australia, including data sets (>1 month, some totalling decadal periods, red dots), and the locations of the Passive Acoustic Observatories of the Australian Integrated Marine Observing System (IMOS; http://imos.org.au/acousticobservatories.html) shown by black dots. Note some locations include arrays of multiple loggers, such as the IMOS arrays. Deployments have also been conducted in waters near Antarctica and elsewhere around the world

It is not only human activities that are associated with environmental conditions. Faunal distribution, behaviour and therefore their sound production can be driven by variables, such as salinity, temperature, dissolved oxygen or light, and vary on a range of scales, such as solar, lunar or seasonal cycles [18,19,20]. Migrating populations of marine fauna can take months to pass a certain location, calling or singing as they go [22]. In addition to natural environmental drivers, anthropogenic activities might affect vocal behaviour [23]. Thus, important acoustic events may be missed when measured using short-term recordings, and patterns in biological sound misinterpreted, or perhaps only the edge of a long-lasting acoustic episode may be captured. Underwater acoustic measurements of the marine environment over periods of months to years require pre-programmed, autonomous loggers, and a number of research and commercial groups have responded to this need by developing systems for this purpose.

The Centre for Marine Science and Technology (CMST) has studied underwater sound since the mid-1970s, when John Penrose was investigating sounds of western rock lobster (Panulirus longipes) [7]. This work increased in the 1980s, a time when the most common method of obtaining recordings was to hang a hydrophone over the side of a vessel, attach it to a pre-amplifier and use a then “top-of-the-range” recorder. An alarm clock to wake the scientist acted as the timer. Reel-to-reel tape decks were replaced by cassette tapes with 90-min recording times, before digital tapes captured data for up to 4 h. Initially, tape decks were attached to timers using devious contraptions to extend recording from hours to days, before hardware developed rapidly in the late 1990s. At that time, with seed support from the Defence Science and Technology Organisation (DSTO, now DST Group), CMST developed a long-term recorder with the digital technology available over 1999–2001. The loggers have now been deployed over 600 times, in environments ranging from shallow rivers, estuaries and coral reefs, to the deep ocean off Antarctica (Fig. 1). This article provides a brief description of the current iteration of the CMST-DSTO underwater sound recorders (USRs) and presents some case studies of their use and the information they have provided.

Fig. 2

Photograph of CMST-DSTO underwater sound recorder, dismantled showing battery pack, processing unit and housing (left) and assembled, together with stabilising T-bar (right)

System Description

Detection of acoustic pressure variation begins with an external hydrophone, in our case a High Tech Inc. HTI U90, Massa TR1025C or on occasions a General Instruments C32, which are factory calibrated from 2 Hz to 20 kHz and with typical sensitivity (without pre-amplifiers) of around −196 dB re \(1\,\hbox {V}/\upmu \hbox {Pa}\) (−200 dB re \(1\,\hbox {V}/\upmu \hbox {Pa}\) for the GI C32). These are comparatively high-capacitance hydrophones (\(\approx \)13, 40 and 450 nF, respectively) which is important in setting the roll-off due to the hydrophone-to-pre-amplifier impedance difference; in general, the larger the hydrophone capacitance, the lower the frequency below which the system gain will roll off due to the impedance mismatch. A high hydrophone capacitance is also beneficial for reducing low-frequency noise. The hydrophone cable enters a stainless steel housing end cap via a hard-wired bulkhead connector. Several underwater connectors were trialled, but failed insidiously over time, as a result of low-level corrosion in the wires that changed the hydrophone impedance “seen” by the pre-amplifier and hence the system’s frequency response. This was solved by replacing the underwater connectors with hard-wired penetrators. The logger housing comprises a 1-m-long, 100-mm-inner-diameter stainless steel tube of 10-mm wall thickness that holds the logger and battery pack, with one sacrificial anode located on the bottom of the housing, one in the middle and a third on the end cap with the hydrophone. The standard housing is rated to 800 m depth, and other housings of deeper rating have been developed and tested. For deployments on the seafloor, one or more stabilising bars (1-m-long, 15-cm-wide acrylic bars with 2 kg weights at each end) are fitted to the housing (Fig. 2). These bars aid in logger positioning on the bottom and minimise housing movement due to current or surge during recording. 
The hydrophone cables are weighted with lead sinkers to prevent them from moving in currents.

A pre-amplifier is fitted into the end cap such that it is electrically shielded, with electronics attached to the end cap and above the battery packs (Fig. 2). The recorder requires a main 9 V battery for the electronics and hard disk drive and a separate 9 V battery for the pre-amplifier. The batteries consist of 14 packs, each comprising six alkaline D-cells in series, with multiple packs connected in parallel through diodes. A single, isolated battery pack powers the hydrophone pre-amplifier, while 13 packs power the system electronics. In addition, a standard CR2032 3.3 V lithium battery provides backup power to the RAM and a real-time clock. A standard \(180\,\hbox {mA}\,\hbox {h}\) CR2032 battery can supply power for approximately 2 years of suspended power requirements; however, as with the main battery packs, these batteries are replaced after every deployment of significant duration.
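The battery figures above can be checked with some simple arithmetic. The sketch below assumes a nominal 1.5 V per alkaline cell (not stated in the text) and uses hypothetical function names; it shows that six cells in series give the 9 V pack voltage and estimates the average backup current implied by a 180 mA h cell lasting about 2 years.

```python
HOURS_PER_YEAR = 365 * 24  # leap years ignored for this rough estimate


def pack_voltage(cells_in_series=6, cell_voltage=1.5):
    """Nominal voltage of one pack of cells wired in series.

    The 1.5 V nominal cell voltage is an assumption for illustration.
    """
    return cells_in_series * cell_voltage


def implied_average_current_ua(capacity_mah, lifetime_years):
    """Average current (microamps) that drains a cell of the given
    capacity over the given lifetime."""
    hours = lifetime_years * HOURS_PER_YEAR
    return capacity_mah / hours * 1000.0  # mA -> uA


if __name__ == "__main__":
    print("pack voltage (V):", pack_voltage())
    print("implied backup draw (uA):",
          round(implied_average_current_ua(180, 2), 2))
```

On these assumptions the backup circuit would draw on the order of 10 microamps on average.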

The pre-amplifier is diode protected, with an input voltage range of \({\pm }\)2.5 V. This protection is needed because a disconnected, high-capacitance hydrophone left in the sun can build up a static voltage and can damage the pre-amplifier upon reconnection. The input signal is amplified using the impedance-matching pre-amplifier (0–20 dB gain) and then passed through a high-pass filter with a low-frequency roll-off starting at 8 Hz (loss increasing with decreasing frequency), which flattens the naturally high levels of low-frequency ocean noise and increases the system’s dynamic range. Amplitude resolution is 16 bits, yielding a nominal 90 dB dynamic range. A variable bandwidth filter, designed to provide anti-aliasing for the analogue-to-digital converter, is applied to the amplified signal. Each input channel is passed through a user programmable gain stage with a gain range of 0–20 dB. The signal is then fed to a sixth-order Butterworth active filter which has a programmable upper frequency limit. The signal is sampled according to a pre-programmed sampling schedule, writing samples to a small flash card. When the small flash card is nearly full, files are transferred to an IDE formatted hard disk or flash card of 123 GB capacity (set by the system’s 32-bit operating system). By writing to a small-capacity flash card first, the system avoids electronic noise artefacts from the operating hard disk, increases battery life (alkaline batteries last significantly longer when cycled as opposed to run continuously) and reduces the time to read the small flash card’s file allocation table (FAT). (For large flash cards which are nearly full, there are significant time costs associated with reading the FAT.)
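To illustrate the role of the sixth-order Butterworth anti-aliasing filter, the sketch below evaluates the textbook magnitude response of an ideal low-pass Butterworth filter, together with the ideal dynamic range of an N-bit quantiser. The corner frequency used in the example (2 kHz against a 6 kHz sample rate) is a hypothetical value chosen for illustration, and the actual analogue filter will deviate from the ideal response.

```python
import math


def butterworth_attenuation_db(freq_hz, corner_hz, order=6):
    """Attenuation (positive dB) of an ideal low-pass Butterworth filter.

    Uses |H(f)|^2 = 1 / (1 + (f / fc)^(2n)), the standard Butterworth
    magnitude response; it is about 3 dB at the corner frequency.
    """
    ratio = freq_hz / corner_hz
    return 10.0 * math.log10(1.0 + ratio ** (2 * order))


def ideal_dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in dB."""
    return 20.0 * math.log10(2 ** bits)


if __name__ == "__main__":
    corner_hz = 2000.0         # hypothetical corner frequency
    nyquist_hz = 6000.0 / 2.0  # Nyquist frequency for a 6 kHz sample rate
    print("attenuation at corner (dB):",
          round(butterworth_attenuation_db(corner_hz, corner_hz), 2))
    print("attenuation at Nyquist (dB):",
          round(butterworth_attenuation_db(nyquist_hz, corner_hz), 2))
    print("ideal 16-bit range (dB):", round(ideal_dynamic_range_db(16), 1))
```

The ideal 16-bit figure is an upper bound of about 96 dB; the nominal 90 dB quoted above is lower, as expected of a practical system.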

Fig. 3

Example frequency response curves for the performance of the USRs, showing four receivers, two with Massa hydrophones (\(\approx \)40 nF capacitance, shown by curves which roll off at 20 Hz) and two with High Tech U90 hydrophones (12–13 nF capacitance, curves which roll off at 100 Hz). The white noise fell below the system noise at frequencies below \(\approx \)1.5 Hz due to the applied low-frequency roll-off

Files are written with text header and footer information and unsigned 16-bit binary data (the system uses a 0–5 V rail). The file name of each sample is the file opening time coded in hexadecimal seconds from 01 January 1970. The file header and footer information carries the sampling setup, the date–time sampling started and the finish date–time of that sample. The file opening time is not the same as the file start time; the file start time is the time, according to the system clock, at which sea noise sampling starts. Saved files can be read directly using a PC and are analysed with purpose-written Matlab code.
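A minimal sketch of how the file naming and sample encoding described above might be decoded is given below. It assumes that the file name stem is a plain hexadecimal number of seconds, that the clock is in UTC, and that the unsigned 16-bit counts map linearly onto the 0–5 V rail; the function names and the example file name are hypothetical, and the header and footer layout is not covered.

```python
from datetime import datetime, timezone
from pathlib import PurePath


def decode_open_time(file_name):
    """Convert a hex-seconds file name (e.g. '00015180.DAT') to a datetime.

    Assumes the stem is hexadecimal seconds since 01 January 1970 (UTC).
    """
    stem = PurePath(file_name).stem
    seconds = int(stem, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def counts_to_volts(count, rail_volts=5.0, bits=16):
    """Map an unsigned ADC count onto the rail, assuming a linear scale."""
    return count * rail_volts / (2 ** bits)


if __name__ == "__main__":
    print(decode_open_time("00015180.DAT").isoformat())
    print(counts_to_volts(32768))
```

Note that this gives the file opening time only; the start of sea noise sampling has to be read from the header, as explained above.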

System clocks (quartz crystal) are temperature dependent, and the USRs are deployed into conditions that can vary significantly, from Antarctic temperatures of a few degrees Celsius to tropical waters of \(>30^{\circ }\hbox {C}\). System clocks are set to GPS transmitted UTC time before deployment, and the clock drift is read after retrieval using internal hardware, a GPS Genius (Rojone Pty. Ltd., G1 series) and software. Clock absolute accuracy, when corrected for drift, is \({\pm }{\approx }0.25\,\hbox {s}\), with the error due to clocks jumping because of the temperature differences at deployment and recovery. Systems are calibrated for their frequency-gain response pre- and post-deployment with broadband white noise (typically either −90 or −110 dB re 1 \(\upmu \hbox {Pa}^{2}/\hbox {Hz}\)) inserted in series with the hydrophone. This form of calibration is also called “insert voltage calibration” [24]. Calibration recordings are typically conducted over a 15-min period while the system is electrically and acoustically isolated. By incorporating white noise into the recording system in series with the hydrophone, all of the system impedance matches are correctly dealt with, i.e. hydrophone to pre-amplifier and pre-amplifier to digitising electronics. This feature of the CMST-DSTO USRs avoids the common, and often incorrect, assumption of a flat system frequency response, which is unlikely to be the case below 100–200 Hz (and sometimes up to 1 kHz for systems with poor impedance matches in hydrophone to pre-amplifier and pre-amplifier to digitising electronics). An example of how hydrophone capacitance can alter the system gain curve and its roll-off is shown in Fig. 3. Note that such insert voltage calibration has recently become the Grade A standard in underwater acoustic recordings (ANSI/ASA S12.64-2009/Part 1, ISO 17208-1:2016). Calibration data (gain versus frequency) are combined with the hydrophone sensitivity to provide received spectral levels or to calibrate signals in the time domain directly.
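The combination of calibration data and hydrophone sensitivity can be sketched in the decibel domain as follows. This is an illustration under stated assumptions, not the CMST processing code: the gain curve values are hypothetical, the gain is linearly interpolated between calibration points, and the recorded level is assumed to be expressed in dB re 1 V²/Hz at the digitiser.

```python
from bisect import bisect_left


def interpolate_gain_db(freq_hz, gain_curve):
    """Linearly interpolate system gain (dB) from (frequency, gain) pairs.

    gain_curve must be sorted by frequency; values outside its range are
    clamped to the nearest end point.
    """
    freqs = [f for f, _ in gain_curve]
    gains = [g for _, g in gain_curve]
    if freq_hz <= freqs[0]:
        return gains[0]
    if freq_hz >= freqs[-1]:
        return gains[-1]
    i = bisect_left(freqs, freq_hz)
    f0, f1 = freqs[i - 1], freqs[i]
    g0, g1 = gains[i - 1], gains[i]
    return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)


def received_psd_level_db(recorded_db_re_v2_hz, system_gain_db,
                          hydrophone_sensitivity_db):
    """Received level in dB re 1 uPa^2/Hz.

    Removes the system gain, then converts volts to pressure with the
    hydrophone sensitivity (dB re 1 V/uPa, a negative number).
    """
    return recorded_db_re_v2_hz - system_gain_db - hydrophone_sensitivity_db


if __name__ == "__main__":
    curve = [(10.0, 5.0), (100.0, 18.0), (1000.0, 20.0)]  # hypothetical
    gain = interpolate_gain_db(55.0, curve)
    print(received_psd_level_db(-120.0, gain, -196.0))
```

With a sensitivity of −196 dB re 1 V/µPa and 20 dB of gain, a recorded level of −120 dB re 1 V²/Hz corresponds to 56 dB re 1 µPa²/Hz.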

The loggers were designed to be as flexible as possible with regard to programming and deployment. The USRs can be configured at base or just prior to deployment, usually with a delayed start time, then sealed in the housings and deployed. They can be shipped as separate parts, and once programmed, a logger retains its sampling regime, which is followed when the logger is booted by plugging in the main battery, so no communication with the logger is required after setup in the laboratory. Communication with the logger is conducted via an RS232 serial link using HyperTerminal or similar terminal software. All programming and drive formatting is conducted through HyperTerminal. Deployment configurations or duty cycles vary, with a widely used setup being 300-s (5 min) samples every 15 min (33% duty cycle), as in the passive acoustic observatories of Australia’s Integrated Marine Observing System (IMOS). This recording schedule allows up to 350–365 days of sampling with the standard alkaline battery packs. While the instruments can be used at much higher duty cycles and sampling frequencies, the deployment duration decreases, requiring more frequent redeployment in the field. The 12-month turnaround works well in the IMOS situation. The systems can be programmed for one sample rate, multiple sample rates (e.g. a 6 kHz sample every 15 min and an 18–20 kHz sample once/day is common) or split gains where one byte has one gain and the alternate byte has a different gain (giving two channels at half the designated sample rate, with independent gains).
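The duty cycle and the resulting data volume for a given schedule can be estimated with a few lines of arithmetic. The sketch below assumes a single channel of 16-bit (2-byte) samples and ignores header and footer overhead; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def duty_cycle(sample_s, interval_s):
    """Fraction of time spent recording."""
    return sample_s / interval_s


def bytes_per_day(sample_s, interval_s, rate_hz, bytes_per_sample=2):
    """Raw data volume per day for one channel, ignoring file overhead."""
    samples_per_day = SECONDS_PER_DAY // interval_s
    return samples_per_day * sample_s * rate_hz * bytes_per_sample


if __name__ == "__main__":
    # 300 s every 15 min at a 6 kHz sample rate, as in the example above
    print("duty cycle:", round(duty_cycle(300, 900), 3))
    daily = bytes_per_day(300, 900, 6000)
    print("GB per day:", daily / 1e9)
    print("GB per 365 days:", daily * 365 / 1e9)
```

For the 300 s every 15 min schedule at 6 kHz, this gives a few hundred megabytes per day, which is of the same order as the storage capacity mentioned earlier when accumulated over a year.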

The basic operations card is a Persistor Instruments Incorporated CF1 card designed as a processing platform for instrumentation applications. It utilises a Motorola MC68CK338 processor with on-board flash and RAM preloaded with a BIOS and a DOS. The application program is loaded into an on-board flash card, and the BIOS is instructed to run this at start up.

The *.DAT files recorded by the loggers can be read into multiple programming packages, though CMST prefers to process them in Matlab. Here, analysis is conducted on multiple scales, typically either through purpose-designed suites of functions, or CMST’s CHORUS package [25]. Initial analysis of a long-term acoustic data set comprises several stages:

  1.

    The data are assembled in a single directory and the header and footer information from each file is read, the sample date-time extracted, the date-times sorted and a log file created which contains the file name, the date-time of the start and end of sampling (time of sea noise sampling not file opening) and various sample parameters. This file can then be interrogated to locate file names (samples) by number or time.

  2.

    The power spectral density (PSD) of the soundscape (or noise spectrum) is calculated for each recording at several frequency resolutions and saved. The PSD is corrected for the frequency response of the acoustic recording system derived from the calibration data and from the hydrophone sensitivity, so that the noise spectra and spectral levels are represented in absolute values (\(\upmu \hbox {Pa}^{2}/\hbox {Hz}\) and dB re 1 \(\upmu \hbox {Pa}^{2}/\hbox {Hz}\), respectively). These spectra are used to plot long-term average spectrograms of underwater sound. Such spectrograms are then used to visually review the main features of the local soundscape and their long-term variations in the next stage of data analysis. Various options are available for removing noise artefacts which are sometimes present.

  3.

    Long-term data are visualised in low temporal resolution soundscape spectrograms and particular recordings selected, based on spectrogram features of interest. Waveforms and spectrograms within the individual recording are then analysed at a finer temporal scale, all conducted using a graphical user interface developed in Matlab (an example of this software architecture is available on the IMOS website https://acoustic.aodn.org.au/acoustic/). During this stage, the time–frequency characteristics of any sounds are investigated in more detail using high-resolution spectrograms. Moreover, a de-spiking filter and low- and high-pass filtering can be applied to the signal to reduce the effect of snapping shrimp noise and to extract signals of interest (e.g. dolphin clicks and whistles). An auto-regression method is used to locate the impulses of snapping shrimp sound and to remove these impulses and interpolate the signal waveform within the resulting gaps. Such an approach to signal de-noising in post-processing allows the preservation of spectral characteristics of other noises recorded on the logger. A comparison between raw and processed recordings can be made during analysis for quality assurance which, given the number of sound sources that remain to be identified, is a necessity to ensure signals are not incorrectly discarded.
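The first stage above, building a time-sorted log that can be interrogated by sample number or time, can be sketched as follows. This is a minimal Python illustration rather than the Matlab code used by CMST; the record layout (file name, sampling start, sampling end) and the example file names are assumptions, and the header parsing itself is omitted.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def build_log(records):
    """Sort (file_name, start, end) records by sampling start time."""
    return sorted(records, key=lambda record: record[1])


def files_between(log, t0, t1):
    """File names whose sampling start lies in the closed interval [t0, t1]."""
    starts = [record[1] for record in log]
    lo = bisect_left(starts, t0)
    hi = bisect_right(starts, t1)
    return [record[0] for record in log[lo:hi]]


if __name__ == "__main__":
    # Hypothetical records; in practice these would come from file headers.
    records = [
        ("C.DAT", datetime(2013, 1, 1, 0, 30), datetime(2013, 1, 1, 0, 35)),
        ("A.DAT", datetime(2013, 1, 1, 0, 0), datetime(2013, 1, 1, 0, 5)),
        ("B.DAT", datetime(2013, 1, 1, 0, 15), datetime(2013, 1, 1, 0, 20)),
    ]
    log = build_log(records)
    print(log[1][0])  # locate a sample by number (zero-based index)
    print(files_between(log, datetime(2013, 1, 1, 0, 10),
                        datetime(2013, 1, 1, 0, 40)))
```

Sorting once and searching with bisection keeps look-ups fast even for deployments containing tens of thousands of samples.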

The USRs are deployed in a variety of environments and water depths, targeting stationary and mobile sound sources at all levels of the water column. As a result, numerous deployment configurations have been used, with retrieval systems to match (Fig. 4). Retrieval of bottom-mounted systems is conducted primarily through the activation of an acoustic release (typically EdgeTech ORE CART), which raises a ground line (\(\approx \)2 times the water depth in length) to the surface (Fig. 4). The floats attached to the acoustic release are then brought on board, and the remainder of the mooring, including the recorder, raised to the surface. On occasion, the acoustic release system may be unsuccessful for a number of reasons, including biological growth over the release or tangling of the ground line or float line, either during deployment or due to current movement of the mooring. In these conditions, grappling across the ground line and hauling the entire mooring is used as a secondary method of retrieval. In deep water (\({>}\)100 m), the mooring configuration shown in Fig. 4 cannot be recovered if the current is greater than approximately 1.5 knots, as the drag on the surface buoys pulls the buoys underwater and it is difficult and dangerous to grapple in strong currents. When set correctly, the mooring anchoring system will hold the mooring in place indefinitely after it is released, and we have had instances of releasing a mooring in a strong current, being unable to recover it, then returning several months later to find the floats at the surface on-location.

Fig. 4

Typical deployment configurations for the CMST-DSTO USRs

The noise loggers have been set multiple times in a “tracking” configuration. In these instances, four loggers are ideally set, with three in an equilateral triangle and one in the centre, with triangle sides of 1–6 km depending on the species of interest. The intention is that these loggers will sample with some overlap, and so data from the four loggers can be used in a time-of-arrival difference fashion to localise sources, across deployment durations (of up to a year). To synchronise the instruments, which contain on-board clocks that all drift at different rates, the central mooring has an acoustic release modified to ping at 7.5 kHz every 20 s for 30–35 min, once per day. The USRs are all programmed to collect one high-frequency sample (20 or 22 ksps sampling rate), once per day along with the normal samples made, at a time overlapping the acoustic release pings. The relative arrival time of the pings at three loggers compared with a fourth in the high-frequency sample allows the synchronisation of USRs once per day.
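The daily synchronisation step can be sketched as a simple calculation of relative clock offset from ping arrival times. The sketch assumes straight-line propagation, a constant sound speed of 1500 m/s and known ranges from the pinger to each logger; none of these values is taken from the text, and the function names are hypothetical.

```python
from statistics import median


def relative_clock_offset_s(arrival_logger_s, arrival_reference_s,
                            range_logger_m, range_reference_m,
                            sound_speed_m_s=1500.0):
    """Clock offset of a logger relative to the reference logger.

    The observed arrival time difference is the sum of the propagation
    delay difference and the clock offset, so subtracting the expected
    delay leaves the offset.
    """
    expected_delay_s = (range_logger_m - range_reference_m) / sound_speed_m_s
    return (arrival_logger_s - arrival_reference_s) - expected_delay_s


def daily_offset_s(ping_pairs, range_logger_m, range_reference_m):
    """Median offset over many pings, to reduce the effect of outliers.

    ping_pairs holds (arrival at logger, arrival at reference) times.
    """
    return median(
        relative_clock_offset_s(t_log, t_ref, range_logger_m,
                                range_reference_m)
        for t_log, t_ref in ping_pairs
    )


if __name__ == "__main__":
    # Pings every 20 s; logger 3000 m from the pinger, reference alongside it
    pairs = [(12.5, 10.0), (32.5, 30.0), (52.6, 50.0)]
    print(daily_offset_s(pairs, 3000.0, 0.0))
```

Repeating this once per day gives a series of offsets from which the relative drift of each clock can be tracked over the deployment.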

Case Studies

To date, the USRs have been deployed over 600 times in riverine, estuarine, nearshore, offshore, coral reef and deep-water environments, acquiring over 50 TB of data around Australia and internationally. If “taped” together this would equate to roughly 100 years of continuous recording, or 300 years of elapsed time once the typical duty cycle is taken into account. Several locations around Australia have had deployments maintained nearly continuously for over a decade [19, 20], with some locations containing multiple sites [20, 26, 27]. These studies have focussed on biotic and abiotic sound sources.

Off Portland, Victoria

CMST has been recording underwater sound at the IMOS passive acoustic observatory off the Portland coast since 2009. USRs have been deployed on the seafloor at \(\approx \)160–180 m depth, close to the continental shelf edge. The soundscape is dominated by sounds of different origin at different frequencies and times of year and day:

  • Ships contribute significant acoustic energy at 8–250 Hz throughout the day and year.

  • Whales are detected seasonally, for example, Antarctic minke whales at 50–500 Hz, July–October; Antarctic blue whales at 17–27 Hz, March–October; pygmy blue whales at 17–100 Hz, December–June.

  • Biological choruses, likely from fish, are heard at 100–200 and 1000–2000 Hz at night-time throughout the year.

  • Wind-dependent noise is recorded at 200–3000 Hz any time of the year.

  • Rain adds to the soundscape above 2000 Hz and peaks in winter.

A snapshot of the soundscape during a 16-day period is shown in Fig. 5, with some examples of these signal types highlighted. The frequency-dependent variability of the soundscape can be quickly reviewed with PSD probability (PSDP) plots, which allow comparison either between time periods at one site or between spatially separated recordings (Fig. 6b, [28, 29]). Boxplots of frequency band levels can highlight outlier levels and identify the true noise floor of the recording equipment (Fig. 6a), which for the CMST USRs is considered low compared with other long-term historic and current recording systems. Acquiring sufficient samples and recording duration is important for truly understanding the local soundscape, rather than obtaining a potentially misleading snapshot of it, and a cumulative dynamic range index [29] can assist in determining how representative the recorded sample is (Fig. 6c).
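The idea behind a PSD probability plot can be illustrated for a single frequency bin: the spectral levels from many recordings are binned into an empirical probability distribution, over which percentiles can be overlaid. The sketch below is a generic illustration, not the method of [28, 29]; the level values and bin edges are hypothetical.

```python
from statistics import quantiles


def psd_probability(levels_db, bin_edges_db):
    """Empirical probability of levels falling in each half-open bin.

    bin_edges_db must be ascending; levels outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges_db) - 1)
    for level in levels_db:
        for i in range(len(counts)):
            if bin_edges_db[i] <= level < bin_edges_db[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def level_percentiles(levels_db, percents=(1, 50, 99)):
    """Selected percentiles (1 to 99) of the levels at one frequency bin."""
    cuts = quantiles(levels_db, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percents}


if __name__ == "__main__":
    levels = [60.0, 61.0, 70.0, 71.0]  # hypothetical dB re 1 uPa^2/Hz
    print(psd_probability(levels, [60.0, 65.0, 70.0, 75.0]))
    print(level_percentiles(levels))
```

Repeating the calculation for every frequency bin and stacking the results gives the two-dimensional probability surface shown in plots of this kind.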

Fig. 5

Spectrogram of 16 days of recording from the Portland site (a) with marked periods and short breakout spectrograms highlighting sounds attributable to anthropogenic noise such as vessels (b, dark grey) and a constant frequency anthropogenic signal (c, black, with magnification \(c_{i}\)), geophysical sounds caused by wind (a, beige box) and biological sounds including Antarctic minke whales (d, white boxes, magnification in \(d_{i}\)), two types of fish chorus (e, red and orange boxes, magnification in \(e_{i}\)), Antarctic blue whales (a, pink box), and two types of unknown baleen whale call (f, blue box, often mistaken for an Antarctic blue whale call, and g, yellow boxes)

Fig. 6

Box plot and outliers for 2013 recordings at the Portland IMOS site (a), together with percentiles overlaid on PSD probability (b), and the dynamic range index for the site over the same period (c)

Swan River

Passive acoustics have been shown to be a valuable method of monitoring the sounds and choruses of vocal fish. Each summer, a large number of mulloway (Argyrosomus japonicus) swim along the Swan River, in the evening, to spawn. Over the last decade, a logger has been deployed at one of the most prominent spawning sites, usually between October and May (though in some seasons for the entire year), to record their vocalisations and the fish chorus around and after sunset. These records have led to characterising the mulloway calls into three categories (short calls, long calls and groups of very short calls in quick succession) and a number of sub-categories for the long call category, where the call has been separated by a gap in the train of swimbladder pulses [30]. Time-of-arrival difference and energy differences have been applied to recordings from single hydrophones and hydrophone arrays to range, locate and follow individual fish to estimate behaviours, such as swimming speed and position in the water column [31, 32]. The localisation of individual sources also facilitated the confirmation of call source levels, a first step in using passive acoustics to estimate relative and absolute caller numbers [33]. Long-term data sets of each chorus have been compared to environmental data, such as temperature, salinity, tidal patterns and lunar phase to show that the chorus levels and timing at individual sites can be dependent on each factor to different extents and that those responses to drivers can be compared between seasons [19]. It also highlighted how the daily received spectral peak frequency of a fish chorus may vary with temperature (Fig. 7, [19]), a factor which has the potential to impact correct identification of sources where visual confirmation is not available. 
Through generalised estimating equations, acoustic habitats around the river have been compared to better understand how fauna such as the mulloway and a local population of resident dolphins may respond to the different acoustic contributors [34, 35]. Interestingly, energy <20 Hz from road traffic on nearby highways and local trains, which travels through the limestone substrate and some of which “leaks” back into the water column, can be detected in the low-frequency portion of noise loggers set in the Swan River.

Fig. 7

Relationships between call spectral peak frequency and temperature during the 2006–2007 (a) and 2007–2008 (b) spawning seasons of mulloway in the Swan River

Fig. 8

Mean number of individual calling pygmy blue whales counted in individual 15-min recording samples, averaged over 24-h periods to track their southern and northern migrations. Red horizontal lines indicate when USRs were deployed. No recordings were obtained during months without red lines

Tropical Water Sites Around Northern Australia

Fish are significant contributors not only to estuarine soundscapes, but also to those of oceanic and tropical locations. Similar to temperate sites, the USRs have been used to detect and describe fish choruses and delineate their long-term temporal patterns at multiple coastal and reef sites in tropical Australia [19, 21, 36,37,38]. These include world heritage listed sites, where the current contributors to the soundscapes can be compared with past and future reports of species presence to help assess the diversity and health at the recording site [19, 38]. In recent years, significant work has been conducted to explore acoustic ecology and its relationship with the local physical environment [39,40,41]. PSD probability plots highlight the difference between soundscapes in the wet and dry seasons, while fine-scale modelling has illustrated a variety of lunar patterns in fish chorusing, sometimes with the same speculated source species exhibiting different patterns at different sites [37]. Data from the USRs have been used to separate different types of biological and anthropogenic signal that may contribute to the soundscape, via the use of a signal complexity matrix, with a view to determining the impact these differences may have on acoustic indicators used in soundscape ecology [37].

Perth Canyon and Western Australian Pygmy Blue Whales

The CMST has been recording sea noise on a “plateau” in 400–490 m depth which overlooks the Perth Canyon, since 2000 as part of monitoring blue whale populations [42]. Various attributes of the soundscape of the Perth Canyon were described by [29]. The pygmy blue whales that migrate north and south along the Western Australian coast use the Perth Canyon as a feeding stopover on their northbound migratory leg, if there is enough food (krill) available [43]. Using the Perth Canyon pygmy blue whale calling data to elaborate some of the complexities of calling behaviour has allowed us to convert raw call rates to counts of the numbers of individual whales calling at any instant in time. Overlaying the yearly trends of the 24-h averaged (to remove diurnal trends in call rates) number of instantaneously calling pygmy blue whales, at multiple sites along the Western Australian coast highlights their migratory pulses (Fig. 8). The pygmy blue whales tend to have a rapid southerly migration down the WA coast mostly over November and December, followed by a slower northbound pulse between March and August (depending on latitude), the following year.
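The 24-h averaging used to remove diurnal trends can be sketched as a simple grouping of per-sample counts by calendar day. The sketch assumes that the number of calling whales has already been counted for each 15-min sample; the data structure and function name are hypothetical, and the counting itself is not shown.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_callers(samples):
    """Average the per-sample caller counts over each calendar day.

    samples is an iterable of (sample start datetime, number of callers).
    Returns a dict mapping each date to its mean count, in date order.
    """
    by_day = defaultdict(list)
    for when, count in samples:
        by_day[when.date()].append(count)
    return {day: sum(counts) / len(counts)
            for day, counts in sorted(by_day.items())}


if __name__ == "__main__":
    # Hypothetical counts from four samples across two days
    samples = [
        (datetime(2010, 11, 20, 0, 0), 0),
        (datetime(2010, 11, 20, 0, 15), 2),
        (datetime(2010, 11, 21, 0, 0), 3),
        (datetime(2010, 11, 21, 0, 15), 1),
    ]
    for day, mean_count in daily_mean_callers(samples).items():
        print(day.isoformat(), mean_count)
```

Plotting these daily means against date for each site, one year overlaid on another, would give curves of the kind shown in Fig. 8.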

Summary

These case studies have shown a few examples of the information gained using long-term underwater recordings, but this is just the tip of the surprisingly noisy iceberg [44]. Other examples include identifying source levels for numerous biological [45] and anthropogenic signals [46, 47], quantifying long-term changes in characteristics of biological sounds [45], identifying vocal repertoires for populations of marine fauna [48, 49], monitoring responses by marine fauna to anthropogenic noise [50,51,52], verifying propagation models of various sources [47, 53] and testing the performance of communication systems [54,55,56,57]. The deployment of USRs in an array format has produced one of a few systems that have allowed accurate localisation of sources, such as the great whales, without requiring cabled connections between recorders. Tracking data have been successfully used in the Perth Canyon and Geographe Bay for tracking pygmy blue whales to show their large-scale migratory movements [5, 47] and are currently being used to look at signals associated with feeding pygmy blue whales. The accumulation of recordings from along the coast provides the opportunity to monitor the passage of migratory species on a coastal scale and emphasises the applicability of these systems to monitor wide ranging, predominantly sub-surface, yet ecologically important species around Australia.

Future Directions

Such studies, amongst others (www.cmst.curtin.edu.au/publications/), highlight the utility of long-term, accurate, autonomous USRs and the need for more comprehensive recording, monitoring and understanding of the soundscapes around the world and the behaviour of marine fauna both in the absence of and in response to anthropogenic activity. While the acquisition of long-term data sets is key to truly understanding the marine environment and linking observed phenomena to large-scale physical drivers of climate, oceanography or long-period earth oscillations, the need for real-time monitoring has always existed. The next generation of CMST loggers moves into the high-frequency, localisation space to help fill gaps that currently exist in underwater acoustic monitoring. To this end, we have developed high-frequency recorders (up to 300 kHz), based on commercially available digitising electronics packaged into our own hardware system which can be calibrated accurately. We have put considerable effort into understanding and dealing with the many quirks of acoustic digitising systems. We have also developed an instrument which utilises atomic clocks “on a chip”, to improve the ease of uncabled tracking configurations.