1 Introduction

Flowing and static bodies of water are vital for shipping (the movement of goods and passengers) and as a source of food. Less critical, but still not insubstantial, is the recreational value (swimming, diving, boating, etc.) of seas, lakes and rivers. Ensuring maritime security and safety is therefore an important concern for any nation, especially one with coastal borders.

In order to safeguard the water, authorities such as the coast guard monitor the waters for signs of activity that could threaten these values. The variety of threats includes smuggling, piracy, accidents (collisions, sinking) and oil spills (accidental or intentional).

Nowadays, ships usually navigate using tools such as GPS, charts and radar in combination with ordinary piloting skills. In narrow waterways, such as a river or an estuary, piloting becomes more important. The seamen need skills in estimating the changing current and topography, which depend on water conditions such as the state of the tide, water level management, or rainfall. The route the ship follows depends on both the flow of the water and the piloting skills of the navigator (and of any assisting tugs, etc.). The contribution of piloting to navigation is difficult to observe and study, as it depends on cognitive skills distributed among the individual crew members [5].

Since 2002, more and more ships have been equipped with AIS (Automatic Identification System) navigational equipment that sends and receives position data and other relevant parameters such as course, speed, daytime, type of ship, etc., to give ships and other interested parties a real-time view of the shipping in an area [6]. Currently, the use of onboard AIS transceivers in international waters is enforced by the International Convention for the Safety of Life at Sea (SOLAS), and both the measurement precision and the receiving coverage are constantly increasing. By recording AIS data it is possible to observe precise ship movements even in a narrow waterway and, hence, to get an indirect view of the process of piloting. This enables studying and modeling the actual effect that movement of water has on movement over water and, with sufficient feedback, also the reverse process.

This study uses a method based on potential fields applied to ship movement tracking data (AIS) gathered over a period of time. The potential fields are meant to represent the traffic, its specific properties and its intensity in a discretized form that lends itself to visualization. Other existing path plotting solutions are capable of displaying the exact past paths of ships or providing a general statistical overview of the traffic. This study goes a step further in order to accommodate various vessel behavior properties (position, course, speed, daytime), as well as to generalize over the data to provide an abstract traffic model.

To that end, all AIS ship tracking data are represented by abstract charge units. Each ship position reported by AIS generates a single charge (dropped by a ship) with values describing the ship's behavior at the reported longitude and latitude. The collection of all charges distributed over a geographical grid gives rise to a potential field, which, when visualized, resembles an approximate plot of all the commonly traveled waterways [6]. Unlike a plot, the potential field has no linear representation, but a smoothed, heat-map-like representation, where stronger potentials are more desirable and locations with no potential should be avoided. Here, due to the limitations of grayscale printing, potential fields are represented by shades of gray: from nearly white (weakest) to nearly black (strongest). In the original implementation, the colors are analogous to actual heat map displays: from green for the least intensive traffic, through yellow, to red for the most intensive traffic. In contrast to a chart, which warns of rocks, reefs and shorelines, the extracted pattern shows preferred positions and routes, where deviations are shown as anomalies.

An additional benefit of the method is the decay effect allowing constant model retraining. Namely, over time the accumulated charges are affected by a decay factor, which weakens them the older they become. It allows for the unfrequented and closed waterways to expire, and be removed from the normal traffic model, thus keeping the model up to date. New and more frequented waterways are easier to establish, as the newly deposited charges are the strongest.

To investigate the applicability of modeling AIS traffic records using potential fields, we present an anomaly detection prototype system called STRAND (Seafaring TRansportation ANomaly Detection), which implements the traffic modeling for the collected AIS data. The system functionality is demonstrated here in a complex water system scenario (narrow navigation space, heavy traffic, complicated route).

The following section of this chapter presents the domain background and related research (Sect. 14.2). The in-depth description of the method design can be found in Sect. 14.3. It is followed by a case study in Sect. 14.4 and analysis in Sect. 14.5. Sect. 14.6 examines the study outcomes along with a transdisciplinary discussion of the unified framework and reflections on related open topics. The chapter ends with concluding remarks and an outline of possible future work in Sect. 14.7.

2 Background

In general, to aid the kind of monitoring performed in the scope of this study, many sources of data collection could be used, e.g., shore-based radar, AIS, and visual observations from, e.g., Coast Guard cutters or civilian traffic. The data is augmented with data from databases, e.g., detailing ownership of ships, their particular properties, photographs and descriptions, or weather forecasts. However, as previously mentioned, the focus is on the AIS system as a source of input, as AIS data are open, freely available and provide a variety of essential information about the current state of vessels.

A problem here is the sheer amount of data that has to be processed. In other security domains (such as computer and network security, monitoring of nuclear material, etc.) traffic and behavior modeling as well as anomaly detection (the automatic detection of deviations from normal behavior) have proven useful in handling comparable amounts of data and providing operators with indications of wrongdoing [1]. Such solutions consist of advanced, self-learning modeling and anomaly detection systems that, given a mass of historical data, pick out patterns of normal (and abnormal) behavior, and apply this knowledge to the situation at hand. The systems continually appraise the situation and alert the operators about incidents that merit further investigation.

Data modeling and knowledge discovery systems are typically built on some form of machine learning. Machine learning is the study of algorithms that learn in some sense [11]. That is, these algorithms are not programmed in the normal sense of the word. Instead the algorithm is presented with a set of input data and builds a model of the input data domain. This model can then be used, for instance, to classify new input data (in case of anomaly detection typically into two classes: anomalous or normal) and predict how a modeled system will behave in the future. Subsequently, the operator can extract information about the model to gain insight into the modeled domain.

The evaluation of a modeling method's quality and performance is often a challenge. The correctness of a model is frequently shown or demonstrated by various non-numerical visualizations and displays [8–10, 14, 15]. Nevertheless, the ultimate way of assessing or comparing the performance of a data modeling method requires a quantitative approach. In this case the applicability of the developed method is quantified by applying the acquired potential field based models to perform anomaly detection. In this study, the process of anomaly detection simply examines whether the currently observed ships are behaving in a way that conforms to the normal model. The definition of an anomaly in this study aligns with that of Chandola, Banerjee and Kumar [3]: anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or common behavior in a dataset. The terms anomaly (used here) and outlier are sometimes used interchangeably. Therefore, there is no specific definition of an anomaly and its properties, other than not fitting the normal model (i.e., the model describing all possible behaviors considered normal).

Returning to machine learning, on a technical level there are two paradigms that are suitable for developing this type of self-learning modeling and anomaly detection system: unsupervised and supervised learning [17]. Which type is used typically depends on the sort of input data one has access to. If the historical data (the input) is labeled with correct classifications or predictions (the output), it is possible to apply supervised learning algorithms to generalize from data with known classifications. If, on the other hand, the historical data does not include any associated predictions or classifications, there is a need to use unsupervised learning techniques instead, e.g., clustering [17]. Consequently, the unsupervised approach is to learn from observations of input data without quantifiable evaluation of the output accuracy. Thus, for a classification problem, an unsupervised learning algorithm automatically partitions the data into groups, while a supervised learning algorithm instead generalizes from data which has already been partitioned into groups. In actual cases the distinction between these two approaches is not necessarily as clear. In particular, in the AIS case investigated here, while the amount of data that can be classified as normal is large, there are no known (labeled) incidents occurring in the observed time frame. As a consequence, benign event examples overwhelmingly outnumber malicious ones, i.e., examples of dangerous or unwanted behavior. Therefore, the approach in this investigation (as is often the case in anomaly detection) is somewhere in between supervised and unsupervised learning, with the classifier learning normal behavior from given examples, but having no well-formed idea about unwanted behavior as such.

This work uses artificial potential fields as the machine learning algorithm. Potential fields were first developed in the AI community as a navigation and decision making mechanism, mainly for the development of game AI. The idea behind potential fields used in that respect is, by analogy with, e.g., electrostatic potential, to assign an attractive potential to desirable positions and a repelling potential to undesirable positions [4]. This enables, e.g., a simulated unit in a computer game to find optimal positions and paths by hill climbing in the resulting potential field.

In this work, that technique is used in the reverse manner: the idea of artificial potential fields serves as a learning algorithm instead of a means of movement generation. The ship movements are used to create charges that ultimately define potential fields representing the patterns of normal behaviors, but also commonly occurring unwanted behaviors (e.g., sailing too near a coast line). By selecting the strength of the potentials and the functions that define how this potential decays with time and dissipates with the distance from the source, the performance of the navigation algorithm can be optimized. To the best of our knowledge this approach to machine learning, with the potential field defining what the system learns, is novel.

Regardless of the machine learning approach used, a common problem with anomaly detection systems is that they tend to overwhelm the operator with false alarms, i.e., alerts that are not connected to any notable malicious situation [3]. A majority of the commonly applied machine learning algorithms are designed with the main objective of generating the most accurate classification models, where accuracy is defined as the ratio of true classifications (both positive and negative) to all classifications. Accuracy is indeed an important factor to consider, when determining the suitability of an anomaly detection system. However, the fact that one system is proven to be more accurate than other systems does not necessarily imply that it raises fewer false alarms. In fact, it might very well do the opposite. Furthermore, even if there are techniques to address the false alarm problem at its root, for many real-world problems, it is still not possible to completely eradicate the existence of false positives.
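The point that higher accuracy does not imply fewer false alarms can be illustrated with a small worked example. The sketch below uses the accuracy definition given above; the two detectors and all counts are hypothetical and chosen only for illustration, not taken from the study.

```python
def accuracy(tp, tn, fp, fn):
    """Ratio of true classifications (positive and negative) to all classifications."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts for 1,000 observations, 100 of which are real anomalies.
# Detector A misses half of the anomalies but rarely raises a false alarm.
detector_a = {"tp": 50, "fn": 50, "fp": 10, "tn": 890}
# Detector B catches every anomaly but raises four times as many false alarms.
detector_b = {"tp": 100, "fn": 0, "fp": 40, "tn": 860}

acc_a = accuracy(**detector_a)  # 940 / 1000
acc_b = accuracy(**detector_b)  # 960 / 1000

print(f"A: accuracy={acc_a:.2f}, false alarms={detector_a['fp']}")
print(f"B: accuracy={acc_b:.2f}, false alarms={detector_b['fp']}")
```

In this constructed case detector B is the more accurate of the two, yet the operator would face more false alarms with B than with A.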

Consequently, there is a need for complementary techniques to handle the remaining false alarms in a suitable way. Self-learning systems are often opaque, i.e., not transparent in their structure or not intuitive to use. Thus it is difficult, not to say impossible, for the operator to develop a feel for exactly what the system has learned, and hence, to evaluate the correctness of the system. Does it, for instance, have enough background data pertaining to the situation at hand to make an informed decision, or is it drawing far-reaching conclusions based on too limited a data set? In order to overcome this issue, and to address the remaining false positives, information visualization is applied to the problem of bridging the gap between the operator and the system [12].

Information visualization in the form of the heat-mapped [16] presentation of the potential fields that make up the learned behavioral model is used to put the operator in the loop. This way, instances of overfitting (where the detector has drawn too specific lessons from the available data) and undertraining (where the detector instead draws too far-reaching conclusions from the little data that is available) can be detected. The latter type of model fitting error often manifests itself in traditional systems as a detector that delivers its results with perfect assuredness, but is in fact wrong. Making the detector more transparent to the operator so that these and other situations can be detected and fixed is a major part of this research. Heat mapping is simply the visual representation of the field strength by color, where, e.g., the weaker (less desirable) field strengths are light gray to white and more intensive (and desirable) ones are dark shades to black. We demonstrate and discuss the actual use of visual pattern representation in the STRAND prototype in more detail in the sections to follow.

3 Potential Fields

The idea behind using potential fields in maritime traffic modeling is to represent an otherwise illegible mass of traffic data as an abstract structure that can be easily and intuitively perceived. The concepts of a magnetic field surrounding a magnet, an electrostatic field surrounding an electric charge, or a gravitational field surrounding a celestial body are common knowledge. As familiar, omnipresent phenomena observed every day, they can greatly contribute to comprehension if used as an analogy.

Conceptually, each AIS trace (ship movement trace) assigns a charge to a specific location passed by a ship. A collection of charges distributed over an area generates a potential field. The strength of the local field also depends on the surrounding charges (i.e., their density and strength). The three main concepts, introduced by the potential field based method, are:

  • the total strength of a local charge,

  • the decay of potential fields, and

  • the distribution of a potential field around its charged source [12].

Each vessel tracked by AIS is characterized by a collection of n numerical and textual properties. Those properties include the vessel’s static parameters, (e.g., name, flag, type), as well as the current state of its dynamic behavior (e.g., speed, course, location), and are either inherently nominal or discretized to a nominal scale. A single vessel carries a set of charges of equal strength, representing its state and behavior projected onto these coordinates. For each AIS report, the set of charges c that a vessel carries is assigned to a location characterized by geographical position coordinates. Mathematically this can be expressed by a vector \(c_{lat_{k},lon_{k}}\) with n components

$$\displaystyle{ c_{lat_{k},lon_{k}} =\langle c_{lat_{k},lon_{k}}^{1},c_{ lat_{k},lon_{k}}^{2},\ldots,c_{ lat_{k},lon_{k}}^{n}\rangle, }$$
(14.1)

where \(c_{lat_{k},lon_{k}}^{1}\) to \(c_{lat_{k},lon_{k}}^{n}\) are the component charges reflecting reported vessel properties (speed, course, etc.), and \(lat_{k}\), \(lon_{k}\) are the geographical latitude and longitude coordinates at point k. A vessel traveling in the evening hours (e.g., 21:20) with a northerly course (with a maximal deviation of ±22°30′) at a speed of 4 knots could, for instance, drop charges expressed by the following vector \(c_{lat_{k},lon_{k}}\) at the passed location k:

$$\begin{array}{lccc} \text{Charge} & \text{Course} & \text{Speed [knots]} & \text{Daytime [h]} \\ & N,\ NE,\ \ldots,\ NW & \text{0--1},\ \text{1--7},\ \ldots,\ {>}60 & \text{6--12},\ \text{12--18},\ \text{18--0},\ \text{0--6} \\ c_{lat_{k},lon_{k}} = \langle & 1,\ 0,\ \ldots,\ 0, & 0,\ 1,\ \ldots,\ 0, & 0,\ 0,\ 1,\ 0\ \rangle \end{array}$$
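The encoding of a single AIS report into such a charge vector can be sketched as follows. This is a minimal illustration rather than the STRAND implementation: the function names are hypothetical, the speed ranges are those listed later in the chapter, and treating every interval as half-open (lower bound included, upper bound excluded) is an assumption.

```python
COURSES = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
SPEED_UPPER_BOUNDS = [1, 7, 14, 22, 30, 45, 60]  # knots; the last bin is "beyond 60"
DAYTIMES = ["6-12", "12-18", "18-0", "0-6"]      # hours of the day

def one_hot(index, size):
    """Vector of zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]

def course_index(course_deg):
    """Map a course in degrees to one of eight 45-degree sectors centered on N, NE, ..."""
    return int(((course_deg % 360) + 22.5) // 45) % 8

def speed_index(speed_knots):
    """Map a speed in knots to its range (assumed half-open intervals)."""
    for i, upper in enumerate(SPEED_UPPER_BOUNDS):
        if speed_knots < upper:
            return i
    return len(SPEED_UPPER_BOUNDS)

def daytime_index(hour):
    """Map an hour (0-23) to 6-12, 12-18, 18-0 or 0-6."""
    return ((hour - 6) % 24) // 6

def charge_vector(course_deg, speed_knots, hour):
    """Concatenate the one-hot course, speed and daytime components."""
    return (one_hot(course_index(course_deg), len(COURSES))
            + one_hot(speed_index(speed_knots), len(SPEED_UPPER_BOUNDS) + 1)
            + one_hot(daytime_index(hour), len(DAYTIMES)))

# The example from the text: northerly course, 4 knots, 21:20.
example = charge_vector(course_deg=0.0, speed_knots=4.0, hour=21)
print(example)
```

With these assumptions the example vessel sets the first course component (N), the second speed component (1–7 knots) and the third daytime component (18–0), matching the vector shown above.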

The total charge C at a location is calculated as the sum of all local charges c. In electrostatics the greater an electric charge is, the stronger the electric potential field that surrounds it. Analogously, the more vessel visits are reported at a location, the higher potential builds up in and around it. Hence the aggregate charge \(C_{lat_{k},lon_{k}}\) accumulated at a location k, over a time period τ is computed as:

$$\displaystyle{ C_{lat_{k},lon_{k}} =\sum _{ t=0}^{\tau }c_{ lat_{k},lon_{k}}. }$$
(14.2)
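Equation (14.2) amounts to a componentwise sum of all charge vectors reported at the same location. A minimal sketch, assuming each report has already been reduced to a (location, charge vector) pair; the function name and the data layout are illustrative assumptions:

```python
from collections import defaultdict

def accumulate(reports, n):
    """Sum n-component charge vectors per location, as in Eq. (14.2).

    `reports` is an iterable of (location, charge_vector) pairs, where
    `location` is any hashable key such as a (lat, lon) tuple.
    """
    total = defaultdict(lambda: [0] * n)
    for location, charge in reports:
        cell = total[location]
        for j, value in enumerate(charge):
            cell[j] += value
    return dict(total)

# Three visits to one location and one visit to another (toy 3-component charges).
reports = [
    ((0, 0), [1, 0, 0]),
    ((0, 0), [0, 1, 0]),
    ((0, 1), [1, 0, 0]),
    ((0, 0), [1, 0, 0]),
]
aggregate = accumulate(reports, n=3)
print(aggregate)
```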

The potential field formed by a single charge is most intensive at the location of the charge, and attenuates with (radial) distance. Areas where a potential is very strong represent a traffic pattern and belong to the model of the normal behavior. Areas where a potential is very weak or nonexistent signal the absence of normal behavior, i.e., an anomaly. In this study, the anomaly levels are determined using minimal potential thresholds. The total potential at location k is the superposed potential generated by the charges at all surrounding locations i, each attenuated according to the distance between k and i. Here the potential distribution P is described by two-dimensional Gaussian smoothing, using Euclidean distance for measuring the radial distance between two points:

$$\displaystyle{ P_{lat_{k},lon_{k}}(t) =\sum _{i} \frac{1} {2\pi \sigma ^{2}}e^{-\frac{(lat_{k}-lat_{i})^{2}+(lon_{k}-lon_{i})^{2}} {2\sigma ^{2}} }C_{lat_{i},lon_{i}}, }$$
(14.3)

where σ is the standard deviation of the Gaussian distribution. This use of two-dimensional smoothing draws an analogy to the smoothing of gravitational sensor readings [7].
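A direct transcription of Eq. (14.3) for a single charge component might look as follows. It is a sketch under stated assumptions: charges are given as a dictionary from (lat, lon) to a scalar aggregate charge, σ is expressed in the same units as the coordinates, and the function name is hypothetical.

```python
import math

def potential(k, charges, sigma):
    """Potential at point k = (lat, lon) generated by all charges, per Eq. (14.3).

    `charges` maps (lat_i, lon_i) to the aggregate scalar charge C at that point.
    """
    lat_k, lon_k = k
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for (lat_i, lon_i), c_i in charges.items():
        squared_distance = (lat_k - lat_i) ** 2 + (lon_k - lon_i) ** 2
        total += norm * math.exp(-squared_distance / (2.0 * sigma ** 2)) * c_i
    return total

# A single unit charge: the potential peaks at the charge and falls off with distance.
charges = {(0.0, 0.0): 1.0}
at_source = potential((0.0, 0.0), charges, sigma=1.0)
one_unit_away = potential((1.0, 0.0), charges, sigma=1.0)
print(at_source, one_unit_away)
```

A larger σ spreads each charge over a wider neighborhood, which smooths the field at the cost of blurring narrow traffic lanes.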

These equations assume no loss of charge over time. Continuous data collection, defined in that manner, would allow charges to accumulate without an upper bound. This is undesirable, as it would undermine the ability to compare and follow changing trends in maritime traffic behavior over time. For example, established real-world traffic patterns may be abandoned as time passes. Therefore, it is desirable for the potential fields that model maritime traffic to evolve over time to reflect such changes in patterns.

Researchers presenting different approaches often address the problem of real time continuity by applying constructs such as a sliding time frame or a data window [2, 13]. Potential field theory offers an alternative construct of potential decay. Adding a decay factor enables the continuous updating and retraining of the model, by representing charge at a location as a function of time:

$$\displaystyle{ C_{lat_{k},lon_{k}}(t) =\sum _{ t=0}^{\tau }d(t)c_{ lat_{k},lon_{k}}, }$$
(14.4)

where d(t) is a non-increasing decay function with limit at zero, describing the decrease of a local charge over time.
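The text only requires d(t) to be non-increasing with limit zero; it does not prescribe a particular form. As one possible choice, the sketch below assumes exponential decay with a half-life, applied to the age of each report, with every report contributing a unit charge. The function names and the half-life parameter are illustrative assumptions, not part of the described method.

```python
def exponential_decay(age, half_life):
    """Assumed decay function: halves the charge every `half_life` time units."""
    return 0.5 ** (age / half_life)

def decayed_charge(report_times, now, half_life):
    """Aggregate unit charges at one location, each weakened according to its age."""
    return sum(
        exponential_decay(now - t, half_life)
        for t in report_times
        if t <= now
    )

# Two reports at the same location: one is a full half-life old, one is fresh.
charge_now = decayed_charge([0.0, 10.0], now=10.0, half_life=10.0)
print(charge_now)  # 0.5 from the old report plus 1.0 from the fresh one
```

With such a function, a waterway that stops receiving reports fades from the model, while a newly frequented one builds up quickly because fresh charges carry full weight.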

4 Case Study Setup

Figure 14.1 demonstrates the general idea of a normal model of vessel behaviors in traffic over water. The map-based representation displays a section of the Polish northern coast and the Gdansk Bay. The overlay, ranging (here grey scale) in shades from almost white to almost black, represents the potential field. In this case, the field expresses all vessel traffic without a specific parameter (such as a course, speed range or certain daytime). It becomes instantly apparent where the ports and harbors, as well as where all the regularly traveled waterways are. Nevertheless, most of the map remains without a visible potential overlay. It does not mean that these waters are closed or restricted, but only that transport in those areas is very infrequent or non-existent.

Fig. 14.1 Potential field representing an overall view of the traffic in the Gdansk Bay

On the other hand, Fig. 14.2 displays a specific traffic sub-pattern. It is a potential field representing all traffic with course reported as N (northerly). The potential field is visibly sparser and only intensifies in specific areas. There is a clearly visible, regularly traveled waterway between the port of Gdynia and Jastarnia within the Puck Bay. Both major ports in the region, Gdynia and Gdansk, also generate visible amounts of traffic with a northerly course. A portion of the traffic en route from the Gdansk and Puck Bays also needs to pass close by the Hel Peninsula; therefore a local northerly traffic potential field can also be observed in close proximity to the eastern end of the peninsula. The further away from the ports, the paler and more scattered the northerly potential field becomes, which indicates that this course is not frequently followed. This figure also demonstrates an example of detection. The captured anomalies, represented by larger gray arrows, depict ships violating the displayed northerly potential field. Two of them are observed in Zalew Wislany, where no prior vessel traffic has been seen whatsoever. The two other detections, visible towards the north of the bay, appear to be following a vaguely visible but not well-defined northerly potential, which is too weak for their behavior to be recognized as normal. In the example from Fig. 14.2, the observed detection indications are supported by a display of the violated potential field, which provides instant insight into the situation and helps to further reduce the incident reaction time.

Fig. 14.2 A sub-pattern of traffic for course N, and examples of detections (potential violations)

4.1 Grid Size Versus Display Resolution and Detection Sensitivity

The collected AIS system records consist of packages of current data from each active vessel, downloaded every 90 s. Each AIS report contains numeric and textual properties including the vessel's static parameters (identification number, call sign and name) as well as the current state of its dynamic behavior (time of day, speed, course, location). The tracking data updates are limited to one every 90 s, which means that every time a snapshot of the state of maritime traffic is acquired, it takes another 90 s to acquire the next one. The potential field based method, as implemented by the STRAND system, requires the setting of different parameters including the grid resolution, the optimization of which is essential for efficient pattern extraction and accurate detection.

The grid resolution in particular is a parameter defining the maximal geographic precision of the model. The continuous domain of geographic latitude and longitude is discretized for the computations. In effect the map is represented in the form of a grid. All AIS reports recorded within one grid cell contribute to the charge of that cell; therefore the grid size (or density) defines firstly, how dense or sparse the grid will appear when visualized, and secondly, what the maximal sensitivity of the anomaly detection is in terms of physical distance. A smaller grid size (a denser grid) results in more detailed visualization and higher detection sensitivity. A larger grid size (a sparser grid) provides a smoother visualization but lowers the sensitivity of the anomaly detection. Logically, the larger grid sizes are more applicable in locations where the charges (AIS ship position reports) are expected to be more distant from one another. That occurs when ships travel at higher speeds and through less strictly defined paths, which is mostly the case on the open sea. On the other hand, in constrained conditions, where vessels follow the same paths and need to either remain static or strictly limit their speed, it is logical to expect to receive many AIS reports from very close locations. In the latter situation, where traffic incidents occur in narrower spaces and over smaller distances, an increased density of the grid (smaller grid size), implying an increased visualization resolution and detection sensitivity, would be more applicable.
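How a reported position maps to a grid cell for a given grid size can be sketched as below. The conversion is an assumption made for illustration: it uses a flat (equirectangular) approximation with roughly 111,320 m per degree of latitude and scales longitude by the cosine of a reference latitude. The text does not state which projection STRAND uses, and the function name is hypothetical.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate; an assumption of this sketch

def grid_cell(lat, lon, grid_size_m, ref_lat):
    """Return the (row, col) of the grid cell containing the position."""
    meters_per_degree_lon = METERS_PER_DEGREE_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * METERS_PER_DEGREE_LAT / grid_size_m)
    col = math.floor(lon * meters_per_degree_lon / grid_size_m)
    return row, col

# Two reports roughly 11 m apart in latitude, in the Gdansk Bay area.
a = (54.50002, 18.6)
b = (54.50012, 18.6)

coarse_a = grid_cell(*a, grid_size_m=2000, ref_lat=54.5)
coarse_b = grid_cell(*b, grid_size_m=2000, ref_lat=54.5)
fine_a = grid_cell(*a, grid_size_m=10, ref_lat=54.5)
fine_b = grid_cell(*b, grid_size_m=10, ref_lat=54.5)

print(coarse_a == coarse_b)  # both reports charge the same 2,000 m cell
print(fine_a == fine_b)      # the 10 m grid separates them
```

The same pair of reports reinforces a single cell on the coarse grid but charges two different cells on the fine one, which is the trade-off between smoothness and sensitivity described above.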

River systems, basin areas and water canal networks are characterized by narrow passages and often heavy traffic due to the limited space available for vessels traversing them. Depending on the time of day and specific location, this means more or less limited speed and little room for maneuver for the involved vessels. The experiment performed in this study focuses on such an area, and compares it with a less restricted harbor setting. Traffic recorded over the same time period (20 days) from these two areas is modeled using the STRAND system. The selected harbor area includes the two major Polish ports of Gdynia and Gdansk, in the Gdansk Bay and the Bay of Puck, along with a stretch of the coast and a fragment of the Hel Peninsula (see Figs. 14.1 and 14.2). The approximate coordinates are 54.3–54.7N and 18.4–18.9E. The river area studied in the experiment is the Piast Canal connecting the Oder Lagoon with the Baltic Sea. It allows the eastern part of the natural Swina River to be bypassed, providing a more convenient north-south connection for large ships between the Baltic Sea and Stettin. The canal is approximately 12 km long and 10 m deep. In particular the investigation focuses on a narrow stretch of coast near the estuary limited by the approximate coordinates 53.89–53.93N and 14.24–14.29E (see Figs. 14.3 and 14.4). Over the 20 days of collected data, 2,263 MMSI (Maritime Mobile Service Identity) numbers, each identifying one individual vessel, were registered in the southern Baltic area containing both of the areas investigated in the case study. At the point in time selected for demonstrating and analyzing the anomaly detection, a total of 654 vessels were present in the southern Baltic basin, of which 120 were within the Gdansk Bay area and 36 in the river area.

Fig. 14.3 Traffic pattern for course SW

Fig. 14.4 Traffic pattern for course NE

The investigation involved modeling traffic and performing anomaly detection in both of those waterways with grid sizes ranging from 10 m to 2,000 m. The results acquired for all types of anomalies (i.e., course, speed, daytime, and waypoint, the general anomaly type) were then stored as the ratio of the detection count to the total number of examined vessels. The experiment was performed in steps of 10 m for grid sizes in the range 10–100 m, in steps of 100 m in the range 100–1,000 m, and additionally for 1,500 m and 2,000 m.
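The sweep over grid sizes and the stored ratio can be written down compactly. The sketch below only enumerates the grid sizes named in the text and defines the ratio; the function names are hypothetical, and the example counts passed to the ratio are made up for illustration.

```python
def grid_sizes_m():
    """Grid sizes used in the sweep: 10 m steps up to 100 m, 100 m steps up to
    1,000 m, plus 1,500 m and 2,000 m."""
    fine = list(range(10, 100, 10))        # 10, 20, ..., 90
    coarse = list(range(100, 1001, 100))   # 100, 200, ..., 1000
    return fine + coarse + [1500, 2000]

def detection_ratio(detection_count, examined_vessels):
    """Ratio of the detection count to the total number of examined vessels."""
    return detection_count / examined_vessels

sizes = grid_sizes_m()
print(len(sizes), sizes)

# Made-up example: 3 detections among 120 examined vessels.
print(detection_ratio(3, 120))
```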

Generally speaking, the number of suspected waypoint anomalies decreases when the grid size is enlarged. The disadvantage is that a bigger grid size may also cause failures to recognize real anomalies. Consequently, there is a need to find a reasonable ratio, minimizing the number of false anomalies without overlooking any actual incidents, in order to balance the trade-off between the benefits and disadvantages of more general and more specific parameter settings. A locally optimal grid size (specific to a particular traffic case) would minimize the general type of anomalies (i.e., violations of traffic patterns for all types of potential fields) and present a locally stable ratio between specific attribute detections (course, speed and daytime). If the optimum for specific detection rates correlates with a low count of the general (waypoint) detection type, a possible optimal grid size candidate is found.

Intuitively, an area of open sea should have a sparser grid than a harbor or river area. On the open sea traffic is typically faster, i.e., vessels cover longer distances between two consecutive AIS reports, and it is sparser, with ships more distant from each other. Therefore, for a group of grid units to meaningfully represent a traffic pattern on the open sea, the grid size ought to be relatively large. For the case of the open sea the acceptable grid sizes range from 300 m to 1,000 m for course and speed, where the anomaly detection count stabilizes. The anomaly rate of waypoint detections decreases steadily up to 2,000 m [12]. A comparison with the harbor and river traffic cases accentuates the influence of grid size and its relation to the traffic type.

One property common to traffic scenarios with close proximity between vessels and land or other obstacles is limited speed. A vessel moving at a speed of 2–5 knots, which is usual for narrow and heavily traveled passages in inland waters, will change its position by approximately 90–230 m during 90 s. Therefore, coarse grids, suitable for the open sea, are probably suboptimal in these areas.
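The 90–230 m figure follows from the definition of the knot (one nautical mile, 1,852 m, per hour) and the 90 s reporting interval. A short check, with a hypothetical helper name:

```python
KNOT_IN_M_PER_S = 1852.0 / 3600.0  # one nautical mile (1,852 m) per hour

def distance_m(speed_knots, interval_s=90.0):
    """Distance covered at a constant speed during one AIS reporting interval."""
    return speed_knots * KNOT_IN_M_PER_S * interval_s

slow = distance_m(2.0)  # about 92.6 m
fast = distance_m(5.0)  # about 231.5 m
print(f"{slow:.1f} m to {fast:.1f} m between consecutive reports")
```

Consecutive reports from such a vessel therefore lie within one or two cells of a 100–200 m grid, whereas on a 1,000 m grid several reports in a row fall into the same cell.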

5 Analysis of Obtained Results

The applicability of the potential field based method is demonstrated in practice by the STRAND system. The Piast Canal and the Gdansk bay areas, chosen for the case study, are constrained by factors such as the proximity of land, water flow (rate and direction) and maritime navigation rules. The visualizations of the patterns extracted using STRAND enable the observation of distinctive behaviors and learning about particular properties of the traffic.

5.1 Potential Fields as Traffic Patterns for Courses SW and NE

Figures 14.3 and 14.4 show patterns specific to the SW and NE courses, where the shades going from white, through grays, to black (in the original version, colors: green through yellow to red) represent gradually increasing traffic. In Fig. 14.3 the traffic following a south-westerly course seems mostly regulated: it intensifies mostly at the north-western bank of the river, while being very sparse on the other side of the river. Still, it does get less regulated to the south, where ships seem to get closer to the opposite riverbank either to dock or to “cut the turn”. On the other hand, the north-easterly traffic in Fig. 14.4 seems to keep to the right bank in the south, but gets somewhat diffused towards the mouth of the river. The distribution of the potential also allows the observation of a probable reason for that diffusion of the NE traffic. A point at the western riverbank in the middle of both figures appears to be a frequent destination for tracked vessels, some of which seem to display a disregard for traffic rules when departing in a north-easterly direction. That behavior occurred often enough to build up a relatively strong traffic pattern, and if repeated, it would be considered normal from a course-specific point of view. If speed, daytime and type of ship are evaluated, new anomalies may be discovered.

5.2 River and Harbor Case Comparison

The plots in Figs. 14.5 and 14.6 represent the numbers of detections of the types waypoint, course, speed and daytime. The total is the sum of all anomalies (i.e., positive detections) regardless of type. Course and speed are stored with a precision of 0.1° and 0.1 knot respectively, but for practical reasons are grouped by range. Course is divided into 8 equal 45° intervals: N, NE, E, SE, S, SW, W, and NW. Speed ranges correspond to speed division levels common for maritime traffic (in knots): Static (0–1), Very slow (1–7), Slow (7–14), Medium (14–22), Fast (22–30), Very fast (30–45), Ultra fast (45–60), and Beyond 60.
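The grouping into course and speed ranges can be sketched in a few lines of Python. This is an illustration rather than the STRAND implementation: the function names are hypothetical, the course intervals are assumed to be centred on the compass points (so N spans 337.5° to 22.5°), and each speed range is assumed to exclude its upper bound, neither of which is stated in the text.

```python
COURSE_BINS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

# (upper bound in knots, label); the upper bound is treated as exclusive (assumption).
SPEED_BINS = [
    (1.0, "Static"),
    (7.0, "Very slow"),
    (14.0, "Slow"),
    (22.0, "Medium"),
    (30.0, "Fast"),
    (45.0, "Very fast"),
    (60.0, "Ultra fast"),
]


def course_bin(course_deg: float) -> str:
    """Map a course in degrees to one of eight 45-degree intervals."""
    # Shift by half an interval so that each bin is centred on its compass point.
    index = int(((course_deg % 360.0) + 22.5) // 45.0) % 8
    return COURSE_BINS[index]


def speed_bin(speed_knots: float) -> str:
    """Map a speed in knots to the named speed range."""
    for upper, label in SPEED_BINS:
        if speed_knots < upper:
            return label
    return "Beyond 60"


print(course_bin(231.4), speed_bin(3.2))
```

Under these assumptions a vessel reporting a course of 231.4° at 3.2 knots falls into the SW course range and the Very slow speed range.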

Fig. 14.5 Positive detections percentage in the harbor area as a function of grid size

Fig. 14.6 Detections percentage in the river area as a function of grid size

The course, speed and daytime detections are raised when a ship is observed traveling with a course, at a speed, or at a time of day that is unusual for the ship’s present location. It is important to note that these detections may overlap, e.g., a ship may travel with an anomalous course and time of day, but at a speed that is normal for its current location. The waypoint detection signals the most severely anomalous behavior, and is triggered when a vessel is observed in an area in which no prior visit of any other vessel was ever observed. As a consequence, this type of anomaly also indicates anomalous speed, course and daytime, increasing the detection count.
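The relationship between the four detection types can be made concrete with a small sketch. It deliberately replaces the potential fields with plain sets of previously observed values per grid cell, which is a simplification and not the method used by STRAND; the class, the function, and the "day"/"night" labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CellHistory:
    """Values seen so far in one grid cell (simplified stand-in for the potentials)."""
    courses: set = field(default_factory=set)
    speeds: set = field(default_factory=set)
    daytimes: set = field(default_factory=set)


def detect(model: dict, cell: tuple, course: str, speed: str, daytime: str) -> set:
    """Return the set of detection types raised by a single observation."""
    history = model.get(cell)
    if history is None:
        # Waypoint anomaly: no vessel was ever seen in this cell, so the course,
        # speed and daytime cannot be normal for this location either.
        return {"waypoint", "course", "speed", "daytime"}
    detections = set()
    if course not in history.courses:
        detections.add("course")
    if speed not in history.speeds:
        detections.add("speed")
    if daytime not in history.daytimes:
        detections.add("daytime")
    return detections


# One known cell in which only slow, north-easterly daytime traffic was seen.
model = {(10, 4): CellHistory({"NE"}, {"Very slow"}, {"day"})}
print(detect(model, (10, 4), "SW", "Very slow", "night"))
print(detect(model, (99, 99), "NE", "Very slow", "day"))
```

The first observation raises overlapping course and daytime detections while its speed is normal; the second lies in a cell never visited before and therefore raises all four types, which is why waypoint anomalies inflate the other counts.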

For the harbor area there is a range between 60 m and 200 m where an increased number of course and speed anomalies is found. At the same time the number of waypoint anomalies decreases and flattens out above a grid size of 200 m. The total percentage of anomalies is also quite high in this range, up to 25 %.

For the river area the active grid size is even smaller. Between 30 m and 100 m there is an optimum for course anomalies. Waypoint, speed and daytime anomalies stabilize at a low level (below 3 %) at a 100 m grid size, making the total anomalies appear slightly below 3 % of all observations.

5.3 Analysis of the Speed and Course Range Binning

Additional experimental data sets for course and speed are produced by performing detection on data with altered speed and course.

One of the data sets is computed for the real course (say NE) altered as if the ship was traveling with a course slightly more to the left (the altered course is N). In the second data set, the course is altered to the next bin right (e.g., S to SW). In the case of speed the next lower and next higher speeds are tested.
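The alteration described above amounts to moving each observation into the neighboring bin. A minimal sketch, assuming that the course bins wrap around the compass and that the speed bins are clamped at both ends (the text does not say how the lowest and highest speed ranges are handled); the helper names are hypothetical:

```python
COURSE_BINS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
SPEED_BINS = ["Static", "Very slow", "Slow", "Medium", "Fast",
              "Very fast", "Ultra fast", "Beyond 60"]


def shift_course(label: str, steps: int) -> str:
    """Move a course bin left (negative steps) or right (positive steps)."""
    index = (COURSE_BINS.index(label) + steps) % len(COURSE_BINS)
    return COURSE_BINS[index]


def shift_speed(label: str, steps: int) -> str:
    """Move a speed bin lower or higher, clamped at the ends (assumption)."""
    index = COURSE_BINS and SPEED_BINS.index(label) + steps
    index = max(0, min(len(SPEED_BINS) - 1, index))
    return SPEED_BINS[index]


print(shift_course("NE", -1))  # one bin to the left
print(shift_course("S", +1))   # one bin to the right
print(shift_speed("Slow", +1))
```

With this ordering, NE shifted left becomes N and S shifted right becomes SW, matching the two examples in the text; running detection on the shifted bins then yields the altered data sets.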

The diagram in Fig. 14.7 plots the results of detections performed on the traffic tracking data with the real speed and course, and with their altered (increased and decreased) values. It is immediately apparent that the number of detections for the true observed speed and course is lower than for their altered values. The observed distance between the real course plot and the altered course plots shows that there are significantly fewer positive detections for the true 45° interval of vessels’ courses, i.e., if the observed vessels were sailing a course set more to the left or to the right, approximately three times more of them would be marked as anomalous. This finding suggests that the course scale binning succeeds in differentiating the correct and faulty vessel courses. The lower speed plot is consistently only slightly above the true speed plot, while the higher speed plot shows much higher initial values and a substantial decrease thereafter.

Fig. 14.7 Course and speed binning comparison for the harbor area

The patterns for the different parameters are similar in Figs. 14.7 and 14.8, with high speed at the top and low speed at the bottom (but still above the general speed parameter). Left and right directions are above the general course parameter, with a left turn causing more anomalies than a right turn.

Fig. 14.8 Course and speed binning comparison for the river area

6 Open Issues and Interdisciplinary Discussion

Unlike on land, where roads need to be built for transport, water is readily available for transport over even the longest distances. The nearly omnipresent navigable waters do, however, require adjustments in key areas. Usually the changes made to waterways are focused where the most desirable itinerary destinations are, be it trade centers, naval bases, shipyards or leisure centers. The modifications include deepening the riverbed, or less frequently the seabed, to accommodate the draughts of all vessels sailing to that particular destination. Natural river systems may also require widening, to allow passage for broad vessels, traffic in both directions, or for multiple vessels to pass each other.

A larger scale of interference with the natural distribution of waterways is creating new ones, where they previously did not exist. Waterways of this kind are usually built in the most crucial transit locations, not necessarily being destinations on their own. Best known and most impressive examples include the Panama Canal linking the Pacific and Atlantic Oceans, the Suez Canal cutting the distance between Europe and the Far East, and the Welland Canal linking Lake Ontario with Lake Erie (and the St. Lawrence Seaway), avoiding the Niagara Falls by lifting vessels over the height of nearly 100 m. All this to expand the navigable waterways and connect basins for the purpose of water transport.

Such large scale changes may induce direct alterations as well as indirectly impact the natural or previously created waterways. Man-made water reservoirs cause abrupt breaks in the traffic, and may cause damaging alterations to the transport conditions downstream of the dam, as well as to the balance of the freshwater fauna and flora. To mitigate this kind of negative impact, a lock chamber is often used to open new waterways or substitute natural ones. Nevertheless, it may still prevent fish from migrating if it does not provide an alternative upstream fish-way, which guarantees a sufficient flow of water. Enhancing waterways by straightening a meandering riverbed often results in changed and intensified sedimentation, thus indirectly changing the conditions for shipping. Such conditions may vary over time and can be monitored using the potential fields, through their impact on vessels’ behaviors.

These changes to nature are made because of the presence of a land barrier and too shallow or too narrow passages limiting the throughput, freedom, convenience etc. of the transport over water. The need for them was, for centuries, common knowledge based on the experience of those privy to the secrets of the seas, ranging from sailors through world traders to trade and transport strategists. In the current age of digital water traffic registration, computing techniques are able to aid that process. The method based on potential fields, described, demonstrated and analyzed in this study, provides insights into vessel traffic that could be used as an aid or base for projecting maintenance needs and future needs for development of waterways.

The visualized potential fields show the intensity, type, and the time, course and speed related behaviors occurring in the traffic. The selective potentials display very distinctive patterns, which allow the observation of the specifics of the traffic. Thanks to the decay factor, the method also enables the observation and analysis of the changes in traffic trends over time. Such observations may result in the identification of undesired or unexplained behaviors in the traffic, as well as new trends that need to be accommodated. A simple example could be, where vessels commonly change their paths to avoid an unknown obstacle (e.g., a recently sunken ship), which should be removed or properly marked. It is possible to spot increases and decreases in traffic intensity, and examine their underlying causes using more specific potential field views. Another possibility is to observe the changes in speed. A common slowdown in traffic may indicate the presence of obstacles, degradation of the waterway, intensified traffic or a “jam”, common violations of the law, etc., which can be made visible and susceptible to analysis by the use of potential fields.
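How a decay factor lets recent traffic outweigh older traffic can be illustrated with a small sketch. The exact decay function used by STRAND is not given in this section, so the multiplicative decay per update step, the decay rate, and the unit charge deposited per visit are all assumptions, and the function name is hypothetical.

```python
def update_potential(potential: dict, visited_cells, decay: float = 0.99,
                     charge: float = 1.0) -> dict:
    """Apply one time step: fade every cell, then charge the cells just visited."""
    # Older traffic fades: every stored potential is scaled down (assumed form).
    for cell in potential:
        potential[cell] *= decay
    # New traffic adds a fixed charge to each visited grid cell.
    for cell in visited_cells:
        potential[cell] = potential.get(cell, 0.0) + charge
    return potential


potential = {}
update_potential(potential, [(0, 0)], decay=0.5)
update_potential(potential, [(0, 0), (1, 0)], decay=0.5)
update_potential(potential, [], decay=0.5)  # a step with no traffic at all
print(potential)
```

A route that stops being used, for example because vessels start avoiding a new obstacle, loses potential step by step, while the detour accumulates it; comparing field snapshots over time would then expose the shift in the traffic pattern.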

As mentioned, some changes in traffic may be desirable or inevitable, and need to be taken into account. One such change may be the emergence of a new anchorage area, implying insufficient capacity of legitimate docking and anchorage areas. The observation of such behavior is also possible using the potential field based method. The answers to this kind of traffic issues may be, e.g., to increase the proper anchorage area capacity, or to regulate the new anchorage area in an organizational and legislative manner. If the newly emerged behavior is highly undesirable, there may be a need to strengthen the local law enforcement and introduce penalties for violating traffic rules. All along, the changes to traffic patterns, the emergence and decay of water transport trends, can be observed using potential fields. In that way, the STRAND tool can be of great use for waterway development and maintenance.

Furthermore, the adjustable resolution of the traffic modeling and anomaly detection has the potential to provide insights into traffic over water on a very detailed scale. It is possible to notice the symptoms of inefficient, insufficient or improperly used infrastructure. In ports with large cargo flows the efficiency of docking, loading and unloading, as well as arrivals and departures, is crucial for the proper functioning of the port. Unwanted behaviors spotted either as new trends modeled by the potential fields, or as specific anomalies detected using the normal traffic model, may merit immediate reactions. Therefore our method may be of value for port authorities as well.

In river systems, harbor areas and open canal networks, the transport of water often varies, either unconditionally or in a predetermined manner, depending on local regulations, heavy rainfalls or temperature shifts. Traffic conditions are also affected by topographical conditions, windy weather, or time of day. None of these conditions is visible on a sea chart or in a plot of the AIS data. Instead, the skills of the sailors involve both handling navigational instruments and making informed, knowledge-based decisions. For the transport over water we investigate three different benefits of using potential fields: visualizing navigational skills, acquiring an overview of actual frequented waterways, and detecting anomalous behavior.

An extended analysis of ship navigation and its actual practice aboard large ships in a naval setting has been the subject of Hutchins’ study [5]. The observations in his study involve distributed ship navigation composed of multiple crew members, navigational instruments, water and environmental conditions. Cognitive skills observed in the actual practice aboard, referred to by Hutchins as cognition in the wild, constitute the human knowledge.

The ultimate outcome of the efforts of a crew, i.e., how their ship makes its way, is reflected by its AIS positions. Navigational skills are monitored and visualized by the concept of potential fields. Therefore, distributed cognitive processes in a ship may partly be described by potential fields, because of the effects of cognitive skills embedded in the AIS. The visualization takes into account the strength and distribution of the potential field during a specific period of time. A decay function guarantees flexibility over time; varying long term conditions affect the result by favoring newer measurements over older ones. In that interpretation, the developed method corresponds very closely with Hutchins’ view on learning, paraphrased here:

I look at learning or conceptual change as a kind of local adaptation in a larger dynamic system of coordinations of representational media.

Current tools that visualize the AIS system give an overview of vessels encountered in a given area, including, e.g., type of ship, speed and position, but also the previous route and destination. A caveat is that some of the data must be inserted manually and the AIS equipment may deliberately or accidentally be out of service, so a perfect overview of the actual traffic cannot be guaranteed. Adding visualization of the potential fields as a complement to the current AIS tools will improve the overview because it takes into account data of past experiences.

Potential fields may act as a way of planning the actual route to a destination. Waypoints may deviate and a certain velocity may be preferred depending on time of day, weekday and time of the year. One obvious way of detecting anomalies is to follow deviations from the route and the course of the ship. By plotting transmitted waypoints it is possible to detect previous positions and the proportion of vessels that visited the same waypoints in the past. In the same way, by plotting courses and speed it is possible to detect more volatile changes in the positions done by the vessels. By adjusting the grid size to the specific conditions of transport in river and open canal networks, it should be possible to fine tune the anomaly detection facilities. We found preferred grid sizes varying from 300–1,000 m in the open sea, 60–200 m in the harbor case to 30–100 m in the investigated river case. Also, the number of proposed anomalies decreases to about 10 % of all observations or less, compared to about 15 % in the open sea and about 25 % in the harbor case.

For both harbor and river cases, high speed generates a fivefold or greater increase in the number of anomalies compared to low speed. In the harbor case, turning left or right produces about the same number of anomalies for the investigated grid size. In the river case, turning left causes around 50 % more anomalies than turning right. Traveling along a river is similar to traveling on a motorway: the driver needs to follow traffic rules, especially the right-side traffic rule and speed limits. The generated potential fields discover these rules automatically. For both cases: avoid traveling too fast because of heavy traffic and narrow passages. For the river case: avoid shifting towards oncoming traffic, i.e., turning left.

Although real incidents, i.e., those requiring the reaction of the authorities, represent a vanishingly small percentage of all alarms, the potential fields monitor the actual behaviors without any knowledge introduced in advance, and are thus able to detect anomalous incidents in advance. Even though there will be a lot of false alarms (and most likely some real incidents will go undetected), for a supervisor monitoring the actual traffic, introducing a tool for visualizing the potential fields would still facilitate traffic surveillance.

The AIS is an open system for the automatic identification of all marine traffic (as well as airplanes) live and in a real world setting. In the majority of marine traffic visualization tools available online, traffic is by default displayed as a plot of markers representing vessel positions symbolically marked on a two-dimensional map. Many of those pages enable viewing the most recent position history for specific vessels, but none visualizes data of previous vessels no longer present on the sea chart. Therefore, most visualizations of data from AIS and other positioning equipment offer at best a view of the current state of the global fleet. The STRAND tool, computing and visualizing the potential fields, starts with a blank sheet, and fills it with accumulated traffic data recorded by AIS. The most frequented waterways become visible as lines and spots of intensive potential. The contours of the river banks or quays in the stream become visible as the lack of data instead of a pre-drawn template. All unexpected events, such as entering a not previously visited position or sailing with an unusual course, are regarded as anomalies.

The introduced anomaly detection tool, STRAND, should be regarded as a visualization tool for a human expert, introducing some novel capabilities compared to traditional equipment such as radar and GPS. There are various potential benefits and practical applications of the method, depending on the user. From a ship navigator’s point of view, the display of patterns of correct or normal behavior, in this simple and clear form, aids the choice of the safest and most efficient path. From a traffic safeguarding perspective, the anomaly detection based on potential fields may help to inspect possible traffic incidents quickly and comprehensively. Finally, from the authorities’ point of view, the clear overview of traffic may help recognize traffic regulation and legislation issues, as well as aid the process of waterways development and maintenance.

The investigation results suggest that it is possible to optimize routes using potential fields as a way of planning the actual route to a destination, i.e., based on past experiences. Waypoints may deviate and a certain velocity may be preferred depending on time of day, weekday and time of the year. This may act as a planning tool subject to direct observations.

7 Conclusions and Future Work

This chapter described a method based on potential fields and its successful implementation. The STRAND prototype system made it possible to demonstrate the modeling capabilities of the method and optimize the detection performance. In the two investigated case studies, the system has been shown to successfully model traffic and perform anomaly detection with results dependent on the geographical grid resolution. The visualization of the potential fields displayed a comprehensive representation of various traffic patterns, which allowed for an observation of trends in traffic flow and its regulation. The analysis of detection outcomes led to identifying the optimal grid sizes for each of the cases, and furthermore, resulted in an observation of the asymmetry in course-based detection, suggesting a right-hand sailing rule.

The visual and quantitative analysis based on the STRAND system demonstrated the applicability of the potential field based method. Nevertheless, many research goals remain to be addressed in future work. A major challenge is to conduct a valid quantitative study analyzing the detection performance of the method and its implementation. Given the lack of labeled AIS or radar data sets and of maritime anomaly detection benchmarks, an interesting performance study could be based on a combination of real AIS traffic data and information about incidents (e.g., from coast guard incident reports or the IMO marine incidents database), which would be treated as labeled anomalies. Another challenge is to define and publish a first proper labeled data set with maritime traffic anomalies, to enable reliable comparative performance studies.

Another research direction left for possible future work is addressing the need for different modeling resolution (grid size) in different areas, depending on the intensity of the traffic and complexity of the waterways. A related issue is the distance represented by a unit of longitude, which decreases with the cosine of the latitude from the equator to the poles. In the examined case, the precision of longitude units is approximately halved (cosine at the latitude of ca. 55°), in order to prevent the latitude-longitude inequality in detection sensitivity.
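The longitude correction can be illustrated by mapping a position to a grid cell whose sides have roughly equal length in metres. This is a sketch under stated assumptions, not the STRAND implementation: a spherical Earth with about 111,320 m per degree of latitude, a single reference latitude for the whole study area, and a hypothetical function name.

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate value for a spherical Earth (assumption)


def grid_cell(lat_deg: float, lon_deg: float, grid_size_m: float,
              ref_lat_deg: float = 55.0) -> tuple:
    """Return (row, col) of the grid cell containing the given position."""
    # One degree of longitude shrinks with the cosine of the latitude.
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(ref_lat_deg))
    row = math.floor(lat_deg * M_PER_DEG_LAT / grid_size_m)
    col = math.floor(lon_deg * m_per_deg_lon / grid_size_m)
    return row, col


print(round(math.cos(math.radians(55.0)), 3))
print(grid_cell(1.0, 1.0, 1000.0))
```

At 55° the cosine is about 0.574, so with a 1,000 m grid one degree of latitude spans 111 cells while one degree of longitude spans only 63. Without the correction, cells defined in equal angular units would be nearly twice as tall as they are wide, making detection less sensitive in the north-south direction than in the east-west direction.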

There are various potential benefits and practical applications of the potential field based traffic modeling method, depending on the user domain, ranging from ship navigation, through traffic safeguarding and waterway maintenance, to legislation. When thoroughly examined, tested and optimized, the method and its implementation could be used as the object of a user study, and deployed as a real-life application, either addressing the characteristics and requirements of a specific user group, or as a generic maritime information system, supporting maritime domain awareness in a broader sense.