Introduction

Plastic debris in the natural environment is a well-documented problem (Eriksen et al. 2014; Newman and Crawley 2014; Suaria and Aliani 2014), and it is becoming more acute as the yearly production of plastic materials has grown from 322 million tons in 2015 to 348 million tons in 2017 (PlasticsEurope 2017, 2018a, b). Because waste management efforts across the world are currently not satisfactory, the virtually insatiable demand for plastics generates huge volumes of waste. Not surprisingly, waste often finds its way from users to the physical environment, the coast, and the oceans. Wind, effluent water, shipping activities, natural disasters such as hurricanes, and many other means can carry waste from the mainland and shorelines to the seas, while the final destination of effluent solids is a matter of debate (Andrady 2011; Cózar et al. 2015; Jambeck et al. 2015; Woodall et al. 2014). An example of a natural disaster is the tsunami that struck Japan in March 2011. The Japanese government estimated that the tsunami washed 5 million tons of litter out to sea as the water receded (National Oceanic and Atmospheric Administration (NOAA) 2015). Once objects reach the sea, water currents and surface waves can distribute them to various geographical areas, be they coastal zones, the open ocean, submarine locations, or even the deepest seabed (Corcoran et al. 2009; Pierdomenico et al. 2019). At the sea surface, the five subtropical gyres, located in the North and South Pacific Ocean, the North and South Atlantic Ocean, and the Indian Ocean, feature the highest concentrations of floating marine debris (Cózar et al. 2014; Lebreton et al. 2018).

According to Lebreton et al. (2018), 79 thousand tons of plastic debris float at the sea surface of the Great Pacific Garbage Patch over an expanse of 1.6 million square kilometres. Interestingly, the Mediterranean Sea harbours a similarly high concentration of floating plastics, which is estimated to be between 1000 and 3000 tons (Cózar et al. 2015). Other efforts to trace open-ocean hotspots where marine debris is more likely to congregate were conducted with the aid of multistage modelling and remote sensing techniques (Mace 2012). To this end, that study utilised a combination of models, satellite, radar, and multispectral data, and airborne remote sensing tools. In parallel with their increasing abundance, plastic materials in the marine environment break down over time into smaller fragments, known as microplastics, and eventually into nanoplastics. Microplastics, which measure 5 mm or less and whose ubiquity in the oceans is well documented, can in many cases prove fatal to marine life (Andrady 2011). When ingested, plastics can fill the stomach of marine creatures (Barboza et al. 2018; Lusher et al. 2015), and owing to their indigestible nature, they can cause death by starvation (Cole et al. 2011; Nelms et al. 2015). Recently, the toxicity of plastics to some micro-organisms was reported, although this finding awaits further scientific verification (Gallo et al. 2018). The influx of plastics entering the sea warrants urgent action if humanity is to stave off a collapse in fish stocks. The matter is of paramount importance considering that 3.2 billion people depend on fish for 20% of their mean per capita intake of animal protein (FAO 2018).

Given the impact of plastics on the marine environment, it would be prudent to include the concentration of microplastics as one of the Blue Flag beach criteria. Presumably, such a criterion would prompt beachgoers to take steps towards limiting the release of plastics at beaches. Meanwhile, larger plastics, also known as macroplastics, should attract the same attention as microplastics because their accumulation across the world’s seas increases progressively with time. Accurate estimates of the amount of plastic litter in the marine environment are hard to come by. Therefore, the development of new methods for the detection, classification, and collection of floating marine debris is critically important. Such methods could capitalise on artificial intelligence (AI), whose capabilities have already been demonstrated in other applications.

Deep learning (DL) tools, which fall under the AI family, have been applied in a plethora of areas including medicine, decision-making, government, and others (LeCun et al. 2015). AI techniques have tackled many previously unsolved problems, delivering strong performance in applications ranging from image recognition (Farabet et al. 2013; Krizhevsky et al. 2012; Szegedy et al. 2015; Tompson et al. 2014) to the reconstruction of brain circuits (Helmstaedter et al. 2013), speech recognition (Hinton et al. 2012; Sainath et al. 2013), and the analysis of particle accelerator data (Ciodaro et al. 2012; Mikolov et al. 2011). The high accuracy that AI tools can provide inspired the authors to develop an image classifier tailor-made to detect the main types of marine debris, especially plastic debris, across the world’s oceans and beaches. Initial attention focused on identifying floating macroplastics on or near the surface of seawater (Kylili et al. 2019). Continuing this line of investigation, a more sophisticated method is proposed here, which is capable of discerning plastic debris scattered along coastal areas and shorelines. One of the primary objectives was to develop a robust and flexible image classifier able to categorise a diverse spectrum of plastic litter. To accomplish this task, the AI tool used here is a convolutional neural network (CNN), which employs the bottleneck method (BM). More details about the BM are given in “Method,” which describes the approach.

Related work

Until recently, detecting marine debris floating at the surface of the sea was a manual task, and the prevailing methods for identifying and classifying marine debris are still predominantly carried out by humans. First, debris is collected using surface net tows (Galgani et al. 2013) that are hauled by boats (Barnes et al. 2009; Goldstein et al. 2013; Ruiz-Orejon et al. 2016). Once retrieved from the sea, the items are sorted manually and assigned to distinct object categories. Hence, the sampling process is rather limited in scope, time-consuming, and entails substantial human input at every step. Moreover, the areas sampled by these methods are small relative to the sheer size of the zones that harbour marine litter, while the deployment of marine vessels for expeditions incurs appreciable financial expense.

The need to automate the process while gaining more insight into the distribution of floating objects has led to the invention of a new breed of methods for collecting and categorising marine debris. Remotely operated vehicles, drones, and cameras mounted onboard ships are some of the means deployed to date to tackle the problem of plastics at sea. Aerial surveys using drones fitted with high-definition cameras filmed coastal areas that are not easily accessible by humans (Moy et al. 2018). The airborne footage recorded during these missions was later collated, using image processing tools, to create a mosaic of images. Subsequently, researchers manually identified possible marine debris captured in the combined mosaic images. Even though drones are popular, their limited flight time severely restricts their reach.

One initiative intended to automate the classification and recognition of waste in the environment is the Floating Litter Monitoring Application (FLM App), which guides observers to upload pictures of debris floating in rivers or lying on beaches and to tag the images using a drop-down list of object categories (González-Fernández and Hanke 2017). The intention of the FLM App was to create a large database of labelled images of debris and to use them to train an image classifier, utilising machine learning techniques, so as to automate the labelling process. However, no evidence has been published to date that this method was developed further. The research team of Ge et al. (2016) proposed a partially unattended method for recognising marine debris on beaches. Team members used the remote sensing technique of light detection and ranging (LIDAR) and a support vector machine (SVM) classifier to categorise marine debris into four general categories: (1) plastics, (2) paper, (3) clothes, and (4) metallic material. This technique is semi-automatic and requires a lot of post-processing to assign marine debris to the respective classes. Other devices, such as ultrasonic sensors, can equip robots to perform autonomous indoor trash detection (Kulkarni and Junghare 2013). The robot in that study estimated the position of the nearest stationary piece of trash (an aluminium can) using two ultrasonic sensors. Subsequently, the robot was automatically instructed to approach it.

Besides the detection of waste along coastal areas and shorelines, research teams are working on detecting and identifying trash underwater with the help of autonomous underwater vehicles (AUVs). In one such study (Fulton et al. 2019), an observer carefully examined the recorded videos, manually extracted image frames, and constructed an image-set consisting of “plastic” litter. Next, these snapshots were used to train a CNN-based trash detector. Finally, the authors applied the detector to new videos featuring underwater marine debris and assessed the performance of their technique across three possible categories. These comprised “plastic,” which refers to marine debris made of plastic; underwater “remotely operated vehicles” (ROVs), which are man-made devices; and “bio,” which includes all visible marine life. Because it can detect only three categories of objects, the trash detector exhibits limited capabilities. Differentiating among the plastic items would offer new insights into the level of pollution affecting the sea floor. Deployed with AUVs, forward-looking sonars (FLS) have also shown promise in detecting submerged marine debris (Valdenegro-Toro 2016).

This paper presents an image classifier created using a DL method, which can identify images of plastic debris in the marine environment, that is, at the shoreline and in seawater. More specifically, the CNN-based classifier can distinguish between six types of plastic debris and one type of marine life. Moreover, it is also able to recognise objects that are neither plastics nor marine life and assign them to the “other” category. In summary, the contributions of this paper comprise (a) an image classifier that can distinguish between eight types of objects: six types of plastic debris, one type of marine life, and other items, and (b) a method capable of recognising litter encountered in the marine environment, either on the shoreline or at sea.

Method

Overall framework

To take advantage of prevailing sizeable datasets tested on existing classification tasks as well as to apply a particular CNN architecture whose efficiency has been proven, we have adopted the bottleneck method (BM). As a way of saving time and conserving computational resources, the approach adopted in this research was to exploit the feature extraction layers of an existing CNN. Making use of the image features learnt from a sufficiently large image database obviates the need to assemble a large database dedicated to the specific classification task investigated here. These learned features are valuable for a variety of computer vision problems because they yield a higher level of accuracy than would otherwise have been attained by relying only on the available data.

Applied to the selected CNN, in this case the VGG16, the approach described above produced a very high performance. In short, the idea was to instantiate only the convolutional part of the model, which does not include the fully connected layers. Subsequently, this part of the model was run once on the training image dataset while recording its output, the bottleneck features. Finally, the proposed fully connected model, formulated for the classification task of identifying marine debris, was trained on the stored features. Fig. 1 summarises the complete framework and depicts the creation of the classifier.
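The order of operations described above (extract once, store, then train only the head) can be illustrated with a toy sketch. This is not the authors' code: `frozen_extractor` is a hypothetical stand-in for the frozen VGG16 convolutional base, the four-pixel "images" are invented for illustration, and the small softmax head merely plays the role of the fully connected model.

```python
import math

def frozen_extractor(image):
    """Hypothetical stand-in for the frozen convolutional base (never trained)."""
    return [sum(image) / len(image), max(image) - min(image)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_scores(weights, biases, feats):
    """Linear scores of the trainable head for one cached feature vector."""
    return [sum(w * f for w, f in zip(row, feats)) + b
            for row, b in zip(weights, biases)]

def train_head(cached, labels, n_classes, epochs=200, lr=0.5):
    """Train a softmax head on cached features with plain per-sample SGD."""
    n_feats = len(cached[0])
    weights = [[0.0] * n_feats for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for feats, label in zip(cached, labels):
            probs = softmax(head_scores(weights, biases, feats))
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(n_feats):
                    weights[c][j] -= lr * grad * feats[j]
                biases[c] -= lr * grad
    return weights, biases

def predict(weights, biases, feats):
    scores = head_scores(weights, biases, feats)
    return scores.index(max(scores))

# Invented toy data: class 0 = dark images, class 1 = bright images.
images = [[0.1, 0.2, 0.3, 0.2], [0.8, 0.9, 0.7, 0.8],
          [0.2, 0.1, 0.2, 0.3], [0.9, 0.8, 0.8, 0.7]]
labels = [0, 1, 0, 1]

# Step 1: run the frozen extractor once and store the bottleneck features.
cached = [frozen_extractor(img) for img in images]
# Step 2: train only the head, reusing the stored features in every epoch.
weights, biases = train_head(cached, labels, n_classes=2)
predictions = [predict(weights, biases, feats) for feats in cached]
```

The saving comes from Step 1: the expensive extractor is evaluated once per image rather than once per image per epoch.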

Fig. 1
Overall framework of the proposed marine debris image detection methodology

The VGG16 model

The CNN technique adopted in this research is the VGG16 model. Structurally, the algorithm consists of 16 layers and was pre-trained on millions of images from the ImageNet database—a database of over 14 million images divided into 1000 classes (Simonyan and Zisserman 2014). Because important image features have already been learnt during the pre-training phase, the model can be applied to other classification tasks, such as discerning marine debris, while realising an enviable level of accuracy.

A schematic representation of the VGG16 model is given in Fig. 2. The block diagram depicts how an image inserted into the VGG16 model is processed so as to extract the essential image features. First, an image is fed into Convolutional Block 1, which consists of convolutional layers and a max-pooling layer. A convolutional layer is responsible for organising the units of an image into feature maps, while a max-pooling layer merges semantically similar features of these units into a unified feature map (LeCun et al. 2015). Next, the image propagates through all of the convolutional blocks up to Block 5, just before the fully connected part of the network. Here, the fully connected part is made up of one flatten layer and three dense layers. The flatten layer reshapes the feature map into a single column vector, which the dense layers then map to the class scores. Hence, the output of the layer just before the fully connected part provides the “bottleneck features” of the customised dataset.
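As a rough illustration of what the bottleneck features look like, the sketch below tracks the spatial size of the feature map through the five convolutional blocks of the standard VGG16 layout. The 224 × 224 input size is an assumption (it is the usual ImageNet setting for VGG16; the input size used in this study is not stated), and the function name is hypothetical.

```python
# Standard VGG16 layout: conv layers and output channels per block.
CONV_LAYERS_PER_BLOCK = [2, 2, 3, 3, 3]
CHANNELS_PER_BLOCK = [64, 128, 256, 512, 512]
DENSE_LAYERS = 3

def bottleneck_shape(input_size=224):
    """Return (height, width, channels) of the output of Block 5.

    Assumes 3x3 convolutions with 'same' padding (size unchanged) and one
    2x2 max-pooling layer with stride 2 at the end of each block (size halved).
    """
    size = input_size
    for _ in CONV_LAYERS_PER_BLOCK:
        size //= 2
    return (size, size, CHANNELS_PER_BLOCK[-1])

height, width, channels = bottleneck_shape()
flattened_length = height * width * channels          # input to the flatten layer
weight_layers = sum(CONV_LAYERS_PER_BLOCK) + DENSE_LAYERS
```

Under these assumptions each image is reduced to a 7 × 7 × 512 feature map, that is, a vector of 25,088 values after flattening, and the 13 convolutional plus 3 dense layers account for the 16 in the model's name.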

Fig. 2
Schematic representation of the VGG16 model, which consists of five convolutional blocks. The image attributes extracted just before the fully connected layer are referred to as the “bottleneck features”

The BM optimisation procedure

The optimisation procedure of the BM image classifier is formalised as an iterative gradient descent on the loss function, which quantifies the error in the predictions as a function of the model weights. As an approximation to the true gradients, training utilised mini-batch stochastic gradient descent with the Adadelta adaptive learning rate method. Adadelta was selected after a thorough investigation intended to identify a suitable optimiser capable of executing the image classification tasks described in this paper. More details regarding the adoption of the Adadelta method are given in Kylili et al. (2019). The loss for a batch of N samples was computed with the categorical cross-entropy loss function:

$$ \mathrm{Loss}=-\frac{1}{N}\sum \limits_{i=1}^N\sum \limits_{c=1}^Cq\left({x}_{i,c}\right)\log p\left({x}_{i,c}\right) $$
(1)

where N is the total number of samples and C is the total number of classes. Term q(x_{i,c}) is an indicator, which assumes a value of 1 only if sample x_i belongs to category c; otherwise it is assigned a 0. Term p(x_{i,c}) is the probability estimated by the model that sample x_i belongs to category c. Probability p is obtained from the “Softmax” function:

$$ p{\left(\mathrm{x}\right)}_c=\frac{\exp \left({x}_c\right)}{\sum \limits_{j=1}^C\exp \left({x}_j\right)} $$
(2)

where probability p is a normalised exponential that accepts as input a C-dimensional vector x and generates as output a C-dimensional vector p of real values that range between 0 and 1 and sum to 1. Term x_c refers to the c-th element of vector x.
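Equations (1) and (2) can be written out in a few lines of plain Python. The sketch below is illustrative only; the function names are hypothetical, and subtracting the maximum score before exponentiating is a common numerical safeguard that does not change the result of Eq. (2).

```python
import math

def softmax(x):
    """Eq. (2): map a C-dimensional score vector to class probabilities."""
    shift = max(x)                      # numerical safeguard, cancels out
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(q_batch, p_batch):
    """Eq. (1): mean over N samples of -sum_c q * log(p)."""
    n = len(q_batch)
    total = 0.0
    for q, p in zip(q_batch, p_batch):
        # Only the true class (q = 1) contributes to the inner sum.
        total += sum(q_c * math.log(p_c) for q_c, p_c in zip(q, p) if q_c > 0)
    return -total / n

# An untrained model that scores all eight classes equally.
uniform = softmax([0.0] * 8)
one_hot = [1, 0, 0, 0, 0, 0, 0, 0]
loss_uniform = categorical_cross_entropy([one_hot], [uniform])
```

For eight equally likely classes the loss equals ln 8 ≈ 2.08, which gives a reference point for judging the loss values reported later.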

Dataset

The dataset used in this study consists of eight (8) categories of objects: six types of plastic debris, one type of marine life, and one category labelled as “other,” which comprises articles such as boats, shipping containers, rocks, and swimmers. Plastic bottles, plastic buckets, plastic bags, fishing nets, plastic straws, and food wrappings make up the six categories of plastic litter, while flying fish represent marine life. The images were mainly retrieved from ImageNet, a vast online database of images categorised into various classes (Krizhevsky et al. 2012). The non-profit organisation Algalita also kindly provided us with marine debris images and videos acquired during its boat expeditions in 2014. Figure 3 displays a sample of these images. Images available in the ImageNet dataset may not, by themselves, be representative of what autonomous systems capture. However, transfer learning has demonstrated that such datasets are extremely useful for autonomous systems and realistic marine debris detectors. Here, the VGG16 was pre-trained on the ImageNet database, which consists of 14 million images subdivided into 1000 classes, to learn generic image features. Subsequently, another image dataset consisting of marine debris images at sea and on the shores, as well as images belonging to the “other” category, was used to train the BM image classifier (Kylili et al. 2019). Because this dataset, used for training the classifier through transfer learning, consists of images retrieved from different sources (ImageNet, Algalita), it is comparable to a collection acquired by an autonomous image acquisition system. The validation and test steps were performed on separate datasets that differ from the training set and were retrieved from various sources, similar to what one could expect from an autonomous image acquisition system (Algalita 2014; National Oceanic and Atmospheric Administration (NOAA) 2018).

Fig. 3
Examples of images retrieved from the initial image-set. Images a and b show a plastic bag and a plastic bottle, respectively, floating at the sea surface. Image c shows a partly submerged plastic bucket, while image d depicts a food wrapping in seawater. Image e pictures a ghost fishing net about to be recovered on board a boat. Image f displays plastic straws, while g shows a flying fish. Last, h is a motor boat, an example of an object drawn from the “other” class. Courtesy: ImageNet (Krizhevsky et al. 2012)

This research constitutes a major improvement over previous efforts, which dealt with the classification of plastic debris floating at the sea surface (Kylili et al. 2018; Kylili et al. 2019). Progress relates to the BM structure, where the authors examined in detail some internal parameters that increase the classification accuracy of the proposed method. For example, the number of epochs was increased to 50, the batch size was reduced to 5, and the value of the combined L1 and L2 regulariser (l1_l2) was set to 0.001, while the proportion of images utilised in the training and testing sets was kept constant at a ratio of 80:20. Presumably, the modification that most influenced the performance of the proposed image classifier was the number of images assigned to each mini-batch. Compared with the earlier contributions of Kylili et al. (2018) and Kylili et al. (2019), the changes to the preceding parameters yielded a 4% improvement in the ability of the BM to identify marine debris and other categories.

Both the structure and the inner workings of the BM technique remained essentially unaltered during these modifications so as to permit comparison between the different experiments. An additional contribution of this work was the enrichment of the image dataset. All new images added to the expanded dataset were carefully selected so as to ensure that the object portrayed in each image was closely relevant to the function of the proposed BM image classifier. Moreover, the new images that expanded the image-set were of high resolution. In parallel, images that exhibited lower resolution or did not clearly depict marine plastics were discarded. Consequently, being more selective with the images resulted in a reduction in the number of images in each object class from 250 to 200.

While conducting various experiments, it became apparent that the image resolution of the training and testing sets played an instrumental role in the ability of the BM classifier to discern specific marine objects. Taken together, the higher resolution and the more careful selection of images partly explain the improved performance of the image classifier. It suffices to mention that all of the marine litter was either floating on the sea surface or resting elsewhere in the marine environment. Furthermore, this dataset contains images of plastic debris encountered at the shoreline. This enrichment renders the proposed BM image classifier suitable for identifying debris in different settings. Another contribution of this research pertains to the expansion of the number of object categories from three to eight, which now include a new class on marine life. Notably, the addition of the “other” category enhances the usefulness of the proposed method by rendering it able to distinguish marine debris from some unrelated objects that can be encountered at the coast.

As the number of categories broadens, one would expect the ability of the classification technique to recognise marine objects to deteriorate and the computational effort to grow. This is reasonable, as the performance of an image classifier is strongly correlated with the number and the quality of the images contained in the new categories. However, if the new dataset comprises high-resolution images that display good representations of marine plastics, the performance of the CNN technique can improve. In contrast, if the recalibrated dataset is made up of low-quality images, the performance of the BM is expected to suffer.

Collectively, the initial image-set features eight categories of objects, each containing 200 images. The limited number of images does not appear to affect the performance of the proposed image classifier, as we have expanded the original dataset through the use of data augmentation (DA) manipulations. These manipulations apply different types of changes to the original images, through geometric modifications, changes in intensity, or both. Experiments conducted by the authors in the past have revealed that increasing the number of augmented images can result in a higher classification accuracy. Nevertheless, there is a limit to the number of manipulations that an image may be subjected to. In this case, DA adjustments generated almost 25 new images from every image in the original image-set. Hence, the final image-set contains 4000 images in each object class, with the total number of images amounting to 32,000. Both the image dataset used to train our approach and its code have been partly released on our website (www.carbonlab.eu).
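The two families of DA manipulations mentioned above, geometric and intensity changes, can be sketched on a tiny greyscale image stored as a nested list. The specific transformations and function names below are illustrative assumptions; the exact manipulations used in this study are not listed here.

```python
def flip_horizontal(image):
    """Geometric change: mirror every row."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Geometric change: rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*image[::-1])]

def scale_intensity(image, factor, max_value=255):
    """Intensity change: brighten or darken, clipping at max_value."""
    return [[min(max_value, int(round(px * factor))) for px in row]
            for row in image]

def augment(image):
    """Return several variants of one image (geometric, intensity, or both)."""
    return [
        flip_horizontal(image),
        rotate_90(image),
        scale_intensity(image, 1.5),
        scale_intensity(flip_horizontal(image), 0.5),
    ]

original = [[100, 200],
            [50, 0]]
variants = augment(original)
```

Each variant keeps the label of the original image, which is how a class of a few hundred images can grow into a class of several thousand.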

Training set and testing set

The final image-set was then divided into two subsets: the training and the testing sets. The training set was used to train the BM image classifier and the testing set to assess its performance. Using the splitting function, 80% of the images of the final image-set were assigned to the training set and the remaining 20% to the testing set. The selection of the 80% and 20% proportions derives from a detailed study, which aimed to enhance the performance and reliability of the BM tool (Kylili et al. 2019). Each training category contained 3200 images, while each testing category featured 800 images. In aggregate, the training samples comprised 25,600 images, whereas the testing samples consisted of 6400 images.
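The arithmetic of the split can be checked with a short sketch. The `train_test_split` helper below is hypothetical (the splitting function actually used is not specified), and applying it class by class is an assumption consistent with the per-class counts quoted above.

```python
import random

def train_test_split(items, train_percent=80, seed=0):
    """Shuffle reproducibly, then cut into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

CLASSES = 8
IMAGES_PER_CLASS = 4000

# Image indices stand in for the images of one class.
train_ids, test_ids = train_test_split(range(IMAGES_PER_CLASS))

total_train = CLASSES * len(train_ids)
total_test = CLASSES * len(test_ids)
```

With 4000 images per class this reproduces the 3200/800 split per category and the totals of 25,600 training and 6400 testing images.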

At the end of the training process, the training accuracy and the testing accuracy were obtained from:

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TS}} $$
(3)

where TP refers to the true positives, that is, the positive samples that were correctly predicted as belonging to their class. TN refers to the true negatives, that is, the negative samples that were correctly predicted as not belonging to the class. Finally, TS is the total number of samples tested by the BM image classifier.
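Equation (3) can be made concrete with a small sketch. It reads the equation in a one-vs-rest fashion, treating one class as "positive" and all others as "negative"; this reading is an assumption, since the text does not state how the equation was applied across the eight classes. The labels below are invented for illustration.

```python
def accuracy_one_vs_rest(y_true, y_pred, positive):
    """Eq. (3): (TP + TN) / TS for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return (tp + tn) / len(pairs)

# Invented example: five samples, "bottle" is the positive class.
y_true = ["bottle", "bag", "net", "bottle", "other"]
y_pred = ["bottle", "bottle", "net", "bag", "other"]
bottle_accuracy = accuracy_one_vs_rest(y_true, y_pred, positive="bottle")
```

Here one bottle is found (TP = 1), two non-bottles are correctly left out (TN = 2), one bag is mistaken for a bottle, and one bottle is missed, giving an accuracy of 3/5.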

Validation set

To build a new image database, called the validation set, a number of new images were collected for the eight categories. This set was used to validate the performance of the BM classifier. The difference between the validation set and the previous sets is that the BM image classifier had never encountered or been trained on these validation images. These images were processed through the BM for the first time, allowing us to gauge its capability to generalise, that is, to correctly classify previously unseen input images (validation accuracy).

Results

This section outlines the results obtained from the training, testing, and validation of the BM image classifier. The aim was to boost the capability of the BM technique in achieving a high level of prediction as it pertains to the class where the input image belongs. “Qualitative and quantitative evaluation” describes the qualitative and quantitative evaluation of the classifier, while “Validation results” outlines the validation results. All calculations were executed on an Intel® Xeon® machine equipped with an Intel CPU Core E5-2630 v3 (2.40 GHz) with 48.0 GB of memory (RAM) and an NVIDIA Quadro K4200 graphics card, clock-rated at 784 MHz with 28.6 GB of memory.

Qualitative and quantitative evaluation

After initialising the network weights of the layers that use the Rectified Linear Unit (ReLU) activation, we used stochastic optimisation relying on the Adadelta optimiser with a learning rate of 0.001. Practically, ReLU is an activation function that is linear for input values greater than zero; in this case, the output value is equal to the input value. For negative input values, however, the output is always zero, which makes the function non-linear overall. Mathematically, the ReLU activation function can be expressed as g(z) = max {0, z}, where z is the input value (Goodfellow et al. 2016). Additionally, the l1_l2 regulariser, which penalises the weights in the learning process and helps improve the final accuracy of the method, was implemented. The l1_l2 regulariser is a combination of the Lasso (L1) and the Ridge (L2) penalties, and it was selected following a detailed analysis, which identified the regulariser that best fitted the classification task at hand. According to Kylili et al. (2019), the l1_l2 regulariser produced the highest validation accuracy when compared with the L1 and L2 regularisers applied separately or with the absence of a regulariser.
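The activation function and the weight penalty described above are both one-liners. In the sketch below, the penalty follows the usual definition of a combined L1 and L2 regulariser (a weighted sum of absolute values plus a weighted sum of squares); using 0.001 for both coefficients is an assumption, since the text quotes a single value.

```python
def relu(z):
    """g(z) = max{0, z}: identity for positive inputs, zero otherwise."""
    return max(0.0, z)

def l1_l2_penalty(weights, l1=0.001, l2=0.001):
    """Combined penalty added to the loss: l1 * sum|w| + l2 * sum(w^2)."""
    lasso_term = l1 * sum(abs(w) for w in weights)
    ridge_term = l2 * sum(w * w for w in weights)
    return lasso_term + ridge_term

activations = [relu(z) for z in (-2.0, 0.0, 3.0)]
penalty = l1_l2_penalty([1.0, -2.0])   # 0.001 * 3 + 0.001 * 5
```

The L1 term pushes small weights towards exactly zero, while the L2 term discourages any single weight from growing large; combining them applies both pressures at once.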

Furthermore, the BM divides the training samples into groups of images called mini-batches. For each mini-batch, the algorithm calculates an error and updates its internal parameters (weights). Once all mini-batches have been processed, an average error is computed for the training and the testing process of the model. This error decreases as the number of epochs, that is, the number of times that the whole training set is processed through the algorithm, increases. By increasing the number of epochs, one would expect an improvement in the training and testing accuracy and loss of the BM image classifier. Based on these observations, we examined the following two scenarios in order to evaluate the performance and trustworthiness of the BM image classifier.
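The weight update applied after each mini-batch can be illustrated with a one-parameter sketch of the Adadelta rule. The values `rho = 0.95` and `eps = 1e-6` are commonly used defaults and are assumptions here, as is the convention (found in several library implementations) of multiplying the Adadelta step by a learning rate `lr`. The quadratic objective is a stand-in for the real loss.

```python
import math

def adadelta_minimise(grad_fn, x0, steps, rho=0.95, eps=1e-6, lr=1.0):
    """Minimise a one-parameter function with the Adadelta update rule."""
    x = x0
    avg_sq_grad = 0.0    # running average of squared gradients
    avg_sq_delta = 0.0   # running average of squared updates
    for _ in range(steps):
        g = grad_fn(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # Step size adapts to the ratio of past update and gradient magnitudes.
        delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
        x += lr * delta
    return x

# Stand-in loss f(x) = x^2 with gradient 2x, starting from x = 1.
x_final = adadelta_minimise(lambda x: 2.0 * x, x0=1.0, steps=50)
x_small_lr = adadelta_minimise(lambda x: 2.0 * x, x0=1.0, steps=50, lr=0.001)
```

In this toy setting the parameter moves steadily towards the minimum at zero, and a smaller `lr` simply scales down each step.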

The first scenario evaluated the performance of the BM image classifier by varying the number of epochs. The second scenario assessed the performance of the method by altering the number of images allocated to the mini-batch. Regarding the first scenario, two cases were investigated in which the BM algorithm was permitted to execute (1) 6 epochs and (2) 50 epochs. For this investigation, 10 images were assigned to each mini-batch. Findings pertaining to the training and the testing performance of the 50 and the 6 epoch runs are displayed in Figs. 4 and 5, respectively. Closer inspection of Fig. 4 a and b, for 50 epochs, prompted us to examine whether, past epoch 6 (the cross-over between the training and testing loss curves), the BM picks up unwanted features of the training images, that is, whether it overfits. Such behaviour was suggested by the fluctuations in the testing loss (Fig. 4a). To appraise this assertion, the BM was also tested by permitting it to progress only until epoch 6. As illustrated in Fig. 4 a and b, the 50 epoch run scored a training accuracy of 98% at a training loss of 0.10. The testing accuracy of the 50 epoch run (Fig. 4b) peaked at 95% at a testing loss of 0.20.

Fig. 4
a Displays the model training and testing loss curves, while b shows the respective accuracy curves after 50 epochs

Fig. 5
a Presents the training and testing loss and b shows the accuracy curves, respectively, for 6 epochs

Meanwhile, the 6 epoch run (Fig. 5 a and b) produced a training accuracy of 93% with a loss of 0.25, and a lower testing accuracy of 92% at a loss of 0.29. Comparing the two cases of the first scenario, the BM that was permitted to run for 50 epochs yielded higher training and testing accuracy and a smaller error than the 6 epoch computation. Clearly, the BM image classifier improves progressively with every epoch, and if the number of epochs is adequate, the overall accuracy rises until it levels off. Results from the 50 epoch case are comparable to the findings of a previous investigation in which the image classifier was built on three categories of marine debris, namely, plastic bottles, plastic buckets, and plastic straws. For the record, the training accuracy attained in that previous research by the authors was almost 100%, while the testing accuracy amounted to about 99% (Kylili et al. 2019).

Referring to the second scenario, three cases were devised to assess the performance of the BM image classifier. The three cases used mini-batches of 5, 10, and 16 images (batch size), respectively. For all cases, the number of epochs was set to 50. The performance of the proposed image classifier for the different batch sizes is depicted in Fig. 6. Since the training loss and training accuracy for all batch sizes were comparable, only the respective testing loss and accuracy are presented. Fig. 6a shows the testing loss for each batch size, while Fig. 6b depicts the corresponding testing accuracy. Batch size 5, as demonstrated in Fig. 6a, exhibits the highest loss of 0.22 compared with the other two batch sizes of 10 and 16, which yielded a testing loss of 0.20 and 0.18, respectively. Inspecting Fig. 6b, it is evident that for all batch sizes the testing accuracy reached a peak of almost 96%.
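To make the three batch-size cases concrete, the sketch below counts how many weight updates each case performs, assuming the 25,600 training images quoted earlier and one update per mini-batch. The generator name is hypothetical.

```python
def mini_batches(n_samples, batch_size):
    """Yield index ranges covering the training set in consecutive chunks."""
    for start in range(0, n_samples, batch_size):
        yield range(start, min(start + batch_size, n_samples))

TRAINING_SAMPLES = 25_600
EPOCHS = 50

updates_per_epoch = {
    size: sum(1 for _ in mini_batches(TRAINING_SAMPLES, size))
    for size in (5, 10, 16)
}
total_updates = {size: n * EPOCHS for size, n in updates_per_epoch.items()}
```

Under these assumptions a batch size of 5 performs 5120 updates per epoch, against 2560 for a batch size of 10 and 1600 for a batch size of 16, so the smallest batch adjusts the weights more than three times as often as the largest for the same number of epochs.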

Fig. 6
a Shows the testing loss of the BM image classifier for three different batch sizes: 5, 10, and 16. b Presents the testing accuracy of the method for all of the cases

In summary, the combination of 50 epochs and a batch size of 16 generated the highest training and testing accuracy, while producing the lowest loss of about 0.18. Nevertheless, the validation accuracy is the most important parameter, because it governs the method’s performance and mirrors how well the learning fared during the training process.

Validation results

The performance of the BM image classifier was evaluated on the validation set, which the classifier had never encountered or been trained on before. The validation accuracy was computed from

$$ \mathrm{Validation}\ \mathrm{Accuracy}=\frac{\mathrm{CP}}{\mathrm{TS}} $$
(4)

where CP is the number of correct predictions, that is, the validation samples correctly identified by the image classifier, and TS is the total number of validation samples. As can be observed in Fig. 7, the BM image classifier was able to discern the different categories of marine debris and label each image with its respective class. The classifier successfully assigned the correct label to an image even though some of the images contained distractions, such as the chain shown in the “Net” snapshot (Fig. 7). Besides correctly identifying marine debris and other images, the BM image classifier on several occasions misclassified objects, such as those depicted in the collage of Fig. 8.
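Equation (4) and the kind of per-class tally shown in a bar chart of correct classifications can be computed together. The helper name and the four toy labels below are invented for illustration and are not validation data from this study.

```python
from collections import Counter

def validation_report(y_true, y_pred):
    """Return (CP / TS, {class: (correct, total)}) for a validation run."""
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = Counter(y_true)
    accuracy = sum(correct.values()) / len(y_true)    # Eq. (4)
    per_class = {label: (correct[label], total[label]) for label in total}
    return accuracy, per_class

# Invented toy labels: one "Net" image is mistaken for a "Bag".
y_true = ["Net", "Net", "Bag", "Bag"]
y_pred = ["Net", "Bag", "Bag", "Bag"]
accuracy, per_class = validation_report(y_true, y_pred)
```

Applied to a set of 400 validation images, 356 correct predictions would give 356/400 = 0.89 and 360 would give 0.90.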

Fig. 7

Some examples of the successful recognition of images, at the coast and the sea, conducted by the BM image classifier. Labels display the correct items

Fig. 8

Some examples of the unsuccessful recognition of images conducted by the BM image classifier. Labels denote false items

Regarding the first scenario, the bar chart of Fig. 9 summarises the performance of the BM obtained from 400 validation samples, or 50 images for each class. With validation accuracies of 88.7% and 89%, respectively, the 50-epoch and 6-epoch cases realised an almost identical performance. Examining each class separately, it is evident that for the “Bucket,” “Fish,” and “Straw” categories the 50 and the 6 epoch cases yielded the same number of correct image identifications. Tests on images featuring bottles produced almost the same results, with a difference of ± 1 image, as illustrated in Fig. 9. For the plastic “Bag” and “Wrap” (Wrapping) classes, the discrepancy in identifying the objects stands at ± 4, while for “Net” and “Other” it reached ± 5. More specifically, the BM image classifier trained for 6 epochs generated the highest validation score for the “Bag” and the “Net” classes, while for the “Wrap” and the “Other” classes, the 50 epoch classifier was more accurate. Remarkably, the overall validation performance of the BM image classifiers built on 50 epochs and 6 epochs remained almost identical at about 89%, which means that 356 validation samples out of 400 were correctly identified. Hence, in this investigation the performance of the BM appears to be independent of the number of epochs, as demonstrated by the validation accuracy.

Fig. 9

Classification results. Bar chart depicting the number of correctly classified samples obtained for the 50 epochs (blue bars) and 6 epochs (misty rose bars). Overall, both cases scored a performance of 89%, meaning that 356/400 images were correctly recognised

Shifting attention to the second scenario (Fig. 10), it is apparent that varying the number of images in the mini-batch (batch size) yields a slightly different validation accuracy. Batch size 5 produced the highest validation accuracy of 90%, or 360 correct classifications out of 400 validation samples. Batch sizes 10 and 16 generated a slightly lower validation accuracy of 88.7% and 88.2%, respectively. Interestingly, the validation results contrast with the testing loss from the qualitative and quantitative evaluation (Fig. 6): batch size 5 scored the highest testing loss and yet the best validation accuracy (Fig. 10). Because the performance and trustworthiness of the BM image classifier are better reflected by the validation accuracy, it is recommended to select the combination of batch size 5 and 50 epochs. Concluding, the recommended BM image classifier attained a training accuracy of 98% with a training loss of 0.13, a testing accuracy of 96% and a testing loss of 0.22, accompanied by a validation accuracy of 90%.
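The selection logic described above can be illustrated with a short sketch. The testing losses and validation accuracies are the values reported in the text (Figs. 6 and 10); the variable names are hypothetical, and the correct-classification counts for batch sizes 10 and 16 are approximations derived here from the rounded percentages rather than figures stated in the study.

```python
# Values reported in the text for 50 epochs, keyed by batch size.
testing_loss = {5: 0.22, 10: 0.20, 16: 0.18}
validation_accuracy = {5: 0.900, 10: 0.887, 16: 0.882}
VALIDATION_SAMPLES = 400  # 8 classes x 50 images

# Ranking by testing loss alone would favour the largest batch size ...
best_by_loss = min(testing_loss, key=testing_loss.get)

# ... whereas ranking by validation accuracy favours the smallest one.
best_by_validation = max(validation_accuracy, key=validation_accuracy.get)

# Approximate number of correct classifications implied by each accuracy.
correct_counts = {
    batch: round(acc * VALIDATION_SAMPLES)
    for batch, acc in validation_accuracy.items()
}
```

The two criteria disagree (batch size 16 by testing loss, batch size 5 by validation accuracy), which is why the choice of criterion matters for the recommendation.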

Fig. 10

Validation accuracy for batch sizes of 5, 10, and 16 images. As indicated by the negative trend line, batch size 5 performs better than batch sizes 10 and 16

A further test of the capabilities of the bottleneck method to recognise plastics at the coast and the sea dealt with an external parameter, namely, the resolution of the images. Image resolution is an important parameter since a high-resolution depiction carries richer information, presumably of good quality, making it easier for the BM classifier to successfully discern plastic debris. To test how sensitive the image detection technique is to resolution, the resolution of the images was varied in a systematic manner. In this regard, the resolution (pixels per inch) of the validation set images was gradually reduced by 50%, 75%, 87.5%, and 93.75% of their original size. Subsequently, the reduced-size images were processed by the BM image classifier, probing in this way how its competence in telling whether the same images contained plastics evolved.
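The four reduction levels correspond to successive halvings of the resolution, as the sketch below illustrates. It is a hedged illustration only: the study does not state which resampling method was used, so the nearest-neighbour `downsample` helper, the toy 16 × 16 image, and the assumption that the reduction applies along each image axis are all hypothetical. The 125 pixels per inch starting value is the average resolution reported in the text.

```python
def retained_fraction(reduction_percent):
    """Fraction of the original resolution left after a given reduction."""
    return 1.0 - reduction_percent / 100.0


def downsample(image, factor):
    """Nearest-neighbour downsampling: keep every `factor`-th pixel per axis."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [row[::factor] for row in image[::factor]]


reductions = [50.0, 75.0, 87.5, 93.75]  # percentages quoted in the text

# Each reduction keeps 1/2, 1/4, 1/8, and 1/16 of the original resolution.
factors = [round(1 / retained_fraction(r)) for r in reductions]

# Resulting resolution when starting from an average of 125 pixels per inch.
ppi = [125 * retained_fraction(r) for r in reductions]

# Toy greyscale image (made-up pixel values) to show the effect on size.
image = [[(x + y) % 256 for x in range(16)] for y in range(16)]
sizes = [(len(small), len(small[0]))
         for small in (downsample(image, f) for f in factors)]
```

Under these assumptions, a 75% reduction of a 125 pixels per inch image leaves 31.25 pixels per inch, in line with the roughly 30 pixels per inch quoted in the text.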

Results presented in Fig. 11 reveal the total correct classifications as a function of image resolution. Strikingly, the BM image classifier retained the same high performance in distinguishing between the different categories of plastic debris despite a drastic reduction in the resolution of the validation images (Fig. 11). In other words, the algorithm is capable of identifying marine debris even from very poor resolution images. It is only when the resolution of the images was scaled down by more than ¾ (75%) of their original size that the ability of the BM classifier to correctly recognise marine debris deteriorated dramatically. Because the findings of this research study were obtained from images with a much higher average resolution (125 pixels per inch) than the reduced-size images that suffered a 75% reduction in quality (30 pixels per inch), the results of the deep-learning technique presented herein appear credible.

Fig. 11

Resolution dependence. This figure presents the total number of correct classifications on validation images when reducing the resolution by 50%, 75%, 87.5%, and 93.75%. Interestingly, the BM image classifier retains its capability even with a 3/4 reduction in resolution

Conclusion

The method expounded in this paper is based on convolutional neural networks (CNNs) that utilise the bottleneck method (BM) to create an image classifier for discerning marine plastics at the shoreline and the sea. The BM image classifier learnt on the training set and was evaluated on the testing set, with its performance assessed on the validation set. Two scenarios were examined in this study to gauge the performance of the technique. The first scenario varied the number of epochs, while the second scenario varied the size of the mini-batch. Results from the two scenarios demonstrated that the BM image classifier of 50 epochs and batch size 5 achieved the highest validation accuracy of 90%.

Arithmetically, the 90% accuracy means that 360 out of the 400 images featuring plastics or objects from another category were correctly identified by the BM. Thus, the BM image classifier formulated on these parameters can recognise and differentiate between plastic debris and other objects, like fish and unrelated bodies found in the marine environment, with a very high level of accuracy, rendering the method reliable. The fact that the machine learning technique proposed herein can distinguish plastic debris from marine life and other unrelated bodies renders this classifier a powerful tool in the fight against marine litter at sea as well as on land. As the number of categories of marine objects grows, one would expect the performance of the BM image classifier to deteriorate. This is not the case here, since the proposed method not only maintains its performance but fares better than it did on the three object classes (plastic bottles, plastic buckets, and plastic straws) investigated before by Kylili et al. (2019).

Underpinning previous results, the resolution dependence investigation revealed that the quality of the images used in the 8 categories of our investigation was sufficiently high and, therefore, lends credibility to the findings. This is reflected by the fact that even when the images retained only 25% of their initial resolution, the BM image classifier was still able to correctly pinpoint marine debris. This is an important observation, as the BM retains its capability to correctly differentiate plastic debris even in inferior quality images. Beyond the broad diversity of objects that the proposed BM image classifier can recognise, it can successfully identify debris irrespective of whether it is encountered in the sea or on the shore. Depending on the scene in which the object appears, the scene could trick the image classifier into erroneously recognising an item. However, as evidenced by the findings presented in this study, the BM classifier has proved very reliable in identifying plastic objects either in the sea or on the coast. Simply put, this performance ranks the BM image classifier as robust and multitasking.

Despite its merits, the BM technique exhibits certain shortcomings. For example, the BM classifier is able to identify marine debris items only from still images. That is, the BM can tag the class of the plastic debris featured in a snapshot image scene. One other limitation that is strongly related to the classification accuracy of the BM is that the images which made up the dataset were manually selected. On the one hand, the manual selection ensures the high correlation of the images with the purpose of the proposed image classifier. Only images with a good representation of plastic debris were among those which made it into the dataset. On the other hand, considering the difficulties of manually selecting the images from various sources, we were constrained by the finite number of available images. Overall, 200 images were collected for each class. With the aid of data augmentation manipulations these images were later expanded to 4000 per class. Moreover, from the plethora of types of plastic debris encountered in the marine environment, the proposed BM image classifier can classify only the eight predominant classes. These comprise six types of plastic debris, one species of marine life, and the “other” class. In the near future, we plan to expand the number of categories that the BM can identify in order for the technique to become even broader in scope. Currently, the proposed approach does not focus on a specific geographical region, but is rather generic in nature. Incorporating geospatial information and temporal details in the output result of the BM classifier is another forthcoming goal. For that to happen, the original camera or video-camera will have to tag the captured images and footage onsite.

Concluding, the BM image classifier proposed in this study constitutes a significant step towards the creation of an intelligent identification system for tracking plastic debris in various settings like the sea and the shoreline. It is worth mentioning that the most common way of sorting marine debris is still manual sorting by humans. Clearly, the method recommended here can automate the process of classifying debris into several categories, giving a high validation accuracy at a moderate computational effort. The fact that the BM image classifier is fast, rigorous, and reliable renders it a highly promising intelligent tool for recognising marine plastics. Ultimately, it is envisioned that this research work will constitute a leap forward towards mapping with more accuracy the abundance of plastic litter in the marine environment, helping humanity craft more efficient strategies for tackling it.

Future work

As already explained, the proposed image classifier was grounded on a CNN technique, that is, the BM classifier. In turn, the BM derives from the VGG16 pre-trained model. Among other future plans, it will be meaningful to test the performance of other CNN methods on the same database. One state-of-the-art CNN method that appears promising is known as You Only Look Once, version 3 (YOLOv3). A preliminary application of the YOLOv3 algorithm (Redmon and Farhadi 2018) to the customised image-set produced results comparable to those of the BM proposed herein. Certain attributes that differentiate YOLOv3 from the BM are the multi-object detection in an image scene as well as the (near) real-time object classification and localisation in video footage featuring marine debris. In further experiments with YOLOv3, it will help to vary its internal parameters. Such changes are expected to generate valuable results regarding the technique’s ability to identify marine debris. Ultimately, we plan to apply YOLOv3 in real-time on videos so as to study the abundance of plastic debris along the shorelines of Cyprus.