1 Introduction

The word corrosion is derived from the Latin word “corrodere”, which means to “wear away” [1]. Corrosion can be described as an irreversible, spontaneous process that degrades the properties of materials (metals and metal alloys) through chemical or electrochemical reactions with the environment. This often leads to reduced component life, safety concerns, and economic and material losses [2, 3]. According to the National Association of Corrosion Engineers (NACE), the global cost of corrosion has been estimated at $2.5 trillion. NACE also estimates that about $600 billion could be saved if appropriate corrosion control practices were implemented [4, 5]. Because of these losses, significant research has gone into methods of corrosion detection, inspection and protection, including novel green inhibitors derived from products such as expired drugs and plastic wastes [6,7,8]. There has been increased interest in applying deep learning techniques to corrosion detection, with Convolutional Neural Networks (CNNs) being the most widely used models [5, 9,10,11]. CNNs are a type of artificial neural network that has been shown to outperform other state-of-the-art models and has found use in image classification [12, 13], object detection and tracking [14, 15], text detection and recognition [16], speech and natural language processing [17], radiology [18], agriculture [19, 20], etc. A major advantage of CNNs over other deep learning methods is that they remove the need for prior knowledge and for the effort of pre-designing classification features [21]. CNNs extract features automatically and efficiently, directly from the images, and hence eliminate the manual feature extraction required by other machine learning models. With the advent of large image datasets, parallel GPU computing and cloud computing, CNNs have been shown to achieve human-level accuracy in image classification and image segmentation tasks [10, 22].

2 Convolutional Neural Networks—A Brief Introduction

To understand how a CNN works, one must first understand how the CNN architecture is set up. The architecture consists of a convolutional base and a dense head. The convolutional base is the part of the network responsible for extracting the necessary features from the image data. It is made up mainly of convolutional layers (the depth and width vary from model to model), usually together with other layers such as dropout, activation and normalization layers [23]. The head section comes immediately after the base. The head is the part of the network responsible for processing the extracted features and performing the required task, such as classification or segmentation [4]. It is made up mostly of dense layers, which assign each image to a class based on probability values [1]. Each of the layers and their tasks and operations are briefly discussed below:

  1. i.

    The Convolution Layer: This is the most important layer of the CNN. In this layer, relevant features are extracted automatically from the image without any manual feature extraction. A convolutional layer consists of a set of filters (kernels) with trainable weights that are convolved with the image to produce a matrix of features. The most common type is the 2D convolution layer, in which a filter moves over the input image in strides and operates on sub-arrays of the input to produce a 2D matrix of features. It is important to note that the smaller the stride, the more features are extracted from the input data, but this also increases the time and computational resources required to carry out the convolution [23, 24]. Every image can be represented as a matrix of numbers; the kernel performs a convolution operation on each section of the image and produces an output feature map, as shown in Fig. 1.

  2. ii.

    The Pooling Layer: The pooling layer reduces the dimensions (size) of the feature maps and retains only the most useful features from the images. In this layer, each sub-array of the data is reduced to a single value using a down-sampling operation. There are two main methods of carrying out this pooling operation, namely mean pooling and max pooling. In mean pooling, the average value of each sub-array is taken as the new value; in max pooling, the maximum value in the sub-array is taken as the new value for that sub-array, as shown in Fig. 2. The pooling operation is carried out to reduce the computational requirements of the CNN [23, 25].

  3. iii.

    The Activation Layer: The activation layer introduces non-linearity into the model. This makes the model capable of fitting curves and non-linear data. The main activation functions used in this layer are the Rectified Linear Unit (ReLU), sigmoid and tanh. This step is essential because real-world data is rarely linear; hence, the model has to be capable of handling non-linear situations [26].

  4. iv.

    Auxiliary Layer: Convolutional neural networks are prone to overfitting the training data; hence, auxiliary layers are added to reduce this risk. One of the most common auxiliary layers is the dropout layer. The dropout layer randomly drops a fraction of the neurons from the model during training, thereby forcing the model to learn more stable patterns [24].

  5. v.

    The Fully Connected Layer: These layers are responsible for mapping the features learned in the base to the associated output. In this layer, the feature maps from the previous layers are converted into a 1D feature vector (flattened), which can be used for classification of the input images and other pertinent tasks [22].
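The activation, dropout and flattening steps described in the list above can be illustrated with a minimal, standard-library Python sketch. It is not taken from any of the reviewed studies. The function names are hypothetical, and the rescaling by 1/(1 - rate) in `dropout` is the common "inverted dropout" convention, which is an assumption here rather than something the cited works specify.

```python
import math
import random


def relu(x):
    """Rectified Linear Unit: negative inputs become zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real number into the interval (-1, 1)."""
    return math.tanh(x)


def dropout(activations, rate, rng, training=True):
    """Inverted dropout (assumed convention): zero each value with
    probability `rate` during training and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is inactive at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def flatten(feature_maps):
    """Convert a list of 2D feature maps into one 1D feature vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


# Illustrative values only.
feature_maps = [[[-1.0, 2.0], [0.5, -3.0]], [[4.0, -0.5], [1.0, 0.0]]]
activated = [[[relu(v) for v in row] for row in fmap] for fmap in feature_maps]
vector = flatten(activated)
dropped = dropout(vector, rate=0.5, rng=random.Random(0))
```

Here `vector` has eight entries (two 2 x 2 maps), and every entry of `dropped` is either zero or twice the corresponding entry of `vector`.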

Fig. 1

Showing the convolutional layers
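As a concrete illustration of the convolution operation and the effect of the stride, the following is a minimal pure-Python sketch. It assumes no padding and a single channel. The 5 x 5 image and the 3 x 3 vertical-edge kernel are made-up values, and `conv2d` is a hypothetical helper, not code from the reviewed studies. With no padding, the output height is (H - k) // stride + 1, and likewise for the width.

```python
def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2D sliding-window product-sum, as used in CNN layers."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            # Multiply the kernel element-wise with the current sub-array and sum.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Illustrative 5 x 5 "image" and a 3 x 3 vertical-edge kernel.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

features_s1 = conv2d(image, kernel, stride=1)  # 3 x 3 feature map
features_s2 = conv2d(image, kernel, stride=2)  # 2 x 2 feature map
```

A stride of 1 yields nine output values while a stride of 2 yields four, which reflects the trade-off between the number of extracted features and the computational cost.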

Fig. 2

The pooling layer
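The two pooling methods can be sketched in a few lines of pure Python. The example assumes non-overlapping 2 x 2 windows; the feature-map values are invented for illustration and `pool2d` is a hypothetical helper name.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D feature map using non-overlapping size x size windows."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    out = []
    for top in range(0, len(feature_map) - size + 1, size):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out


# Illustrative 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

max_pooled = pool2d(feature_map, mode="max")    # [[7, 8], [9, 6]]
mean_pooled = pool2d(feature_map, mode="mean")  # [[4.0, 5.0], [4.5, 3.0]]
```

In both cases the 4 x 4 input is reduced to a 2 x 2 output, so later layers process a quarter of the values.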

To carry out classification tasks using CNNs, the model is trained by feeding it labelled image data. The model extracts valuable features (shapes, lines, contours, edges, etc.) from the image in the convolutional layers, and these features are then passed to the dense layers for classification. The dense layer assigns each image to a class based on probability values [27]. The structure of the CNN layers used to study corrosion is shown in Fig. 3.

Fig. 3

Structure of a CNN
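The final step, in which a dense layer assigns an image to a class based on probability values, can be sketched as a single dense layer followed by a softmax. This is a minimal illustration only. The feature vector, weights, biases and class names below are hypothetical values chosen for the example, not parameters from any trained model.

```python
import math


def dense(features, weights, biases):
    """One fully connected layer: a weighted sum of the features per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values for illustration.
CLASSES = ["non-corroded", "corroded"]
features = [0.2, 1.5, 0.7]          # flattened output of the convolutional base
weights = [[0.5, -1.0, 0.1],        # one row of weights per class
           [-0.3, 1.2, 0.4]]
biases = [0.1, -0.2]

probs = softmax(dense(features, weights, biases))
predicted = CLASSES[probs.index(max(probs))]
```

The predicted class is simply the one with the highest probability in `probs`.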

3 Applications of Convolutional Neural Networks to Corrosion Studies

According to Ahuja et al. [28], the processes involved in using a CNN to study corrosion are:

  1. i.

    Image acquisition: Images are acquired using drones, smartphone cameras and microscopes, among other devices [29,30,31].

  2. ii.

    Image pre-processing: This is the process of cleaning and preparing the images that will be used to build the deep learning model. It involves a series of steps such as resizing, image augmentation and data labelling [9].

  3. iii.

    Feature extraction: In this stage, the required features are extracted from the images in the convolutional base. These features contain valuable information from the image data.

  4. iv.

    Classification and analysis: This is the final stage of the process. In this phase, the features have been extracted and each image is assigned to a class based on those features. This phase could also include other corrosion analysis operations.
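To make the pre-processing step in the list above more concrete, the following is a minimal sketch of resizing and pixel scaling in pure Python. It assumes a single-channel image stored as a nested list with 8-bit pixel values. Nearest-neighbour interpolation is used here only because it is the simplest option to show; the reviewed studies do not state which resizing method they used, and the helper names are hypothetical.

```python
def resize_nearest(img, new_h, new_w):
    """Resize a 2D image by nearest-neighbour sampling (illustrative choice)."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalize(img, max_value=255):
    """Scale pixel values into the range [0, 1]."""
    return [[p / max_value for p in row] for row in img]


# Illustrative 4 x 4 grayscale image with 8-bit pixel values.
raw = [
    [0, 50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
    [255, 80, 130, 180],
]

resized = resize_nearest(raw, 2, 2)   # [[0, 100], [20, 120]]
prepared = normalize(resized)
```

Resizing gives every image the fixed input size that a CNN expects, and scaling keeps the pixel values in a small numerical range.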

CNNs have been used to study corrosion in the following domains:

3.1 Aerospace

The most common use of convolutional neural networks in corrosion studies is for classification purposes. Classification involves the use of a CNN model to distinguish between corroded and non-corroded samples using image data. Brandoli et al. [5] used 210 images to build a CNN model that distinguished between corroded and non-corroded images of different parts of an aircraft. The CNN model built in this study had an accuracy of 92.2%. Zuchniak et al. [32] used 13,000 fuselage images obtained with the D-Sight Aircraft Inspection System (DAIS) to build a CNN model. They aggregated the results of multiple teacher models and used them to build a student model, which was employed to detect corroded spots on aircraft fuselages and rivets.

3.2 Marine and Structures

Cha et al. [33] used a region-based CNN architecture to study corrosion and other structural defects, as shown in Fig. 4. In their study, 2366 images were used, and precisions of 83.4%, 82.1% and 98.1% were obtained for the medium steel corrosion, high steel corrosion and bolt corrosion categories, respectively. In another study, Rahim et al. [31] obtained 146,688 segmented images using digital cameras. These were used to build a custom CNN model that classified images into corrosion classes and was stacked on a Mask R-CNN model to determine the nature of the structure. The model performed well, with an accuracy of 93%. The study by Andersen et al. [34] also used a recursive algorithm to localize the exact regions at which corrosion occurred. This study stacked a ResNet50 model for corrosion classification on a Mask R-CNN to localize the exact points at which corrosion occurred. They used 1314 images (comprising 820 corroded images and 494 non-corroded images) captured by a drone. Yao et al. [24] built a CNN model to study hull structural plate corrosion using 330 images taken with a digital camera. An overlap-scanning sliding window algorithm combined with the AlexNet for HCDR network model was used to improve the recognition accuracy of corrosion areas at sliding window boundaries. Holm et al. [35] used 9300 bridge images in their classification study of corrosion and coating damage on bridges. AlexNet, VGG16, ResNet and GoogLeNet were used in the study. Of the CNNs trained, VGG16 had the best performance, with average recall, precision, accuracy and F1 score of 95.45%, 95.61%, 97.74% and 95.53%, respectively.

Fig. 4

CNN Model for detecting corrosion damage types [33]
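Since the studies in this section report performance as precision, recall, accuracy and F1 score, the short sketch below shows how these quantities follow from the counts of a binary confusion matrix, using the standard definitions. The counts are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for a corroded / non-corroded test set of 200 images:
# 80 corroded images found, 10 false alarms, 20 missed, 90 correct rejections.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

Precision penalises false alarms and recall penalises missed corrosion, so a model can score well on accuracy while one of the two remains low. This is one reason to report all four values together.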

3.3 Oil and Gas

Bastian et al. [36] used 140,000 images from water and oil pipeline videos to create a CNN classifier that was able to distinguish between four different levels of corrosion (no corrosion, low-level corrosion, medium-level corrosion and high-level corrosion). The authors built a custom CNN architecture for image classification rather than using publicly available image classification CNN architectures. A recursive region-based algorithm was used to determine the exact points at which corrosion occurred. Ejimuda and Ejimuda [9] used a CNN to improve corrosion risk management for oil and gas facilities. In this study, 36 galvanic and pitting corrosion images were scraped from the internet, and image augmentation was applied to increase the size of the dataset. The study used a Faster R-CNN with a ResNet50 model trained on the MS COCO dataset. Bhowmik [37] built a CNN model using 4000 offshore pipeline inspection images taken from the video frames of a Remotely Operated Underwater Vehicle (ROV). These studies demonstrate the power of CNNs to improve offshore asset management at minimal cost. Similar work was carried out by Soares et al. [38], who used a CNN model to classify underwater images of oil and gas infrastructure taken by an ROV into four corrosion levels (high, medium, low and no corrosion).

3.4 Others

Petricca et al. [39] used a deep learning model to classify images as corroded or not corroded. The deep learning model built in this study had a confidence level of 80%. Similar localization algorithms have been used by other researchers to determine the exact regions where corrosion occurred [1, 33, 40]. Yu et al. [41] demonstrated how CNNs could improve the corrosion detection capabilities of Micro Aerial Vehicles (MAVs) using the YOLOv3-tiny network. Field images labelled with four types of corrosion (nubby corrosion, bar corrosion, exfoliation and fastener corrosion) were used for training and testing the model. The model had a mean average precision of 84.96% when compared with existing detectors. In some cases, a single CNN model might not produce the best results for the task at hand, and more than one model might be needed. A machine learning model that combines two or more models to make a decision is called an ensemble model [42]. The study by Xu et al. [43] compared the performance of an ensemble model with that of single-stack models. The ensemble models performed better in all corrosion recognition tasks. In another study, Idusuyi et al. [44] used a CNN to classify corrosion on mild steel samples in a laboratory setting using images from a digital camera and a mobile phone. The study showed that CNN corrosion classifiers had accuracies above 80%. Ta and Kim [11] used a regional CNN to monitor corroded bolts on steel structures. In their study, the model could distinguish rusted bolts from those without rust at light intensities greater than 63 lx. The classification accuracies in the studies presented above show that using convolutional neural networks to study and identify corrosion in metal components is both viable and promising.
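The idea of an ensemble model, in which two or more models contribute to one decision, can be sketched with two common combination rules: averaging the predicted probabilities (soft voting) and taking the majority label (hard voting). This is a generic illustration and is not the method of Xu et al. [43]. The three probability vectors stand in for the outputs of three hypothetical CNNs on a single image.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average the class probabilities predicted by several models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[k] for p in prob_lists) / n_models for k in range(n_classes)]


def hard_vote(labels):
    """Return the label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


CLASSES = ["not-corroded", "corroded"]

# Hypothetical outputs of three CNNs for the same image.
model_probs = [
    [0.60, 0.40],
    [0.30, 0.70],
    [0.20, 0.80],
]

avg_probs = soft_vote(model_probs)
soft_label = CLASSES[avg_probs.index(max(avg_probs))]

per_model_labels = [CLASSES[p.index(max(p))] for p in model_probs]
hard_label = hard_vote(per_model_labels)
```

In this example one model votes "not-corroded" but is outvoted under both rules, which is the sense in which an ensemble can be more robust than any single model.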

3.5 Corrosion Training Dataset for CNN

From the studies reviewed, it is evident that data acquisition (in terms of quality and quantity) is vital to building an accurate CNN corrosion model. The general rule of thumb is that more data is better: the more data samples used to create a model, the better its generalization ability. In a study by Ejimuda and Ejimuda [9], only 36 images were used to create a model to classify corroded and non-corroded steel. The model performed well on the test set, with an accuracy of 83.3%, but it did not generalize well when applied to other types of corrosion defects. This poor generalization was attributed to the dataset being too small to create a robust model that could handle a multitude of use cases. It is also important to note that as the size of the dataset increases, the time required to train the model, the computational resources and the cost also increase. This additional cost could be a drawback to using CNN methods in resource-limited settings. Low-quality data acts as noise, and the CNN model may learn patterns based on these noise attributes. For this reason, the image capture process is sensitive and should be carried out using appropriate devices and under the right environmental conditions [33, 45, 46]. In cases where the corrosion data is insufficient, data augmentation techniques such as rotating, mirroring, sampling and shifting are used to generate new data from the available corrosion data without altering the original data. This, again, can increase the computational cost when the dataset is large [13]. Using CNNs for corrosion modelling can be quite challenging because scales and patterns on the corrosion image can be hard to distinguish from corrosion products. This would require high-resolution image capture, additional expertise and computational skill.
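The augmentation operations mentioned above (rotating, mirroring and shifting) can be sketched on a small single-channel image stored as a nested list. This is a minimal illustration with hypothetical helper names. Every function returns a new image, so the original data is left unaltered, as the text requires.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def shift_right(img, pixels, fill=0):
    """Shift the image to the right, padding the left edge with `fill`."""
    return [[fill] * pixels + row[: len(row) - pixels] for row in img]


def augment(img):
    """Return six new variants of one image; the input is not modified."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [
        flip_horizontal(img),
        flip_vertical(img),
        r90,
        r180,
        r270,
        shift_right(img, 1),
    ]


# Illustrative 3 x 3 image.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
augmented = augment(image)
```

Each original image yields six additional samples in this sketch, so the dataset grows by a fixed factor, and training time grows with it.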

4 Transfer Learning and the Effect on the Accuracy of Corrosion CNN Models

Convolutional neural networks are generally designed to solve a specific task; such models are built from the ground up using large datasets and substantial computational resources, which might not always be available. Classification CNN models also require labelled data, which can be expensive. This problem can be addressed with transfer learning, where the knowledge gained from one task is transferred and reused in another task [47]. The use of transfer learning for CNN corrosion applications has been made possible by the availability of large annotated image datasets such as the ImageNet dataset [48] and the MS COCO dataset [9]. The ImageNet dataset is a large dataset that contains over 1.2 million images in about 1000 classes. This large collection of labelled data has been used to build several CNN models, which serve as the backbone for many image classification and object detection problems [49]. Some of these models have been made readily available and can be modified to suit the needs of a specific domain. AlexNet and GoogLeNet are some of the most popular CNN models built on the ImageNet dataset [35]. The transfer learning approach is very attractive in the field of CNN modelling because of the high requirements for building a solid CNN model from scratch. It is, however, important to understand the rudiments of the transfer learning method and when to use it for a given task [50]. The source domain and the target domain must be sufficiently related; otherwise, little to no useful information will be transferred, which leads to the problem of negative transfer [47]. A diagrammatic representation of transfer learning is shown in Fig. 5.

Fig. 5

Diagrammatic representation of the transfer learning process

One of the most common methods of applying transfer learning to image classification is the freeze method [1]. In the freeze method, the convolutional base of the CNN is kept unmodified and only the head (classifier) is trained on the new data, as shown in Fig. 6.

Fig. 6

Freeze transfer process
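The mechanics of the freeze method can be shown with a deliberately small, standard-library toy model. This is a sketch under strong assumptions. The "pre-trained" base is a fixed two-feature linear map with ReLU whose weights are hypothetical, the head is a single logistic unit, and the four training samples are invented. In practice, the base would be a pre-trained network such as ResNet50 loaded through a deep learning framework. The point of the sketch is only that gradient updates touch the head parameters and never the base.

```python
import math

# Hypothetical "pre-trained" base weights: 3 inputs -> 2 features.
BASE_WEIGHTS = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]


def frozen_base(x):
    """Feature extractor; its weights are read but never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BASE_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def head(features, w, b):
    """Trainable classifier head: a single logistic unit."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features)) + b)


def log_loss(feats, labels, w, b, eps=1e-12):
    """Mean binary cross-entropy of the head on pre-computed features."""
    total = 0.0
    for f, y in zip(feats, labels):
        p = min(max(head(f, w, b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train_head(feats, labels, epochs=200, lr=0.5):
    """Stochastic gradient descent on the head parameters only."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            err = head(f, w, b) - y  # gradient of the loss w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


# Invented target-task data: label 1 = corroded, 0 = non-corroded.
samples = [[2, 0, 0], [0, 2, 0], [1, 0, 0], [0, 1, 1]]
labels = [0, 1, 0, 1]

base_before = [row[:] for row in BASE_WEIGHTS]
feats = [frozen_base(x) for x in samples]  # the base is used for inference only
loss_before = log_loss(feats, labels, [0.0, 0.0], 0.0)
w, b = train_head(feats, labels)
loss_after = log_loss(feats, labels, w, b)
predictions = [1 if head(f, w, b) > 0.5 else 0 for f in feats]
```

Because the base is frozen, its features can be computed once and reused in every epoch, which is part of what makes this approach inexpensive when only a small dataset is available.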

The freeze approach is very useful in situations where limited data are available. Ejimuda and Ejimuda [9] used the freeze approach to create a corrosion classifier. In this study, only 36 images were available; using data augmentation techniques such as flipping and rotating the images through various angles, the number of images was increased to 336. The study used a transfer-learned ResNet model trained on the MS COCO dataset as the basis for the model, and an accuracy of 83.3% was achieved. Holm et al. [35] also used the freeze transfer learning approach to create a corrosion classifier. In this study, the base of the CNN model was not altered, but some layers of the head had to be dropped to solve the target task. One of the pre-built models used in this study was AlexNet. In another study, by Matthaiou and Papalambrou [1], the convolutional base of ResNet50 was frozen and the head was replaced with two dense layers to fit the required task. The study by Bastian et al. [36] compared the transfer learning approach with building a custom CNN for the task of classifying corrosion into four different classes. A total of 1,200,000 images were created using augmentation techniques. In this study, ZFNet and VGGNet were the pre-trained networks on which transfer learning was applied. To use transfer learning effectively, the domains must be sufficiently related so that negative transfer does not occur and degrade the model [47].

5 Future Work

Although CNNs have been used successfully to model corrosion for different types of infrastructure, there is still a dearth of information on applying CNNs to structures such as LPG tanks, crude storage facilities and gas gathering stations. Designing and implementing remote data collection for these facilities would aid the development of reliable CNN models useful for maintenance and planning.

6 Conclusion

From the review presented, the following conclusions can be drawn:

  1. i.

    Convolutional Neural Networks can be used to detect corrosion by following these steps: image acquisition, image pre-processing, feature extraction, and classification and analysis.

  2. ii.

    Convolutional Neural Networks combined with recursive algorithms can be used to pinpoint the exact locations where corrosion occurs.

  3. iii.

    High-quality data in sufficient quantity are needed to build reliable Convolutional Neural Network models.

  4. iv.

    CNN models using the freeze transfer learning approach are useful when limited data are available.

  5. v.

    Convolutional networks have shown promising applications for corrosion classification purposes with accuracies above 80%.