Introduction

Engineering structures such as buildings, bridges, and towers are often subjected to cyclic loading, which produces fatigue stress and leads to cracks that usually initiate at the microscopic level on the structure’s surface (Mohan & Poobal, 2018). The conventional human-based crack detection method relies on trained inspectors, who find cracks on the surface of a concrete structure based on their expertise and years of experience (Li & Zhao, 2019). For concrete bridges to remain structurally sound and reliable, it is essential to detect cracks on bridge decks (Prasanna et al., 2014). Cracks in reinforced concrete significantly reduce its lifespan because they provide an accessible channel for aggressive chemicals to reach the reinforcement and initiate corrosion. Therefore, crack identification is an important part of damage evaluation (Silva et al., 2018). Computer vision techniques have proven effective in automated image-based crack detection because they are non-contact, precise, and capable of good real-time performance. Their use has been an active research topic in recent decades (Soni 2019).

Deep learning has attracted much interest in recent years. The development of deep learning architectures, which offer higher levels of representation, was made possible by rising computing capability (Arel et al., 2010). Kaveh & A Khalegi (1998) used artificial neural networks to predict the strength of concrete for different types of concrete mixtures. Kaveh and Servati (2001) employed a backpropagation algorithm to evaluate the maximum deflection, weight, and design of double-layer grids. Iranmanesh and Kaveh (1999) employed gradient-based neural networks for structural optimization. Kaveh et al. (2001) studied the prediction of moment-rotation characteristics for semi-rigid connections using BP neural networks. Kaveh et al. (2001) used neural networks and genetic algorithms for the optimal design of transmission towers. Rofooei et al. (2011) estimated the vulnerability of concrete moment-resisting frame structures using artificial neural networks.

The convolutional neural network (CNN) is a neural network architecture specifically designed for image recognition (LeCun et al., 1998). The MNIST handwritten digit identification benchmark (Ciregan et al., 2012), the Google Street View house number (SVHN) data set (Goodfellow et al., 2013), and the German traffic sign recognition benchmark (Cireşan et al., 2011) are only a few of the benchmark problems on which outstanding CNN performance has been reported. CNNs are multilayer neural network architectures that employ convolutional layers for local receptive fields and pooling layers for invariance with respect to small geometric deformations.

Convolutional neural networks have shown a strong capability for feature extraction and target detection and have been used in structural health monitoring (SHM) (Oliveira et al., 2012). CNN, in general, is a powerful and accurate method of solving classification problems (Avendaño 2020), whereas other NNs are often used for regression problems. Figure 1 shows the relationship between artificial intelligence (AI), machine learning (ML), deep learning (DL), and the convolutional neural network (CNN).

Fig. 1 AI, ML, DL and CNN

Digital image processing covers a broad spectrum of applications, such as remote sensing via satellites and other spacecraft, image transmission and storage for business applications, automated industrial inspection, medical image processing, radar and acoustic image processing, and robotics. A digital image is composed of a finite number of elements, each of which has a particular value at a particular location (Munawar et al., 2021). These elements are referred to as picture elements, image elements, or pixels; pixel is the term most widely used. Image processing provides accurate results compared to the conventional manual method (Feng et al., 2017). Image classification and segmentation are applied to object detection, localization, and recognition tasks in many applications, such as face recognition. Civil structure damage detection is also one of the most important research areas of image classification and segmentation (Ali et al., 2022). Many methods have been established based on image processing techniques, including edge detection, Hough transform, image segmentation, identification and detection of feature points, the digital image correlation (DIC) method, and photogrammetry (Hamishebahar et al., 2022). Therefore, in the present study, CNNs are used for crack detection, whereas crack quantification is performed by an image processing technique.

Organization

The outline of this paper is as follows: Sect. 1 introduces the significance of DL in crack detection. Section 2 gives an overview of related work in crack detection. Section 3 elaborates on structural health monitoring and the methodology adopted for crack detection. Sect. 3.1 discusses crack detection using CNN architecture, Sect. 3.1.1 discusses details of the dataset used, and Sect. 3.1.2 discusses the VGG16 model in CNN. Section 3.2 provides information regarding cracks, Sect. 3.2.1 explains crack width measurement, and Sect. 3.2.2 discusses the image processing technique and width measurement. Sect. 4 provides the results and discussion, and Sect. 5 provides the conclusion and future scope.

Related work

Yuan et al. (2021) proposed a framework based on a convolutional neural network (CNN) and digital image processing to monitor crack propagation length. A crack tip-detection algorithm was established to locate the crack tip accurately and was used to calculate the length of the crack. Silva and Lucena (2018) presented a review of CNN implementation for civil structure crack detection. Dong et al. (2020) established a CNN model to accurately identify microseismic events and blasts. Dais et al. (2021) examined deep learning techniques for crack detection on images of masonry walls; theirs is the first implementation of deep learning for pixel-level crack segmentation on masonry surfaces. Mohammed (2021) developed three open-source CNN models, trained and tested them for crack detection using 40,000 images, and subsequently assessed their performance. Yiyang (2014) proposed a crack detection algorithm based on digital image processing technology, obtaining information about the crack image through pre-processing, image segmentation, and feature extraction.

Alazani et al. (2021) used digital image processing (DIP) to perform crack detection and to measure geometry such as angle, length, and width with a MATLAB code. Aravind et al. (2021) employed image processing and failure pattern recognition techniques with suitable machine learning algorithms and validated the techniques using Python. The images were taken from basalt fiber-reinforced polymer/glass fiber-reinforced polymer and steel bar beams subjected to a four-point static bending test. They employed six machine learning classifiers, and the failures in the structures were classified into three classes: flexure, shear, and compression. Among the six classifiers used, the support vector classifier gave the best performance, with 100% accuracy in identifying the failure patterns.

Many researchers have used CNNs for feature extraction and target detection in structural health monitoring. The present technique combines target detection with the quantification of a given crack. Because crack identification alone is insufficient for most structures, the combined system is intended to monitor the structural health of civil infrastructure more effectively. In the present work, crack detection is done using a deep CNN with 40,000 images, and crack quantification is subsequently performed by image processing. The measured crack width is validated against 280 manually collected images measured with a crack microscope.

Methodology

SHM (structural health monitoring) is a routine inspection procedure used to track the performance and qualities of civil structures. Its primary purpose is to assess changes in the civil structure, develop a maintenance strategy, and take appropriate action in response to structural anomalies. There are numerous methods for monitoring structural health. Damage is a physical disturbance that affects and weakens the features of the structure, and a crack is typically seen as a flaw capable of causing considerable delamination and further consequences. Cracks are one of the factors that impact the structural condition of highways, subways, bridges, buildings, dams, tunnels, monuments, and other structures (Li & Zhao, 2019).

This paper uses advanced computational techniques, namely deep convolutional neural networks, to detect cracks. In addition to crack identification, the width of the crack is measured using image processing and images taken with a crack microscope. These photos contain cracks and crack-like noise, which is minimized using CNN techniques implemented in Python. A CNN was trained on a large dataset of labeled crack images to achieve reliable crack identification and was then used to determine the crack region. The step-by-step procedure of crack detection by CNN is shown in Fig. 2. The architecture of the proposed model is described in Sect. 3, together with the procedure for preparing the training datasets and the crack area detection results. Following that, image processing is used to determine the crack width.

Fig. 2 The methodology adopted for crack detection using DCNN

Crack detection using deep convolutional neural networks

The convolutional neural network (CNN), invented in the 1980s, is one of the most established, advanced, and commonly used deep learning (DL) techniques. The concept gained a foothold as the processing power of machines and their capacity for data storage and retrieval improved. CNNs were later applied successfully as classifiers and surpassed earlier methods in computer vision applications. They are feed-forward networks with numerous convolutional, pooling, and fully connected layers that require large amounts of data to train. In computer vision, a CNN identifies features by combining values from the image pixels. CNN, in general, is a powerful and accurate method of solving classification problems. Where datasets are limited and image inputs are not required, ANNs remain dominant. A limitation of CNNs is that they require parameter tuning.

CNN architecture

The CNN architecture comprises convolution, pooling, and fully connected layers. The CNN architecture is shown in Fig. 3.

  (i) Convolution layer. The objective of the convolutional layers is to recognize and examine the features of the input. They distinguish between low-level features such as edges and lines and high-level features such as objects and shapes. The first layers extract edges and patterns, which are low-level features.

  (ii) Pooling layer. The pooling layer, sandwiched between two convolutional layers, functions as a funnel that reduces the amount of data passed on. The middle layers extract features such as object shape and color, whereas the higher layers extract features such as full objects. There may be numerous such layers, depending on how sophisticated the system needs to be.

  (iii) Fully connected layer. The fully connected layers are usually the last few layers and are used to summarize information. Their purpose is to produce the classification result from the output of the pooling/convolutional layers. The feature extractor's final output is fed into a fully connected neural network for classification, or into a bounding box and pixel classification layer for segmentation.
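The convolution and pooling operations described above can be illustrated with a minimal pure-Python sketch. The 6 × 6 patch, the hand-written vertical-edge kernel, and the function names `conv2d`, `relu`, and `max_pool2d` are assumptions made for illustration; in a trained CNN the kernels are learned from data, and this is not the implementation used in the study.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_rows = len(feature_map) // size
    out_cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Hypothetical 6 x 6 grayscale patch: dark on the left, bright on the right.
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-written vertical-edge kernel; a trained CNN learns its kernels from data.
KERNEL = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = relu(conv2d(IMAGE, KERNEL))   # 4 x 4 feature map
pooled = max_pool2d(features)            # 2 x 2 map after pooling
print(features[0])  # [0, 3, 3, 0]
print(pooled)       # [[3, 3], [3, 3]]
```

The strong responses sit where the dark-to-bright edge lies, and pooling keeps that response while shrinking the map from 4 × 4 to 2 × 2.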

Fig. 3 CNN architecture

Building the dataset

CNN is a data-driven methodology that requires a large amount of data to produce the desired results. Before training the CNN model, pre-built images of cracks on concrete surfaces, ranging from large fissures to minor cracks, were retrieved from an online source. This dataset contains 40,000 images, comprising 20,000 crack images as shown in Fig. 4 and 20,000 non-crack images as shown in Fig. 5, each measuring 227 \(\times\) 227 pixels. The procedure is split into three parts: the training step, the validation step, and the test step. The images were separated into two groups, positive (crack images) and negative (non-crack images), and divided into 80% for training and 20% for validation.
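The 80%/20% division of the 40,000 labeled images can be sketched as follows. This is a minimal illustration, not the authors' code: the file names, the label encoding (1 for crack, 0 for non-crack), the random seed, and the function name `split_dataset` are all hypothetical.

```python
import random


def split_dataset(n_crack=20000, n_non_crack=20000, train_fraction=0.8, seed=0):
    """Shuffle labeled samples and split them into training and validation sets."""
    # Hypothetical file names; label 1 = positive (crack), 0 = negative (non-crack).
    samples = [(f"crack_{i:05d}.jpg", 1) for i in range(n_crack)]
    samples += [(f"non_crack_{i:05d}.jpg", 0) for i in range(n_non_crack)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(samples)
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


train_set, val_set = split_dataset()
print(len(train_set), len(val_set))  # 32000 8000
```

Shuffling before splitting keeps both classes represented in each subset; a stratified split would additionally guarantee an exact 50/50 class balance in each subset.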

Fig. 4 Sample images of 20,000 crack images (positive)

Fig. 5 Sample images of 20,000 non-crack images (negative)

Training the model

During training, batches of data pass through the model, from input to output through the hidden layers, according to a set batch size. When all the batches of the training set have gone through the model, the model is said to have completed an epoch (Mohammed, 2021). The model then provides an output in the form of a probability, which represents the likelihood of the image belonging to the "positive" labeled group or the "negative" labeled group. The error between the prediction and the true label is the loss. A loss function is used to evaluate the loss over all the elements.
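The notions of batch, epoch, and loss can be made concrete with a short sketch. The text does not name the loss function or the batch size, so binary cross-entropy and a batch size of 32 are assumptions here, as are the function names; 32,000 is 80% of the 40,000-image dataset.

```python
import math


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean loss between predicted crack probabilities and true labels (1 or 0)."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed for every training sample to pass through once."""
    return math.ceil(n_samples / batch_size)


# Assumed batch size of 32 on the 32,000 training images (80% of 40,000).
print(batches_per_epoch(32000, 32))  # 1000

# Confident correct predictions give a lower loss than confident wrong ones.
good = binary_cross_entropy([0.9, 0.1], [1, 0])
bad = binary_cross_entropy([0.1, 0.9], [1, 0])
print(good < bad)  # True
```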

Validation step

The validation step uses different data (validation set) from that used to train the model and performs a similar action as in training. The model creates predictions on the data and compares the results to the actual labels. The validation step results give metrics for the network's loss and accuracy after each epoch to evaluate how the model is progressing.

The images from the structure were captured using a phone camera and subjected to image pre-processing and further analysis. About 280 photos were taken manually using a crack microscope to measure crack width and were resized to 227 \(\times\) 227 pixels.

VGG16 architecture

Because of its uniform architecture, VGG16 is widely regarded as one of the best-performing CNN models for image processing. VGGNet-16 is a CNN model that consists of 16 layers in total, as shown in Fig. 6, including 13 convolutional layers and three fully connected layers. The image fed into the network has dimensions 227 × 227 × 3. The first two layers are convolution layers with 64 filters of size 3 × 3 and same padding, followed by a max pooling layer of stride (2, 2). Next come two convolution layers with 128 filters of size (3, 3), followed by another max pooling layer of stride (2, 2). These are followed by three convolution layers with 256 filters of size (3, 3) and a max pooling layer. Following these, there are two sets of three convolution layers and a max pooling layer; each convolution layer has 512 filters of size (3, 3) with same padding. The resulting feature maps are then passed to the fully connected layers.
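The layer arrangement can be checked with a small sketch that traces the feature-map size through the network. It assumes the standard published VGG16 convolutional configuration (same-padded 3 × 3 convolutions and 2 × 2 max pooling with stride 2), applied to the 227 × 227 × 3 input stated above; the names `VGG16_CONFIG` and `trace_feature_maps` are hypothetical, and the study's exact implementation may differ.

```python
# Standard VGG16 convolutional configuration (assumed): integers are the number
# of 3 x 3 filters in a convolution layer, "M" marks a 2 x 2 max pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def trace_feature_maps(input_size=227, in_channels=3, config=VGG16_CONFIG):
    """Return the (height, width, channels) shape after each layer in config."""
    size, channels = input_size, in_channels
    shapes = []
    for layer in config:
        if layer == "M":
            size //= 2          # stride-2 pooling halves the size (rounded down)
        else:
            channels = layer    # same padding keeps the spatial size unchanged
        shapes.append((size, size, channels))
    return shapes


shapes = trace_feature_maps()
n_conv = sum(1 for layer in VGG16_CONFIG if layer != "M")
print(n_conv)      # 13 convolutional layers
print(shapes[-1])  # (7, 7, 512)
```

Together with the three fully connected layers, the 13 convolutional layers give the 16 weight layers that the model is named after.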

Fig. 6 VGG16 architecture

Cracks

Cracks are small dark lines that emerge on the surface of a solid material where it has split but has not separated. Material shrinkage and expansion, shifting foundations, premature drying, overloading, hydrostatic pressure, uneven blend, swelling soil, inadequate soil bearing, creep damage, settlement, and farming can all cause cracks on a concrete surface. Fatigue cracks, block cracks, edge cracks, longitudinal cracks, transverse cracks, and reflection cracks are the most common types of cracks. Cracks may be visible to the naked eye as simple flaws; however, a tiny crack might grow into a larger and more dangerous problem. Because of the complicated texture pattern in the background, some cracks are difficult to see. As shown in Table 1, different types of cracks have different widths; hairline cracks, for example, have a width of 0.1 mm and can be spotted on a clean background but are hard to observe under lighting variation. The risk groups are classified according to the width conditions of the IS 456:2000 code: if the width exceeds 0.3 mm, maintenance work is begun; if the width is between 0.2 mm and 0.3 mm, the trend is monitored; and cracks less than 0.2 mm wide are deemed low risk.

Table 1 Classification of cracks based on width IS:456-2000
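The width thresholds stated in the text can be expressed as a small decision rule. This is a sketch only: the function name and the returned labels are hypothetical, and assigning the boundary values 0.2 mm and 0.3 mm to the monitoring group is an assumption, since the text does not say how exact boundary values are treated.

```python
def classify_crack_risk(width_mm):
    """Map a crack width in millimetres to a risk group using the stated thresholds."""
    if width_mm < 0:
        raise ValueError("crack width cannot be negative")
    if width_mm > 0.3:
        return "maintenance required"   # width exceeds 0.3 mm
    if width_mm >= 0.2:
        return "monitor trend"          # width between 0.2 mm and 0.3 mm (assumed inclusive)
    return "low risk"                   # width below 0.2 mm


for width in (0.1, 0.25, 0.35):
    print(width, classify_crack_risk(width))
```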

Manual crack width measurement

Observing crack width changes is one of the techniques used to monitor structural damage due to ground movements, and several instruments are used for this purpose. A crack microscope is an instrument for measuring crack widths in concrete and other materials. It is a simple ×50 microscope with internal illumination and a calibrated graticule (scale). When the instrument is focused on the concrete surface, the graticule is seen through the eyepiece and appears to lie on the crack, allowing the user to determine the crack's width quickly and precisely by counting the graticule divisions that span the crack and calculating the value. Figure 7 shows the images collected for measuring crack width using the crack microscope.

Fig. 7 Images collected using a crack microscope

Image processing technique used for crack detection and width measurement

The basic architecture for crack detection via image processing presented in this section is represented in Fig. 8. The main advantage of image-based crack detection is that it offers more accurate results than traditional manual approaches.

Fig. 8 Sequence of steps in image processing

The difficulty of crack detection processing depends strongly on the image size. The steps in the image processing technique are as follows: (1) High-resolution images of the target component are obtained using a camera or other imaging tool. (2) The images are then pre-processed using filters, segmentation, and other techniques to remove noise and shadows. (3) The image is converted to grayscale or binary form if the specific crack detection method requires it. (4) The crack detection procedure is applied to the resulting image, using image processing techniques such as edge detection, segmentation, or pixel analysis to highlight or segment the cracked part of the image. (5) Parameter estimation calculates specific properties of the detected crack, such as its width. Such measures help in making decisions regarding the severity of a crack.

Crack width measurement and classification by severity

Image binarization. Figure 9 shows the results of image binarization, which divides the image into black and white pixels to separate crack from non-crack regions.
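A minimal sketch of binarization on a grayscale patch is given below. The fixed threshold of 128, the sample intensity values, and the function name `binarize` are assumptions for illustration; the text does not state which thresholding rule was used, and adaptive methods are common in practice.

```python
def binarize(gray, threshold=128):
    """Return a mask with 1 for dark (crack) pixels and 0 for bright background."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Hypothetical 3 x 4 grayscale patch (0 = black, 255 = white) with a dark crack.
GRAY_PATCH = [
    [200, 190, 40, 210],
    [205, 35, 45, 198],
    [199, 30, 202, 207],
]

mask = binarize(GRAY_PATCH)
for row in mask:
    print(row)
# [0, 0, 1, 0]
# [0, 1, 1, 0]
# [0, 1, 0, 0]
```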

Fig. 9 The results of image binarization

Skeletonization. Figure 10 shows the results of skeletonization, which extracts the central skeleton of the crack so that the crack’s progression is visible. By drawing a line perpendicular to the crack’s direction of propagation at a pixel on the skeleton, the crack’s width can be determined.

Fig. 10 The results of skeletonization

Edge detection. Edge detection extracts the outline of the crack. At each skeleton pixel, a line perpendicular to the crack propagation direction is drawn, and its intersection with the crack outline is used to find the crack width.
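The width measurement can be sketched in a simplified form. The sketch assumes a roughly vertical crack, so that each image row is a scan line perpendicular to the propagation direction and the width in that row is the longest run of crack pixels; a general implementation would use the local skeleton orientation instead. The binary mask, the scale of 0.05 mm per pixel, and the function names are hypothetical.

```python
def row_widths_px(mask):
    """For each row crossing the crack, return the longest run of crack pixels (value 1)."""
    widths = []
    for row in mask:
        best = run = 0
        for value in row:
            run = run + 1 if value else 0
            best = max(best, run)
        if best:                      # skip rows that contain no crack pixels
            widths.append(best)
    return widths


def crack_width_mm(mask, mm_per_pixel):
    """Convert pixel widths to millimetres; returns None when no crack is present."""
    widths = row_widths_px(mask)
    if not widths:
        return None
    mean_px = sum(widths) / len(widths)
    return {"mean_mm": mean_px * mm_per_pixel, "max_mm": max(widths) * mm_per_pixel}


# Hypothetical binary mask of a near-vertical crack (1 = crack pixel).
CRACK_MASK = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(row_widths_px(CRACK_MASK))                       # [2, 3, 2]
result = crack_width_mm(CRACK_MASK, mm_per_pixel=0.05)  # assumed image scale
print(round(result["max_mm"], 3))                       # 0.15
```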

Results and discussion

Detection of the crack using CNN

This section presents the results of evaluating the proposed crack detection method, which uses a deep CNN architecture implemented in Python. Table 2 summarizes the architecture and the related model used for crack detection. The important parameters used to assess the performance of network designs are F1-Score, precision, and recall. These metrics are used to measure the accuracy of crack classification. The accuracy of a CNN is highly related to the depth of the network.

Table 2 Deep convolutional neural network method for crack detection

The evaluation of the network is based on the values of precision, recall, and F1 score. These metrics are given as:

$$Precision= \frac{TP}{TP+FP},$$
$$Recall= \frac{TP}{TP+FN},$$
$$F1 Score= \frac{2 \times Precision \times Recall}{Precision+Recall},$$

where TP, FP, TN, and FN correspond to true positive, false positive, true negative, and false negative, respectively. Figure 11 shows the results of the images that detect cracks using deep CNN architecture.
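As a worked example, precision, recall, and F1 score can be computed from confusion-matrix counts using their standard definitions. The counts below are hypothetical and are not results from this study; the function name is also an assumption.

```python
def classification_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 cracks found, 10 false alarms, 30 cracks missed.
precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=30)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.9 0.75 0.818
```

True negatives do not enter any of the three formulas, which is why these metrics remain informative even when non-crack images dominate a dataset.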

Fig. 11 Sample images that show the detection of cracks through deep CNN

Quantification of crack width using microscope and image processing technique

The manual measurement of micro-cracks relies on the skill and experience of the inspector and is therefore subjective to some extent; manual measurements contain an error caused by the human factor. When the measured and estimated values are compared, the error is expected to decrease with increasing crack width. Table 3 shows the results of crack detection on the original images for some sample images. After crack detection, the crack width is obtained by the image processing technique, implemented in Python. The percentage accuracy is shown for the best images. The features extracted are crack detection and crack width for different concrete surfaces. A total of 280 images were captured with a mobile camera.

Table 3 Results of crack detection and crack width using image processing

Table 4 shows the information regarding the image processing method. The accuracy of the crack width measurement is in the range of 65–98%. The lower accuracies are attributed to the varying distance between the concrete surface and the camera.

Table 4 The information regarding the proposed method
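The text does not define how the percentage accuracy is computed. One plausible reading, assumed here, is one minus the relative error between the image-based estimate and the crack microscope reference, expressed as a percentage. The function name and the sample widths are hypothetical.

```python
def width_accuracy_percent(estimated_mm, reference_mm):
    """Accuracy as 100 * (1 - relative error), floored at 0 (assumed definition)."""
    if reference_mm <= 0:
        raise ValueError("reference width must be positive")
    relative_error = abs(estimated_mm - reference_mm) / reference_mm
    return max(0.0, 100.0 * (1.0 - relative_error))


# Hypothetical example: image processing gives 0.27 mm, the microscope reads 0.30 mm.
print(round(width_accuracy_percent(0.27, 0.30), 1))  # 90.0
```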

Table 5 summarizes the crack widths obtained for the sample image through image processing with respect to the change in the distance between the camera and the surface at constant pixel size. The results show that the error varies with the distance between the camera and the surface. Therefore, it is concluded that the error can be reduced with a constant pixel size and a fixed distance between the concrete surface and the camera. The proposed method can be developed further by considering more parameters, such as pixel size, the distance between the camera and the concrete surface, and daylight, to improve its accuracy.

Table 5 Results of images collected with respect to pixel size and distance
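The sensitivity to camera distance follows from the image scale. Under a simple pinhole-camera model (an assumption; lens distortion and focus effects are ignored), the length on the surface covered by one pixel grows in proportion to the distance. The focal length, pixel pitch, and distances below are hypothetical values, not the parameters of the camera used in the study.

```python
def mm_per_pixel(distance_mm, focal_length_mm, pixel_pitch_mm):
    """Pinhole-camera scale: surface length imaged onto one sensor pixel."""
    return distance_mm * pixel_pitch_mm / focal_length_mm


def width_in_pixels(crack_width_mm, distance_mm, focal_length_mm, pixel_pitch_mm):
    """Number of pixels spanned by a crack of the given physical width."""
    return crack_width_mm / mm_per_pixel(distance_mm, focal_length_mm, pixel_pitch_mm)


# Hypothetical camera: 4.0 mm focal length, 0.002 mm (2 micrometre) pixel pitch.
for distance in (100.0, 200.0):
    scale = mm_per_pixel(distance, 4.0, 0.002)
    pixels = width_in_pixels(0.3, distance, 4.0, 0.002)
    print(distance, round(scale, 3), round(pixels, 1))
# 100.0 0.05 6.0
# 200.0 0.1 3.0
```

Doubling the distance halves the number of pixels across the same 0.3 mm crack, so a one-pixel error in the measured width costs twice as much in millimetres. This is consistent with the observation that a fixed distance and constant pixel size reduce the error.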

Conclusion and future scope

Crack identification is carried out using a deep convolutional neural network, and a concrete crack is quantified using image processing. The conclusions of the study are as follows:

  1. The accuracy obtained through the algorithm shows that a CNN is especially powerful in image classification, as it can automatically learn features from many images. The accuracy of the CNN can still be improved to reduce the number of false positives.

  2. The quantification shows that the percentage accuracy obtained through image processing varies from 65 to 98% in comparison with the actual physical measurements.

  3. To increase the accuracy, a few images were collected with constant pixel size and a fixed distance between the concrete surface and the camera, which resulted in better accuracy. The proposed method can be developed further by considering more parameters to improve its accuracy.

Future scope

Future work will gather a large dataset of images with different crack widths to train and test the CNN algorithm for more accurate results. Estimating parameters such as the length and depth of cracks also plays a major role in detailing the severity of cracks on the concrete surface.