Introduction

This section gives an overview of the proposed method and introduces its basic concepts. It also presents the motivation for the method, the need for implementing the model, and the objectives of the work.

Effective damage detection techniques are necessary to identify damage quickly and avoid catastrophic failure. A key part of such examination is detecting cracks in structures. In most cases, structural cracks are inspected by humans, who record the cracks together with any surrounding anomalies. Manual inspection lacks quantitative rigor because it depends entirely on the experience and method of the inspector (Shan & Dewhurst, 1998). Automated crack identification has therefore become one of the most important methods for examining structures, and at the same time a challenge for developing intelligent maintenance systems (Zhou et al., 2021).

Automated crack detection has proven to be a difficult task because of noise in the captured images, the non-uniformity of cracks, and their complex topologies.

This study introduces a convolutional neural network that learns deep convolutional features from the images, improving the discrimination of image features under complex conditions (Han et al., 2018). With appropriate data processing, useful information can be extracted from images of structures and used to spot cracks. This supports early maintenance measures and helps prevent major damage to the structures. After training and evaluating the model on our dataset, the experimental findings show that the detection approach performs well, with a precision above 98.00%. The method also compares favorably with other existing methods and approaches.

Our motivation is that image processing now has many applications, which makes it an exciting field to work in. Visual information delivered as digital images is increasingly important in today's society. In the future, image processing may be widely used to identify cancerous tumors and make people aware of them, aiding disease prevention. Digital image processing is a branch of signals and systems that manipulates images and focuses mostly on pictures and visuals: the input to the system is an image, which is processed by effective algorithms and methods to generate the required output.

There are numerous instances of buildings collapsing because of deterioration and cracks in their walls, and many highway accidents occur for the same reason, sometimes causing many casualties and losses. In such circumstances, early identification of damage or cracks enables preventive action that lessens or averts damage and potential failure. This technique can be further enhanced and made accessible to everyone.

The goal of this research is to locate cracks in concrete surfaces. Using this information, builders can quickly assess the strength of a concrete structure and make any required adjustments right away. Concrete has low tensile strength and is quasi-brittle. Tensile stress can arise in concrete from applied loads, harmful chemical reactions, and environmental influences, and the concrete cracks when these stresses exceed its tensile strength. The size and number of cracks affect how well bridges and other structures function. Although cracking can be minimized by carefully choosing the constituents of the concrete, some cracking is unavoidable.

The primary goal is to design and build a software tool for crack detection in Python, using algorithms and ideas from machine learning and deep neural networks.

The contributions and objectives of the proposed work are as follows:

  • To implement localization methods of signal processing in the detection of cracks.

  • To use the fundamentals of image processing for crack detection.

  • To collect datasets.

  • To develop an algorithm for analyzing the dataset.

  • To apply the underlying mathematics to implement a software-based processing framework for the sampled data.

“Different methods used in crack detection” reviews studies by researchers around the world on crack and damage detection and related work. “Methodology” describes the procedure for detecting cracks using image processing and explains the block diagram used for implementation, together with the necessary evaluation metrics.

“Result and discussion” covers the final step of our project: we discuss the accuracy and losses of the trained and tested model, including the validation loss over time, and examine the model's classification report.

In “Conclusion and future scope”, we conclude by explaining the challenges present in surface crack detection and the future scope of the proposed work.

Existing work

Many efforts have been made to examine structural properties using image processing with different cameras, enabling efficient inspection of structures, and many methods have been developed over time for detecting structural damage and cracks.

Cracks and other damage on concrete structures are early signs of deterioration, and concrete quality is a key indicator in assessing the quality of construction projects (Arun & Poobal, 2018). It is important to find cracks and bug holes (surface voids) in structures, since they frequently affect the quality of concrete surfaces (Hassene et al., 2017; Sun et al., 2021). One of the biggest challenges in the industry today is maintaining the quality of structural surfaces, which directly reflects the durability and maintenance needs of buildings (Yao et al., 2019). However, uneven lighting, deformation, shadows, and other factors make it challenging to identify pavement cracks precisely under complex conditions (Qu et al., 2020). With the rapid increase in traffic, the design loads of many older bridges can no longer accommodate the required loads, significantly jeopardizing structural safety. It is therefore essential to inspect infrastructure such as tunnels and bridges regularly and to identify any possible structural damage to guarantee operational safety (Zinno et al., 2022; Dong et al., 2019).

Using image processing and convolutional neural networks

Many image processing methods have been developed to inspect civil infrastructure and partially replace on-site inspections performed by humans. Yet highly variable real-world conditions, such as lighting and shadows, can cause difficulties (Dong et al., 2019). Cracks that are not identified early, or structures that are poorly maintained, can lead to major damage. One of the major problems in identifying cracks with image processing is noise in the captured images. To address this, Beck and Teboulle proposed gradient-based algorithms for image denoising and deblurring based on a discretized total variation minimization model with constraints (Beck & Teboulle, 2009; Cha et al., 2017). A deep architecture combining a convolutional neural network with infrared thermal image processing for crack detection helps overcome these difficulties (Mao et al., 2020). Vision-based algorithms are also widely used in crack detection, since human inspection takes more time and involves many other difficulties (Lionnie et al., 2022; Yang et al., 2019; Yeum & Dyke, 2015).

Shaoqing Ren et al. proposed a region proposal network (RPN) that shares full-image convolutional features with the detection network, making region proposals nearly cost-free (Ren et al., 2017). Liyan Zhang et al. proposed a convolutional neural network-based method combined with the Internet of Things, achieving an accuracy of more than 90% (Zhang et al., 2018). Kaiming He et al. proposed a CNN-based technique with spatial pyramid pooling, with accuracy much better than that of the R-CNN model (He et al., 2015). Jun Yang et al. proposed infrared thermal image crack detection with an accuracy of 95.52% (Yang et al., 2019). Chen et al. put forward a multi-task enhanced Faster R-CNN approach; the results with the K-MABtrA method were useful for multiple or small objects and reached a mean average precision (mAP) of 80.02 (Zou et al., 2012).

Different methods used in crack detection

Because cracks are highly non-uniform and topologically complex, a multiscale feature attention network or a multiscale dilated convolution model yields better crack detection efficiency (Song et al., 2019). Pavement cracks can also be detected with Gabor filters, a promising technique for detecting cracks in various orientations (Salman et al., 2013). Lee et al. proposed a robot-based bridge inspection system, since conventional bridge inspection poses many challenges (Oh et al., 2009). Adaptive thresholding combined with a deep convolutional neural network can also be used for crack detection to achieve better accuracy (Fan et al., 2019). Hyunwoo Cho et al. put forward a structural damage detection approach based on edge detection with many intermediate steps, built on the crack width transformation (CWT) algorithm (Cho et al., 2018). Dhanajitha et al. proposed drone-based crack detection for buildings, which inspects cracks in high-rise buildings from various angles and achieved 90.67% accuracy (Danajitha et al., 2022).

Kaveh worked on damage detection using different technologies and introduced various optimization techniques, with the primary goal of determining the location and severity of multiple damages in buildings or structures (Kaveh & Maniat, 2015). To achieve this, natural frequencies and mode shapes are used to construct the objective function (Kaveh, 2017). The authors also presented an alternative technique for detecting structural damage in beams and frames using natural frequencies (Kaveh & Dadras, 2018; Kaveh & Zolghadr, 2012).

Kaveh and Maniat studied the identification of structural damage in skeletal structures when only incomplete data are available (Kaveh & Maniat, 2014; Kaveh & Zolghadr, 2017a), and Kaveh and Zolghadr proposed the tug-of-war algorithm for structural damage detection (Kaveh & Zolghadr, 2017b).

Here, we analyzed numerous approaches and crack detection systems applied to concrete civil structures. Our review gives a thorough analysis of the technologies and approaches used to identify cracks in concrete buildings, and sheds light on the difficulties of crack identification on concrete structures and on potential avenues for future research. In conclusion, crack detection methods have advanced significantly in several areas. Although coping with varying camera resolutions has not been a major barrier in these studies, a trade-off between system accuracy and algorithmic complexity still exists and needs to be addressed.

Methodology

This section describes the procedure for detecting cracks using image processing and explains the block diagram used for implementation, together with the necessary evaluation metrics.

The main benefit of image processing over traditional manual crack inspection is that image-based analysis gives more accurate results. One of the major challenges in damage detection is image size: the resolution of modern digital cameras exceeds 10 megapixels.

This increase in resolution makes it possible to capture detailed photographs of concrete surfaces, and modern commercial cameras can capture a wide view of a structure's surface in a single image. A broad range of images is used for realistic crack detection in low-cost applications.

The overall framework of the crack identification model based on image processing is shown in Fig. 1. The block diagram includes the following steps:

  a. First, capture an image of the structure with a camera or other source; this image is used in the crack identification technique.

  b. After image acquisition, the images are pre-processed, using techniques such as segmentation, to make the subsequent image processing more effective.

  c. Image processing methods are then applied to the subtracted image sample.

  d. Finally, cracks on the structure are detected from the output of the image processing stage.

Fig. 1 Block diagram for crack identification utilizing image processing

Image acquisition

A high-quality image of a cracked surface requires illuminating the surface with adequate light, and the light source is crucial for providing stable illumination. Light sources in use include tungsten (wide spectrum), light-emitting diodes, xenon lamps, and sunlight; they illuminate crack surfaces in steel and concrete civil structures. Digital cameras are typically used for concrete structures, and the camera's resolution and the lighting conditions determine the accuracy and precision of the images.

Pre-processing and image processing

Most pre-processing approaches are based on filtering techniques, which are used to separate crack features from the background. A multi-sequential image filter is used to remove background noise and reveal faint cracks. By subtracting the input image from its smoothed version, the smoothing filter removes uneven lighting and shading effects (Cao et al., 2020). Line emphasis filters are used to suppress noise in the input image. Morphology-based and algorithm-based pre-processing methods are other important techniques. Figure 2 shows a pie chart of the various types of pre-processing techniques.
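The smoothing-based shading correction can be sketched in a few lines of plain Python. This is an illustrative sketch under our own assumptions, not the filter of Cao et al. (2020): the image is a list of rows of grey values, the smoothing filter is a simple mean (box) filter with a hypothetical `radius` parameter, and negative differences are clipped to zero so that only pixels darker than their neighborhood, such as cracks, remain.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1)^2 window; near borders only in-image pixels are averaged."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def shading_correct(img, radius=7):
    """Subtract the input from its smoothed version and clip at zero.

    Slowly varying illumination is nearly unchanged by the blur and cancels out,
    while thin dark cracks produce large positive responses."""
    background = box_blur(img, radius)
    return [[max(0.0, background[y][x] - img[y][x]) for x in range(len(img[0]))]
            for y in range(len(img))]
```

For example, on an image whose brightness rises linearly from left to right and which contains one dark vertical crack, the output is zero on the plain surface away from the borders and strongly positive along the crack.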

Fig. 2 Types of pre-processing techniques

A linear filter computes a weighted sum of the pixel values in its support region. The support region is specified by the filter matrix H(i, j), whose size defines the size of the filter region. The filter matrix has its own coordinate system, with i as the column index and j as the row index; its center is the origin and is called the hot spot. Smoothing (linear) filters remove noise but also blur image structures, lines, and edges; non-linear filters were introduced to solve this problem.

The following steps are used to apply the filter to the image:

  a) Position the filter matrix so that its hot spot H(0, 0) coincides with the current image position (u, v).

  b) Multiply each filter coefficient H(i, j) by the corresponding image element I(u + i, v + j).

  c) Sum all the products from the previous step and store the result at the current position, I'(u, v).

For a 3 × 3 filter, these steps are summarized by the following equation:

$$I'(u,v) \leftarrow \sum_{i=-1}^{1} \sum_{j=-1}^{1} I(u+i, v+j) \cdot H(i,j)$$
(1)
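A direct transcription of steps a) to c) and Eq. (1) into plain Python is shown below as a minimal sketch. The row-major storage `I[v][u]`, the offset indexing `H[j + 1][i + 1]`, and copying border pixels unchanged are our assumptions; the text does not say how borders are handled.

```python
def linear_filter(I, H):
    """Eq. (1) for a 3x3 filter matrix.

    I is indexed I[v][u] (row v, column u) and H[j + 1][i + 1] holds H(i, j),
    so H[1][1] is the hot spot H(0, 0). Border pixels, where the filter would
    extend outside the image, are copied unchanged (an assumption)."""
    rows, cols = len(I), len(I[0])
    out = [row[:] for row in I]
    for v in range(1, rows - 1):
        for u in range(1, cols - 1):
            total = 0.0
            for j in range(-1, 2):        # step a): hot spot placed at (u, v)
                for i in range(-1, 2):
                    # step b): coefficient times the matching image element
                    total += I[v + j][u + i] * H[j + 1][i + 1]
            out[v][u] = total             # step c): sum stored at I'(u, v)
    return out
```

With a 3 × 3 box filter (all coefficients 1/9), Eq. (1) becomes a mean filter, so a single pixel of value 9 on a zero background is spread into a 3 × 3 patch of ones.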

Linear convolution is an operation closely related to linear filtering. For two-dimensional functions I and H, it is defined by the following equation:

$$I'(u,v) \leftarrow \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(u-i, v-j) \cdot H(i,j)$$
(2)
$$I' = I * H$$
(3)

Here, * denotes the convolution operation. Comparing Eq. (2) with Eq. (1) shows that convolution produces the same result as linear filtering with the filter function reflected about both the horizontal and vertical axes. The convolution matrix H is called the kernel.
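To make the reflection property concrete, the sketch below evaluates Eq. (2) for a kernel with finite support and treats pixels outside the image as zero (both assumptions). Convolving a single impulse reproduces the kernel H unchanged, whereas the filter of Eq. (1) would return H rotated by 180 degrees, that is, reflected about both axes.

```python
def convolve2d(I, H):
    """Eq. (2): I'(u, v) = sum_i sum_j I(u - i, v - j) * H(i, j).

    H is a (2r+1) x (2r+1) list with H[j + r][i + r] holding H(i, j);
    pixels outside the image count as 0 (zero padding, an assumption)."""
    r = len(H) // 2
    rows, cols = len(I), len(I[0])
    out = [[0.0] * cols for _ in range(rows)]
    for v in range(rows):
        for u in range(cols):
            acc = 0.0
            for j in range(-r, r + 1):
                for i in range(-r, r + 1):
                    y, x = v - j, u - i      # minus signs: the kernel is reflected
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += I[y][x] * H[j + r][i + r]
            out[v][u] = acc
    return out


H = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1
for row in convolve2d(impulse, H)[1:4]:
    print(row[1:4])
```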

The minimum and maximum filters are non-linear filters that output, respectively, the smallest and largest value in the moving region R of the original image. They are defined as:

$$I'(u,v) \leftarrow \min \left\{ I(u+i, v+j) \mid (i,j) \in R \right\}$$
(4)
$$I'(u,v) \leftarrow \max \left\{ I(u+i, v+j) \mid (i,j) \in R \right\}$$
(5)
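Eqs. (4) and (5) differ only in the reduction applied to the moving region, so a single helper covers both. This plain-Python sketch assumes a square region R of radius `r` and clips R at the image border; both choices are ours, not the text's.

```python
def rank_filter(I, op, r=1):
    """Apply op (min or max) over the (2r+1) x (2r+1) moving region R.
    Near the border, R is clipped to the image (an assumption)."""
    rows, cols = len(I), len(I[0])
    return [[op(I[y][x]
                for y in range(max(0, v - r), min(rows, v + r + 1))
                for x in range(max(0, u - r), min(cols, u + r + 1)))
             for u in range(cols)]
            for v in range(rows)]


def min_filter(I, r=1):
    return rank_filter(I, min, r)   # Eq. (4)


def max_filter(I, r=1):
    return rank_filter(I, max, r)   # Eq. (5)
```

On a bright surface with one dark crack pixel, the minimum filter spreads the dark value over its 3 × 3 neighborhood (thickening the crack), while the maximum filter removes it entirely, which is why the choice between them matters for thin cracks.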

The median filter is another non-linear filter; its output is the median of the values in the moving region R. Median filters are frequently used to remove salt-and-pepper noise from images. The filter is defined as:

$$I'(u,v) \leftarrow \text{median} \left\{ I(u+i, v+j) \mid (i,j) \in R \right\}$$
(6)
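The salt-and-pepper behavior of Eq. (6) can be checked with a short sketch based on Python's `statistics.median`. As before, the square region R of radius `r` and the clipping at the border are our assumptions.

```python
from statistics import median


def median_filter(I, r=1):
    """Eq. (6): each output pixel is the median of the (2r+1) x (2r+1) region R.
    Near the border, R is clipped to the image (an assumption)."""
    rows, cols = len(I), len(I[0])
    return [[median([I[y][x]
                     for y in range(max(0, v - r), min(rows, v + r + 1))
                     for x in range(max(0, u - r), min(cols, u + r + 1))])
             for u in range(cols)]
            for v in range(rows)]


img = [[100] * 5 for _ in range(5)]
img[1][1] = 255   # "salt" pixel
img[3][3] = 0     # "pepper" pixel
print(median_filter(img))
```

Because each isolated outlier is a minority within every window, the median discards it and the uniform surface value survives everywhere, whereas a mean filter would smear both outliers into their neighbors.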

Filters can be applied wherever image quality needs to be improved (e.g., noise removal); they can also be used to sharpen images and detect edges.

Crack detection

Various crack identification methods are based on image processing. Cracks appearing on the concrete surface are the first sign that a structure is deteriorating, and if they continue to develop, the structure can eventually be severely damaged. To prevent this, it is crucial to detect cracks as early as possible. However, voids, delamination, and other features make crack identification in concrete images challenging. This section discusses several techniques for detecting cracks on concrete surfaces; many image processing techniques can be used to find cracks in concrete structures (Gui & Li, 2020). Based on the type of technique used, crack identification methods are divided into three groups:

  • Model-dependent approaches.

  • Thresholding-based strategies.

  • Pattern-based approaches.

Image processing methods for crack identification, and the datasets they are evaluated on, are often designed for specific types of images. On new photos and datasets gathered under different lighting conditions or with shadows, these approaches may not produce correct results. Crack detection methods are also sometimes classified into three groups: filter-dependent approaches, machine learning-dependent approaches, and approaches that combine machine learning and filtering (Sizyakin et al., 2020).

Procedure

Annotations and labelling: The images were taken from a Kaggle dataset of 227 × 227 images, and the final dataset contained a total of 6000 labelled images. Images with cracks were labelled ‘Cracked’ and images without cracks ‘Uncracked’. Based on these labels, the data were organized into file paths and corresponding labels, indexed from the first image.
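A minimal sketch of organizing the images into file paths and labels is shown below. It assumes, hypothetically, that the images are stored as `.jpg` files in one sub-folder per class named after the labels ‘Cracked’ and ‘Uncracked’; the paper does not specify the folder layout. The position in the returned list serves as the index, and sorting makes that order deterministic.

```python
from pathlib import Path

LABELS = ("Cracked", "Uncracked")  # class names used in the text


def build_index(root):
    """Return [(file_path, label), ...] starting at index 0.

    Hypothetical layout: root/Cracked/*.jpg and root/Uncracked/*.jpg."""
    records = []
    for label in LABELS:
        for path in sorted((Path(root) / label).glob("*.jpg")):
            records.append((str(path), label))
    return records
```

A list of (path, label) pairs like this can then be loaded into a table (for example, a two-column data frame) and fed to a Keras image generator or dataset pipeline.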

We first build the model from predefined Keras layers and compile it, and then train it.

Although the model's structure is similar to that of VGG-16, it has fewer layers and a considerably smaller input image size. The model consists of three convolutional blocks followed by fully connected layers and an output layer. Fig. 3 also shows the spatial dimensions of the activation maps after each convolutional block, which helps illustrate how the Keras layers transform the input.

Fig. 3 Convolutional neural network

Because this is a binary classification problem with only two classes, we use a sigmoid activation after the final dense layer; softmax is typically used for multiclass problems.
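Why a single sigmoid unit suffices for two classes can be checked numerically: for logits (z, 0), the first softmax output equals sigmoid(z), so a two-unit softmax carries no more information than one sigmoid output. The sketch below is plain Python for illustration, not the Keras implementation of either activation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Two-class softmax on logits (z, 0) reduces to the sigmoid of z.
for z in (-3.0, 0.0, 2.5):
    print(z, softmax([z, 0.0])[0], sigmoid(z))
```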

By default, Keras convolutional layers use no padding, so unless the padding option is set explicitly, each convolutional layer's output has slightly smaller spatial dimensions than its input. We apply a ReLU activation function throughout the network, except in the output layer.

Beyond convolutional neural networks, convolution with a kernel is a crucial part of many other computer vision approaches. In this operation, a kernel (a small matrix of numbers), or filter, transforms the image according to its values. In the formula used to compute the feature map values, the input image is denoted by f and the kernel by h, and the row and column indices of the result matrix are denoted by m and n.

$$G[m,n] = (f * h)[m,n] = \sum_{j} \sum_{k} h[j,k]\, f[m-j, n-k]$$
(7)

Figure 4 shows the architecture of the convolutional neural network, and Fig. 5 shows a representation of the convolution product.

Fig. 4 Architecture of the model convolutional network

Fig. 5 Representation of convolutional product

When a filter is applied at a particular pixel, each kernel value is multiplied by the corresponding value in the input image. The products are then summed, and the result is placed at the corresponding location in the output feature map.

The result of applying the kernel through a convolution product is also called a filtered image. It can be written as:

$$G(x,y) = w \times F(x,y) = \sum_{\delta x=-k_i}^{k_i} \sum_{\delta y=-k_j}^{k_j} w(\delta x,\delta y) \cdot F(x+\delta x, y+\delta y)$$
(8)

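Here, w is the kernel, k_i and k_j are its half-widths (so the kernel has size (2k_i + 1) × (2k_j + 1)), and the offsets range over

$$-k_i \le \delta x \le k_i, \quad -k_j \le \delta y \le k_j$$
(9)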

For each Conv2D layer, we set only three arguments: filters, kernel_size, and activation. The spatial size of the convolution layer's output is given by

$$\text{output} = \frac{\text{input} - \text{kernel size} + 2 \times \text{padding}}{\text{stride}} + 1$$
(10)
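Eq. (10) can be used to trace the spatial size of the activation maps through the three convolutional blocks. The sketch below assumes, hypothetically, a 120 × 120 input (the text only says the input is smaller than VGG-16's), 3 × 3 kernels with stride 1 and no padding, and 2 × 2 max pooling with stride 2. It applies Eq. (10) with integer (floor) division, which is how the size is rounded when the stride does not divide evenly.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Eq. (10), rounded down when (n - kernel + 2*padding) is not a multiple of stride."""
    return (n - kernel + 2 * padding) // stride + 1


def trace_blocks(n, blocks=3, kernel=3, pool=2):
    """Spatial size after each Conv2D ('valid') + MaxPooling2D block (assumed layout)."""
    sizes = []
    for _ in range(blocks):
        n = conv_output_size(n, kernel)               # Conv2D, no padding, stride 1
        n = conv_output_size(n, pool, stride=pool)    # MaxPooling2D(pool_size=2)
        sizes.append(n)
    return sizes


print(trace_blocks(120))  # → [59, 28, 13]
```

Each unpadded 3 × 3 convolution trims two pixels, and each pooling step roughly halves the size, which is the shrinking pattern of activation maps that Fig. 3 describes.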

After the convolutional layers, MaxPooling2D (with the pool_size argument) and GlobalAveragePooling2D layers are used.
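What the two pooling layers compute can be illustrated on a single-channel feature map in plain Python. This is a sketch of the operations performed by MaxPooling2D(pool_size=2), with its default stride equal to the pool size and no padding, and by GlobalAveragePooling2D, not their Keras implementation. Trailing rows or columns that do not fill a full window are dropped, as with unpadded pooling.

```python
def max_pool2d(fm, pool=2):
    """Non-overlapping max pooling (stride = pool, no padding) on one channel."""
    rows, cols = len(fm) // pool, len(fm[0]) // pool
    return [[max(fm[r * pool + a][c * pool + b]
                 for a in range(pool) for b in range(pool))
             for c in range(cols)]
            for r in range(rows)]


def global_average_pool(feature_maps):
    """Reduce each channel's 2-D map to its mean value."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]


fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(max_pool2d(fm))             # → [[6, 8], [14, 16]]
print(global_average_pool([fm]))  # → [8.5]
```

Global average pooling turns every channel into a single number, so the dense layers that follow receive a vector whose length equals the number of filters, independent of the spatial size.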

Activation functions

Activation functions convert a neuron's linear output into a non-linear output, which allows a neural network to learn non-linear behavior.

For this, the Rectified Linear Unit (ReLU) is used, which returns x for positive values of x and 0 for negative values, i.e., max(x, 0).

The ReLU equation is:

$$f\left( x \right) \, = \, \max \, \left( {0, \, x} \right)$$
(11)

The ReLU function (Fig. 6) and its derivative are both monotonic, and the function's output ranges from 0 to infinity. ReLU is the default activation function in many neural networks, particularly CNNs.

Fig. 6 ReLU activation function

The sigmoid activation function is differentiable and bounded. It is a real function with a single inflection point and a non-negative derivative at every point, and it is defined for every real value.

Its graph is called a sigmoid curve: a recognizable S-shaped curve. The logistic function is given by the following formula:

$$\mathrm{S}(\mathrm{x})=1-\mathrm{S}(-\mathrm{x})=\frac{1}{1+{\mathrm{e}}^{-\mathrm{x}}}=\frac{{\mathrm{e}}^{\mathrm{x}}}{{\mathrm{e}}^{\mathrm{x}}+1}$$
(12)
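A numerically stable evaluation of Eq. (12), written as a plain-Python sketch, also lets the stated properties be checked: the symmetry S(x) = 1 − S(−x), the bounds 0 and 1, and the maximum slope at the inflection point x = 0. The derivative formula S'(x) = S(x)(1 − S(x)) is a standard identity of the logistic function, not taken from the text.

```python
import math


def sigmoid(x):
    """Logistic function S(x) = 1 / (1 + e^-x), Eq. (12), evaluated stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)          # avoids overflow of exp(-x) for large negative x
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative S'(x) = S(x) * (1 - S(x)); largest at the inflection point x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```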

The sigmoid function (Fig. 7) is also monotonic, and it is bounded by a pair of horizontal asymptotes as \(x \to \pm \infty\).

Fig. 7 Sigmoid activation function

In many cases, the inflection point is at 0: the sigmoid is convex for values below it and concave for values above it.

Compiling and training the model

After these steps, we compile the model, specifying Adam as the optimizer and binary_crossentropy as the loss. We also specify accuracy as an additional metric to track during training: Keras always records the loss value by default, but accuracy must be requested explicitly. We then train the model and obtain the required results.
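For reference, binary cross-entropy averages −[y log p + (1 − y) log(1 − p)] over the samples, where y ∈ {0, 1} is the label and p the sigmoid output. The plain-Python sketch below mirrors that formula; clipping p to [ε, 1 − ε] with ε = 1e-7 follows the usual Keras convention and is an assumption about the exact implementation. In Keras itself, the compile step corresponds to `model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])`.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keeps log() finite for p = 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.5, 0.5]))   # ln 2 for maximally uncertain predictions
print(binary_crossentropy([1, 0], [0.99, 0.01])) # small loss for confident, correct predictions
```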

Evaluation metrics

For a screening approach, the aim of assessment is to find as many positive cases in the population as possible, so it is important to minimize false negatives, even at the cost of more false positives. Therefore, the true positive rate and false positive rate should be considered along with accuracy. The true positive rate is called sensitivity (SEN) in medical terminology and is given by:

$${\text{TPR}} = {\text{SEN}} = {\text{TP}}/P$$
(13)

Here, P is the number of actual positive cases and TP is the number of true positives. The false positive rate is given by:

$${\text{FPR}} = {\text{FP}}/N$$
(14)

Here, FP is the number of false positives and N is the total number of actual negative cases in the population. The evaluation matrix is shown in Fig. 8. The complementary statistic is the true negative rate, or specificity (SPEC), given by:

$${\text{TNR}} = {\text{SPEC}} = {\text{TN}}/{\text{N}} = {1}{-}{\text{FPR}}$$
(15)

Fig. 8 Evaluation matrix

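Finally, accuracy is the proportion of all cases, positive and negative, that are classified correctly. It is most informative when the numbers of positive and negative cases are similar; when they are unequal, accuracy alone can be misleading, which is why it is reported together with the rates above. It can be written as: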

$${\text{ACC }} = \left( {{\text{TP}} + {\text{TN}}} \right)/\left( {{\text{P}} + {\text{N}}} \right)$$
(16)
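Eqs. (13) to (16) can be computed directly from the four confusion-matrix counts. In this plain-Python sketch the counts are hypothetical; the second call, a model that never predicts a crack on an imbalanced set, shows why the true and false positive rates are considered alongside accuracy.

```python
def screening_metrics(tp, fn, fp, tn):
    """Eqs. (13)-(16) from confusion-matrix counts."""
    p = tp + fn          # actual positives
    n = fp + tn          # actual negatives
    return {
        "TPR": tp / p,                # sensitivity, Eq. (13)
        "FPR": fp / n,                # Eq. (14)
        "TNR": tn / n,                # specificity = 1 - FPR, Eq. (15)
        "ACC": (tp + tn) / (p + n),   # Eq. (16)
    }


print(screening_metrics(tp=90, fn=10, fp=5, tn=95))
# Hypothetical imbalanced case: accuracy is high although no crack is found.
print(screening_metrics(tp=0, fn=10, fp=0, tn=990))
```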

Result and discussion

In this final step of the proposed work, we discuss the accuracy and losses of the trained and tested model, including the validation loss over time, and examine the model's classification report.

We trained the model for 100 epochs to maximize training accuracy, with each epoch consisting of 105 iterations. The batch size was 32, a common default choice.

After training, the model reached a training accuracy of 98.23% and a validation accuracy of 98.21%. The training and validation losses were reduced to a minimum, improving the system's accuracy.

The plot in Fig. 9 shows that as the number of epochs increases, the training and validation losses become almost equal and approach zero. On the test dataset, the model achieves an accuracy of 97.11%. Figure 10 shows the number of epochs used for training and validation.

Test loss: 0.10665

Test accuracy: 97.11%

Fig. 9 Training and validation loss

Fig. 10 Epochs processing in the system model

The graph in Fig. 11 shows the accuracy on the training and validation datasets, which increases as training progresses. The confusion matrix in Fig. 12 shows the model's success rate by giving the numbers of true positives and true negatives on the test set, which indicates the error rate and accuracy of the model.

Fig. 11 Accuracy curve

Based on the confusion matrix in Fig. 12, the model's performance can be analyzed using a classification report, shown in Table 1. The report shows that the precision for positive images (with cracks) is almost 99%, and for negative images (without cracks) it is 96%. The model's accuracy is also above 97%. Table 2 compares the proposed model with existing models.
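Precision, which the classification report relies on, is not among Eqs. (13) to (16). The sketch below computes per-class precision, recall, and F1 from the four confusion-matrix counts, in the spirit of scikit-learn's `classification_report`, treating ‘Cracked’ as the positive class. The counts in the usage lines are hypothetical, not the paper's results.

```python
def per_class_report(tp, fn, fp, tn):
    """Return {class: (precision, recall, f1)} for a binary confusion matrix."""

    def prf(correct, predicted, actual):
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    return {
        # Positive class: correct = TP, predicted positive = TP + FP, actual = TP + FN
        "Cracked": prf(tp, tp + fp, tp + fn),
        # Negative class: correct = TN, predicted negative = TN + FN, actual = TN + FP
        "Uncracked": prf(tn, tn + fn, tn + fp),
    }


for label, (p, r, f) in per_class_report(tp=40, fn=0, fp=10, tn=50).items():
    print(f"{label:10s} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```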

Fig. 12 Designed model confusion matrix

Table 1 Classification report for the designed model
Table 2 Comparison of the proposed work with different methodologies

Conclusion and future scope

Here, we conclude our work by explaining the challenges in surface crack detection, how we have tried to overcome them, and the future scope of the proposed method.

This work developed an image processing model to find structural damage and flaws in structures such as buildings. Digital images used for crack investigation pose a variety of challenges for image analysis, including inconsistent lighting, low contrast, noise, and the distance between the crack and the camera. Camera resolution also matters: it can lead to blurry, low-contrast photos in which thin objects are difficult to identify, and these are key factors affecting how well crack detection systems for concrete perform. Most crack detection methods currently in use rely on photographic images, which are ineffective at identifying internal flaws such as voids. Environmental factors such as weather, surface color, and fog, together with the appearance of the structures, the choice of the minimum value for termination procedures, and low-contrast or blurred photos, all reduce the accuracy of crack identification.

Convolutional neural network approaches also have drawbacks: they need large sample datasets to fine-tune a network, they are prone to overfitting, their parameters must be tuned to boost accuracy, and they require fast processors such as graphics processing units. A system must be developed to manage the enormous quantity of data collected by autonomous inspection systems and to find, quantify, and categorize different types of cracks, and fast crack detection using reliable decision-making techniques is needed. Another crucial research question is how lighting conditions affect the efficacy and effectiveness of crack detection systems. In this paper, we have used neural networks to address such issues and developed a model that effectively and accurately detects cracks in surface images. To prevent data build-up and improve the system, we employed Keras pre-processing.

The future scope of the proposed work is:

  • Real-world application of the model: In this study, the model was developed and evaluated only on the provided datasets. The next step is to put the technology into practice in real-world applications.

  • Building a more effective model: The model could be improved to ingest data faster, avoid overfitting, and process data more quickly for practical applications.

  • Automated crack inspection: Crack detection is the process of finding cracks in a structure by any method. The proposed approach uses radiometric, geometric, and contextual data retrieved from the photos.

  • Railway track damage detection: A neural network-based method could be developed to measure and detect railway track deterioration, since damage to the track is a frequent cause of train accidents. Such a measuring system could provide a high degree of accuracy and would be suitable for applications requiring online railway damage identification and monitoring.