1 Introduction

Over the years, automated structural health monitoring (SHM) techniques have been developed to obtain reliable information efficiently for maintaining the structural health of buildings. Among such information, ground settlement, an important cause of certain types of structural damage to buildings, needs to be detected and assessed quickly to prevent the damage from spreading. Evaluating the level of ground settlement therefore plays a crucial role in SHM and in the damage inspection of buildings. With the rapid advancement of computer vision, several vision-based methods, mainly image processing techniques, have been applied to damage classification problems. Deep learning based on convolutional neural networks (CNNs) is regarded as an efficient approach to object identification [1,2,3]. The CNN architecture, a multi-stage (multi-layer) architecture, was first proposed by LeCun [4]. Given the robustness of deep learning models, CNNs have become the dominant machine learning method for visual object recognition. A CNN transforms raw data into representative feature spaces through different combinations of linear and non-linear operations, and these feature spaces can then be transformed further according to the goal of the task. Numerous CNN-based SHM techniques have been developed to replace time-consuming and costly traditional methods [5,6,7,8], and most of them classify only specific types of structure, such as concrete or steel [9,10,11,12,13]. In a well-designed classification system, the image properties obtained from low-level image processing operations should be highly detailed and distinctive of the class they represent; extracting them is called feature extraction [14, 15].
Many deep learning models with pre-trained weights are promising frameworks because of their high prediction accuracy in image classification and recognition, for example Xception, VGG16, InceptionV3, MobileNet, DenseNet121, NASNetMobile, and EfficientNetB0. When these CNN architectures are trained, an optimization algorithm is coupled with the loss function to update the learned network parameters, for instance SGD, AdaGrad, RMSprop, Nadam, Adam, Adamax, or Ftrl. In this study, the performance of several CNN-based models was compared in terms of accuracy on a dataset of more than one thousand images of ground settlement. The model with the highest accuracy was then trained with a number of optimization algorithms to improve the accuracy on the training, validation, and testing sets.

2 Datasets

The database contains 1200 images of ground settlement, taken with the camera of an iPhone 12 Pro Max smartphone and labeled with three categories of damage level. The images were taken approximately 1 m from the ground surface, with the camera aligned perpendicular to it. Representative images of the three classes are shown in Fig. 1. Note that the captured images were artificially overlaid with a mesh with a grid size of about 5 mm; in the settled zones the mesh lines are nonlinearly distributed, reflecting the different levels of settlement, as shown in Fig. 1.

Fig. 1 Three categories of damage level with artificially meshed grid

The images were then divided into training, validation, and testing sets. Following a common rule of thumb, 70% of the images, randomly selected from the database, were used for training, 15% for validation during training, and the remaining 15% for testing. Table 1 presents the number of images in each set.

Table 1 Number of images used for training, validation, and testing
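
The 70/15/15 split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_indices`, the fixed random seed, and the use of a plain (non-stratified) shuffle are assumptions, since the text does not state how the random selection was implemented.

```python
import random

def split_indices(n_images, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split image indices into training, validation, and testing sets.

    Hypothetical helper: the fractions follow the text, everything else
    (seed, plain shuffle without stratification) is an assumption.
    """
    order = list(range(n_images))
    random.Random(seed).shuffle(order)        # reproducible random order
    n_train = round(train_frac * n_images)    # 70% for training
    n_val = round(val_frac * n_images)        # 15% for validation
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]            # the remainder for testing
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1200)
print(len(train_idx), len(val_idx), len(test_idx))  # 840 180 180
```

For the 1200-image database this gives 840 training, 180 validation, and 180 testing images, with no image appearing in more than one set.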

3 Results and Discussions

3.1 Performance of the Models

Seven state-of-the-art pre-trained networks, namely Xception, VGG16, InceptionV3, MobileNet, DenseNet121, NASNetMobile, and EfficientNetB0, were used to classify the damage level of the ground. All networks were pre-trained on ImageNet, and the deep learning models were implemented in Python with the TensorFlow library. Each pre-trained network was trained on the training set listed in Table 1 for 50 epochs. Figure 2 presents the accuracy on the training and validation sets. Most networks converge at around 30 epochs.

Fig. 2 Accuracy on (a) training set and (b) validation set

Finally, the networks were evaluated on the testing set; the resulting accuracies are shown in Table 2. Interestingly, the VGG16 network has the lowest accuracy, 87.22%, whereas the DenseNet121 network achieves the highest, 96.11%.
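
The reported percentages can be related to image counts with simple arithmetic. The sketch below assumes a testing set of 180 images (15% of the 1200-image database); the helper name `correct_count` is hypothetical.

```python
def correct_count(accuracy_percent, n_images):
    """Number of correctly classified images implied by a reported accuracy."""
    return round(accuracy_percent / 100.0 * n_images)

N_TEST = 180  # assumption: 15% of the 1200-image database

for name, acc in [("DenseNet121", 96.11), ("VGG16", 87.22)]:
    k = correct_count(acc, N_TEST)
    print(name, k, "correct,", N_TEST - k, "misclassified")
# DenseNet121 173 correct, 7 misclassified
# VGG16 157 correct, 23 misclassified
```

Under this assumption, each test image accounts for about 0.56 percentage points of accuracy, so the gap between the best and the worst network corresponds to 16 images.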

3.2 Performance of the Optimizers

To minimize the loss of the neural networks, the backpropagation algorithm was used in this study. The algorithm calculates the derivative of the cost function with respect to the parameters of the network. After a batch has passed through the network and produced an output, the difference between that output and the known target value must be used appropriately to adjust the weights of the nodes in the network. The algorithm used in this step is called the optimization algorithm. Seven optimizers, namely SGD, AdaGrad, RMSprop, Nadam, Adam, Ftrl, and Adamax, were examined in this study. The accuracy and loss on the training and validation sets are shown in Figs. 3 and 4, respectively.
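
To make the role of the optimizer concrete, the following minimal sketch contrasts the plain SGD update with the Adam update on a one-parameter toy loss. It is not the TensorFlow implementation used in the study: the loss L(w) = (w - 3)^2, the learning rate, and the function names are assumptions chosen for illustration, and the analytic gradient stands in for the one that backpropagation would supply.

```python
import math

def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2 (stand-in for backpropagation).
    return 2.0 * (w - 3.0)

def sgd(w, steps, lr=0.1):
    # Plain gradient descent: step against the gradient with a fixed rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: running averages of the gradient (m) and squared gradient (v),
    # with bias correction, give a per-parameter adaptive step size.
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print("SGD :", sgd(0.0, 200))
print("Adam:", adam(0.0, 500))
```

Both updates drive the parameter toward the minimum at w = 3. On a real network the same rules are applied to every weight, and the optimizers differ mainly in how they scale and smooth the gradient.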

Table 2 Results of accuracy on the testing set for seven networks

Fig. 3 Accuracy on (a) training and (b) validation sets

Fig. 4 Loss on (a) training and (b) validation sets

The accuracy of all optimizers on the testing set is shown in Table 3. The DenseNet121 architecture trained with the Adam optimizer clearly achieves the highest accuracy, 96.11%, while the SGD optimizer gives the lowest accuracy, 87.22%.

Table 3 Accuracy on the testing set of seven optimizers

3.3 Performance of DenseNet121 Network

In the DenseNet architecture, each layer receives the feature maps of all preceding layers as additional inputs and passes its own feature maps to all subsequent layers, which preserves the feed-forward nature of the network. This dense connectivity alleviates the vanishing-gradient problem. The concept of the architecture is illustrated in Fig. 5.

Fig. 5 A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as inputs [16]
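
The dense connectivity pattern can be illustrated with a small bookkeeping sketch. It only tracks how many feature maps each layer receives; the function names, the four input maps, and the toy layer (which averages its inputs instead of applying convolutions) are assumptions made for illustration and do not reproduce DenseNet121 itself.

```python
def toy_layer(features, growth_rate):
    # Stand-in for a convolutional layer: produces `growth_rate` new maps,
    # here simply the mean of all maps received so far.
    mean = sum(features) / len(features)
    return [mean] * growth_rate

def dense_block(input_maps, num_layers, growth_rate, layer_fn=toy_layer):
    """Run a dense block and record how many maps each layer takes as input."""
    features = list(input_maps)
    inputs_seen = []
    for _ in range(num_layers):
        inputs_seen.append(len(features))           # all preceding maps
        features.extend(layer_fn(features, growth_rate))  # concatenate new maps
    return features, inputs_seen

# Growth rate k = 4 as in Fig. 5; 4 input maps and 5 layers are illustrative.
features, inputs_seen = dense_block([1.0] * 4, num_layers=5, growth_rate=4)
print(inputs_seen)    # [4, 8, 12, 16, 20]
print(len(features))  # 24
```

With k0 input maps and growth rate k, the layer at position l (counting from 1) receives k0 + k(l - 1) feature maps, so the number of inputs grows linearly with depth while each layer adds only k new maps.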

Figure 6 depicts the loss and accuracy versus epoch. The training loss decreases continuously, although slowly, and the model converges at around 30 epochs. Figure 7 shows the confusion matrix and the normalized confusion matrix, in which the accuracy for each class is represented by color saturation. The results show that 7 of the 180 test images were misclassified. Additionally, the receiver operating characteristic (ROC) and precision-recall curves are shown in Fig. 8.

Fig. 6 (a) Accuracy and (b) loss on the training and validation sets

Fig. 7 (a) Confusion matrix and (b) normalized confusion matrix

Fig. 8 (a) ROC curve and (b) precision-recall curve
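
The quantities summarized by a confusion matrix can be computed as in the sketch below. The 3 x 3 matrix is a made-up toy example and not the matrix of Fig. 7, whose entries are not reproduced in the text; the function name and the convention of rows as true classes and columns as predicted classes are likewise assumptions.

```python
def metrics_from_confusion(cm):
    """Overall accuracy plus per-class precision and recall.

    cm[i][j] = number of images of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision, recall = [], []
    for i in range(n):
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum
        actual_i = sum(cm[i])                          # row sum
        precision.append(cm[i][i] / predicted_i if predicted_i else 0.0)
        recall.append(cm[i][i] / actual_i if actual_i else 0.0)
    return correct / total, precision, recall

# Hypothetical toy matrix for three damage levels (NOT the study's data).
toy_cm = [[8, 2, 0],
          [1, 9, 0],
          [0, 0, 10]]
accuracy, precision, recall = metrics_from_confusion(toy_cm)
print(accuracy)  # 0.9
print(recall)    # [0.8, 0.9, 1.0]
```

The normalized confusion matrix in Fig. 7b corresponds to dividing each row by its row sum, so its diagonal entries are the per-class recall values.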

4 Conclusions

Ground settlement can lead to cracks in buildings or failure of the diaphragm wall during the excavation phase, so such settlement needs to be monitored and detected by real-time image processing. Recently, with the advancement of artificial intelligence, deep learning has been widely applied in various fields for detection and classification using image databases. In this study, the feasibility of applying deep learning to the assessment of ground damage was demonstrated. Seven network architectures, namely Xception, VGG16, InceptionV3, MobileNet, DenseNet121, NASNetMobile, and EfficientNetB0, were compared in terms of accuracy.

The pre-trained networks proved highly applicable to the classification of damage level even though they were pre-trained on a completely different dataset, because low-level features are shared between datasets. These features, learned during pre-training, transfer to other objects and give rapid convergence and high accuracy. Pre-trained networks are therefore promising for CNN-based classification with a limited number of training samples. Among the seven selected CNN architectures, DenseNet121 achieved the highest accuracy, 96.11% on the testing set. Furthermore, seven different optimizers were coupled with the DenseNet121 model, among which the Adam optimizer achieved the highest accuracy.

In future research, the image database of ground settlement should be extended with images captured under various conditions (such as lighting, camera distance, and angle to the ground surface) to improve the accuracy and robustness of the proposed method.