1 Introduction

In major infrastructure such as cable-stayed bridges, monitoring systems continuously evaluate structural integrity through various measurement sensors. The measurements include structural responses, such as girder and pylon displacements and accelerations at cables and other specific locations, and environmental factors, such as temperature, wind speed, and earthquakes. These measurement data are used to directly or indirectly evaluate the state of a structure through various methods. They are then stored and managed so that they can be used for analysis when needed.

One of the methods of directly using the measurement data is evaluating structure safety by comparing these measurement data with specific control criteria determined by various methods, such as structural analysis or load test. Accordingly, appropriate judgments and actions should be taken when a value that is not within the criteria is observed. However, no manuals or guidelines have yet been provided for that, and many studies were conducted to determine the criteria rationally (Cho et al. 2005; Chung et al. 2014; Kim and Song 2016). A number of researchers have attempted to evaluate the state of structures by directly using the measurement data. Kim (2017) assessed the soundness of a cable-stayed bridge by analyzing the correlation between temperatures and displacements. Wang et al. (2017) developed a time-dependent method for evaluating fatigue crack using limited measurement data. Meanwhile, Xia et al. (2017) suggested a damage identification method considering the temperature effect using strain data. Zhou et al. (2018) evaluated the status of a bridge using monitoring data of 2 years and a finite element analysis. Ye et al. (2020) studied the evaluation of the prestress loss behavior of prestressed concrete using fiber optic sensors. These methods of directly using the measurement data need further studies for practical application because of the restriction of available data, simulation- and experiment-based verification, limited application range, and so on.

The indirect methods for evaluating structures include damage detection methods based on a signal analysis and modal analysis methods in the frequency domain. Sohn et al. (2001) developed a method for detecting structural damage through autoregressive (AR)-based models. Langone et al. (2017) developed an integrated damage detection algorithm using two complementary damage indicators and the kernel spectral clustering algorithm. Meanwhile, Hearn and Testa (1991) verified, through an experiment, that damage can be estimated with modal parameters, such as the natural frequencies and damping ratios of structures. Cross et al. (2013) analyzed the effects of temperature, wind speed, and traffic load on natural frequencies using acceleration data and suggested a model for predicting the changes in natural frequencies considering external factors. Islam and Bagchi (2014) conducted a statistical analysis of the measurement data to evaluate structure damage. Choi et al. (2017) developed a technique for estimating the entire structure behavior using limited displacement data. In order to estimate the state of the structure indirectly, various studies have been attempted using measurement data.

Moreover, many studies on the methods of using artificial intelligence have lately been conducted in the field of structural damage detection. Abdeljaber and Avci (2016) developed a nonparametric damage detection algorithm with ambient vibration response and verified its applicability through a finite element model. Abdeljaber et al. (2017) also developed a vibration-based structural damage detection method using convolutional neural networks and experimentally verified the efficiency of the proposed method. In addition, Lin et al. (2017) developed a method for identifying damage locations using a deep convolutional neural network and validated its strengths with a numerical simulation. Padil et al. (2017) proposed a damage detection method using the modal characteristics of the structure and non-probabilistic artificial neural networks. Consequently, they verified the method with a numerical model and a laboratory test. Teng et al. (2019) proved that the damage detection accuracy can be improved by using convolutional neural networks with modal strain energy and dynamic response as inputs. Most of these AI-related studies focused on improving algorithms for learning various intended damage cases.

Structural responses are determined by the mechanical relations of a structural system and the external loads. Although different responses can be obtained depending on the applied loads, the responses would be in a certain possible range when the structure constantly maintains its condition, and the loads are in a normal range. Assuming that the structure responses in a constant state have a certain pattern, the measured data pattern can be regarded as representing the structural system state. A change in the structural state (e.g., a partial defect of a structural element) can affect some responses related to it. Changed responses can lead to a change in the other responses and can be represented in the pattern change of a certain response of the structure. When the pattern change of the structural response is identified, it can be suspected that some structural systems have been changed, even though the significance of the amount of pattern variation is not exactly known.

Pattern changes can occur not only temporarily from accidental events, but also gradually over long periods of time. If the pattern change caused by an accident is sustained, the structural system can be regarded as changed by that event. Long-term accumulated data represent the historical information of structural behaviors, rather than only the quantitative data increase, and can be used to estimate the gradual degradation of the structural performance over a life cycle. If the tendency of the structural degradation can be determined, then the structural life can efficiently be managed by establishing appropriate maintenance strategies in a long-term perspective. Analyzing the data pattern change considering the mechanical relation between the data and the structure makes it possible to estimate the change of the structural system causing the pattern change.

To monitor the variation of the measurement data pattern, the pattern must be represented numerically, and the difference between patterns must be quantified. This study analyzes long-term accumulated measurement data through a pattern analysis and proposes a method for evaluating the pattern variation and quantifying it numerically. An autoregressive model is used to numerically represent the pattern of the time series data, and a reference pattern model is constructed from representative data. All data are then applied to the reference pattern model to simulate data that reflect the reference data pattern. The pattern differences are computed by comparing the simulated data with the actual data. Measurement data from a cable-stayed bridge, accumulated over approximately 3 years, are analyzed using the proposed method.

2 Methodology

2.1 Autoregressive Model

The AR model is based on the assumption that the value of the data at a given time depends on its past values. As shown in Eq. (1), the data at time t is represented by a linear combination of previous data and an error term.

$$y\left( t \right) = \phi_{1} y\left( {t - 1} \right) + \cdots + \phi_{p} y\left( {t - p} \right) + e_{t} \tag{1}$$

where y(t) is the time series value at time t; p and \(\phi_{i}\) represent the order and parameters of the AR model, respectively; and \(e_{t}\) is white noise.

The analysis of the time series data using the AR model has been widely conducted to date based on Yule’s research (1927) on predicting the annual changes in sunspots (Figueiredo et al. 2010; Yao and Pakzad 2012; Guidorzi et al. 2014; Datteo et al. 2017; Kaloop et al. 2019).
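As an illustration of Eq. (1), the following is a minimal sketch of estimating the AR parameters by ordinary least squares. The function name `fit_ar`, the synthetic AR(2) series, and the use of NumPy are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate AR(p) parameters phi_1..phi_p of Eq. (1) by least squares.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Row for time t holds the lagged values [y(t-1), ..., y(t-p)].
    X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    target = y[p:]
    phi, *_ = np.linalg.lstsq(X, target, rcond=None)
    return phi

# Synthetic, stationary AR(2) series used only to exercise the function.
rng = np.random.default_rng(0)
true_phi = np.array([0.6, -0.3])
y = np.zeros(5000)
for t in range(2, len(y)):
    y[t] = true_phi[0] * y[t - 1] + true_phi[1] * y[t - 2] + rng.normal(scale=0.1)

phi_hat = fit_ar(y, 2)
print(phi_hat)  # expected to be close to true_phi
```

With enough samples the estimated parameters approach the values used to generate the series, which is the sense in which the AR parameters encode the pattern of the data.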

2.2 Autoregressive with Exogenous Input Model

The structural response data are influenced by various external factors, but the AR model cannot account for these effects. Among the AR-based models, the autoregressive with exogenous input (ARX) model represents the relationship between the input and the output in the time domain, as shown in Eq. (2):

$$y\left( t \right) + a_{1} y\left( {t - 1} \right) + \cdots + a_{{n_{a} }} y\left( {t - n_{a} } \right) = b_{1} u\left( {t - 1} \right) + \cdots + b_{{n_{b} }} u\left( {t - n_{b} } \right) + e\left( t \right) \tag{2}$$

where, u(t) and y(t) are the input and output data at time t; and \(n_{a}\), \(n_{b}\), and \(a_{i}\), \(b_{i}\) represent the orders and the parameters of the output and input data, respectively.

The time series pattern can be defined by the ARX model considering the input data. Moreover, the pattern difference with the other data can be checked using this pattern model. In this study, the ARX model was used to analyze the pattern change of the measured data from the cable-stayed bridge.
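The ARX relation in Eq. (2) can be rearranged as y(t) = −a_1 y(t−1) − … + b_1 u(t−1) + … + e(t), which is linear in the parameters and can therefore be fitted by least squares. The sketch below shows one way to do this; the function names (`arx_regressors`, `fit_arx`, `predict_arx`) and the synthetic data are hypothetical. It also assumes that the model output is a one-step-ahead prediction that uses measured past outputs, which may differ from the exact simulation procedure used in the study.

```python
import numpy as np

def arx_regressors(u, y, na, nb):
    """Build the regression matrix and target for the ARX model of Eq. (2)."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    n0 = max(na, nb)  # first time index with all lags available
    cols = [-y[n0 - k : len(y) - k] for k in range(1, na + 1)]   # -y(t-k)
    cols += [u[n0 - k : len(u) - k] for k in range(1, nb + 1)]   #  u(t-k)
    return np.column_stack(cols), y[n0:]

def fit_arx(u, y, na, nb):
    """Return (a, b) minimizing the squared one-step prediction error."""
    X, target = arx_regressors(u, y, na, nb)
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta[:na], theta[na:]

def predict_arx(a, b, u, y):
    """One-step-ahead output of a fitted model applied to any (u, y) pair."""
    X, _ = arx_regressors(u, y, len(a), len(b))
    return X @ np.concatenate([a, b])

# Noise-free synthetic system, used only to check the parameter recovery.
rng = np.random.default_rng(0)
u = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.3 * u[t - 1] + 0.1 * u[t - 2]

a, b = fit_arx(u, y, na=2, nb=2)
y_hat = predict_arx(a, b, u, y)
```

Note the sign convention: because the output terms sit on the left-hand side of Eq. (2), a system generated with +0.5 y(t−1) corresponds to a_1 = −0.5.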

2.3 Measured Data

The data used were the temperature and the vertical displacement at the center of the girder, hereinafter referred to as the girder vertical displacement (GVD) data, and the temperature and lateral displacement at the top of both pylons of an operating cable-stayed bridge. At the left pylon, the displacement in the longitudinal direction was used, hereinafter referred to as the left pylon longitudinal displacement (LPL) data. At the right pylon, the transversal displacement was used, hereinafter referred to as the right pylon transversal displacement (RPT) data. The 1 h-averaged data were used for the pattern analysis, with the temperature data as the input and the displacement data as the output. As shown in Fig. 1, the data measured over approximately 3 years, from January 2016 to November 2018, include a number of outliers and missing values. However, no preprocessing was applied to correct these data because artificial corrections may affect the pattern analysis.

Fig. 1 Measured data used

Table 1 presents the statistical features of the data (i.e., mean, standard deviation (STD), skewness, and kurtosis) together with the number of missing data. The mean value of the LPL data was approximately −13.79 mm, indicating that the pylon was slightly inclined toward the abutment. Meanwhile, the mean value of the RPT data was approximately 1.991 mm. The standard deviation was larger in the LPL and GVD data than in the RPT data. The GVD data had a skewness value of 1.273, showing a slightly skewed distribution compared with the lateral displacements of both pylons. This skewness can be attributed to the vehicle loads continuously applied on the girder. The LPL data had a kurtosis of approximately 2.127, which is lower than 3 (i.e., the kurtosis of a normal distribution), whereas the RPT and GVD data had large values of 9.596 and 32.984, respectively. Data that are more outlier-prone than the normal distribution show a kurtosis value larger than 3. Figure 1 shows that the RPT and GVD data had many outliers, and these excessive outliers inflated the kurtosis value. The mean and the standard deviation of the temperature data at the girder were slightly larger than those of both pylons. Moreover, the displacement data contained more missing values than the temperature data.

Table 1 Statistical features of the used data
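The statistical features discussed above can be computed as in the following sketch. It uses the non-excess (Pearson) kurtosis, for which a normal distribution gives 3, consistent with the text. The function name `describe`, the population (divide-by-n) moments, and the representation of missing samples as `None` or NaN are assumptions for illustration.

```python
import math

def describe(values):
    """Mean, STD, skewness, kurtosis, and missing count of a series.

    Missing samples are assumed to be None or NaN and are skipped.
    Population (divide-by-n) central moments are used.
    """
    data = [v for v in values if v is not None and not math.isnan(v)]
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # equals 3 for a normal distribution
        "missing": len(values) - n,
    }

stats = describe([1.0, 2.0, 3.0, 4.0, 5.0, float("nan")])
```

A few large outliers raise the fourth moment much faster than the second, which is why outlier-rich series show kurtosis values far above 3.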

2.4 Pattern Analysis Scheme

The pattern analysis of the measured data was conducted as shown in Fig. 2. The data were divided into subsets with a certain length to determine the pattern variation progress. Subsequently, the pattern differences of each set were analyzed. Therefore, the continuous input and output data of the same time span, which did not include the missing values, were set to be one dataset. The dataset length should be appropriately determined considering the data characteristics. The length of one dataset herein was set to 30 days.

Fig. 2 Pattern analysis procedure

Prior to the pattern analysis, all the input and output data were divided into datasets with a constant length and the same measurement time, excluding the ranges with missing values. The data of both pylons and the girder consisted of 27 and 26 pairs of datasets, respectively. The shaded area in Fig. 3 represents the ranges unsuitable for composing a dataset because they contained missing values and thus could not provide 30 continuous days of data.

Fig. 3 Dataset configuration
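The dataset configuration described above can be sketched as a simple scan over the hourly series. The greedy, non-overlapping windowing below is an assumption, since the exact alignment of the datasets is given only graphically in Fig. 3; the function name `split_into_datasets` and the representation of missing samples as `None` or NaN are likewise hypothetical.

```python
import math

def split_into_datasets(u, y, length=24 * 30):
    """Return (start, end) index pairs of gap-free windows of `length` samples.

    A window is accepted only if neither the input u nor the output y has a
    missing sample inside it; a gap restarts the window.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    windows, start = [], 0
    for i in range(len(y)):
        if missing(u[i]) or missing(y[i]):
            start = i + 1                    # restart after the gap
        elif i + 1 - start == length:
            windows.append((start, i + 1))   # end index is exclusive
            start = i + 1
    return windows

# Tiny example with a window of 3 samples and one missing output value.
u = [1.0] * 10
y = [1.0] * 10
y[3] = float("nan")
windows = split_into_datasets(u, y, length=3)
print(windows)  # [(0, 3), (4, 7), (7, 10)]
```

For 1 h-averaged data, the default `length` of 720 samples corresponds to the 30-day dataset length used in this study.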

To analyze the pattern change, the reference pattern against which differences are compared must first be defined. The reference data can be selected from the entire data with an appropriate method, or the initial data can be used. This study uses the first dataset as the reference data. The ARX model was employed to formulate the input and output relation of the reference dataset (i.e., the pattern). The ARX model orders for the pattern model construction should be rationally determined considering the data characteristics. Once the ARX model orders are determined, the ARX model can be constructed by calculating the parameters that best match the reference data. Various optimization methods can be used for calculating the ARX parameters. Among them, the least squares method was used in this study.

The constructed ARX model exhibited a relationship between the input and the output of the reference dataset. Thus, the virtual output values reflecting the reference data pattern can be simulated by applying the input and the output of the other datasets to the reference pattern model. If the applied dataset pattern is similar to the reference dataset, the resulting output will match the actual value well; otherwise, it will show some differences. Therefore, the difference between the values of the simulated output data from the pattern model of the reference dataset and its actual values meant the difference of the patterns between the data and the reference data. These differences can be determined through various discriminant indexes. The normalized root mean square error (NRMSE) fitness value in Eq. (3) was used to quantify the differences.

$${\text{NRMSE}} = 100\left( {1 - \frac{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }}} \right) \tag{3}$$

where \(y_{i}\) is the measured output data; \(\hat{y}_{i}\) is the output data simulated through the ARX model; and \(\bar{y}\) is the mean of the measured output dataset. The input and the output of all the data were applied to the ARX model in Eq. (2). Subsequently, \(\hat{y}_{i}\), which reflects the reference data pattern, was simulated step by step. The difference between the measured (\(y_{i}\)) and simulated (\(\hat{y}_{i}\)) data for each dataset was calculated using Eq. (3).
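Eq. (3) can be written directly as a short function. In this sketch the function name `nrmse_fitness` is an assumption; the formula itself follows Eq. (3), so a perfect match gives 100 and a prediction no better than the mean of the measured data gives 0.

```python
import math

def nrmse_fitness(y, y_hat):
    """NRMSE fitness value of Eq. (3), in percent."""
    mean_y = sum(y) / len(y)
    # Numerator: norm of the error between measured and simulated data.
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    # Denominator: norm of the measured data about its own mean.
    den = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    return 100.0 * (1.0 - num / den)

perfect = nrmse_fitness([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])     # 100.0
mean_only = nrmse_fitness([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # 0.0
```

The value can become negative when the simulated data fit worse than the mean, so it should be read as a fitness score rather than as an error bounded between 0 and 100.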

To determine the ARX model orders, the model accuracy according to the ARX order was analyzed using the NRMSE value with the reference data of the left pylon displacement. The ARX parameters for each ARX order were derived using the least squares method. Figure 4 plots the difference between the actual data and the ARX model-reflected data. The ARX model accuracy gradually increased with the ARX order and appeared to converge near an order of 25. Because the data used herein are 1 h-averaged data, both the input and output orders of the ARX model were set to 24 considering the periodicity of 24 h per day.

Fig. 4 Model accuracy according to the ARX order
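The order study can be sketched as a sweep over candidate orders, fitting an ARX model for each and recording the NRMSE fitness on the reference data. Everything below is illustrative: the synthetic temperature-like input with a 24 h cycle, the first-order synthetic response, the function name `arx_fit_nrmse`, and the choice to evaluate all orders over the same samples are assumptions, not the bridge data or the exact procedure of this study.

```python
import numpy as np

def arx_fit_nrmse(u, y, order, start):
    """Fit ARX(order, order) by least squares and return its NRMSE fitness.

    `start` is the first predicted sample; keeping it fixed makes the
    fitness values of different orders comparable over the same samples.
    """
    cols = [-y[start - k : len(y) - k] for k in range(1, order + 1)]
    cols += [u[start - k : len(u) - k] for k in range(1, order + 1)]
    X, target = np.column_stack(cols), y[start:]
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ theta
    return 100.0 * (1.0 - np.linalg.norm(resid) / np.linalg.norm(target - target.mean()))

# Synthetic 30-day hourly record with a daily cycle in the input.
rng = np.random.default_rng(1)
hours = np.arange(24 * 30)
u = 10.0 + 5.0 * np.sin(2 * np.pi * hours / 24) + rng.normal(scale=0.5, size=hours.size)
y = np.zeros(hours.size)
for t in range(1, hours.size):
    y[t] = 0.8 * y[t - 1] + 0.3 * u[t - 1] + rng.normal(scale=0.2)

max_order = 30
fitness = {n: arx_fit_nrmse(u, y, n, max_order) for n in range(1, max_order + 1)}
```

Because each higher order adds regressors to the same least squares problem, the in-sample fitness cannot decrease with the order; the useful information is where the curve flattens, which is how the convergence in Fig. 4 is read.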

3 Measured Data Pattern

The first dataset was used as the reference data for comparing the whole data patterns. The pattern of the input and output relations over time of the reference dataset was defined through the ARX model. The input and output orders of the ARX model were applied as 24, indicating that 24 input and output data were used to estimate one output data value. The ARX parameters that best fit the reference dataset pattern were determined by the least square method. Figure 5 shows the ARX model parameters of each reference dataset. In the output parameters, the first parameter had the largest value in all models, implying that the pattern of the measured displacement data was most affected by the data just before. No tendency of decrease was observed in the parameters even if the ARX orders increased. Thus, it is considered that the influence of all data was properly reflected in defining the pattern models.

Fig. 5 Input and output parameters of the ARX order

Table 2 presents the statistical features of each ARX model. The maximum and mean values are those of the absolute parameter values. The pattern contribution value (PCV) was defined to check the influence of the input and the output in the ARX model construction. The data and the ARX parameters had different scales of variation; thus, these differences were accounted for using the standard deviation of each. Let the standard deviation of the data be denoted as \(STD_{d}\) and that of the ARX parameters as \(STD_{p}\). The PCV is then expressed in Eq. (4) as

$${\text{PCV}} = STD_{d} \times STD_{p} \tag{4}$$

Table 2 Summary of the ARX parameters

The PCV represents the effectiveness of the ARX parameter considering the data variation. Let the ratio of \(PCV_{o}\) in the output parameters over \(PCV_{i}\) in the input parameters be defined as the PCV ratio that represents the ratio of the effectiveness of the output and input data used for pattern definition. A larger value of the PCV ratio meant a lower influence of the input data for defining the output data pattern. This result implies that the LPL data pattern was dependent on its own past value compared to the temperature, while the GVD data had a temperature-dependent pattern. The input and output influences were evenly reflected in defining the RPT data pattern.
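The PCV of Eq. (4) and the PCV ratio can be computed as in the sketch below. The function names `pcv` and `pcv_ratio`, the use of the sample standard deviation, and the use of raw (signed) parameter values are assumptions, since the text does not specify these details.

```python
from statistics import stdev

def pcv(data, params):
    """Pattern contribution value of Eq. (4): STD of data times STD of parameters."""
    return stdev(data) * stdev(params)

def pcv_ratio(y, a, u, b):
    """Ratio of the output PCV (y, a) over the input PCV (u, b)."""
    return pcv(y, a) / pcv(u, b)

# Small numeric example: output PCV = 2.0, input PCV = 1.0.
ratio = pcv_ratio(y=[0.0, 2.0], a=[0.0, 2.0], u=[0.0, 2.0], b=[0.0, 1.0])
```

A ratio above 1 indicates that the output terms contribute more than the input terms to the pattern definition, which is the reading applied to the LPL data in the text.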

Figure 6 shows the output values of the actual and simulated data from the reference ARX model. The pattern of all datasets was properly defined because the simulated data represented the actual data well. Table 3 briefly summarizes the difference between the actual and simulated data. The maximum and mean values were the absolute error values. Compared with the results of both pylons, the maximum, mean, and standard deviation values of the difference between the measured and simulated GVD data had larger values. The large variation of the GVD data was considered to cause these results. In order to exclude such differences, the results were divided by each standard deviation (Table 4).

Fig. 6 Measured and simulated data of the reference dataset

Table 3 Validation results of the defined pattern model
Table 4 Normalized validation results of the defined pattern model

The highest NRMSE value in the normalized results was 91.91 for the LPL data. Although the influence of the input data on defining the LPL data pattern was low, an accurate pattern model was derived. From this result, the longitudinal displacement of the pylon can be regarded as having a pattern that depends on its own past values. The NRMSE values of the RPT and GVD data were 73.25 and 78.24, respectively. The pattern model of the RPT data, which weighted the influences of the input and the output relatively evenly, showed a poorer fit than that of the GVD data, which had an input-dependent pattern model.

4 Pattern Analysis Results

In the pattern analysis of the measured data, the variation of the pattern differences was examined based on the first dataset pattern. The reference pattern model of each data was used to derive the simulation data throughout the entire dataset. The simulated data were compared with the actual values to determine the difference in the patterns. Figure 7 shows the measured and simulated data for the entire dataset. The simulated data represented the measured data well in all datasets, except for that containing outliers.

Fig. 7 Measured data and simulated data in all datasets

Figure 8 depicts the NRMSE values for each dataset. The NRMSE values of the datasets presumed to be in the normal condition fluctuated between 85 and 90 for the LPL data, between 65 and 75 for the RPT data, and between 70 and 80 for the GVD data. Transient differences in the NRMSE values were found in dataset no. 22 of the LPL data, dataset nos. 14 and 23 of the RPT data, and dataset nos. 4, 19, and 21 of the GVD data. Figure 9 shows the measured and simulated data of the datasets with these temporary differences. In all datasets containing outliers, the errors between the measured and simulated data increased rapidly after the outliers occurred. These sudden errors took time to settle, and the errors accumulated during this settling time led to noticeable differences.

Fig. 8 NRMSE variation in all datasets

Fig. 9 Validation of datasets with outliers

An analysis of the temperature and displacement data patterns of both pylons and the girder showed no meaningful change or gradual tendency in the pattern for approximately 3 years. That is, no change in the structural system was observed for this measured period.

5 Conclusion

This study investigated a methodology for analyzing the pattern variation of measured data. The pattern variation identification can be used in the structural maintenance field in various manners through further studies on its application, although the pattern of the structural response data is not directly related to the structure performance. Assuming that the structural performance degradation induces a pattern change in structural responses, the structural damage occurrence or structural performance deterioration can be evaluated in a long-term point of view through a pattern analysis of the measured data. The measured data of the cable-stayed bridge for approximately 3 years were used herein to examine the data pattern. The pattern variation of the displacement data was analyzed with the temperature data as an input. The following conclusions are drawn:

(1) Because no significant pattern change was observed, it is considered that no structural performance degradation occurred during the measured period of the data used herein.

(2) For the datasets containing outliers, the error of the ARX model increased rapidly and required some settling time to decrease, so that a marked pattern difference appeared.

(3) For data that show a pattern difference from the other data even though no outliers are included, the event that caused the change or the sensor inspection history at that time should be checked.

(4) The methodology used can be applied to establish a long-term monitoring strategy for the structural behavior and to support decision making relevant to structural maintenance.