1 Introduction

To ensure the safety and operational efficiency of high-speed trains, railway bridges must maintain sufficient stiffness and high natural frequencies. Furthermore, they must withstand seismic forces without damage during low-intensity earthquakes and ensure the safety of passengers within train carriages during high-intensity seismic events. With the extensive laying of seamless lines, the ballastless track provides some longitudinal restraint for bridges, which improves bridge integrity and ties neighboring simply supported girder spans together as a connected structure [1].

China Railway Track System Type II (CRTSII) is a typical structural form of ballastless track [2]. It was specifically developed to address the demands of railroad bridges, particularly those that span considerable distances. CRTSII has been applied on several long railway lines, including the Beijing-Tianjin, Shanghai-Hangzhou, and Beijing-Shanghai routes [3,4,5,6].

China's high-speed railways have expanded dramatically over the last several years [7, 8]. The influence of the track structure on the bridge's seismic resistance cannot be ignored, because the track carries part of the ground vibration transmitted by the foundation during an earthquake. Ensuring train running safety is therefore of both theoretical importance and practical engineering value [4,5,6, 9,10,11].

Several studies have assessed the seismic performance of simply supported girder bridges with CRTSII track slabs [12, 13], finding that stresses in the rails, track plates, and base plates were present close to the abutments or anchorages of the bridge. Montenegro et al. [14] proposed an analytical method for nonlinear train-bridge interaction and assessed running safety using safety criteria from the literature. Zhao et al. [4,5,6] proposed the velocity-related SI index for the first time, addressing the limitations of the conventional train running safety indices. Su et al. [15] used spectrally similar short- and long-period ground motions to analyze how ground motion duration affects the seismic behavior of reinforced concrete (RC) bridge piers. To evaluate the seismic performance of the track structure from a stochastic perspective, Li et al. [16] evaluated the stochastic seismic response of a high-speed railroad's coupled rail-bridge system using a probability density evolution technique.

In the realm of seismic testing methods, shaking table testing is critical for precisely simulating earthquake vibration and studying structural behavior in a laboratory setting [17]. Jiang et al. [18] conducted experiments on continuous girder bridges for high-speed railroads to analyze the structural damage state and investigate the effects of various seismic levels and installation orientations on structural seismic response. Wang et al. [19] examined liquefaction's impact on seismic safety and bridge collapse dynamics using shake-table tests on pile-group supported bridges in liquefiable and non-liquefiable sands. Yang et al. [20] studied the effects of a collision on the lateral seismic response of a bridge model and the damping effect of rubber buffers through a series of shaking table experiments on a 1/6 scale bridge model.

Over the past few decades, Fiber Bragg Grating (FBG) sensors have gained prominence in assessing the structural condition of existing infrastructure. Zhang et al. [21] employed quasi-distributed optical fiber technology to investigate the force characteristics, ductility performance, and damage mechanisms of the ballastless CRTS II plate in shear failure scenarios. Chan et al. [22] provided an extensive conceptual study of fiber grating sensors in current infrastructures. Wang et al. [23,24,25] employed strain transfer analysis to assess the utility of fiber optic sensing technology for in-situ monitoring of the structural integrity of single and multi-layer asphalt pavements. To monitor the deterioration response of asymmetrical concrete-reinforced cracking structures exposed to increasing seismic stress, Zhang et al. [26, 27] utilized fiber-optic grating sensors. In the context of bridge engineering, Lu et al. [28] investigated the dynamic and stationary pressure division technique for large-scale strain gauges on substantial-span rigid bridges under vehicle loading using an externally attached fiber-optic grating strain transmitter. Zhao et al. [29] built a train-bridge dynamics model and investigated the influence of temperature deformation of the main beam on beam deflection generated by the train.

Based on existing fiber grating detection examples in engineering, this paper employs fiber grating Wavelength Division Multiplexing (WDM) technology [22] to realize a quasi-distributed fiber grating sensing system by connecting multiple FBG sensors in series on a single fiber [30,31,32] and attaching the series fiber grating to the scaled CRTS II track model to achieve long-range multi-point acquisition. The advent of advanced technologies such as big data, machine learning, and artificial intelligence has heralded novel concepts and methodologies in seismic mitigation theories and technologies for bridge structures.

Artificial neural networks (ANNs) have been widely used in recent years to predict structural seismic responses [33,34,35], damage state [36, 37], and failure mode [38], as well as to evaluate structural seismic performance [39] and damage state [40], owing to their superior nonlinear function modeling capability [41]. Wang et al. [31, 32] investigated machine learning (ML) methodologies for accurately estimating the bearing deformation and column drift ratio responses of bridges, particularly those supported by extended pile shafts. To forecast the time series of seismic responses of ground structures, one-dimensional convolutional neural networks (1D-CNN) and long short-term memory (LSTM) neural networks were built on extensive research into artificial neural networks [27, 35, 42]. Several recent studies have delved into the potential of deep learning in this domain. For instance, Bilal et al. [43] developed an early warning system for earthquake prediction from seismic data using a batch-normalized graph convolutional neural network with an attention mechanism, which can successfully predict the depth and magnitude of an earthquake event at any number of seismic stations in any number of locations.

Similarly, Zhang et al. [21] and Zhao et al. [44] utilized recurrent neural networks (RNNs), particularly LSTM networks, to model temporal sequences of seismic data, showcasing their efficacy in real-time earthquake detection. Furthermore, hybrid models that combine the strengths of multiple deep-learning architectures have gained traction.

In light of these advancements, this research aims to further the understanding of seismic response prediction through deep learning techniques. To increase the accuracy of seismic response prediction, it proposes a hybrid approach that combines the features of a CNN and an LSTM network. Quasi-distributed fiber gratings gather the strains of simply supported girder bridges under seismic excitation, and a continuous feature map of the observed grating locations, seismic orientations, and peak accelerations serves as the input. Leveraging the deep learning algorithm, the model predicts the strain across the span, showing the potential for improved accuracy in seismic response predictions.

However, it's essential to acknowledge the potential limitations of our proposed method. While our model has demonstrated superior performance in controlled experiments, its computational effectiveness in real-life scenarios, especially for high-speed railway bridges, warrants further exploration. Deep learning models, particularly hybrid ones, can be computationally intensive. When applied to complex structures like high-speed railway bridges with vast amounts of data, the computational time might increase, potentially affecting real-time applications. Moreover, the model's accuracy could be influenced by the quality and quantity of the data available. In real-world scenarios, data might be noisy, incomplete, or not as diverse as in controlled experiments, which could impact the model's predictions.

Nevertheless, the promising results from our research provide a strong foundation for future studies. Further optimizations and refinements, both in terms of the model architecture and data processing, could address these limitations, making the method more suitable for real-life applications.

2 Data gathering and processing

A scaled simply supported girder bridge was constructed on a shaking table testing system [11], and quasi-distributed fiber-optic gratings were installed on the track plates at the mid-span section of the scaled bridge to measure the strain response in various directions along the same line.

2.1 Seismic table experimental setup

In this study, we use the multi-span simply supported girders of the CRTSII plate ballastless track system as the research object, create a scaled-down model of the bridge with a similarity ratio of 1:10, and build a bridge operation test platform with four rows of shaking tables.

The prototype of the scaled-down model is a Chinese high-speed railway simply supported box girder bridge [45]. The piers are round-end solid piers with heights varying from 3 to 20 m, and the girders are simply supported concrete box girders with an overall length of 32.5 m. Piers with a height under 14 m have a constant section, whereas piers with a height over 14 m have a variable section with a slope of 1:45. The anti-fall girder mechanism has a trigger spacing of 20 cm. The basin rubber bearings have maximum vertical and horizontal bearing capacities of 5000 kN and 1000 kN, respectively. Polytetrafluoroethylene (PTFE) plates with a low coefficient of friction may be employed to allow relative movement between the top plate and the bottom basin. Under three-dimensional stress, the rubber behaves like a fluid, which allows the main beam to rotate. Seals keep the rubber from deteriorating due to exposure to air.

The track is a ballastless slab-type track system known as CRTSII. To reduce the temperature stress on the track structure, a sliding layer is inserted between the box girder and the base plate. As a buffer layer between the filler materials, a layer of CA mortar is installed between the base plate and the track plate. Fasteners hold the rail to the track plate; transverse blocks are installed on both sides of the base plate and the track plate to limit their lateral movement; shear reinforcement is installed between the base plate and the track plate at the ends of the girder joints; and shear grooves are installed on the surface of the box girder above the fixed supports to limit the movement of the base plate. Fasteners and blocks are spaced at 0.65 m and 6.5 m, respectively.

The test used two sine wave seismic excitations, whose characteristics are reported in Table 1. The excitation frequency is 10 Hz, and the peak accelerations are 0.1 g and 0.2 g. Figure 1 depicts the model installation. The scaled-down bridge is a steel bridge at a 1:10 scale with a total of 11 spans, each 3.25 m long. Table 2 shows the model similarity coefficients. Steel plates were used for fasteners; rails, track plates, base plates, girders, and piers were made based on equivalent bending stiffness; shear bars, shear gears, lateral blocks, and bearings were tested with specimens of different sizes based on the principle of equivalent effectiveness and displacement, and the most suitable size was selected based on the experimental results. One 4 m by 4 m six-degree-of-freedom fixed table and three 4 m by 4 m six-degree-of-freedom movable tables make up the shaking table testing system. There is an adjustable separation of 625 m between the tables in the array.

Table 1 Parameters of sine wave

Fig. 1 Device model diagram

Table 2 Scaled-down model similarity coefficients [46]

2.2 Data capture device with a fiber grating

An optical fiber with seven grating points is bonded with epoxy resin to the track at the middle portion of the bridge span, arranged as illustrated in Fig. 2. To guarantee that the grating points were uniformly distributed on the monitored structure, the fiber was pasted on the track plate so that the fourth grating point coincides with the midpoint of the bridge span. Figures 2 and 3 show the FBG arrangement and the data acquisition schematic, respectively.

Fig. 2 FBG arrangement diagram

Fig. 3 Schematic diagram of data acquisition

3 Model architecture

The CNN-LSTM model combines two models: CNN and LSTM. The CNN first extracts feature vectors, which are then arranged into a time-series sequence and used as input data for the LSTM network. The LSTM network is then used to forecast responses. The CNN extracts spatial information from the response data at each time point, primarily via convolution and pooling operations. After feature extraction, the original response sequence is transformed into a deep feature time series, on which the LSTM model is trained. The whole procedure may be broken down into two stages: data pre-processing and model training.

3.1 Introduction to CNN models

Processing image data with a fully connected ANN is inefficient because there are too many inputs and training parameters. CNNs were created to work around these constraints when analyzing image-like data. The CNN is a genuinely multilayer neural network technique with high scalability in network depth. By using convolution and pooling, the number of network parameters is reduced while the deep characteristics of multidimensional data are preserved. As a result, CNNs are frequently utilized in image recognition and computer vision. The CNN architecture comprises convolutional layers, pooling layers, and fully connected layers. Brief descriptions of these layers are provided below.

(1) Convolution operation [47].

A crucial phase in the CNN operation is the convolution. The convolutional layer computes its output from its input through the convolution operation followed by a nonlinear activation function, as shown below.

$$y_{i} = \sigma \left( {k_{i} * x + b_{i} } \right), \quad i = 1,2, \ldots ,K$$
(1)

where \(x\) is the convolutional layer's input with width \(W_{1}\), height \(H_{1}\), and depth \(D_{1}\). The convolution operation is denoted by the operator \(*\). \(k_{i}\) represents the \(i\)th trainable convolutional filter, whose dimensions are \(F \times F \times D_{1}\) (width, height, and depth, respectively). \(b_{i}\) denotes the bias of the \(i\)th convolution filter \(k_{i}\), \(\sigma\) stands for the nonlinear activation function, and \(y_{i}\) represents the output matrix of the \(i\)th convolution filter. \(K\) convolution filters are used in each layer. Figure 4 shows the convolution procedure with a stride of 1. Each output matrix \(y_{i}\) has width \(W_{2}\) and height \(H_{2}\), and stacking the \(K\) output matrices gives the layer output a depth of \(K\). Each element of the \(i\)th output matrix is computed as the dot product between the \(i\)th convolution filter \(k_{i}\) and the corresponding local region of the input matrix. The output width and height follow from the stride \(S\) and the zero padding \(P\):

$$\left\{ \begin{gathered} W_{2} = \frac{{W_{1} - F + 2P}}{S} + 1 \hfill \\ H_{2} = \frac{{H_{1} - F + 2P}}{S} + 1 \hfill \\ \end{gathered} \right.$$
(2)

Fig. 4 Convolutional operation of convolutional neural network

When the stride is \(S = 1\), setting the number of zeros padded on each side to \(P = \left( {F - 1} \right)/2\) ensures that the input and output have the same spatial size. In Fig. 4, the convolution process inserts a layer of zeros (gray) around each side of the original input matrix (purple). Equation (2) then gives the width \(W_{2}\) and height \(H_{2}\) of the output.
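As a quick check of Eq. (2), the minimal Python sketch below computes the output size of a convolutional layer and confirms that \(P = (F - 1)/2\) preserves the input size when \(S = 1\). The function name `conv_output_size`, the floor rounding for strides that do not divide evenly, and the example sizes are illustrative assumptions, not details of the implementation used in this study.

```python
def conv_output_size(w_in: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output width (or height) from Eq. (2): (W1 - F + 2P) / S + 1.

    Floor division is assumed when the stride does not divide evenly.
    """
    if s < 1:
        raise ValueError("stride must be at least 1")
    return (w_in - f + 2 * p) // s + 1


# Illustrative sizes (assumed): a 7-wide input and a 3-wide filter.
same = conv_output_size(7, f=3, p=(3 - 1) // 2, s=1)  # padding P = (F - 1) / 2
valid = conv_output_size(7, f=3, p=0, s=1)            # no padding
strided = conv_output_size(7, f=3, p=1, s=2)          # stride of 2
print(same, valid, strided)
```

With these toy sizes, the padded stride-1 case returns the input width unchanged, while removing the padding or increasing the stride shrinks the output.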

(2) Pooling and fully connected layers.

The pooling operation is a down-sampling method that reduces each sampling window of the input to a single element (such as the maximum value, the average value, or the L2 norm). The down-sampling procedure of the pooling layer is presented in Fig. 5. The output matrix has the same depth as the input matrix, since the maximum value in the pooling window is taken separately on each slice along the input depth dimension. Additionally, the stride is often the same as the width or height of the pooling window, and the width and height of the output matrix may be determined in the same way as for the convolutional layer. Pooling reduces the spatial size of the representation while keeping deep characteristics. It therefore reduces the number of parameters in the convolutional neural network, lowering the computing cost of the model, helping to prevent overfitting, and improving the CNN's generalization capabilities. The fully connected layer, as the name indicates, connects every neuron in one layer to every neuron in the next.

Fig. 5 Pooling operation of convolutional neural network
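To make the down-sampling concrete, the following sketch applies max pooling to a single depth slice using plain Python lists. It assumes a 2 × 2 window with a stride of 2 and no padding; the function name `max_pool_2d` and the toy matrix are hypothetical and only illustrate the operation described above.

```python
def max_pool_2d(matrix, window=2, stride=2):
    """Max pooling over one depth slice given as a list of rows (no padding)."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - window + 1, stride):
        pooled_row = []
        for c in range(0, cols - window + 1, stride):
            # Collect the values inside the current pooling window.
            patch = [matrix[r + i][c + j]
                     for i in range(window) for j in range(window)]
            pooled_row.append(max(patch))
        pooled.append(pooled_row)
    return pooled


# Toy 4 x 4 slice (placeholder values).
depth_slice = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool_2d(depth_slice)
print(pooled)  # the 4 x 4 slice is reduced to 2 x 2
```

Applying the same function to every depth slice independently keeps the depth unchanged while halving the width and height.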

3.2 Introduction to LSTM models

The LSTM network is a modified recurrent neural network (RNN), enhanced by the addition of a forget gate. The upgraded LSTM network mitigates the vanishing gradient problem in model training and can learn both long-term and short-term dependencies in time series. Figure 6 depicts the network's core units.

Fig. 6 Basic units of LSTM

The LSTM network's fundamental unit consists of a forget gate, an input gate, and an output gate [48]. The forget gate combines the input \(x_{t}\) with the previous intermediate output \(h_{t - 1}\) to determine which part of the previous state memory unit \(S_{t - 1}\) is forgotten. In the input gate, the sigmoid and tanh transformations of \(x_{t}\) and \(h_{t - 1}\) jointly decide which information is kept in the state memory cell. The updated \(S_{t}\), coupled with the output gate \(o_{t}\), determines the intermediate output \(h_{t}\), which is computed as follows [49].

$$f_{t} = \sigma \left( {W_{{{\text{fx}}}} x_{t} + W_{{{\text{fh}}}} h_{t - 1} + b_{f} } \right)$$
(3)
$$i_{t} = \sigma \left( {W_{{{\text{ix}}}} x_{t} + W_{{{\text{ih}}}} h_{t - 1} + b_{i} } \right)$$
(4)
$$g_{t} = \phi \left( {W_{{{\text{gx}}}} x_{t} + W_{{{\text{gh}}}} h_{t - 1} + b_{g} } \right)$$
(5)
$$o_{t} = \sigma \left( {W_{{{\text{ox}}}} x_{t} + W_{{{\text{oh}}}} h_{t - 1} + b_{{\text{o}}} } \right)$$
(6)
$$S_{t} = g_{t} \odot i_{t} + S_{t - 1} \odot f_{t}$$
(7)
$$h_{t} = \phi (S_{t} ) \odot o_{t}$$
(8)

where \(f_{t}\), \(i_{t}\), \(g_{t}\), \(o_{t}\), \(h_{t}\), and \(S_{t}\) are the states of the forget gate, input gate, input node, output gate, intermediate output, and state unit, respectively. \(W_{{{\text{fx}}}}\), \(W_{{{\text{fh}}}}\), \(W_{{{\text{ix}}}}\), \(W_{{{\text{ih}}}}\), \(W_{{{\text{gx}}}}\), \(W_{{{\text{gh}}}}\), \(W_{{{\text{ox}}}}\) and \(W_{{{\text{oh}}}}\) are the weight matrices of the corresponding gates, multiplied by the input \(x_{t}\) and the intermediate output \(h_{t - 1}\), respectively. \(b_{f}\), \(b_{i}\), \(b_{g}\), \(b_{o}\) are the bias terms of the associated gates. \(\odot\) represents element-wise multiplication of vectors. \(\sigma\) represents the sigmoid function. \(\phi\) represents the tanh function.
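Equations (3) to (8) can be traced step by step with a small, dependency-free sketch of a single LSTM time step. The helper names (`affine`, `lstm_step`), the dictionary layout of the parameters, and the all-zero example weights are assumptions chosen for readability; a real model would use a deep learning framework with trained weights.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def affine(w_x, x, w_h, h, b):
    """Compute W_x x + W_h h + b, with matrices stored as lists of rows."""
    return [
        sum(w * xj for w, xj in zip(wx_row, x))
        + sum(w * hj for w, hj in zip(wh_row, h))
        + bi
        for wx_row, wh_row, bi in zip(w_x, w_h, b)
    ]


def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM time step following Eqs. (3)-(8)."""
    f_t = sigmoid(affine(p["W_fx"], x_t, p["W_fh"], h_prev, p["b_f"]))  # Eq. (3)
    i_t = sigmoid(affine(p["W_ix"], x_t, p["W_ih"], h_prev, p["b_i"]))  # Eq. (4)
    g_t = tanh(affine(p["W_gx"], x_t, p["W_gh"], h_prev, p["b_g"]))     # Eq. (5)
    o_t = sigmoid(affine(p["W_ox"], x_t, p["W_oh"], h_prev, p["b_o"]))  # Eq. (6)
    s_t = [g * i + s * f for g, i, s, f in zip(g_t, i_t, s_prev, f_t)]  # Eq. (7)
    h_t = [math.tanh(s) * o for s, o in zip(s_t, o_t)]                  # Eq. (8)
    return h_t, s_t


# Toy setup (assumed): 3 input features, 2 hidden units, all-zero parameters.
n_in, n_hid = 3, 2
zeros_x = [[0.0] * n_in for _ in range(n_hid)]
zeros_h = [[0.0] * n_hid for _ in range(n_hid)]
params = {}
for gate in "figo":
    params[f"W_{gate}x"] = zeros_x
    params[f"W_{gate}h"] = zeros_h
    params[f"b_{gate}"] = [0.0] * n_hid

h_t, s_t = lstm_step([0.1, 0.2, 0.3], [0.0, 0.0], [1.0, -2.0], params)
print(h_t, s_t)
```

With all-zero parameters every gate evaluates to 0.5 and the input node to 0, so the new state is simply half of the previous state, which makes the role of the forget gate in Eq. (7) easy to see.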

3.3 CNN-LSTM network hybrid model

3.3.1 Model architecture of CNN-LSTM

This section examines the fiber grating measurement data associated with the track plate using the CNN-LSTM network hybrid model illustrated in Fig. 7. The model stacks 17 functional layers in two primary segments: the CNN, dedicated to feature extraction, and the LSTM network, responsible for response prediction.

Fig. 7 CNN-LSTM model

Before delving into the model's architecture, it's crucial to understand the nature of the input data. In Block 1 of the CNN algorithm, the input data comprises spatial–temporal features derived from the fiber grating measurement data. These features capture both the spatial relationships inherent in the data and the temporal dynamics over time. The CNN, with its convolutional layers, is adept at extracting spatial patterns and relationships from this data. Once these spatial features are extracted by the CNN, they are then transformed into a format suitable for the LSTM.

The LSTM, being a recurrent neural network, excels at processing sequences and time-series data. By feeding the spatial information extracted by the CNN into the LSTM, we leverage the strengths of both networks: the CNN's ability to recognize spatial patterns and the LSTM's capacity to understand temporal dynamics.

For a granular understanding of the CNN-specific parameters, readers are directed to Table 3. Our model is built with Python's scikit-learn machine learning toolkit and the PyTorch framework. A pivotal component of our hybrid CNN-LSTM network is the time series feature map, which serves as the primary input. Data elements like grating location, seismic wave type, and monitoring time maintain their distinctiveness as time series. Drawing parallels from natural language processing, we employ the word vector representation method. This technique allows us to sequentially represent the strain at specific instances by aligning it with its associated features, thereby crafting a time series dataset. Each data point encapsulates the characteristics of the historical response.

Table 3 Detailed configuration of CNN network architecture

To further refine our model's input, we employ the sliding window approach. This method, with a window width set to 30,000 records, facilitates subsequent network computations. Consequently, the unit feature map dimensions are established at 30,000 × 6. For a detailed breakdown of the convolutional layers, including their respective sizes and step sizes, Table 3 offers a comprehensive overview, highlighting the model's five convolutional layers.
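The sliding window construction can be sketched in a few lines of Python. The example uses a toy series of 10 time steps with 6 features per step and a window width of 4 so that the result is easy to inspect; the window step of 2, the function name `sliding_windows`, and the placeholder values are assumptions, since the step between consecutive windows is not stated here.

```python
def sliding_windows(records, width, step):
    """Split a list of per-time-step feature rows into fixed-width windows."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [records[start:start + width]
            for start in range(0, len(records) - width + 1, step)]


# Toy series: 10 time steps, 6 features per step (placeholder values).
N_FEATURES = 6
records = [[float(t)] * N_FEATURES for t in range(10)]

windows = sliding_windows(records, width=4, step=2)
# Each window is a (width x N_FEATURES) unit feature map.
print(len(windows), len(windows[0]), len(windows[0][0]))
```

In the setting described above, `width` would be 30,000 records and each window would form one 30,000 × 6 unit feature map.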

The input subsequence is initially processed in Block1, which contains three functional layers in order: convolution, ReLU, and pooling. In the diagram, they are labeled Conv_1, ReLU_1, and Maxpool_1. Conv_1 is the first layer, with an input size of 30,000 × 1 × 30, and the convolution layer is made up of 32 convolution kernels with a size of 1000 × 1 × 30 and a sliding window step size of 100. The output size is unaffected by the ReLU layer. The pooling layer is 2 × 1 × 32 in size and has a step size of 2. As a result, the output size of Block1 is 146 × 6 × 32. Convolutional layers are utilized in this model to extract the differentiating properties of the input data. The choice of five convolutional layers is based on LeNet-5 classification recognition results [50, 51].
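The length of the Block1 output along the time axis can be checked with Eq. (2). The sketch below assumes no zero padding in Conv_1 and compares floor and ceiling rounding in Maxpool_1, because neither the padding nor the rounding mode is stated; under these assumptions the stated length of 146 is reproduced only with ceiling rounding. The helper `out_len` is hypothetical.

```python
import math


def out_len(n, kernel, stride, rounding=math.floor):
    """Output length of an unpadded convolution or pooling along one axis."""
    return int(rounding((n - kernel) / stride)) + 1


# Conv_1: 30,000 records, kernel length 1000, step size 100, no padding (assumed).
conv_len = out_len(30_000, kernel=1_000, stride=100)

# Maxpool_1: window 2, step size 2, with two possible rounding conventions.
pool_floor = out_len(conv_len, kernel=2, stride=2)
pool_ceil = out_len(conv_len, kernel=2, stride=2, rounding=math.ceil)

print(conv_len, pool_floor, pool_ceil)
```

The convolution yields 291 positions; pooling then gives 145 with floor rounding and 146 with ceiling rounding.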

3.3.2 Experimental evaluation index

The prediction results are reviewed to validate the CNN-LSTM model's prediction accuracy. The coefficient of determination R-square (R2) [52], the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) [53] are used to statistically analyze the model prediction outcomes. The specifics are provided below.

$${\text{RMSE}} = \sqrt {\frac{1}{{T_{e} }}\sum\nolimits_{i = 1}^{{T_{e} }} {\left( {\hat{F}_{i} - F_{i} } \right)^{2} } }$$
(9)
$${\text{MAE}} = \frac{1}{{T_{e} }}\sum\nolimits_{i = 1}^{{T_{e} }} {\left| {\hat{F}_{i} - F_{i} } \right|}$$
(10)
$$R^{2} = 1 - \frac{{\sum\nolimits_{i = 1}^{{T_{e} }} {\left( {\hat{F}_{i} - F_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{{T_{e} }} {\left( {F_{i} - \bar{F}} \right)^{2} } }}$$
(11)

where \(F_{i}\) is the detected seismic response value, \(\hat{F}_{i}\) is the predicted seismic response value, \(T_{e}\) denotes the number of detection points, and \(\bar{F}\) denotes the mean of the detected values. The RMSE and MAE both indicate how the predicted value differs from the real value. The RMSE differs in that it first squares each deviation, which magnifies large errors. The coefficient of determination R2 measures the model's overall goodness of fit and equals one minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the detected values from their mean. The closer the RMSE and MAE are to zero and the closer R2 is to one, the greater the model's prediction accuracy.
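The three indices in Eqs. (9) to (11) can be written directly in Python. The sketch below is a minimal, library-free version for illustration; the function names and the four-point example values are placeholders, not measured data.

```python
import math


def rmse(f_true, f_pred):
    """Root mean squared error, Eq. (9)."""
    n = len(f_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(f_true, f_pred)) / n)


def mae(f_true, f_pred):
    """Mean absolute error, Eq. (10)."""
    n = len(f_true)
    return sum(abs(p - t) for t, p in zip(f_true, f_pred)) / n


def r2(f_true, f_pred):
    """Coefficient of determination, Eq. (11)."""
    mean_true = sum(f_true) / len(f_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(f_true, f_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in f_true)
    return 1.0 - ss_res / ss_tot


# Placeholder values: three exact predictions and one error of 1.0.
f_true = [1.0, 2.0, 3.0, 4.0]
f_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(f_true, f_pred), mae(f_true, f_pred), r2(f_true, f_pred))
```

The single large error raises the RMSE (0.5) more than the MAE (0.25), which illustrates why the RMSE is more sensitive to large deviations.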

4 Analysis of experimental results

4.1 High-speed railroad seismic response dataset

In this section, we evaluate the grating strain response data under various seismic excitations, and the findings are displayed along with the impacts of the operating conditions' predictions in Table 2. Figure 8 displays the seismic response data set for high-speed rail, which consists of 900 data points in total. Of these, 720 data points form the training set and 180 data points form the test set. The pre-processing procedure removes 49 outliers in total. These outliers are caused by noise in the demodulator's data acquisition and appear as "NaN" in the original data, so they are deleted.

Fig. 8 Seismic strain response dataset
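The pre-processing described above, removing "NaN" readings and splitting the remaining samples 720 to 180, can be sketched as follows. The example assumes that the 900 data points are counted after the 49 NaN readings are removed and that the split is chronological without shuffling; neither detail is stated in the text, and the synthetic series is a placeholder for the measured strains.

```python
import math


def drop_nan(samples):
    """Remove samples whose strain reading is NaN."""
    return [s for s in samples if not math.isnan(s)]


def split_train_test(samples, train_fraction=0.8):
    """Chronological split: the first part trains, the remainder tests."""
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


# Synthetic placeholder series: 949 readings, 49 of which are NaN.
raw = [float("nan") if i % 19 == 18 else float(i) for i in range(949)]

clean = drop_nan(raw)
train, test = split_train_test(clean)
print(len(raw) - len(clean), len(train), len(test))
```

With these assumptions the sketch reproduces the reported counts: 49 removed readings, 720 training points, and 180 test points.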

4.2 Performance comparison with other deep learning-based models

In this study, we juxtaposed the performance metrics—MAE, RMSE, and R2—of our proposed method, CNN-LSTM, with three other prominent prediction techniques: long short-term memory (LSTM), Backpropagation (BP), and Gated Recurrent Unit (GRU). It's pertinent to note that BP is a supervised learning algorithm used for training artificial neural networks, and the GRU is a type of recurrent neural network architecture. A detailed comparison of these methods can be found in Table 4.

Table 4 Comparison of deep learning techniques' performance

The experimental findings demonstrate that the proposed CNN-LSTM model outperforms conventional deep learning approaches in predicting the seismic strain response, with LSTM ranking second. These tests show the competitiveness of the CNN-LSTM technique for seismic response prediction. The CNN-LSTM network's prediction performance is depicted in Fig. 9, which shows that the model's predictions generally agree with the observed strain trend. The model maintains high prediction accuracy for both large and small strain values.

Fig. 9 The effect of strain response prediction by the CNN-LSTM model

4.3 Model prediction effectiveness in quasi-distributed grating monitoring

In the present study, a systematic control variable methodology was employed to facilitate incremental adjustments to the model. The potential ramifications on predictive accuracy, stemming from augmenting the model's depth, were meticulously assessed by incrementally enhancing the number of layers within the long short-term memory network. To maintain a consistent benchmark, the influence of varying long short-term memory network layers on predictive outcomes was scrutinized, while retaining a static Convolutional Neural Network layer for unaltered feature extraction. The empirical findings are succinctly presented in Table 5, wherein the tabulated data represents the mean values of the evaluative indices across the seven distinct points of examination. It was discerned that while augmenting the number of long short-term memory network layers to deepen the model can potentially bolster predictive prowess, there is a concomitant increase in the error rate when the long short-term memory layers surpass a count of four, indicative of potential overfitting. Consequently, an optimal configuration of a three-layer long short-term memory network was adopted for the experiments conducted in this study.

Table 5 Model combination structure test results

Figure 10 provides a visual representation of the cross-entropy loss for both training and validation sets as they evolve over time. The training process is halted once the cross-entropy no longer exhibits a decline within a specified duration. As elucidated in Fig. 10, a discernible disparity exists between the losses associated with training and validation. This disparity markedly diminishes during the initial five epochs. Upon reaching twenty-eight epochs, the gap is at its minimum. However, by the fortieth epoch, the difference begins to expand precipitously and lacks stability, indicative of the onset of overfitting. Given these observations, training is discontinued upon completion of fifty epochs. Therefore, the CNN-LSTM model trained for twenty-eight epochs is identified as the optimal model, as it minimizes the aforementioned gap.

Fig. 10 Cross-entropy loss curve
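The stopping rule and the choice of the epoch with the smallest gap between training and validation loss can be illustrated with a short sketch. The patience value, both function names, and the loss curves below are hypothetical placeholders; they are not the losses plotted in Fig. 10, and the exact stopping criterion used in the experiments may differ.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_losses)


def min_gap_epoch(train_losses, val_losses):
    """Return the 1-based epoch with the smallest train/validation gap."""
    gaps = [abs(v - t) for t, v in zip(train_losses, val_losses)]
    return gaps.index(min(gaps)) + 1


# Placeholder loss curves (not the data behind Fig. 10).
train_loss = [1.00, 0.70, 0.50, 0.40, 0.35, 0.30, 0.28, 0.26]
val_loss = [1.20, 0.80, 0.55, 0.42, 0.45, 0.50, 0.56, 0.60]

stop_epoch = early_stop_epoch(val_loss, patience=3)
best_epoch = min_gap_epoch(train_loss, val_loss)
print(stop_epoch, best_epoch)
```

In this toy example the gap is smallest at epoch 4, after which the validation loss rises while the training loss keeps falling, the same overfitting pattern described above.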

Figure 11 provides an illustrative comparison between the anticipated and actual values derived from the intermediate grating point algorithm model. Figure 11a–f elucidates that, in the context of the strain response under varied seismic excitations, the CNN-LSTM algorithm model retains its capability to extract strain information from one location based on the grating strain response observed at different locations concurrently. However, it is noteworthy that the congruence of data at peak values exhibits some deviations. A closer examination of Fig. 11g, h reveals that the deep learning model proposed in this manuscript exhibits enhanced applicability when considering the track plate, rail, and base plate, as opposed to its performance with the box girder. The strategic positioning of the grating intimates that the box girder might be situated at a considerable distance from the primary site, with the model predominantly relying on strain data sourced from the track plate. Given the proximity of the rail and base plate to the track plate, their predictive outcomes are more aligned. Conversely, the box girder, being remote from the track and serving as a pivotal bearing point for seismic excitations, manifests strain values that deviate significantly from other locations. This divergence culminates in a suboptimal prediction performance for the box girder.

Fig. 11 Comparison of model predicted values and true values

5 Conclusion

In this study, we positioned a quasi-distributed fiber grating system at the track plate, rail, base plate, and beam to monitor strain variations in a simply supported girder bridge model excited on a shaking table. We then introduced a hybrid model that combines the strengths of CNNs and LSTM networks. The CNN processes and extracts salient features from the data, while the LSTM excels in analyzing time-series data. The advantages and efficacy of this approach are further elucidated through our analytical investigations.

(1) By employing time-sliding windows as input parameters, we meticulously construct continuous feature maps derived from multi-source data. This approach capitalizes on the inherent feature extraction capabilities of Convolutional Neural Networks, facilitating the extraction of more nuanced and pertinent information embedded within the dataset. The feature vector, constructed in a sequential time-series manner, serves as the foundational input for the long short-term memory network model. This configuration is particularly adept at accommodating the intricate nonlinear interactions and temporal characteristics inherent in the response data.

(2) The CNN-LSTM hybrid model, blending the capabilities of both networks, has showcased its resilience and effectiveness through comprehensive analytical evaluations. When benchmarked against metrics like MAE, RMSE, and R2, this fusion model distinctly surpasses its individual counterparts. By offering enhanced feature representation and superior predictive accuracy, it firmly establishes itself as an invaluable asset in civil engineering analytics.

(3) The CNN-LSTM hybrid model proves to be an effective tool for predicting seismic responses in bridges via fiber grating. Its adaptability ensures suitability for a vast majority of locations, emphasizing its broad applicability in civil engineering.

(4) Maintaining gratings presents challenges due to inherent material properties and unforeseen strain variations at the monitored sites. This study elucidates that deep learning can be harnessed to predict strain at alternative locations, leveraging data from the measured grating points. Such an approach holds the potential to mitigate monitoring expenses and avert data loss stemming from grating deactivation.