Introduction

Vehicle loads are the primary live loads on bridges and are crucial parameters in bridge health monitoring. However, traditional Weigh-In-Motion (WIM) systems require the installation of weighing devices embedded in the road surface, necessitating traffic interruptions during installation. This process is time-consuming, labor-intensive, and costly, which hinders widespread implementation.

Bridge Weigh-In-Motion (BWIM) systems, compared to traditional WIM systems, offer relatively easier installation and somewhat lower installation costs1,2. In the 1970s, Moses3 developed the original BWIM system to reduce the errors in road-based weigh-in-motion systems caused by impact effects. He also formulated and refined the vehicle axle weight calculation algorithm, laying the groundwork for most subsequent BWIM system research and development. Over the past 40 years, BWIM systems have reached a high level of research maturity, with established technical methodologies and high measurement accuracy. However, they still require specialized instrumentation and have complex installation requirements. Moreover, because each bridge requires a dedicated BWIM system, initial setup and subsequent maintenance costs are high, limiting the deployment of dynamic weighing systems to only a few specific bridges. Therefore, it is necessary to explore more efficient, fast, intelligent, and cost-effective methods for bridge health monitoring, especially in today’s and future scenarios where bridges are densely distributed.

In recent years, significant advancements in computer vision and image processing technologies have garnered considerable attention in the civil engineering field, both domestically and internationally, for their applications in engineering measurement. These technologies have been extensively employed in various engineering measurement scenarios4,5,6,7. For instance, Ye et al.8,9 have attempted to use programmable industrial cameras to monitor the displacement of multiple bridges in real time and employed high-magnification lenses to monitor the displacement of the Tsing Ma Bridge, achieving a maximum detection distance of over one kilometer. Zhao et al.10,11 have implemented structural displacement monitoring using mobile phone cameras, although direct displacement measurement results could not be obtained. Currently, there are numerous research outcomes applying image measurement technology to record structural deformations. Bales12 explored the use of close-range photogrammetry on various bridges, verifying the feasibility of using monocular vision to measure bridge deflection and thus laying the foundation for bridge deformation detection based on computer vision. Alemdar et al.13 used photogrammetry to measure the deformation of bridge columns on simply supported bridges, installing traditional contact displacement sensors for comparison. The results showed that the photogrammetry method could effectively track the horizontal and vertical displacements of points on the grid surface and the deformation shape of the hinge area, closely aligning with the results obtained using contact sensors, further validating the feasibility of photogrammetry. Jiang, Jauregui, and others14,15 conducted displacement monitoring on actual bridges using photogrammetry, developing a corresponding measurement system based on their research. This system performed well and could be used for measurements without special training or assistance from professional surveyors, thus reducing labor input.

With the deepening of research, the study of using computer vision to identify vehicle loads on bridges has also been proposed. Ojio et al.16 simultaneously controlled cameras on the bridge deck and under the bridge with a controller, using the bridge deck camera to obtain vehicle axle information and the under-bridge camera to measure the dynamic displacement response data of the bridge. By analyzing both sets of data, they identified vehicle loads, demonstrating the feasibility of using computer vision to identify vehicle weight. Martini et al.17 proposed a method to identify the tire loads of moving vehicles, the load positions on the bridge, and the displacement response of the bridge based on computer vision, and constructed a bridge influence line model accordingly. This method employed several cameras working cooperatively. The on-bridge cameras estimated vehicle loads by recognizing tire types and retrieving tire pressure data from a database. The under-bridge cameras detected the bridge displacement response and vehicle load positions. By combining these data, they constructed a bridge influence line model, providing a new approach for monitoring bridge structures using computer vision. Xia et al.18 used traffic video to obtain the positions and axle information of vehicles crossing the bridge as auxiliary information. They achieved dynamic weighing of vehicles using the strain influence line of the bridge and recognized various loading conditions of multiple vehicles crossing the bridge by establishing a strain influence line model for different cross-sections of the bridge. Zhou19,20 utilized deep convolutional neural networks to classify and train the vibration signals of bridge responses under loads, ultimately achieving load identification under different vibration signals. Jian et al.21 proposed a traffic sensing method combining deep learning-based computer vision technology and influence line theory, effectively identifying key parameters such as vehicle weight. Khuc and Catbas22 discussed in detail the use of computer vision for non-contact, target-free displacement measurements. They proposed an iterative approximation algorithm to construct the displacement unit influence surface (UIS) and estimated the equivalent moving loads on the bridge under multiple vehicle loads using camera data and computer vision algorithms. Dong et al.23,24 developed a completely non-contact bridge unit influence line (UIL) identification system using only portable cameras. By processing input data from portable cameras with computer vision techniques, they tracked vehicle positions and identified bridge displacement responses, verifying the feasibility of the identification system through experiments.

In this paper, we further simplified the instruments for load estimation by using only traffic surveillance cameras to obtain video images. The advantage of this approach is that it allows for almost zero-cost load estimation on bridges equipped with traffic surveillance, meeting more practical needs with extremely low monitoring costs and promoting the widespread application of vehicle load estimation. Utilizing oblique photogrammetry with traffic cameras, we captured raw video images of the bridge deck, simultaneously obtaining bridge displacement and vehicle positions. By fully leveraging the information-rich nature of images, we ensured estimation accuracy. We detected target point displacements using a sub-pixel image detection algorithm and incorporated a camera perturbation correction algorithm to mitigate the effects of changes in camera orientation, achieving non-contact image-based bridge structural response measurements and, consequently, vehicle load estimation. Chapter 3 of this paper validates the feasibility of this study through laboratory and field bridge tests.

Theory of bridge load identification

Sub-pixel target tracking and pixel-level vehicle recognition

Accurately detecting the dynamic displacement response of a bridge is a prerequisite for precisely identifying vehicle loads on the bridge deck. Therefore, in image processing, it is essential to accurately recognize and track measurement markers. To balance computational efficiency and accuracy, this paper first employs a template matching algorithm for pixel-level recognition and tracking of the markers, and then performs sub-pixel level calculations to obtain their precise coordinates.

Sub-pixel target recognition and tracking

Detecting bridge displacement response based on video imagery requires processing the entire video sequence to obtain a complete dynamic response curve of the bridge. This paper employs the Mean Absolute Differences (MAD)25 algorithm for pixel-level recognition and tracking of the markers.

As shown in Fig. 1, the video image is first segmented so that each sub-image block contains at most one marker. The purpose of this segmentation is to prevent interference between multiple markers during target recognition and to eliminate the influence of other objects in the image on marker recognition, thereby improving the accuracy of target recognition.

Figure 1
figure 1

Template matching algorithm.

Then, the MAD algorithm is used to search through each frame of the video image \(I_{m}\) of size \(u \times v\) , using a template image \(I_{t}\) of size \(u_{t} \times v_{t}\). The algorithm finds the image sub-block most similar to the template and determines its coordinates, \((i,j)\), in the image \(I_{m}\), thereby achieving recognition and tracking of the target point. The similarity formula for the MAD algorithm is as follows:

$$D(i,j) = \frac{1}{{u_{t} \times v_{t} }} \times \sum\limits_{{i_{t} = 1}}^{{u_{t} }} {\sum\limits_{{j_{t} = 1}}^{{v_{t} }} {|I_{m} (i + i_{t} - 1,j + j_{t} - 1) - I_{t} (i_{t} ,j_{t} )|} }$$
(1)

where \(1 \le i \le u - u_{t} + 1,1 \le j \le v - v_{t} + 1\); \(I_{t} (i_{t} ,j_{t} )\) denotes the template sub-image; \(I_{m} (i + i_{t} - 1,j + j_{t} - 1)\) represents the image region overlaid by the template. A smaller mean absolute difference \(D(i,j)\) indicates a higher degree of similarity between the image sub-block and the template image.
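As an illustration, the following sketch implements the MAD search of Eq. (1) with NumPy; the function and array names are assumptions, and the exhaustive search over the whole sub-image block is kept for clarity rather than speed.

```python
import numpy as np

def mad_match(image, template):
    """Return the top-left (i, j) of the sub-block of `image` most similar to `template` (Eq. 1)."""
    u, v = image.shape
    ut, vt = template.shape
    tpl = template.astype(float)
    best_d, best_pos = np.inf, (0, 0)
    for i in range(u - ut + 1):
        for j in range(v - vt + 1):
            # Mean absolute difference between the template and the overlaid image sub-block.
            d = np.mean(np.abs(image[i:i + ut, j:j + vt].astype(float) - tpl))
            if d < best_d:
                best_d, best_pos = d, (i, j)
    return best_pos  # pixel-level marker position within the searched sub-image block
```

In practice the search is restricted to the segmented sub-image block containing the marker, as described above, which keeps the cost of the exhaustive scan low.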

After obtaining the integer pixel-level grayscale center of the marker point image using the template matching algorithm described above, we select 9 pixels in the vicinity of the grayscale center, namely the center pixel and its 8 neighboring pixels. The grayscale values of these 9 pixels (a \(3 \times 3\) matrix) are then used for quadratic surface fitting. The fitting equation for the pixel grayscale values is given below. The sub-pixel center coordinates \((\overline{i},\overline{j})\) are determined by finding the maximum grayscale value point on the fitted quadratic surface.

$$h(\overline{i},\overline{j}) = a_{20} \overline{i}^{2} + a_{02} \overline{j}^{2} + a_{11} \overline{i} \cdot \overline{j} + a_{10} \overline{i} + a_{01} \overline{j} + a_{00}$$
(2)

where \(h(\overline{i},\overline{j})\) is the grayscale value at \((\overline{i},\overline{j})\); \(a_{20}\), \(a_{02}\), \(a_{11}\), \(a_{10}\), \(a_{01}\) and \(a_{00}\) are the six unknown coefficients. These coefficients are obtained by treating the integer-pixel template matching center as the origin and using the grayscale values of the center pixel and its 8 neighbors as known conditions; the quadratic surface is then fitted by the least squares method to determine the six unknown coefficients.

In practical engineering, the actual structural deformation is typically small, and the deformation reflected in the image may be less than one pixel. Therefore, the sub-pixel division needs to be finer than 0.1 pixel. However, if the division is too fine, the improvement in detection accuracy diminishes while the algorithm's running speed suffers. Thus, to balance displacement detection accuracy and computational efficiency, the sub-pixel division accuracy is set to 0.01 pixel.
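A minimal sketch of this refinement step is given below, assuming the \(3 \times 3\) grayscale neighborhood of the integer-pixel match is available as a NumPy array; it fits the quadratic surface of Eq. (2) by least squares and locates its stationary point analytically instead of searching a 0.01-pixel grid.

```python
import numpy as np

def subpixel_peak(patch3x3):
    """patch3x3: 3x3 grayscale values centered on the integer-pixel match (taken as the origin)."""
    ii, jj = np.meshgrid([-1, 0, 1], [-1, 0, 1], indexing="ij")   # pixel offsets from the center
    i, j, h = ii.ravel(), jj.ravel(), patch3x3.astype(float).ravel()
    # Design matrix for h = a20*i^2 + a02*j^2 + a11*i*j + a10*i + a01*j + a00 (Eq. 2).
    A = np.column_stack([i**2, j**2, i * j, i, j, np.ones_like(i)])
    a20, a02, a11, a10, a01, a00 = np.linalg.lstsq(A, h, rcond=None)[0]
    # Stationary point of the quadratic surface (the grayscale maximum in this application).
    M = np.array([[2 * a20, a11], [a11, 2 * a02]])
    di, dj = np.linalg.solve(M, [-a10, -a01])
    return di, dj   # sub-pixel offsets to add to the integer-pixel coordinates
```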

Pixel-level recognition of vehicle distribution

To detect the location of vehicles in the image, the intervals on the bridge are divided based on the marker points. The edges of the vehicle's front and rear are targeted, and the midpoint of the distance between the front and rear edges is used to detect the vehicle's position on the bridge.

In general, the recognition error caused by a single pixel in vehicle distribution detection on the bridge is at the millimeter level. Since vehicles in the image comprise many pixels and the content is complex, using sub-pixel edge detection methods would consume a significant amount of computational resources. Therefore, detecting vehicle positions only needs to be at the pixel level.

After analyzing various edge detection algorithms, it was found that the Canny edge detection operator is currently the most effective method for pixel-level edge detection, with a low error rate. The edge detection results using the Canny method are shown in Fig. 2. Based on this, the Canny algorithm26,27 is employed in this paper for edge detection of vehicles in video images.

Figure 2
figure 2

Canny edge detection results.

The implementation process of the Canny operator is relatively intricate, involving the following four steps:

(a) Gaussian filtering: The purpose of filtering is noise reduction; Gaussian filtering primarily smooths the image. For a pixel at position (i, j) with grayscale value h(i, j), the grayscale value after Gaussian filtering becomes:

$$h_{\sigma } (i,j) = \frac{h(i,j)}{{\sqrt {2\pi \sigma^{2} } }}e^{{ - \frac{{i^{2} + j^{2} }}{{2\sigma^{2} }}}}$$
(3)

(b) Calculating gradient magnitude and direction: Image edges are a collection of pixels with significant grayscale value changes. In an image, the gradient represents the degree and direction of grayscale value changes. It can be obtained by convolving the image with Sobel operators, yielding the directional gradient values \(g_{x} (i,j)\) and \(g_{y} (i,j)\). The formulas for calculating the gradient magnitude and direction are as follows:

$$Ed(i,j) = \sqrt {g_{x} (i,j)^{2} + g_{y} (i,j)^{2} }$$
(4)
$$\theta = \arctan \frac{{g_{y} (i,j)}}{{g_{x} (i,j)}}$$
(5)

(c) Non-maximum suppression: Gaussian filtering tends to broaden edges, so only the pixel with the maximum gradient magnitude along each gradient direction is retained as the edge.

(d) Dual-threshold edge detection: An upper and a lower threshold are established; pixels above the upper threshold are taken as strong edges, and pixels falling between the two thresholds are retained as edges only if they connect to strong edges, enhancing the accuracy of edge detection.
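For reference, a minimal sketch of this pipeline using OpenCV is shown below; the input file name, kernel size and thresholds are illustrative assumptions rather than values prescribed by this study.

```python
import cv2

# Step (a): Gaussian filtering to suppress noise (kernel size and sigma are assumed).
frame = cv2.imread("bridge_deck_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame file
smoothed = cv2.GaussianBlur(frame, (5, 5), 1.4)

# Steps (b)-(d): gradient computation, non-maximum suppression and dual-threshold
# (hysteresis) edge tracking are carried out internally by cv2.Canny.
edges = cv2.Canny(smoothed, 50, 150)  # lower/upper thresholds are illustrative
cv2.imwrite("vehicle_edges.png", edges)
```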

Dynamic response detection of bridges based on traffic imagery

To obtain the dynamic response of the bridge, i.e., the bridge displacement, from traffic surveillance video, we only need to use the pixel displacements reflected in the image frames at different times and multiply them by the pixel scale parameters to determine the actual displacement distances. Pixel scale parameters are used to express the actual physical dimensions that an image pixel can describe under the determined intrinsic and extrinsic parameters of the camera, i.e., the actual size described by one pixel.

For traffic surveillance videos with only a top-down view, this paper derives and establishes a monocular vision camera measurement model based on the camera imaging geometry to measure the dynamic response of the bridge. This method establishes the correspondence between any pixel in the image and the actual dimensions in the three-dimensional world. The displacement distance information of the target object is obtained using the distance changes of the target pixels in the two-dimensional image sequence.

Since the identification of vehicle loads based on traffic surveillance video is achieved by measuring the vertical displacement of the beam structure to calculate the loads, we only need to consider the scale conversion in one direction (vertical) in the image. The scale conversion model for oblique photography is shown in Fig. 3.

Figure 3
figure 3

Oblique photography imaging model.

As shown in Fig. 3, the vertical pixel length in the image corresponds to the camera's field of view range. Let the horizontal distance from the camera's optical center O to the measured target on the bridge deck be \(d\), and the vertical distance from the optical center O to the intersection point B of the far field boundary line and the target object's motion path be \(h_{0}\)​. For traffic surveillance cameras with only a top-down view, the parameters \(d\) and \(h_{0}\)​ will remain constant if the camera is fixed. Therefore, without considering disturbances to the camera, these physical quantities can be measured and used as fixed parameters.

If the image captured by the camera consists of \(m \times n\) pixels, the image's pixel coordinate system typically takes the top-left corner as the origin, with the \(u\) axis along the upper edge of the image and the \(v\) axis along the left edge. The image in the pixel coordinate system is shown in Fig. 4.

Figure 4
figure 4

Conversion of RGB images to binary images.

Because the size of each pixel on the photosensitive element is negligible compared to the focal length \(f\), it can be ignored. Connecting the optical center O with the edges of each pixel and dividing the vertical field-of-view angle \(\beta\) accordingly, each pixel occupies \(1/n\) of the field-of-view angle, i.e., \(\beta /n\). If the actual vertical distance from the camera's optical center corresponding to the \(j\)-th pixel is denoted \(h_{j}\), then \(h_{j}\) can be calculated using the following formula.

$$h_{j} = d\tan (\mu + \frac{\beta }{n}j)$$
(6)

Thus, the scale parameter of the (j + 1)-th pixel in the V-axis direction of the image is obtained as:

$$\xi_{j + 1} = d\tan \left[ {\mu + \frac{\beta }{n}(j + 1)} \right] - d\tan \left( {\mu + \frac{\beta }{n}j} \right)$$
(7)

Applying the tangent addition formula, the expression can be transformed into:

$$\xi_{j + 1} = d\frac{{\tan \frac{\beta }{n}(\tan^{2} \mu + 1)[\tan^{2} \left( {\frac{\beta }{n}j} \right) + 1]}}{{\left[ {1 - \tan \mu \tan \left( {\frac{\beta }{n}j} \right) - \tan \mu \tan \frac{\beta }{n} - \tan \left( {\frac{\beta }{n}j} \right)\tan \frac{\beta }{n}} \right]\left[ {1 - \tan \mu \tan \left( {\frac{\beta }{n}j} \right)} \right]}}$$
(8)

From the above equation, it is evident that, under fixed internal and external parameters of the camera lens, the vertical pixel scale factor in the image is mainly related to the horizontal distance \(d\) from the camera center to the target object, the camera's observation tilt angle \(\mu\), and the vertical coordinate \(j\) of the image pixels.

Similarly, in the problem of multi-pixel size transformation, the actual length described by the distance between the \(j\)-th pixel and the \((j + k)\)-th pixel is:

$$\delta_{j}^{k} = h_{j + k} - h_{j}$$
(9)

Consistent with the derivation process of a single-pixel scale parameter, the relationship between multi-pixel distance and actual size can be obtained as follows:

$$\delta_{j}^{k} = d\frac{{\tan \left( {\frac{\beta }{n}k} \right)\left( {\tan^{2} \mu + 1} \right)\left[ {{\text{tan}}^{2} \left( {\frac{\beta }{n}j} \right) + 1} \right]}}{{\left[ {1 - \tan \mu \tan \left( {\frac{\beta }{n}j} \right) - \tan \mu \tan \left( {\frac{\beta }{n}k} \right) - \tan \left( {\frac{\beta }{n}j} \right)\tan \left( {\frac{\beta }{n}k} \right)} \right]\left[ {1 - \tan \mu \tan \left( {\frac{\beta }{n}j} \right)} \right]}}$$
(10)

In the equation, let \(\Gamma = \tan \mu\), \(\Phi = \tan (\frac{\beta }{n}j)\), \(\Psi = \tan (\frac{\beta }{n}k)\). From the above analysis, the parameter \(\Gamma\) is a constant when the lens is fixed, \(\Phi\) is a variable related to the pixel position, and \(\Psi\) is related to the number of pixels to be converted. Substituting these parameters into the equation, the mathematical expression for the pixel scale parameter is obtained:

$$\delta_{j}^{k} = d\frac{{\Psi (\Gamma^{2} + 1)(\Phi^{2} + 1)}}{(1 - \Gamma \Phi )(1 - \Gamma \Phi - \Gamma \Psi - \Phi \Psi )}$$
(11)

The pixel scale parameter is derived based on the monocular vision measurement model, combined with the characteristics of the traffic surveillance camera. This parameter allows direct conversion of the pixel displacement of target points in the image to physical displacement. It simplifies the calculation process of measuring bridge displacement based on traffic surveillance imagery and improves the efficiency of the algorithm.
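The following numerical sketch evaluates the pixel scale relationship of Eqs. (9)–(11) under assumed camera geometry (the values of \(d\), \(\mu\), \(\beta\) and \(n\) are illustrative only) and checks the closed form of Eq. (11) against the direct difference \(h_{j+k} - h_{j}\).

```python
import numpy as np

d = 20.0                 # horizontal distance from the optical center to the target path (m), assumed
mu = np.radians(30.0)    # camera observation tilt angle, assumed
beta = np.radians(10.0)  # vertical field-of-view angle, assumed
n = 1080                 # number of pixel rows

def delta_direct(j, k):
    """Actual length between the j-th and (j+k)-th pixel rows via h_j = d*tan(mu + beta*j/n), Eq. (9)."""
    return d * (np.tan(mu + beta / n * (j + k)) - np.tan(mu + beta / n * j))

def delta_eq11(j, k):
    """The same length from the closed form of Eq. (11)."""
    G, P, S = np.tan(mu), np.tan(beta / n * j), np.tan(beta / n * k)
    return d * S * (G**2 + 1) * (P**2 + 1) / ((1 - G * P) * (1 - G * P - G * S - P * S))

print(delta_direct(400, 25), delta_eq11(400, 25))  # the two expressions agree
```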

Analysis of methods to suppress influencing factors in dynamic response measurement

Camera disturbance correction method for dynamic-static separation

Operational bridge structures are generally subject to dynamic changes, and the traffic surveillance camera systems installed on the bridges are also affected by corresponding disturbances. As analyzed in Section "Dynamic response detection of bridges based on traffic imagery", when the camera pose is not fixed, the parameter \(\Gamma\) varies with the camera's tilt angle, and even slight changes in the camera pose can significantly impact the measurement results of the bridge's displacement. To obtain bridge displacement measurements that can accurately identify vehicle loads, it is necessary to study the uncertainty in the camera's displacement.

To enable the computer to automatically identify and eliminate camera pose disturbances during operation without adding extra hardware, fixed camera pose correction points are set at positions outside the bridge or on the bridge piers in the video. By capturing the displacement signals of these correction points in the image, the camera disturbance signals can be obtained through the pixel displacements of the correction points in the pixel coordinate system.

To reduce the impact of minor camera displacements, this paper uses a correction method based on camera pose adjustments to process bridge displacement signals. The displacement signal \(\sigma\) derived from the video consists of two parts: the bridge's inherent displacement signal \(\sigma_{b}\) and the camera pose disturbance signal \(\sigma_{c}\), such that \(\sigma = \sigma_{b} + \sigma_{c}\). The camera pose disturbance signal \(\sigma_{c}\) consists of the vibrational signal \(\sigma_{cf}\) and the static displacement signal \(\sigma_{cs}\), i.e., \(\sigma_{c} = \sigma_{cf} + \sigma_{cs}\).

To eliminate the effects of camera pose disturbances, we separate the vibrational signal \(\sigma_{cf}\) and the static displacement signal \(\sigma_{cs}\) from the camera pose disturbance signal \(\sigma_{c}\). This method leverages the stability of reference points outside the bridge in the video to obtain the camera pose disturbance signal through the displacement of fixed points in the image.

By applying the Fourier transform, the high-frequency vibration components of the camera pose disturbance signal can be identified. The Fourier transform is given by:

$$F(\zeta ) = \int_{ - \infty }^{ + \infty } {\sigma_{cf} (t)e^{ - 2\pi i\zeta t} } dt$$
(12)

where \(F(\zeta )\) is the continuous spectrum of the time-domain signal \(\sigma_{cf} (t)\) calculated through the Fourier transform, and \(\zeta\) is the frequency variable.

By using a band-stop filter to remove the camera vibration frequencies from the displacement signal of the target point, and then applying the inverse Fourier transform, the displacement signal with the camera's high-frequency vibrations removed can be calculated28:

$$\sigma (t) - \sigma_{cf} (t) = \int_{ - \infty }^{ + \infty } {\tilde{F}(\zeta )e^{2\pi i\zeta t} } d\zeta$$
(13)

where \(\tilde{F}(\zeta )\) denotes the spectrum of the measured displacement signal after band-stop filtering of the camera vibration frequencies.

Additionally, by using the aforementioned filtering method, the high-frequency vibration signal \(\sigma_{cf}\) in the camera displacement signal can be filtered out to isolate the static displacement signal \(\sigma_{cs}\)​ of the camera.

The camera imaging equations \(u = f \cdot X/Z\) and \(v = f \cdot Y/Z\) can be used to deduce the static displacement elimination equation of the camera pose:

$$\sigma_{b} = \sigma - k_{cs} \sigma_{cs}$$
(14)

where \(\sigma_{b}\) is the bridge displacement signal after eliminating the static displacement of the camera pose; \(k_{cs}\) is the impact coefficient of the camera static displacement on the measured signal; \(k_{cs} = L_{R} /L_{T}\); \(L_{R}\) and \(L_{T}\) are the distances from the camera to the reference point and the target point on the bridge, respectively.
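A hedged sketch of this dynamic-static separation is given below using NumPy FFTs; the signal files, sampling rate, notch width and low-pass cut-off are illustrative assumptions rather than values used in the experiments.

```python
import numpy as np

fs = 100.0                                   # sampling frequency (Hz), assumed
sigma = np.loadtxt("target_disp.txt")        # target-point displacement from the video (hypothetical file)
sigma_ref = np.loadtxt("reference_disp.txt") # fixed-reference-point displacement = camera disturbance
L_R, L_T = 40.0, 25.0                        # camera-to-reference and camera-to-target distances (m), assumed
k_cs = L_R / L_T                             # impact coefficient of the camera static displacement

freqs = np.fft.rfftfreq(len(sigma_ref), d=1.0 / fs)
spec_ref = np.fft.rfft(sigma_ref)

# Dominant camera vibration frequency from the reference-point spectrum (Eq. 12).
f_vib = freqs[1:][np.argmax(np.abs(spec_ref[1:]))]

# Band-stop the camera vibration band in the target signal, then inverse transform (Eq. 13).
spec = np.fft.rfft(sigma)
spec[(freqs > f_vib - 0.5) & (freqs < f_vib + 0.5)] = 0.0   # 0.5 Hz notch half-width, assumed
sigma_no_vib = np.fft.irfft(spec, n=len(sigma))

# Low-pass the reference signal to keep only the camera's quasi-static drift sigma_cs.
spec_ref[freqs > 0.2] = 0.0                  # 0.2 Hz cut-off, assumed
sigma_cs = np.fft.irfft(spec_ref, n=len(sigma_ref))

# Eq. (14): remove the scaled static camera drift from the measured displacement.
sigma_b = sigma_no_vib - k_cs * sigma_cs
```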

Image denoising processing

Due to environmental factors during the photography process, image noise generated by surveillance cameras can significantly impact the accurate extraction of bridge information. Therefore, eliminating image noise plays a crucial role in the estimation of bridge deck vehicle loads. To obtain high-quality digital images while preserving the integrity of the original information (i.e., main features), it is essential to eliminate unnecessary information in the signal. In order to enhance the accuracy of displacement detection in structures and subsequently improve the identification of vehicle loads, this study employs the homomorphic filtering method for processing image signals.

Homomorphic filtering is an image processing technique that combines frequency filtering and spatial domain grayscale transformation. It is proposed based on the illumination-reflectance model29. The image is considered as the product of illumination intensity \(f_{i} (x,y)\) and reflectance intensity \(f_{r} (x,y)\), i.e.:

$$f(x,y) = f_{i} (x,y) \cdot f_{r} (x,y)$$
(15)

To transform the multiplicative model into an additive (linear) one, a common approach is to apply a logarithmic transformation to the original image, followed by a Fourier transform to obtain its frequency-domain information:

$$DFT[\ln f(x,y)] = DFT[\ln f_{i} (x,y)] + DFT[\ln f_{r} (x,y)]$$
(16)

Here, \(DFT[*]\) represents the Fourier transform of part “*”.

In this process, the filtering function \(H(u,v)\) is utilized to separate the high and low-frequency components of \(DFT[f(x,y)]\), and subsequently, frequency domain filtering is applied. Finally, the Fourier inverse transform and exponential transformation are performed on the target part to obtain the spatial domain filtering result. It can be observed that the filtering function \(H(u,v)\) is a crucial component influencing the effectiveness and contrast enhancement in the homomorphic filtering algorithm.

This paper employs an improved homomorphic filtering algorithm for processing. Commonly used homomorphic filtering algorithms often use Gaussian or Butterworth filter functions, requiring the introduction of multiple parameters and multiple iterations to achieve satisfactory results. This paper introduces a single-parameter homomorphic high-pass filter and a single-parameter homomorphic low-pass filter30,31, with the filter formulas as follows:

$$H_{h} (u,v) = \frac{1}{{1 + e^{1 - \kappa D(u,v)} }}$$
(17)
$$H_{l} (u,v) = \frac{1}{{\sqrt {1 + [D(u,v)/d_{0} ]} }}$$
(18)

where \(\kappa\) and \(d_{0}\) are the adjustment parameters for the high-pass and low-pass filters, respectively. In order to enhance high-frequency signals in the image while preserving some low-frequency information, a weighted fusion of the high and low-frequency signals is performed to obtain the denoised and enhanced fused image:

$$I = a \cdot I_{h} + (1 - a) \cdot (I_{l} /b)$$
(19)

where \(I\) is the denoised and enhanced fused image after homomorphic filtering, \(I_{h}\) is the image obtained from high-pass filtering, \(I_{l}\) is the image obtained from low-pass filtering, and \(a(0 < a < 1)\) and \(b(b > 0)\) are fusion weights. The fusion weight \(b\) is introduced to further suppress low-frequency signals under low-light conditions. Figure 5 shows the visual effect of enhancing and denoising a low-light image using the homomorphic filtering algorithm with \(\kappa = 0.01\), \(d_{0} = 55\), \(a = 0.7\) and \(b = 2\).

Figure 5
figure 5

Homomorphic filtering image enhancement results.
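A compact sketch of the single-parameter homomorphic filtering and fusion of Eqs. (17)–(19) is given below; it assumes a grayscale image stored as a NumPy array and uses the parameter values quoted above, which are illustrative.

```python
import numpy as np

def homomorphic_enhance(img, kappa=0.01, d0=55.0, a=0.7, b=2.0):
    img = img.astype(float) + 1.0                    # offset to avoid log(0)
    F = np.fft.fftshift(np.fft.fft2(np.log(img)))    # logarithmic transform, then frequency domain

    # Distance D(u, v) of each frequency sample from the spectrum center.
    rows, cols = img.shape
    u, v = np.meshgrid(np.arange(rows) - rows / 2, np.arange(cols) - cols / 2, indexing="ij")
    D = np.sqrt(u**2 + v**2)

    Hh = 1.0 / (1.0 + np.exp(1.0 - kappa * D))       # Eq. (17): single-parameter high-pass filter
    Hl = 1.0 / np.sqrt(1.0 + D / d0)                 # Eq. (18): single-parameter low-pass filter

    def back(H):                                     # inverse FFT followed by exponential transform
        return np.exp(np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))) - 1.0

    Ih, Il = back(Hh), back(Hl)
    return a * Ih + (1.0 - a) * (Il / b)             # Eq. (19): weighted fusion of the two results
```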

Vehicle load identification model

Decomposition of bridge displacement

Through the image processing methods described above, bridge displacements can be accurately obtained from video data. Generally, the bridge displacement response measured on the bridge comprises the following components:

$$\begin{gathered} \delta_{b} = \delta_{v} + \delta_{nv} \hfill \\ \delta_{v} = \delta_{d} + \delta_{s} \hfill \\ \end{gathered}$$
(20)

In the equation, \(\delta_{b}\) is the bridge displacement response measured from the video imagery; \(\delta_{nv}\) is the displacement response caused by non-traffic factors such as wind load, temperature effects, concrete shrinkage, or creep; \(\delta_{v}\) represents the vehicle-induced bridge displacement response, which includes the dynamic displacement \(\delta_{d}\) and the static displacement \(\delta_{s}\). The dynamic displacement \(\delta_{d}\) is generated by vehicle vibrations or the impact caused by vehicles traveling on uneven road surfaces.

The dynamic weighing method studied in this paper is based on the static influence line theory of structures and focuses only on the static displacement \(\delta_{s}\) of the bridge under vehicle loading. Therefore, it is necessary to separate the bridge displacement response signal \(\delta_{b}\). To ensure real-time performance and accuracy, this paper employs the Locally Weighted Scatterplot Smoothing (LOWESS) method to achieve time-domain separation of the bridge displacement signal32,33,34.

The smoothing process of the bridge displacement response signal is performed locally, with each smoothed value determined by all data points within the neighborhood of a given data point. The smoothing is achieved through weighting. The specific steps are as follows:

(a) Choose a span parameter \(q(0 {<}q \le 1)\) that determines the width of the sliding window covering the neighborhood of \(x_{i}\). If the total number of data points is \(n\), then the number of data points within the neighborhood of \(x_{i}\) is \(\left[ {n \times q} \right]\). The larger the value of \(q\), the more data points are included in each neighborhood and the smoother the fitted curve, but the computation time for the smoothing process also increases.

(b) Calculate the regression weight for each data point within the neighborhood of \(x_{i}\)​ using the following weight function:

$$\begin{gathered} \omega_{i} = \left( {1 - \left| {\frac{{x_{i} - x_{j} }}{d(x)}} \right|^{3} } \right)^{3} ,j \ne i \hfill \\ j = 1, \cdots ,k, \, k = q \times n - 1 \hfill \\ \end{gathered}$$
(21)

where: \(d(x)\) is the horizontal distance to the farthest data point from \(x_{i}\)​ within the sliding window; \(x_{j}\) represents the data points within the neighborhood of \(x_{i}\)​.

(c) Perform weighted least squares regression within the neighborhood of \(x_{i}\):

$$\beta = (X^{T} WX)^{ - 1} X^{T} WY$$
(22)

where: \(X^{T} = \left[ {\begin{array}{*{20}c} 1 & \cdots & 1 \\ {x_{i} } & \cdots & {x_{j} } \\ \end{array} } \right]\)

$$Y = (y_{i} , \cdots y_{k} )$$
$$W = diag(\omega_{i} )$$

(d) Perform the smoothed fit estimation for \(y_{i}\)

$$\mathop y\limits^{ \wedge }_{i} = \left[ {1,x_{i} } \right] \times \beta$$
(23)
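Steps (a)–(d) can be condensed into the short NumPy sketch below; the span value and the plain loop over data points are illustrative choices rather than the optimized implementation used in this work.

```python
import numpy as np

def lowess(x, y, q=0.3):
    """Locally weighted scatterplot smoothing of (x, y) following steps (a)-(d); x, y are 1-D arrays."""
    n = len(x)
    k = max(int(q * n), 2)                        # step (a): number of points per sliding window
    y_hat = np.empty(n)
    for i in range(n):
        idx = np.argsort(np.abs(x - x[i]))[:k]    # neighborhood of x_i
        d = np.max(np.abs(x[idx] - x[i]))
        d = d if d > 0 else 1.0
        w = (1.0 - np.abs((x[idx] - x[i]) / d) ** 3) ** 3   # step (b): tricube weights, Eq. (21)
        X = np.column_stack([np.ones(k), x[idx]])
        W = np.diag(w)
        # Step (c): weighted least squares, beta = (X^T W X)^{-1} X^T W Y, Eq. (22).
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
        y_hat[i] = beta[0] + beta[1] * x[i]       # step (d): smoothed estimate, Eq. (23)
    return y_hat
```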

Figure 6 illustrates the decomposition process of the measured mid-span displacement signal of a specific bridge. From the figure, it can be observed that, in the absence of non-vehicle-induced bridge displacement response \(\delta_{nv}\) or known \(\delta_{nv}\) signal, the desired static displacement response \(\delta_{s}\) induced by vehicles can be effectively separated.

Figure 6
figure 6

Displacement decomposition effect diagram.

Bridge displacement influence surface calibration

The method for estimating vehicle loads based on the bridge's dynamic response involves using the bridge structure as a scale for weighing the vehicles. To quantify vehicle loads accurately, this paper uses the displacement influence surface as the scale, making the calibration of the influence surface crucial for weighing accuracy.

Given that bridges are spatial structures with lateral width and often have multiple lanes in the lateral direction, it is essential to consider the lateral force characteristics of the bridge structure to improve weighing accuracy. Therefore, this paper uses displacement influence surfaces obtained from computer imagery to weigh vehicles.

A calibration vehicle of known weight is driven in a straight line along different lanes at various lateral positions on the bridge deck. Using the image processing methods described earlier, the displacement signal curves of the bridge under the influence of this vehicle are obtained from the image data. By matching the vehicle's longitudinal position on the bridge at different times with the displacement data, the displacement influence lines of the bridge structure at different lateral positions are identified.

It is important to note that the width and length of the vehicle are ignored in this context, treating the vehicle as a single concentrated load. This simplification is justified because the vehicle width is small relative to the bridge width, and there is not much variation in the width of different vehicles. Additionally, since the different axles of a vehicle cannot be distinctly identified from the bridge displacement curves, the vehicle length is also neglected.

Finally, the influence lines at different lateral positions are interpolated using cubic interpolation to obtain the bridge structure's displacement influence surface. When a vehicle travels in different lanes on the bridge, the influence lines identified at different lateral measurement points will differ, and the interpolated influence surface will reflect this mechanical characteristic.
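As a sketch of this calibration step, the snippet below interpolates a few influence lines across the lateral direction with a cubic interpolant from SciPy; the lane coordinates and placeholder line shapes are assumptions for illustration, not measured data.

```python
import numpy as np
from scipy.interpolate import interp1d

y_lanes = np.array([0.10, 0.20, 0.30, 0.40])   # lateral positions of the calibrated lanes (m), assumed
x_long = np.linspace(0.0, 5.5, 111)            # longitudinal stations along the span (m), assumed

# influence_lines[l, :]: calibrated displacement influence line of one measurement point
# for lane l (placeholder shapes only).
influence_lines = np.vstack([s * np.sin(np.pi * x_long / 5.5) for s in (1.0, 0.9, 0.7, 0.5)])

# Cubic interpolation across the lateral direction defines the influence surface.
lateral_interp = interp1d(y_lanes, influence_lines, kind="cubic", axis=0)

def S_delta(x, y):
    """Displacement influence surface value at deck coordinates (x longitudinal, y lateral)."""
    line_at_y = lateral_interp(y)               # influence line interpolated to lateral position y
    return np.interp(x, x_long, line_at_y)      # lookup along the span
```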

This method is derived under the assumption that the bridge's displacement response to normal loads is linearly related to the loads. It does not require any structural modeling or mechanical analysis of the bridge, and the displacement influence surface can be completely obtained from the video image data collected by traffic surveillance cameras, reflecting the mechanical properties of the bridge structure.

Load identification algorithm based on displacement-influence surface relationship matrix

Once the displacement influence surfaces of the bridge structure are identified, the theory of structural influence surfaces can be used to identify the loadings from passing vehicles. The research on vehicle load identification in this paper is conducted under the assumption that the bridge undergoes linear elastic deformation without any structural damage during normal operation. According to the theory of structural influence surfaces, the displacement at a point on the bridge structure can be represented as follows:

$$\delta = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta } (x_{ve} ,y_{ve} )}$$
(24)

In the equation, \(x_{ve}\) and \(y_{ve}\) represent the lateral and longitudinal coordinates, respectively, of the \(ve\)-th vehicle on the bridge deck; \(S_{\delta } (x,y)\) represents the displacement influence surface value at the bridge deck position coordinate \((x,y)\); \(G_{ve}\) represents the load of the \(ve\)-th vehicle on the bridge, with a total of \(VE\) vehicles on the bridge deck.

Equation (24) can be used to solve for the vehicle load. However, when multiple vehicles are traveling on the bridge, additional information is required to identify the load of each vehicle. Therefore, additional sampling data can be drawn from both the spatial distribution of the measuring points and the temporal development of their displacements to establish a displacement-influence surface relationship matrix for solving the vehicle loads.

The spatial distribution aspect refers to selecting \(M\) (\(M \ge VE\)) target points within the image range as measuring points at the same moment (referred to as moment \(t_{p}\)) and detecting their displacements. Based on Eq. (24), \(M\) equations are constructed to form a system of equations, as follows.

$$\left\{ {\begin{array}{*{20}c} {\delta_{1} (t_{p} ) = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta 1} [x_{ve} (t_{p} ),y_{ve} (t_{p} )]} } \\ {\delta_{2} (t_{p} ) = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta 2} [x_{ve} (t_{p} ),y_{ve} (t_{p} )]} } \\ \vdots \\ {\delta_{M} (t_{p} ) = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta M} [x_{ve} (t_{p} ),y_{ve} (t_{p} )]} } \\ \end{array} } \right.$$
(25)

Similarly, the temporal development aspect refers to sampling the structural displacement signal of the same sampling point \(m\) over a period of time and establishing a system of equations with the corresponding influence surface values at the load application points for solving vehicle loads. The constructed system of equations is as follows:

$$\left\{ {\begin{array}{*{20}c} {\delta_{m} (t_{1} ) = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta m} [x_{ve} (t_{1} ),y_{ve} (t_{1} )]} } \\ {\delta_{m} (t_{2} ) = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta m} [x_{ve} (t_{2} ),y_{ve} (t_{2} )]} } \\ \vdots \\ {\delta_{m} (t_{P} ) = \sum\limits_{ve = 1}^{VE} {G_{ve} S_{\delta m} [x_{ve} (t_{P} ),y_{ve} (t_{P} )]} } \\ \end{array} } \right.$$
(26)

Calculating vehicle loads from the spatial distribution of measurement points and from the temporal evolution of displacements each has its own characteristics. At the spatial distribution level, the displacement information from sampling points at different locations on the bridge is utilized to construct a load calculation matrix. This approach comprehensively summarizes the overall bridge condition for load identification. However, it analyzes only a fixed moment during the vehicle's travel, leading to significant differences in load identification accuracy at different positions on the bridge. Conversely, the temporal evolution level complements the spatial analysis of the measurement points. To leverage both spatial and temporal information, increase the amount of data constraining the solution, and enhance the accuracy and stability of the vehicle load identification results, the load calculation equations at the spatial distribution level and the temporal evolution level are integrated. This integration forms an overdetermined system of equations for vehicle load calculation:

$${\varvec{\delta}} = {\varvec{S}} \times {\varvec{G}}$$
(27)
$${\varvec{\delta}} = \left[ {\begin{array}{*{20}c} {\delta_{1} (t_{1} )} & \cdots & {\delta_{M} (t_{1} )} & {\delta_{1} (t_{2} )} & \cdots & {\delta_{M} (t_{2} )} & \cdots & {\delta_{1} (t_{P} )} & \cdots & {\delta_{M} (t_{P} )} \\ \end{array} } \right]^{T}$$
(28)
$${\varvec{S}} = \left[ {\begin{array}{*{20}c} {S_{{\delta_{1} }} [x_{1} (t_{1} ),y_{1} (t_{1} )]} & {S_{{\delta_{1} }} [x_{2} (t_{1} ),y_{2} (t_{1} )]} & \cdots & {S_{{\delta_{1} }} [x_{VE} (t_{1} ),y_{VE} (t_{1} )]} \\ \vdots & \vdots & {} & \vdots \\ {S_{{\delta_{M} }} [x_{1} (t_{1} ),y_{1} (t_{1} )]} & {S_{{\delta_{M} }} [x_{2} (t_{1} ),y_{2} (t_{1} )]} & \cdots & {S_{{\delta_{M} }} [x_{VE} (t_{1} ),y_{VE} (t_{1} )]} \\ {S_{{\delta_{1} }} [x_{1} (t_{2} ),y_{1} (t_{2} )]} & {S_{{\delta_{1} }} [x_{2} (t_{2} ),y_{2} (t_{2} )]} & \cdots & {S_{{\delta_{1} }} [x_{VE} (t_{2} ),y_{VE} (t_{2} )]} \\ \vdots & \vdots & {} & \vdots \\ {S_{{\delta_{M} }} [x_{1} (t_{2} ),y_{1} (t_{2} )]} & {S_{{\delta_{M} }} [x_{2} (t_{2} ),y_{2} (t_{2} )]} & \cdots & {S_{{\delta_{M} }} [x_{VE} (t_{2} ),y_{VE} (t_{2} )]} \\ \vdots & \vdots & {} & \vdots \\ {S_{{\delta_{1} }} [x_{1} (t_{P} ),y_{1} (t_{P} )]} & {S_{{\delta_{1} }} [x_{2} (t_{P} ),y_{2} (t_{P} )]} & \cdots & {S_{{\delta_{1} }} [x_{VE} (t_{P} ),y_{VE} (t_{P} )]} \\ \vdots & \vdots & {} & \vdots \\ {S_{{\delta_{M} }} [x_{1} (t_{P} ),y_{1} (t_{P} )]} & {S_{{\delta_{M} }} [x_{2} (t_{P} ),y_{2} (t_{P} )]} & \cdots & {S_{{\delta_{M} }} [x_{VE} (t_{P} ),y_{VE} (t_{P} )]} \\ \end{array} } \right]$$
(29)

The overdetermined equation system given by Eq. (27) is solved using the least squares method, resulting in a set of least squares solutions as the applied load forces G on each vehicle on the bridge. Dividing G by the acceleration due to gravity yields the respective vehicle loads.
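A minimal sketch of assembling the relationship matrix of Eq. (29) and solving the overdetermined system of Eq. (27) by least squares is given below; the input array layouts and the influence surface function are assumed interfaces, not the implementation used in this study.

```python
import numpy as np

def identify_loads(deltas, traj, S_delta):
    """
    deltas[m, p]     : displacement of measurement point m at time t_p, shape (M, P)
    traj[ve, p]      : (x, y) deck coordinates of vehicle ve at time t_p, shape (VE, P, 2)
    S_delta(m, x, y) : displacement influence surface value of point m at deck position (x, y)
    Returns the least squares estimate of the vehicle load vector G.
    """
    M, P = deltas.shape
    VE = traj.shape[0]
    rows, rhs = [], []
    for p in range(P):                   # temporal samples
        for m in range(M):               # spatial samples
            rows.append([S_delta(m, traj[ve, p, 0], traj[ve, p, 1]) for ve in range(VE)])
            rhs.append(deltas[m, p])
    S = np.asarray(rows)                 # relationship matrix of Eq. (29)
    G, *_ = np.linalg.lstsq(S, np.asarray(rhs), rcond=None)   # least squares solution of Eq. (27)
    return G
```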

Experimental study

Experimental study on reduced-scale model bridge

Experimental layout of the model bridge

The model bridge used in this experimental study is a simply supported beam bridge model with a composite structure consisting of a plexiglass bridge deck and aluminum alloy I-beams as the main girders. The model bridge has a span of 5.5 m and a width of 0.5 m. To simulate the deformation behavior of a real bridge as accurately as possible, the model bridge in the laboratory is designed with high stiffness.

The main girder consists of two aluminum I-beams connected at both ends and the midspan with aluminum crossbeams measuring 0.3 m in length, 0.5 m in height, and 10 mm in thickness, serving as diaphragms. The bridge deck is composed of three pieces of plexiglass, each measuring 2 m in length, 0.5 m in width, and 10 mm in thickness, tightly adhered to the aluminum load-bearing beams. The bridge deck is covered with 2 mm thick sandpaper to increase friction and simulate an actual bridge surface. When a model vehicle weighing 22,550 g passes over the bridge, the tested natural frequency of the bridge is 3.54 Hz, and the vehicle-to-bridge mass ratio is 20%. The specific dimensions and layout of the model bridge are shown in Fig. 7.

Figure 7
figure 7

Model bridge.

To capture real-time visual data of the test vehicle on the model bridge and the corresponding structural displacement response, a SONY FDR-AX700 camera, mimicking a bridge-mounted traffic surveillance camera, was employed. The experimental setup also utilized contact-type LVDT (Linear Variable Differential Transformer) sensors to achieve real-time structural displacement measurement and vehicle load identification on the model bridge. The specific experimental arrangement is illustrated in Fig. 8.

Figure 8
figure 8

Experimental setup.

During the experiment, the camera and tripod were positioned at the right end of the model bridge, angled downwards to capture the bridge deck. This setup simulates a real traffic surveillance camera capturing vehicle information and bridge displacement data. To verify the accuracy of bridge displacement measurements based on video imagery, a contact-type LVDT was installed on the lower edge of the aluminum I-beam at the mid-span section. Data from the LVDT was transmitted to a computer for analysis using the DH5902N rugged data acquisition system.

The image acquisition device used in the experimental setup is the SONY FDR-AX700, simulating a traffic surveillance camera on the bridge. This camera is capable of long-duration video image capture with high clarity. It is equipped with a Zeiss Vario-Sonnar T* lens, with a maximum aperture range of F2.8 to F4.5 and an equivalent 35 mm focal length range of 29 mm to 348 mm. The sampling frequency was set to 100 Hz during the experiment. The specific model is shown in Fig. 9, and its parameters are listed in Table 1.

Figure 9
figure 9

Camera and lens models.

Table 1 Camera parameters.

Visual recognition markers were placed on the left and right sides of the bridge deck and along the centerline, as shown in Fig. 10. These markers were distributed along the longitudinal sections of the bridge at 1/4, 1/2, and 3/4 of the span, totaling nine visual recognition markers. The three markers at the 1/4 span form Measurement Point 1, the three markers at the 1/2 span form Measurement Point 2, and the three markers at the 3/4 span form Measurement Point 3. These visual recognition markers serve as both aids for the computer to visually identify and calculate bridge displacement and as references to help the computer determine the vehicle's position on the bridge.

Figure 10
figure 10

Distribution of measurement points.

During the experiment, a contact-type LVDT was used to collect displacement data from the load-bearing beam in the middle of the model bridge span for comparison with the image-based method. This sensor features high sensitivity, high resolution, and strong anti-interference capabilities. The data acquisition system used in the experiment is the DH5902N rugged data acquisition and analysis system, set to a sampling frequency of 100 Hz (the same as the image sampling frequency). The accompanying DHDAS software was used for real-time acquisition and analysis of the test signals.

Two model vehicles with different weights were used for loading during the experiment. The first model vehicle is a 3-axle, 6-wheel dump truck model with dimensions of 52 cm × 18 cm × 27 cm (length × width × height) and an unladen weight of 1510 g. It operates in a towed manner with a controlled speed of 0 to 0.4 m/s. The second model vehicle is also a 3-axle, 6-wheel dump truck model with dimensions of 28 cm × 9 cm × 14.5 cm (length × width × height) and an unladen weight of 455 g. It also operates in a towed manner with a controlled speed of 0 to 0.4 m/s. The specific parameters are shown in Table 2.

Table 2 Technical parameters for data acquisition equipment and model vehicles.

The experiment utilizes a calibration vehicle with a known weight of 22,550 g traveling on different lanes of the model bridge deck. Simultaneously, both the camera and the LVDT collect images of the bridge structure and dynamic displacement data. The displacement data from both methods are used to fit displacement influence lines and calculate the displacement influence surface. After identifying the influence surface of the model bridge, the weighted model vehicles are used to conduct driving tests under different conditions. The camera captures surveillance images of the model bridge deck, and the LVDT along with the dynamic data acquisition and analysis system collects the displacement of the bridge structure during the experiments. By obtaining vehicle position information and structural response data, a load action calculation matrix is established using the influence surface data identified by the calibration vehicle to solve for vehicle loads.

The experiment employs model vehicles carrying additional weights to conduct load identification on a model bridge. Two main scenarios are set to validate the accuracy of vehicle load identification based on oblique image data and explore factors affecting load identification. All load identification experiments are conducted using the displacement influence surface identified by a calibration vehicle with a weight of 22,550 g. To obtain a complete displacement influence surface of the model bridge, the calibration vehicle is driven back and forth in different lanes, and the entire loading process is recorded on video.

Scenario one: single vehicle bridge crossing load identification

To verify the accuracy of identifying loads of vehicles with different weights using the imaging method and to examine the impact of vehicle speed on load identification accuracy, a single model vehicle with different weights is driven in different lanes on the model bridge deck.

Scenario two: multi-vehicle (two vehicles) bridge crossing load identification

To further verify the feasibility of identifying loads of multiple vehicles crossing the bridge using the imaging method and to explore the impact of vehicle distribution on load identification, two model vehicles are driven at a constant speed on the bridge deck.

Experimental results and analysis

Because the displacement response is most pronounced and distinctive at the mid-span of simply supported bridges, this paper uses the displacement response measurement results of the mid-span section of the model bridge as an example for validation. Taking the calibration test (using a 22,550 g calibration vehicle traveling on the bridge deck) as an example, the displacement response at the mid-span section (Measurement Point 2) is obtained.

After capturing the video images under different test conditions, the pixel displacement responses at various points of the mid-span section are obtained using the target recognition and tracking algorithm described in Chapter 2. The displacement curves are then decomposed and smoothed using the LOWESS algorithm described in Section "Decomposition of bridge displacement". The comparison between the results obtained using the image-based method and the contact-type LVDT measurements is shown in Fig. 11.

Figure 11
figure 11

Structural displacement–time curve.

Comparing the results with the contact-type LVDT measurements (blue lines in Fig. 11), the displacement curves obtained by the two methods exhibit similar characteristics but also show some differences. These differences may arise from various factors, with the primary reasons being incomplete elimination of image noise and measurement calculation errors in the pixel scale parameters. To quantify the error in detecting bridge displacement using the image-based method, the displacement measured by the LVDT is taken as the reference. The absolute peak value error (APVE) and the normalized root mean square error (NRMSE) are used to evaluate the displacement error based on video image data. The calculation formulas are as follows:

$$APVE = \frac{{|\max (|\psi_{i} |) - \max (|\omega_{i} |)|}}{{\max (|\omega_{i} |)}} \times 100\%$$
(30)
$$NRMSE = \frac{{\sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {(\psi_{i} - \omega_{i} )^{2} } } }}{{\max (\omega_{i} ) - \min (\omega_{i} )}} \times 100\%$$
(31)

where \(\psi_{i}\) represents the displacement measured by LVDT, and \(\omega_{i}\) represents the displacement calculated based on video image data.
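For completeness, the two error metrics can be computed directly from Eqs. (30)–(31), as in the short sketch below, where \(\psi\) and \(\omega\) are assumed to be aligned displacement samples of equal length.

```python
import numpy as np

def apve(psi, omega):
    """Absolute peak value error (%), Eq. (30); psi = LVDT reference, omega = image-based result."""
    return abs(np.max(np.abs(psi)) - np.max(np.abs(omega))) / np.max(np.abs(omega)) * 100.0

def nrmse(psi, omega):
    """Normalized root mean square error (%), Eq. (31)."""
    return np.sqrt(np.mean((psi - omega) ** 2)) / (np.max(omega) - np.min(omega)) * 100.0
```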

In Condition 1, the maximum displacement at the mid-span section measured by both methods, along with the calculated errors, is visualized in Fig. 12, which intuitively displays the accuracy of measuring bridge displacement based on video image data. The bar chart is divided into two halves for each condition number: the left half of each bar represents the NRMSE of the displacement detection results, and the right half represents the APVE. Each condition number includes displacement information from the three measurement points at the mid-span.

Figure 12
figure 12

Displacement measurement deviation bar chart.

In Fig. 12, the APVE and NRMSE errors in displacement measurement are both within 10%, which may be attributed to residual image noise or camera vibrations. From the comparative analysis in the figure, it is evident that the NRMSE error values are generally higher than the APVE error values. This is because NRMSE is calculated over the entire sampling signal, where the calculation errors for small deformation signals are relatively larger compared to those for large deformation signals. As a result, when computed overall, the NRMSE error value is typically higher than the APVE error value, which only considers peak values.

After obtaining the time-history displacement curves for each measurement point from the calibration test, the displacements corresponding to the measurement points for vehicles traveling in different lanes are converted from the time domain to the distance domain. This conversion provides the displacement influence lines for each lane at the respective measurement points. Lateral cubic interpolation is then performed on the influence lines at different lateral positions to obtain the displacement influence surface for each longitudinal measurement point on the bridge. The displacement influence surface results for the 3/4 section measurement point are shown in Fig. 13.

Figure 13
figure 13

3/4 section displacement influence surface.

The morphology of the influence lines can visually reflect the relationship between vehicle loads and bridge structural responses. Taking the left measurement point at the 3/4 section (Fig. 13a) as an example, its displacement influence line shows that when the model vehicle travels in the left lane on the bridge, it reaches a peak value near the 3/4 section. When the model vehicle travels in the right lane, the displacement influence is smaller compared to the former case. These results in the laboratory environment validate that the bridge displacement influence lines calibrated based on image data not only reflect the longitudinal mechanical relationship between structural response and vehicle load but also indicate the transverse mechanical response of the bridge. This creates conditions for the feasibility of estimating vehicle loads on multi-lane bridges.

Condition 1: single vehicle load estimation

To verify the accuracy of vehicle load identification based on the image method for vehicles of different weights crossing the bridge, as well as to assess the impact of vehicle speed on load identification accuracy, a single model vehicle with different weights was driven on different lanes of the model bridge deck. The experimental results are shown in Table 3:

Table 3 Single vehicle load estimation results.

Overall, the non-contact load detection method based on image data shows an average calculation deviation of 7.0% for single vehicle crossing loads. The maximum deviation is 14.2%, the minimum deviation is 0.6%, and 79.2% of the deviations are within 10%. The vehicle load calculated based on the bridge response measured by contact sensors shows an average deviation of 4.7%, with 91.3% of the deviations within 10%.

Under different speeds, the non-contact image-based load detection method achieves better load identification accuracy at lower vehicle speeds compared to higher speeds. This method is more sensitive to vehicle speed compared to the load detection method based on contact sensors. The reason for this difference may be that higher speeds cause greater impact loads on the bridge, resulting in stronger vibrations. These vibrations make it difficult for the image denoising algorithm to completely eliminate the noise, affecting the accuracy of displacement measurements.

Figure 14 shows the visualization of the load calculation results for Condition 1. In the figure, the gray diagonal line represents the actual vehicle load, the red dots represent the vehicle loads estimated based on image data under different loading conditions, and the blue dots represent the vehicle loads detected by contact sensors.

Figure 14
figure 14

The identification results of single-vehicle load.

From Fig. 14, it can be observed that the results of both load monitoring methods are generally close to the actual vehicle load values represented by the gray diagonal line. Comparing the load identification results of the two methods, the accuracy of the non-contact image-based load detection method is somewhat lower than that of the contact sensor-based load detection method. However, the image-based method still provides a reasonably good estimate of the vehicle loads crossing the bridge without requiring additional hardware. This demonstrates its potential to meet engineering practical needs by utilizing the existing traffic surveillance camera systems on the bridge.

Condition 2: multiple vehicle load estimation

To further verify the feasibility of identifying multiple vehicle loads crossing the bridge using the image-based method, and to explore the impact of vehicle distribution on load identification, two model vehicles were driven at a constant speed on the bridge deck. The parameters for Condition 2 are shown in Table 4.

Table 4 Condition 2 loading parameters.

In this condition, test cases 1–13 involve multiple vehicles traveling in the same direction, while test cases 14–17 involve two vehicles traveling in opposite directions.

After conducting 17 different loading scenarios, the multi-vehicle load estimation results are shown in Table 5.

Table 5 Load recognition results using traffic image under multiple vehicle loading.

According to Table 5, it can be observed that the spacing between vehicles has a significant impact on load identification accuracy when vehicles are traveling in the same direction at a constant speed. With an average vehicle spacing of 2.75 m (1/2 of the bridge span length), the maximum deviation in total load identification on the bridge is 7.1%, indicating a relatively ideal recognition performance. To some extent, it is also possible to estimate the individual load of each vehicle with reasonable accuracy, with average identification deviations of 4.1% for Vehicle 1 and 7.9% for Vehicle 2. However, Vehicle 2 shows a larger recognition error, with a maximum deviation of up to 16.6%. With an average vehicle spacing of 1.38 m (1/4 of the bridge span length), the deviation in total load identification on the bridge remains stable within 16%. However, the estimation of individual vehicle loads is not accurate. This result further confirms the influence of vehicle spacing on load identification. However, in terms of the accuracy of total load identification on the bridge, especially with smaller vehicle spacings, the recognition is relatively stable compared to the identification of individual vehicle loads. Similarly, it can be seen that when vehicles travel in opposite directions, the deviation in identifying the total vehicle load on the bridge is 13.6%. Comparing this to the error when vehicles travel in the same direction, it can be observed that opposite direction travel affects the load identification accuracy, but the overall impact is limited. It is a secondary factor affecting the identification accuracy.

As shown in Fig. 15, the non-contact image-based load detection method and the contact-based LVDT load detection method exhibit similar patterns, both reflecting the significant influence of vehicle spacing on load identification. However, the contact-based LVDT method generally exhibits higher accuracy, particularly in terms of the stability of total load identification on the bridge. This is partly due to the higher resilience of contact-based sensors to environmental noise such as wind, vibration, temperature, and light. In non-contact measurements, even minor disturbances to the camera pose can cause significant errors in the images. Although this study uses a dynamic-static separation method for camera disturbance correction, the errors are not entirely eliminated. Additionally, the measurement accuracy of non-contact methods is limited by the camera resolution.

Figure 15. Comparison of load identification deviation.

The similarity in the patterns of the two methods' results suggests that, although closely spaced vehicles can still be differentiated in the images, the interval between their passages over the bridge is short and their individual contributions overlap in the displacement record. The bridge displacement curve therefore cannot clearly resolve the effect of each vehicle on the structural response, which reduces the accuracy of individual vehicle load identification. This limitation is inherent to the displacement response characteristics of the bridge structure.

Field experimental study on bridges

Overview of field experimental study on bridges

Douzibei Cross-Line Bridge is located at the Longjiawan Tunnel entrance on Caiyuan Road, Chongqing. Figure 16 shows the location map of Douzibei Cross-Line Bridge, while Fig. 17 presents the plan and elevation views of the bridge. The total length of the bridge is 311 m, with four lanes for two-way traffic. The second span of the approach bridge, located on the straight section of the bridge and featuring a simply supported beam structure, was chosen as the subject of this study.

Figure 16. Location map of Douzibei overpass bridge. Satellite map created using ArcGIS 3.1.5.41833 (GIS Mapping Software, Location Intelligence & Spatial Analytics | Esri).

Figure 17. Elevation and plan views of Douzibei overpass bridge.

The research in this paper focuses on load estimation methods based on traffic video data. However, due to the specific nature of traffic surveillance and associated limitations, the research team was unable to access existing traffic monitoring video data. Therefore, in this experiment, cameras were installed on lampposts to simulate the collection of bridge surface images by traffic surveillance cameras. To minimize the influence of camera vibration on the experiment, the camera was positioned on the lamppost at Pier 1, which is relatively close to the middle of the bridge span, resulting in smaller camera displacements caused by bridge vibrations. The installation position of the camera and the resulting image capture are shown in Fig. 18.

Figure 18. Layout of photographic equipment.

Due to the speed limits and restrictions on truck traffic on Douzibei Cross-Line Bridge, the bridge primarily accommodates passenger vehicles, including private cars, taxis, and various types of passenger buses. Because small cars are relatively light, their effect on the bridge structure is minimal and their loads are difficult to identify from the structural displacement response, so they are disregarded in this study. The experiment focuses on load estimation for city buses and intercity coaches passing over the bridge.

City buses passing over the bridge are mainly HengTong buses. Table 6 lists one model of new-energy bus, with a total weight of approximately 17 tons, a front-to-rear axle distance of 5050 mm, and a maximum speed of 69 km/h; speeds on the test bridge range from 30 to 40 km/h. Besides city buses, intercity coaches and commuter buses also use the bridge; for example, Yutong buses have a total weight of approximately 12.68 tons and an axle distance of about 5700 mm. Because of the high traffic volume and to avoid disrupting city traffic, the experiment was conducted without closing the bridge, so there was interference from small cars. This experiment therefore focuses on estimating the loads of heavy vehicles.

Table 6 Vehicle parameters.

After all experimental equipment and instruments had been set up and calibrated, the camera parameters were measured. The camera's shooting tilt angle was recorded, and the vertical height from the camera's optical center to the bridge deck was measured as h = 3120 mm. The horizontal distance from the optical center to the mid-span measurement point was measured as \(d_{M}\) = 20 m. Image data was then collected at different time intervals.
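For orientation, the sketch below shows one simplified way such geometric parameters can be turned into an approximate millimetre-per-pixel scale for the mid-span target. The pinhole-model formula, the assumed focal length in pixels, and the tilt angle derived from the measured geometry are illustrative assumptions, not the calibration actually used in the test.

```python
import math

def mm_per_pixel(distance_mm: float, focal_px: float, tilt_rad: float = 0.0) -> float:
    """Approximate millimetres represented by one pixel for a small vertical
    motion of a target at `distance_mm` from the optical centre, viewed with a
    camera of focal length `focal_px` pixels. The cosine term roughly accounts
    for the shooting tilt angle; this is a simplified pinhole-model estimate,
    not the paper's full calibration."""
    return distance_mm / (focal_px * math.cos(tilt_rad))

# The 20 m horizontal distance and 3.12 m height come from the field setup;
# the 2800-pixel focal length is an assumed value for illustration only.
line_of_sight_mm = math.hypot(20_000.0, 3_120.0)   # camera optical centre to mid-span target
tilt = math.atan2(3_120.0, 20_000.0)               # tilt implied by the measured geometry
scale = mm_per_pixel(line_of_sight_mm, focal_px=2800.0, tilt_rad=tilt)
print(f"approx. scale: {scale:.2f} mm/pixel")
```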

After securing the camera, video images of the calibration test and the entire process of vehicles crossing the bridge at various times were obtained. A screenshot from the video is shown in Fig. 19.

Figure 19. Bridge deck photographs at different time intervals.

After preparing for the experiment, periods with fewer vehicles were chosen for testing. Video recording began as a bus (approximately 17 tons) approached the bridge, capturing the entire process of the vehicle crossing the bridge. This served as the calibration test to obtain the full response of the load crossing the bridge and to fit the bridge displacement influence line. After calibrating the bridge displacement influence line, traffic images of the bridge deck were recorded multiple times at different periods to obtain dynamic displacement data of the bridge. This data was then used to estimate the loads of the vehicles crossing the bridge.
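As a rough illustration of this calibration step, the sketch below fits a discretised displacement influence line from one crossing of a vehicle of known weight, treating the bus as a single equivalent load and assuming its position along the span has already been extracted from the video. The node discretisation, linear interpolation between nodes, and function names are assumptions made for the sketch.

```python
import numpy as np

def fit_influence_line(position_m, static_disp_mm, known_load_kn, span_m, n_nodes=21):
    """Least-squares fit of a discretised influence line from one calibration
    crossing: static_disp_mm[k] ~= known_load_kn * IL(position_m[k]).
    The influence line is represented by ordinates at n_nodes equally spaced
    points with linear interpolation between them (a simplifying assumption)."""
    nodes = np.linspace(0.0, span_m, n_nodes)
    A = np.zeros((len(position_m), n_nodes))
    for k, x in enumerate(position_m):
        # Spread the known load between the two nodes bracketing position x.
        j = min(max(np.searchsorted(nodes, x) - 1, 0), n_nodes - 2)
        t = (x - nodes[j]) / (nodes[j + 1] - nodes[j])
        A[k, j], A[k, j + 1] = (1.0 - t) * known_load_kn, t * known_load_kn
    ordinates, *_ = np.linalg.lstsq(A, np.asarray(static_disp_mm), rcond=None)
    return nodes, ordinates  # influence-line ordinates, mm per kN
```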

Experimental results and analysis

After capturing the video images during different time periods, the target recognition and tracking algorithm described in Chapter 2 was used to obtain the pixel displacement response of each measurement point at the mid-span section. As an example, a simulated traffic surveillance video was captured on January 30, 2023, at 19:49, with a duration of 24.52 s. Partial screenshots of the video are shown in Fig. 20, displaying the image information captured by the camera when a bus of route 207 arrived at the mid-span of the second span of Douzibei Bridge.
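For reference, a minimal NumPy sketch of this tracking idea is given below: pixel-level localisation by mean absolute difference, followed by a grey-level centroid refinement in a small neighbourhood. The function names, window sizes, and the centroid-based refinement are illustrative assumptions rather than the exact implementation used in the study.

```python
import numpy as np

def mad_match(frame: np.ndarray, template: np.ndarray, search: tuple) -> tuple:
    """Pixel-level match: slide `template` over the search window
    (r0, r1, c0, c1) of `frame` and return the top-left corner with the
    minimum mean absolute difference."""
    r0, r1, c0, c1 = search
    th, tw = template.shape
    best, best_rc = np.inf, (r0, c0)
    for r in range(r0, r1 - th + 1):
        for c in range(c0, c1 - tw + 1):
            mad = np.mean(np.abs(frame[r:r + th, c:c + tw].astype(float) - template))
            if mad < best:
                best, best_rc = mad, (r, c)
    return best_rc

def subpixel_centroid(frame: np.ndarray, r: int, c: int, half: int = 3) -> tuple:
    """Sub-pixel refinement: grey-level centroid of a (2*half+1)^2 neighbourhood
    around the approximate target centre (r, c)."""
    patch = frame[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    rows, cols = np.mgrid[r - half:r + half + 1, c - half:c + half + 1]
    w = patch.sum()
    return (rows * patch).sum() / w, (cols * patch).sum() / w

# Illustrative usage: rc = mad_match(frame, tmpl, (100, 200, 300, 420))
# centre = (rc[0] + tmpl.shape[0] // 2, rc[1] + tmpl.shape[1] // 2)
# sub_r, sub_c = subpixel_centroid(frame, *centre)
```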

Figure 20. Image information captured by the camera.

Due to the nighttime conditions, LED lights were installed as the reference points for structural displacement measurement and camera calibration to facilitate machine vision recognition and tracking. In Fig. 20, the structural displacement measurement points were arranged on the railing at the intersection of the mid-span section's motor vehicle lane and pedestrian pathway, while the camera pose calibration points were positioned outside the bridge.

In the pixel displacement time series shown in Fig. 21, the pink curve is the pixel displacement response at the structural displacement measurement point, and the green curve is the displacement series at the camera pose correction point. The two curves have very similar shapes. Since the camera pose correction points are placed on stable ground outside the bridge, their apparent motion can only come from the camera itself, so this similarity indicates that most of the information in both curves is due to the camera's own disturbances. Consequently, the bridge structural response is almost entirely obscured by the camera disturbance signal, and useful structural response information cannot be obtained directly from the measurement point signal.

Figure 21. Flowchart of bridge displacement signal processing.

After correcting for the camera disturbances and separating the noise, the Initial Displacement Correction curve is obtained. Although significant vibration signals and noise remain, the structural response under load can be observed in this curve (the feature around 16 s in the time history, corresponding to the moment shown in Fig. 20).

Finally, the LOWESS algorithm is employed to separate the vehicle-induced dynamic displacement of the bridge, yielding the vehicle-induced static displacement response signal of the bridge.
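Conceptually, the correction and separation steps amount to subtracting the reference-point (camera disturbance) signal and then smoothing the result so that only the slowly varying, vehicle-induced static component remains. A minimal Python sketch using the LOWESS implementation in statsmodels is shown below; the simple subtraction-based correction and the smoothing fraction are assumptions for illustration, not the exact processing chain of the study.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def vehicle_static_displacement(t, meas_disp, ref_disp, frac=0.08):
    """Remove camera self-motion and the bridge's dynamic component.

    meas_disp : displacement series at the structural measurement point (mm)
    ref_disp  : displacement series at the off-bridge camera-correction point (mm)
    frac      : LOWESS window as a fraction of the record length (assumed value)
    """
    # Dynamic-static separation of the camera disturbance: the reference point
    # sits on stable ground, so its apparent motion is attributed to the camera.
    corrected = np.asarray(meas_disp) - np.asarray(ref_disp)
    # LOWESS keeps the slowly varying, vehicle-induced static displacement and
    # suppresses the higher-frequency vibration and residual noise.
    static = lowess(corrected, t, frac=frac, return_sorted=False)
    return corrected, static
```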

After processing with the above algorithms, the structural response information under load can be more clearly obtained from the images. Although the final signal still contains some residual dynamic displacement of the bridge and a small amount of noise, it reflects the basic mechanical performance of the bridge without the need for additional hardware. This provides a data reference for the health monitoring of the bridge and prepares parameters for subsequent load calculations based on the displacement-influence surface relationship matrix.

Similar to the laboratory model bridge experiment, the test designates four time points \(t_{1}\), \(t_{2}\), \(t_{3}\), and \(t_{4}\) as time parameters. For each condition, the displacement amounts \(\delta\) at the measurement points are obtained from the displacement curves at the corresponding times \(t_{P}\). Additionally, the displacement influence amounts \(S\) at the corresponding positions in the displacement influence surface are obtained. These data are then substituted into the overdetermined equation system (Eq. 27) to solve for the unknown vehicle loads.
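In other words, at the sampled instants the measured static displacements \(\delta\) are related to the unknown vehicle loads through the influence-surface ordinates \(S\), and the overdetermined system \(\delta \approx S \cdot P\) is solved in a least-squares sense. A minimal sketch is given below; the matrices contain placeholder numbers, not measured values.

```python
import numpy as np

# Rows: one equation per (measurement point, time t1..t4) pair.
# Columns: one unknown load per vehicle on the deck.
# S[i, j] = influence-surface ordinate at the position of vehicle j when
#           equation i was sampled; delta[i] = measured static displacement.
S = np.array([[0.012, 0.004],
              [0.018, 0.007],
              [0.009, 0.013],
              [0.005, 0.016]])              # mm/kN, placeholder values
delta = np.array([2.11, 3.05, 2.64, 2.38])  # mm, placeholder values

loads, residuals, rank, _ = np.linalg.lstsq(S, delta, rcond=None)
print("estimated vehicle loads (kN):", loads)
```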

The test estimates the loads of 20 heavy vehicles, and the results are shown in Fig. 22.

Figure 22. Load recognition results.

Figure 22 is a visualization of the load estimation results for vehicles crossing the bridge, where the bar chart shows the load estimation error for each heavy vehicle. From the figure, it can be seen that although the accuracy of the bridge deck vehicle load estimation method based on traffic surveillance images is not yet on par with traditional contact weighing methods, it can estimate the loads of heavy vehicles crossing small and medium-sized bridges within a 40% error margin without requiring additional hardware. This can provide a useful reference for the health monitoring of short- and medium-span bridges.

At the same time, although the real bridge experiment results demonstrated that the bridge deck vehicle load estimation method based on traffic surveillance video data can efficiently obtain bridge displacement response data and vehicle load information without requiring additional hardware, some influencing factors affecting the accuracy of vehicle load identification remain unresolved due to limitations such as target visibility. This vehicle load estimation method has yet to reach the weighing accuracy of BWIM. To improve monitoring accuracy and reduce monitoring errors, the following directions for further exploration and research are proposed:

a) Optimize the computer vision processing algorithms. Improve the accuracy of displacement measurement at greater distances and for smaller deformations. Further explore camera disturbance correction methods, and study more effective image enhancement and denoising algorithms to eliminate the impact of factors such as weather and air quality on image-based displacement detection.

b) Explore more accurate bridge influence surface calibration methods based on photogrammetry. The influence surface calibration method proposed in this paper is based on lateral cubic interpolation of the influence lines obtained from three measurement points on the bridge cross-section (a brief interpolation sketch is given after this list). For wider bridges, accurately calibrating the influence surface would require additional measurement points, which increases the computational complexity and extends the algorithm's running time. More detailed research is needed on methods that can efficiently and accurately calibrate the influence surface of wide bridges and further reduce errors.

c) When vehicle loads are identified from the bridge's displacement response, accurate identification of individual loads is achieved only when the vehicle spacing is large. When the spacing is small, the accuracy of individual vehicle load identification decreases significantly, and only the total load on the bridge deck can be estimated to a certain extent. Further exploration is needed to accurately obtain the load of each vehicle, including axle loads, under a wider range of conditions.
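As a reference for item b) above, the sketch below illustrates the kind of lateral cubic interpolation the current calibration uses to extend influence lines measured at a few transverse offsets into a full influence surface. The grid layout and function signature are illustrative assumptions, not the exact implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def influence_surface(span_x, y_points, influence_lines, y_grid):
    """Build an influence surface Z(x, y) by cubic interpolation across the
    bridge width from influence lines measured at a few transverse offsets.

    span_x          : (n,) longitudinal positions along the span
    y_points        : (m,) transverse offsets of the measurement points (m >= 3)
    influence_lines : (m, n) influence-line ordinates, one row per point
    y_grid          : (k,) transverse positions at which the surface is wanted
    """
    surface = np.empty((len(y_grid), len(span_x)))
    for j in range(len(span_x)):
        # Cubic spline across the width at each longitudinal station.
        cs = CubicSpline(y_points, influence_lines[:, j])
        surface[:, j] = cs(y_grid)
    return surface  # surface[i, j] = ordinate at (span_x[j], y_grid[i])
```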

Conclusions

To meet the demand for monitoring vehicle loads on a large number of medium and small bridges, this study takes economy, practicality, and efficiency into account and makes full use of the existing traffic surveillance systems on bridges. It constructs a bridge displacement measurement algorithm based on video images and establishes a spatial–temporal relationship model between structural displacement, vehicle load, and load distribution to solve for vehicle loads. Multiple loading conditions were investigated through model bridge experiments and engineering field tests, confirming the feasibility of the method. The main conclusions are as follows:

(1) Based on the video images from the bridge traffic surveillance, the target measurement points are tracked and located at the pixel level using the mean absolute difference algorithm. Additionally, a surface fitting algorithm is employed to calculate the sub-pixel level positioning of the target by analyzing the grayscale centroid in the local neighborhood. The LOWESS algorithm is utilized to separate the vehicle-induced static displacement from the bridge displacement, thereby improving the accuracy of vehicle load identification. By calibrating the static displacements induced by vehicles on the bridge, the displacement influence lines of the measurement points under moving loads are established, further constructing a bridge displacement influence surface model that reflects the mechanical effects of the bridge load distribution on the structure. Based on the displacement-influence surface matrix, a spatial–temporal relationship model is developed from both the spatial distribution and temporal evolution perspectives of structural displacement, vehicle load, and load distribution, enabling the identification of vehicle loads at any position on the bridge surface.

(2) Experimental studies were conducted to estimate vehicle loads on a scaled model bridge based on video images. By capturing the bridge surface using cameras, measurements of bridge displacement responses and load detection under different working conditions were achieved. The displacement measurements exhibited an average percentage variation error (APVE) and a normalized root mean square error (NRMSE) within 10%. The average calculation deviation for single-vehicle load passing was approximately 7%, while the calculation deviation for total bridge vehicle load remained stable within 16%. However, the accuracy of detecting multiple-vehicle loads was highly sensitive to vehicle spacing.

(3) Experimental studies were conducted to apply the bridge load estimation method based on traffic surveillance images in practical engineering scenarios. The practical implementation of the vehicle load estimation method using traffic images yielded an average calculation deviation of approximately 18% for heavy vehicle loads ranging from 12 to 18 tons. This method allows for continuous monitoring of bridge loads without the need for additional hardware, enabling the estimation of load distribution on the structure. It provides valuable bridge load information for bridge health monitoring purposes.