1 Introduction and background

Health monitoring of structures and mechanical systems has become a viable tool to help owners make informed maintenance and repair decisions. Several approaches have been developed to extract information from the measured vibrational characteristics of structures; these fall into a category known as “vibration-based structural health monitoring (SHM)” [1–14]. These methods essentially utilize natural frequencies and mode shapes acquired from different types of physical sensors such as accelerometers. These predominantly used sensors suffer from disadvantages such as time-consuming installation procedures, data acquisition requirements, the high number of sensors required, and limited accessibility on the target structure. Alternatively, optical devices such as laser Doppler vibrometers, which can be applied remotely, have been used but are expensive compared to the digital cameras that we use for our measurements [15]. Digital video cameras in conjunction with image processing techniques have also been used for this and other SHM purposes as they offer an inexpensive yet promising alternative. Digital image correlation (DIC) techniques along with other matching algorithms have been employed to monitor displacements with video cameras or to track certain targets through time [10, 16–18]. Zaurin and Catbas [19–21] applied image processing techniques to use video cameras as loading sensors for bridges and defined a so-called unit influence line (UIL) index as a measure of health in bridges [19]. Elgemal et al. [22] included computer vision in an integrated system to create a “decision support system” for bridges and other lifelines. Notably, several recent studies have addressed the measurement of displacements of civil structures using video cameras [23–31].

In an attempt to estimate the fundamental natural frequency of vibration of a structure, we earlier proposed a methodology based on virtual visual sensors (VVS) for video analysis using Eulerian-based coordinates [32]. Our methodology is based on the original idea presented in [33] and further applied in [34]. The essence of the original idea is that single pixels can carry essential information about minute changes of objects that are not visible to the naked eye, but can be made visible by a technique called Eulerian motion magnification [24]. In our work we have shown experimentally that the change of intensity in certain pixels of a digital video can be related to the natural frequency of a vibrating structure. It should be noted that this approach is by its very nature different from what is used in feature tracking or block matching algorithms such as DIC: we do not calculate displacements between consecutive frames but simply monitor the change of intensity of a selected pixel (or patch of pixels) with fixed (or Eulerian) coordinate(s), which keeps the computational effort very low.
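The basic idea of reading a frequency from a pixel at fixed coordinates can be sketched as follows. This is a minimal illustration, not the implementation used in [32]: it assumes the video is already available as a NumPy array of grayscale frames with shape (frames, rows, columns), and the function names `pixel_intensity` and `dominant_frequency` as well as the synthetic 5 Hz test video are hypothetical.

```python
import numpy as np

def pixel_intensity(frames, row, col):
    """Intensity time history of one pixel at fixed (Eulerian) coordinates."""
    return np.asarray(frames, dtype=float)[:, row, col]

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest peak in the one-sided amplitude spectrum."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()            # remove the constant offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# Synthetic stand-in for a video (assumed values): 10 s at 120 fps,
# 4 x 4 pixels whose brightness oscillates at 5 Hz.
fs = 120.0
t = np.arange(int(10 * fs)) / fs
brightness = 128.0 + 40.0 * np.sin(2 * np.pi * 5.0 * t)
frames = brightness[:, None, None] * np.ones((1, 4, 4))

f_peak = dominant_frequency(pixel_intensity(frames, 2, 2), fs)
```

No displacement is computed at any point; only the intensity history of the chosen pixel is transformed to the frequency domain, which is why the computational cost stays low.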

By employing the basic methodology introduced in our initial work [32] we were unable to observe all natural frequencies of a lab-scale three-story structure (see Fig. 1a). This was partially because of the low signal-to-quantization noise ratio (SQNR) due to the limited and usually uncontrollable range of change in intensity. One solution may be to employ gradient pattern targets, which are mounted to the structure at locations of interest, to expand the range of intensity values and thus reduce the quantization noise. As we discuss in Sect. 3.1, non-linear intensity functions produce multiple harmonic frequencies in the frequency domain. We thus decided to evaluate grayscale linear gradient pattern targets (LGPTs) with a theoretical range of intensities from 0 (= black) to 255 (= white) (see Fig. 2a). The idea is to introduce a linearly varying background that avoids non-linear or impulse-like behavior. In Sect. 3.2 we discuss another solution to increase the SQNR, namely oversampling in time. The idea of using a patch of pixels rather than a single pixel to alleviate challenges related to occlusion due to large displacements is discussed in Sect. 3.3. Section 3.4 presents the basic idea of LGPTs and in Sect. 3.5 we discuss two efficient ways to reduce their noise. In Sect. 4 we present our laboratory experiments and discuss the results from a number of different digital cameras used on two different structural systems. The results from monitoring an in-service pedestrian bridge during an impact test are discussed in Sect. 5. Finally, in Sect. 6 we present our conclusions and propose further work.

Fig. 1 Experimental test setups for: (a) three-story structure and (b) steel beam [35]

Fig. 2 (a) Sample linear gradient pattern targets (LGPT), (b) intensity values captured by the camera and linear curve fit, (c) calculated noise, and (d) the histogram of the noise [35]

2 Motivation and objectives

The objectives of our study were to develop strategies for Eulerian-based VVS that will (a) minimize non-linear effects and (b) improve the signal-to-noise ratio (SNR) of the recorded data to enable the detection of higher natural frequencies of vibration, which was not possible previously. We achieved this by introducing linear gradient pattern targets (LGPTs) that are mounted to the structure and by developing a number of signal processing steps. We demonstrate that our strategies work with a set of laboratory experiments as well as a field test on a real bridge.

3 Description of theory and proposed methods

3.1 Theoretical considerations

As mentioned previously, non-linearity in the intensity function in the spatial domain, I(x) will result in higher harmonics in the frequency domain [32]. This can be shown by the following theoretical relationship between vibrations and measured intensity:

$$ I\left( \omega \right) = F\left( I\left( x,t \right) \right) = 2\pi \left( \frac{L}{2} \right)^{n} \sum\limits_{k = 0}^{n} \left( -1 \right)^{k} \binom{n}{k} \delta \left( \omega - \left( n - 2k \right)\omega_{0} \right) \tag{1} $$

where L is the amplitude of vibration, n is the degree of non-linearity, k is the counter from 0 to n, ω is the radial frequency, \( \omega_{0} \) is the object’s radial frequency, and δ is the Dirac delta function. For example, if the intensity function in the spatial domain, I(x), is a third degree polynomial, \( I(x) = x^{3} \), as shown in Fig. 3a, and the displacement follows a sinusoidal function with a frequency of 1 Hz (Fig. 3, second row), the observed intensity response in the time domain, I(t), is not a sinusoidal function, as illustrated in Fig. 3a, third row. In the absence of noise, the Fourier transform of the intensity values has two peak frequencies at 1 and 3 Hz, as shown in Fig. 3a, fourth row, which verifies Eq. (1). On the other hand, if I(x) is linear, the resulting I(t) is sinusoidal, as shown in Fig. 3b. A highly non-linear case such as \( I(x) = x^{99} \) results in an impulsive response of the intensity, I(t), in turn leading to multiple peaks in the frequency domain as shown in Fig. 3c. This illustrates the effect of occlusion discussed earlier in this paper and in [32].
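The cubic and linear cases described above can be reproduced numerically with a short sketch. It assumes a unit-amplitude sinusoidal displacement at 1 Hz, a 100 Hz sampling rate, and a 10 s record; the function names and the detection threshold are illustrative choices, not part of the original study.

```python
import numpy as np

def intensity_spectrum(degree, f0=1.0, fs=100.0, duration=10.0):
    """One-sided amplitude spectrum of I(t) = x(t)**degree, x(t) = sin(2*pi*f0*t)."""
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t)             # sinusoidal displacement
    intensity = x ** degree                    # polynomial intensity function I(x)
    amplitude = 2.0 * np.abs(np.fft.rfft(intensity)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amplitude

def peak_frequencies(freqs, amplitude, threshold=0.01):
    """Frequencies whose spectral amplitude exceeds the (assumed) threshold."""
    return freqs[amplitude > threshold]

freqs, amp = intensity_spectrum(3)
cubic_peaks = peak_frequencies(freqs, amp)     # components at f0 and 3*f0

freqs, amp = intensity_spectrum(1)
linear_peaks = peak_frequencies(freqs, amp)    # component at f0 only
```

For n = 3, Eq. (1) predicts components at \( \pm\omega_{0} \) and \( \pm 3\omega_{0} \), which is what the cubic case returns on the one-sided frequency axis.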

Fig. 3 The effect of non-linearity in the spatial domain. Rows one to four (top to bottom) show intensity in the spatial domain, I(x), displacement, d(x), observed intensity in the time domain, I(t), and the Fourier transform of the latter, respectively. Column (a) shows the case of \( I\left( x \right) = x^{3} \), (b) the linear case of \( I\left( x \right) = x \), and (c) the case of an impulsive change of intensity modeled by \( I\left( x \right) = x^{99} \) [35]

3.2 Quantization error and signal-to-noise ratio

Analog-to-digital (A/D) conversion involves two main steps: sampling in time and quantization. Errors due to quantization and their effect on the signal are a known issue addressed in the literature, e.g., in [36]. Assuming that the quantizer uniformly covers the limit values and its error is independent of the original signal, it can be deduced that the error is equivalent to an additive white noise. The white noise model can also be used with high-resolution quantization, which satisfies the independence condition. In practical signal processing, in a process called “dithering”, some random noise within the range of quantization is added to the analog signal prior to digitization to make the error independent of the signal [36].

In commercially available cameras, the quantization resolution to reflect the amount of absorbed energy in CCD sensors is usually 8 bit. However, as discussed previously in our proposed methodology, this energy (or intensity value) does not correspond to any physical quantity such as displacement or any of its derivatives. In other words, higher amounts of displacement, velocity, or acceleration do not necessarily cause a higher change of intensity. Assuming that the quantization error can be modeled as white noise, increasing the sampling frequency will decrease its amplitude in the frequency domain. It can also be shown that by doubling the sampling frequency, the power of quantization noise decreases by 3 dB. This means that by doubling the temporal sampling rate, the maximum theoretical increase in the SNR is 3 dB. The frame rates of commercially available cameras are typically 30, 60, or 120 Hz, which is reasonably sufficient for measuring frequencies in large structural systems such as bridges but may not be sufficient to detect all of the natural frequencies due to the high quantization error. High-speed cameras represent an effective yet expensive solution to this issue, which we have also shown earlier [37]. A discussion of high-speed cameras can be found in Sects. 4.2.4 and 4.3.
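The 3 dB figure can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original study: a 4.7 Hz sinusoid spanning 40 intensity levels is dithered, quantized to 8-bit values, and the mean power of the quantization error per frequency bin is compared for two frame rates over the same 60 s duration. All names and parameter values are hypothetical.

```python
import numpy as np

def quantization_noise_floor_db(fs, duration=60.0, f0=4.7, amplitude=20.0, seed=0):
    """Mean power (dB) per frequency bin of the 8-bit quantization error."""
    rng = np.random.default_rng(seed)
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    analog = 128.0 + amplitude * np.sin(2 * np.pi * f0 * t)
    # Random dither of one quantization step, added before digitization
    dither = rng.uniform(-0.5, 0.5, size=n)
    digital = np.clip(np.round(analog + dither), 0, 255)
    error = digital - analog
    spectrum = np.fft.rfft(error) / n          # amplitude-normalized spectrum
    return 10.0 * np.log10(np.mean(np.abs(spectrum[1:]) ** 2))

floor_60 = quantization_noise_floor_db(60.0)
floor_120 = quantization_noise_floor_db(120.0)
gain_db = floor_60 - floor_120   # should be near 10*log10(2), i.e. about 3 dB
```

Because the total error power is fixed by the quantization step while the number of frequency bins doubles with the frame rate, the power per bin halves, which is the origin of the 3 dB bound.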

3.3 Patch processing

As discussed earlier and visualized in Fig. 3, choosing one pixel in a digital video recorded at a comparatively low frame rate and resolution can lead to ambiguous peaks in the frequency domain, which makes the detection of higher frequencies difficult or often impossible. To solve the problem of occlusion, which produces periodic impulses in the time and frequency domains, one can choose a patch of pixels and monitor their average value through time. In other words, by choosing a patch of pixels, we virtually decrease the ratio of displacement to pixel size, which makes the change of intensity smoother. Patch processing can be applied to videos where no targets are used or combined with LGPTs as discussed in more detail in Sect. 3.5.
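A minimal sketch of patch processing is given below. It assumes the video is stored as a NumPy array of grayscale frames with shape (frames, rows, columns); the function name `patch_signal` and the synthetic scene (a sharp black-and-white edge oscillating horizontally) are hypothetical and only serve to show the smoothing effect.

```python
import numpy as np

def patch_signal(frames, top, left, size):
    """Mean intensity of a size x size patch of pixels in every frame."""
    frames = np.asarray(frames, dtype=float)
    patch = frames[:, top:top + size, left:left + size]
    return patch.mean(axis=(1, 2))

# Synthetic scene (assumed values): an edge at column 32 moving +/- 10 pixels at 2 Hz
fs, f0 = 120.0, 2.0
t = np.arange(int(5 * fs)) / fs
edge = 32.0 + 10.0 * np.sin(2 * np.pi * f0 * t)
cols = np.arange(64)
frames = np.where(cols[None, None, :] < edge[:, None, None], 255.0, 0.0)
frames = frames * np.ones((1, 64, 1))          # repeat the row to form 64 x 64 frames

single = patch_signal(frames, 32, 32, 1)       # one pixel: jumps between 0 and 255
patch = patch_signal(frames, 16, 16, 32)       # 32 x 32 patch: many intermediate levels
```

The single pixel is alternately covered and uncovered by the edge and therefore produces an impulse-like signal, whereas the patch average follows the edge position in small steps.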

3.4 Linear gradient pattern targets

In this study we propose the idea of using LGPTs mounted to locations of interest on the structure. For this case, we do need access to the structure in order to mount the targets, which may require ladders or lifts, depending on the situation. Also, the target needs to be oriented in the expected direction of motion to capture the vibration amplitudes accurately. We employed LGPTs with different sizes in our experiments as shown in Fig. 2a. The idea of these targets is to create a well-defined, linearly varying background to avoid non-linear behavior as discussed in Sect. 3.1. The criterion for the size is to match the length of the target, L, to the maximum amplitude of vibration, A. A typical cross section of an LGPT as it is captured and represented by the camera is shown in Fig. 2b. The intensity value, although designed to be linear, contains noise as shown in Fig. 2c. This noise was computed by subtracting the linear curve fit from the captured intensity curve. A histogram of the noise is shown in Fig. 2d.
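The noise computation described above (captured profile minus linear fit) can be sketched as follows. The profile used here is synthetic, with an assumed intensity range and noise level, and the function name `gradient_noise` is hypothetical.

```python
import numpy as np

def gradient_noise(profile):
    """Residual of a least-squares straight-line fit to an intensity profile."""
    profile = np.asarray(profile, dtype=float)
    x = np.arange(profile.size)                # pixel position along the target
    slope, intercept = np.polyfit(x, profile, 1)
    fitted = slope * x + intercept
    return profile - fitted, slope, intercept

# Synthetic cross section of a target (assumed values): linear ramp plus sensor noise
rng = np.random.default_rng(1)
ideal = np.linspace(20.0, 230.0, 200)
captured = np.round(ideal + rng.normal(0.0, 3.0, ideal.size))

noise, slope, intercept = gradient_noise(captured)
counts, bin_edges = np.histogram(noise, bins=20)   # data for a noise histogram
```

By construction, the residual of a least-squares line with a constant term has zero mean, so the histogram is centered on zero.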

3.5 Noise reduction strategies for LGPTs

A strategy to reduce the noise is to average the intensities of a patch of pixels on the LGPT as shown in Fig. 4a. From Fig. 2d it can be seen that the average of the noise is close to zero, so averaging the pixel values reduces the noise. Another strategy for noise reduction is to choose random pixels on the gradient and fit a linear function through them (Fig. 4b). Tracking the constant part of this linear function through time can lead to a much less noisy signal; in our lab experiments it improved the SNR by 3–6 dB. The requirement for these computationally inexpensive noise reduction techniques is that the selected pixels never leave the LGPT range during the whole vibration phase; otherwise artificial non-linear behavior is introduced. A solution to this is to employ LGPTs that consist of several patterns with different lengths as shown in Fig. 2a. The most appropriate target can then be picked after the digital video has been collected, which is one of the advantages of our approach.
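The linear regression strategy can be sketched as below. This is one possible reading of the description, not the authors' implementation: the gradient is assumed to run along the image columns, the column coordinates are centered on their mean so that the constant term is the fitted intensity at the centroid of the selected pixels, and the function name `regression_signal` and all synthetic values are hypothetical.

```python
import numpy as np

def regression_signal(frames, rows, cols):
    """Constant term of a per-frame straight-line fit through the selected pixels."""
    frames = np.asarray(frames, dtype=float)
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    samples = frames[:, rows, cols]            # shape: (n_frames, n_pixels)
    x = cols - cols.mean()                     # centered position along the gradient
    # np.polyfit fits every column of a 2-D y at once: one fit per frame
    slope, constant = np.polyfit(x, samples.T, 1)
    return constant

# Synthetic noisy gradient target (assumed values) moving +/- 3 pixels at 4.7 Hz
rng = np.random.default_rng(2)
fs, f0 = 120.0, 4.7
t = np.arange(int(10 * fs)) / fs
shift = 3.0 * np.sin(2 * np.pi * f0 * t)
col_grid = np.arange(64)
clean = 128.0 + 2.0 * (col_grid[None, None, :] - 32.0 - shift[:, None, None])
frames = clean + rng.normal(0.0, 4.0, size=(t.size, 16, 64))

rows = rng.integers(0, 16, size=10)            # ten randomly chosen pixels
cols = rng.choice(np.arange(10, 54), size=10, replace=False)
signal = regression_signal(frames, rows, cols)

# Noise remaining in the regression signal versus in a single pixel
expected = 128.0 + 2.0 * (cols.mean() - 32.0 - shift)
regression_error = np.std(signal - expected)
single_expected = 128.0 + 2.0 * (cols[0] - 32.0 - shift)
single_error = np.std(frames[:, rows[0], cols[0]] - single_expected)
```

The selected columns are kept away from the ends of the synthetic gradient so that, as required in the text, the pixels never leave the target during the motion.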

Fig. 4 Two noise reduction strategies for LGPTs: (a) patch on an LGPT to average out the noise, (b) linear regression approach: measured intensities are mapped onto a linear curve fit [35]

4 Laboratory experiments

4.1 Experimental test setup

Two laboratory experiments were performed: free vibration of a lab-scale three-story structure as shown in Fig. 1a and free vibration of a simply supported steel beam (Fig. 1b). In the first test, an initial displacement was imposed on the structure by hand. Following a sudden release, the system’s free vibration was recorded until it was damped out. The 3.6 m-long simply supported steel beam with a W15 × 87 cross section was struck with an instrumented hammer to excite structural vibrations. In both experiments, high-precision capacitive accelerometers were attached to the structures to verify the frequencies estimated from the VVS data.

4.2 Three-story structure tests

4.2.1 Cameras used

Three different cameras were used for the laboratory experiments. For the three-story structure experiment, a GoPro Hero 3 camera and a Photron UX100 (Fig. 5a, c) were used. The resolution of the GoPro camera was 1280 × 720 pixels and the frame rate was 120 fps. The Photron camera was used with 500 fps and a full resolution of 1280 × 1024 pixels to evaluate the ability of detecting higher-order frequencies with high-speed cameras. Finally, for the beam experiment, and to explore the limits of our proposed methodology, a Photron FASTCAM SA-X2 (Fig. 5b) with 5000 fps and a full resolution of 1024 × 1024 pixels was employed. It is important to note that there is a trade-off between resolution and frame rate due to the bandwidth limit of the camera hardware. Also, based on our own observations, the spatial noise power in high-speed cameras is relatively high, i.e., higher than in regular cameras. Three LGPTs with dimensions 8 × 60 mm were attached to the three different masses of the three-story structure as shown in Fig. 1a. For our analysis, we used VVS data collected from the first-story mass, as it was best suited for the size of our LGPTs.

Fig. 5 Cameras used for the laboratory experiments: (a) Photron FASTCAM UX100, (b) Photron FASTCAM SA-X2, and (c) GoPro Hero 3 [35]

4.2.2 Reference data from accelerometers

Two high-fidelity capacitive accelerometers were attached to the side at the height of masses two and three (see Fig. 1a) and sampled at 1 kHz using a high-speed data recorder. The natural frequencies computed from the acceleration data from the second and third stories were essentially the same for all of the experiments, as shown in Fig. 6a, b, respectively. The only difference in the frequency domain was that the magnitude of the peaks varied slightly. This, however, had no influence on the value of the peak frequencies. The natural frequencies of vibration were found to be 4.70, 14.0 and 20.9 Hz for both stories. It can be observed that the SNR for the second story is higher, which is due to the fact that the second and third natural frequencies of vibration have a much stronger contribution there than for the third (= top) story.

Fig. 6 Sample data from the accelerometers: (a) second story and (b) third story. The left and right columns show data in the time and frequency domain, respectively [35]

4.2.3 Results from GoPro camera

Figure 7 shows the results from the GoPro camera without the use of LGPTs. In Fig. 7a, it can be seen that the first and second peak frequencies are detectable, but at the same time there are several higher harmonics in the frequency domain, which make it difficult to choose the right natural frequency. Using a patch of 50 × 50 pixels, however, it is possible to detect all of the natural frequencies (Fig. 7b), bearing in mind that the other peak frequencies are just multiples of the first one. Although the magnitude of the third mode is not very large, it is still detectable (Fig. 7b). The duration of the signals is about 10 s, which produces a resolution in the frequency domain of 0.10 Hz.
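The stated frequency resolution follows from the spacing of the discrete Fourier transform bins. With \( f_{s} \) denoting the frame rate, N the number of frames, and \( T = N/f_{s} \) the duration of the record (symbols introduced here for illustration only):

$$ \Delta f = \frac{f_{s}}{N} = \frac{1}{T} = \frac{1}{10\ \text{s}} = 0.10\ \text{Hz} $$

The resolution therefore depends only on the duration of the record and not on the frame rate.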

Fig. 7 Data extracted from videos taken by the GoPro camera without LGPTs: (a) one pixel in the middle of the first floor and (b) a 50 × 50 patch of pixels in the middle of the first floor [35]

As proposed in Sect. 3.4, LGPTs should significantly improve the results in Fig. 7. By selecting a pixel on the gradient target of the first floor, all three natural frequencies could be recovered as is evident in Fig. 8a. Figure 8b shows the same data processed using a patch of 5 × 5 pixels on the LGPT, which reduced the noise by 2.7 dB. Figure 8c shows the data when the linear regression approach presented in Sect. 3.5 is employed. As can be observed, when LGPTs are used this processing step reduces the noise even more effectively than patch processing.

Fig. 8 Data extracted from videos taken by the GoPro camera with LGPTs: (a) one pixel on the gradient target, (b) a patch of 5 × 5 pixels on the target, and (c) linear regression applied to ten randomly chosen points on the gradient [35]

As can be observed from the inserts in Fig. 8 in the frequency domain, because of the relatively low SNR in Fig. 8a, accurate estimation of the third peak frequency is difficult. Hence, the small difference in the third peak frequency can be associated with the low SNR. As can be seen in Fig. 8b, c, this problem is resolved using patch processing and linear regression, respectively, which notably improved the SNR of the signal. It should be noted that the use of LGPTs improves the contrast and decreases quantization intervals simultaneously. Comparing the time history part of Figs. 7 and 8 shows that the range of change in intensity values is much larger when LGPTs are used. This, as previously mentioned, helps to reduce the effect of quantization noise, which is partially responsible for the missing peak in the frequency domain (Fig. 7a).

4.2.4 Results from Photron FASTCAM UX100 camera

As discussed before, high-speed cameras can help improve the SQNR. For high-speed cameras, picking a pixel at the bottom of the three-story structure without any noise reduction strategy can reveal all of the natural frequencies (Fig. 9a). Selecting a patch of pixels, on the other hand, will produce a much less noisy signal and avoid the presence of artificial peaks in the frequency domain as shown in Fig. 9b. As described before, the Photron FASTCAM UX100 was used for this experiment. The problems associated with this type of camera are limited storage, which leads to shorter recording times; limited bandwidth, which forces a sacrifice of spatial resolution for higher temporal resolution; and higher spatial noise. Also, the higher the frame rate, the better illuminated the scene must be to obtain high-quality videos. The use of LGPTs for high-speed cameras can be beneficial as well. As can be seen from Fig. 9c, although the signal is noisy, the peaks are more pronounced. Using a patch of pixels on the LGPT (Fig. 9d) reduces the noise, increases the SNR by 9 dB, and shows the peaks even more clearly. Also, linear regression can increase the SNR by almost 12 dB (Fig. 9e). This shows again that linear regression can result in a better SNR than patch processing.

Fig. 9 Data extracted from videos taken by the FASTCAM UX100 camera with and without LGPTs: (a) one pixel at the very bottom, (b) a patch of pixels at the mid-level of the first floor, (c) one pixel on the LGPT, (d) a 5 × 5 pixel patch on the LGPT, and (e) linear regression applied to ten random pixels on the LGPT [35]

4.2.5 Comparison of results

Table 1 shows the SNR for all experiments on the three-story structure. In the case of the high-speed camera, the SNR of the one-pixel signal where no LGPTs are used is artificially higher due to the exponential decay of the signals where LGPTs are used. Also, a comparison of the cases using low- and high-speed cameras with LGPTs shows an increase in the SNR from 3.4 to almost 6 dB, which is close to the theoretical bound due to the improved SQNR. It should be noted that the setup of this experiment with the high-speed camera was slightly different from the previous tests and that the duration of the signal was 7 s, which gives a resolution of around 0.14 Hz. These factors explain the slight difference in the third peak frequency between Fig. 9 and the accelerometer data.

Table 1 Comparison of SNR from three-story structure

4.3 Steel beam tests

Finally, and to explore the limits of our proposed methodology, we conducted a test on a steel beam as shown in Fig. 1b. The stimulus was provided by a hammer strike imposed at the mid-span location. In this test, as shown in Fig. 10a, b, several peak frequencies could be identified from the accelerometer data. These reference data were collected using the same high-precision capacitive accelerometer as used for the three-story structure. By monitoring a patch of pixels at the boundary of the steel beam where the gradient of the intensity is maximum (the edge), it was possible to detect several peak frequencies in agreement with the peaks measured by the accelerometer.

Fig. 10 The steel beam test results: (a) results from the FASTCAM SA-X2 camera and (b) results from one accelerometer [35]

Figure 10a shows the peak frequencies detected by the camera, while Fig. 10b shows the data from the accelerometers for comparison. As can be seen, even without the LGPT, by using a patch of pixels, several peak frequencies could be detected. The interesting point about this experiment is that it involves a continuous system where the high-frequency displacements are extremely small and completely unobservable by the naked eye. However, it was possible to identify frequencies as high as 764 Hz using our proposed VVS methodology. The resolution in the frequency domain is approximately 1 Hz.

5 Field test

To evaluate the real-world performance of our proposed approach, we conducted a field test on the Streicker Bridge (Fig. 11a): a prestressed concrete pedestrian bridge located on Princeton University’s campus in Princeton, NJ. As can be seen from Fig. 11a, the bridge has a unique design with a main span and four horizontally curved legs. The main span consists of a deck-stiffened arch. The bridge is equipped with an SHM system consisting of embedded fiber-optic sensors. The data were made available to us by Prof. Branko Glisic and allowed for a direct comparison with our measurements. The dynamic stimulus was provided in the form of a group of students jumping in unison for a few seconds at the location of our LGPTs.

Fig. 11 (a) View of the Streicker Bridge and (b) test setup, camera position and LGPTs (insert) [35]

LGPTs were mounted on the inside of a curved leg to measure the vertical vibrations while the cameras were on the other side of the street, approximately 8 m away from the targets. The camera used was a Canon T4i recording at 60 frames per second with a resolution of 1280 × 720 pixels. The test was performed on April 23, 2014, with adequate lighting conditions and some wind. We verified that the wind did not affect the measurements by comparing several measurements taken at different instants in time. Figure 12 shows the results in the frequency domain for both sensing approaches. As can be observed, the two main frequencies of vibration of the leg, namely 3.0 Hz and 3.6 Hz, were detected by both sensing approaches. The low-frequency content in the VVS data (Fig. 12a) can possibly be explained by slight periodic variations in the lighting conditions due to trees rocking with the wind. In this case, this is not a real issue since the motion did not result in an additional peak that could be misinterpreted. Although it was not possible in this case to calculate an SNR, it can be observed that the two frequency plots are of very comparable quality.

Fig. 12 Frequency response of the Streicker Bridge from (a) the VVS located on an LGPT and (b) the fiber-optic sensor system

6 Conclusions and outlook

In this paper, we evaluate and discuss a number of strategies to detect higher frequencies of vibration using our earlier proposed Eulerian-based VVS [32]. It can clearly be seen that, based on our experiments, the use of LGPT and high-speed cameras can improve the SNR and help detect multiple frequencies in multi-degree-of-freedom (MDOF) structural and mechanical systems. From our study, we conclude the following:

  • The introduction of LGPTs increases the SNR and enables detecting higher natural frequencies which is particularly helpful when standard digital video cameras are used.

  • Analyzing a patch of pixels rather than a single pixel can be employed when no LGPTs are used to smooth the change of intensity, i.e., minimize impulse-type response in the signal.

  • By analyzing a patch of pixels or applying a linear regression approach, the SNR of LGPTs can further be improved.

  • While high-speed camera technology is still expensive and mostly used by researchers, the use of commercially available digital video cameras in conjunction with LGPTs allows for accurate and reliable detection of multiple natural frequencies.

  • High-speed cameras benefit from lower noise amplitude due to oversampling and are able to detect higher frequencies even without LGPTs.

  • Our methodology also works in the field where we found the same peak frequencies compared to the existing SHM system.

  • A limitation of our approach is that it can only accurately capture vibrations that are perpendicular to the line of view of the camera and in the direction of the LGPT.

Future work includes correlating intensity with actual displacement, evaluating advanced signal processing methods to further improve the SNR and reduce quantization noise, and accounting for noise caused by the vibration of the camera due to wind and traffic as well as environmental factors such as atmospheric interferences and variable lighting conditions. Finally, we are planning to explore solutions to capture vibrations in two dimensions, perpendicular to the line of view of the camera, by employing two-color targets.