1 Introduction

Over the past 60 years, the construction of large dams has been on the rise due to population growth and the urgent need to manage finite water resources effectively. At present, the International Commission on Large Dams (ICOLD) maintains a record of 61,988 large dams constructed worldwide, with a combined storage capacity of about 8767 km\(^{3}\) [1]. In addition to providing great benefits, these structures also represent a potential risk to the ecosystems and populations downstream. In particular, concrete dams can experience changes in their material composition throughout their lifespan due to physical, chemical, biological, and structural factors. These alterations can have a significant impact on both the performance and safety levels of the dams.

The monitoring of the structural performance of dams has been carried out over the years through auscultation. Instruments, such as pendulums, thermometers, extensometers, piezometers, and accelerometers, among others, are installed at key points of the dams. Decisions about structural health often rely on qualitative information from visual inspections and analysis of static data like deformation, stresses, displacements, temperature, and rotations [2,3,4].

The evaluation of dams’ real behavior under vibrations can be traced back to dynamic field tests. Forced vibration tests (FVTs) were the first tests performed, with the aim of validating finite element models and predicting the structure’s response to earthquakes [5,6,7,8]. Subsequently, ambient vibration tests (AVTs) were introduced. This type of testing has emerged as the most practical dynamic test for concrete dams [9], mainly for three reasons. First, recent advances in vibration sensors, which offer higher resolution and lower noise, allow lower-amplitude vibrations to be measured. Second, there is a substantial theoretical background behind the development of methods for output-only modal identification [10]. Third, exciting the dam with external devices poses practical difficulties, such as transporting the devices to the dam location, which is frequently remote, and positioning them on the dam. Among the objectives of performing AVTs on dams, the most common has been to obtain experimental data to calibrate numerical models. Some studies on this subject are reported in [11,12,13,14,15,16].

Long-term AVTs have been employed in developing vibration-based structural health monitoring systems. These systems depend on the ongoing identification and monitoring of modal parameters. Under normal operating conditions, new data sets are obtained periodically and compared with an initial reference model. A significant change in these data may correspond to damage to the structure [17]. Monitoring large concrete dams is challenging because they are located in constantly changing environments. These harsh conditions can cause variations in the identified modal parameters that do not necessarily indicate structural damage.

The literature reports that continuous ambient vibration monitoring programs have been conducted at the concrete dams Mauvoisin (Switzerland) [18], Hitotsuse and Ohkura (Japan) [19, 20], Fei-Tsui (Taiwan) [21], Cahora-Bassa (Mozambique) [22], Roode Elsberg (South Africa) [23], Chirkey [24], Cabril, Foz Tua, and Baixo Sabor (Portugal) [25,26,27]. These programs aim to track the temporal evolution of dynamic characteristics, primarily focusing on natural frequencies.

The modal parameters identified in dams depend heavily on the reservoir water level. For instance, Oliveira et al. [21] found that a variation of 12 m in the reservoir water level of the Cahora-Bassa dam leads to a relative variation of 9% in the natural frequency of the first mode and 10% in that of the second mode. Pereira et al. [25] found that the frequency of the fifth mode of vibration of the Foz Tua dam decreased by about 13%, from 6.67 Hz to 5.78 Hz, in only 18 h with an increase in the reservoir level of about 2 m.

Many experimental and field studies have shown that temperature variation significantly affects the structural vibration properties of large civil structures [28]. Okuma et al. [18] reported that slow changes in the natural frequencies of the Hitotsuse dam correspond to monthly and daily changes in surface temperature; however, it was impossible to quantify the variation due to temperature, as it is correlated with the reservoir water level. Ueshima et al. [19] found that the natural frequency associated with the vibration mode normal to the crest line of the Ohkura dam is highly correlated with the dam surface temperature, which causes an annual periodic variation between 2 Hz and 3 Hz. The authors emphasize that this variation is much larger than that observed when the dam has been subjected to strong earthquakes. After removing the influence of reservoir level on the natural frequencies of the Baixo Sabor dam, Pereira et al. [26] found that the daily and seasonal variations of the frequencies are affected by air temperature, specifically for modal orders of 6 or higher.

Statistical models have been used in dynamic dam monitoring to predict structural responses. The models that have been adopted include power regressions (PR) [19], Gaussian process regressions (GPR) [22], and multiple linear regressions (MLR) [24, 26]. Statistical models based on MLR provide several advantages, including simplicity of formulation, rapid execution, and the ability to determine the contribution of each loading action to the dam response [29].

In 2016, an accelerometer network was installed at the Itaipu Hydroelectric Dam, located on the border between Brazil and Paraguay. Since 2018, acceleration data have been gathered every 24 h. This paper presents a study on the impacts of upstream water level and temperature on the identified modal parameters of a hollow-gravity concrete block. This concrete block is instrumented with two triaxial accelerometers. An automated procedure based on the automatic interpretation of the stabilization diagram from the covariance-driven stochastic subspace identification (SSI-Cov) method is used to obtain a new set of modal parameters every 30 min. Tracking these parameters over 3 years allows for the observation of both daily and seasonal variations in the dam’s structural dynamics. The main contributions of this research are summarized as follows:

  • A methodology developed by the authors [30] for automatic operational modal analysis is tested on a real dam structure. This methodology relies on the SSI-Cov method and utilizes stabilization diagrams to identify the structure’s modes.

  • The influence of water level and air temperature on the modal identification process for the natural frequencies and damping ratio in a hollow-gravity concrete block is reported.

  • A statistical procedure for mitigating the influence of environmental factors on the natural frequencies of the analyzed block is described.

  • The importance of considering time lags in the modal identification of the block is discussed. These lags stem from the slow heat propagation in massive concrete dams.

2 Case study: the Itaipu hydroelectric dam

2.1 Dam and instrumentation description

The Itaipu Hydroelectric Dam is located on the Paraná river, on the border between Brazil and Paraguay. Twenty generating units provide a maximum capacity of 14,000 MW, making it one of the largest renewable energy generators in the world. With a reservoir extending 170 km and a flooded surface area of 135,000 ha, Itaipu supplies about \(10.8\%\) of the energy consumed in Brazil and \(88.5\%\) in Paraguay.

Fig. 1 a Itaipu project, overview. b Location of the F-17/18 hollow-gravity concrete block in the main dam

Fig. 2 Cross-section of the F-17/18 concrete block

Fig. 3 Itaipu accelerograph network

The Itaipu Dam is 7919 m long with a maximum height of 196 m. Construction began in 1975, and the dam began operating in 1984. Other details of the dam can be found on Itaipu’s website [31].

Figure 1a shows the general scheme of the project. The Itaipu complex comprises three types of dams: earth-fill dam, concrete dam, and rock-fill dam. The concrete dam is divided into four parts, corresponding to the right-wing dam, the main dam, the connection blocks, and the bypass structure. The main dam comprises stretch F, with 36 hollow-gravity blocks; the wing dam and the connection blocks comprise stretches D, E, and I, with 91 buttressed blocks; and the bypass structure comprises stretch H, with 14 gravity blocks.

This paper studies the dynamic behavior of the concrete block F-17/18, located in the main dam as shown in Fig. 1b. This dam has a length of 612 m at an elevation of 225 m (crest level). At the upper part of this dam, there are water intakes that control the flow of water from the penstocks to a spiral casing, where the kinetic energy of the water flow rotates the turbine of the generating units. These characteristics can be seen in the cross-section of the concrete block F-17/18 in Fig. 2.

Itaipu technicians are assisted by 2218 instruments (1362 in the concrete and 856 in foundations and earthen embankments), of which 270 are automated. Figure 3 shows the accelerograph network installed in 2016. The network comprises seven stations distributed around the dam and one station installed in a neutral zone. There are five instrumented concrete blocks, including F-17/18. The sensors are Guralp Systems CMG-5TD digital triaxial force feedback accelerometers configured to continuously acquire acceleration signals at a sampling frequency of 200 Hz and produce hourly .gcf files (Guralp compressed file). For the dynamic monitoring of the concrete block F-17/18, data from the accelerometers ACL05-F18, located at an elevation of 144 m, and ACL06-F18, located at an elevation of 44 m, were used. Both accelerometers are positioned with the N (North) component in the longitudinal direction of the crest and the E (East) component in the downstream direction.

2.2 A Python-based computational tool for automatic operational modal analysis

To automatically estimate and track the dynamic parameters of the F-17/18 block, a Python-based tool was developed at the Competence Center for Dam Structures (EB.DT-FPTI) of the Itaipu Technological Park. This tool is adapted to the type of data generated by the Itaipu accelerographic network; therefore, it can be used for the automatic modal analysis of any of the concrete blocks of this dam as long as the input parameters of the programmed routines are adjusted. Figure 4 presents the approach used in this application. A more detailed description of this procedure can be found in [30].

Fig. 4 Automatic operational modal analysis procedure

The time-domain covariance-driven stochastic subspace identification method (SSI-Cov) has been widely used for continuous dynamic monitoring of large civil structures [32,33,34,35,36]. One of its advantages is the simultaneous identification of several modes. Stabilization diagrams are constructed after applying the method for different model orders, allowing the different modes of the system to be visualized as well-defined vertical columns. The automatic OMA procedure consists of the automatic interpretation of the stabilization diagram.

This procedure starts with the pre-processing of the acceleration time series. A low-pass Butterworth filter is used to remove noise, and decimation is applied to reduce the frequency band to the one of interest. Then, the acceleration amplitudes are characterized by their root mean square (RMS).

The SSI-Cov method is applied to the pre-processed acceleration time series for multiple model orders, and the stabilization diagrams are constructed. Criteria, such as relative differences between natural frequencies, damping ratios, Modal Assurance Criterion (MAC), Modal Phase Collinearity (MPC), and Modal Phase Deviation (MPD), are computed and used as features in a Fuzzy C-means algorithm to separate spurious modes from physical modes. Uncertainty criteria of natural frequencies and damping ratios are also used to eliminate spurious modes from the stabilization diagram. Subsequently, the remaining modes whose characteristics (natural frequencies and mode shapes) are similar are grouped using Hierarchical Clustering Analysis (HCA). Finally, a representative physical mode from each group is extracted and a new set of modal parameters is obtained. This new set is compared with reference modes, which allows the tracking of modal parameters for the monitoring period.

The usual approach for modal tracking involves establishing fixed [37] or variable [38] thresholds to determine the maximum acceptable changes in frequencies and mode shapes. Subsequently, the relative frequency differences and MAC are computed between every new set of modal parameters and the reference modes. Since there are only two triaxial sensors for the continuous monitoring of modal parameters of the F-17/18 concrete block, the number of mode shape components is insufficient to accurately define a mode. This results in a high level of variability when trying to compare it with a reference mode at different time intervals. In this application, each mode is assigned a fixed threshold for the maximum allowed variation of the natural frequencies.

3 Continuous modal parameter identification

3.1 On the configuration of the automatic operational modal analysis algorithm

The developed computational tool was used to identify and track the dynamic characteristics from the data recorded on the F-17/18 concrete block from 01/01/2018 to 31/12/2020.

The sampling frequency was 200 Hz, and the frequencies of interest are below 14 Hz. The signals were therefore filtered with a 12th-order low-pass Butterworth filter with a cutoff frequency of 20 Hz and then decimated by a factor of 5. The modal identification was finally performed on 30-min acceleration time series sampled at 40 Hz.
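The pre-processing settings above can be illustrated with a short sketch. It is only an approximation of the procedure under stated assumptions: zero-phase filtering, mean removal, and the names `preprocess` and `rms` are choices made here for illustration and are not taken from the tool described in [30].

```python
import numpy as np
from scipy import signal

FS_RAW = 200.0   # sampling frequency of the accelerometers [Hz]
CUTOFF = 20.0    # low-pass cutoff frequency [Hz]
ORDER = 12       # Butterworth filter order
FACTOR = 5       # decimation factor (200 Hz -> 40 Hz)


def preprocess(acc):
    """Low-pass filter and downsample one acceleration channel."""
    acc = np.asarray(acc, dtype=float)
    acc = acc - acc.mean()  # assumption: remove the static offset first
    # Second-order sections keep a 12th-order filter numerically stable.
    sos = signal.butter(ORDER, CUTOFF, btype="low", fs=FS_RAW, output="sos")
    # Assumption: forward-backward (zero-phase) filtering, which applies
    # the filter twice; the source does not state how it was applied.
    filtered = signal.sosfiltfilt(sos, acc)
    # Keep every 5th sample; the new Nyquist frequency equals the cutoff.
    return filtered[::FACTOR]


def rms(acc):
    """Root mean square of an acceleration record."""
    return float(np.sqrt(np.mean(np.square(acc))))
```

With these settings, a 30-min record of 360,000 samples at 200 Hz becomes 72,000 samples at 40 Hz per channel.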

For the SSI-Cov identification algorithm, all channels of the ACL05-F18 and ACL06-F18 accelerometers were used as reference channels. As mentioned in the previous section, the methodology for estimating modal parameters entails the use of models with varying orders within a predefined interval. A previous inspection of the stabilization diagrams for some data sets revealed that models with orders between 2 and 160 represent the dynamic behavior of the structure. This wide range ensures consistent identification of weakly excited physical modes across the multiple model orders. Furthermore, the construction of the stabilization diagram follows the approach proposed in [39, 40]. This approach allows the uncertainties of the modal parameters identified for each model order to be estimated and used to clean the stabilization diagram. The number of block rows \(i\) of the Toeplitz matrix is a crucial factor, as it greatly increases the computational cost of the uncertainty estimation in this approach. Initial executions of the SSI-Cov showed that \(i=80\) block rows achieved a favorable balance between computational efficiency and quality of the identified parameters.
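To make the roles of the block rows and the model order concrete, the core SSI-Cov computation for a single model order can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the tool described in [30]: it omits the uncertainty estimation of [39, 40], the stabilization diagram, and the clustering steps, and the function name `ssi_cov_poles` and its arguments are hypothetical.

```python
import numpy as np


def ssi_cov_poles(y, fs, block_rows, order):
    """Minimal SSI-Cov for one model order.

    y: array (n_channels, n_samples); all channels act as references.
    Returns natural frequencies [Hz] and damping ratios of the poles
    with positive imaginary part, sorted by frequency.
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    y = y - y.mean(axis=1, keepdims=True)
    n_ch, n = y.shape
    i = block_rows

    # Output covariances R_k = E[y(t+k) y(t)^T] for k = 1 .. 2i-1
    R = {k: y[:, k:] @ y[:, :n - k].T / (n - k) for k in range(1, 2 * i)}

    # Block Toeplitz matrix: block (p, q) holds R_{i+p-q}
    T = np.block([[R[i + p - q] for q in range(i)] for p in range(i)])

    # Truncated SVD gives the observability matrix for the chosen order
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    obs = U[:, :order] * np.sqrt(s[:order])

    # Shift invariance of the observability matrix yields the state matrix
    A = np.linalg.pinv(obs[:-n_ch]) @ obs[n_ch:]

    # Discrete-time poles -> continuous-time poles -> modal parameters
    mu = np.linalg.eigvals(A).astype(complex)
    lam = np.log(mu) * fs
    lam = lam[lam.imag > 0]
    freq = np.abs(lam) / (2 * np.pi)
    damp = -lam.real / np.abs(lam)
    idx = np.argsort(freq)
    return freq[idx], damp[idx]
```

Calling such a function for every order in a range (for example, 2 to 160) and plotting the returned frequencies against the order would give the raw material of a stabilization diagram.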

To determine the reference modes to be tracked, a preliminary database of modal parameters was created for the first two months of monitoring, from 01/01/2018 to 28/02/2018. In the 0–20 Hz band, ten modes were consistently identified. The nine modes selected for modal tracking are linked to the concrete structure. Determining the nature of the remaining mode, with a frequency of around 8.03 Hz, requires further information.

To track these reference modes, a threshold needs to be set for the maximum allowed variation in natural frequencies. Initially, this threshold was set at 5% for all modes. However, as the identification of modal parameters evolved over time, it was observed that natural frequency variations of modes 2, 3, 4, and 5 were outside this threshold. Consequently, the threshold for these modes was increased to 10%. Table 1 summarizes the input parameters used. The natural frequencies \(\textbf{f}\) and damping ratios \(\mathbf{\xi}\) are presented as averages of the analyses performed on the data sets for the first two months of operation.
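The frequency-threshold rule can be illustrated with a small sketch. The function `track_modes` and the numerical values are hypothetical and chosen only for illustration; they are not the Table 1 entries, and the actual tool may resolve competing candidates differently.

```python
def track_modes(reference, thresholds, identified):
    """Assign identified frequencies to reference modes.

    reference:  reference natural frequencies [Hz], one per mode
    thresholds: maximum relative deviation allowed for each mode
    identified: natural frequencies found in the current data set [Hz]
    Returns one frequency (or None if nothing qualifies) per reference mode.
    """
    tracked = []
    for f_ref, tol in zip(reference, thresholds):
        best, best_dev = None, tol
        for f in identified:
            dev = abs(f - f_ref) / f_ref
            # Keep the candidate closest to the reference within the threshold
            if dev <= best_dev:
                best, best_dev = f, dev
        tracked.append(best)
    return tracked


# Hypothetical values for illustration only
reference = [2.0, 3.0, 4.5]
thresholds = [0.05, 0.10, 0.10]
identified = [2.04, 3.25, 5.2]
print(track_modes(reference, thresholds, identified))  # [2.04, 3.25, None]
```

In this example the third mode is not tracked because the nearest candidate deviates by more than 10% from its reference frequency.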

3.2 Characterization of acceleration amplitudes

Figure 5 displays the RMS characterization of the accelerations measured on the ACL05-F18 and ACL06-F18 accelerometer channels during 2018. The results show two operating conditions. The intensity of accelerations increased during January, February, March, April, October, and November. This was the result of heavy rains in the Paraná river basin and the subsequent spillway opening. During May, June, July, August, September, and December, there were regular levels of excitation caused by the typical conditions of the dam, including traffic, wind, and the inherent noise produced by the generating units [41].

Figure 6 shows a zoom of the measured RMS for the first two months of modal monitoring. During January 2018, water discharge occurred almost uninterrupted, with acceleration peaks on January 14 due to a rare event where all 14 gates of the three spillway channels were opened simultaneously.

Table 1 Summary of input parameters for automatic operational modal analysis
Fig. 5 RMS of accelerations measured from 01/01/2018 to 31/12/2018

3.3 Time–frequency spectrogram

One way to preliminarily know the content and evolution of natural frequencies in a particular frequency band is to apply and assemble the singular value decomposition spectrum for several data sets. Thus, the energy associated with a frequency at a specific time is presented through colors. Cooler colors indicate the presence of low energy, while warmer colors correspond to progressively stronger energy content. Figure 7 shows the spectrogram of the signals recorded by ACL05-F18 in the E/W direction for the first two months of modal tracking. The black marks indicate the frequencies chosen for modal tracking, which, as noted throughout the two months, were characterized by high-energy contents resulting in the yellow–red stripes.

A comparison of Figs. 6 and 7 shows that the periods where the warm fringes appear in the spectrogram are in agreement with the increase in the intensity levels of accelerations due to the opening of the spillway.

Fig. 6 RMS of accelerations measured from 01/01/2018 to 28/02/2018

Fig. 7 Spectrogram with frequency evolution from 01/01/2018 to 28/02/2018. Black marks indicate the frequencies chosen for tracking

3.4 Natural frequencies and damping ratios’ evolution

For the 3-year monitoring, 48,693 data sets were processed. Each 30-min data set was analyzed independently through the automatic procedure for operational modal analysis. Thus, the temporal evolution of the natural frequencies and damping ratios revealed from the analysis is illustrated in Figs. 8 and 9, respectively. Each color represents a vibration mode, and each point in Fig. 8 is an average over an 8-h period. Throughout the analysis period, there were five interruptions in acceleration data acquisition, with the lengthiest interruptions lasting approximately 25 and 22 days, from 20/02/20 to 16/03/20 and 08/06/20 to 30/06/20, respectively.

Table 2 shows the results of continuous monitoring using statistical indicators, such as minimum, mean, maximum, and standard deviation. In addition, it displays the tracking rates of the identified modes. The automatic estimation procedure for the modal parameters successfully identified the nine modes. Modes 5, 6, and 9 were the most challenging to identify. These modes are not very excited either because there are low levels of vibration or because of changes in loading conditions in position and amplitude. In contrast, modes 1, 2, 3, 4, 7, and 8 exhibited a high occurrence, with tracking rates greater than 80%.

While the time evolution of natural frequencies depicted in Fig. 8 may not exhibit noticeable changes due to the scale, the statistical data for each mode reveal significant shifts throughout the tracking period. The maximum relative variations of the identified frequencies are between 2.79% and 11.79%. Moreover, aside from modes 1 and 2, the natural frequencies of the remaining modes exhibit notably high standard deviations, surpassing 0.06 Hz. In contrast to the natural frequencies, and as expected, the damping ratios show a clearly visible dispersion, as revealed in Fig. 9. In the identification process, modes whose damping ratios were larger than 10% were discarded. Excluding the outliers, the damping ratios are mostly below 6%, except for mode 1, which reached 8%.

Fig. 8 Temporal evolution of 8-h average natural frequencies from 01/01/2018 to 31/12/2020

Fig. 9 Temporal evolution of damping ratios from 01/01/2018 to 31/12/2020 and damping distributions

Table 2 Statistical results for the 3 years of continuous identification

4 Upstream water level and air temperature effects

4.1 Effects on natural frequencies

In arch dams, above a certain water level, the increase in the mass added by the reservoir leads to a decline in natural frequencies [18]. A similar inverse relationship occurs in hollow-gravity concrete dams, as revealed in Fig. 10. The plot compares the evolution of the upstream water level with 7-day averages of the natural frequencies for modes 1 and 2 of the F-17/18 concrete block.

Fig. 10 Comparison of upstream water level with 7-day averages of natural frequencies for modes 1 and 2

Observations for the first half of 2018 are not available for this environmental factor. In the period from 01/07/2018 to 31/12/2020, the reservoir level presented an overall variation of 4 m, with a maximum of 220.4 m and a minimum of 216.4 m. It can be observed that the periods November–December/2018, October–November/2019, and February–March/2020 show abrupt declines. Conversely, in May–June/2019, December/2019, and April/2020, heavy rains caused significant increases in water levels.

During the period November–December/2018, an average outflow of 22,000 m\(^{3}\)/s was recorded due to a period of increased power production and the opening of the spillway in November [41]. This resulted in a 4 m drop in the upstream water level. During the same period, the natural frequencies of mode 1 increased by 0.08 Hz and those of mode 2 by 0.1 Hz. Notably, the natural frequencies linked to mode 1 proved to be the most responsive to variations in the reservoir water level.

Table 3 presents the distance correlation coefficients [42, 43] between the water level and the natural frequencies. This metric enables the quantification of linear or nonlinear dependence between time series. Hence, a distance correlation coefficient equal to 0 indicates independence between the time series, while a value equal to 1 indicates perfect correlation.

Table 3 Distance correlation between upstream water level and natural frequencies
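For readers unfamiliar with this metric, the sample distance correlation in its standard double-centered form can be sketched as follows. The function name is hypothetical, and the sketch builds full pairwise distance matrices, so it is only practical for averaged or subsampled series rather than for all 30-min data sets at once.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two equally long 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def centered(v):
        # Pairwise distances, double-centered (row, column, grand mean)
        d = np.abs(v[:, None] - v[None, :])
        return (d - d.mean(axis=0, keepdims=True)
                - d.mean(axis=1, keepdims=True) + d.mean())

    a, b = centered(x), centered(y)
    dcov2 = (a * b).mean()                   # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()  # product of squared variances
    if dvar2 == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))
```

Unlike Pearson's coefficient, this quantity is nonzero for a purely quadratic dependence such as \(y=x^2\) on a symmetric interval, which is why it suits relationships that need not be linear.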

In contrast to modes 1 and 2, the natural frequencies of the other modes exhibit only moderate correlations with the water level. The total fluctuation of the reservoir level over the 3 years was 4 m. The resulting change in apparent added mass due to fluid–structure interaction did not significantly affect the variability of every vibration mode. This suggests that the modes less influenced by changes in reservoir level may vibrate predominantly in directions where fluid interaction with the structure is minimal, such as along the face of the dam in contact with the water. Conversely, directions perpendicular to this face may intensify the interaction between the dam and the water, potentially making modes that vibrate in this direction more susceptible to its influence.

A different behavior is reported in studies conducted in arch dams, where the evolution of the natural frequencies of all monitored vibration modes aligns with water-level variations [37, 44]. Even in the Foz Tua dam [26], with a variability of less than 4 m in the water level, such coherence was also obtained, although differences in frequency amplitudes exist.

Ambient thermal variations have little impact on the behavior of the gravity-type concrete structures of stretch H of the Itaipu dam. In contrast, the buttresses in stretches D, E, and I and the hollow-gravity concrete blocks in stretch F are strongly influenced by thermal oscillations that are reflected in internal stresses, underpressures at the concrete–rock contact, drainage flows, and displacements of the dam crest. On the downstream side, the hollow-gravity blocks are isolated from the environment below El. 144 m by the powerhouse, and they are more robust in the crest region due to the presence of the water intakes. As a consequence, ambient thermal variations induce smaller horizontal crest displacements in this type of block than in the buttresses, even though the buttresses are lower in height [45].

Dams are spatially oriented, with the downstream part exposed to significant thermal variation and the upstream part in a certain thermal equilibrium due to water contact. This causes these structures to experience a non-homogeneous temperature distribution, leading to non-uniform correlations between temperature and temperature-induced structural responses measured at different components and positions [46]. Moreover, the large volumes of concrete in this type of structure cause heat to be conducted slowly through the dam body, so the structural response lags the air temperature observations.

To determine the time lags between the natural frequencies identified in the concrete block F-17/18 and the air temperature, cross-correlations were analyzed using Pearson’s correlation coefficient. Table 4 presents the time lag obtained for each mode, along with the maximum correlation. The highest correlation was obtained for mode 3 (0.72) with a time lag of almost one month. For mode 6, the time lag reaches almost 2 months with a correlation of 0.57. Meanwhile, for mode 8, with a high correlation (0.7), a lag of only 1 day was found.

Table 4 Correlation between natural frequencies and air temperature
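The lag search described above can be sketched as a scan over candidate lags, keeping the one with the largest Pearson coefficient. The function `best_lag` is hypothetical, and the sketch assumes two equally long, uniformly sampled series without gaps, which the real records (with their acquisition interruptions) do not fully satisfy.

```python
import numpy as np


def best_lag(temperature, frequency, max_lag):
    """Return (lag, r): the lag in samples, with frequency lagging
    temperature, that maximizes Pearson's correlation coefficient r."""
    temperature = np.asarray(temperature, dtype=float)
    frequency = np.asarray(frequency, dtype=float)
    n = len(temperature)
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        # Pair temperature at time t with frequency at time t + lag
        r = np.corrcoef(temperature[: n - lag], frequency[lag:])[0, 1]
        if r > best[1]:
            best = (lag, float(r))
    return best
```

With daily averages as input, the returned lag is expressed in days.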

In Fig. 11, the temporal evolution of natural frequencies for modes 3, 6, and 8 is compared with the air temperature observations for the period 01/01/2018 to 31/12/2020. In the region of Brazil where the Itaipu dam is located (Foz do Iguaçu), summer is from November to March, and winter is about 3 months long, from May to August. During summer, the daily average temperature fluctuates between 22\(^{\circ }\)C and 32\(^{\circ }\)C, while in winter, it ranges from 12\(^{\circ }\)C to 23\(^{\circ }\)C. Daily thermal trends are similar during the winter and summer. The lowest temperatures are generally recorded between 06:00 and 09:00 h. After that, the temperature rises to a peak at approximately 15:00–18:00 and then drops.

Fig. 11 Comparison of air temperature and the evolution of the natural frequencies of modes 3, 6, and 8

An initial examination of Fig. 11 shows that the natural frequencies undergo seasonal changes in accordance with temperature. The natural frequencies associated with modes 3, 7, and 8, which showed the most significant temperature dependence, increased by approximately 0.4 Hz, 0.3 Hz, and 0.5 Hz, respectively, between winter and summer.

Figure 12 displays a zoom of the evolution of natural frequencies of mode 8 over air temperature for August 2018 (winter) and January 2019 (summer). The daily fluctuations of the natural frequencies of this mode are related to air temperature. For example, between the first and the second day of August 2018, a decrease in temperature of 18\(^{\circ }\)C induced a decrease in mode 8 natural frequencies by approximately 0.08 Hz.

Fig. 12 One-month zoom of natural frequencies of mode 8. a Winter month. b Summer month

The directly proportional relationship between natural frequencies and temperature arises from the thermal expansion of materials. During warmer periods, concrete expansion causes cracks and contraction joints to close. Thus, at high temperatures, a temporary increase in stiffness and therefore in natural frequencies is produced. Percolation studies carried out in the dam explain how sensitive the opening and closing of cracks and joints are to temperature variations. Drainage flows from joints between hollow gravity concrete blocks increase proportionally to the cube of the joint thickness during the winter [45].

4.2 Effects on damping ratios

To reduce the high variability of the damping ratios to some degree and subsequently investigate the influence of air temperature and upstream water level, moving averages with 7-day windows were computed. Distance correlation coefficients with the reservoir level and Pearson correlation coefficients with air temperature were then determined. Table 5 shows the results for the monitoring period. It can be observed that the damping ratios associated with all modes have little dependence on the environmental factors studied. As shown in the history of modal parameters, the damping ratios presented a high dispersion in time compared to the natural frequencies. Hence, the damping ratios are not identified with sufficient precision to detect environmental effects.

Table 5 Correlation of damping ratios with environmental factors

4.3 Numerical modeling

Previously, it was observed that the fluctuation of natural frequencies is influenced by upstream water level and temperature. In the context of vibration-based structural health monitoring, it is crucial to minimize such fluctuations to effectively detect irreversible changes caused by structural deterioration. One way to mitigate such effects is by identifying input–output models. This involves finding a relationship between the explanatory or predictor variables (environmental factors) and the dependent variables (natural frequencies). Hence, the statistical technique of multiple linear regressions (MLR) is adopted to establish these relationships. The statistical model for the estimation of natural frequencies obtained from environmental factors can be formulated as follows:

$$\begin{aligned} f_{_{M}}=f_{_{h}}+f_{_{T}}+c, \end{aligned}$$
(1)

where \(f_{_{M}}\) is the frequency estimated by the statistical model, \(f_{_{h}}\) is the variation in frequency induced by the upstream level, and \(f_{_{T}}\) the variation induced by air temperature, and c is a constant value. The effects caused by the upstream water level can be described as a polynomial function of h as

$$\begin{aligned} f_{_{h}}=\sum _{i=1}^{m}a_{i}h^{i}, \end{aligned}$$
(2)

where \(a_i\) (\(i=1,2,3,\ldots ,m\)) are unknown coefficients and h is the upstream water height.

The long lags found in the time evolution of natural frequencies with respect to air temperature entail a particular challenge in the modeling of the variations induced by this environmental factor. To simulate this delayed effect, some researchers have chosen to segment the air temperature measurements for a period of several days prior to the observation and then average the temperature factors [4, 47, 48]. This strategy is applicable as long as there are complete and continuous temperature measurements. Thus, the variations induced by air temperature can be simulated as

$$\begin{aligned} f_{_{T}}=b_{1}T_{0}+\sum _{j=2}^{n}b_{j}T_{j\left( p-q\right) }, \end{aligned}$$
(3)

where \(b_j\) (\(j=1,2,3,\ldots ,n\)) are unknown coefficients, \(T_{0}\) is the mean air temperature of the day on which the frequency is observed, and \(T_{j}\) is the jth temperature factor, defined as the mean air temperature in the period from p to q days before the day on which the frequency is observed. In this paper, for instance, the relationship between \(f_{_{T}}\) and \(T_j\) when using 180 days of data is presented as

$$\begin{aligned} \begin{aligned} f_{T}={}&b_{1}T_{0}+b_{2}T_{1-2}+b_{3}T_{3-6}+b_{4}T_{7-15}\\&+b_{5}T_{16-30}+b_{6}T_{31-60}+b_{7}T_{61-90}+b_{8}T_{91-120}\\&+b_{9}T_{121-150}+b_{10}T_{151-180} \end{aligned}. \end{aligned}$$
(4)

The use of long-term air temperature measurements accounts for both daily and seasonal effects, which improves the estimation of frequencies. Because segment averages are used instead of individual temperature measurements, this approach significantly reduces the number of model parameters.

For every observed or identified frequency f at a particular moment k, a frequency \(f_{_{M}}\) is estimated. Ideally f and \(f_{_{M}}\) would coincide, but residuals (r), resulting from the statistical fitting of the model in Eq. 1 to the identified frequency data, are unavoidable. Thus, the identified frequency f and the frequency \(f_{_{M}}\) estimated by the model can be related as

$$\begin{aligned} f_{_k}=f_{_{M_k}}\left( h,T\right) +r_{_k}. \end{aligned}$$
(5)
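
As an illustration of how Eqs. 1 to 5 fit together, the following Python sketch assembles the regressors, estimates the coefficients by ordinary least squares, and computes the residuals. It is a minimal sketch rather than the implementation used in this study: the function names (`design_matrix`, `fit_mlr`, `predict`), the choice m = 3, the use of only two temperature factors, the water level expressed relative to a reference value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(h, T):
    """Regressors of Eq. 1: h, h^2, h^3 (Eq. 2 with m = 3, assumed here),
    the temperature factors of Eq. 3, and a column of ones for the constant c."""
    h = np.asarray(h, dtype=float)
    T = np.asarray(T, dtype=float)          # shape (n_obs, n_factors)
    return np.column_stack([h, h**2, h**3, T, np.ones_like(h)])

def fit_mlr(h, T, f):
    """Ordinary least-squares estimate of [a_1, a_2, a_3, b_1, ..., b_n, c]."""
    X = design_matrix(h, T)
    coef, *_ = np.linalg.lstsq(X, np.asarray(f, dtype=float), rcond=None)
    return coef

def predict(coef, h, T):
    """Model frequency f_M(h, T) of Eq. 1."""
    return design_matrix(h, T) @ coef

# Synthetic data (hypothetical values, not measurements from the dam).
rng = np.random.default_rng(0)
n_obs = 500
h = rng.uniform(-2.0, 2.0, n_obs)           # water level relative to a reference, in m
T = rng.uniform(10.0, 30.0, (n_obs, 2))     # two temperature factors, in degrees C
true_coef = np.array([-0.02, 0.001, 0.0005, 0.004, 0.002, 3.5])
f = design_matrix(h, T) @ true_coef + rng.normal(0.0, 0.005, n_obs)

coef = fit_mlr(h, T, f)
residuals = f - predict(coef, h, T)         # r_k in Eq. 5
```

Expressing h relative to a reference level keeps the columns h, \(h^2\), and \(h^3\) on comparable scales, which helps the numerical conditioning of the least-squares problem.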

4.4 Removing environmental effects on natural frequencies

The data covering the period 01/07/2018 to 31/12/2020 were divided into two data sets. The first data set, covering 01/07/2018 to 30/04/2020, was used to train the models. The forecasting performance of the models was then evaluated by out-of-sample testing on the last eight months of 2020.

Considering the physics of the problem, in which the natural frequencies lag air temperature by between 1 day and almost 2 months, nine multiple regression models are employed, each incorporating progressively longer periods of air temperature. Table 6 shows the predictors used. A third-degree polynomial function of h (h, \(h^2\), \(h^3\)) was considered for all models. Segmented variable sets with data from 1, 6, 15, 30, 60, 90, 120, 150, and 180 days were used to model air temperature for models MLR\(_1\) to MLR\(_9\), respectively. Since the data are sampled every 30 min, for a given identified frequency \(f_i\) at date i, \(T_0\) corresponds to the average of the 48 air temperature data points preceding date i; \(T_{1-2}\) represents the average between the 48th and 144th air temperature data points observed before date i; \(T_{3-6}\) corresponds to the average between the 144th and 336th air temperature data points observed before date i, and so forth for the remaining segments.
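
The segment averaging described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the names `SAMPLES_PER_DAY`, `SEGMENTS`, and `temperature_factors` are hypothetical, the temperature record is assumed to be complete and stored as an array in which `temps[i - 1]` is the last reading before the observation, and each segment is treated as a half-open window of samples (the text does not state how segment boundaries are handled).

```python
import numpy as np

SAMPLES_PER_DAY = 48  # one air temperature reading every 30 min

# (p, q) day ranges of the segments in Eq. 4, counted back from the observation.
SEGMENTS = [(1, 2), (3, 6), (7, 15), (16, 30), (31, 60),
            (61, 90), (91, 120), (121, 150), (151, 180)]

def temperature_factors(temps, i, segments=SEGMENTS, spd=SAMPLES_PER_DAY):
    """Return [T_0, T_{1-2}, T_{3-6}, ...] for the observation at sample index i."""
    temps = np.asarray(temps, dtype=float)
    needed = (max(q for _, q in segments) + 1) * spd if segments else spd
    if i < needed or i > temps.size:
        raise ValueError("not enough temperature history before index i")
    factors = [temps[i - spd:i].mean()]      # T_0: the 48 readings before i
    for p, q in segments:
        start = i - (q + 1) * spd            # oldest reading of the segment
        stop = i - p * spd                   # exclusive upper bound
        factors.append(temps[start:stop].mean())
    return np.array(factors)

# Check on a synthetic record whose value equals its age in whole days.
i = 181 * SAMPLES_PER_DAY
age_in_days = (i - 1 - np.arange(i)) // SAMPLES_PER_DAY
factors = temperature_factors(age_in_days, i)
# T_0 = 0.0, T_{1-2} = 1.5, T_{3-6} = 4.5 for this synthetic record
```

Passing a shorter list of segments, for example `SEGMENTS[:2]`, would give the predictors of a model that uses only 6 days of air temperature.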

Table 6 Predictors for the multiple linear regression models

The accuracy of the models is assessed using the coefficient of determination (R\(^2\)), the Root-Mean-Squared-Error (RMSE), and the Akaike Information Criterion (AIC). The results for the training period are presented in Table 7. As longer periods of air temperature data are included, the prediction of natural frequencies improves across all modes. Consequently, the MLR\(_9\) model, using nearly 6 months of air temperature data, yielded the best fit between the identified natural frequencies and environmental factors. In the out-of-sample period, this model performs well in predicting the natural frequencies of all modes except mode 5, as shown in Table 8. For mode 5, the fit improves gradually as air temperature segments are added. Nevertheless, even with 180 days of data, R\(^2\) remains below 0.6.
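
For reference, the three quality metrics can be computed from observed and predicted frequencies as in the Python sketch below. The function name `fit_metrics` is hypothetical, and the AIC is written in its common least-squares form, n ln(RSS/n) + 2k, up to an additive constant; the exact AIC expression used for Table 7 is not stated in the text, so this form is an assumption.

```python
import numpy as np

def fit_metrics(f_obs, f_pred, n_params):
    """Return (R^2, RMSE, AIC) for one mode and one model."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    n = f_obs.size
    rss = float(np.sum((f_obs - f_pred) ** 2))          # residual sum of squares
    tss = float(np.sum((f_obs - f_obs.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - rss / tss
    rmse = float(np.sqrt(rss / n))
    aic = n * float(np.log(rss / n)) + 2 * n_params     # assumed least-squares form
    return r2, rmse, aic

# Toy example with made-up numbers.
r2, rmse, aic = fit_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9], n_params=2)
```

Because the AIC penalizes each additional coefficient, it indicates whether the improvement in fit from adding temperature segments outweighs the cost of the extra parameters, which R\(^2\) and RMSE alone do not.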

Table 7 Comparison of model quality metrics for the nine MLR models
Table 8 R\(^2\) of the test period for each MLR model

The performance of the MLR\(_9\) model is also illustrated in Fig. 13. During the test period, which covered the last eight months of 2020, the model predicted 10,661 natural frequencies for each of modes 1, 2, and 3. The prediction curves show that the model captures the long-term fluctuations driven by upstream levels in mode 1 and the seasonal changes attributed to air temperature in modes 2 and 3. However, the model does not capture all of the variability, particularly short-term variations. This underlines the need for future studies to explore the variability of natural frequencies derived from factors such as time [25] or uncertainties in the identification process [49].

Fig. 13 Identified and predicted natural frequencies using the MLR\(_9\) for modes 1, 2, and 3

A comparison of the standard deviations of the natural frequencies before and after removing environmental factors (Table 9) shows that the model decreased the variability across all modes. In the test period, the variability of the natural frequencies was reduced by more than 50% for modes 1, 2, 3, 6, 7, 8, and 9, and by 34.97% and 35.41% for modes 4 and 5, respectively.
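
A variability reduction of this kind can be computed as in the following Python sketch. It assumes that the corrected frequency is the model residual shifted back to the mean of the observed frequencies and that sample standard deviations are used; neither detail is stated in the text, and the function name `variability_reduction` is hypothetical.

```python
import numpy as np

def variability_reduction(f_obs, f_pred):
    """Percentage drop in standard deviation after removing the modelled effects."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    corrected = (f_obs - f_pred) + f_obs.mean()   # assumed definition of the correction
    std_before = f_obs.std(ddof=1)
    std_after = corrected.std(ddof=1)
    return 100.0 * (1.0 - std_after / std_before)

# Toy example with made-up numbers.
reduction = variability_reduction([1.0, 2.0, 3.0, 4.0, 5.0],
                                  [1.5, 2.0, 3.0, 4.0, 4.5])
```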

Table 9 Standard deviations of natural frequencies before and after correction

The reduction of the standard deviations of modes 1, 2, and 3 is illustrated in the histograms of Fig. 14, which contains the distribution of the natural frequencies before and after correction. It is observed that the natural frequency distributions, free from the influence of upstream level and air temperature effects, tend to be more concentrated around their mean, resembling a narrow normal distribution.

Fig. 14 Histograms of the natural frequencies of modes 1, 2, and 3 before and after removal of environmental factors with data from the last eight months of 2020 for MLR\(_9\)

4.5 Residuals

The residuals of the natural frequencies, as defined in Eq. 5, are illustrated in Fig. 15 for the out-of-sample period for the MLR\(_9\) model. The 95% confidence intervals are estimated using the standard deviations of the residuals obtained during the training period. The percentage of data points falling outside the confidence interval was quantified for each mode and is shown in Fig. 15. The number of outliers serves as an additional indicator for assessing the model’s quality. While the expected value is approximately 5%, Fig. 15 reveals that, except for mode 3, the percentage of outliers ranges from 6% to 14%. This can be attributed predominantly to the influence of unknown external factors that the model does not account for. Additionally, the low identification rate of the natural frequencies for mode 5 leaves insufficient information for an accurate fit in the training period, which leads to an inaccurate frequency forecast and contributes to the 14% outlier value. Nevertheless, the percentage of outliers is reasonable in all modes, given that the residuals are stable over the eight months of prediction and generally fall within the confidence interval. This indicates that the upstream water level and temperature effects explain most of the observed variation in natural frequencies.
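
The outlier count can be illustrated with the short Python sketch below. It assumes a symmetric interval of 1.96 training standard deviations around the mean training residual, which corresponds to 95% coverage only if the residuals are approximately normal; the text states only that the training standard deviations are used, so the exact construction and the function name `outlier_percentage` are assumptions.

```python
import numpy as np

def outlier_percentage(train_resid, test_resid, z=1.96):
    """Share (in %) of test residuals outside the interval built from training residuals."""
    train_resid = np.asarray(train_resid, dtype=float)
    test_resid = np.asarray(test_resid, dtype=float)
    center = train_resid.mean()
    sigma = train_resid.std(ddof=1)
    lower, upper = center - z * sigma, center + z * sigma
    outside = (test_resid < lower) | (test_resid > upper)
    return 100.0 * float(outside.mean()), (lower, upper)

# Synthetic residuals (hypothetical): when training and test residuals share the
# same normal distribution, the percentage should be close to 5%.
rng = np.random.default_rng(1)
pct, bounds = outlier_percentage(rng.normal(0.0, 0.01, 5000),
                                 rng.normal(0.0, 0.01, 5000))
```

A percentage well above 5% in the test period, as reported for several modes, suggests either unmodelled influences or a change in residual spread between the two periods.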

Fig. 15 Residuals of natural frequencies after removing effects of environmental factors

5 Conclusions

The F17/18 hollow-gravity concrete block located at the main dam of the Itaipu Hydroelectric Dam has been equipped with two triaxial sensors, continuously storing daily acceleration data since 2018. Root-Mean-Square (RMS) analysis of the accelerations allowed two distinct operational conditions to be identified. Traffic, wind, and the noise generated by the generating units produce the usual, consistent excitation levels. During the opening of the spillways, however, acceleration amplitudes increase significantly. This event provides a heightened excitation level, enabling a more precise identification of modal parameters.

An automated procedure for modal parameter estimation applied to 30-min acceleration time series showed good performance despite limited instrumentation. Nine modes in the 0–14 Hz range were identified and tracked from 01/01/2018 to 31/12/2020. The statistical results of the modal parameters revealed variations in natural frequencies ranging from 2.79% to 11.79%. As expected, the damping ratios showed much larger variations, visually perceptible on a graph.

A correlation analysis showed an inversely proportional relationship between natural frequencies and upstream water level. Notably, natural frequencies of mode 1 showed a higher sensitivity to water level variations. Specifically, a 4-m fluctuation was observed to result in a 0.08 Hz change in the natural frequencies of the first mode.

In contrast, a directly proportional effect was identified in relation to temperature. Peak frequencies were observed during summer, whereas minimum frequencies occurred in winter. Specifically, for mode 8, a variation of 0.5 Hz in natural frequencies was observed between the winter and summer seasons. Since the correlations are derived with air temperature, the response of the natural frequencies for the monitored vibration modes exhibited time lags ranging from 1 day to nearly 2 months.

Multiple linear regression (MLR) models were employed to mitigate the influence of environmental factors on natural frequencies. The effect of upstream water level was represented as a third-degree polynomial function, while temperature effects were modeled using averages of long-term segments of air temperature measurements. Nine models, each incorporating progressively longer air temperature periods, were fitted for each mode. The results indicated that the extension of the air temperature data improved the accuracy of the natural frequency prediction.

The MLR\(_9\) model, incorporating nearly 6 months (180 days) of air temperature data, yielded the most favorable results across all modes. During an out-of-sample period spanning the last eight months of 2020, R\(^2\) values greater than 0.8 were achieved for all modes except for mode 5, which attained an R\(^2\) value of 0.53. The MLR\(_9\) model led to a significant reduction in standard deviations for all modes. For instance, in mode 3, the standard deviation was reduced by 70.01%, with an R\(^2\) value of 0.945.

The residuals of the natural frequencies obtained after correction were shown to be stable over the out-of-sample period of eight months. Confidence intervals were established, revealing that the corrected frequencies for all modes followed a consistent trend within the estimated confidence intervals. This led to the conclusion that the concrete block had not undergone structural changes.

In the structural health monitoring of the F17/18 concrete block, the next crucial step involves incorporating damage detection techniques based on the residuals of the fitted models [27, 50]. While the percentage of residuals falling outside the estimated confidence intervals is deemed reasonable for the undamaged scenario, the influence of other environmental factors and of uncertainty in the identification process still needs to be assessed. Doing so would improve the removal of environmental effects and allow these residuals to serve as a reliable reference for damage detection.