Introduction

Structural members must be compacted at the construction stage to achieve durability and performance. For various reasons, some parts of structural members are compacted manually and some may remain uncompacted; the extent varies with the type of structure and its location. When structural concrete is exposed to heat load, its hardness and microstructural behavior are affected. It is well known that reducing the water-to-cement ratio enhances the strength of concrete and affects its hydration and hardness properties. The strength of an existing concrete structure degrades over time due to environmental conditions and carbonation. In the construction phase, concrete shows high alkalinity (pH 12 to 13) due to the Ca(OH)2 in its interfacial transition zone. Carbonation lowers the pH and progresses from the surface to the internal structure, reducing voids and increasing the volume of reactants in the concrete.

Existing structures often need to be assessed for stability in various circumstances, such as rehabilitation, seismic behavior forecasting, and alteration following severe damage. It is essential to test the characteristics of the concrete to evaluate structural performance more precisely [1]. Even so, destructive testing of concrete allows only a limited number of tests because, in many old structures, it is costly, inefficient, and difficult to carry out. The rebound hammer (RH) and ultrasonic pulse velocity (UPV) results for the same structures are compared with destructive testing (DT) results for reliability assessment. This helps reduce the number of cores needed to assess the behavior of the whole structure using a non-destructive technique [2]. Core locations are selected so that the average core results provide a representative picture of the complete dataset. Measurement uncertainty arises because the values obtained from destructive tests do not perfectly match non-destructive test values at the exact test location. The test position is treated not as a specific point but as a restricted region, which makes it possible to consider the repeatability of the rebound test results.

The average root mean square error (RMSE) of the fitted and predicted models could serve as a more efficient indicator, and the resulting reliability of the model needs to be addressed during the prediction phase. However, the reliability of the predicted model could be lower than that of the fitted model. The difference between fitted and predicted error generally arises from the generalization of the model. Benyahia [3] pointed out the technical challenge of testing the model identification approach on a real dataset at the prediction stage.

In some instances, a single NDT method can be less precise; hence, combining NDT methods (RH and UPV) is preferable to ensure higher accuracy and reliability of the concrete strength assessment. More emphasis is placed on a trustworthy conversion model, which depends on factors such as the number of cores and heterogeneous datasets [4,5,6].

NDT incorporated with cores provides a better option for structural strength evaluation [7, 8]. An empirical relation must be formed between the NDT findings and the strength measured on cores taken from the same locations. EN 13791:2019 [9] requires at least eight cores of diameter > 75 mm and recommends coring at least ten test locations to allow for possible outliers. For use in combination with indirect testing, the minimum number of cores of diameter > 75 mm is three, and at least four test locations should be cored to allow for a possible outlier. The ACI 228.1R [2] standard suggests that at least 6–9 data sets are needed for non-destructive strength assessment. Even so, consulting practice is typically based on a minimum of 3 or 4 cores, although the accuracy of this approach is still undetermined and needs to be addressed [4].

Cristoforo [10] offered a wide range of approaches for describing concrete strengths, which made use of a broad dataset to verify the efficiency of established prediction models and suggested new relationships that are successful in predicting the substantial strength of the Italian reinforced cement concrete (RCC) structures constructed in the twentieth century.

Careful consideration has been paid to the reliability of the model based on the following parameters: a large number of cores data, NDT methods for quality control, range of statistical parameters (Range, Standard Deviation (SD), coefficient of variation (COV)), reliability of the model, the influence of uncontrolled parameters (cracks, carbonation, moisture) [6, 11].

Due to variations in the mix proportion, the core stresses in normal-strength concrete are double that of those in low-strength concrete if the transverse and longitudinal bars are identical [12,13,14]. A machine learning algorithm was employed to establish a correlation between variables and outcomes and forecast the 28-day compressive strength while modifying the water-to-cement ratio and superplasticizer. The various models are used for predicting future results [15, 16]. An acceptable level of precision is additionally appreciated for concrete strength estimation. Therefore, the resulting equations for strength prediction could be used securely for in situ concrete strength assessment [17].

Implication of the research

  1. To address the influence of the number of cores used for calibration using statistical parameters: the root mean square error (RMSE) and the coefficient of determination (R2).
  2. To conduct statistical analysis for a dataset collected from various structures constructed between 1975 and 2005, and to compare the results using two indicators, RMSE and R2.
  3. To assess the reliability and precision of the model in the fitting and prediction phases.
  4. To compare the effectiveness and performance of single-method versus combined-method strategies in the model identification approach.
  5. To scrutinize the practical performance of the model in real-world scenarios, with a specific emphasis on evaluating the effectiveness of the calibration approach.

Dataset collection

The dataset used in this study was collected from the Construction Diagnostic Centre, Pune, and relates to thirty RCC structures (five structures in each group) located in the Pune metropolitan area, where humidity and carbonation are severe. The structural concrete elements (columns and beams) underwent NDT (RH and UPV), and cores were taken for DT. Coring and NDT measurements were conducted perpendicular to the direction of concrete casting. The UPV test measures the propagation velocity of ultrasonic pulses through the structural elements under study, whereas the rebound hammer test is characterized by the rebound index. A total of 275 elements, consisting of columns and beams, were tested through a combination of NDT and coring. For each element, two NDT measurements were conducted before coring, and one to three cores were extracted at the NDT measurement location. In total, 275 core samples with a diameter of 61–105 mm were extracted, as shown in Table 1, and subjected to compression testing until ultimate failure.

Table 1 Statistical database parameters based on year of construction

For a given height/diameter (H/D) ratio, a smaller core diameter reduces the specimen volume significantly. As a result, the diameter of the core increases with decreasing strength. The core length generally depends on the length/diameter (L/D) ratio; it should be 1.0 for the cube strength value and 2.0 for the cylinder strength value. A correction factor must be applied if the ratio is less than 2.0, because long cores tend to give lower test values than short cores of the same cross-sectional area. The correction factor is applied to the calculated strength to convert all core specimens to approximately the same strength as the standard test cylinder. Experience shows that the choice of core diameter is affected by factors such as maximum aggregate size, member dimensions, and steel reinforcement in the member. EN 12504-1 [18] suggested that the core diameter/aggregate (D/A) ratio must be more than 3.0, while ASTM C42 [19] suggested a value of 2.0. Several standards consider the diameter of the core to be 100 mm, although in Australia a diameter of 75 mm is considered acceptable [20]. Table 1 shows the dataset grouped by year of construction into six categories (D1 to D6), since this grouping reflects the variation in strength in structural members caused by environmental conditions, voids, cracks, and carbonation. The table lists parameters such as the number of cores (diameter and height), ranges, standard deviation (SD), and coefficient of variation (COV). The core strength reported in this study has been converted to an equivalent in-situ cube strength (equivalent cube strength = core strength/0.8). The COV of the core strength, which is in the range of 13–35%, demonstrates that the dataset presented here is heterogeneous.
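The conversion and the variability measure described above can be sketched in a few lines. This is a minimal illustration only: the 0.8 factor comes from the text, while the sample values, the function names, and the use of the sample (n − 1) standard deviation are assumptions, since the study does not state which estimator was used for Table 1.

```python
import statistics

# From the text: equivalent cube strength = core strength / 0.8
CORE_TO_CUBE_FACTOR = 0.8


def equivalent_cube_strength(core_strength_mpa):
    """Convert a core strength (MPa) to an equivalent in-situ cube strength."""
    return core_strength_mpa / CORE_TO_CUBE_FACTOR


def coefficient_of_variation(values):
    """Return the COV in percent (assumption: sample standard deviation)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical core strengths in MPa, for illustration only (not from Table 1)
core_strengths = [16.0, 20.0, 24.0]
cube_strengths = [equivalent_cube_strength(s) for s in core_strengths]
cov_percent = coefficient_of_variation(cube_strengths)
print(cube_strengths, cov_percent)
```

Because the conversion is a constant scaling, the COV of the equivalent cube strengths is the same as the COV of the raw core strengths.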

Assessment methodology

The assessment methodology is influenced by controlled parameters such as the number of test locations for cores (NT), within-test variability, and the NDT method (RH, UPV, and Combined Method (CM)). The uncontrolled parameters considered are concrete mix characteristics, i.e., type of cement, aggregate, compaction methods, curing conditions, water-to-cement ratio, elevated temperature, and carbonation. Fig. 1 shows a flow chart of the strength evaluation using reliability techniques. The evaluation approach for model accuracy is implemented and tested on a homogeneous data set of 275 core results collected from the in-situ structures categorized in D1 to D6, following a normal distribution. Values obtained from the difference between estimated and observed results are strength values achieved by applying the same technique to a small set of cores. The core strength results selected for each number of cores (NC) between 2 and 20 are randomly chosen from the 275 core samples.

Fig. 1 Flow chart of the strength evaluation using reliability techniques

Statistical regression is used to validate the relationship between core strength and the strength estimated by the NDT methods (RH, UPV). A set of NC pairs (x, fc) is obtained, where x is the NDT measurement (RH or UPV, the independent variable) and fc (the dependent variable) is the equivalent cube strength of the core at the corresponding test location. The fitted error is estimated from the NC pairs, and the predicted error is estimated from the remaining (NT-NC) pairs, in terms of RMSE and the coefficient R2. In this study, compressive strength values are available at all core points, a condition that is often impractical in common scenarios because of constraints on dataset size. Because the cores are selected at random, every set of NC data gives a different outcome. The same process is therefore repeated 100 times for each value of NC, and the mean and standard deviation of the strength values over the 100 repetitions are calculated for the reliability assessment at each NC.
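The resampling procedure described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the single-variable power law fc = a·x^b fitted by least squares on log-transformed data, the function names, the random seeds, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_power_law(x, fc):
    """Fit fc = a * x**b by linear least squares on log-transformed data."""
    b, log_a = np.polyfit(np.log(x), np.log(fc), 1)
    return np.exp(log_a), b


def rmse(measured, estimated):
    """Root mean square error between measured and estimated strengths."""
    return float(np.sqrt(np.mean((measured - estimated) ** 2)))


def resample_errors(x, fc, nc, repetitions=100, seed=0):
    """Pick nc calibration cores at random, fit, and score both subsets."""
    rng = np.random.default_rng(seed)
    fitted, predicted = [], []
    for _ in range(repetitions):
        idx = rng.permutation(len(x))
        cal, rest = idx[:nc], idx[nc:]  # NC pairs and the (NT-NC) pairs
        a, b = fit_power_law(x[cal], fc[cal])
        fitted.append(rmse(fc[cal], a * x[cal] ** b))
        predicted.append(rmse(fc[rest], a * x[rest] ** b))
    return (float(np.mean(fitted)), float(np.std(fitted)),
            float(np.mean(predicted)), float(np.std(predicted)))


# Synthetic stand-in data (NOT the study's dataset): rebound index vs strength
rng = np.random.default_rng(42)
rebound = rng.uniform(20.0, 45.0, size=275)
strength = 0.05 * rebound ** 1.8 * rng.lognormal(0.0, 0.1, size=275)

for nc in (3, 9, 20):
    fit_mean, fit_sd, pred_mean, pred_sd = resample_errors(rebound, strength, nc)
    print(nc, round(fit_mean, 2), round(pred_mean, 2))
```

Fitting in log space is chosen here only because it keeps the sketch short; a nonlinear least-squares fit in MPa would weight the residuals differently and could give somewhat different coefficients.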

Estimation methodology

Coefficient of determination (R2) estimation

The coefficient of determination (R2) is commonly used by researchers as a measure of the effectiveness of the fitted regression model, as indicated in Eq. (1).

$${R}^{2}=1-\frac{SSE}{SST}$$
(1)

where R2 is the coefficient of determination, SST is the total sum of squares (total variation in the values of the response variable), and SSE is the sum of squares of residuals (the amount of variation unexplained by the fitted regression model).

To analyze the relationship between core strength and NDT parameters (RH, UPV), the R2 coefficient was used as stated in Eq. (2). NDT tests were conducted before extracting the core from the in-situ members. The correlation between the two measures is used to evaluate concrete compressive strength. The Ministry of Railways, India [21] suggested that the probable accuracy of estimating concrete strength in a structure is ± 25%. The coefficient R2 assesses goodness of fit but gives no warning about the model's reliability when applied to prediction. R2 values for single-variable and multiple-variable models with two nonlinear regression equations (power and exponential) are used in this analysis, as shown in Table 2. Equations of one variable (RH or UPV) provide best-fit correlations with R2 > 0.70, which improves to R2 > 0.80 for the two-variable (combined) equations.

$$R^{2} = \frac{\left[ \sum_{i=1}^{NC} \left( f_{m} - \overline{f} \right)^{2} - \sum_{i=1}^{NC} \left( f_{m} - f_{e} \right)^{2} \right]}{\sum_{i=1}^{NC} \left( f_{m} - \overline{f} \right)^{2}}$$
(2)

where fm, fe, and f̄ are the measured, estimated, and average strength, respectively, and NC is the number of cores.
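Equations (1) and (2) express the same quantity. As a short check, assuming SST and SSE are both computed over the same NC calibration cores:

$$SST=\sum_{i=1}^{NC}\left(f_{m}-\overline{f}\right)^{2},\qquad SSE=\sum_{i=1}^{NC}\left(f_{m}-f_{e}\right)^{2}$$

$$R^{2}=\frac{SST-SSE}{SST}=1-\frac{SSE}{SST}$$

A model that always returns the average strength gives SSE = SST and therefore R2 = 0, while a perfect fit gives SSE = 0 and R2 = 1.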

Table 2 Destructive and Non-destructive parameters

According to EN 13791:2019 [9], at least eight cores of diameter > 75 mm are required, and it is recommended to core at least ten test locations to allow for possible outliers. According to ACI 228.1R-03 [2], a minimum of twelve cores is required for the regression approach. The RILEM TC 43-CND [22] committee recommended a process to develop iso-strength curves for concrete (materials and mix from different countries for which the curves are developed). If the mix is known, a correction factor should be applied, and if it is unknown, the correction factor should be estimated using cores extracted from the structure under investigation.

Figure 2 illustrates the experimental vs. predicted FCM≈fcm results of the combined approach with the regression model. Less variance is reported in D1, D4, D5, and D6, while greater variance is observed in D2 and D3. Hence, it is essential for correct model validation to use the regression approach. All outcomes are derived using a power function, with a standard deviation ranging from 2.93 to 6.41 and a coefficient of determination (R2) ranging from 0.82 to 0.86.

Fig. 2 Experimental vs. predicted FCM results of combined method
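A two-variable power-function fit of the kind used for the combined method can be sketched as below. The functional form fc = a·RH^b·UPV^c is an assumption made for illustration (the exact equations of Table 2 are not reproduced here), as are the function names, the coefficient values, and the synthetic noise-free data.

```python
import numpy as np


def fit_combined_power(rh, upv, fc):
    """Fit fc = a * rh**b * upv**c via linear least squares in log space."""
    design = np.column_stack([np.ones_like(rh), np.log(rh), np.log(upv)])
    coeffs, *_ = np.linalg.lstsq(design, np.log(fc), rcond=None)
    log_a, b, c = coeffs
    return float(np.exp(log_a)), float(b), float(c)


def r_squared(measured, estimated):
    """Coefficient of determination, R2 = 1 - SSE / SST."""
    sse = np.sum((measured - estimated) ** 2)
    sst = np.sum((measured - np.mean(measured)) ** 2)
    return float(1.0 - sse / sst)


# Synthetic, noise-free stand-in data (NOT the study's dataset)
rng = np.random.default_rng(1)
rh = rng.uniform(20.0, 45.0, size=40)   # rebound index
upv = rng.uniform(3.0, 4.8, size=40)    # pulse velocity, km/s
fc = 0.02 * rh ** 1.2 * upv ** 2.0      # hypothetical coefficients

a, b, c = fit_combined_power(rh, upv, fc)
fc_est = a * rh ** b * upv ** c
print(a, b, c, r_squared(fc, fc_est))
```

Because the synthetic data are noise-free, the fit recovers the coefficients used to generate them; with real measurements the residual scatter would lower R2 toward the range reported for Fig. 2.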

Variability

This approach is used to identify “regions” that can be considered homogeneous in terms of the COV. All data sets appear to have relatively closer results according to mean, SD, and COV. Results from 1986 to 1990 present slightly better statistical values than others, as shown in Table 2. The variability of the NDT parameters measured between the mean values for each database is approximately 5 percent, which is just one-third of the within-subset variability. Kumavat and Chandak [23] pointed out that uncertainties affect assessment, and the concrete strength variability was induced in the structure due to batch-to-batch (mix composition, compaction deficiency, different curing conditions), member-to-member (environmental conditions), within member (voids and cracks) conditions.

RMSE estimation

Another measure that can be used for this purpose, following Liu et al. [24], is the root mean square error (RMSE), as given in Eq. (3). Its practical interest is that it expresses the error associated with the predicted strengths more explicitly.

$${RMSE}_{(fitted)}= \sqrt{\frac{\sum_{i=1}^{NC}{\left({f}_{m}-{f}_{e}\right)}^{2}}{NC}}$$
(3)

The R2 values, indicating the degree of correlation between predicted and experimental values, vary between the lowest and highest across all models. The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values demonstrate minimal error levels when compared to individual models [25, 26]. The model established at core test locations is used in actual practice to predict strength at test locations where only NDT measurements exist. The predictability of the model is, therefore, a crucial issue. Shcherbakov [27] reviewed a range of statistical prediction indicators. Among these indicators, RMSE is used by NDT experts to test the error of prediction (Nobile [28]). The prediction error can be determined for the (NT-NC) test locations using the adjusted form given in Eq. (4).

$${RMSE}_{(predicted)}= \sqrt{\frac{\sum_{i=1}^{NT-NC}{\left({f}_{m}-{f}_{e}\right)}^{2}}{NT-NC}}$$
(4)

where NC is the number of cores and NT is the total number of test locations for NDT measurements.
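Equations (3) and (4) differ only in the set of test locations over which the squared residuals are averaged. The short example below makes that explicit; the strength values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def rmse(measured, estimated):
    """RMSE over one set of test locations (same form as Eqs. 3 and 4)."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    return math.sqrt(squared / len(measured))


# Hypothetical strengths in MPa at the NC = 3 calibration cores (Eq. 3)
measured_cal = [20.0, 25.0, 30.0]
estimated_cal = [21.0, 23.0, 32.0]

# Hypothetical strengths in MPa at the remaining NT - NC = 4 locations (Eq. 4)
measured_rest = [18.0, 22.0, 27.0, 33.0]
estimated_rest = [21.0, 19.0, 30.0, 30.0]

rmse_fitted = rmse(measured_cal, estimated_cal)        # sqrt(9 / 3)
rmse_predicted = rmse(measured_rest, estimated_rest)   # sqrt(36 / 4) = 3.0
print(rmse_fitted, rmse_predicted)
```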

Results and discussion

Influence of the number of cores on the evaluation accuracy

The values of RMSE (RH, UPV, and CM (combined method)) calculated for fitted and predicted error are shown in Fig. 3 as a function of the number of cores. The average value of RMSE at the prediction stage is always higher than at the fitting stage, as shown in Fig. 3. When NC increases, the standard deviation of the RMSE decreases. The difference between the fitting and prediction error is due to extrapolation. This difference in precision is very significant, mainly when NC is small, but in actual practice only the precision of the fitting model is estimated. If fitting errors are very low, it may indicate overfitting, where the model performs exceptionally well on the training data but fails to generalize to new, unseen data.

Fig. 3 RMSE (Mean and SD) of fitted and predicted models with construction age between 1975 and 2005

On the other hand, understanding prediction errors assists in refining the model to enhance its generalization ability. Recognizing the difference in precision allows practitioners to address overfitting and create models that are robust and applicable to diverse situations. Kumavat and Chandak [29] pointed out that the relationship between CV and rebound values varies with the structural age (18–32 years) and its type. It is helpful to be aware that a fitted model can have good precision and yet a deficient predictive capacity. This problem is important when NC is small, because the graphs show that the predicted RMSE can be double the fitted RMSE. Consistent values of RMSE are seen when the number of cores ranges between 9 and 20, which indicates that a minimum of nine cores is required for assessment precision. For 9 to 20 cores, prediction and fitting errors stabilize at 1.8 MPa for RMSE with RH and 1.9 MPa for RMSE with UPV for groups D1, D2, and D6.

Similarly, prediction and fitting errors stabilize at 2 MPa for RMSE with RH and UPV for groups D3 and D5, and at 1.0 MPa for RMSE with RH and UPV for group D4. These errors occur due to measurement uncertainties and the influence of uncontrolled factors such as carbonation, moisture condition, and temperature variation (Kumavat et al. [30]). When comparing results from different experiments or instruments, knowing the uncertainties is essential. It provides a basis for evaluating the reliability of the data and determining whether observed differences are statistically significant or within the expected margin of error. A clear understanding of uncertainties ensures that researchers stay within the precision of their findings. Uncontrolled factors are variables that can affect measurements but are not under direct observation or control. Error mitigation may involve refining experimental setups, conducting additional analyses, generalizing results, and improving reproducibility.

Table 3 presents the comparative analysis of RMSEpred and R2pred between single and combined NDT methods. When NC increases, the RMSEpred of UPV is greater than that of RH, and the R2pred of RH is greater than that of UPV, with a consistent variance. According to RMSEpred and R2pred, at least 6–8 cores are required for the combined approach to achieve a better prediction phase than a single method. Based on 100 iterations for each NC, the combined method yields more reliable results than a single method, with a lower RMSE and a higher R2 value.

Table 3 Comparative analysis of RMSEpred and R2pred between SM and CM (NDT)

Generalization of results

Alwash et al. [4] investigated the efficacy of RMSEpred for a combination of NDT measurements (UPV and R) according to NC. Based on average RMSE, they concluded that the combination is effective beyond a minimal number of cores (9 or 5). This is observed when the NDT measurement quality of the first technique is high (low within-test variability) and that of the second technique is low or average. Benyahia et al. [3] found that incorporating a second technique proves effective with at least 5 cores if the first technique is UPV and with at least 6 cores if the first technique is R. This difference arises from the repeatability of rebound measurements. The minimal number of cores specified by both authors may be specific to the case. In the present work, a second technique proves to be effective with at least 8 cores if the first technique is UPV and with at least 6 cores if the first technique is R. In addition, both techniques proved effective for earlier years (2000–2005) results with at least 6 cores, as illustrated in Table 3.

Influence of number of cores on the efficacy of the CM

The combined method (CM) trends for the same database are similar to those of a single method. The combined method gives a modest improvement in RMSE and stabilizes at a slightly lower RMSE than a single method. As Fig. 3 shows, the fitted error is lower for the combined method (CM) than for the single method (SM) across the different numbers of cores (NC). The efficacy of the combined technique during the fitting phase is not affected by NC.

When NC varies, the RMSEfit of UPV is greater than that of RH by a roughly constant difference, as shown in Fig. 4. Likewise, the R2fit of the rebound method is higher than that of UPV by a roughly constant difference, as shown in Table 3. Fig. 5 presents a comparative analysis of the fitted R2 value in terms of construction age (D1 to D6) for the power and exponential regression equations. It should be stressed that these findings are specific to the present study and rely primarily on the respective precision of both evaluation methods in this situation. Previous case studies [3] suggest that identical or opposite outcomes can occur, depending on the structural condition and the measuring system.

Fig. 4 Comparative analysis of the fitted RMSE in terms of the number NC with construction ages (D1 to D6). a NDT method (UPV value) b NDT method (R-value) c CM (R, UPV value)

Fig. 5 Comparative analysis of the fitted R2 value in terms of the construction age (D1 to D6) with regression equation a Power function, b Exponential function

However, the model of the CM at the fitted phase tends to be less dispersive than the models with an SM when NC < 6, as shown in Fig. 4. This is because the model parameters have to be identified from the measurement error database [11]. It demonstrates that using a CM with limited cores at the prediction phase is risky and can result in large uncertainties. Hence, attention must be paid to the calibration of the CM, since using a minimal number of cores is common practice in consultancy work [4].

The present study shows that at least 6–8 cores (with respect to RMSE and the coefficient of determination (R2)) are necessary for a combined method to achieve a better prediction phase than a single method. The number of cores should be increased to ensure that the CM is robust and significantly more reliable than an SM. Consequently, it is also essential to analyze the statistical interpretation of the data to provide a more detailed understanding of the effectiveness of the CM during the prediction phase. Therefore, it is necessary to check and quantify each of the 100 iterations to know how the CM contributes to a lower RMSE and a higher R2 than the SM.

The absence of unanimous agreement among practitioners and researchers based on the study findings on core numbers can be attributed to the choice of the success rate (tolerance), which is considered a genuine benefit when utilizing CM. This success rate is based on consideration of the practical quality of each non-destructive test; the minimum number of cores needed will vary. The authors recommended that CM measurements require a minimum of 6 to 8 cores for an expected outcome.

Conclusion

After conducting tests on cores obtained from structures built between 1975 and 2005 and performing statistical analysis, the following inferences can be made:

The results were utilized for statistical analysis in assessing strength, emphasizing the influence of the number of cores during the model identification approach using SM and CM methods. In the fitting and prediction stage, the variation in error depends on the number of cores used for calibration, expressed in the root mean square error (RMSE) and coefficient of determination (R2), which will logically reduce as NC increases. The present study concluded that nine cores must be considered a reasonable number of samples to converge for a single NDT method; additional increases yielded little improvements in the assessment.

It is, therefore, necessary to check and quantify each of the 100 iterations to know how the combined method contributes to a lower RMSE and higher R2 than the single method. The average value of RMSE at the prediction stage is always higher than the fitting stage, but its standard deviation decreases when the number of cores increases. Compared with a single method, 6–8 cores are necessary for the combined method to estimate strength precisely.

We believe qualitative declarations in this study are expected to hold true across various situations, but quantitative assertions may require further analysis. Examining the model's performance in actual practice is crucial for evaluating the calibration process or increasing the number of testing samples.

Future recommendation

Additional research is needed to determine variations in the number of cores, while keeping in mind that combining two NDT methods enhances the evaluation.

Research contribution

In the present study, nine cores are proposed as a reasonable number of samples to converge for a single NDT method. Additional cores did not yield any significant improvement in the results. In comparison with a single method, 6–8 cores are necessary for the combined method to estimate strength precisely. Examining the model's performance in actual practice is crucial for evaluating the calibration process or increasing the number of testing samples.