Introduction

In recent years, the development and use of machine learning technology have expanded rapidly across many fields, and its rate of application is expected to keep rising [1,2,3]. This growth can be attributed to the substantial increase in data generation and accumulation, coupled with the exponential growth of computing power capable of processing such data [4]. High demand for and investment in machine learning within the industrial sector have led to the development of increasingly sophisticated, high-dimensional algorithms, which are shared and disseminated at a rapid pace and are changing the industrial ecosystem.

Within the field of construction materials, experimental research has traditionally been the principal method for evaluating the formulation and performance of materials [5, 6]. This is due to the relative difficulty and inefficiency of theoretically analyzing the complex chemistry of most construction materials, which undergo extensive hydration over the course of roughly twenty-eight days from initial curing, followed by degradation and damage [7].

However, recent research has focused on construction materials that incorporate various constituents beyond traditional formulations, and experiments and existing theories alone have shown limitations in correlating their diverse properties and reactions [8, 9]. Next-generation construction materials are recognized as key enabling technologies for achieving greener, zero-energy, and smarter structures. To commercialize and implement these materials successfully, a reliable understanding of formulation-specific interactions and practical methodologies for predicting material performance are essential; the demand for data-driven analysis and model development is therefore expected to increase [10].

This review article presents examples of research on and applications of machine learning in construction materials, with the aim of providing a basic understanding of the field. The article first introduces a case study on predicting the piezoelectric properties of cement composites that incorporate multi-walled carbon nanotubes (MWCNTs), a representative nanomaterial [11]. Dispersing an appropriate amount of MWCNTs in the cement matrix significantly improves the electrical conductivity of the composite, allowing it to be used as a piezoelectric material [12, 13]. Various machine learning algorithms were applied, and their results were compared to determine the most suitable one.

The second case study centers on predicting the compressive strength of concrete that uses crushed clay brick as coarse aggregate [14]. Concrete with crushed clay brick aggregate behaves differently in compression from conventional concrete, and a constitutive equation combining micromechanics and machine learning was derived to simulate it. Microcracking constants that capture the nonlinear properties of the material were derived using machine learning techniques, and their validity was assessed by comparison with experimental results.

Finally, the article presents a case study on predicting the compressive strength of controlled low-strength material (CLSM) that incorporates interior stone sludge, a byproduct of decorative building material production that consists predominantly of silica. Although no specific treatment method has been established for this sludge, there is growing recognition that incorporating it into construction materials could reduce CO2 emissions by substituting for cement and aggregate. Using machine learning, we trained on and analyzed the data and developed a model that predicts the compressive strength of CLSM from its key contributing factors.

Machine Learning-Based Prediction of Piezoelectric Performance in MWCNTs-Embedded Cement Composites

According to [15, 16], the introduction of conductive fillers in cement composites results in piezoresistive behavior, as evidenced by changes in electrical conductivity when the composite is subjected to external stress or strain. Figure 1 provides an overview of the piezoelectric performance measurements of cement composites containing MWCNTs. Because of the piezoelectric properties of these materials, various studies have been conducted to explore their potential sensing capabilities [17,18,19]. Although multiscale material simulation offers an ideal approach for describing heterogeneous material properties at full scale, its high computational cost and complexity limit its practical application in industrial settings [20,21,22]. Therefore, recent studies have focused on data-driven analysis methods based on experimental data to facilitate integration into real-world applications.

Fig. 1 Overview of piezoelectric performance measurements of MWCNTs-embedded cement composites

The present study assessed several machine learning approaches to predict the performance of cement composites incorporating heterogeneous CNTs based on input variables. To ascertain the most effective machine learning algorithm, a diverse range of techniques including decision tree [23], support vector machine (SVM) [24], Gaussian process regression (GPR) [25], random forest [26], XGBoost [27], genetic programming toolbox for the identification of physical systems (GPTIPS) [27], and deep belief network (DBN) [28] were utilized and analyzed.

Linear regression is a widely used statistical technique that models the relationship between variables by minimizing the mean square error (MSE) [29]. The decision tree is a non-parametric supervised learning method used for classification and regression tasks. It partitions the data into branches according to threshold values or conditions on the input variables, analogous to multiple branches growing from a single stem of a tree [30]. The SVM is a supervised learning approach that learns from labeled examples to assign labels to objects. For regression, it estimates a linear function with a specified tolerance band around it and is optimized so that as many training points as possible fall within that band [31]. GPR is a probabilistic approach, based on random variables, that can be applied to non-linear regression and classification problems. It has been observed to perform well on small datasets because it applies Bayes' rule and does not impose a fixed functional form [32].
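As a minimal sketch of the MSE-minimizing fit mentioned above, the single-variable case has a closed-form solution. The function names and the load/response values below are hypothetical and are not taken from the reviewed studies.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the line that minimizes the MSE of y ≈ slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(x, y, slope, intercept):
    """Mean square error of the line on the data."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical data: an input variable and a measured response
load = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [0.1, 1.9, 4.1, 5.9, 8.1]

slope, intercept = fit_line(load, response)
fit_error = mse(load, response, slope, intercept)
```

Any other choice of slope or intercept gives a larger MSE on the same data, which is the property the more flexible models in this section relax by allowing non-linear relationships.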

Ensemble algorithms, such as random forest and XGBoost, combine multiple learners to achieve better learning and prediction performance than any individual learner used on its own [33]. GPTIPS uses selection, crossover, and substitution operations inspired by biological evolution. To avoid converging to a suboptimal solution, it introduces mutations with a certain probability when selecting the optimal group of solutions [34]. DBN uses a multi-layer structure to capture the distribution of the training data and refines the model through artificial neural networks to develop a predictive model [35]. The performance of each model was evaluated using the root mean square error (RMSE).

In this study, experimental data were obtained from previous studies [36, 37], which also provide the mix proportions of the MWCNT/cement specimens. Kim et al. [36] and Song and Choi [37] tested the piezoresistive performance of MWCNT/cement specimens with the water-to-binder (W/B) ratio, MWCNT content, and curing temperature as experimental variables. Figure 2 presents the predicted results for all specimens obtained with the various machine learning algorithms. In the figure, blue dots represent experimental values and yellow dots represent predicted results. The X-axis denotes the applied loading, and the Y-axis represents the fractional change in electrical resistivity (FCR). To determine the most accurate method, RMSE values were computed by comparing the predictions of each technique with the experimental results. The GPTIPS method was identified as the most accurate, followed by the GPR method. GPTIPS, which employs nonlinear regression models that account for both predictive performance and model complexity, gave the best performance in half of the cases.

Fig. 2 Comparison between experiments and predictions obtained with various machine learning techniques
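The RMSE-based comparison described above can be sketched as follows. This is only an illustration: the FCR values and the model names `model_a` and `model_b` are hypothetical placeholders, not results from the cited studies.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical FCR measurements at increasing load levels
measured = [0.0, -0.02, -0.04, -0.06]

# Hypothetical predictions from two candidate models
predictions = {
    "model_a": [0.0, -0.01, -0.03, -0.05],
    "model_b": [0.0, -0.02, -0.05, -0.06],
}

# Score every model and keep the one with the lowest RMSE
scores = {name: rmse(measured, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
```

Repeating this scoring for each specimen and counting how often each method has the lowest RMSE yields the kind of per-case ranking reported in the text.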

A Combined Micromechanics-Genetic Algorithm (GA) Model for Estimating Compressive Strength of Concrete

Coarse aggregate is abundant in most parts of the world, so its supply rarely presents a significant challenge. In regions where it is scarce, however, the construction of buildings and infrastructure is significantly hindered. Obtaining coarse aggregate is particularly difficult in desert areas and on islands, and transporting aggregate to these regions is costly. Consequently, there is a growing need to develop artificial aggregates that can substitute for natural aggregates in concrete. Data-driven research based on artificial intelligence has been actively conducted to design new materials, and it has the potential to enhance the accuracy of material property prediction [38]. However, it is highly dependent on data and cannot predict material characteristics outside the range of the training data [14]. To address these limitations, recent studies have focused on improving deep neural networks and multiscale convergence models.

The representative volume element (RVE) of concrete was constructed using an isotropic cement matrix (phase 0), uniformly and randomly dispersed aggregates (phases 1 and 2), and a microcrack (phase 3). The aggregates were assigned different phases (1 and 2) to represent normal and crushed clay brick aggregates, respectively, based on the mix proportion [39]. Furthermore, microcracks were assumed to exist within the concrete initially and were assumed to increase gradually due to external loading, resulting in a reduction in the mechanical characteristics of the specimen [40]. Figure 3 provides a schematic representation of the initial state and the equivalent damaged state of concrete containing normal and crushed clay brick aggregates.

Fig. 3 Schematic illustration of proposed model for predicting engineering properties of concrete specimen

Concrete generally contains voids ranging from 0.5 to 5%, and the number of voids increases as external stress is applied. External stress induces stress concentration around the interfaces between cement and aggregate/void, which leads to material failure. In this study, microcracks were assumed to represent these material properties. Furthermore, the proposed constitutive equation was derived by assuming that these microcracks gradually increase as external stress is applied [41]. The proposed micromechanical model is based on the ensemble volume-averaged approach, and the local solution of the inclusion interaction problem was not considered theoretically. The non-interacting approximation has the advantage of mathematical simplicity. However, the proposed approach may be limited when the inclusion content increases.

To improve the accuracy of predictions, it is necessary to consider the nonlinear damage of materials. However, quantitatively determining the exact nucleation point and progressive pattern of certain types of nonlinear damage, such as microcracks, is nearly impossible [42]. Therefore, damage parameters are typically set as model constants and fitted to experimental outcomes. However, manually determining numerous model constants that satisfy various experimental results can be very difficult and time-consuming [43, 44]. To address this issue, we employed a machine learning technique, namely a GA, to estimate the optimal model constant values of the present micromechanics-based constitutive equation.
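The idea of letting a GA search for damage constants that reproduce measured data can be sketched as below. This is not the constitutive equation or the GA configuration of the reviewed study: the exponential damage law, the constants `s0` and `m`, the modulus `E0`, the search bounds, and the synthetic "experimental" curve are all assumptions made for illustration.

```python
import math
import random

E0 = 30.0  # hypothetical undamaged modulus (GPa); strain is in units of 1e-3, so stress is in MPa
BOUNDS = [(0.5, 5.0), (0.5, 4.0)]  # assumed search ranges for the constants (s0, m)


def stress(strain, s0, m):
    """Toy damage law: the retained stiffness fraction decays as strain grows."""
    retained = math.exp(-((strain / s0) ** m))
    return E0 * retained * strain


def misfit(constants, data):
    """RMSE between the toy model and the (strain, stress) data points."""
    s0, m = constants
    return math.sqrt(sum((stress(e, s0, m) - s) ** 2 for e, s in data) / len(data))


def run_ga(data, pop_size=40, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS) for _ in range(pop_size)]
    history = []  # best misfit at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda c: misfit(c, data))
        history.append(misfit(population[0], data))
        next_gen = population[:2]  # elitism: carry the two best candidates over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three randomly drawn candidates
            p1 = min(rng.sample(population, 3), key=lambda c: misfit(c, data))
            p2 = min(rng.sample(population, 3), key=lambda c: misfit(c, data))
            # Blend crossover: random weighted average of the two parents
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            # Mutation: occasional Gaussian perturbation, clipped to the bounds
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))
                child[i] = max(lo, min(hi, child[i]))
            next_gen.append(tuple(child))
        population = next_gen
    best = min(population, key=lambda c: misfit(c, data))
    return best, misfit(best, data), history


# Synthetic "experimental" curve generated from known constants (s0 = 2.0, m = 2.0)
strains = [0.25 * i for i in range(1, 13)]
data = [(e, stress(e, 2.0, 2.0)) for e in strains]

best, best_error, history = run_ga(data)
```

Because the two best candidates are carried over unchanged, the best misfit cannot get worse from one generation to the next, while mutation keeps some diversity in the population so the search is less likely to stall at a poor set of constants.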

The constitutive equation with the optimal microcrack model constant was then implemented into the finite element (FE) code ABAQUS using a user subroutine technique (UMAT) [45, 46]. An FE model was constructed to simulate the experimental outcome, as shown in Fig. 3. A cylindrical FE model, identical in size to the specimen utilized in the experiment, was located between the upper load cell and the lower jig. The jig model was fixed in all directions, and the load cell model was set as a rigid body, designed to move from top to bottom [47].

The predicted elastic modulus values of concrete with respect to the aggregate replacement rate and the w/c ratio were compared with the experimental results obtained in the present study, using the same material and microcrack parameters as before to verify the theoretical effectiveness of the model (Table 1). The predicted values were in good agreement with the experimental results in most cases. The experimental trend in the elastic modulus varied depending on numerous mechanisms; in the theoretical analysis, however, the elastic modulus was predicted to decrease steadily as the crushed clay brick substitution rate and w/c ratio increased.

Table 1 Comparison of experimental and predicted results for elastic modulus

Herein, we chose to utilize the GA in conjunction with the micromechanics model due to GA's robustness against local optima, ability to explore large and complex search spaces, and capacity to introduce diversity in the solution space through mutation and crossover operations. These attributes make it particularly effective when dealing with the complex variability of the compressive strength of concrete materials. However, other machine learning methods might offer different advantages, and a comparative analysis of these could be a fruitful direction for future research.

GA-Based Predictions of Mechanical Characteristics of CLSM Specimens Containing Interior Stone Sludge

The production of interior stone results in waste containing moisture, quartz, and polymers from grinding and cutting processes, but an efficient recycling method is not available. Figure 4 presents an overview of the interior stone production process. Initially, the quartz powder is selected based on size and color and placed in a mold. A polymer film comprising unsaturated polyester resin and styrene monomer is then applied to the top of the sample. Appropriate temperature and pressure are subsequently applied during the manufacturing process, and the final product is completed through cutting and grinding.

Fig. 4 An approximate scheme of the production process of interior stone sludge

CLSM is a highly flowable, self-compacting cementitious backfill material that does not require high engineering properties [48]. Therefore, in this study, sludge was considered as a constituent for fabricating the specimens. Figure 5a shows the approximate material composition of the sludge-embedded CLSM specimens. Cement, fly ash, standard sand, and lumpy sludge were manually crushed and dry-mixed for 30 s. Water was then added, and the mixture was blended until sufficiently mixed. The fresh mixture was placed in a 50 × 50 × 50 mm³ mold, sealed with plastic wrap to prevent evaporation of moisture, and cured for 24 h at a temperature of 18–25 °C before demolding and re-wrapping [49, 50]. The specimens were cured at the same location and temperature.

Fig. 5 Schematic illustrations of (a) CLSM specimens containing interior stone sludge and (b) the concept of GA-based GPTIPS modeling

Figure 5b presents an overview of the GPTIPS modeling used to simulate the performance of CLSM specimens. The experimental results were used to develop a model equation for the compressive strength of sludge-incorporated CLSM using a machine learning-based GA [51, 52]. In the resulting equation, the output represents the compressive strength of CLSM at 28 days, while d denotes the curing period (unit: day), f refers to the amount of fly ash (unit: g), s indicates the amount of sludge (unit: g), w represents the amount of water (unit: g), and a represents the amount of aggregate (unit: g) [53]. The model predictions were compared with the experimental results, and an R² value of 0.802 indicated relatively high accuracy within the limited experimental conditions (Fig. 6).

Fig. 6 Comparison of experimental and predicted results for compressive strength of CLSM specimens
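For reference, the coefficient of determination used to judge such a fit can be computed as in the following sketch. The strength values are hypothetical and serve only to show the calculation; they are not the CLSM data of the study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical compressive strengths (MPa): measured vs. model output
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(measured, predicted)
```

A value of 1 means the predictions reproduce the measurements exactly, while a value near 0 means the model explains little more than the mean of the data does.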

Conclusions

This review article aims to provide a comprehensive understanding of the application of machine learning in construction materials by presenting various research examples. Conventional analysis may not be able to accurately predict the performance of cementitious composites, and data-driven modeling through machine learning can be used to overcome this challenge. Machine learning can enable accurate regional analyses at the nanoscale level and predict material/structural behaviors on the macroscale level. The review covers the following main points:

  • The piezoelectric performance of specimens was analyzed, and GPR and GPTIPS were found to be more accurate than the other methods. Because the cement matrix undergoes hydration over a long period, it is appropriate to account for the effect of time.

  • In the second case, a theoretical approach that combines experiments, micromechanics, machine learning, and finite element techniques was used to predict the mechanical behaviors of concrete mixed with different types of aggregates. The predictions were generally consistent with the experiments, except when the aggregate replacement rates or w/c ratios were extremely high.

  • The third case study focused on the performance prediction of CLSM in which interior stone sludge is used in place of aggregate. A machine learning-based GA was used to derive a model expression for predicting the compressive strength of CLSM specimens, and the accuracy was found to be relatively high.

  • GA and GPR, notable machine learning algorithms, handle aberrations effectively. GA introduces unpredictability via mutations, while GPR makes probabilistic predictions. These traits allow them to reflect variability and error in cement-based materials performance, leading to a more accurate prediction model.

The machine learning-based analyses presented in the review are based on limited experimental results and require ongoing validation and verification. Predicting the properties of cement-based materials can be challenging due to changes in material composition from hydration, and various efforts have been made to simulate their properties. Theoretical research has made significant progress in predicting the performance of construction materials, but limitations of each technique are being exposed. Combining machine learning technology, which has been rapidly advancing in recent years, with existing techniques may help overcome some of these limitations.