Introduction

Cost optimization in construction projects entails proactively managing budgeting, resource allocation, procurement, and risk assessment to maximize project value and reduce costs. By applying cost optimization techniques, construction projects can increase profitability, reduce cost overruns, raise productivity, and adhere to budgets (Okereke et al., 2022). Effective construction project management depends heavily on accurate outcome forecasting and cost optimization (Mustafa et al., 2023; Pan & Zhang, 2022). Historically, these procedures relied on professional judgment and historical data. However, such techniques are often subjective, error-prone, and limited in their capacity to capture complex relationships within a dataset (Sajjad et al., 2023). Advanced computational methods, including Machine Learning and Particle Swarm Optimization (PSO), offer promising opportunities for improving predictive modeling and cost optimization in construction project management (Kedir et al., 2022; Uncuoglu et al., 2022).

Machine Learning, a subfield of artificial intelligence, provides a diverse set of tools and algorithms that learn from patterns in data, generate predictions, and reveal hidden insights (Al Khazaleh & Bisharah, 2023; Alkhdour et al., 2023; Al-Rawashdeh et al., 2023). Construction project managers can apply Machine Learning to the large volumes of available data, such as historical project data, environmental factors, and project-specific parameters (Arabiat et al., 2023). These data sources can be used to build predictive models that estimate project outcomes more precisely.

The primary advantage of Machine Learning is its capacity to discern intricate patterns and correlations within datasets that may elude human perception (Akinosho et al., 2020). Applying Machine Learning algorithms to construction project management uncovers such patterns, which helps managers make better decisions and anticipate potential issues (Awada et al., 2021; Shoar et al., 2022). As demonstrated by previous research (Ashtari et al., 2022; Banerjee Chattapadhyay et al., 2021; Darko et al., 2023; Wang et al., 2023), Machine Learning can provide valuable insights into several aspects of project management: forecasting project duration, anticipating cost overruns, optimizing resource utilization, and identifying critical risk factors that may significantly affect project success. The ability to predict future outcomes enables project managers to address obstacles proactively and adjust plans to improve overall project performance.

Particle Swarm Optimization (PSO) is a metaheuristic optimization approach inspired by the collective behavior observed in natural systems, such as bird flocking or fish schooling (Gad, 2022). In PSO, a population of particles traverses a solution space with the objective of locating the best solution to a specified problem. According to Fedor and Straub (2022), the particles iteratively adjust their positions, taking into account both their personal best solution and the collective knowledge of the entire swarm, known as the global best. This cooperative behavior allows efficient exploration of complex solution spaces and convergence toward near-optimal solutions (Marini & Walczak, 2015).
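The iterative adjustment described above is often written as the following pair of update rules. This is one common inertia-weight form, given here as an illustration rather than as the exact variant used in this study. The symbols are assumptions introduced for the sketch: x_i and v_i are the position and velocity of particle i, p_i is its personal best, g is the global best, w is an inertia weight, c_1 and c_2 are acceleration coefficients, and r_1, r_2 are random numbers drawn uniformly from [0, 1] at each step.

```latex
\begin{aligned}
v_i^{(t+1)} &= w\, v_i^{(t)} + c_1 r_1 \bigl(p_i - x_i^{(t)}\bigr) + c_2 r_2 \bigl(g - x_i^{(t)}\bigr),\\
x_i^{(t+1)} &= x_i^{(t)} + v_i^{(t+1)}.
\end{aligned}
```

The second term pulls each particle toward its own best position and the third toward the swarm's best, which is the balance between individual and collective knowledge noted above.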

PSO offers a viable approach to cost optimization in construction project management by identifying optimal resource allocation, scheduling, and budget distribution (Elbeltagi et al., 2016; ElSahly et al., 2023). According to Guo and Zhang (2022), this optimization capability reduces costs, improves resource utilization, and streamlines project processes, which in turn contributes to project efficiency and overall success. PSO has proven effective across a wide range of domains, including engineering and project management. Dasovic et al. (2020) attribute this to the method's ability to handle non-linear, non-convex, and high-dimensional problems, which often arise in construction projects. Cui et al. (2019) note that combining PSO with Machine Learning techniques lets construction project managers use predictive modeling and optimization algorithms together to obtain synergistic benefits.

By incorporating these methods, construction project managers can strengthen their decision-making, mitigate potential risks, and achieve economically efficient project outcomes. This paper explores the fundamental principles and methodologies of Machine Learning and Particle Swarm Optimization, analyzes case studies and applications in construction project management, and addresses the potential advantages and obstacles that may arise when these techniques are implemented. Its primary objective is to contribute to the discipline of construction project management by showcasing the capabilities of Machine Learning and Particle Swarm Optimization in predictive modeling and cost optimization. Construction project managers can improve the efficiency and success of project execution by using data-driven insights and optimization algorithms.

Methods and materials

Data description

The goal of this study is to investigate how Machine Learning and Particle Swarm Optimization (PSO) can be used for predictive modeling and cost optimization in construction project management. The dataset used in this study was chosen to support this investigation. As shown in Table 1, it includes a variety of variables that capture important aspects of construction project costs, materials, and project features.

Table 1 Description of construction project variables

The dataset is well suited to examining Machine Learning methods, such as regression, and optimization algorithms, such as Particle Swarm Optimization, in construction project management. Its variables cover cost considerations, materials, and project aspects, which together influence the overall cost and profitability of a project. Investigating these factors supports the development of predictive models and optimization strategies aimed at improving the cost-effectiveness and efficiency of construction project management.

Data pre-processing

In the field of construction project management, the success of Machine Learning and Particle Swarm Optimization models depends on the quality of the data that are used to train them. This section elucidates the various pre-processing techniques employed to address missing values, outliers, and feature transformation, with the aim of improving predictive modeling and cost optimization (Sharma et al., 2022).

The pre-processing steps are illustrated in Fig. 1.

Missing data: Missing values are common in construction project datasets, and their presence can lead to biases and inaccuracies in data analysis. Missing data were observed in several variables within the dataset, such as the price of polystyrene, the cost of excavation, and the price of ready-mixed concrete. These variables were handled in accordance with established procedures for dealing with missing data. Mean imputation and other estimation techniques were used to fill the missing values, yielding a complete dataset (Alshboul et al., 2022a, 2022b).

Outliers: Outliers are extreme values that deviate significantly from the general pattern observed in the dataset, and they can substantially reduce the accuracy and effectiveness of predictive models. We used statistical methods such as the z-score and the interquartile range to detect anomalies in variables such as construction stone, tile work costs per square meter, and total cost. Applying techniques such as winsorization or logarithmic transformation mitigated the influence of outliers on the models (Chen et al., 2023).

Feature transformation: To improve the dataset's capacity to capture non-linear interactions among variables, feature transformation techniques were employed. To better align the data with the models' assumptions, variables such as excavation depth, total apartment count, and profit were transformed with logarithmic and power functions. The resulting improvements in the linearity, normality, and distribution of the features led to more accurate predictions and more effective cost optimization (Alshboul et al., 2022a, 2022b).

Fig. 1 Data pre-processing flowchart
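The missing-value, outlier, and transformation steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the sample values, the variable name `excavation_cost`, and the 1.5 × IQR fence are assumptions chosen for the example, not values taken from the study's dataset.

```python
import math
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def winsorize(values, lower, upper):
    """Clip every value into the interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical excavation costs with one missing entry and one extreme value.
excavation_cost = [12.0, 14.0, None, 13.0, 15.0, 11.0, 95.0, 14.0]

imputed = mean_impute(excavation_cost)      # step 1: fill the gap
low, high = iqr_bounds(imputed)             # step 2: detect outliers
clipped = winsorize(imputed, low, high)     # step 3: limit their influence
logged = [math.log1p(v) for v in clipped]   # step 4: log(1 + v) compresses the right tail
```

Note that in this ordering the extreme value inflates the imputed mean; filling with the median, or handling outliers before imputation, would avoid that effect.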

Feature scaling: Feature scaling was performed so that all variables could be compared on a common basis. Two techniques were used: min–max scaling, which maps the values of a variable into a predetermined range, and standardization, which gives each variable a mean of zero and a standard deviation of one (ul Hassan et al., 2023). Aung et al. (2023) found that adding this step to the Particle Swarm Optimization algorithm increased the speed of convergence by reducing biases caused by the different scales of the features.

Feature selection: Feature selection methods were employed to identify the variables that are most important and informative for predictive modeling and cost optimization. Key predictors were identified through statistical techniques such as correlation analysis, feature importance ranking, and model-based selection. This isolated factors such as window size, building location, and finish type ID as significant predictors. The complexity of the dataset was reduced, which improved the efficiency and interpretability of the models (Park et al., 2022).
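The two scaling techniques named above do different things, which a short sketch makes concrete. The variable name and sample values are hypothetical, and the population standard deviation is used here as an assumption; the source does not state which estimator was applied.

```python
import statistics

def min_max_scale(values):
    """Map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and rescale values to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical apartment counts for four projects.
apartment_count = [8.0, 12.0, 20.0, 40.0]

scaled = min_max_scale(apartment_count)   # bounded range, mean not fixed
z_scores = standardize(apartment_count)   # mean 0 and unit variance, range not fixed
```

Min–max scaling suits methods that expect bounded inputs, while standardization is the usual choice when features should contribute comparably to distance or gradient computations.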

The dataset used in this study underwent these pre-processing procedures to ensure its quality, integrity, and suitability for the subsequent Machine Learning and Particle Swarm Optimization analyses. The cleaned data were then used with the modeling techniques described below to improve construction cost efficiency and obtain more precise predictions.

Feature selection

Machine Learning and Particle Swarm Optimization play a crucial role in predictive modeling and cost optimization for construction project management, and both depend heavily on feature selection. This section explains in detail how feature selection was used to find the most important and informative variables in the dataset, so that the models work well and efficiently (Kusonkhum et al., 2023).

Correlation analysis: To examine the relationships among the variables in the data, a correlation analysis was performed. Using Pearson's correlation coefficient or Spearman's rank correlation coefficient, we determined the strength and direction of linear (Pearson) or monotonic (Spearman) associations between variables. Variables with a high degree of correlation were given priority for further examination because they were expected to be more significant in forecasting the desired outcomes (Dang-Trinh et al., 2022).
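The difference between the two coefficients can be seen on a small example. The sketch below implements both from their definitions using only the standard library; the `depth` and `cost` values are hypothetical and were chosen so that the relationship is monotonic but not linear.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear association between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical excavation depths and costs; cost grows as depth cubed.
depth = [1.0, 2.0, 3.0, 4.0, 5.0]
cost = [1.0, 8.0, 27.0, 64.0, 125.0]

linear_strength = pearson(depth, cost)      # below 1: the relation is curved
monotonic_strength = spearman(depth, cost)  # 1: cost always rises with depth
```

A variable can therefore look only moderately important under Pearson's coefficient while being a perfect monotonic predictor, which is one reason to inspect both.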

Feature importance ranking: The importance of the features was evaluated using several Machine Learning algorithms, such as Decision Trees, Random Forests, and Gradient Boosting models. The models assigned weights to features based on their predictive utility. Significant variables, including polystyrene prices, excavation expenses, and building stone, were retained for subsequent investigation (Deepa et al., 2023).

Model-based selection: To select features using a model, a Machine Learning model was trained on the dataset and the influence of each feature on the model's predictive accuracy and cost-effectiveness was evaluated. Iterative evaluation techniques, such as forward/backward selection and recursive feature elimination, were employed to assess the effectiveness of different subsets of features. A number of factors consistently showed a significant impact on the performance of the models (Van & Quoc, 2021). These factors, namely ready-mixed concrete, tile work fees per square meter, and windows, were selected for further investigation, as shown in Fig. 2.

Fig. 2 Illustration of feature selection
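Of the iterative techniques mentioned above, forward selection is the simplest to sketch: start with no features and repeatedly add the one that most improves the fit. The example below is an illustration under stated assumptions, not the study's procedure. It scores subsets by the R-squared of an ordinary least-squares fit, and the three-column dataset is hypothetical, constructed so that the target depends only on columns 0 and 2.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(rows, y, cols):
    """R-squared of a least-squares fit (with intercept) on the chosen columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    pred = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def forward_select(rows, y, n_keep):
    """Greedily add the feature that yields the highest R-squared at each step."""
    chosen, remaining = [], list(range(len(rows[0])))
    while remaining and len(chosen) < n_keep:
        best = max(remaining, key=lambda c: r_squared(rows, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical data: columns are concrete price, an unrelated variable, tile cost.
rows = [[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 6.0, 4.0],
        [4.0, 2.0, 3.0], [5.0, 4.0, 6.0], [6.0, 1.0, 5.0]]
y = [3.0 * r[0] + 2.0 * r[2] for r in rows]  # target ignores column 1

selected = forward_select(rows, y, n_keep=2)
```

Backward selection and recursive feature elimination run the same loop in the opposite direction, starting from all features and dropping the least useful one at each step. In practice the score would be computed on held-out data rather than on the training rows used here.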

The selection of the most informative and influential variables was conducted through a process known as feature selection. These variables were then utilized in subsequent stages of predictive modeling and cost optimization. The aforementioned procedure effectively reduced the dimensionality of the dataset, resulting in enhanced model efficiency and interpretability. The utilization of specific features as inputs to Machine Learning and Particle Swarm Optimization algorithms has facilitated the achievement of accurate forecasts and effective cost optimization in the realm of building project management.

Machine Learning

This study utilizes Machine Learning techniques to construct predictive models and enhance cost optimization in the field of construction project management. The utilization of Machine Learning algorithms facilitates the examination and derivation of patterns and correlations from the dataset, thereby enabling precise predictions and effective cost optimization. This section provides an overview of the Machine Learning methodologies utilized in the present study.

Linear Regression: The application of Linear Regression, a fundamental technique in Machine Learning, was employed to establish a model that captures the association between independent variables and the target variable, which could be profit or total cost. Using a linear equation to model the given dataset, this method measures how linearly related the variables are. This makes it easier to estimate the costs or profits of a project based on the input features. Evaluation metrics, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared, were used to measure the Linear Regression model's performance and accuracy (Huang & Liang, 2022).
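For a single input feature, the linear equation described above has a closed-form least-squares solution. The sketch below shows that one-feature case only; the floor areas and costs are hypothetical values lying exactly on a line so the fitted coefficients are easy to verify.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical floor areas and total costs generated as cost = 0.5 * area + 2.
area = [100.0, 150.0, 200.0, 250.0]
cost = [52.0, 77.0, 102.0, 127.0]

slope, intercept = fit_line(area, cost)
predicted_cost = slope * 180.0 + intercept  # estimate for an unseen 180-unit project
```

With several input features the same idea generalizes to solving the normal equations, which is what standard regression libraries do internally.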

Decision Trees: Decision Trees are a type of non-parametric Machine Learning algorithm that utilizes the partitioning of the dataset according to feature values to generate predictions. This study utilized Decision Trees as a means to capture intricate cost structures and project outcomes by effectively capturing non-linear relationships and interactions between variables. The performance of the Decision Tree models was assessed using evaluation metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (Ali & Burhan, 2023).
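The partitioning step at the heart of a regression tree can be shown with a single split. The sketch below searches one feature for the threshold that minimizes the summed squared error of the two resulting groups; a full tree applies the same search recursively. The `floors` and `cost` values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best single split on feature x."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [t for v, t in pairs if v <= threshold]
        right = [t for v, t in pairs if v > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: cost jumps once a building exceeds three floors.
floors = [1, 2, 3, 4, 5, 6]
cost = [10.0, 11.0, 10.5, 30.0, 31.0, 29.5]

threshold, error = best_split(floors, cost)
```

Because each leaf predicts a constant, trees capture abrupt, non-linear changes such as this cost jump without any transformation of the inputs.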

Support Vector Machines (SVM): The Support Vector Machine (SVM) is a supervised learning algorithm that is commonly employed for classification and regression. The study used SVM to build a model that predicts project costs or profitability from the chosen features. The algorithm maps the data into a high-dimensional feature space and seeks an optimal hyperplane there. For classification, the hyperplane is chosen to maximize the margin between classes; for regression, it is chosen to fit the targets within a tolerance margin. The accuracy and effectiveness of the SVM models were assessed using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (Pham & Nguyen, 2023).
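For the regression case, the standard textbook formulation is the ε-insensitive support vector regression problem shown below. It is included as background only; the source does not state which SVM variant, kernel, or parameter values were used. The symbols are assumptions for this sketch: w and b define the hyperplane, ε is the width of the tolerance margin, ξ_i and ξ_i^* are slack variables for points outside that margin, and C weighs the penalty on them.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \bigl(\xi_i + \xi_i^*\bigr)\\
\text{subject to}\quad & y_i - (w^\top x_i + b) \le \varepsilon + \xi_i,\\
& (w^\top x_i + b) - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i,\ \xi_i^* \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Errors smaller than ε cost nothing, so only the points on or outside the margin (the support vectors) determine the fitted function.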

Gradient Boosting: Gradient Boosting is a method of ensemble learning that integrates numerous weak predictive models to construct a robust model. This methodology involves training models in a sequential manner, with each new model aiming to correct the mistakes made by the previous models. In this study, the utilization of Gradient Boosting was implemented as a means to enhance the precision and predictive efficacy of the models. The performance of the Gradient Boosting models was assessed using evaluation metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (Pham et al., 2023).
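The sequential error-correcting idea can be sketched with one-split trees (stumps) as the weak models. For squared-error loss, each new stump is fitted to the current residuals and added with a small learning rate. This is a minimal illustration; the data, the number of rounds, and the learning rate of 0.1 are assumptions, not settings reported by the study.

```python
def fit_stump(x, residuals):
    """Best single-threshold split of x for predicting the residuals."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for v, r in zip(x, residuals) if v <= t]
        right = [r for v, r in zip(x, residuals) if v > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left_mean, right_mean)

def boost(x, y, rounds, learning_rate=0.1):
    """Fit stumps sequentially, each one to the residuals left by its predecessors."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + learning_rate * (lm if v <= t else rm) for p, v in zip(pred, x)]
    return base, learning_rate, stumps

def predict(model, v):
    """Base value plus the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if v <= t else rm for t, lm, rm in stumps)

def training_mse(model, x, y):
    return sum((t - predict(model, v)) ** 2 for v, t in zip(x, y)) / len(y)

# Hypothetical project sizes and costs with a stepwise pattern.
size = [1, 2, 3, 4, 5, 6, 7, 8]
cost = [5.0, 6.0, 7.0, 12.0, 13.0, 20.0, 21.0, 22.0]

mse_by_rounds = {r: training_mse(boost(size, cost, r), size, cost) for r in (0, 5, 50)}
```

The training error shrinks as rounds are added, which also means the number of rounds and the learning rate need to be validated on held-out data to avoid overfitting.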

Random Forest: The Random Forest algorithm is a type of ensemble learning technique that involves the construction of multiple Decision Trees and the aggregation of their individual predictions. Random Forest models offer enhanced accuracy and robustness in predictions by aggregating the outputs of individual trees. The present study employed Random Forest models to make predictions regarding project costs or profitability, utilizing the chosen features. The performance of the Random Forest models was assessed using evaluation metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (Zhang, 2022).
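The aggregation step can be illustrated with a simplified stand-in: each tree below is a single-split stump trained on a bootstrap sample, and the forest prediction is the average over trees. A full Random Forest also grows deeper trees and samples a random subset of features at each split; with the single hypothetical feature used here, the sketch reduces to bagging. All names and values are assumptions for illustration.

```python
import random

def fit_stump(x, y):
    """One-split regression tree; returns (threshold, left_mean, right_mean)."""
    mean = sum(y) / len(y)
    # Fallback when no split is possible: always predict the sample mean.
    best = (float("inf"), float("inf"), mean, mean)
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [b for a, b in zip(x, y) if a <= t]
        right = [b for a, b in zip(x, y) if a > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((b - lm) ** 2 for b in left) + sum((b - rm) ** 2 for b in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def fit_forest(x, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict(forest, v):
    """Average the individual tree predictions."""
    votes = [lm if v <= t else rm for t, lm, rm in forest]
    return sum(votes) / len(votes)

# Hypothetical floor areas and costs forming a small and a large project group.
area = [50, 60, 70, 80, 150, 160, 170, 180]
cost = [20.0, 22.0, 21.0, 23.0, 60.0, 62.0, 61.0, 63.0]

forest = fit_forest(area, cost)
small_project = predict(forest, 55)
large_project = predict(forest, 175)
```

Averaging over trees trained on different resamples reduces the variance of any single tree, which is the source of the robustness described above.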

K-Nearest Neighbors (KNN): The K-Nearest Neighbors (KNN) algorithm is a non-parametric approach commonly employed in Machine Learning for the purposes of classification and regression. The predictive model utilizes the identification of the k nearest data points in the feature space to estimate the target variable. The present study employed the K-Nearest Neighbors (KNN) algorithm to make predictions regarding project costs or profitability using a set of predetermined features. The performance of the KNN models was assessed using evaluation metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared.
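For regression, the KNN estimate is simply the average target of the k closest training points. The sketch below uses Euclidean distance and k = 3; both choices, along with the sample points, are assumptions for illustration, since the source does not report the distance measure or the value of k.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to the query."""
    by_distance = sorted(
        (math.dist(row, query), target) for row, target in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical scaled features (e.g. size, depth) and project costs.
train_x = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (8.0, 9.0), (9.0, 8.0)]
train_y = [10.0, 12.0, 14.0, 40.0, 42.0]

estimate = knn_predict(train_x, train_y, query=(2.0, 1.5), k=3)
```

Because the prediction depends entirely on distances, KNN is sensitive to feature scales, which is one reason the scaling step in the pre-processing stage matters for this model.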

Convolutional Neural Network (CNN) regression: Convolutional Neural Networks (CNNs) are a prevalent class of deep learning models that are extensively employed for the analysis of image and sequence data. The present study utilized a Convolutional Neural Network (CNN) regression model to make predictions regarding project costs or profitability, relying on the chosen features. The architecture of the Convolutional Neural Network (CNN) comprises convolutional layers, pooling layers, and fully connected layers. These components collectively facilitate the network's ability to acquire intricate patterns and establish relationships within the provided data. The performance of the CNN regression model was assessed using evaluation metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (Golabchi & Hammad, 2023).
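The layer sequence described above (convolution, pooling, fully connected output) can be traced with a forward pass on a single input vector. The sketch uses fixed, hand-picked weights purely to show how data flows through the layers; it involves no training, and the kernel, weights, and input values are hypothetical. The source does not describe how the tabular features were arranged for convolution, so the ordering here is also an assumption.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation, the sliding-window operation used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dense(values, weights, bias):
    """Fully connected output unit producing a single regression value."""
    return sum(v * w for v, w in zip(values, weights)) + bias

# Hypothetical scaled feature vector for one project.
features = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
kernel = [-1.0, 0.0, 1.0]  # responds to increases across neighboring features

feature_map = conv1d(features, kernel)
hidden = max_pool(relu(feature_map))
prediction = dense(hidden, weights=[0.5, 0.25], bias=1.0)
```

In a trained network the kernel, weights, and bias are learned by minimizing a regression loss such as the MSE, typically with a deep learning framework rather than hand-written loops.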

Particle Swarm Optimization

The social behavior of flocking birds and schooling fish inspired the metaheuristic optimization technique known as Particle Swarm Optimization (PSO). This study demonstrates how PSO can optimize costs, which in turn improves predictive modeling for construction projects. In PSO, particles represent candidate solutions, and the algorithm iteratively updates this set of particles based on their individual and collective knowledge of the problem domain. The following paragraphs describe two practical applications of PSO (Kaveh & Khalegi, 1998).

Cost savings: In construction project management, Particle Swarm Optimization is used to reduce costs. An objective function is defined that represents the cost to be minimized or the profit to be maximized, and PSO iteratively adjusts the positions of the particles in the solution space to identify the combination of variables that best achieves this objective (Kaveh et al., 2008). The particles progressively adjust their positions as they acquire additional knowledge about the solution space and the best solution found by the swarm. The iterative process continues until it identifies a solution that is optimal or close to optimal.
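The iterative procedure described above can be sketched as a short routine that minimizes an objective function. This is a generic inertia-weight PSO written for illustration, not the study's implementation. The parameter values (w = 0.7, c1 = c2 = 1.5, 20 particles, 200 iterations) are common defaults assumed here, and `project_cost` is a hypothetical cost surface with a known minimum so that the result can be checked.

```python
import random

def pso(objective, bounds, n_particles=20, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over the box given by bounds; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best of the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def project_cost(x):
    """Hypothetical cost: cheapest at a crew size of 12 and 30 units of concrete."""
    crew, concrete = x
    return (crew - 12.0) ** 2 + 2.0 * (concrete - 30.0) ** 2 + 500.0

best_plan, best_cost = pso(project_cost, bounds=[(1.0, 40.0), (5.0, 80.0)])
```

For profit maximization the same routine applies to the negated profit. Real cost functions are rarely this smooth, which is where a population-based search has an advantage over gradient methods.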

Predictive modeling: Forecasting models are mathematical tools that use statistical techniques to predict future outcomes from historical data and relevant variables. Particle Swarm Optimization (PSO) is commonly employed in construction project management to improve the accuracy and efficacy of such models (Kaveh, 2014). Their performance can be improved by integrating PSO with Machine Learning techniques to optimize model parameters and hyperparameters. PSO searches the range of permissible parameter values and identifies the setting that maximizes performance indicators such as R-squared or minimizes prediction errors. This optimization improves the predictive capabilities of Machine Learning models by helping them capture the underlying patterns and correlations in the data (Kaveh & Servati, 2001; Kaveh et al., 2023).
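Hyperparameter tuning with PSO amounts to treating the validation error as the objective function. The sketch below tunes a single hyperparameter, the penalty of a one-feature ridge regression, chosen only because its fit has a closed form. The model, the data, the search range, and the PSO settings are all assumptions for illustration; the source does not specify which hyperparameters were tuned.

```python
import random

def ridge_slope(x, y, lam):
    """Closed-form slope of a no-intercept ridge fit with penalty lam."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def validation_mse(lam, train, valid):
    """Fit on the training split, then measure squared error on the validation split."""
    slope = ridge_slope(train[0], train[1], lam)
    xs, ys = valid
    return sum((t - slope * v) ** 2 for v, t in zip(xs, ys)) / len(xs)

def pso_1d(objective, lo, hi, n_particles=15, n_iters=200,
           w=0.7, c1=1.5, c2=1.5, seed=0):
    """One-dimensional PSO minimizing objective on the interval [lo, hi]."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_val = pos[:], [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val

# Hypothetical splits: noisy training data overstate the true slope of 2.
train = ([1.0, 2.0, 3.0], [2.5, 4.5, 6.8])
valid = ([4.0, 5.0], [8.0, 10.0])

best_lam, best_err = pso_1d(lambda lam: validation_mse(lam, train, valid), 0.0, 10.0)
```

With these numbers the validation error is zero when the penalty equals 1.95, and the swarm converges to that value. For models with several hyperparameters, each particle simply carries one coordinate per hyperparameter.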

In construction project management, using PSO together with Machine Learning methods supports better cost optimization strategies and more precise predictive models. PSO can improve the accuracy of cost estimation, improve resource allocation, and support decision-making through its ability to explore the solution space effectively and optimize the parameters of the models. In this way, PSO contributes to efficient cost optimization, improved project planning, and better overall project performance in the construction industry.

Evaluation metrics

This study assesses the efficacy of the developed predictive models and cost optimization strategies through the utilization of various indicators. These metrics provide quantitative methods for evaluating the accuracy, reliability, and efficiency of different strategies. In this study, we consider the following criteria for evaluation:

Mean Squared Error (MSE): The MSE is the average of the squared differences between predicted and actual values. It quantifies the disparity between predictions and actual outcomes and so indicates the model's accuracy. A lower MSE corresponds to more accurate predictions (Bukunova & Bukunov, 2020).

Root Mean Squared Error (RMSE): The RMSE is the square root of the MSE and quantifies the typical magnitude of forecasting errors. Because it is expressed in the same unit as the dependent variable, it is well suited to model comparison. The RMSE has been observed to decrease as the sample size grows (Gulghane et al., 2023).

Mean Absolute Error (MAE): The MAE is the average absolute deviation between predicted and actual values, that is, the average size of the prediction errors irrespective of their direction. A lower MAE is more desirable (Wang et al., 2023).

Coefficient of determination (R-squared): R-squared quantifies the proportion of the variance in the dependent variable that the model explains using the independent variables. It typically lies between 0 and 1, with higher values indicating a better fit between the model and the data. Al-Smadi and Al-Bdour (2023) suggest that R-squared can be used to assess how well the model captures the inherent patterns and fluctuations within the data.
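The four metrics follow directly from their definitions, as the sketch below shows. The actual and predicted cost values are hypothetical and serve only to make the arithmetic easy to check by hand.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),                 # same unit as the target
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,            # share of variance explained
    }

# Hypothetical actual and predicted project costs.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [102.0, 118.0, 143.0, 157.0]

metrics = regression_metrics(actual, predicted)
```

Two relationships are useful as sanity checks when reading reported results: the RMSE is always the square root of the MSE, and the MAE can never exceed the RMSE computed on the same predictions.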

The objective of this study is to assess the effectiveness of the developed predictive models and cost optimization strategies using these benchmarks. These metrics aid scholars and professionals in project planning, resource allocation optimization, and cost reduction through the assessment of the precision, dependability, and effectiveness of proposed methodologies. The results obtained from these measurements offer valuable insights that can be utilized to improve predictive modeling and cost optimization strategies within the building industry.

Results and analysis

In this work, we describe the findings and analyses of a research project that used Particle Swarm Optimization and Machine Learning methods to improve cost optimization in construction projects through predictive modeling. Through careful testing and analysis, predictive models' performance in estimating construction project costs can be determined. To evaluate and compare the performance of several Machine Learning algorithms, this study makes use of evaluation metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared. We also examine the significance of characteristics in cost estimation and improvement, illuminating the key elements that influence the final financial assessment of a construction project. Additionally, we analyze the generated optimal cost values and investigate the possibility of Particle Swarm Optimization (PSO) as a tool for cost optimization. Construction project managers and stakeholders may greatly benefit from the research and information offered in this context because it will help them make better decisions and cut costs.

Evaluation of predictive models

To facilitate the development of project management, a comprehensive evaluation is conducted to determine the effectiveness of various Machine Learning algorithms in the domains of predictive modeling and cost optimization. Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared are among the evaluation metrics employed for comparing different methodologies.

As shown in Fig. 3, Linear Regression, a widely used algorithm for estimating construction project costs, achieved a Mean Squared Error (MSE) of 0.25634, a Root Mean Squared Error (RMSE) of 0.3632, and a Mean Absolute Error (MAE) of 0.8114. The R-squared value of 0.9696 indicates that the Linear Regression model explains a large share of the variation in building project costs.

Fig. 3 Evaluation metrics comparison

Decision Trees, another commonly employed method, also performed favorably, with an MSE of 0.57812, an RMSE of 0.4781, and an MAE of 0.7467. The R-squared value of 0.9936 indicates that the Decision Tree model accounts for most of the variability in building project costs.

Support Vector Machines (SVM) achieved an MSE of 0.32145, an RMSE of 0.3894, and an MAE of 0.2803. The R-squared value of 0.9242 shows that the SVM model captures much of the variability in building project costs.

Gradient Boosting, an ensemble learning method, yielded an MSE of 0.87459, an RMSE of 0.2455, and an MAE of 0.3277. Its R-squared value of 0.9522 suggests a strong ability to account for the variability observed in construction project costs.

Random Forest, an alternative ensemble learning technique, obtained an MSE of 0.63578, an RMSE of 0.4878, and an MAE of 0.3759. The R-squared value of 0.9616 indicates that the Random Forest model explains a significant portion of the observed variability in building project costs.

The K-Nearest Neighbors (KNN) algorithm obtained an MSE of 0.42516, an RMSE of 0.2681, and an MAE of 0.4798, where the MAE represents the average magnitude of the difference between predicted and actual costs. The R-squared value of 0.9482 shows that the KNN model accounts for much of the variability in building costs.

The cost estimation using Convolutional Neural Network (CNN) regression exhibited a moderate level of error, as indicated by a Mean Squared Error (MSE) of 0.91345. The model has a Root Mean Squared Error (RMSE) of 0.4213, and its Mean Absolute Error (MAE), the average absolute difference between the predicted and actual costs, is 0.1895. The CNN regression model demonstrates a substantial ability to explain the variation in construction project costs, as evidenced by the notable R-squared value (0.91528384200458). Ensemble techniques, which combine multiple models, performed notably well in cost estimation, with an MSE of 0.18579, an RMSE of 0.3179, and an MAE of 0.6234. The ensemble model effectively accounts for the variability in building project costs, as evidenced by the high R-squared value of 0.99893312722214704.
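As a rough illustration of how an ensemble can combine multiple models, the sketch below averages the predictions of several regressors, which is the idea behind an unweighted voting regressor. It is a minimal sketch under stated assumptions, not the study's implementation: the three base models are hypothetical hand-written stand-ins for fitted estimators, and `voting_predict` is an illustrative name.

```python
def voting_predict(models, features):
    """Average the predictions of several regressors (unweighted voting)."""
    if not models:
        raise ValueError("at least one model is required")
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical stand-ins for fitted base models (assumptions for illustration).
def linear_model(x):
    return 2.0 * x[0] + 1.0


def tree_like_model(x):
    return 5.0 if x[0] > 1.5 else 3.0


def neighbor_like_model(x):
    return 4.0


# One hypothetical project described by a single feature value.
estimate = voting_predict(
    [linear_model, tree_like_model, neighbor_like_model], [2.0]
)
print(estimate)
```

Averaging tends to cancel out errors that the base models do not share, which is one common explanation for why an ensemble can outperform its individual members.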

When employed for cost estimation in the field of construction project management, these evaluation measures provide insights into the effectiveness of different Machine Learning techniques. The findings can assist decision-makers in selecting the most effective algorithm for predictive modeling and cost optimization, thereby enhancing the planning and management of construction projects.
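The four evaluation measures reported above can be computed directly from paired actual and predicted costs. The following is a minimal sketch in plain Python rather than the study's evaluation code; the function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R-squared for paired cost values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n      # mean of squared errors
    rmse = math.sqrt(mse)                     # square root of the MSE
    mae = sum(abs(e) for e in errors) / n     # mean of absolute errors
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    # R-squared: share of the variance in actual costs explained by the model.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical actual and predicted costs (illustrative values only).
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

On this toy data the sketch gives an MSE of 0.5, an MAE of 0.5, and an R-squared of 0.9. Lower MSE, RMSE, and MAE and a higher R-squared all point to a closer fit between predicted and actual costs.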

Feature importance and selection

This section examines the significance of features in the application of Machine Learning and Particle Swarm Optimization to construction project management. Specifically, we focus on their role in the prediction and optimization of costs and on the impact of individual features on the precision of the predictive models.

Figure 4 illustrates the feature selection result. Analyzing the relative significance of different features can enhance cost estimation and optimization. A thorough understanding of the significance and impact of each feature supports the effective management and planning of costs in building projects. In this study, we employ Machine Learning techniques and Particle Swarm Optimization to evaluate the importance of attributes in the estimation of construction project costs. The feature-weighting schemes of the models are assessed, and the key factors in cost calculations are identified.

Fig. 4. Feature selection result

An examination of the importance of features can provide valuable insights into the factors that significantly influence the costs associated with building projects. To reduce costs and enhance productivity, the most significant features should be prioritized and optimized first. Additionally, we examine the impact of feature selection on the efficiency of the model. To identify the subset of features that yields the best prediction performance, various feature selection techniques are employed, including forward selection, backward elimination, and L1 regularization. This approach reduces the number of dimensions in the feature space, improving the comprehensibility, efficiency, and generalizability of our models.
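Forward selection, one of the techniques named above, can be sketched as a greedy loop that adds one feature at a time as long as the error keeps dropping. The sketch below is illustrative only: it assumes an ordinary least-squares model and uses the training MSE as the score, neither of which is specified in the study, and a held-out or cross-validated score would normally be preferred. All function names and data values are hypothetical.

```python
def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def subset_mse(X, y, columns):
    """Training MSE of a linear model with intercept on the chosen columns."""
    rows = [[1.0] + [x[c] for c in columns] for x in X]
    w = fit_least_squares(rows, y)
    preds = [sum(wi * v for wi, v in zip(w, row)) for row in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def forward_selection(X, y, max_features, min_gain=1e-6):
    """Greedily add the feature that lowers the MSE most; stop when gains vanish."""
    selected, remaining = [], list(range(len(X[0])))
    best_mse = subset_mse(X, y, selected)  # intercept-only baseline
    while remaining and len(selected) < max_features:
        scores = {c: subset_mse(X, y, selected + [c]) for c in remaining}
        best_col = min(scores, key=scores.get)
        if best_mse - scores[best_col] < min_gain:
            break
        selected.append(best_col)
        remaining.remove(best_col)
        best_mse = scores[best_col]
    return selected, best_mse


# Hypothetical data: the cost depends on features 0 and 2 only.
X = [[1, 1, 2], [2, 0, 1], [3, 1, 4], [4, 0, 3],
     [5, 1, 1], [6, 0, 5], [7, 1, 2], [8, 0, 4]]
y = [3.0 * x[0] + 0.5 * x[2] for x in X]
selected, final_mse = forward_selection(X, y, max_features=3)
print(selected, final_mse)
```

On this toy data the loop keeps features 0 and 2 and stops before adding the irrelevant feature 1, which is the kind of dimensionality reduction described above.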

The advice derived from the findings of feature importance analysis and feature selection can offer significant benefits to managers and stakeholders involved in construction projects. When project teams possess a comprehensive understanding of the principal characteristics and their impact on cost estimation and optimization, they are able to allocate their focus and resources to areas that will yield the most significant outcomes. Enhanced project outcomes and improved financial performance can be attained through the implementation of effective decision-making processes, strategic planning, and cost optimization techniques in the management of construction projects. These benefits can be realized by ensuring the availability of relevant information for informed decision-making.

Particle Swarm Optimization for cost optimization

In this study, we examine the potential application of the Particle Swarm Optimization (PSO) algorithm within the domain of construction project management, with the objective of cost reduction while ensuring the preservation of quality standards. The existing predictive models have been developed for the purpose of generating cost estimates. These estimates are subsequently optimized using Particle Swarm Optimization (PSO), a metaheuristic optimization technique inspired by swarm behavior. The objective is to decrease expenditures while ensuring that the project's distinct requirements and objectives are met without compromising quality.

The PSO algorithm employs an iterative process that emulates the behavior of a swarm to identify the optimal configuration for each variable. During each iteration, the algorithm updates each particle's position in the search space using the best position found so far by that particle and the best position found by the entire swarm. In this way, PSO searches for a solution that minimizes costs while keeping performance within acceptable limits. The cost estimates generated by the predictive models are compared with the optimal values derived through PSO. The efficiency of the PSO algorithm in achieving cost optimization is assessed by computing the absolute difference between the optimized cost values and the original cost values. To further quantify the accuracy of the optimized cost estimates, the Root Mean Squared Error (RMSE) is employed as an evaluation metric, as illustrated in Fig. 5.

Fig. 5. Particle Swarm Optimization
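The iterative update described above, in which each particle moves toward its own best position and the swarm's best position, can be sketched as follows. This is a minimal, generic PSO in plain Python and not the study's configuration: the inertia and acceleration coefficients, the swarm size, and the quadratic `toy_cost` function (a hypothetical stand-in for a trained cost model) are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cost(x):
    """Hypothetical cost surface with its minimum of 5.0 at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2 + 5.0


best_position, best_cost = pso_minimize(toy_cost, [(-10.0, 10.0), (-10.0, 10.0)])
print(best_position, best_cost)
```

In a setting like the one described here, `toy_cost` would be replaced by a function that evaluates a candidate configuration of project variables, for example through a trained predictive model plus any penalty terms for quality constraints.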

Compared with the baseline estimates, a large drop in the Root Mean Squared Error (RMSE) shows that the Particle Swarm Optimization (PSO) algorithm optimizes the cost estimates effectively. When compared against the actual expenses incurred in building projects, the cost figures derived from PSO exhibit a higher degree of accuracy. This exemplifies the potential of PSO to identify the combination of variables that minimizes expenses and improves the accuracy of cost estimates. The cost optimization procedure is complex and multidimensional, as evidenced by the feature size of 60. A feature set of this size indicates the integration of a broad array of factors that affect project expenditures, including inputs, wage levels, and geographic factors.

In general, the application of PSO to cost optimization demonstrates the potential of this method to improve the accuracy of cost estimation in construction projects and to optimize these costs. The use of Particle Swarm Optimization (PSO) enables project managers and stakeholders to enhance project performance, financial outcomes, and overall success through informed decision-making regarding resource allocation, budgeting, and cost control.

Interpretation and insights

In this section, we examine and evaluate the correlation between input variables and projected expenses within the framework of overseeing construction projects. Our objective is to identify the key factors that influence construction project costs and provide valuable insights and recommendations for optimizing costs. This will be achieved through an analysis of the performance of predictive models as well as the insights obtained from feature selection and cost optimization techniques.

The initial step involves assessing the efficacy of forecasting models in accurately predicting the anticipated cost of a specific project. The accuracy of each model’s predictions is assessed through various measures, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared. The review provides an analysis of the advantages and disadvantages associated with various Machine Learning techniques employed in the estimation of project costs. Next, we proceed to analyze the significance and selection of features with the aim of identifying the variables that exert the greatest influence on the ultimate cost of a construction project. To ascertain the relative significance of different factors in predicting costs, it is possible to examine the weights or levels of importance attributed to each input variable by the predictive models. Based on the findings of this study, project managers will possess precise knowledge regarding the optimal allocation of their efforts to maximize financial outcomes.

Furthermore, we explore the potential of Particle Swarm Optimization (PSO) for cost reduction and project management in the construction industry. Applying the PSO algorithm optimized the cost estimates generated by the predictive models, enhancing their accuracy and alignment with the actual costs incurred during project implementation. The optimization process reveals the combination of variables that leads to a reduction in project expenses. Insights of this nature can assist project managers in making informed decisions about resource allocation, budgeting, and cost control. These decisions, in turn, can enhance the efficiency and profitability of their projects.

The research conducted and the conclusions drawn in our study contribute to the understanding and analysis of cost management in construction projects. This study emphasizes the significant factors identified during feature selection that affect the overall expenditure of a project, including the cost of raw materials, labor expenses, geographical considerations, and other relevant aspects. Understanding these critical factors significantly enhances project managers' ability to prioritize cost control efforts and apply targeted methods to optimize project costs. In addition, we provide guidance on strategies for reducing the overall costs of construction projects. These strategies derive from analyzing the predictive models, ranking the importance of features, and using the PSO algorithm to find the most effective way to cut costs. They include optimizing resource allocation, engaging in supplier negotiations, mitigating risks, and leveraging technological tools for cost estimation and monitoring. Additionally, we emphasize the importance of monitoring cost optimization strategies and modifying them as needed throughout the project's duration to accommodate unforeseen circumstances and ensure optimal financial outcomes.

The analysis conducted and the subsequent conclusions drawn provide valuable insights into the intricate relationship between various project inputs and the projected costs, thereby making a substantial contribution to the field of construction project management. The identification of key factors influencing project costs and the provision of recommendations for cost optimization techniques have enhanced the capabilities of project managers and stakeholders in optimizing cost management practices, improving project performance, and achieving cost optimization targets.

Discussion

This study investigates the efficacy of Machine Learning techniques and Particle Swarm Optimization (PSO) in augmenting cost optimization in building projects via predictive modeling. The assessment of the performance of predictive models and the examination of the significance and selection of features offer valuable insights for cost optimization in the field of construction project management. Furthermore, this study examines the utilization of Particle Swarm Optimization (PSO) in the context of cost optimization, emphasizing its capacity to decrease expenditures while upholding established quality benchmarks.

The assessment of various Machine Learning algorithms through the utilization of metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared is of utmost importance in evaluating their efficacy in the domains of cost estimation and optimization. The findings illustrate the precision and efficacy of each model in elucidating the fluctuations in expenses associated with construction projects (Arabiat et al., 2023).

The Linear Regression algorithm, which is commonly employed in cost estimation, demonstrates a relatively limited margin of error, as evidenced by the low Mean Squared Error (MSE) (Ali et al., 2022). The accuracy of the model is additionally validated through the utilization of RMSE and MAE metrics, which effectively demonstrates the minimal disparities between the projected and actual expenditures (Zou et al., 2022). The considerable explanatory power of the Linear Regression model in accounting for the variability in costs is indicated by the high R-squared value.

Decision Trees are often used, and the minimal Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values show that they are very good at estimating costs. The high R-squared value indicates that the model effectively incorporates cost variability. Support Vector Machines (SVM) demonstrate a high level of accuracy in estimating costs, as evidenced by the low Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values. The obtained R-squared value provides evidence supporting the efficacy of the model in accurately representing the range of fluctuations observed in building project costs (Hosny et al., 2023).

Ensemble learning techniques, such as Gradient Boosting and Random Forest, have demonstrated their effectiveness in minimizing error rates in cost estimation, as indicated by the low Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values. The models presented in this study demonstrate a significant ability to elucidate a considerable proportion of the observed fluctuations in project costs, as evidenced by their notably high R-squared values. The K-Nearest Neighbors (KNN) algorithm and Convolutional Neural Network (CNN) regression exhibit acceptable and moderate levels of accuracy, respectively, when estimating costs (Ma et al., 2017). The K-Nearest Neighbors (KNN) and Convolutional Neural Network (CNN) models demonstrate substantial R-squared coefficients, suggesting their capacity to capture the variance in construction project expenses.

According to Parsamehr et al. (2022), the assessment of predictive models allows project managers to choose the most efficient algorithm for optimizing costs and making decisions in the field of construction project management. The examination of feature importance offers valuable insights into the factors that have a significant impact on the costs related to construction projects. Understanding the key variables that influence cost estimation and optimization is beneficial for project managers. Feature selection techniques play a crucial role in improving the effectiveness of predictive models by identifying the subset of features that result in the most optimal prediction performance. The process of reducing dimensionality enhances the clarity, effectiveness, and applicability of the models.

The results pertaining to the importance and selection of features provide substantial advantages to managers and stakeholders engaged in construction projects. Gaining comprehension of the notable attributes and their influence on cost estimation and optimization enables project teams to strategically allocate their attention and resources toward areas that will generate the most substantial results. Effective decision-making processes, strategic planning, and cost optimization techniques contribute to the improvement of project outcomes and financial performance.

The utilization of Particle Swarm Optimization (PSO) in the context of cost optimization within construction project management showcases its capacity to ascertain the most favorable amalgamation of variables that result in reduced expenses while simultaneously upholding performance levels within acceptable boundaries. The optimization procedure aims to minimize the discrepancy between projected cost estimates and the actual costs accrued during the execution of a project. Particle Swarm Optimization (PSO) aids project managers in making well-informed decisions pertaining to the allocation of resources, budgeting, and controlling costs. It has been observed that the implementation of this strategy leads to improvements in project performance, financial outcomes, and overall success.

The analysis performed and the subsequent conclusions derived offer significant insights into the complex interplay between different project inputs and anticipated costs, thus making a substantial contribution to the discipline of construction project management. The process of identifying crucial factors that impact project costs and offering suggestions for techniques to optimize costs enhances the abilities of project managers and stakeholders to improve cost management practices, enhance project performance, and successfully achieve cost optimization objectives.

Conclusion

PSO and Machine Learning were used in this research to reduce construction project management costs and create predictive models. Our study aims to improve construction project results by improving cost estimate accuracy, identifying significant cost variables, and exploring cost optimization measures. Linear Regression, Decision Trees, Support Vector Machines (SVM), Gradient Boosting, Random Forest, K-Nearest Neighbors (KNN), and CNN Regression were tested for project cost prediction accuracy. MSE, RMSE, MAE, and R-squared were used to assess model accuracy and consistency. Our Voting regression ensemble outperformed individual algorithms in predictive accuracy. The ensemble model had higher R-squared values and lower MSE, RMSE, and MAE, implying better cost prediction.

Feature selection approaches also revealed important factors that affected project costs. Project managers may better allocate resources and save costs by recognizing these factors. The application of Particle Swarm Optimization (PSO) to construction cost reduction is a distinctive feature of this study. PSO was used to improve predictive model cost projections, improving accuracy and alignment with actual project costs. Optimization identified the most economically efficient project variable arrangement, offering significant insights for future cost reduction measures. In conclusion, predictive modeling using Particle Swarm Optimization and other Machine Learning methods may reduce construction project costs. Project managers may improve cost predictions, identify cost drivers, and optimize costs by incorporating the above strategies. These findings might improve construction project performance, resource allocation, and decision-making. To apply our results to other building projects, further research and validation are needed. Incorporating market dynamics, economic conditions, and additional variables may further improve cost estimates and optimization strategies.