Introduction

Infrastructure development strongly influences the success and advancement of a society, and infrastructure assets are an important benchmark when a country's level of development is measured. This is because the quality and efficiency of infrastructure have a physical impact on a society's development in many different areas (Elmousalami, 2020). Because every construction project is unique, construction work is commonly affected by variability in duration and cost. As this variability is unplanned, it makes it difficult to execute and deliver projects within the estimated budget and at the required time. Each project involves a different combination of factors, which makes the analysis very complex for construction managers. These factors, which include project location, clients, regulations, labor, equipment, technology, subcontractors, experience, stakeholders, and even the project team, are prone to change, at least partially, from one project to another (Chudley & Greeno, 2016). Effective cost estimation is therefore vital, because it can secure the financial fate of a project, and effective models can greatly improve the computation of cost and the forecasting of construction duration in the industry. Completing a project on schedule and within budget is not an easy task: despite advances in project management, most projects today still face cost and time overruns, which grow with the complexity of the project. Numerous factors contribute to these delays, namely contractor-related delays, client-related delays, consultant-related delays, labor-related delays, and various external delays. In the long run, these delays cause time overruns, cost overruns, disputes, arbitration, total abandonment, and litigation (Assaf & Al-Hejji, 2006).

Cost–duration models have proved to be a useful tool. They are financial representations, in the form of a spreadsheet, mathematical expressions, charts, and/or diagrams, of the total cost of the constituents or fractions of a complex product, set-up, system, or facility (Samuel et al., 2015). Duration and cost predictions are most valuable in the initial stages of a construction project. Cost–duration models can be used to determine the timing, duration, and cost of proposed construction projects, and they can be applied at the conceptual, feasibility, budget authorization, control, or bidding/tendering stage. Mahamid and Amund (2010) found that all (100%) of the construction projects they studied suffered from cost deviation; 76.33% were underestimated and 23.67% were overestimated. Among the projects studied, it was noticed that 62% were overestimated and the rest were underestimated. Accurate cost estimation is therefore vital to the financial fate of a project, and effective models can greatly improve both cost estimation and the prediction of project duration. Ballesteros-Perez et al. (2018) showed that the most commonly used scheduling techniques (the Gantt chart, the Critical Path Method, and the Program Evaluation and Review Technique, PERT) consistently underestimate the actual project duration and cost. A notable cause of this underestimation is that these techniques ignore activity-duration variability.

In addition to the classical scheduling models, more modern techniques for improving project duration and/or cost estimates have been proposed (e.g., fuzzy logic, neural network analysis, Monte Carlo simulation, artificial-intelligence methods, many variants of PERT, and further extensions of Earned Value Management) (Ballesteros-Perez, 2017). What all these methods, classical and advanced alike, have in common is that they require prior estimates of the potential activity durations and costs. For example, PERT-related techniques generally rely on three-point estimates (optimistic, most likely, and pessimistic durations and costs). Similarly, reliable data on the interrelationship between activity durations and costs are rarely available, which forces these methods either to assume independence between activity durations and costs or to rely on correlation values from the literature that may be biased (Banerjee & Paul, 2008; Cho, 2009). Consequently, when sufficient quantitative or qualitative information is not available, forecasts of the actual project duration and/or cost can be expected to be unreliable. An accurate estimate of construction duration from an early stage is therefore essential (Jin et al., 2016; Kaka & Price, 1991; Love et al., 2005). Inaccurate estimation, whether under or over, leads to poor project delivery and failure to meet project objectives (Khosrowshahi & Kaka, 1996; Lin et al., 2011). Predicting the actual construction duration from the features of the project and its facilities would therefore be valuable to project stakeholders, including the end-users of the facility, while providing the contractor with a realistic construction duration for completing the work.
This can help prevent many problems, including compromised construction quality due to fast-paced construction work, disputes arising from additional costs, budget increases, etc. (Owolabi et al., 2014; Thomas et al., 2001).
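As an illustration of the three-point estimates that PERT-related techniques require, the sketch below computes the classical PERT (beta-approximation) mean and standard deviation of a single activity. The activity values are hypothetical, and the formulas are the textbook PERT approximations rather than a method specific to this study. Note how a right-skewed estimate pulls the expected duration above the most likely value.

```python
def pert_estimate(optimistic: float, most_likely: float, pessimistic: float):
    """Classical PERT approximation: mean = (a + 4m + b) / 6, sd = (b - a) / 6."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    mean = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return mean, std_dev


# Hypothetical activity: optimistic 10 days, most likely 12 days, pessimistic 20 days
mean, sd = pert_estimate(10, 12, 20)
print(f"expected duration = {mean:.2f} days, sd = {sd:.2f} days")
# expected duration = 13.00 days, sd = 1.67 days
```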

This research aims to develop an intelligent model, based on an Artificial Neural Network (ANN), that reliably forecasts the cost and duration of various construction projects. The objectives of this work are also to investigate the deviation between planned and actual cost, and between planned and actual duration, of construction projects, using building information as the predictor variables (Alaneme & Mbadike, 2019; Alaneme et al., 2021a, 2021b). The aim is to optimize the cost and time constraints of construction works so as to enable efficient planning before execution (Yilmaz & Yuksek, 2009). This analytical approach is intended to bridge the gap between under-estimation and over-estimation in the construction industry and to support sound construction practice and proper contract completion. An ANN is an information processing paradigm modeled on the way the brains of living organisms process information. It consists of a large number of highly interconnected processing elements (neurons) working in unison to solve a particular problem (Feng & Li, 2013). It is a computational method built from many processing units linked by interconnections. The network contains a variable number of nodes (also called cells, neurons, or units) that link the input set to the output. It is a computing system that imitates the way the human brain analyzes and processes data (Kaveh et al., 2008; Lu, 2002). The system comprises a large number of highly interconnected processing elements, the neurons, which collaborate to find a solution for a given task and pass information to one another through weighted connections (synapses). The neurons are densely interconnected and organized into layers. The input layer receives the data, while the output layer produces the final result. One or more hidden layers are typically sandwiched between the two.
This arrangement makes it difficult to predict or know the exact flow of data. Each connection has a connection weight, and every neuron has a threshold value and an activation function. Depending on the sign of its weight, each input has either an excitatory (positive) or an inhibitory (negative) effect on the neuron (Onyelowe et al., 2021a, 2021b, 2021c; Pakbaz & Mehdizadeh, 2015). Work organization seeks to optimize the interaction between equipment, resources, employees, and information, which are the necessary tools of the production and construction process, in order to reduce the cost of achieving an efficient work process while maintaining the needed balance in employee performance, skill set, and motivation. This influences the outcome of the production process by regulating the cost incurred and the duration required to complete the constituent tasks successfully. Work organization is the way tasks and responsibilities are distributed among individuals in an organization, and also the way in which these specific arrangements are coordinated to achieve the desired end products or service delivery. It describes the relationship between workforce engagement, leadership and management, organizational structure, and the impact of technology, and how their application influences the working environment (Flintsch & Chen, 2004).

Literature review

The need for an accurate prediction of construction cost and duration from an early stage is apparent, because inaccurate prediction of project cost and duration, whether under or over, results in cost overruns and poor project performance, with failure to achieve the project's objectives in terms of quality and timely delivery. Because construction projects follow no specific pattern, owing to environmental and logistics factors and to non-linear and discrete dependencies, soft computing techniques are considered appropriate for modeling time–cost constraints (Mirahadi & Zayed, 2016; Wang et al., 2017). Similarly, representative data on the relationship between activity durations and costs are scarce, which leads these analytical techniques to assume independence between costs and activities. Consequently, when sufficient quantitative or qualitative information is not available, forecasts of the actual project duration and/or cost can be expected to be unreliable. The development of an intelligent model for the accurate prediction of these factors will, however, enable project managers and clients to plan the execution of their projects effectively, using the results as a guide for the control and supervision of the work so as to achieve the desired quality at the prescribed time (Remon et al., 2014). The time and cost required for an entire project are important because they largely determine whether the success criteria can be achieved. Proper schedule planning would also ensure that quality specifications are met in line with the standards prescribed by the relevant codes of practice.
Estimation has long been one of the weakest links in the construction planning process because of its poor accuracy (Teicholz, 1994). Most contractors can recall one or two skilled craftspeople who ultimately failed in business, and many of these failures were the result of poor estimating practice. Because estimating problems are so common, a systematic approach is clearly needed (Hwang et al., 2002).

Construction simulation uses computer-based models of construction operations to study their behavior and to support more precise decision-making. As construction projects become longer and more complex, they become harder to manage with conventional methods, and computer simulation techniques can help to tackle such problems. Construction simulation tools have diversified, and the range of their applications has expanded accordingly. Various scenarios can therefore be tested to overcome real-life construction project problems (Yu et al., 2010; Kaveh & Rahimi, 2004). Typically, the main purpose of simulation is to reduce costs and project time and to compare different operational schemes across project types. Construction simulation models involve two primary components, activities and resources (AbouRizk, 2010): the activities involved in a project, and the resources required to accomplish them. Construction activities carry a degree of uncertainty because of the stochastic nature of their processes and the many parameters that affect productivity and performance. Labor skills, weather conditions, and equipment breakdowns are examples of the uncertainties involved during the span of a construction activity (AbouRizk & Halpin, 1992).

Modeling all possible factors of influence in construction projects is a daunting task. Even if every facet of an operation is modeled, it would be nearly impossible to include all the site conditions and factors that can arise while executing such a project. When developing a simulation model for construction processes, the model may be based either on observations of historical data or on experts' judgment and insight into the processes (Kim et al., 2004; Lowe et al., 2006). The use of historical data is advisable when the user does not expect any significant change in the underlying assumptions of the process, whereas experts' judgment is an appropriate choice for conceptualizing inputs that are expected to vary over time because of unexpected changes in the underlying factors. Simulation is "the imitation of the operation of a real-world process or system over time". Computer simulation tools are used to build models that represent the different project activities, the resources used in executing the work, and the environment around the project site. Such models can be employed to develop better project plans, optimize resource usage, minimize project cost and duration, and improve overall performance and productivity (Banks et al., 2000; Onyelowe et al., 2021a).
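To illustrate how simulation exposes the underestimation caused by ignoring activity-duration variability, the following minimal Monte Carlo sketch samples activity durations from triangular distributions. The three-activity network (A and B in parallel, followed by C) and all durations are hypothetical assumptions, not data from this study. Because parallel paths merge and the distributions are right-skewed, the simulated mean exceeds the deterministic most-likely estimate.

```python
import random
import statistics

# Hypothetical network: activities A and B run in parallel, then C follows.
# Each entry is (optimistic, most_likely, pessimistic) duration in days.
ACTIVITIES = {
    "A": (8.0, 10.0, 16.0),
    "B": (7.0, 10.0, 15.0),
    "C": (4.0, 5.0, 9.0),
}


def deterministic_duration() -> float:
    """Critical-path duration using most-likely values only (no variability)."""
    a, b, c = (ACTIVITIES[k][1] for k in "ABC")
    return max(a, b) + c


def simulate_duration(rng: random.Random) -> float:
    """One Monte Carlo run: each activity sampled from a triangular distribution."""
    d = {k: rng.triangular(lo, hi, mode) for k, (lo, mode, hi) in ACTIVITIES.items()}
    return max(d["A"], d["B"]) + d["C"]


def monte_carlo(runs: int = 20_000, seed: int = 1) -> list:
    rng = random.Random(seed)
    return [simulate_duration(rng) for _ in range(runs)]


durations = monte_carlo()
det = deterministic_duration()
on_time = sum(d <= det for d in durations) / len(durations)
print(f"deterministic (most-likely) estimate: {det:.1f} days")
print(f"simulated mean duration:              {statistics.mean(durations):.1f} days")
print(f"P(finish within deterministic estimate): {on_time:.2f}")
```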

Artificial neural network

Unlike conventional rule-based artificial-intelligence techniques, a neural network extracts knowledge from data automatically, without requiring explicit rules. In other words, by a trial-and-error process, the system "learns" to become an "expert" in the field the user gives it to study. An ANN is an information processing paradigm modeled on the way the nervous systems of living organisms process information. It consists of a large number of highly interconnected processing elements (neurons) working in unison to solve a specific problem (Wu et al., 2009; Park & Lee, 2011). Biological neurons (also known as nerve cells), or simply neurons, are the basic units of the brain and nervous system: they receive sensory input from the outside world through their dendrites, process it, and send the output through their axons. The part of the neuron that contains the nucleus and performs the biochemical transformations essential to the neuron's life is called the cell body (soma). Each neuron has fine, hair-like tubular extensions around it called dendrites, which branch out in a tree-like form around the cell body and receive incoming signals. The axon is a long, thin, tubular structure that functions like a transmission line (Alaneme et al., 2022a, 2022b; Sevda & Yusuf, 2020). Neurons are connected to one another in a complex spatial arrangement. The axon is the nerve fiber that carries impulses away from the cell body toward their destination. At the tip of the axon are complex, highly branched structures that end in synapses, and the connections between neurons occur at these synapses. Dendrites receive input through the synapses of other neurons. The soma processes these incoming signals over time and converts the result into an output that is sent to other neurons through the axon and the synapses, as represented in Fig. 1 (Nath et al., 2011; Uwanuakwa et al., 2022).

Fig. 1 Biological neuron

The first artificial neuron model, on which the perceptron used today is based, was proposed in 1943 by McCulloch and Pitts by imitating the way a living neuron functions. A single-layer neural network with a single output is called a perceptron, as represented in Fig. 2. It aggregates the bias and the inputs multiplied by their individual weights, and then makes a decision based on the aggregated result. For a single observation, x0, x1, x2, x3, …, xn represent the input parameters to the network. Each input is multiplied by a connection weight, or synapse (Alaneme et al., 2021b; Flood & Kartam, 1994). The weights are represented as w0, w1, w2, w3, …, wn; a weight expresses the strength of a particular connection. b is a bias value. The bias acts as an intercept: it shifts the activation function along the input axis so that the neuron's decision boundary need not pass through the origin. In the simplest case, the products are summed and fed to a transfer function (activation function) to produce a result, which is forwarded as the output. The summation function is presented mathematically in Eq. 1 (Kaveh & Servati, 2001; Rezaei et al., 2009)

$$ x_{0} \times w_{0} + x_{1} \times w_{1} + x_{2} \times w_{2} + \cdots + x_{n} \times w_{n} = \sum_{i = 0}^{n} x_{i} \times w_{i} . $$
(1)
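The perceptron computation described above (the weighted sum of Eq. 1, plus the bias, passed through an activation function) can be sketched in a few lines. The input values, weights, and bias below are hypothetical, and the step and tanh activations are two common choices used here for illustration. The two step-activation calls show how changing only the bias shifts the neuron's decision.

```python
import math


def perceptron_output(inputs, weights, bias, activation=math.tanh):
    """Eq. 1 weighted sum of the inputs, plus the bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))  # sum of x_i * w_i
    return activation(weighted_sum + bias)


def step(z):
    """Threshold (Heaviside) activation, as in the classical perceptron."""
    return 1.0 if z >= 0 else 0.0


x = [0.5, -1.0, 2.0]   # hypothetical inputs
w = [0.4, 0.3, -0.1]   # hypothetical weights (weighted sum is about -0.3)

print(perceptron_output(x, w, bias=0.2, activation=step))  # 0.0
print(perceptron_output(x, w, bias=0.5, activation=step))  # 1.0
print(perceptron_output(x, w, bias=0.2))                   # tanh(-0.1), about -0.0997
```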
Fig. 2 Perceptron

The activation function is essential for an ANN to learn and make sense of really complicated relationships. Its main purpose is to transform the input signal of a node into an output signal, which is then used as input to the next layer in the stack. The activation function is applied to the weighted sum of the inputs plus the bias, and it determines whether, and how strongly, the neuron is activated. Its purpose is to introduce non-linearity into the output of a neuron. Without an activation function, the output signal would simply be a linear function (a polynomial of degree one). A linear function is easy to solve, but it is restricted in complexity and has less representational power; without an activation function, the model cannot learn and model complicated data (Alaneme et al., 2020a, 2020b).
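The claim that a network without a non-linear activation reduces to a linear function can be checked directly. The sketch below chains two scalar layers with hypothetical weights: with a linear (identity) activation the chain equals a single linear layer with weight w2·w1 and bias w2·b1 + b2, while a tansig activation breaks that equivalence. `tansig` is computed as `math.tanh`, which is mathematically identical to MATLAB's definition 2/(1 + exp(−2z)) − 1 and avoids overflow for large negative inputs.

```python
import math


def tansig(z: float) -> float:
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2z)) - 1, which equals tanh(z)."""
    return math.tanh(z)


def purelin(z: float) -> float:
    """Linear (identity) activation."""
    return z


def two_layer(x, w1, b1, w2, b2, act):
    """Scalar two-layer chain: y = w2 * act(w1 * x + b1) + b2."""
    return w2 * act(w1 * x + b1) + b2


w1, b1, w2, b2 = 1.5, 0.5, 2.0, -1.0  # hypothetical weights and biases
for x in (-2.0, 0.0, 2.0):
    linear = two_layer(x, w1, b1, w2, b2, purelin)
    collapsed = (w2 * w1) * x + (w2 * b1 + b2)  # the equivalent single linear layer
    nonlinear = two_layer(x, w1, b1, w2, b2, tansig)
    print(f"x={x:+.1f}  linear={linear:+.3f}  collapsed={collapsed:+.3f}  tansig={nonlinear:+.3f}")
```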

Materials and methods

Study area

The study area for this research is Calabar Municipal, a Local Government Area (LGA) of Cross River State, Nigeria, with its headquarters in the city of Calabar, as shown in Fig. 3. Calabar Municipality lies between latitudes 4° 15′ N and 5° N and at longitude 8° 25′ E. It is bounded to the north and north-east by Odukpani Local Government Area and the Great Kwa River, and its southern shores are bounded by the Calabar River and Calabar South Local Government Area. It has an area of 331.551 square kilometers. The population of Calabar Municipal LGA is estimated at 187,432 inhabitants, the majority of whom are members of the Efik and Qua ethnic groups. Christianity is the most widely practiced religion in the LGA, and Efik, Qua, and English are commonly spoken in the area. Popular festivals held in Calabar Municipal LGA include the widely acclaimed annual Calabar Carnival.

Fig. 3 Study area map

Methodology

The logical sequence of research steps and scientific methods used, which forms the methodological framework of this study, is presented in Fig. 4. The research begins with a clear definition of the study objectives, followed by an in-depth review of the relevant literature on duration and cost evaluation of civil construction works. The next step is conceptualizing the research design approach, which is necessary for this investigation and helps in arriving at valid findings, comparisons, and conclusions. This is followed by the construction of the research instrument and the selection of the sample, the first practical step in carrying out the study; in this study, observation forms and a questionnaire were adopted (Fellows & Liu, 2008). The questionnaire was designed to assess the problems under study, and the survey results were taken as the datasets to be analyzed with the ANN model. The prediction performance of the resulting intelligent model was then evaluated using statistical methods and compared with multiple linear regression (MLR). The computed results are discussed and interpreted to draw inferences for practical application and to relate the findings to the existing body of knowledge. Finally, conclusions and recommendations are drawn from the findings (Wang et al., 2017).
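The MLR benchmark mentioned above can be sketched as an ordinary least-squares fit with NumPy. The data are synthetic placeholders standing in for the six survey predictors, and the helper names `fit_mlr` and `predict_mlr` are hypothetical; this is not the authors' implementation. `np.linalg.lstsq` also accepts a two-column target, so duration and cost could be fitted in one call.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares multiple linear regression.

    Returns the coefficient vector [intercept, b1, ..., bk].
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


# Synthetic data: six predictors with known coefficients plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 6))
true_coef = np.array([2.0, 1.0, -0.5, 0.0, 3.0, 0.5, 1.5])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, 0.01, size=60)

coef = fit_mlr(X, y)
print(np.round(coef, 2))
```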

Fig. 4 Research methodology flowchart

Data collection

Purposive and random sampling techniques were adopted for the study in the study area. Questionnaires, which served as the primary source of data, were administered to managers, builders, clients, and site engineers of building construction firms registered with the Capital City Development Authority within Cross River State. Some questionnaires were sent to respondents by e-mail, some respondents were contacted in person, and others were contacted through LinkedIn. The respondents were asked to provide details on how building information, such as type of foundation, intended use of the structure, number of floors, building area, and class of clients and contractors, affects building project construction cost and duration [40]. This study is intended to support productive decision-making during project planning, especially once the architectural and structural details of the proposed building are available, by providing valuable estimates of the expected cost and duration. Two project datasets were compiled in this study. The first is analyzed at both activity and project levels, while the second contains project-level information (planned and actual project durations and costs) and is used for illustrative purposes in the discussion. A large number of activities is necessary to obtain representative values of activity durations and costs. The variety and number of project types, costs, durations, topologies, and activities are deemed sufficiently representative for the analysis (Phillips & Stawarski, 2008). The activity duration and cost deviations are calculated for each activity i in the first dataset according to the following two expressions, presented in Eqs. 2 and 3:

$$ {\text{Activity duration deviation of activity }}i = {\text{Log}}_{10} \left( {\frac{{{\text{AAct}}_{i} }}{{{\text{PAct}}_{i} }}} \right), $$
(2)

where \({\text{AAct}}_{i}\) is the actual duration of activity i, and \({\text{PAct}}_{i}\) is the planned duration of activity i.

$$ {\text{Activity cost deviation of activity }}i = {\text{Log}}_{10} \left( {\frac{{{\text{ACost}}_{i} }}{{{\text{PCost}}_{i} }}} \right), $$
(3)

where \({\text{ACost}}_{i}\) is the actual cost of activity i, and \({\text{PCost}}_{i}\) is the planned cost of activity i.

Both expressions use a logarithmic scale. This is important because duration and cost ratios, which are always positive, are not symmetrical with respect to the value 1: they range between 0 and 1 when the denominator is larger than the numerator, but between 1 and +∞ when the numerator is larger than the denominator. This scale distortion creates an artificial positive skewness in the data distribution, which is removed by taking the log ratios beforehand. Additionally, in log scale the variances of the variables are additive rather than multiplicative. Note that ratios from 0 to 1 in natural scale correspond to values from −∞ to 0 in any log scale, whereas ratios from 1 to +∞ in natural scale correspond to the range (0, +∞) (Batselier & Vanhoucke, 2015).
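The symmetry argument can be seen numerically with a direct implementation of Eqs. 2 and 3. The function name `log_deviation` and the example values are hypothetical. An activity that takes twice as long as planned and one that takes half as long give deviations of equal magnitude and opposite sign, whereas the raw ratios (2.0 and 0.5) sit at very different distances from 1.

```python
import math


def log_deviation(actual: float, planned: float) -> float:
    """Eqs. 2-3: log10(actual / planned); positive = overrun, negative = underrun."""
    if actual <= 0 or planned <= 0:
        raise ValueError("durations and costs must be positive")
    return math.log10(actual / planned)


print(round(log_deviation(20, 10), 4))  # 0.301  (ratio 2.0)
print(round(log_deviation(5, 10), 4))   # -0.301 (ratio 0.5)
print(log_deviation(10, 10))            # 0.0    (no deviation)
```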

Hypothesis

A hypothesis is an educated guess about a phenomenon that can be tested by observation or experiment. Hypothesis testing is a statistical approach for examining experimental or survey results to identify meaningful relationships among variables. It essentially tests whether the results obtained are valid by investigating the probability that they occurred by chance. If the results could easily have arisen by chance, the experiment or observation has little or no statistical significance and would not be expected to be repeatable. Hypothesis testing is a procedure that uses sample data to determine whether the null hypothesis (H0) can be rejected. If H0 is rejected, the statistical conclusion is that the alternative hypothesis (HA) is true. The P value, which is the observed level of significance of the test, is used to draw conclusions in hypothesis-testing applications: it expresses how likely results at least as extreme as those observed would be if H0 were true. If the P value is less than 0.05 (a 5% significance level, corresponding to 95% confidence), the null hypothesis is rejected (Ikpa et al., 2021).
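The P-value logic above can be illustrated with a permutation test of H0 (no association) for a Pearson correlation. This is a swapped-in, distribution-free technique chosen because it needs only the Python standard library; the study does not state which test produced its P values, and the floor and duration values below are hypothetical. Shuffling the outcome breaks any real association, so the share of shuffles that match or exceed the observed |r| estimates the P value.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=5000, seed=42):
    """Two-sided permutation test of H0: no linear association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # shuffling y simulates a world where H0 holds
        if abs(pearson_r(x, y_shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction keeps p > 0


floors = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]            # hypothetical
duration = [6, 7, 9, 8, 12, 11, 14, 15, 18, 17]    # hypothetical, in months
p = permutation_p_value(floors, duration)
print(f"p = {p:.4f}; reject H0 at the 0.05 level: {p < 0.05}")
```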

Null hypothesis (H0)

Building information details are not significant to project duration and cost.

Alternate hypothesis (HA)

Building information details are significant to project duration and cost.

Model performance evaluation

The performance of the developed model was evaluated to confirm that it can estimate or predict the response parameters with an acceptable degree of accuracy. A multiple linear regression (MLR) model was developed as a benchmark to validate the performance of the ANN model. Based on the relevant literature, two statistical loss-function criteria were used to assess prediction performance: the mean absolute error (MAE), a measure of the errors between paired observations expressing the same phenomenon, and the root mean square error (RMSE), the square root of the mean of the squared prediction errors. Their formulas are presented in Eqs. 4 and 5 (Alaneme et al., 2022b; Ritter & Muñoz-Carpena, 2013)

$$ {\text{RMSE}} = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {\left( {E_{i} - M_{i} } \right)^{2} } }}{n}} $$
(4)
$$ {\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {E_{i} - M_{i} } \right|} , $$
(5)

where n is the number of data points under investigation, \(E_{i}\) is the experimental (actual) value, and \(M_{i}\) is the value predicted by the model.
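Eqs. 4 and 5 translate directly into code. The sketch below uses hypothetical actual (E) and predicted (M) values. Because RMSE squares the errors before averaging, the single 2-unit error weighs more heavily in RMSE than in MAE, and RMSE is always greater than or equal to MAE.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(actual, predicted):
    """Eq. 4: square root of the mean squared difference between E_i and M_i."""
    _check(actual, predicted)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Eq. 5: mean absolute difference between E_i and M_i."""
    _check(actual, predicted)
    return sum(abs(e - m) for e, m in zip(actual, predicted)) / len(actual)


E = [10.0, 12.0, 15.0, 20.0]  # hypothetical actual values
M = [11.0, 12.0, 13.0, 21.0]  # hypothetical model predictions
print(round(rmse(E, M), 4))   # 1.2247
print(mae(E, M))              # 1.0
```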

Results and discussion, analysis, and model development

Experimental database

The data obtained from the questionnaire and survey exercises were sorted for proper evaluation of the critical factors affecting the cost and duration of several building construction projects, which were divided into residential, commercial, church, agricultural, industrial, and school structural works, as shown in Table 1. The effects of the various factors, namely activities involved, building area (m²), type of foundation, number of floors (storeys), class of client, and contractors involved, were assessed with respect to the cost and duration responses. The results show higher cost and duration variations for projects awarded to sole and mini contractors, which is attributed to a lack of modernization, technical advancement, and quality of the personnel provided by these firms to manage and control the activities of the construction project. Medium and multi-companies generally possess sophisticated tools and equipment, which they use to achieve the desired quality in less time and to manage the project efficiently. This prevents cost overruns and enables timely completion to the prescribed quality specifications. Bidding costs and negotiation fees were also observed to affect the class of contractor recruited for the construction job, as clients with higher financial capacity, such as government and corporate organizations, negotiated with and hired multi and medium companies (Abu Hammad et al., 2008; AlSehaimi & Koskela, 2008).
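Before modeling, categorical survey answers must be turned into numbers. The sketch below applies the integer codes from the Table 1 key to one response; the dictionary names and the `encode_response` helper are hypothetical. Integer codes impose an ordering on the categories; one-hot encoding is a common alternative when no natural order exists (for example, for client class).

```python
# Ordinal codes taken from the Table 1 key
CONTRACTOR = {"Sole": 1, "Mini": 2, "Medium": 3, "Multi": 4}
CLIENT = {"Sole": 1, "Government": 2, "Missionary": 3, "Bank": 4, "School": 5}
PROJECT_TYPE = {
    "Residential building": 1, "Commercial building": 2, "Church building": 3,
    "School building": 4, "Industrial building": 5, "Agricultural building": 6,
}
FOUNDATION = {"Strip/pad": 1, "Raft": 2, "Pile/deep": 3}


def encode_response(contractor: str, client: str, project_type: str, foundation: str) -> tuple:
    """Map one response's categorical answers to their numeric codes.

    Raises KeyError for a category that is not in the key.
    """
    return (
        CONTRACTOR[contractor],
        CLIENT[client],
        PROJECT_TYPE[project_type],
        FOUNDATION[foundation],
    )


print(encode_response("Medium", "Government", "School building", "Raft"))  # (3, 2, 4, 2)
```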

Table 1 Experimental survey result

Key

Contractors: Sole = 1; Mini = 2; Medium = 3; Multi = 4
Clients: Sole = 1; Government = 2; Missionary = 3; Bank = 4; School = 5
Project type: Residential building = 1; Commercial building = 2; Church building = 3; School building = 4; Industrial building = 5; Agricultural building = 6
Foundation type: Strip/pad = 1; Raft = 2; Pile/deep = 3

Respondents’ demographical characteristics

A total of 120 questionnaires were administered to respondents who are major participants in infrastructure construction projects; 78 were returned, a response rate of 65%, and these responses were used for the analysis in this study. The demographic details, showing the percentage (%) and frequency distribution of respondents, are given in Table 2. Of the respondents, 21.79% are female and 78.21% are male, and 38.46%, 24.36%, and 26.92% are civil engineers, builders, and project managers, respectively. Regarding years of experience, 38.46% of the respondents have 21–30 years and 34.62% have 31–45 years.
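As a quick arithmetic check, the response rate and the head counts implied by the reported percentages (taking n = 78 as the base for every percentage, which is an assumption) can be computed as follows. Every implied count comes out as a whole number, and the female and male counts sum to the 78 returned questionnaires.

```python
administered = 120
returned = 78
print(f"response rate = {returned / administered:.0%}")  # response rate = 65%

# Percentages quoted in the text, converted back to implied head counts (n = 78)
reported_pct = {
    "female": 21.79, "male": 78.21,
    "civil engineers": 38.46, "builders": 24.36, "project managers": 26.92,
    "21-30 years": 38.46, "31-45 years": 34.62,
}
implied_counts = {group: round(pct / 100 * returned) for group, pct in reported_pct.items()}
print(implied_counts)
```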

Table 2 Respondents’ demographical characteristics

Statistical evaluation

The relationships and interdependencies between the building information details and the construction duration and cost obtained from the survey were assessed using 3D surface plots with wireframes, as shown in Fig. 5. The plots show a positive effect of building area (BA), number of storeys, and number of activities (Act.) on cost. Distribution histograms of the input and output variables, which show how often each value occurs in the dataset, are plotted in Fig. 6. Slight or no skewness was observed in both the input and output parameters. The main statistical functions are listed in Table 3, showing satisfactory values of the mean, standard deviation, variance, skewness, and kurtosis (Onyelowe et al., 2021b; Rofooei et al., 2011).
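The statistics listed in Table 3 can be reproduced for any variable with the standard library. This sketch uses moment-based (population) skewness and excess kurtosis, which is an assumption: statistical packages differ in whether they apply bias corrections, so values computed this way may not match Table 3 exactly. The sample values are hypothetical; the first set is symmetric (skewness 0), the second is right-skewed.

```python
import statistics


def describe(values):
    """Mean, sample SD and variance, moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mean,
        "std": statistics.stdev(values),         # sample (n - 1) standard deviation
        "variance": statistics.variance(values), # sample (n - 1) variance
        "skewness": m3 / m2 ** 1.5,              # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3.0,          # excess kurtosis (normal = 0)
    }


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical numbers of floors
right_skewed = [1, 1, 1, 2, 2, 3, 6]     # hypothetical
print(describe(symmetric))
print(describe(right_skewed))
```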

Fig. 5 3D surface plots showing factor relationships

Fig. 6 Distribution histogram chart for input (in yellow) and output (in pink) variables

Table 3 Statistical functions for the output and input variables

Pearson correlation

Following previous studies, the Pearson correlation coefficients presented in Table 4 were used to evaluate the linear relationships between the input and output variables. The results describe the behavior of the variables under study and allow a proper assessment of the effects of the building details on the duration and actual cost of the project. The input variables, namely number of activities, building area, type of foundation, number of floors (storeys), class of clients, and contractors, showed stronger positive correlations with project duration than with cost. The number of activities, number of storeys, and building area had the highest positive correlations with the response parameters, cost and duration (Alaneme & Mbadike, 2021; Ferentinou & Fakir, 2017).
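A correlation table like Table 4 pairs every input with every output. The sketch below computes Pearson's r for each pair using only the standard library; the variable names and values are hypothetical, not the survey data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_table(inputs: dict, outputs: dict) -> dict:
    """r for every (input, output) pair."""
    return {(i, o): pearson_r(xi, yo) for i, xi in inputs.items() for o, yo in outputs.items()}


# Hypothetical records for five projects
inputs = {"storey": [1, 2, 2, 3, 4], "activities": [10, 14, 15, 20, 26]}
outputs = {"duration": [6, 9, 8, 12, 15], "cost": [5.0, 7.5, 9.0, 8.0, 14.0]}
table = correlation_table(inputs, outputs)
for (i, o), r in table.items():
    print(f"{i:>10} vs {o:<8} r = {r:+.3f}")
```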

Table 4 Pearson’s correlations

Activity duration and cost deviation

The deviations calculated for the duration and cost variables are presented in Fig. 7. The logarithmic ratio of Eqs. 2 and 3 was used in the computation to remove the scale distortion caused by the large differences observed between the numerator and the denominator, so that overruns and underruns are symmetrical about zero and the variances are additive. The results indicate larger deviations for cost than for duration. These deviations are attributed to the critical factors selected in this study as the independent variables, whose effects are evaluated using the intelligent modeling system (Cho, 2009; Mačková & Bašková, 2014).

Fig. 7 Activity cost and duration deviation

Artificial neural network (ANN) model development

From the survey results, the factors affecting construction cost and duration were sorted as the independent variables: number of activities (Act.), building area (BA), type of foundation (FT), number of floors (storey), class of clients, and class of contractors. The model framework is designed with six input variables and two output target responses (construction duration and cost). The selection and structure of these variables are based on the outcome of the descriptive statistical evaluation and correlation analysis of the survey data, which indicated a statistically significant positive linear relationship among the selected variables, leading to acceptance of the alternative hypothesis. The processing parameter settings for the neural network model are presented in Table 5, and Fig. 8 shows the resulting 6-22-2 two-layer feed-forward network with tansig hidden neurons and linear output neurons; such a network can fit multi-dimensional mapping problems arbitrarily well, given consistent data and enough neurons in its hidden layer (Kisi & Uncuoglu, 2005; Iranmanesh & Kaveh, 1999). To determine the best-performing number of hidden neurons, the mean squared error (MSE) and R-values were used as evaluation criteria. Networks with 1 to 25 hidden neurons were examined to find the optimal network for the developed ANN model. The performance test showed that 22 neurons produced the best generalization in terms of these criteria across training, validation, and testing, as shown in Figs. 9 and 10. The MATLAB program script giving the computed weights and biases of the 22 hidden neurons in the first layer, and the weights and biases of the second layer for the two output parameters, is presented in the supplementary file (Çelik & Tan, 2005; Sobhani et al., 2010).

Table 5 ANN processing parameter settings
Fig. 8. ANN architecture
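The 6-22-2 architecture can be written as two matrix operations: a tansig hidden layer of 22 neurons followed by a linear (purelin) output layer of 2 neurons. The sketch below uses random, hypothetical weights in place of the trained values from the supplementary file, and assumes the six inputs are already normalized (MATLAB normally applies its own input and output scaling, which is omitted here).

```python
import numpy as np

def tansig(n):
    # MATLAB-style tansig: 2 / (1 + exp(-2n)) - 1, mathematically identical to tanh(n)
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def forward(x, W1, b1, W2, b2):
    """Two-layer feed-forward pass: tansig hidden layer, linear (purelin) output layer."""
    h = tansig(W1 @ x + b1)   # 22 hidden-neuron activations
    return W2 @ h + b2        # 2 outputs: cost and duration (in scaled units)

rng = np.random.default_rng(0)
# Hypothetical random weights; the trained values are given in the supplementary file
W1, b1 = rng.normal(size=(22, 6)), rng.normal(size=22)
W2, b2 = rng.normal(size=(2, 22)), rng.normal(size=2)
x = rng.normal(size=6)        # one pre-normalized input vector of the six factors
y = forward(x, W1, b1, W2, b2)
print(y.shape)
```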

Fig. 9. R-values for varying numbers of hidden-layer neurons

Fig. 10. MSE for varying numbers of hidden-layer neurons
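The neuron-count search from 1 to 25 amounts to a loop that trains one network per size and keeps the size with the lowest validation error. In the sketch below, `train_and_score` is a hypothetical placeholder for whatever training routine is used (the study used MATLAB), and breaking ties by the higher R is an assumption; the stand-in scorer returns made-up numbers only to exercise the loop.

```python
def sweep_hidden_sizes(train_and_score, sizes=range(1, 26)):
    """Train one network per hidden-layer size; return the size with the lowest
    validation MSE (ties broken by the higher R) together with all scores.

    `train_and_score(n)` is a hypothetical user-supplied function that trains a
    network with n hidden neurons and returns (validation_mse, validation_r).
    """
    scores = {n: train_and_score(n) for n in sizes}
    best = min(scores, key=lambda n: (scores[n][0], -scores[n][1]))
    return best, scores

# Stand-in scorer with made-up numbers, only to exercise the loop
def fake_score(n):
    mse = (n - 22) ** 2 / 10 + 7.5   # minimum placed at n = 22 by construction
    return mse, 1 - mse / 100

best, scores = sweep_hidden_sizes(fake_score)
print(best)
```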

Training state of the ANN

The ANN training state presented in Fig. 11 shows a gradient of 4.6854 at epoch 15, where training stopped after six validation checks failed, that is, after the validation error failed to improve six times in succession. Beyond this point the network performance ceased to improve. The validation-check count remained at zero from epoch 0 to 5 and then rose slightly to 1 and 2 at epochs 6 and 7, respectively. From epoch 10 onward, however, the network began to over-fit the data. Epoch 9 was therefore selected as the base, and its weights were chosen as the final weights (Erzin et al., 2010).

Fig. 11. ANN training state
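The stopping rule described above is validation-based early stopping: training halts once the validation error has failed to improve for six consecutive epochs (six is the default `max_fail` in MATLAB's neural network training functions), and the weights from the best epoch are kept. A minimal sketch follows; the loss values are made up so that the best epoch is 9 and training stops at epoch 15, mirroring the pattern reported for Fig. 11.

```python
def early_stopping(val_losses, max_fail=6):
    """Return (best_epoch, stop_epoch) for validation-based early stopping.

    Training stops once the validation loss has failed to improve on its best
    value for `max_fail` consecutive epochs; the weights from best_epoch are kept.
    """
    best_epoch, best_loss, fails = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:                 # improvement: remember epoch, reset counter
            best_epoch, best_loss, fails = epoch, loss, 0
        else:                                # no improvement: one more validation failure
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1   # ran out of epochs without stopping early

# Hypothetical validation MSE per epoch (epoch 0 first)
losses = [60, 40, 25, 18, 14, 11, 10.5, 9.0, 8.2, 7.5,
          7.9, 8.4, 8.8, 9.1, 9.6, 10.2, 10.9]
print(early_stopping(losses))
```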

Validation performance of the ANN

MSE was the loss function used to evaluate the validation performance of the developed ANN. As shown in Fig. 12, the optimized network (6-22-2) achieved its best validation performance, an MSE of 7.5443, at epoch 9. This indicates satisfactory model performance: the smart model predicts the target output parameters accurately and generalizes over the complex set of input variables with minimal error (Erzin et al., 2009; Alaneme & Mbadike, 2021a).

Fig. 12. Validation performance of the ANN

Error histogram of the ANN

The error histogram in Fig. 13, plotted with 20 bins for the training, validation, and testing sets, reflects the strong correlation between the experimental and predicted results; the zero-error point marks the best performance. Almost 95% of the data yield an error of less than 1%. The zero-error line, shown in yellow in the middle of the chart, lies at the 0.02904 bin, which contains ninety (90) instances from the training set (Alaneme et al., 2022a, 2022b; Kaveh & Iranmanesh, 1998).

Fig. 13. ANN error histogram
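An error histogram like Fig. 13 bins the errors (target minus output) into 20 equal-width bins; the share of samples within a given relative error can be computed alongside it. The sketch below uses hypothetical durations and a 1% tolerance, and the function name is illustrative.

```python
import numpy as np

def error_histogram(targets, outputs, bins=20, tol=0.01):
    """Histogram of errors (target - output) in `bins` equal-width bins, plus the
    share of samples whose relative error is below `tol` (1% by default)."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    errors = t - y                                    # error = target - output
    counts, edges = np.histogram(errors, bins=bins)   # counts per bin and bin edges
    centres = (edges[:-1] + edges[1:]) / 2            # bin centres for labelling
    share_within = np.mean(np.abs(errors) / np.abs(t) < tol)
    return counts, centres, share_within

# Hypothetical durations (days) and model predictions
targets = np.array([100, 120, 150, 180, 200, 240, 260, 300, 320, 360])
outputs = targets + np.array([0.5, -0.3, 0.8, -1.0, 0.2, 1.5, -0.4, 0.1, 5.0, -0.6])
counts, centres, share = error_histogram(targets, outputs)
print(counts.sum(), round(float(share), 2))
```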

Regression plot of the ANN

The regression plot in Fig. 14 shows the relationship between the actual data and the ANN model estimates for the training, validation, and testing sets. The y-axis shows the ANN model estimates (outputs), while the x-axis shows the actual data (targets). The derived statistics show satisfactory prediction accuracy for the ANN model, with correlation coefficients (R) of 0.99554, 0.9301, and 0.93193 for the training, validation, and testing sets, respectively (Ambrule & Bhirud, 2017).

Fig. 14. Regression plot
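A regression plot summarizes each data subset by a least-squares line, output ≈ m·target + b, and the correlation coefficient R between outputs and targets. A minimal NumPy sketch with hypothetical values follows; a perfect model would give m = 1, b = 0, and R = 1.

```python
import numpy as np

def regression_fit(targets, outputs):
    """Least-squares line outputs ≈ m * targets + b and the correlation coefficient R."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    m, b = np.polyfit(t, y, 1)       # degree-1 fit: returns slope, then intercept
    r = np.corrcoef(t, y)[0, 1]      # Pearson R between targets and outputs
    return m, b, r

# Hypothetical target vs. predicted durations (days)
t = [100, 150, 200, 250, 300]
y = [104, 146, 205, 247, 298]
m, b, r = regression_fit(t, y)
print(round(m, 3), round(b, 2), round(r, 4))
```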

Model validation

The datasets generated by the smart intelligent modeling process were compared statistically with the actual data, as shown in the histogram charts of Figs. 15 and 16. The prediction accuracy of the developed ANN model was evaluated using the loss-function parameters MAE and RMSE, and multiple linear regression (MLR) was also used as an adequacy check (Alaneme et al., 2020a). The MLR results are summarized in Table 6 and Figs. 17 and 18. The regression coefficients and model summary indicate that the MLR model is not robust and performs poorly, with an average coefficient of determination (COD) of 69.86%. The loss functions (MAE and RMSE) were computed using Minitab and Microsoft Excel, which provide sufficient criteria to assess the prediction accuracy of the developed ANN model. The loss-function validation results are summarized in Table 7 and show good agreement between the actual data and the ANN model estimates, with an average coefficient of determination (R2) of 99.9995%, MAE of 0.2952, and RMSE of 0.5638. These performance results agree with the adaptive neuro-fuzzy inference system (ANFIS) and ANN performance assessments of Chan et al. (2004) and Onyelowe et al. (2021a, 2021b, 2021c). The developed smart intelligent model exhibited robust performance in predicting building construction cost and duration from the physical building details and the class of client and contractor. Its limitations include the inability to incorporate bidding and tender negotiation, supply chain, and safety constraint factors, which are largely non-linear and very complex to quantify.

Fig. 15. Model vs. actual results: cost (naira)

Fig. 16. Model vs. actual results: duration (days)

Table 6 Multiple regression analysis statistical result
Fig. 17. MLR residual plots for the cost variable

Fig. 18. MLR residual plots for the duration variable
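The MLR benchmark in Table 6 can be sketched as an ordinary least-squares fit with an intercept; the residuals it returns are the quantities that residual plots such as Figs. 17 and 18 display. The sketch below assumes plain OLS (the study used Minitab), and the predictor values and the function name `fit_mlr` are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares multiple linear regression y ≈ b0 + X @ b.

    Returns the coefficients [b0, b1, ..., bk], the coefficient of
    determination R^2 on the fitted data, and the residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])        # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares solution
    residuals = y - A @ coef
    r2 = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return coef, r2, residuals

# Hypothetical predictors: [storeys, building area (m^2)]; the response is built
# exactly as 10 + 5*storeys + 0.5*area so the recovered coefficients can be checked
X = [[1, 100], [2, 150], [3, 260], [4, 300], [5, 410]]
y = [10 + 5 * s + 0.5 * a for s, a in X]
coef, r2, residuals = fit_mlr(X, y)
print(np.round(coef, 3), round(float(r2), 4))
```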

Table 7 ANN-model performance measure
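The loss-function measures in Table 7 follow the standard definitions MAE = mean|a - p|, RMSE = sqrt(mean (a - p)^2), and R^2 = 1 - SS_res/SS_tot, where a and p are the actual and predicted values; MSE, the validation loss in Fig. 12, is the square of RMSE. The sketch below assumes these definitions (the study computed them in Minitab and Excel) and uses hypothetical numbers.

```python
import numpy as np

def performance(actual, predicted):
    """MAE, MSE, RMSE and coefficient of determination R^2 between actual and predicted values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    mae = np.mean(np.abs(err))                              # mean absolute error
    mse = np.mean(err ** 2)                                 # mean squared error
    rmse = np.sqrt(mse)                                     # root mean squared error
    r2 = 1 - np.sum(err ** 2) / np.sum((a - a.mean()) ** 2)  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical actual vs. predicted values
metrics = performance([10, 20, 30, 40], [11, 19, 30, 42])
print(metrics)
```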

Conclusion

The following conclusions can be drawn:

  1. The effects of building details on project construction duration and cost were investigated in this study through a research survey and analysis. A questionnaire was designed and administered to the major team players in the construction industry, namely clients, builders, project managers, and civil engineers, within the study area.

  2. The respondents' answers showed that building information details influence project construction duration and cost. Critical factors were expertly sorted during data collation after the field survey and taken as the predictor variables to evaluate their significant effects on the criterion parameters (cost and duration of the project).

  3. Analysis of the survey data showed that factors such as project activity (Act.), building area (BA), type of foundation (FT), number of floors (storey), class of clients, and class of contractors have higher positive correlations with project duration than with cost, while the cost variable showed higher deviations than the duration variable.

  4. The ANN model was developed and simulated using a feed-forward back-propagation network, random data division, and the Levenberg–Marquardt training algorithm. The optimized network architecture (6-22-2), with tansig and purelin activation functions, was selected using the MSE performance criterion to obtain the required number of hidden neurons. The ANN model performance was then evaluated, first against MLR statistics and then with the loss-function parameters (MAE and RMSE), to validate the adequacy of the developed model. The results indicate a robust model with an MAE of 0.2952, an RMSE of 0.5638, and a coefficient of determination (COD) of 99.9995%, which is better than the 69.86% calculated for the MLR model.

  5. Assessing the cost and duration of construction projects in terms of building information details can provide time-saving, accurate data for scheduling and budgeting, helping to avoid cost overruns and under- or over-estimation. The findings of this study can support timely and accurate predictions at the planning stage of construction projects, helping both the client and the contractor to quantify the resources and time required for completion while considering critical constraint factors.

Recommendation

  1. Assessing the relationship between the cost and duration of construction projects and building information is important for guiding the planning, budgeting, and scheduling processes. The results of this study will provide valuable support in decision-making to achieve the required quality of project deliverables.

  2. Building information, which includes the structural geometry design and the class of client and contractor, was observed to have a significant impact on project construction duration and cost. The findings of this study can guide project managers, clients, and construction stakeholders to control and execute projects efficiently within the design budget and at the target time (schedule).

  3. This research is imperative because of the unethical assumptions made by project team players in managing a project's constraints of cost, scope, time, and quality. Further studies are therefore recommended, especially on the application of soft computing techniques to evaluate in depth the multicollinearity between the factor variables.