1 Introduction

Many pipes within water distribution networks in large cities around the world are in the final stages of their design life. This aging infrastructure is prone to frequent failures and leaks that cause water losses, interrupt the delivery of an essential service, and allow ingress of contaminated water, exposing consumers to health hazards. The ongoing cost of repairing aging water pipe networks has reached billions of dollars per year in North American cities alone [1, 2].

In response, many researchers have developed watermain failure models to predict potential failures and help municipalities forecast the cost of maintaining water networks. Nishiyama and Filion [3] reviewed several existing watermain failure models and reported that all of them have low coefficients of determination. The pipe-failure datasets used in these studies were small, ranging from 50 to 2000 break instances.

Kleiner et al. [4] conducted one of the first studies to use machine-learning techniques for prediction of the next pipe break. They used feed-forward artificial neural networks trained by backpropagation (FFBP-ANN) to derive complex relations between variables. However, the main disadvantage of traditional ANN methods is that the solution is often caught in a local minimum and never reaches the optimum. As an alternative, the extreme learning machine (ELM) calculates optimum weights in a single-hidden-layer feed-forward artificial neural network [5]. ELM-ANN thus differs from the traditional FFBP-ANN method in that the optimum weights of the network are calculated analytically, resulting in high performance capacity and fast training on large datasets [6–18]. Despite these desirable features, the authors have not identified any prior application of ELM-ANN to water pipe networks.

It has been a challenge for many municipalities to gain knowledge about the frequency and expected timing of future pipe failures. While general guidelines on expected service life gathered from the literature play a major role in developing asset management plans, decision makers need more accurate tools that provide specific information on the expected cost of maintaining and rehabilitating water pipe networks. On this basis, this study describes a novel application of the extreme learning machine (ELM) to predict the time to failure of distribution pipes, taking into account important attributes such as protective coating, pipe material, length, and diameter. It is demonstrated how the new model can serve as an alternative to ANN and other machine learning models for prioritizing the rehabilitation of water pipe networks.

2 Materials and methods

2.1 Extreme learning machine

The ELM is a training method for a single-hidden-layer neural network and has several advantages over the traditional backpropagation (BP) algorithm. BP is a gradient-descent learning method in which each network weight and bias is determined by iterative tuning; as a consequence, learning is slow and tends to converge to local minima. The need to choose network parameters such as the number of hidden neurons, the transfer function, the training method, and the performance criteria is another disadvantage of BP [12]. The ELM has three layers: one input layer, one hidden layer, and one output layer (Fig. 1). These layers form a single-hidden-layer feed-forward network in which linear algebra is used to solve for the optimal weights of the output layer. In the ELM, the weights of the input layer are assigned randomly, and the output weights are calculated analytically in a pre-defined training procedure. Because the weights and biases are calculated rather than tuned, the training stage of the ELM is extremely fast and its generalization capacity is high [5, 19].

Fig. 1 Basic structure of a single-layer ELM network

The output of a single hidden layer feed-forward neural network can be calculated by:

$$ y = \sum_{j=1}^{m} \beta_j \, g\!\left( \sum_{i=1}^{n} w_{i,j}\, x_i + b_j \right) \tag{1} $$

where y is the output of the network, x_i are the network inputs, n is the number of input features (equal to the number of input variables), m is the number of hidden-layer neurons, w_{i,j} denotes the input weight connecting the ith neuron of the input layer to the jth neuron of the hidden layer, β_j is the weight connecting the jth hidden neuron to the corresponding neuron in the output layer, b_j is the bias of the jth hidden neuron, and g( ) is the activation function. The output of a single-hidden-layer feed-forward neural network is calculated in two stages. First, a single-hidden-layer network is formed from user-defined parameters: the number of hidden neurons and the transfer function. The number of hidden neurons (m) is chosen to be less than or equal to the number of data observations, and the activation function g( ) can be any infinitely differentiable function [5]. Second, once the hidden layer has been formed, the weights of the output layer are calculated; this is achieved by arbitrary assignment of the input weights w_{i,j} and the biases b_j.

Thus, Eq. (1) can be written as follows:

$$ \mathbf{H}\boldsymbol{\beta} = \boldsymbol{y} \tag{2} $$

where H defines the ELM feature mapping matrix [5]:

$$ \mathbf{H}\left(w_{i,j}, b_j, x_i\right) = \begin{bmatrix} H_{1,1} & \cdots & H_{1,m} \\ \vdots & \ddots & \vdots \\ H_{n,1} & \cdots & H_{n,m} \end{bmatrix} = \begin{bmatrix} g\left(w_{1,1}x_1 + b_1\right) & \cdots & g\left(w_{1,m}x_m + b_m\right) \\ \vdots & \ddots & \vdots \\ g\left(w_{n,1}x_n + b_1\right) & \cdots & g\left(w_{n,m}x_m + b_m\right) \end{bmatrix} \tag{3} $$

Here, y and β are defined as:

$$ \boldsymbol{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \quad \text{and} \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix} \tag{4} $$

The weights β_j are found by minimizing the approximation error using the Moore–Penrose generalized inverse method [20], such that:

$$ \widehat{\boldsymbol{\beta}} = \mathbf{H}^{+}\boldsymbol{y} \tag{5} $$

where H⁺ symbolizes the Moore–Penrose generalized inverse of H. Huang et al. [5] showed that determining only the optimal output weights is sufficient to achieve high accuracy; calculating these weights analytically, rather than tuning them iteratively, is the fundamental reason for the speed and generalization capacity of the ELM.
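To make the training procedure concrete, the following minimal NumPy sketch implements Eqs. (1)–(5): random input weights and biases, a sigmoid feature map, and output weights obtained from the Moore–Penrose pseudo-inverse. The function names and synthetic data are illustrative only, not the authors' original MATLAB implementation.

```python
import numpy as np

def elm_train(X, y, m, seed=0):
    """Train a single-hidden-layer ELM with m hidden neurons."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, m))   # random input weights w_{i,j}
    b = rng.uniform(-1.0, 1.0, size=m)        # random hidden biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # feature-mapping matrix H, Eq. (3)
    beta = np.linalg.pinv(H) @ y              # beta_hat = H^+ y, Eq. (5)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                           # y = H beta, Eq. (2)

# Illustrative usage on synthetic data
X = np.random.default_rng(1).uniform(size=(200, 5))
y = np.sin(X.sum(axis=1))
W, b, beta = elm_train(X, y, m=20)
y_hat = elm_predict(X, W, b, beta)
```

Because the only fitted quantity is β, training reduces to a single pseudo-inverse computation, which is the source of the speed advantage noted above.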

Three more versions of the ELM are employed in this study: backpropagation ELM (tELM), linear regression ELM (ELMr), and self-adaptive ELM (SaELM). In tELM, the weights in the input layer and the related biases are assigned randomly [12], while the output weights are calculated by tuning: they are optimized by backpropagating the mean square error (\( \mathrm{MSE}=\frac{1}{2}\sum_{i=1}^{N}\left(d_i-y_i\right)^2 \)) and updated by:

$$ \Delta\beta_{j,k} = \eta \sum_{i=1}^{N} \left(d_i - y_i\right) H_j \tag{6} $$

where N is the dataset length, η is the learning-rate parameter, and d_i and y_i are the desired and actual outputs, respectively. In ELMr, the output weights are calculated by linear regression, and an error term is added so that Eq. (2) becomes:

$$ \boldsymbol{y} = \mathbf{H}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{7} $$

where ε is an error matrix. The SaELM, in contrast, employs the differential evolution (DE) method to optimize the network parameters [19]: a self-adaptive DE algorithm determines the input weights and hidden-node biases, while the ELM procedure is used to compute the output weights. Initially, the self-adaptive DE algorithm generates NP random vectors θ_{k,G} as the population of the first generation. In the Gth generation, the kth parameter vector can be written as:

$$ \theta_{k,G} = \left[ \theta_{k,G}^{1}, \theta_{k,G}^{2}, \dots, \theta_{k,G}^{D} \right] \tag{8} $$

where k = 1, 2, …, NP, and the vectors are generated randomly through the following:

$$ \theta_{k,G} = \theta_{\min} + \operatorname{rand}(0,1) \cdot \left(\theta_{\max} - \theta_{\min}\right) \tag{9} $$

where

$$ \begin{cases} \theta_{\min} = \left[ \theta_{\min}^{1}, \theta_{\min}^{2}, \dots, \theta_{\min}^{D} \right] \\ \theta_{\max} = \left[ \theta_{\max}^{1}, \theta_{\max}^{2}, \dots, \theta_{\max}^{D} \right] \end{cases} \tag{10} $$

In this equation, θmin and θmax are the bounds of the considered parameters.

The weight matrix for the output is determined by the following equation:

$$ \beta_{k,G} = \mathbf{H}_{k,G}^{-1}\, \mathbf{T} \tag{11} $$

where \( \mathbf{T} \) is the vector of target outputs and \( \mathbf{H}_{k,G}^{-1} \) is the generalized inverse of \( \mathbf{H}_{k,G} \), which can be written as:

$$ \mathbf{H}_{k,G} = \begin{bmatrix} g\left(a_{1,(k,G)}, b_{1,(k,G)}, x_1\right) & \cdots & g\left(a_{L,(k,G)}, b_{L,(k,G)}, x_1\right) \\ \vdots & \ddots & \vdots \\ g\left(a_{1,(k,G)}, b_{1,(k,G)}, x_N\right) & \cdots & g\left(a_{L,(k,G)}, b_{L,(k,G)}, x_N\right) \end{bmatrix} \tag{12} $$

In addition, the root mean squared error (RMSE) of each individual is calculated as:

$$ \mathrm{RMSE}_{k,G} = \sqrt{ \frac{ \sum_{i=1}^{N} \left| \sum_{j=1}^{L} \beta_j\, g\left(a_{j,(k,G)}, b_{j,(k,G)}, x_i\right) - t_i \right|^{2} }{ m \times N } } \tag{13} $$

The population vector with the best RMSE is stored in the first generation. In subsequent generations, the parameter vectors are evaluated using the following equation:

$$ \theta_{k,G+1} = \begin{cases} u_{k,G+1} & \text{if } \mathrm{RMSE}_{\theta_{k,G}} - \mathrm{RMSE}_{u_{k,G+1}} > \varepsilon \cdot \mathrm{RMSE}_{\theta_{k,G}} \\ u_{k,G+1} & \text{if } \left| \mathrm{RMSE}_{\theta_{k,G}} - \mathrm{RMSE}_{u_{k,G+1}} \right| < \varepsilon \cdot \mathrm{RMSE}_{\theta_{k,G}} \text{ and } \left| \beta_{u_{k,G+1}} \right| < \left| \beta_{\theta_{k,G}} \right| \\ \theta_{k,G} & \text{otherwise} \end{cases} \tag{14} $$

In the self-adaptive DE algorithm utilized herein, the trial vectors are generated by using one of the following four mutation strategies [21]:

Strategy 1:

$$ \nu_{i,G} = \theta_{r_1^i,G} + F \cdot \left(\theta_{r_2^i,G} - \theta_{r_3^i,G}\right) \tag{15} $$

Strategy 2:

$$ \nu_{i,G} = \theta_{r_1^i,G} + F \cdot \left(\theta_{\mathrm{best},G} - \theta_{r_1^i,G}\right) + F \cdot \left(\theta_{r_2^i,G} - \theta_{r_3^i,G}\right) + F \cdot \left(\theta_{r_4^i,G} - \theta_{r_5^i,G}\right) \tag{16} $$

Strategy 3:

$$ \nu_{i,G} = \theta_{r_1^i,G} + F \cdot \left(\theta_{r_2^i,G} - \theta_{r_3^i,G}\right) + F \cdot \left(\theta_{r_4^i,G} - \theta_{r_5^i,G}\right) \tag{17} $$

Strategy 4:

$$ \nu_{i,G} = \theta_{i,G} + F \cdot \left(\theta_{r_1^i,G} - \theta_{i,G}\right) + F \cdot \left(\theta_{r_2^i,G} - \theta_{r_3^i,G}\right) \tag{18} $$

where \( r_k^i \) are distinct integers selected randomly from {1, 2, …, NP}. The strategy used in each generation is chosen according to a probability P_{l,G}, the probability that the lth strategy is selected in the Gth generation; in the developed model, l can be 1, 2, 3, or 4. P_{l,G} is updated as follows: if G is less than or equal to P (the number of generated vectors in each population), the four strategies have equal probabilities, P_{l,G} = 0.25; otherwise, if G is greater than P, P_{l,G} is obtained from the following equation:

$$ P_{l,G} = \frac{S_{l,G}}{\sum_{l=1}^{4} S_{l,G}} \tag{19} $$

where

$$ S_{l,G} = \frac{ \sum_{g=G-P}^{G-1} ns_{l,g} }{ \sum_{g=G-P}^{G-1} ns_{l,g} + \sum_{g=G-P}^{G-1} nf_{l,g} } + \varepsilon \tag{20} $$

where ns_{l,g} is the number of trial vectors generated by the lth strategy that enter the next generation (successes), nf_{l,g} is the number of trial vectors generated by the lth strategy that are discarded (failures), and ε is a small positive constant that prevents a zero improvement rate. The F and CR parameters are chosen for each target vector by sampling from normal distribution functions. Trial vectors for the next generation are accepted or rejected using the selection rule for θ_{k,G+1} given in Eq. (14). In the SaELM, the evolution continues until the specified fitness is achieved.
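The following sketch illustrates the four mutation strategies of Eqs. (15)–(18), the strategy-probability update of Eqs. (19)–(20), and the trial-vector selection rule of Eq. (14). Variable names (theta, ns, nf) are illustrative, and the crossover step is omitted for brevity.

```python
import numpy as np

def mutate(theta, i, i_best, F, strategy, rng):
    """Generate a trial vector from population theta (NP x D), Eqs. (15)-(18)."""
    NP = theta.shape[0]
    r = rng.choice([k for k in range(NP) if k != i], size=5, replace=False)
    if strategy == 1:                                     # Eq. (15)
        return theta[r[0]] + F * (theta[r[1]] - theta[r[2]])
    if strategy == 2:                                     # Eq. (16)
        return (theta[r[0]] + F * (theta[i_best] - theta[r[0]])
                + F * (theta[r[1]] - theta[r[2]])
                + F * (theta[r[3]] - theta[r[4]]))
    if strategy == 3:                                     # Eq. (17)
        return (theta[r[0]] + F * (theta[r[1]] - theta[r[2]])
                + F * (theta[r[3]] - theta[r[4]]))
    return theta[i] + F * (theta[r[0]] - theta[i]) + F * (theta[r[1]] - theta[r[2]])  # Eq. (18)

def strategy_probabilities(ns, nf, eps=0.01):
    """Success-rate based probabilities P_{l,G}, Eqs. (19)-(20);
    ns and nf are length-4 arrays of success/failure counts."""
    S = ns / (ns + nf) + eps          # S_{l,G}, eps prevents a zero rate
    return S / S.sum()

def select(theta_old, u_new, rmse_old, rmse_new, norm_old, norm_new, eps=1e-3):
    """Selection rule of Eq. (14): accept the trial vector on a clear RMSE
    improvement, or on comparable RMSE with a smaller output-weight norm."""
    if rmse_old - rmse_new > eps * rmse_old:
        return u_new
    if abs(rmse_old - rmse_new) < eps * rmse_old and norm_new < norm_old:
        return u_new
    return theta_old
```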

The initialization step is the same in ELM, tELM, and ELMr; the output weights are then calculated by the Moore–Penrose generalized inverse method, backpropagation, and linear regression, respectively (Fig. 2).

Fig. 2 Flow charts of a ELM and b SaELM networks

2.2 Procedure for predictive model development

The following steps are used to develop the predictive model with the ELM:

1. Non-dimensionalize the input and output variables.

2. Specify the number of network features (input variables).

3. Specify the number of neurons in the hidden layer.

4. Define the network parameters: population size, weights, mutation rates, crossover constant, hidden-layer neuron biases, and the termination criteria.

5. Choose an activation function.

6. Initialize the problem by randomly generating the hidden-node parameters aj and bj for j = 1, …, J.

7. Construct H(x).

8. Train the network by calculating the output weights using the Moore–Penrose generalized inverse method (ELM), backpropagation (tELM), linear regression (ELMr), or DE (SaELM).

9. Use the trained network weights and biases to generate the ELM model.

10. Score the developed ELM model against selected error indicators: the square of the Pearson product-moment correlation coefficient (R2), the root mean square error (RMSE), the coefficient of efficiency (Esn), and the index of agreement (D). These indicators are calculated by the following equations [22]:

$$ R^2 = \left( \frac{ \frac{1}{n} \sum_{j=1}^{n} \left(T_j - \overline{T}\right)\left(P_j - \overline{P}\right) }{ \sqrt{ \sum_{j=1}^{n} \left(T_j - \overline{T}\right)^2 / n } \; \sqrt{ \sum_{j=1}^{n} \left(P_j - \overline{P}\right)^2 / n } } \right)^2 \tag{21} $$
$$ \mathrm{RMSE} = \sqrt{ E\left[ \left(P - T\right)^2 \right] } \tag{22} $$

$$ E_{\mathrm{sn}} = 1 - \frac{ \sum_{i=1}^{n} \left(T_i - P_i\right)^2 }{ \sum_{i=1}^{n} \left(T_i - \overline{T}\right)^2 } \tag{23} $$

$$ D = 1 - \frac{ \sum_{i=1}^{n} \left(T_i - P_i\right)^2 }{ \sum_{i=1}^{n} \left( \left|P_i - \overline{T}\right| + \left|T_i - \overline{T}\right| \right)^2 } \tag{24} $$

where \( \overline{P} = \frac{1}{n}\sum_{j=1}^{n} P_j \) and \( \overline{T} = \frac{1}{n}\sum_{j=1}^{n} T_j \), P is the predicted value, and T is the observed value.
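For illustration, the four indicators of Eqs. (21)–(24) transcribe directly into NumPy as follows; T and P are the observed and predicted arrays, and the function name is illustrative.

```python
import numpy as np

def score(T, P):
    """Error indicators of Eqs. (21)-(24)."""
    Tm, Pm = T.mean(), P.mean()
    r2 = (((T - Tm) * (P - Pm)).mean()
          / (np.sqrt(((T - Tm) ** 2).mean()) * np.sqrt(((P - Pm) ** 2).mean()))) ** 2
    rmse = np.sqrt(((P - T) ** 2).mean())                                  # Eq. (22)
    esn = 1 - ((T - P) ** 2).sum() / ((T - Tm) ** 2).sum()                 # Eq. (23)
    d = 1 - ((T - P) ** 2).sum() / ((np.abs(P - Tm) + np.abs(T - Tm)) ** 2).sum()  # Eq. (24)
    return {"R2": r2, "RMSE": rmse, "Esn": esn, "D": d}
```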

11. Validate the performance of the developed ELM using the following measures, as utilized by Sattar [23] and Sattar and Gharabaghi [24]:

$$ k = \frac{ \sum_{i=1}^{n} T_i P_i }{ \sum_{i=1}^{n} P_i^2 } \approx 1 \quad \text{or} \quad k' = \frac{ \sum_{i=1}^{n} T_i P_i }{ \sum_{i=1}^{n} T_i^2 } \approx 1 \tag{25} $$

$$ m = \left(R^2 - R_O^2\right)/R^2 \quad \text{and} \quad n = \left(R^2 - R_O'^2\right)/R^2 < 0.1 \tag{26} $$

$$ R_m = R^2 \times \left(1 - \sqrt{\left|R^2 - R_O^2\right|}\right) > 0.5 \tag{27} $$

where k and k′ are the gradients of the regression lines between observed and predicted values, m and n are ratios derived from the coefficients of determination, and \( R_O^2 \) and \( R_O'^2 \) are the coefficients of determination of the through-origin regressions of predicted versus observed and observed versus predicted values, respectively.
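A sketch of these validation measures is given below. The through-origin forms of \( R_O^2 \) and \( R_O'^2 \) follow the common Golbraikh–Tropsha-style definitions; this is an assumption here, since the text does not spell them out.

```python
import numpy as np

def external_validation(T, P):
    """Validation measures of Eqs. (25)-(27); T observed, P predicted."""
    k = (T * P).sum() / (P ** 2).sum()          # Eq. (25): slope of T-vs-P fit through origin
    k_prime = (T * P).sum() / (T ** 2).sum()    # Eq. (25): slope of P-vs-T fit through origin
    r2 = np.corrcoef(T, P)[0, 1] ** 2
    ro2 = 1 - ((T - k * P) ** 2).sum() / ((T - T.mean()) ** 2).sum()         # assumed form
    ro2p = 1 - ((P - k_prime * T) ** 2).sum() / ((P - P.mean()) ** 2).sum()  # assumed form
    m = (r2 - ro2) / r2                         # Eq. (26), should be < 0.1
    n = (r2 - ro2p) / r2                        # Eq. (26), should be < 0.1
    rm = r2 * (1 - np.sqrt(abs(r2 - ro2)))      # Eq. (27), should exceed 0.5
    return k, k_prime, m, n, rm
```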

2.3 Uncertainty analysis of predictions of ELM models

Watermain failure is not a uniform process with a constant rate; it depends on various parameters that lead to substantial variation between water distribution networks [25]. Therefore, some uncertainty is expected in the predictions of any developed model. A watermain failure prediction model accompanied by the expected uncertainty range of its predictions would be a valuable tool for decision makers. Several recent models are reported [26–30] to have less uncertainty than other models. Quantifying this uncertainty can be accomplished by combining the developed ELM models with the Monte Carlo simulation (MCS) method. MCS is an easy-to-implement numerical method for determining the uncertainty of a model arising from the combined uncertainty of its various inputs, and it can handle various probability distribution types for those inputs [23, 31]. Running a stochastic analysis with MCS requires thousands of realizations; in each realization, the ELM model predicts a single deterministic output. The resulting thousands of outputs are used to construct an output distribution and to calculate the uncertainty associated with the parameter's median. The mean absolute deviation (MAD) is calculated as follows:

$$ \mathrm{MAD} = \frac{1}{250{,}000} \sum_{i=1}^{250{,}000} \left| P_i - \mathrm{Median}(P) \right| \tag{28} $$

where the number of Monte Carlo realizations is taken as 250,000 [1]. The predictive model uncertainty can then be calculated as [32]:

$$ \mathrm{Uncertainty}\ \% = \frac{100 \times \mathrm{MAD}}{\mathrm{Median}(P)} \tag{29} $$

After calculating the prediction uncertainty, the least-squares linearization technique is used to determine the influence of the various parameters on the output (details can be found in [22]). This is achieved by regressing the model output on the deviation of each variable from its mean:

$$ y = w_1 \Delta v_1 + w_2 \Delta v_2 + \dots + w_i \Delta v_i + b \tag{30} $$

where y is the time to the next pipe failure, v_i are the pipe attribute inputs, and Δv_i = v_i − m_{v_i} is the difference between the random pipe attribute input v_i and the mean m_{v_i} of all samples of that attribute. Random samples of the input variables are fed to the model, each yielding a single output y (the time to watermain failure for a particular pipe), and this is repeated for m Monte Carlo realizations. Linear regression between the watermain time to failure and the input variables then gives the regression coefficients w_i. The influence of each input variable i, \( S_{v_i} \), can thus be expressed as:

$$ S_{v_i} = 100 \times \frac{ w_i^2\, \sigma_{\Delta v_i}^2 }{ \sum_{i=1}^{n} w_i^2\, \sigma_{\Delta v_i}^2 } \tag{31} $$

where \( \sigma_{\Delta v_i}^2 \) is the variance of Δv_i, and n is the number of random samples.
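The MCS procedure of Eqs. (28)–(31) can be sketched as follows. The input distributions and the stand-in predict function are illustrative placeholders; in practice the trained ELM model and the fitted attribute distributions of Section 3.4 would be used.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MC = 250_000                                   # Monte Carlo realizations, per Eq. (28)

# Illustrative input sampling (scales are placeholders, not fitted values)
L = rng.exponential(scale=300.0, size=N_MC)      # pipe length
D = rng.exponential(scale=150.0, size=N_MC)      # pipe diameter
X = np.column_stack([L, D])

def predict(X):
    # Stand-in for the trained ELM model (e.g., elm_predict above);
    # any deterministic mapping serves to demonstrate the procedure.
    return 20.0 - 0.01 * X[:, 0] + 0.02 * X[:, 1]

P = predict(X)                                   # one deterministic output per realization
mad = np.abs(P - np.median(P)).mean()            # Eq. (28)
uncertainty_pct = 100.0 * mad / np.median(P)     # Eq. (29)

# Least-squares linearization, Eqs. (30)-(31): regress output on input deviations
dX = X - X.mean(axis=0)
coef, *_ = np.linalg.lstsq(np.column_stack([dX, np.ones(N_MC)]), P, rcond=None)
w = coef[:-1]
S = 100.0 * (w ** 2 * dX.var(axis=0)) / (w ** 2 * dX.var(axis=0)).sum()  # % influence
```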

3 Results and discussion

3.1 Pipe failures in the Greater Toronto Area

The Greater Toronto Area has more than 6000 km of drinking water network. The average age of the pipes is 50 years; 17% of the network is approaching 80 years of age and 6.5% is more than 100 years old. Watermain failures have been recorded continuously by the district of Scarborough, in the eastern part of the Greater Toronto Area. The records cover pipe failures from 1962 to 2005, with multiple breaks of the same pipe documented up to the 10th break for some pipes. The database contains key information on each failure, including its location, pipe length and diameter, year of construction, pipe coating or cathodic protection, and the dates of successive failures. The Scarborough network comprises 6342 watermains with a cumulative length of more than 1000 km; installation began in 1905. The pipe material is ductile iron (DI), cast iron (CI), or asbestos cement (AC). Pipe length ranges from 0.50 to 1.6 km and diameter from 30 to 500 mm. A total of 3497 pipes have never failed, while 2845 pipes have failed at least once.

The majority of the Scarborough network is cast iron (CI), at almost 60%, with 30% ductile iron (DI) and 10% asbestos cement (AC). The analysis of failures in cast iron and ductile iron pipes therefore covers 90% of the network. The statistics of the pipe failures are presented in Table 1.

Table 1 General statistics of pipe failures database

Figure 3 shows the failure rates of DI and CI watermains normalized by pipe length. The normalized failure rate for CI pipes is higher than that for DI pipes, with average values of 0.32 and 0.14, respectively. A similar normalized failure rate of 0.10 for DI has been reported elsewhere in Canada [33]. While both pipe types experienced an increase in normalized failure rate with age, the gradient was steeper for DI pipes within the first 10 years after installation and remained steady afterwards until 1990. From 1990 onwards, a similar decreasing trend in normalized failure rates is observed for both pipe types, falling back to the rates recorded 40 years earlier, around 1960.

Fig. 3 DI and CI pipe failure rates per kilometer in the Scarborough network

The city of Scarborough started implementing cathodic protection (CP) in 1986, followed by the application of cement mortar lining (CML) the following year. CML involves cleaning rust from the inside of a pipe and applying a cement coating layer to the internal surface, while CP attaches zinc anodes to the metallic surface of the pipe. As shown in Fig. 4, the number of watermain failures began to decrease from 1990, after these protection methods were implemented. The findings also show that these protection techniques are more effective in reducing failures of DI pipes than of CI pipes, with decreases of 80% and 60%, respectively.

Fig. 4 Impact of CML and CP on the total number of watermain failures each year in Scarborough for a DI pipes and b CI pipes

Figure 5 shows the number of watermain failures per kilometer for DI and CI pipes versus the number of multiple breaks per pipe; the first break is denoted B1, the second B2, and so forth. Circumferential failure is the main failure type for CI pipes, while hole failure is the main type for DI pipes; together these constitute more than 90% of the failure types in the network. CI pipes tend to show higher failure rates in the winter months of January and February [29, 34], because of the external circumferential pressure exerted on the pipe by frozen ground, under which pipes break more easily. Unlike the non-homogeneous CI pipes, DI pipes can resist externally applied pressure and thus experience fewer circumferential failures; they tend instead to break in localized areas where corrosion pitting has weakened the pipe material [34].

Fig. 5 Number of pipe failures per kilometer for main failure types in the Scarborough network

Considering the average age of pipes at first failure (Fig. 6), DI pipes fail for the first time at a lower age than CI pipes: an average age of 16 years was recorded for DI versus 22 years for CI. Folkman [33] and Rajani et al. [34] reported similar findings in other networks in the Greater Toronto Area. This is attributable to the nature of the soil around the pipes, which triggers the corrosion that mainly affects DI pipes, leading to pitting and hole failures; the average age at first recorded failure implies that the soil is moderately corrosive [34]. Figure 6 also shows that the average time between subsequent failures is 2.5 years for CI, considerably longer than the 1 year observed for DI. This indicates that once a DI pipe breaks for the first time, the frequency of subsequent breaks per year exceeds that of a CI pipe in the same network.

Fig. 6 Scarborough watermain average age at first and subsequent breakages

3.2 Development of new predictive equation

For pipe failure rate prediction, the objective is to construct an intelligent model employing the ELM algorithm that performs better than available prediction models. The instances of watermain failure were collected for pipes installed from 1946 to 2005, extracted from the recorded dataset following Harvey et al. [35–37] and Sattar et al. [1]. A total of 9508 watermain failures were collected, covering all pipes that failed at least once during the recording period. The watermain pipe types and their attributes are presented in Table 2.

Table 2 Pipe-specific attributes used in ELM model development

According to Sattar et al. [1], the watermain failure is a function of the following variables:

$$ \text{Watermain time to failure} = f\left(L,\, D,\, N_B,\, \mathrm{CML},\, \mathrm{CP}\right) \tag{32} $$

where L is the pipe length, D is the pipe diameter, N_B is the number of previous pipe breaks, CML is the cement mortar lining protection, and CP is the cathodic protection. These are the input variables to the ELM network, as shown in Fig. 7. The recorded pipe failure dataset was split into training and test sets: of the 9508 pipe break instances, 7131 (75%) were used to train the ELM network and 2377 (25%) were used to test and validate the developed model. Fourfold cross-validation was used to validate the developed ELM, as sketched below.

Fig. 7 Formulation of the ELM network for prediction of watermain failure timing

3.3 Finding optimal ELM parameters

The optimum ELM network parameters are the number of hidden-layer neurons and the transfer function. The choice of these parameters is based on user experience and on the statistical performance of the developed model. Increasing the number of neurons increases the complexity of the model, often at the expense of accuracy. The ELM models for the failure time of AC and DI pipes gave the best results with fewer than 20 neurons, while 50 neurons were required to produce the best results for CI. Regarding the transfer function, the hard limit function gave the least accurate ELM models, while the triangular basis function gave the best results; other transfer functions, such as the sine, sigmoid, and radial basis functions, gave comparable or better results than the hard limit function. The chosen ELM model for predicting the failure time of AC pipes had five neurons and used the radial basis transfer function, while the models for CI and DI pipes used the triangular basis transfer function with 50 and 20 neurons, respectively. The transfer functions compared here are sketched below.
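For reference, the transfer functions compared above can be written in NumPy as follows; the names mirror MATLAB's hardlim, tribas, radbas, and logsig functions, and any of them can replace the sigmoid in the elm_train sketch of Section 2.1.

```python
import numpy as np

activations = {
    "hardlim": lambda a: (a >= 0).astype(float),          # hard limit
    "tribas":  lambda a: np.maximum(1.0 - np.abs(a), 0),  # triangular basis
    "radbas":  lambda a: np.exp(-a ** 2),                 # radial basis
    "sigmoid": lambda a: 1.0 / (1.0 + np.exp(-a)),        # log-sigmoid
    "sin":     np.sin,
}
```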

The developed ELM models performed well (Table 3), with R2 of 0.46, 0.43, and 0.64 for AC, CI, and DI, respectively, in training, and 0.43, 0.41, and 0.63 in testing. Testing R2 and RMSE were based on fourfold cross-validation, in which the ELM model is validated on the testing dataset (25% of the total data) and then on the other three equal subsets (25% each). The RMSE values associated with the ELM models range from 0.09 to 0.17, which are low and similar for the training and testing datasets, indicating that the ELM model has acceptable predictive performance. Other variants of the ELM were also used; the accuracies obtained by SaELM, tELM, and ELMr are shown in the same table. The ELM scored the highest R2 values and the lowest RMSEs. For AC pipes, tELM had the closest score to the ELM model, while SaELM was closest for CI pipes; for DI pipes, tELM had a higher R2 than the chosen ELM model. The Esn and D statistics of the ELM models were also very good compared with ANN and other methods, with values of 0.44 (Esn) and 0.82 (D) for AC, DI, and CI pipes.

Table 3 Statistics of developed ELM and ELM variants on training and testing datasets

Further testing and validation of the developed ELM models was performed, with results presented in Table 4. While an ELM model is considered good if it satisfies one or more of the required validation conditions, the developed ELM models for all pipe types satisfied all of the proposed tests, confirming their good predictive ability.

Table 4 External validation for developed ELM models

Consider now the performance of the developed ELM models in comparison with other machine learning methods, namely artificial neural networks (ANN), support vector machines (SVM), and non-linear regression (NNR), as presented in Table 5. The developed ELM models outperform the other machine learning models applied to the same dataset not only in terms of R2 and RMSE but also in processing time. All tests were completed in MATLAB on a PC with an Intel Core i7-2600 CPU (3.4 GHz) and 4 GB RAM. The SVM analysis was performed using the PRTools toolbox (www.prtools.org).

Table 5 Statistics of developed ELM model versus some popular machine learning machines

3.4 Sensitivity analysis

Further analysis was performed to test the sensitivity of the pipe failure time predictions of the ELM model to the various input parameters: L, D, CP, CML, and N_B. These input parameters were fitted to probability distributions, and unrealistic values were removed by truncating the distributions, with the truncation limits constructed from the current dataset values. Candidate distributions for the input variables were ranked using the Anderson-Darling and chi-squared tests [31]; this resulted in the exponential distribution being used to model L and D, and the Poisson distribution for CP and CML. After calculating the various realizations of the time to failure, multiple regression analysis was used to construct the following equation:

$$ T_f = w_1 \Delta L + w_2 \Delta D + w_3 \Delta \mathrm{CP} + w_4 \Delta \mathrm{CML} + w_5 \Delta N_B + b \tag{33} $$

Results showed that the predictions of time to next failure from the developed ELM models have a MAD of 3, which is 36% of the median value. This is an acceptable uncertainty in model predictions according to Verbeeck et al. [38] and Sattar [23], with values of up to 40% considered acceptable.

Using least-squares linearization to determine parameter sensitivity revealed the relative importance of the various input parameters (Table 6). The most influential parameter on the ELM model prediction is the number of previous pipe breaks. This agrees with Goulter and Kazemi [39], Asnaashari et al. [40–42], and Sattar et al. [1], who found that a history of previous failures increases pipe failure rates over time. Pipe diameter came second to N_B, with more influence on the time to pipe failure than pipe length. The protection methods had less effect on the output uncertainty, and the results also show that the protective effect of CP is more pronounced than that of CML across the different pipe types. These outputs are generally in agreement with Harvey et al. [36] and Sattar et al. [1].

Table 6 Importance of various pipe parameters as predicted by ELM model

3.5 Parametric analysis of developed ELM model

This section presents a parametric analysis of the developed ELM model, which helps characterize the behavior of the model and the influence of the main input parameters (pipe diameter, length, and previous failures) on the predicted time to failure. Figure 8 shows the time to pipeline failure predicted by the ELM model versus pipe diameter and length for the three pipe types: CI, DI, and AC. The predicted time to failure decreases with pipe length for all three pipe types, because longer pipes are exposed to more of the external conditions that can affect their integrity, such as traffic loads [43]; the same finding has been reported by Lei [44], Wang et al. [45], and others [1, 46–48]. The predicted time to failure is higher for cast iron pipes than for ductile iron and asbestos cement pipes. This is attributed to the non-homogeneity of the CI pipe material, the same property that makes this type of pipe prone to failure, unlike DI and AC. Furthermore, the ELM model predictions show that the time to next failure is directly proportional to the pipe diameter. This agrees with Rostum [43], who related longer times to failure to larger pipe diameters, attributing this to the reduced pipe strength and less reliable joints of smaller-diameter pipes.

Fig. 8 Tf versus pipe length and pipe diameter for the developed ELM models

Predictions of the ELM are consistent with the historical trends observed in the city dataset. A significant increase in the time to next failure of a pipeline is predicted with the application of one or both types of pipe protection, for all three pipe types. CP protection is shown to be more effective than CML protection at increasing the time to the next failure: for DI and CI pipes, CP increased the time to next failure by more than 15% relative to CML protection. However, this does not hold for AC pipes, where the application of CP showed the same effect as CML. In all pipe types, the effects of CP and CML protection are additive, leading to an increase in the time to next failure. This behavior is specific to the studied network, since different impacts of CP and CML protection have been reported for other networks under different conditions of soil corrosiveness, temperature, and installation methods, all of which affect the coatings [34].

4 Conclusions

In this study, the extreme learning machine method was applied to more than 9500 pipe failure instances in the city of Scarborough, Canada, to develop a new model that predicts the time to the next watermain failure. The developed ELM model achieved coefficients of determination ranging from 0.67 to 0.82, with a maximum of 50 neurons in the network hidden layer and the triangular basis transfer function. Other variants of the ELM method were also attempted, namely tELM, ELMr, and SaELM; the error results showed the superiority of the ELM model over its variants for this case study. The ELM model has the advantage of including the type of pipe protection and incorporating its influence on the predicted results, in addition to pipe diameter, length, and previous failures. The number of previous pipe breaks was shown to be the most influential input parameter for the ELM predictions, followed by pipe diameter. Moreover, CP pipe protection was found to be more effective in protecting pipes and decreasing their failure rate. The ELM model can be used as a tool to support decisions on optimum pipe inspection and maintenance scheduling, to proactively control the rising maintenance cost of aging infrastructure, and to improve the reliability and safety of an essential public service.