1 Introduction

A road pavement deteriorates under the combined action of traffic loading and the environment, which reduces ride quality (Madanat et al. 1997). Deterioration models should be able to quantify the contribution of the variables that are relevant to pavement deterioration, such as the strength of pavement materials, traffic, and environmental conditions (Ortiz-Garcia et al. 2006). Pavement management systems (PMS) are commonly used to select maintenance strategies that result in lower project life cycle costs (Premkumar and Vavrik Haas 1994).

Modeling the performance of pavements is an important activity in pavement management, and many highway agencies have developed a variety of pavement performance models for use in their pavement management activities (Lethanh et al. 2014). This paper presents a methodology for developing new models for the various pavement families in the Libyan road network in order to predict the condition of a given area of pavement. The predicted future condition of the pavements is used to estimate their remaining service life to failure, which in turn helps to find the best maintenance and rehabilitation interventions for a given area of the network (Li et al. 1997).

There are two basic kinds of performance models: deterministic and probabilistic (Kobayashi et al. 2012). Deterministic models predict a single number for the life of a pavement, its level of distress, or any other measure of its condition. In contrast, probabilistic models predict a distribution of such events. Deterministic models include Mechanistic, Empirical-Mechanistic, Polynomial Constrained Least Squares, and S-Shaped Curve models (Hong et al. 2013). Probabilistic models generally include Bayesian and Markov process models. Bayesian modeling assigns a prior probability distribution to pavement condition based on experience and then combines it with experimentation and data collection to predict the future condition (Hong and Prozzi 2006). Markov models can be used when pavement data form a sequence of conditions in which the probability of each condition depends only on the state attained in the previous one (Prozzi and Madanat 2003; Li 2005).

1.1 Bayesian Model

The principle of Bayesian statistics lies in combining prior probabilities with the likelihood of experimental outcomes to determine a post-experimental, or posterior, probability, as shown in Fig. 1 (Pandis 2015a). The posterior distribution expresses what is known about a set of parameters based on both the sample data and prior knowledge (Han et al. 2014). In frequentist statistics, it is often necessary to rely on large-sample approximations by assuming asymptotic normality. In contrast, Bayesian inferences can be computed exactly, even in highly complex situations (Jongsawat and Premchaiswadi 2010). This paper gives an account of the basic uses of Bayes’ theorem and of the role and construction of prior densities. This is followed by inference, dealing with analogues of confidence intervals, tests, approaches to model criticism, and model uncertainty (Gongdon 2003). Using probability density functions, Bayes’ theorem can be expressed as follows:

Fig. 1. Bayes general concept

$$ P\left( {\theta |X} \right) = \frac{{P\left( {X |\theta } \right)P\left( \theta \right)}}{P\left( X \right)} = \frac{{P\left( {X |\theta } \right)P\left( \theta \right)}}{{\int P\left( {X |\theta } \right)P\left( \theta \right)d\theta }} $$
(1)

1.1.1 Prior Knowledge \( P\left( \theta \right) \)

A fundamental feature of the Bayesian approach to statistics is the use of prior information in addition to the sample data. A proper Bayesian analysis always incorporates prior information, which strengthens inferences about the true value of the parameter and ensures that relevant information about it is not wasted (Nagaraja 2006; Lunn et al. 2000).

1.1.2 Maximum Likelihood Estimation (MLE) \( P\left( {X |\theta } \right) \)

Maximum likelihood estimation (MLE) is one of the most important statistical methodologies for parameter estimation (Clark 2015). It is based on the fundamental assumption that the underlying probability distribution of the observations belongs to a family of distributions indexed by unknown parameters (Schwartz et al. 2013). The maximum likelihood estimator of the unknown parameters maximizes the likelihood function; that is, it selects the distribution in the family that gives the observations the highest chance of occurrence. The MLE method starts from the joint probability distribution of the measured values \( x_{1} ,x_{2} , \ldots ,x_{n} \). For independent measurements, this is given by the product of the individual densities \( p(x_{i} |\theta ) \), as in Eq. 2.

$$ P\left( {X |\theta } \right) = p\left( {x_{1} |\theta } \right)p\left( {x_{2} |\theta } \right) \ldots p\left( {x_{n} |\theta } \right) = \prod\nolimits_{i = 1}^{n} p(x_{i} |\theta ) $$
(2)
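
As a minimal sketch of Eq. 2, assume that each measurement follows a Gaussian density with a known standard deviation. This distribution, the function names, and the sample values are assumptions made here for illustration and are not taken from the study. The joint likelihood is the product of the individual densities, and its logarithm is the sum of the individual log-densities:

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density p(x | theta) of a normal distribution, with theta = mean."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood(xs, mean, sd):
    """Eq. 2: product of the densities of independent measurements."""
    return math.prod(gaussian_pdf(x, mean, sd) for x in xs)

def log_likelihood(xs, mean, sd):
    """Logarithm of Eq. 2, numerically safer for large samples."""
    return sum(math.log(gaussian_pdf(x, mean, sd)) for x in xs)

# Hypothetical roughness measurements (m/km), for illustration only.
measurements = [2.1, 2.4, 1.9, 2.6, 2.2]
sd = 0.3  # assumed known

# For a Gaussian density, the sample mean maximizes the likelihood
# with respect to the mean parameter.
mle_mean = sum(measurements) / len(measurements)
```

Evaluating `log_likelihood` at `mle_mean` and at any other candidate mean shows that the sample mean gives the larger value, which is what "highest chance of occurrence" means in this setting.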

1.1.3 Posterior Distribution \( P(\theta |X) \)

The posterior expresses what is known about a set of parameters based on both the sample data and prior information. Bayes’ theorem works as a mechanism for generating the posterior of any parameter by combining the prior knowledge with the likelihood. The product of the prior and the likelihood is divided by \( P\left( X \right) \), a normalizing factor, so that the resulting distribution integrates to 1. When the posterior distribution \( P\left( {\theta |X} \right) \) is in the same family as the prior probability distribution \( P\left( \theta \right) \), the prior and posterior are called conjugate distributions. Non-conjugate prior distributions can make interpretations of posterior inferences more difficult (Han et al. 2014).
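
The normalization in Eq. 1 can be sketched numerically on a grid of parameter values. The example below is an illustration only: the normal prior, the known noise standard deviation, and the data values are all hypothetical, and the integral \( P\left( X \right) \) is approximated by a simple Riemann sum.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Grid of candidate parameter values theta (here a mean IRI in m/km).
step = 0.01
thetas = [i * step for i in range(601)]  # 0.00 to 6.00

# Hypothetical prior (expert judgment) and hypothetical field data.
prior = [normal_pdf(th, 3.0, 1.0) for th in thetas]
data = [2.1, 2.4, 1.9, 2.6, 2.2]
noise_sd = 0.5  # assumed known

# Likelihood P(X | theta) for independent measurements (Eq. 2).
like = [math.prod(normal_pdf(x, th, noise_sd) for x in data) for th in thetas]

# Numerator of Eq. 1, then the normalizing factor P(X) as a Riemann sum.
unnormalized = [lk * pr for lk, pr in zip(like, prior)]
evidence = sum(unnormalized) * step
posterior = [u / evidence for u in unnormalized]

posterior_mean = sum(th * p for th, p in zip(thetas, posterior)) * step
```

Because a normal prior combined with a normal likelihood of known variance is a conjugate pair, the grid result can be checked against the closed-form posterior mean, which is the precision-weighted average of the prior mean and the sample mean.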

1.2 The International Roughness Index (IRI)

The International Roughness Index (IRI) is an international standard for measuring longitudinal road roughness. The index expresses pavement roughness in the wheel paths in meters per kilometer (m/km). The most common measurement method uses a laser mounted on a specialized van. The laser is trained on the road surface, like a laser pointer. As the van drives along a road, the beam jumps at rough patches, just as the dot of a laser pointer would; these jumps are measured and used for analysis (Mašović and Hajdin 2013). The lower the IRI at a given speed, the smoother the ride felt by the road user. Moreover, this roughness statistic is suitable for any road surface type and covers all levels of roughness (Kobayashi et al. 2012). IRI can be treated as a random variable and can therefore be described by a probability distribution (Shahin 2005). The main advantages of the IRI are that it is stable over time and transferable throughout the world. It can also be used as a measure of pavement condition, the data can be easily shared between researchers, and it can be directly related to vehicle operating costs (Shahin 2005).

2 Methodology

The technique used to estimate road network deterioration is a variant of Bayesian expert-based probability matrices of deterioration, which can be applied to road classes in Libya. The method combines observed data with expert experience using Bayesian linear regression. The Bayesian prediction approach analyzes statistical models by using prior knowledge and observations, as shown in Eq. 1 (Amador-Jimenez and Mrawira 2012). Bayesian linear regression improves the estimation of the parameters of the International Roughness Index (IRI) model because it covers the whole range of inferential solutions, rather than only a point estimate and a confidence interval as in classical regression (Davison 2008). The research methodology consists of three major steps: interviewing experts in order to set up the prior distribution; inspecting road networks to estimate the MLE; and computing the posterior and predictive distributions for the IRI, as can be seen in Fig. 2.

Fig. 2. Research methodology steps

Fig. 3. The differences between linear regression and Bayesian linear regression

2.1 Pavement Families Classification

Libya’s most prominent natural features are the Mediterranean coast, the Sahara Desert and several highlands. As a result, the soil conditions were divided into three categories (low, medium, high). Moreover, the road network is exposed to two climate zones, the hot-summer Mediterranean climate and the hot desert climate; consequently, the network was categorized into north and south zones. Therefore, in this research, 3 loading levels, 3 soil conditions and 2 climate zones interact with each other and result in 18 pavement families (3 × 3 × 2). A Bayesian linear regression prediction model is then developed for each family, as shown in Table 1.

Table 1. Division of the road network into zones (north and south) interacting with traffic loads and soil conditions over a sequence of years.
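
The combination of zones, loading levels and soil conditions into 18 families can be sketched as a simple enumeration. The numbering used below (families 1 to 9 in the north zone, 10 to 18 in the south zone) and the ordering within each zone are assumptions made for illustration; the actual codification is the one given in Table 1.

```python
from itertools import product

zones = ["north", "south"]             # two climate zones
loads = ["low", "medium", "high"]      # three traffic loading levels
soils = ["low", "medium", "high"]      # three soil strength categories

# Assumed numbering: the zone varies slowest, so families 1-9 are north
# and families 10-18 are south. The order within a zone is hypothetical.
families = {
    number: {"zone": zone, "load": load, "soil": soil}
    for number, (zone, load, soil) in enumerate(
        product(zones, loads, soils), start=1
    )
}
```

Each entry of `families` would then be associated with its own data set and its own regression model.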

Since loading and soil conditions are the most important factors that damage most pavement sections, they are often used as independent variables in developing equations that predict condition. In many cases, they are combined with age as an independent variable, because in most circumstances agencies want to know in how many years a pavement will need intervention (Lethanh et al. 2014). Therefore, in some models, loads and soil types are treated as factors that affect the rate of deterioration of a road surface; in these cases both are considered independent variables. Road sections are selected using a stratified random sampling technique to avoid biased estimates.

Soil strength is measured by a penetration test in accordance with the California Bearing Ratio (CBR) test, which evaluates the subgrade strength of roads. Traffic loads are categorized as follows:

  • Low: <50 vehicles/day

  • Medium: 50–500 vehicles/day

  • High: 500–2000 vehicles/day

In general, these models are network-level deterioration models and not project-level deterioration models because the characteristics and the properties of the materials are not presently available in Libya.

2.2 Interviewing Experts (Prior) and Pavement Condition Inspections

The Bayesian statistical approach combines prior knowledge (experience) with field data. In highway engineering, new models are continually needed to better predict pavement performance or to run various PMS; however, gathering data about pavement performance takes much time and expense. In such situations, the Bayesian approach is useful for short-circuiting the data collection cycle. After gathering some data, which may not be sufficient to support a meaningful classical regression, one can collect expert judgment and combine the two sources of information into a relatively robust regression model. The expert judgment serves to bridge the gaps in the field data.

A lot of valuable information can be obtained from people who have observed pavement performance throughout their careers. These professionals and field staff know which variables contribute to pavement performance and understand the functional relations between them. Their impressions of these relationships can be encoded and, when combined with field data, can have a profound impact on the resulting posterior models. For this reason, initial data were collected by interviewing Libyan experts who have worked for many years on the development of the Libyan road network. Six engineers were interviewed using a standardized, open-ended interview technique, which is highly structured and includes a set protocol of questions and probes (Pandis 2015b).

An analysis of variance (ANOVA) was carried out before combining the experts’ knowledge to check that all of the experts’ priors are consistent. Fig. 4 and Table 3 show no significant evidence of a difference in group means. The experts’ opinions were therefore largely compatible; that is, their knowledge about the roughness progression was closely aligned. After that, representative road sections from each of the 18 families were inspected in Libya. The road deterioration was measured by the IRI on a subjective basis. Table 1 provides a summary of the characteristics of the 18 families and the codification of the 18 associated databases. A sample of the experts’ interview responses for pavement families 1 and 10 is shown in Table 2.
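
A one-way ANOVA of this kind can be sketched as follows. The expert values below are hypothetical placeholders, not the responses collected in the study, and the function name is an assumption; the sketch only shows how the F statistic for a difference in group means is formed from the between-group and within-group sums of squares.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical IRI estimates (m/km) at ages 0, 5, 10, 15 and 20 years
# from six experts; these are NOT the values collected in the study.
experts = [
    [1.5, 3.0, 5.0, 7.5, 10.0],
    [1.4, 3.2, 5.1, 7.2, 10.3],
    [1.6, 2.9, 4.8, 7.6, 9.8],
    [1.5, 3.1, 5.2, 7.4, 10.1],
    [1.3, 3.0, 4.9, 7.7, 10.2],
    [1.7, 3.3, 5.0, 7.3, 9.9],
]
f_stat, df_b, df_w = one_way_anova_f(experts)
```

With 5 and 24 degrees of freedom, the 5% critical value from standard F tables is about 2.62, so an F statistic below that value gives no significant evidence of a difference between the experts' group means.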

Fig. 4. ANOVA comparison of experts’ opinions

Table 2. Two pavement families from north and south zones

2.3 IRI Estimations

Roads deteriorate and their IRI rises gradually over time. This relationship can be represented using linear regression but, in practice, road sections with the same zone, age, load, and soil strength conditions can still have different rates of deterioration. Therefore, Bayesian linear regression, in which the basic Bayesian philosophy is applied, is the appropriate technique, because it is a probabilistic approach that accounts for variability (refer to Fig. 3). In Bayesian inference, the MLE is regarded as a point estimate only. In Bayesian linear regression, by contrast, the predictive probability around each inferred IRI value is investigated probabilistically (Amador-Jimenez and Mrawira 2012).

The research data required for the IRI estimations were divided into two main categories. The first category was extracted from the interviews with the experts. The second category is the MLE data, which were collected as IRI measurements during the road section inspections in order to measure road deterioration. The MLE data are extracted and summarized as pairs \( \left( {t_{i} ,IRI_{i} } \right) \), where \( IRI_{i} \) represents the road section condition and \( t_{i} \) indicates the time.

$$ Data = \left( {\left( {t_{1} ,IRI_{1} } \right), \ldots ,\left( {t_{n} ,IRI_{n} } \right)} \right) , \quad 0 \le t_{i} \le 20 , \quad 0 \le IRI_{i} \le 14 $$
(3)

Therefore, each IRI observation is modeled as conditionally independent given the vector w, which is assigned a prior distribution.

$$ IRI_{i} \,{ \sim }\,N\left( {w^{T} t_{i} , a^{ - 1} } \right), \quad a > 0 $$
(4)
$$ w\,{ \sim }\,N\left( {0,b^{ - 1} I} \right), \quad b > 0 , \quad w = \left( {w_{1} , \ldots ,w_{d} } \right) $$
(5)

Here \( a = \frac{1}{{\sigma^{2} }} \) is the precision of the observation noise and b is the precision of the prior, so that \( b^{ - 1} I \) is the prior covariance matrix; a and b are known, and w is a parameter vector with a multivariate Gaussian density.

2.3.1 The Posterior Distribution

The next step is to compute the posterior distribution of w given the data. To model nonlinearities in \( t_{i} \), each \( t_{i} \) is written as a feature vector \( \varphi \left( {t_{i} } \right) = \left( {\varphi_{1} \left( {t_{i} } \right), \ldots ,\varphi_{d} \left( {t_{i} } \right)} \right) \). Computing the posterior requires the likelihood, which is also the basis of the MLE; the predictive distribution then follows from the posterior.

2.3.2 Maximum Likelihood Estimation (MLE)

Given data:

$$ D = \left( {IRI_{1} , \ldots ,IRI_{n} } \right), \quad IRI_{i} \in \left( {0,14} \right) $$
(6)

D represents a sample from the IRI statistical population, collected from the road section inspections. The likelihood that underlies the MLE is then given by:

$$ P(D|w) \propto { \exp }\left( { - \frac{a}{2}\left( {IRI - Aw} \right)^{T} \left( {IRI - Aw} \right)} \right) $$
(7)

Here A is the design matrix and IRI is the response, that is, the quantity to be predicted, in column vector form.

$$ A = \left( {\begin{array}{*{20}c} - & {t_{1}^{T} } & - \\ \vdots & \vdots & \vdots \\ - & {t_{n}^{T} } & - \\ \end{array} } \right) , \quad IRI = \left( {IRI_{1} , \ldots ,IRI_{n} } \right)^{T} $$
(8)

2.3.3 Posterior

From the classical Bayesian definition, the posterior is proportional to the product of the likelihood and the prior:

$$ P\left( {w |D} \right) \propto P\left( {D |w} \right)P\left( w \right) $$
(9)

Substituting the likelihood (Eq. 7) and the prior (Eq. 5) into this expression gives:

$$ P(w|D) \propto { \exp }\left( { - \frac{a}{2}\left( {IRI - Aw} \right)^{T} \left( {IRI - Aw} \right) - \frac{b}{2}w^{T} w} \right) $$
(10)

Completing the square in w shows that the posterior is a Gaussian distribution with mean \( \mu \) and precision matrix \( \Lambda \):

$$ P\left( {w |D} \right) = N(w|\mu ,\Lambda^{ - 1} ), \quad \text{where} \quad \mu = a\Lambda^{ - 1} A^{T} IRI, \quad \Lambda = aA^{T} A + bI $$
(11)

This gives the maximum a posteriori (MAP) and MLE estimates of w, which are:

$$ w_{MAP} = \left( {A^{T} A + \frac{b}{a}I} \right)^{ - 1} A^{T} IRI $$
(12)
$$ w_{MLE} = \left( {A^{T} A} \right)^{ - 1} A^{T} IRI $$
(13)
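
The two estimates can be compared on a small example. The sketch below assumes a two-parameter model with design matrix rows \( (1, t_{i}) \), so that w holds an intercept and a slope; the data values and the precisions a and b are hypothetical, and the 2 × 2 system is solved by hand to keep the example free of external libraries.

```python
def fit_weights(times, iri, ridge=0.0):
    """Solve (A^T A + ridge * I) w = A^T iri for rows A_i = (1, t_i).

    ridge = b / a gives w_MAP (Eq. 12); ridge = 0 gives w_MLE (Eq. 13).
    """
    n = len(times)
    # Entries of the symmetric 2x2 matrix A^T A + ridge * I.
    s00 = n + ridge
    s01 = sum(times)
    s11 = sum(t * t for t in times) + ridge
    # Entries of the vector A^T iri.
    r0 = sum(iri)
    r1 = sum(t * y for t, y in zip(times, iri))
    # Closed-form inverse of a 2x2 matrix applied to (r0, r1).
    det = s00 * s11 - s01 * s01
    w0 = (s11 * r0 - s01 * r1) / det
    w1 = (s00 * r1 - s01 * r0) / det
    return w0, w1

# Hypothetical inspection data: age in years and IRI in m/km.
times = [0.0, 5.0, 10.0, 15.0, 20.0]
iri = [1.5, 3.4, 5.6, 7.4, 9.6]

a, b = 4.0, 1.0  # assumed noise precision and prior precision
w_mle = fit_weights(times, iri)
w_map = fit_weights(times, iri, ridge=b / a)
```

The only difference between the two estimates is the term \( \frac{b}{a}I \), which pulls the MAP weights toward the zero prior mean; as the data become more informative relative to the prior, the two estimates approach each other.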

2.3.4 Predictive Distribution

The predictive distribution is the conditional distribution of unobserved values (predictions) given the collected data (Hong and Prozzi 2006). Here, the unobserved values correspond to the expert interview data, and the collected data are those from the road condition inspections. The predictive distribution can be expressed as follows:

$$ P\left( {iri |t,D} \right) = \int P\left( {iri |t,w} \right)P\left( {w |t,D} \right)dw $$
(14)
$$ = \int N\left( {iri |w^{T} t,a^{ - 1} } \right)N\left( {w |\mu ,\Lambda^{ - 1} } \right)dw $$
(15)
$$ \propto \int \exp \left( { - \frac{a}{2}\left( {iri - w^{T} t} \right)^{2} } \right)\exp \left( { - \frac{1}{2}\left( {w - \mu } \right)^{T} \Lambda \left( {w - \mu } \right)} \right)dw $$
(16)

This expression is then rearranged into a quadratic form in w, giving the general form \( \int N\left( {w |\ldots } \right)g\left( {iri} \right)dw \). Since \( g\left( {iri} \right) \) does not depend on w, it comes out of the integral, and \( \int N\left( {w |\ldots } \right)dw \) integrates to 1.

After several algebraic steps, the predictive distribution takes its final form:

$$ P\left( {iri |t,D} \right) = N\left( {iri |u,\frac{1}{\lambda }} \right), \quad \text{where} \quad u = \mu^{T} t \quad \text{and} \quad \frac{1}{\lambda } = \frac{1}{a} + t^{T} \Lambda^{ - 1} t $$
(17)
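
As an illustration of the predictive distribution, the sketch below computes the posterior mean \( \mu \), the posterior covariance \( \Lambda^{ - 1} \), and then the predictive mean u and variance \( 1/\lambda \) for a new time. It assumes the same two-parameter setting with design rows \( (1, t_{i}) \); the data and the values of a and b are hypothetical.

```python
import math

def posterior(times, iri, a, b):
    """Posterior mean mu and covariance Lambda^{-1} of w for rows (1, t_i)."""
    n = len(times)
    # Entries of Lambda = a * A^T A + b * I.
    l00 = a * n + b
    l01 = a * sum(times)
    l11 = a * sum(t * t for t in times) + b
    det = l00 * l11 - l01 * l01
    cov = [[l11 / det, -l01 / det], [-l01 / det, l00 / det]]
    # mu = a * Lambda^{-1} A^T iri.
    r0 = a * sum(iri)
    r1 = a * sum(t * y for t, y in zip(times, iri))
    mu = [cov[0][0] * r0 + cov[0][1] * r1, cov[1][0] * r0 + cov[1][1] * r1]
    return mu, cov

def predict(t_new, mu, cov, a):
    """Predictive mean u and variance 1/lambda at time t_new."""
    x = [1.0, t_new]
    u = mu[0] * x[0] + mu[1] * x[1]
    quad = sum(x[i] * cov[i][j] * x[j] for i in range(2) for j in range(2))
    return u, 1.0 / a + quad

# Hypothetical inspection data: age in years and IRI in m/km.
times = [0.0, 5.0, 10.0, 15.0, 20.0]
iri = [1.5, 3.4, 5.6, 7.4, 9.6]
a, b = 4.0, 1.0  # assumed noise precision and prior precision

mu, cov = posterior(times, iri, a, b)
mean_25, var_25 = predict(25.0, mu, cov, a)
mean_10, var_10 = predict(10.0, mu, cov, a)
# Approximate 95% predictive interval at t = 25 years.
interval_25 = (mean_25 - 1.96 * math.sqrt(var_25),
               mean_25 + 1.96 * math.sqrt(var_25))
```

The predictive variance is never smaller than the noise variance \( 1/a \), and the term \( t^{T} \Lambda^{ - 1} t \) makes it grow for times far from the observed data, which is why the interval at 25 years is wider than the one at 10 years in this example.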

Finally, using the mathematical expectation of the predictive distribution in Eq. (17) for all road section families, the IRI is estimated, where:

  • iri is the expert interview data,

  • t is the time corresponding with road conditions,

  • D is the data collected from road inspections.

3 Case Study (First Pavement Family)

To illustrate the effectiveness of this model, two data sources are organized in a database file so that they can be easily imported when the model is run. The first data source is the data extracted from interviewing Libyan experts. The interview data were used to formulate the prior probability distribution, and some results of these interviews are shown in Table 2. The second data source is the inspection data (IRI) from the selected road sections. Table 2 shows two pavement families, one chosen from each geographic zone (north and south). In this study, the model is applied only to the first pavement family, shown in Table 1.

3.1 Bayesian Regression Analysis

This section describes the steps required to apply Bayesian regression to the collected data. The model was run for 1000 iterations using a combined prior, which encodes all of the expert knowledge in a single model for each pavement family. If each expert prior had been analyzed separately, six separate posterior models would have had to be developed for each pavement family.

Table 3. ANOVA results to investigate the differences between the means of experts’ opinions

WinBUGS was chosen as the programming platform; it is free software available from the Biostatistics Unit of the Medical Research Council in the UK (Medical Research Council 2016). A WinBUGS program consists of three parts, which can be placed in a single file or in three separate files. The first part is the main program, a string of code that tells WinBUGS what the prior and the likelihood of the model are. The second part is the data set, which can be entered as matrices in the same program or read from a file. The last part is the set of initial values used to start the algorithm. To estimate the parameters in a Bayesian analysis, the prior distribution is multiplied by the likelihood; samples are then drawn from the posterior distributions via an iterative Markov Chain Monte Carlo (MCMC) algorithm (Davison 2008).
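
To make the idea of MCMC sampling concrete, the sketch below implements a plain random-walk Metropolis sampler in Python for the two-parameter model. This is a stand-in written for illustration: it is not WinBUGS code, and it does not claim to reproduce the samplers that WinBUGS uses internally. The data, the precisions a and b, the step sizes, the seed, and the number of iterations are all assumptions. The same three ingredients appear as in a WinBUGS program: a model (the log posterior), a data set, and initial values.

```python
import math
import random

def log_posterior(alpha, beta, times, iri, a, b):
    """Log of prior times likelihood, up to a constant, for IRI = alpha + beta * t."""
    sse = sum((y - (alpha + beta * t)) ** 2 for t, y in zip(times, iri))
    return -0.5 * a * sse - 0.5 * b * (alpha * alpha + beta * beta)

def metropolis(times, iri, a, b, n_iter, steps=(0.3, 0.03), seed=0):
    """Random-walk Metropolis, updating alpha and beta one at a time."""
    rng = random.Random(seed)
    current = [0.0, 0.0]  # initial values for (alpha, beta)
    current_lp = log_posterior(current[0], current[1], times, iri, a, b)
    chain = []
    for _ in range(n_iter):
        for k in range(2):
            proposal = list(current)
            proposal[k] += rng.gauss(0.0, steps[k])
            lp = log_posterior(proposal[0], proposal[1], times, iri, a, b)
            # Accept with probability min(1, posterior ratio).
            if lp >= current_lp or rng.random() < math.exp(lp - current_lp):
                current, current_lp = proposal, lp
        chain.append((current[0], current[1]))
    return chain

# Hypothetical inspection data: age in years and IRI in m/km.
times = [0.0, 5.0, 10.0, 15.0, 20.0]
iri = [1.5, 3.4, 5.6, 7.4, 9.6]

chain = metropolis(times, iri, a=4.0, b=1.0, n_iter=20000)
kept = chain[2000:]  # discard the first draws as burn-in
alpha_mean = sum(s[0] for s in kept) / len(kept)
beta_mean = sum(s[1] for s in kept) / len(kept)
```

For this Gaussian model the posterior mean is also available in closed form, so the sample means of the chain can be compared with it as a basic check that the sampler has converged.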

3.2 Model Results (First Pavement Family)

The model combines data taken from the road condition inspections, carried out in accordance with a pre-established protocol, with the prior knowledge of the six experts who participated in the interviews. Once the model, the data, and the initial values have been specified, the program is ready to be compiled and to run the MCMC algorithm. WinBUGS offers a Sample Monitor Tool panel, which consists of a number of task buttons, as shown in Fig. 5. One of these is the stats tool, which gives an overview of the posterior summary for the Bayesian linear regression parameters, as illustrated in Table 4.

Fig. 5. Sample Monitor Tool

Table 4. Posterior summary for the Bayes linear regression parameters

As a result of the Bayesian analysis, IRI predictive posterior models for the first pavement family were developed. The models have one independent variable, and the predictive posterior equation is given in Eq. (18), where α and β are the estimated parameters and \( t_{i} \) represents time in years.

$$ IRI = \alpha + \beta t_{i} $$
(18)

Figure 6 shows the MCMC behaviour, where the chain appears to be moving around readily. This display is called the dynamic trace because it is continuously refreshed in real time as the model is updated. The probability densities of the parameters are displayed and summarized in Fig. 7. The basic analysis of the MCMC output consists of checking the convergence of the chain (history) and the autocorrelation (auto cor), as shown in Figs. 10 and 8 respectively. Figure 9 illustrates the running mean and the 95% credible interval; all parameters appear stable over the course of the run. The posterior parameter estimates and their 95% credible intervals are summarized in Table 5. Table 6 shows a comparison between the actual IRI values and the IRI estimates predicted by the model.
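
The diagnostics described above can be approximated outside WinBUGS with a few lines of code. The functions below are a sketch under simple assumptions: the autocorrelation is the usual sample estimate, the credible interval is an equal-tailed interval taken from the sorted draws, and the chain of slope values is a hypothetical placeholder rather than output from the model in this paper.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain of draws at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum(
        (chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)
    )
    return cov / var

def credible_interval(chain, level=0.95):
    """Equal-tailed interval read from the sorted draws."""
    draws = sorted(chain)
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (len(draws) - 1))]
    upper = draws[int((1.0 - tail) * (len(draws) - 1))]
    return lower, upper

def running_mean(chain):
    """Mean of the first k draws for k = 1..n, as plotted over a run."""
    total, out = 0.0, []
    for k, x in enumerate(chain, start=1):
        total += x
        out.append(total / k)
    return out

# Hypothetical slope draws; a real chain would come from the sampler.
beta_chain = [0.40 + 0.01 * ((7 * i) % 11 - 5) for i in range(1100)]
rho_1 = autocorrelation(beta_chain, 1)
low, high = credible_interval(beta_chain)
trend = running_mean(beta_chain)
```

An autocorrelation that drops quickly toward zero with increasing lag and a running mean that flattens out are the numerical counterparts of the visual checks in Figs. 8 and 9.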

Fig. 6. Dynamic trace for the parameter outputs

Fig. 7. Posterior densities for the model parameters

Fig. 8. Parameter autocorrelation functions

Fig. 9. Parameter running quantiles

Fig. 10. Parameter trace history output

Table 5. Model result summary
Table 6. A comparison between actual IRI values and the model outputs

4 Conclusions

This paper has demonstrated how Bayesian linear regression modeling provides a more reliable framework for prediction when historical data are not available. The linear regression model is formulated within the Bayesian inference approach in order to investigate the parameter estimation errors probabilistically.

The methodology consists of three major steps: interviewing experts to establish the prior distribution of the model; measuring the current road roughness using the IRI on a selected road sample; and producing the posterior distribution followed by the predictive distribution. The result is a Bayesian linear regression with two parameters \( \left( {\alpha ,\beta } \right) \), where the model is expressed as \( IRI = \alpha + \beta t_{i} \). Moreover, the parameter estimates are accompanied by credible intervals, which increases the reliability of the interval estimates derived from the posterior probability distribution, as shown in Table 5.

This technique is recommended for developing a model to estimate pavement performance in the absence of historical data. Moreover, the method is not exclusive to the Libyan road network; it is applicable to any road network in similar circumstances, especially in developing countries. The main disadvantage of the method is that, because it was developed for cases with no historical data, it has no mechanism for incorporating such data. The researchers therefore had to rely on interviews with experts instead, which was difficult and required careful consideration of the type of questions needed to extract the data for this research. Additionally, because of the lack of archived data, the road network was divided into 18 families based on geographical location, traffic load and soil strength; this means that 18 models had to be developed, which was at times unwieldy and required a lot of extra work.