Introduction

Land transport is the most important mode of transportation because of the area it covers. At the same time, managing the road network is becoming progressively more challenging as demand increases while resources remain limited. Emerging countries like India have highly heterogeneous traffic comprising vehicles with diverse operational characteristics, which frequently leads to chaotic traffic growth and congestion. To reduce these kinds of problems and to support appropriate traffic management, the Highway Capacity Manual (HCM) and several other handbooks have suggested level-of-service (LOS) analysis procedures to evaluate road and traffic conditions, identify needs and allocate funds for future implementation. These guidelines help in the quantitative estimation of service quality using measures of effectiveness such as speed, delay, number of stops per mile and presence of a left-turn lane to assess the performance of transportation infrastructure and to make investment decisions [5, 11]. However, the LOS criteria in the current versions of those guidelines are based on perception surveys in which only the overall satisfaction (OS) of road users with the provided road facilities is taken into consideration. Road users’ opinions about the constituent aspects of the transportation facility (such as pavement condition, geometric features, signs and markings, cleanliness and aesthetics) are neglected. New LOS criteria based on user perceptions of the individual contributing aspects of a transportation facility would be more credible than criteria based on OS alone for any mode of transport. Moreover, the majority of the general population uses both private and public modes of transport as needed. Design and construction of infrastructure to support one mode may adversely affect the operational performance of other modes.
Researchers from developed countries like the USA have contributed significantly to the development of methodologies for assessing LOS using perception data, but researchers in India have contributed very little to such research. This research takes into consideration the qualitative measures of road transport as well as the inconsistency and complexity of human perception from a multimodal perspective. Hence, the objective of this study is to derive a suitable method that shows the combined effect of several attributes of a transportation facility (urban street) in order to estimate road users’ satisfaction level from a multimodal perspective in developing countries.

To evaluate road users’ satisfaction level, this study was carried out in several parts. The first part consisted of investigating the prominent factors of the transportation system that affect the satisfaction level of road users on urban street segments. Then an innovative questionnaire with two main sections was developed. The first section collected demographic information about the participants, so that the diversity in respondents’ opinions according to age group, gender, educational level, income level, etc. could be confirmed. The second section comprised 33 questions related to the investigated factors affecting road users’ comfort level. The participants were requested to indicate their degree of satisfaction with the different attributes, and their OS for the respective segment, on a scale from 1 to 7. Such surveys either intercept travellers mid-trip and interview them orally, or give them a questionnaire to rate the attributes at a convenient time after finishing their trip.

Review of Literature

The literature review focuses on previous studies addressing the assessment of factors affecting users’ opinions about the service quality provided by urban road infrastructure, questionnaire design, and model development. Ibrahim examined the attitudes of car owners and non-car owners towards various modes of transport for shopping purposes [6]. Both subjective and quantitative parameters were considered in this assessment. The subjective investigation found that shoppers’ judgements of the various modes of transport for shopping are influenced by travel-related aspects and the financial condition of the person. In that survey, shoppers were requested to rate various modes of travel for shopping, taking a few variables into account. The study concluded that every mode has its own particular attributes. Lee et al. presented a LOS standard for signalized crosswalks in business/market areas with bi-directional pedestrian flows [8]. A set of five photographs was shown to each participant for a particular flow ratio. The information gathered from that survey was used to determine the congestion boundaries for various bi-directional flows. The study showed that the comfort level of pedestrians is adversely affected by bi-directional flows. Araujo and Braga assessed the crossing patterns of pedestrians at different road junctions [1]. Specialists took part in selecting the performance measures, i.e. comfort, safety and particular attributes of system continuity. Pedestrians were requested to rate the comfort level according to their perception. Paired comparison and constant sum methods were used to evaluate the participants’ perceptions. Rahaman et al. considered pedestrians’ and shopkeepers’ perceptions in judging the walking environment in a medium-sized city center in Portugal [14]. The analysis was carried out by applying the Analytical Hierarchy Process. To address the needs of pedestrians and shopkeepers, the survey investigated five criteria, i.e. identity, connectivity, hindrances, illegal inhabitance and safety. The study revealed that both shopkeepers and pedestrians were using the sidewalk according to their own needs.

Petritsch et al. presented a pedestrian LOS model for urban arterials with sidewalks using the stepwise regression technique [12]. The density of conflict points along the facility and the traffic flow on the adjacent roadways were considered the primary factors in this model. Around 500 participants were requested to rate the facility against requirements from a pedestrian point of view. Papadimitriou et al. analysed highway LOS with respect to drivers’ individual characteristics and various traffic conditions [10]. Drivers’ characteristics included age, gender, driving knowledge and road familiarity, whereas traffic conditions included capacity of vehicles and v/c ratio. A perception survey was carried out in which 264 participants rated traffic conditions on a 10-point scale. A piecewise linear regression technique was used to develop a relationship between perceived LOS and traffic conditions. Joewono and Kubota presented a survey aimed at improving the quality of the prevailing paratransit system [7]. The authors gathered perceived ratings from around 980 users with respect to level of satisfaction, service quality and loyalty when using the paratransit network. Eight factors were extracted from 35 attributes using factor analysis. Musicant concentrated on measuring the abnormal behaviour, safety attitudes and safety climate perceptions of company car drivers [9]. Car drivers’ perceptions were gathered through a 34-item perception survey, and six factors were extracted from the gathered information using factor analysis. The k-means clustering method was then applied to group the output into three classes. The outcomes indicate that the characteristics of the distinct subclasses of car drivers are helpful in identifying suitable safety countermeasures. Freeman et al. assessed the reactions and behaviour of 4792 expert drivers in an Australian fleet using the Manchester Driver Behaviour Questionnaire [4]. The study concluded that the number of kilometres travelled by the drivers provides an indication of the probability of a crash. Popuri et al. focused on commuters’ choice of public transportation for travel to their workplace, using an attitudinal survey of 23 statements measuring their daily travel demand [13]. Six factors were extracted by factor analysis of the 23 statements. A binary logistic regression method was applied to model the choice between private and public modes of transport for the work trip. The above qualitative models were developed mainly for homogeneous traffic flow conditions in developed countries.

Bhuyan and Rao applied the hierarchical agglomerative clustering method to average travel speeds to define threshold values for six LOS categories (A–F) under mixed traffic flow conditions [2]. However, the actual needs of drivers were not represented when defining LOS under mixed traffic flow conditions. To address these shortcomings, a suitable LOS model is proposed in this study using the stepwise multivariable regression technique to evaluate the service quality provided by the transportation infrastructure from the road users’ perspective.

Study Location and Data Collection

To develop an appropriate standard suited to heterogeneous traffic conditions, users’ responses were gathered from three cities in India. The information gathered covers different types of road conditions and drivers of light and heavy vehicles (Fig. 1). Responses from road users, regardless of age and gender, were gathered from Rourkela, Visakhapatnam and Thiruvananthapuram, in the states of Odisha, Andhra Pradesh and Kerala respectively.

Fig. 1 (a) Map showing the three data collection cities in India. (b, c, d) Study sites at different locations in Rourkela, Visakhapatnam and Thiruvananthapuram respectively

Demographic Analysis

A pilot survey conducted in this study showed that various quality of service (QOS) factors affect road users’ satisfaction levels on urban street segments. Based on the experience gained from the pilot survey, an innovative questionnaire containing 33 questions on QOS factors was prepared, as shown in Table 1. Road users’ perception data were collected through traveller intercept surveys. The strengths of this type of survey are a better picture of the wider driving population, a large sample size and cost-effectiveness relative to the sample size. Study locations were chosen in residential as well as commercial zones of the urban communities. The survey recorded personal information about each participant, such as sex, age and driving experience. Around 450 participants were interviewed and requested to rate the different QOS attributes on a rating scale ranging from 1 = Strongly agree to 7 = Strongly disagree. Finally, the OS of each road user for the particular street segment was also recorded on the same rating scale.

Table 1 Synopsis of quality of service attributes

Demographic Analysis

In this study, responses were gathered from drivers representing a good cross-section of sex, age and driving experience. Table 2 shows the demographic analysis of the road users who took part in this survey. Around 450 responses were gathered from the three cities, and each city contributed at least 30% of the total data. Participants in the perception survey were selected randomly after confirming their familiarity with the selected road conditions, i.e. that they had travelled on the street segments previously. Approximately 42, 40 and 18% of the participants were motorbike users, car users and commercial vehicle drivers respectively. The age and gender distribution, as well as the distribution of participants by driving experience, are also shown in Table 2. Drivers under 18 years of age were excluded from the survey because they lack sufficient experience to give a proper judgement.

Table 2 Demographic information of participants

Study Methodology

The survey questionnaire uses 33 statements to capture information regarding different features of the transportation infrastructure. However, there are two reasons for not taking all the responses as input variables for the decision model. Firstly, there may be high correlation among the individual statements. Secondly, using all these variables is not suitable from the viewpoint of model parsimony. The information collected from the 33 statements was therefore compressed into an uncorrelated set of variables by applying factor analysis.

Factor Analysis

Factor analysis is applied to compress a large data set into a smaller set of elements. This analysis is used for (1) understanding the structure of a set of variables; (2) constructing a questionnaire that measures an underlying variable; (3) reducing the data set to a more manageable size while retaining as much of the original information as possible.

Factor analysis assumes that the ratings of the variables are generated by some unobserved underlying factors. The basic formula of factor analysis is given by Eq. (1) as follows:

$${X_{ji}}=\sum\limits_{k=1}^{m} {\lambda _{jk}}{F_{ki}} +{\varepsilon _{ji}},\quad \forall j=1,2,\ldots,J,\quad \forall i=1,2,\ldots,N$$
(1)

where \(X_{ji}\) is the score of statement j for participant i; \(F_{ki}\) is the kth factor of participant i; \(\lambda_{jk}\) (also known as the loading) indicates the relation of the jth variable to the kth common factor; and \(\varepsilon_{ji}\) is the associated error. Eq. (1) assumes J statements, N observations and m factors in the model. It should be recalled that the factor scores (\(F_{ki}\)) are not observed. This analysis estimates both the factor scores and the respective loadings so as to make the best use of the information retained from the original statements.

The Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s Test of Sphericity are the main checks in factor analysis. The KMO statistic is used to quantify sampling adequacy for the variables. KMO values > 0.8 are regarded as good, i.e. factor analysis is suitable for the variables. Bartlett’s Test of Sphericity examines the significance of the correlations among the variables, and hence the validity and suitability of the collected responses for addressing the problem. A significance value of Bartlett’s Test of Sphericity < 0.05 is recommended as suitable for factor analysis [3].
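For reference, the two statistics are commonly written as below. This is a sketch of the standard textbook forms rather than a formula taken from this study; the symbols \(r_{ij}\) (correlation between statements i and j), \(p_{ij}\) (their partial correlation given all other statements) and \(|R|\) (determinant of the correlation matrix) are introduced here as assumptions, while J and N follow Eq. (1).

$$\mathrm{KMO}=\frac{\sum\limits_{i \ne j} r_{ij}^{2}}{\sum\limits_{i \ne j} r_{ij}^{2}+\sum\limits_{i \ne j} p_{ij}^{2}}$$

$$\chi^{2}=-\left( N-1-\frac{2J+5}{6} \right)\ln \left| R \right|,\qquad df=\frac{J(J-1)}{2}$$

With J = 33 statements, the Bartlett statistic would be compared against a chi-square distribution with 33 × 32 / 2 = 528 degrees of freedom.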

Another important aspect of this study is the rotated component matrix, which helps decide the total number of factors to be analysed when a variable is linked to more than one factor. Rotation maximizes high item loadings and minimizes low item loadings to produce a simplified solution. In this study the orthogonal varimax rotation technique is used, which produces an uncorrelated factor structure. Reliability analysis (expressed by Cronbach’s alpha) is used to measure the consistency of the questionnaire.
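The reliability measure mentioned above can be illustrated with a short sketch. It computes Cronbach's alpha from its usual definition, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the sample ratings are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a block of items.

    ratings: list of rows, one row per respondent, each row holding
    that respondent's scores on the items of ONE factor.
    """
    n_items = len(ratings[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Transpose respondent rows into one column of scores per item.
    item_columns = list(zip(*ratings))
    item_variance_sum = sum(variance(col) for col in item_columns)
    # Variance of each respondent's total score over the items.
    total_variance = variance([sum(row) for row in ratings])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 1-7 ratings from five respondents on three items.
sample = [
    [2, 3, 2],
    [5, 5, 6],
    [4, 4, 3],
    [6, 7, 6],
    [1, 2, 2],
]
print(round(cronbach_alpha(sample), 3))
```

Alpha is computed separately for the items grouped under each factor, which is how a per-factor threshold such as 0.8 would be applied.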

Multiple Linear Regression Technique

The multiple linear regression technique is commonly used to explain the relationship between one continuous dependent variable and two or more independent variables. The principal factors extracted by factor analysis were taken as independent variables in the model development process. For each individual, the scores of the QOS attributes under every factor were added together and their mean was taken as the value of that QOS factor. The OS score of each participant was taken as the output variable. The model was established using the multiple regression technique, which captures the association between two or more independent variables and a dependent variable by fitting a linear equation. Each independent variable has a specific coefficient (\(b_n\)). The output is obtained by summing the variables multiplied by their individual coefficients, together with the residual term.

Mathematically,

$${Y_i} = {b_0} + {b_1}{X_{1i}} + {b_2}{X_{2i}} + \cdots + {b_n}{X_{ni}} + {e_i},\quad i = 1,2,\ldots,N$$
(2)

where \(Y_i\) is the outcome variable (i.e. OS) for participant i, \(b_0\) is the constant, \(b_1\) is the coefficient of the first predictor (\(X_1\)), \(b_n\) is the coefficient of the nth predictor (\(X_n\)), N is the number of participants and \(e_i\) is the residual, i.e. the difference between the observed and the predicted value.
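As an illustration of how the coefficients in Eq. (2) can be estimated, the following minimal sketch fits an ordinary least-squares model by solving the normal equations with Gauss-Jordan elimination. It is not the stepwise procedure or the statistical package used in the study; the function names and the small data set are hypothetical.

```python
def fit_linear_regression(X, y):
    """Return [b0, b1, ..., bn] minimising the sum of squared residuals.

    X: list of rows, one per participant, each holding n predictor values.
    y: list of observed outcomes (e.g. OS scores), same length as X.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column
    p = len(rows[0])
    # Build the augmented normal equations [A^T A | A^T y].
    aug = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def r_squared(X, y, coeffs):
    """Share of the variance in y explained by the fitted model."""
    preds = [coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical factor scores (two predictors) and OS ratings.
X_demo = [[2.0, 3.0], [4.0, 3.5], [5.0, 6.0], [6.5, 5.0], [3.0, 2.0], [7.0, 6.5]]
y_demo = [2.0, 4.0, 5.0, 6.0, 3.0, 7.0]
b_demo = fit_linear_regression(X_demo, y_demo)
print([round(b, 3) for b in b_demo], round(r_squared(X_demo, y_demo, b_demo), 3))
```

Solving the normal equations directly keeps the sketch dependency-free; for eight predictors and several hundred respondents a numerical library would normally be preferred.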

K-Means Clustering

The output of the proposed model, i.e. the OS scores, is categorized into six LOS groups (A–F) using the k-means clustering technique. This is a simple algorithm for solving classification problems. K-means clustering groups the data around K points that represent the cluster centres. The algorithm assigns each data point, from a set of N points, to one of c clusters so as to minimize the within-cluster sum of squares, provided that the number of clusters satisfies 1 < c < N.

$$D_{{ik}}^{2}={\left( {{x_k} - {v_i}} \right)^T}\left( {{x_k} - {v_i}} \right),\;1 \leqslant i \leqslant c,\;1 \leqslant k \leqslant N.$$
(3)

where \(D_{ik}^{2}\) is the squared distance from data point \(x_k\) to cluster centre \(v_i\) (an entry of the distance matrix from data points to cluster centres), \(x_k\) is the kth data point, and \(v_i\) is the centre of cluster i (the mean of the data points in cluster i).

$$v_{i}^{(l)}=\frac{\sum\nolimits_{j=1}^{N_i} x_j}{N_i}$$
(4)
$$\max \left| {v^{(l)}} - {v^{(l - 1)}} \right| \ne 0$$
(5)

where \(N_i\) is the number of data points in cluster i, \(x_j\) is the jth data point assigned to cluster i, \(1 \leqslant i \leqslant c\), and l is the iteration number. The iterations continue while condition (5) holds, i.e. until the cluster centres no longer change between successive iterations.
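The iteration described by Eqs. (3) to (5) can be sketched for one-dimensional scores as follows. This is a minimal illustration, not the implementation used in the study: the function name `kmeans_1d`, the deterministic choice of initial centres and the sample scores are all assumptions.

```python
def kmeans_1d(values, c, max_iter=100):
    """Cluster one-dimensional scores into c groups.

    Returns (centres, clusters). Initial centres are picked at evenly
    spaced positions in the sorted data, which is an assumption made
    here to keep the sketch deterministic.
    """
    pts = sorted(values)
    if not 1 < c < len(pts):
        raise ValueError("need 1 < c < N")
    centres = [pts[(2 * i + 1) * len(pts) // (2 * c)] for i in range(c)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: nearest centre by squared distance, Eq. (3).
        clusters = [[] for _ in range(c)]
        for x in pts:
            nearest = min(range(c), key=lambda k: (x - centres[k]) ** 2)
            clusters[nearest].append(x)
        # Update step: each centre becomes the mean of its points, Eq. (4).
        new_centres = [
            sum(cl) / len(cl) if cl else centres[k]
            for k, cl in enumerate(clusters)
        ]
        # Stop once no centre moves, i.e. condition (5) no longer holds.
        if max(abs(a - b) for a, b in zip(new_centres, centres)) == 0:
            break
        centres = new_centres
    return centres, clusters


# Hypothetical predicted OS scores grouped into three clusters.
scores = [1.2, 1.5, 1.4, 3.8, 4.1, 4.0, 6.3, 6.6, 6.1]
centres, clusters = kmeans_1d(scores, 3)
for centre, members in zip(centres, clusters):
    print(round(centre, 2), members)
```

For the six LOS categories the same routine would be called with c = 6, and the boundaries between adjacent clusters would give the score ranges for A to F.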

Result and Analysis

The proposed framework includes a statistical model that can identify the significant factors affecting satisfaction. The data sets collected for the 33 questions pertaining to various QOS factors of the transportation system were analysed. The data captured by the 33 questions were summarised into a manageable and uncorrelated set of variables using factor analysis.

Factor Analysis

Factor analysis was carried out on the 33 statements with varimax (orthogonal) rotation. To determine the suitability of the correlation matrix for factor analysis, the KMO measure of sampling adequacy was computed. Table 3 presents the results of the KMO and Bartlett’s tests. The KMO statistic was found to be 0.836 (i.e. > 0.8). This value is adequate for factor analysis and indicates that the sample size is large enough for the model to be appropriate; KMO values > 0.5 are regarded as the acceptable limit. Bartlett’s test is highly significant, with a significance value < 0.05, which means that the R-matrix is not an identity matrix. This indicates that some relationships exist among the variables involved in the analysis.

Table 3 Results of KMO and Bartlett’s test

After computing the eigenvalues for the collected data, it was found that 8 components have eigenvalues above Kaiser’s criterion of 1, and together they explain 67.34% of the total variance. Reliability analysis (Cronbach’s alpha) was applied to quantify the consistency of the questionnaire and of each individual variable. Five variables, i.e. roadway design (RD), intersection operations (IO), arterial operations (AO), maintenance (M) and signs and markings (SM), have Cronbach’s alpha values > 0.8 and hence show high reliability. For the remaining three variables, aesthetics (A), road user behaviour (RB) and other facilities (OF), Cronbach’s alpha is below 0.8, which indicates comparatively low reliability. The factor loadings after varimax rotation are tabulated in Table 4. Based on both the factor analysis and professional judgement, eight factors were retained, considering the percentage of the total variance in the original variables that they explain. Table 4 also shows the Cronbach’s alpha values, eigenvalues and percentage of variance for each component.
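The retention rule described above can be sketched in a few lines. Given the eigenvalues of a correlation matrix, the function keeps those above 1 (Kaiser's criterion) and reports the share of total variance they explain. The function name `kaiser_selection` and the eigenvalues shown are hypothetical and are not the values in Table 4.

```python
def kaiser_selection(eigenvalues, threshold=1.0):
    """Apply Kaiser's criterion to a list of eigenvalues.

    Returns (number_of_retained_components, percent_variance_explained).
    For a correlation matrix the eigenvalues sum to the number of
    variables, so each eigenvalue divided by that sum is its variance share.
    """
    total = sum(eigenvalues)
    retained = [ev for ev in eigenvalues if ev > threshold]
    percent_explained = 100.0 * sum(retained) / total
    return len(retained), percent_explained


# Hypothetical eigenvalues for a six-variable correlation matrix.
demo_eigenvalues = [2.7, 1.4, 0.8, 0.5, 0.4, 0.2]
count, share = kaiser_selection(demo_eigenvalues)
print(count, round(share, 1))
```

The same calculation applied to the 33 eigenvalues of the questionnaire's correlation matrix is what yields a component count and a cumulative variance percentage of the kind reported here.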

Table 4 Summary of exploratory factor analysis results

The scree plot shown in Fig. 2 displays the percentage of total variance explained by each factor. It can be observed from this figure that, beyond eight factors, the rate of decrease in the percentage of variance with increasing factor number is not significant. The factors were then “rotated” using the varimax technique, so that each variable loads heavily on a single factor for ease of interpretation. This procedure supports a clear identification of the variables that fall under each factor and also reduces the overlap among factors. The attribute statements with the highest loadings are shown in bold for each factor.

Fig. 2 Scree plot after principal component analysis

A detailed description of the eight independent variables and the QOS attributes listed under each of them is presented in Table 5. These independent variables are discrete in nature and vary linearly with the OS of road users. Hence, the multiple linear regression technique is used for model development in the present study.

Table 5 Extracted factors and their QOS attributes

Multiple Linear Regression Technique

The eight factors extracted by factor analysis were taken as independent variables and the OS as the dependent variable. The R value of 0.842 is the multiple correlation coefficient between the explanatory variables and the outcome variable. The R² value of the model was found to be 0.709, which indicates that the 8 independent variables account for 70.9% of the total variability in overall satisfaction. Table 6 summarises the parameters of the multiple regression model. The Durbin–Watson value was found to be 2.163 (close to 2), which shows that the residuals are not correlated.
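For readers unfamiliar with the Durbin-Watson statistic, the sketch below computes it from its definition: the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 indicate little first-order autocorrelation. The function name and the residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals (observed OS minus predicted OS).
demo_residuals = [0.3, -0.2, 0.1, -0.4, 0.5, -0.1, 0.2, -0.3]
print(round(durbin_watson(demo_residuals), 3))
```

The statistic ranges from 0 (strong positive autocorrelation) to 4 (strong negative autocorrelation).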

Table 6 Model parameters of the multiple regression analysis

The ANOVA results shown in Table 7 examine whether the model is significantly better at predicting the outcome variable than using the mean alone. The F ratio of 106.798 indicates that the variation explained by the regression model is much larger than the unexplained (residual) variation. The reported significance value of 0 (i.e. < 0.05) indicates that the model significantly improves the ability to predict the outcome variable.
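The F ratio is linked to R² through the standard relation below, where k is the number of predictors and N the number of observations used for fitting. As a rough consistency check only, and assuming (this is not stated for the ANOVA) that the model was fitted on about 80% of the roughly 450 responses, i.e. N ≈ 360 with k = 8:

$$F=\frac{R^{2}/k}{\left( 1-R^{2} \right)/\left( N-k-1 \right)}\approx \frac{0.709/8}{0.291/351}\approx 106.9$$

Under that assumption the result is close to the reported F ratio of 106.798; the small difference would be expected from rounding of R² and the approximate sample size.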

Table 7 Test results of ANOVA table

Table 8 shows the model estimates, containing the b-coefficient of each predictor, the significance of each coefficient and the t-statistic. The b-coefficients represent the contribution of each explanatory variable to the model output.

Table 8 Model parameters

Substituting the values of the b-coefficients into Eq. (2), the model can be written as:

$$\begin{aligned} {\text{OS}} & = - 0.93+0.29{\text{RD}}+0.19{\text{AO}}+0.26{\text{IO}} \\ & \quad +0.1{\text{SM}}+0.11{\text{M}}+0.09{\text{A}}+0.09{\text{RB}}+0.1{\text{OF}} \end{aligned}$$
(6)

where OS is the overall satisfaction of road users, RD is the cross-section of roadway design, AO is the arterial operations, IO is the intersection operations, SM is the signs and markings, M is the maintenance, A is the aesthetics, RB is the road user behaviour and OF is the other facilities.
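Eq. (6) can be applied directly once the eight factor scores of a street segment are known. The sketch below simply evaluates the equation with the published coefficients; the function name `predict_os` and the example scores are hypothetical.

```python
# Coefficients and constant copied from Eq. (6).
INTERCEPT = -0.93
COEFFICIENTS = {
    "RD": 0.29,  # roadway design
    "AO": 0.19,  # arterial operations
    "IO": 0.26,  # intersection operations
    "SM": 0.10,  # signs and markings
    "M": 0.11,   # maintenance
    "A": 0.09,   # aesthetics
    "RB": 0.09,  # road user behaviour
    "OF": 0.10,  # other facilities
}


def predict_os(factor_scores):
    """Evaluate Eq. (6) for a dict of factor scores on the 1-7 scale."""
    missing = set(COEFFICIENTS) - set(factor_scores)
    if missing:
        raise KeyError(f"missing factor scores: {sorted(missing)}")
    return INTERCEPT + sum(
        coeff * factor_scores[name] for name, coeff in COEFFICIENTS.items()
    )


# Hypothetical mean factor scores for one street segment.
segment = {"RD": 3.0, "AO": 4.0, "IO": 5.0, "SM": 2.5,
           "M": 3.5, "A": 4.0, "RB": 4.5, "OF": 3.0}
print(round(predict_os(segment), 2))
```

Because the coefficients sum to 1.23, the equation returns 0.30 when every factor score is 1 and 7.68 when every factor score is 7, so predictions near the extremes can fall slightly outside the 1–7 rating scale.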

In this model all the predictor coefficients are positive, indicating a positive relationship between OS and each predictor. The y-intercept (also called the constant) in a regression analysis is the value at which the regression line crosses the y-axis, i.e. the mean value of the dependent variable (Y) when all independent variables are set to zero (\(x_i = 0\)). Mathematically, this is correct. However, a zero setting for all predictors is often an impossible combination, and it becomes even less likely that all predictors can realistically be zero in a multiple regression analysis with many predictors. In this study the dependent variable (OS) is always positive, in the range 1–7, which implies a positive mean value of Y. The scores of each independent variable are also positive and lie in the range 1–7. Hence, neither the dependent nor the independent variables ever take the value zero, and the observed ranges of \(x_i\) are not close to zero. The fitted regression line therefore has to be extended beyond the observed data to reach the y-axis, and where it crosses (from the first quadrant to the third quadrant) depends on the minimum values of the variable scores. This results in a negative value for the constant term.

A negative constant is generally not a cause for concern, depending on the outcome variable. It simply means that the expected value of the dependent variable would be < 0 if all predictor variables were set to zero. Paradoxically, although its value is generally meaningless, it is crucial to include the constant term in regression models. If the predictor variables were set to zero, the data points would lie outside the 1–7 range of the observed data in the present context, and a regression model should not be used to predict the output variable outside the range of the data, because the relationship between the variables may change there. In a linear regression model, the overall relationship between the variables is typically more important than the value of the constant term. The standard errors associated with the coefficient values indicate the extent to which these values may vary across samples. The t-statistics of the respective b-values are significant (sig. < 0.05), indicating that the predictors contribute significantly to the model. The greater the value of the t-statistic, the larger the influence of that predictor. The tolerance value should be > 0.2 and the variance inflation factor (VIF) should be < 10 to rule out collinearity among the independent variables. In this study, all eight independent variables have tolerance values > 0.2, and the maximum VIF is 2.296. Hence, multicollinearity is not a problem in the data set considered.
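The two collinearity measures are reciprocals of each other. In their standard form, with \(R_j^{2}\) denoting the R² obtained when predictor j is regressed on the remaining predictors (a symbol introduced here for illustration):

$$\mathrm{VIF}_j=\frac{1}{1-R_j^{2}},\qquad \mathrm{Tolerance}_j=1-R_j^{2}=\frac{1}{\mathrm{VIF}_j}$$

For the maximum reported VIF of 2.296 this gives a tolerance of about 1/2.296 ≈ 0.436, which is consistent with the stated tolerance threshold of 0.2, and corresponds to \(R_j^{2}\approx 0.564\) for the most collinear predictor.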

Classification of OS Scores

The LOS (OS scores) estimated from this model are grouped into six classes with the help of k-means clustering. The ranges of scores for six categories of LOS are shown in Fig. 3.

Fig. 3 Classifying OS score for LOS categories (A–F) applying K-means clustering

Validating the Proposed Model

Of the total data, 80% was used for model development and the remaining 20% for validation. For validation, the average OS score of each street segment was calculated. The points in Fig. 4 show the observed OS scores plotted against the predicted OS scores. The trend line fitted through these points was found to be inclined at 45°, i.e. to have a slope of 1. The reliability index (R² value) of 0.9 indicates that the model is well validated for mixed traffic flow conditions.
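The validation check described above can be reproduced with a short sketch that fits a least-squares trend line through predicted versus observed scores and reports its slope, its inclination in degrees and the R² of the fit. Placing the observed scores on the x-axis is an assumption, as are the function name and the sample values.

```python
import math


def validation_summary(observed, predicted):
    """Trend-line slope, its angle in degrees, and R² of predicted vs observed."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((p - mean_pred) ** 2 for p in predicted)
    sxy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    slope = sxy / sxx
    angle_degrees = math.degrees(math.atan(slope))
    # For a simple linear trend line, R² is the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)
    return slope, angle_degrees, r2


# Hypothetical segment-level average OS scores from a hold-out set.
observed_demo = [2.1, 3.0, 3.8, 4.6, 5.2, 6.1]
predicted_demo = [2.4, 2.9, 4.0, 4.3, 5.5, 5.9]
slope, angle, r2 = validation_summary(observed_demo, predicted_demo)
print(round(slope, 3), round(angle, 1), round(r2, 3))
```

A slope of 1 corresponds to an inclination of 45° only when both axes use the same scale, which holds here because observed and predicted scores share the 1–7 range.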

Fig. 4 Scatter plot of observed versus predicted OS scores

Conclusion

The HCM defines “level of service” in terms of service measures that both reflect the traveller’s perspective and are useful to operating agencies. However, the LOS criteria in the current version of the HCM are not based on surveys of travellers’ perceptions of individual transportation facilities. In emerging countries like India, mixed traffic flow comprises diverse road and traffic operational features. Every user has a different perspective and experiences different difficulties while travelling along a particular roadway. The HCM does not represent the variability and complexity of human perceptions for different modes of transport under mixed traffic flow conditions. Hence, HCM guidelines cannot be applied directly to highly heterogeneous traffic flow conditions. The proposed LOS criteria, based on user perceptions of the individual contributing aspects of a transportation facility, would therefore be more credible than the HCM guidelines, which are based on quantitative performance measures or capacity-based outcomes.

This research includes a statistical model that can identify the significant psychological factors affecting satisfaction. The data captured by the 33 questions were summarised into a manageable and uncorrelated set of variables using factor analysis. The KMO statistic of 0.836 indicates that the sample size is suitable for factor analysis. Five factors, i.e. RD, IO, AO, M and SM, have high reliability (Cronbach’s alpha > 0.8), and the remaining three factors, i.e. A, RB and OF, have comparatively low reliability (Cronbach’s alpha < 0.8). The proposed multiple regression model has an R² value of 0.709, i.e. it explains 70.9% of the variation in overall satisfaction. The Durbin–Watson value was found to be 2.163, which is close to 2 and shows that the residuals are not correlated. The LOS scores were grouped into six clusters using the k-means clustering method. The findings of this study suggest that the attributes that most affect the comfort level of road users, i.e. RD, IO and AO, require improvement on the poor street segments (designated as LOS categories D, E and F). The proposed model was well validated, with a reliability index of 0.9 and a trend-line slope of 45° in the plot of observed against predicted OS scores. This kind of study is new for Indian traffic conditions. Hence, this model is expected to serve as a guideline for improving serviceability along urban street infrastructure, one that will be easy for highway authorities to follow.