1 Introduction

Walking is the most effective and efficient mode of transportation for short trips. Urban trips of approximately 1–2 km are performed on foot daily (IRC: 103 2012). The physical structure of street networks affects the quality of walking journeys (Kang et al. 2018). The absence of proper pedestrian facilities forces pedestrians to use the carriageway and thus come into direct contact with motorized traffic. The most commonly adopted solutions are separating pedestrians from the traffic stream through grade-separated facilities (such as overpasses or underpasses) or providing at-grade signal-controlled pedestrian crosswalks. However, the provision of a signalized crossing does not guarantee that vehicular traffic will stop for pedestrians or that the crossing experience will be safe and comfortable. Many drivers have a propensity to break traffic rules and even drive through red signals, which might lead to pedestrian-vehicle collisions (Herms 1972; Koepsell et al. 2002). Pedestrians, in turn, tend to avoid designated crosswalks and cross the road through illegal median openings to save time (Golakiya et al. 2019).

According to the World Health Organization (Global Safety Report 2018), vulnerable road users make up 43% of all types of road users (with pedestrians accounting for 23%). The same World Health Organization (WHO) 2018 report states that road traffic injuries are the primary cause of death for children and young adults (aged between 5 and 29 years) in low- and middle-income countries. The majority of these accidents occurred at uncontrolled mid-block crossings, where no proper pedestrian crossing facilities were provided. As per the “Road Safety in India Status Report 2016” (Mohan et al. 2017), reported pedestrian fatalities were nearly 35–40% of total fatalities. Metropolitan cities like New Delhi, Bangalore, and Kolkata had a pedestrian fatality share of more than 40% (Mohan 2009). Thus, in order to save more lives, it is essential to separate pedestrian movement from the traffic stream by providing overpasses (foot over bridges, FOBs) or underpasses (subways). Besides, proper street connectivity leads to a more sustainable environment for safe and efficient travel (Zlatkovic et al. 2019).

Past studies showed that even when such grade-separated facilities were provided, pedestrians were reluctant to use them and crossed illegally using available at-grade median openings (Malik et al. 2017; Saha et al. 2011; Das and Barua 2015; Pasha et al. 2015; Sinclair and Zuidgeest 2016). Researchers across the globe have studied different crossing facilities to understand the factors that lead to the use or non-use of such grade-separated facilities. Studies related to FOBs revealed that pedestrians preferred the shortest path while choosing a route (Methorst 2004), with time, distance, and extra effort versus safety benefit playing a significant part in pedestrians’ decision-making (Li 2013; Rankavat and Tiwari 2016). In a study in Ankara (Turkey), Räsänen et al. (2007) reported that, depending on time saving, safety, and familiarity with an area, the use of FOBs varied between 6 and 63%. The existence of escalators increased the usage rate, whereas the existence of a traffic signal in the vicinity of the FOB decreased it. Fear of height (Opdyke et al. 1995; Juan and Pérez 2009) and the absence of adequately designed stairways (Mutto et al. 2002; Rizati et al. 2013) significantly affected pedestrians’ motivation to use elevated facilities. Pedestrians across different Asian cities usually preferred at-grade facilities to underpasses or overpasses due to security concerns, poor accessibility, and encroachment by hawkers (Saha et al. 2011; Pasha et al. 2015; Malik et al. 2017; Anciaes and Jones 2018).

Researchers have used different modeling approaches to identify the critical parameters affecting the usability of crossing facilities for pedestrians (refer to Table 1). The regression model used by Abojaradeh (2013) predicted that, in Jordan, the use of pedestrian bridges helped to reduce pedestrian fatalities. Factors such as posted speed limit, traffic volume, the width of the crosswalk, condition of the bridge, and the existence of median barriers significantly affected the use of the bridges. Also, the likelihood of using FOBs increased when pedestrians had previous experience of injuries (Oviedo-Trespalacios and Scott-Parker 2017). In China, Wu et al. (2014) used a binary logit model and identified that gender, age, career, level of education, license, detour wishes, detour distance, and crossing time had a significant impact on the use of overpasses.

Table 1 Various empirical study details focused on the usability of different crossing facilities

Moreover, a study based in Thailand (using the logistic regression method) revealed that the proximity of a bus stop to the FOB and personal experience of road accidents further influenced the choice of the pedestrians (Sangphong and Siridhara 2014). A Relative Importance Index measure highlighted that structural design, angle of stairs, width, surface, and the existence of a fence at grade also influenced the possibility of using the facility (Hasan and Napiah 2014). In addition to the above-stated factors, the role of parents in educating children about the possible traffic risk of using at-grade facilities influenced the utilization of footbridges (Hasan and Napiah 2018). Moreover, the use or non-use of pedestrian facilities is a habit and not a coincidental behavior, based on the perception of safety and convenience (Rankavat and Tiwari 2016; Räsänen et al. 2007).

Table 1 shows that the majority of the studies conducted in Asian countries used logistic regression (binary, multiple and mixed) to predict pedestrian choice between different crossing facilities. Also, the questionnaire was the survey technique most preferred by researchers for modelling purposes. Apart from demographic characteristics, variables such as frequency of use, safety, security, presence of hawkers, cleanliness, number of steps and type of facility available were the most commonly used in questionnaire studies.

Apart from logistic regression, soft computing techniques are becoming increasingly popular among researchers globally. Table 2 shows diverse empirical studies carried out across various domains using different soft computing approaches.

Table 2 Application of soft computing approaches in various empirical studies across different domains

From Table 2, it is observed that different soft computing tools (generalized linear model, random forest, decision trees, neural networks, etc.) were used on survey data in the fields of health care, environment, risk assessment, remote sensing and web-app security. Moreover, MAE, MAPE, RMSE, and R² were the most common metrics used for model evaluation.

1.1 The study motivation and objective

From the previously mentioned literature (Table 1), it is evident that the perception of pedestrians regarding the use of overpasses or FOBs has not been studied extensively in a developing country like India. Moreover, previous studies mainly used modeling approaches such as linear, binary logistic and ordered logistic regression models to gain information on the parameters influencing the use or non-use of FOBs. Alongside these commonly used approaches, many other high-performing machine learning algorithms, such as tree-based ensemble learners (i.e., random forest, bagging and boosting) and neural networks (deep learning), have been introduced in various fields of research over the past few decades (Table 2), as they outperformed traditional modeling approaches in terms of prediction accuracy (Couronné et al. 2018).

Hence, taking model performance into consideration, and to gain more accurate insight into the most prominent factors affecting usability for pedestrians (users), this study compares three different modeling approaches (GLM: generalized linear model, RF: random forest and GBM: gradient boosting machine) for predicting the future usability of overpasses by pedestrians.

The main objective of the current study was to identify the potential parameters that drive the use of foot over bridges across various Indian cities. Both tangible (i.e., field measurements) and intangible (i.e., survey-based ratings of existing FOB’s condition) parameters were used in the modeling process.

Unlike past studies, which estimated the usability of FOBs from a single context (i.e., an aggregate usability measurement in terms of safety and security), this study estimates the relative importance of different parameters from four user perspectives together (mobility, safety and security, vertical end connectivity and horizontal end connectivity) using advanced machine learning tools (GLM, RF and GBM). The solution presented might not be the quickest one, but it is intended to be robust and well suited to policy making, as it highlights the impact of the different parameters together. The results of the study would provide useful information to planners and developers to either upgrade existing elevated pedestrian facilities or construct better facilities in the future.

2 Method

2.1 Survey location selection

Six major cities covering the different regions of India were visited, and all possible survey locations were observed prior to final data collection. Only those locations were selected where the FOBs had an adequate flow of pedestrians throughout the day and were connecting from one side of the road to the other side through a single entry and exit. To cover the variability among the FOBs, different land-use types ranging from commercial and public transport terminal to educational and residential areas were covered for this study. Figure 1 shows the six different Indian cities which were chosen for questionnaire survey along with the total sample response.

Fig. 1 Indian cities selected for the final survey

In total, 28 locations were visited across six different cities, of which fourteen were finalized for the questionnaire surveys. The survey tried to cover various regions of the Indian subcontinent, including metropolitan cities like Delhi, Mumbai, Kolkata and Bengaluru. Across these 14 locations, 552 completed questionnaire samples were collected. A detailed summary of the foot over bridges selected for this study is given in Table 3. All the foot over bridge locations considered for the study had different flow levels during peak and off-peak hours. The foot over bridges were provided with either stairways alone or stairways along with escalators, lifts, and ramps.

Table 3 Site and sample characteristics

The measured length and walkable width of the FOBs across the selected locations varied between 21–88 m and 2.05–5.70 m, respectively, as illustrated in Table 3. These widths are consistent with the guidelines prescribed by IRC: 103 (2012), which suggest a minimum walkable width of 1.8 m. Depending on the riser dimension, i.e., the vertical distance between two successive steps (Irvine et al. 1990), the number of steps varied across the different locations. The dimension of the tread, i.e., the horizontal top portion of a step where the foot rests (Irvine et al. 1990), ranged between 26 and 32 cm.

As per IRC: 103 (2012), the suggested riser and tread dimensions are 15 cm and 30 cm, respectively. The locations in Delhi (i.e., ITO and Maharani Bagh) had stairways that were similar to ramps, with tread and riser dimensions of 30 cm and 2 cm, respectively. Table 3 also indicates that most of the FOBs other than those in Bengaluru and Delhi lacked ramps, lifts, or escalators. Because Guwahati has few FOBs, only one FOB there was found suitable for this study.

Similarly, in the case of Mumbai, only the FOB situated outside the Indian Institute of Technology (IIT) Bombay campus was considered. Due to time constraints and the commonality between the FOBs in Ghaziabad, only one of the overpasses was selected to represent the others.

Moreover, Table 3 shows that the majority of the respondents across the different locations were male pedestrians (67–76%). Pedestrians in the age category of 23–59 years were regular users of the overpasses. Also, pedestrians carrying luggage were found to be higher (5–17%) than those without luggage in all the cities considered.

2.2 Questionnaire design

A questionnaire set was prepared, including three broad sections (A to C) representing demography, the current condition of FOBs, and future usability determinants (refer to Table 4).

Table 4 Questionnaire survey format and field observations

As per Table 4, Section A covered demographic characteristics including gender, age, profession, and frequency of daily use. Section B focused on capturing existing connectivity (i.e., the connection from one end of the stairway to the other), security (i.e., in the form of security personnel and CCTV cameras), comfort (i.e., shade, proper guardrails and cleanliness), walk environment (i.e., governed by the facility surroundings and whether they were pleasant or not) and obstruction (i.e., presence of hawkers/vendors, beggars, and standing pedestrians). The pedestrians were further asked whether improvements regarding obstruction, safety and security, vertical end connectivity (lift/escalator/ramp), and horizontal end connectivity (refer to Table 4, Section C) would govern their future use of the facility. In addition, a set of field measurements was recorded by the observers (see Table 4, Section D).

2.3 Questionnaire survey

After finalizing the survey locations, the field observers (two for each location) gathered the necessary details of the FOBs, including GPS coordinates and the dimensions of the walkways and stairways. An interviewer-administered questionnaire survey (by two interviewers) of pedestrians in the neighborhood of the facility was conducted at each survey location on weekdays during the morning (8.30–11 am) and evening (5–7.30 pm) peak hours to obtain a representative sample. During the survey, participants from both groups (users and non-users of FOBs) were selected using a random sampling technique and requested to participate; those willing to undergo the interview process were finally interviewed. Due to the massive rush in the morning and evening peak hours, the participation rate was low (i.e., out of approximately twenty random pedestrians, only one participated when requested). Among all participants, only 552 respondents answered all the questions thoroughly. Later, in the laboratory, these 552 questionnaire samples were manually entered into an Excel sheet according to the final analysis requirement and used for the final data analysis and modeling.

3 Data analysis

The demographic characteristics were obtained by performing exploratory data analysis on the final survey data containing 552 samples, using the “tidyverse” package (Wickham 2017) in the R statistical programming environment. Pedestrian behavior was expected to differ across land-use types. Thus, instead of a city-wise analysis, the final analysis compared usability between the different land-use types (commercial, educational, public transport terminal (PTT), and residential).

3.1 Demographic characteristics

The demographic parameters including gender, age, profession, and frequency of daily use are essential in understanding the existing usage pattern of pedestrians. Table 5 shows the demographic characteristics for different land-use types.

Table 5 Demographic characteristics of respondents under different land-use types

Table 5 shows that the majority of the participants were male pedestrians (~ 65–75%). The variation of gender proportion within each gender group was found to be minimal. The highest proportion of male respondents was observed at residential locations (74.4%), while the highest proportion of female pedestrians (34.8%) was observed at educational locations.

At educational locations, the majority (~ 55%) of the pedestrians using the overpass facility were in the age group of 13–22 years (refer to Table 5). This is consistent with the findings of Desriani and Komordjaja (2008) and Guo et al. (2014), which revealed that the utilization rate of FOBs was highest among young pedestrians when the facilities were situated in educational locations. For the other three land-use types, the largest group of users (~ 40–42%) was in the age group of 23–45 years, which is also consistent with the report of the Ministry of Statistics and Programme Implementation (2018), which states that India holds the highest proportion of young population (i.e., around 242 million). It is further noticeable that the usage rate decreased with increasing age, in accordance with the findings reported by Räsänen et al. (2007). The share of pedestrians below 12 years of age at educational locations was low, as parents tend to drop children at school using private vehicles, or the children travel by bus, instead of walking to their destinations using FOBs.

The statistics shown in Table 5 revealed that the largest group of pedestrians (~ 36–46%) were regular users, who used the FOBs twice or more daily. The share of users using the facility once daily was low (~ 12–17%), as people generally used the FOBs for both directions of their trips (for example, from residence to workplace and back). There were very few first-timers (~ 2–6%) who used the facility for commuting.

Based on Table 5, it was also observed that the regular users in educational and residential areas were predominantly students (54.6% in educational locations), while in commercial and PTT locations, the proportions of servicemen (38–41%) and self-employed personnel (14–17%) were high. The results were similar to the findings reported by Saha et al. (2011) and Wu et al. (2014) that the tendency to use FOBs increased with higher education and better employment. The percentage of retired persons (age > 60 years) was low across all the locations, but a small group of homemakers was noticeable in commercial (6.4%) and residential (8.1%) areas. The next subsection discusses the respondents’ ratings of perceived satisfaction or dissatisfaction with the existing condition of the FOBs.

3.2 Satisfaction of existing FOBs

To understand how satisfied or dissatisfied pedestrians were with the available features of existing FOBs, participants across the different land-use types were asked to rate seven quality assessment parameters: comfort, connectivity, safety and security, surface condition, walk environment, width, and obstruction (as described in Sect. 2.2). The factors were selected from a list of physical and user characteristics provided by IRC: 103 (2012) for pedestrian facilities. The ratings of these parameters ranged from poor (0) to excellent (4). The participants’ responses from all land-use types were combined and analyzed together, as shown in Fig. 2.

Fig. 2 Perceived satisfaction/dissatisfaction on existing features of FOBs

The response statistics from Fig. 2 revealed that safety and security was the most critical parameter, which the pedestrians perceived to be ‘poor’ in commercial (29.6%), residential (20.9%) and PTT (20.6%) locations. The finding was also consistent with the feedback (i.e., regarding existing issues) provided by the respondents (presented in Table 6).

Table 6 Respondents’ common feedback concerning different issues in existing FOBs

The users in commercial locations, followed by the users in residential locations, expressed the highest dissatisfaction related to comfort (19.1%), surface (21.4%) and walk environment (20.9%). Connectivity between the FOBs and the respondents’ desired destinations was perceived to be extremely poor in commercial locations (9.1%), while in educational, PTT and residential land-use types, the users expressed satisfaction with the existing connectivity.

Further, respondents felt that the walkable width and perceived obstructions (refer to Figs. 2, 3) were not satisfactory for comfortable movement at commercial (17.7% and 23.6%, respectively), PTT (9.4% and 4.4%, respectively) and residential (5.8% and 29.1%, respectively) locations. The dissatisfaction arose mainly because, during peak hours, these locations were crowded by standing pedestrians, vendors, and beggars (see Table 6), which reduced the effective walkway width and increased mobility friction.

Fig. 3 Perceived obstruction across different land-use types

Field measurements of the FOBs indicated that the mean walkway width was lowest at commercial (\( \bar{x} \) = 2.69 m and σ = 0.49) and residential (\( \bar{x} \) = 2.75 m and σ = 0.21) locations, which could cause higher mobility friction. At PTT locations the mean width was highest (\( \bar{x} \) = 3.63 m and σ = 1.49), but due to the high peak-hour flow and the presence of mobility friction (i.e., standing pedestrians, beggars, and vendors), the effective walkway width was fully utilized, and the space available to pedestrians reduced substantially. These facts seemed to decrease satisfaction regarding width and gave a sense of obstruction, or acted as friction, to a small group of pedestrians (4.4%).

4 Modeling approaches

An effort was made to identify the best-suited model (in terms of accuracy) for predicting the future usability of foot over bridges under four different contexts (mobility, safety and security, vertical end connectivity and horizontal end connectivity). The modeling approaches used in the current study were generalized linear modeling (GLM), random forest (RF), and gradient boosting machine (GBM). A short description of the models is provided in the following subsections.

4.1 Generalized linear modeling (GLM) framework

The GLM estimates regression models for outcomes following exponential family distributions. The structural form of the model describes the patterns of interactions and associations. The two categories of models produced by GLM are classification and regression. Binary logistic regression is the form of GLM that performs binary classification and estimates the probability that a characteristic is present (i.e., estimates the binary class probability). The form in which binary logistic regression is used is shown in Eq. 1.

$$ P_{i}(Y_{i} = 1 \mid X_{i} = x_{i}) = \frac{\exp(\beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \cdots + \beta_{i} X_{i})}{1 + \exp(\beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \cdots + \beta_{i} X_{i})} $$
(1)

where Pi is the probability that the event occurs; X is the observed value of the explanatory variables, which can be discrete, continuous, or a combination of both; and β is the regression coefficient. GLM-based algorithms are easy to train, though they can suffer from overfitting. Thus, Lasso- or Ridge-based penalties are used to reduce overfitting.
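As an illustration of Eq. 1, the following minimal Python sketch evaluates the logistic probability for a single observation. It is not the R code used in this study; the function name `logistic_probability` and the coefficient values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Return P(Y = 1 | X = x) for a binary logistic regression (Eq. 1)."""
    if len(coefficients) != len(features):
        raise ValueError("coefficients and features must have the same length")
    # Linear predictor: beta_0 + beta_1 * x_1 + beta_2 * x_2 + ...
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # exp(z) / (1 + exp(z)), written in a numerically stable way
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical coefficients for illustration only (not estimates from this study).
p_use = logistic_probability(-1.0, [0.8, 0.5], [1.0, 0.4])
```

With these illustrative values the linear predictor is zero, so the predicted probability is 0.5; a larger linear predictor moves the probability toward 1.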

4.2 Random forest (RF) modeling framework

A decision tree is one of the simplest and perhaps most easily understood algorithms used in Machine Learning (ML). A decision tree is similar to a flowchart, and Breiman (2017) first implemented this decision tree algorithm for classification and regression. Decision trees are mostly used for decision-making and predictive modeling.

A simple example is a person choosing between two available alternative infrastructures, i.e., an FOB and another available route, for crossing a busy road, based on different available features, as shown in Fig. 4. In a decision tree, the top node is called the “root node” and the nodes at the bottom are called “terminal nodes” (or leaf nodes). The nodes other than the root node and the terminal nodes are called “internal nodes”. Each internal node includes a binary test condition, while each leaf node contains an associated class label (for a binary choice example, yes/no).

Fig. 4 A decision tree for choosing between two alternative routes

A classification tree uses split conditions to predict a class label based on one or more supplied input variables. The splitting process starts from the root node, and at each node the supplied input values are checked against a splitting condition and passed recursively to the right or left sub-branch. This process stops when a leaf or terminal node is reached. Most tree-building algorithms use an impurity measure as the splitting criterion, and one of the common impurity measures is the Gini Index (Breiman 2001). The lower the Gini value, the higher the purity of the split (refer to glossary section). Other splitting criteria are entropy (information gain) and misclassification rate.
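To make the impurity idea concrete, the following Python sketch computes the Gini impurity of a node, taken here in its standard form (one minus the sum of squared class proportions), and the size-weighted impurity of a binary split. The function names and the toy yes/no labels are illustrative assumptions and are not taken from the survey data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left_labels, right_labels):
    """Impurity of a binary split, weighting each child by its share of rows."""
    n_left, n_right = len(left_labels), len(right_labels)
    total = n_left + n_right
    return (n_left / total) * gini_impurity(left_labels) + (
        n_right / total
    ) * gini_impurity(right_labels)


# Toy node with two "use FOB" and two "do not use FOB" answers.
parent = gini_impurity(["yes", "yes", "no", "no"])
pure_split = weighted_gini(["yes", "yes"], ["no", "no"])
poor_split = weighted_gini(["yes", "no"], ["yes", "no"])
```

In this toy case the parent node has impurity 0.5, the split that separates the classes perfectly has impurity 0, and the split that leaves both children mixed stays at 0.5, so a tree builder would prefer the first split.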

Though a single decision tree-based model is easy to build and interpret, it suffers from various drawbacks, such as high variance (as a small change in the training data often results in very different splits) and overfitting due to a deeply grown, fragile tree structure. To overcome these drawbacks, ensemble learners were introduced (Breiman 2001). Ensemble learning is a method where more than one model is built to gain higher prediction accuracy. A voting mechanism is used to aggregate the results from all models. For classification, the ‘majority voting technique’ is used, where each member of the ensemble is asked to predict the class label. Once all the classifiers are queried, the class that received the highest number of votes is returned as the final decision of the ensemble learner.
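The majority voting step can be sketched in a few lines of Python. The function name `majority_vote` and the example votes are hypothetical; in this sketch ties are resolved arbitrarily, which a real implementation may handle differently.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted by the largest number of ensemble members."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    # most_common(1) returns a list holding the single (label, votes) pair with most votes.
    label, _votes = counts.most_common(1)[0]
    return label


# Five hypothetical trees voting on whether a pedestrian would use the FOB.
tree_votes = ["yes", "no", "yes", "yes", "no"]
final_label = majority_vote(tree_votes)
```

Here three of the five trees vote "yes", so the ensemble returns "yes".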

Three popular ensemble learning algorithms are bagging (bootstrap aggregation), random forest and boosting. Random forest is one of the most widely used ensemble learners, with applications in Ecology, Medicine, Astronomy, Autopsy, Agriculture, Bioinformatics, and Traffic and Transportation planning (Fawagreh et al. 2014). Though RF is widely popular, relatively few researchers have used this approach for solving classification and regression problems in the traffic and transport planning domains (see Table 7).

Table 7 Studies in traffic and transportation planning related to random forest modeling approach

Researchers mainly used RF in crash prediction analysis, and also in travel demand forecasting and mode choice modelling (refer to Table 7).

The random forest modeling approach uses bootstrapping (i.e., sampling ‘m’ rows/observations at random, with replacement, from the training dataset of size ‘n’) in the same way as bagging, but adds additional randomness to the models at each split by randomly selecting a subset of the input variables/features (often referred to as “mtries”, where mtries < available number of variables) using a feature bagging method (Cook 2016; Yu-Wei 2015); refer to Fig. 5. As fewer variables are available to choose from at each split, less information is available to each tree during the training process. This additional randomness makes the trained trees more different from each other; in other words, it produces less correlated trees, which improves the prediction performance (Breiman 2001). Though RF provides accurate solutions, it is computationally expensive and difficult to interpret.
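The two sources of randomness described above can be sketched as follows in Python. This is only an illustration of row bootstrapping and per-split feature sampling, not a full random forest and not the H2O implementation; the function names, the seed, and the feature names are hypothetical.

```python
import random


def bootstrap_rows(n_rows, m, rng):
    """Sample m row indices at random, with replacement, from n_rows training rows."""
    return [rng.randrange(n_rows) for _ in range(m)]


def candidate_features(feature_names, mtries, rng):
    """Pick mtries distinct features to be considered at a single split."""
    if not 0 < mtries <= len(feature_names):
        raise ValueError("mtries must be between 1 and the number of features")
    return rng.sample(feature_names, mtries)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical predictor names, for illustration only.
features = ["width", "length", "steps", "security", "comfort", "obstruction"]

# One bootstrap sample for one tree, drawn from 443 training rows.
rows_for_tree = bootstrap_rows(n_rows=443, m=443, rng=rng)
# The features that this tree may examine at one particular split.
split_features = candidate_features(features, mtries=2, rng=rng)
```

Because rows are drawn with replacement, some rows appear more than once in `rows_for_tree` and others not at all, and every split sees a different small subset of features; both effects make the individual trees less alike.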

Fig. 5 The random forest modelling approach

4.3 Gradient boosting machine (GBM) framework

GBM is another decision tree-based ensemble method (like RF) for regression and classification, which primarily focuses on the difficult rows of the training data (i.e., the ones that are hard to learn). GBM and RF differ in the way the trees are built and the way the results are combined. In RF, each tree is trained independently using a random sample of the data, whereas in GBM one tree is built at a time, and each new tree helps to correct the errors made by the previously trained trees. In RF, mtries (i.e., the number of variables randomly chosen as candidates at each split) and ntrees (the number of trees to build) are the most important parameters that need tuning; in GBM, the most prominent parameters are ntrees, max_depth (how deep each tree is allowed to grow) and the learning rate (a weighting factor applied to new trees when they are added to the model, to slow down the learning). Studies have shown that GBM can perform better than RF, as GBM adds new trees that complement the already built ones. Like GLM- and RF-based algorithms, boosting algorithms have limitations too: GBM focuses more on reducing bias than variance, and is computationally more expensive.
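The sequential, error-correcting nature of boosting can be illustrated with a deliberately small Python sketch. It assumes a squared-error loss, a single numeric predictor, and one-split trees (stumps), which is much simpler than the classification GBM used in this study; all function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals.

    Returns (threshold, left_mean, right_mean) for the split with the lowest
    sum of squared errors.
    """
    thresholds = sorted(set(xs))[:-1]
    if not thresholds:
        raise ValueError("at least two distinct x values are required")
    best = None
    for threshold in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, ntrees, learn_rate):
    """Build ntrees stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean of the target
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(ntrees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # The learning rate shrinks the contribution of every new tree.
        predictions = [
            p + learn_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return stumps, predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data: the target switches from 0 to 1 once x exceeds 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
stumps_few, pred_few = boost(xs, ys, ntrees=5, learn_rate=0.1)
stumps_many, pred_many = boost(xs, ys, ntrees=50, learn_rate=0.1)
```

With a small learning rate each tree removes only a fraction of the remaining error, so the fit after 50 trees is closer to the toy target than the fit after 5 trees; this is the trade-off between ntrees and learning rate mentioned above.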

Similar to RF, GBM has been less explored in the field of transportation and planning. Table 8 shows the studies that have applied GBM in this field.

Table 8 Studies in traffic and transportation planning related to gradient boosting machine approach

Table 8 shows that the majority of the studies used GBM to predict travel time and traffic patterns.

4.4 Study methodology

The step-by-step methodology adopted for modeling usability is illustrated in Algorithm 1. The study methodology involved literature survey, preliminary site inspection and questionnaire design, data collection and extraction, followed by modeling of usability from four different contexts, and finally extracting the important features for policy decision.

As explained in Sect. 2, the data were collected from fourteen FOB locations across India. These data had to be handled carefully before being used for the prediction models. Incomplete or partial records were excluded from the analysis. Afterward, normalization was applied to each column of the data set using min–max scaling. The normalized dataset was randomly divided in an 80:20 ratio for training and testing of the developed models. A hyper-parameter grid was selected based on past test experiments. A tenfold cross-validation approach was applied on the 80% training dataset for model and hyper-parameter tuning. To obtain a faster solution, a random grid search was adopted. The Area Under Curve (AUC) metric was selected for model performance evaluation due to class imbalance in the outcome variable. For faster training, an early AUC-based stopping criterion was adopted: if the AUC did not improve by 0.1% over ten successive models, the model training and hyper-parameter tuning stopped and the next search started. The final best-performing model was then extracted and tested on the remaining 20% unseen (test) dataset.
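The data preparation steps described above (min–max normalization, a random 80:20 split, and random fold assignment for tenfold cross-validation) can be sketched in Python as follows. This is a simplified illustration rather than the R/H2O workflow of the study; the function names and seed are hypothetical, and the width values are illustrative numbers within the range reported for the surveyed FOBs.

```python
import random


def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to rescale
    return [(value - lo) / (hi - lo) for value in column]


def train_test_split(n_rows, train_fraction, rng):
    """Shuffle row indices and cut them into training and test index lists."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(train_fraction * n_rows)
    return indices[:cut], indices[cut:]


def assign_folds(train_indices, n_folds, rng):
    """Randomly assign each training row to one of n_folds folds."""
    shuffled = train_indices[:]
    rng.shuffle(shuffled)
    return {row: position % n_folds for position, row in enumerate(shuffled)}


rng = random.Random(0)  # fixed seed so the sketch is repeatable

widths_m = [2.05, 2.69, 3.63, 5.70]  # illustrative walkway widths in metres
scaled_widths = min_max_scale(widths_m)

train_idx, test_idx = train_test_split(552, 0.8, rng)
folds = assign_folds(train_idx, n_folds=10, rng=rng)
```

An exact 80:20 cut of 552 rows gives 442 training and 110 test rows in this sketch, which differs slightly from the 443/109 reported for the study; a plausible reason, stated here as an assumption, is that the original tool splits rows probabilistically rather than at an exact cut point.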

Algorithm 1 Step-by-step methodology adopted for modeling usability

4.5 Future usability model development

To obtain the essential parameters that determine the use of FOBs, three different modeling approaches (GLM, RF, and GBM) were explored in the current study to predict the future usability determinants of FOBs (binary outcomes, Yes/No) under four different contexts, using the open-source statistical programming language R (version 3.4.3). The four contexts are obstruction removal and relocation (Model 1), CCTV installation and security personnel deployment (Model 2), vertical end connectivity improvement in terms of lift/escalator/ramp installation and maintenance (Model 3) and horizontal end connectivity improvement (Model 4). The predictors selected for each model are presented in Table 9.

Table 9 Model parameters for modeling the usability of FOBs

The sample split used was 80:20; thus, 443 samples were used for training the models and 109 for testing them (Table 9).

4.6 Study hypothesis and limitations

The present study predicts the pedestrian usability of FOBs under four different contexts. Majority of the studies, which were conducted to identify the pedestrian behavioral attributes (to improve the usability of elevated facilities), were mainly focused on understanding the relevant factors attributed to the use of elevated facilities under a single context only, and those limited studies emphasized on accurate solutions for reliable estimates for policy decision. Thus, the current study tried to fill this gap. It is hypothesized here that similar to other study domain, in pedestrian research; high-end machine learning (ML) algorithms could be used to obtain accurate solution that helps in policy decision-making. Therefore, the objective of the current study was to come up with reasonably accurate solution for FOB utilization modelling for policy makers, rather than real-time choice decision-making system for FOB use using four different contexts.

Owing to advances in algorithms and hardware, researchers have been able to develop and test various ML-based algorithms to solve similar problems. However, trying and comparing all types of techniques in a single study is neither possible nor practical. Thus, in the current study, a common algorithm (GLM) as well as advanced ensemble algorithms (RF and GBM) were used to obtain an accurate solution and thereby identify the factors that influence the usability of FOBs.

4.7 Model configuration and hyper-parameters

To train the models and apply early stopping criteria, an open-source R package named “H2O” was used (H2O 2017). For training and performance testing of the models, a total of 552 samples were randomly split into an 80% (n = 443) training data set and a 20% (n = 109) test data set. To identify the best set of model parameters and to avoid unnecessary search, a randomized grid search approach (randomDiscrete) was used for the RF and GBM methods, as this helps in minimizing the computational time (Bergstra and Bengio 2012). The grid search parameters are listed in Table 10.

Table 10 Grid search parameters and early stopping criteria

A random grid search was performed to find the parameter combinations that provide the best prediction accuracy. In the random grid search, the number of trees ranged from 100 to 500 for both RF and GBM. Instead of the default mtries (in H2O, the square root of the number of variables for classification), a range of mtries was used in RF for each of the four models, as reported in Table 10. Further, the column sample rate at the tree level and the sample rate were varied from 0.5 to 1.0. The maximum tree depth was fixed at 40 for RF, while it was varied between 2 and 10 at an interval of 2 for GBM. For GBM, a learning rate of 0.01 was used.
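The idea of a random search over a discrete hyper-parameter grid can be illustrated with the short Python sketch below. It is not the H2O grid search used in the study: the step sizes within each range are assumptions (the actual values are those of Table 10), and the function name and seed are hypothetical.

```python
import random

# Hypothetical discrete grid loosely following the ranges described above;
# the step sizes are assumptions, not the values listed in Table 10.
GBM_GRID = {
    "ntrees": [100, 200, 300, 400, 500],
    "max_depth": [2, 4, 6, 8, 10],
    "sample_rate": [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "col_sample_rate_per_tree": [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "learn_rate": [0.01],
}

def sample_configurations(grid, max_models, seed=1):
    """Draw distinct random hyper-parameter combinations from a discrete grid."""
    rng = random.Random(seed)
    names = sorted(grid)
    total = 1
    for name in names:
        total *= len(grid[name])  # size of the full Cartesian grid
    seen = set()
    configs = []
    while len(configs) < min(max_models, total):
        choice = tuple(rng.choice(grid[name]) for name in names)
        if choice not in seen:  # skip combinations that were already drawn
            seen.add(choice)
            configs.append(dict(zip(names, choice)))
    return configs

candidates = sample_configurations(GBM_GRID, max_models=20)
print(len(candidates), "of", 5 * 5 * 6 * 6, "possible combinations sampled")
```

Each sampled configuration would then be trained and scored, so that only a fraction of the full grid needs to be evaluated.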

To minimize the training time and to avoid model overfitting, the AUC-based early stopping criterion was selected from among the early stopping criteria offered by the H2O package (such as misclassification, logloss, MSE, and AUC). At the grid level, the criterion was applied with the condition that if the AUROC (area under the receiver operating characteristic curve) does not improve by 0.1% over ten successive models, H2O stops further grid search. Simultaneously, a per-model early stopping criterion was applied with the condition that training stops if ten scoring rounds pass without any improvement at all (stopping tolerance = 0) in the AUC. Because of the small sample size (n = 552), tenfold cross-validation with random fold assignment was used instead of a separate validation data set to obtain better and more reliable estimates for the trained models.
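A simplified version of such a stopping rule is sketched below in Python. It only illustrates the logic of "no relative improvement above a tolerance within the last ten rounds"; H2O's own rule may differ in detail, and the function name and example scores are hypothetical.

```python
def should_stop(scores, rounds=10, tolerance=0.001):
    """Return True when the last `rounds` scores bring no relative gain above `tolerance`.

    `scores` holds a higher-is-better metric such as AUC, one entry per model
    or scoring round, in the order in which they were produced.
    """
    if len(scores) <= rounds:
        return False                     # not enough history yet
    best_before = max(scores[:-rounds])  # best score seen before the window
    best_recent = max(scores[-rounds:])  # best score inside the window
    return best_recent <= best_before * (1.0 + tolerance)

# Grid-level rule: ten models, 0.1% tolerance.
auc_history = [0.80] + [0.80] * 9 + [0.8001]
print(should_stop(auc_history, rounds=10, tolerance=0.001))  # gain below 0.1%
# Per-model rule: ten scoring rounds, tolerance 0 (any gain continues training).
print(should_stop(auc_history, rounds=10, tolerance=0.0))
```

With a tolerance of 0.1% the small gain in the last round is not enough to continue, whereas with a tolerance of 0 any gain at all keeps the training going.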

4.8 Model training and performance testing

The models were trained in the R environment with the training samples (n = 443) using the H2O package. In total, 180 models were generated using RF and 225 models using GBM for Model 1 (predicting FOB use in the context of future improvement in mobility friction). Similarly, for Model 2 (predicting FOB use in the context of enhancement in safety and security), Model 3 (predicting usability in the context of future improvement in lift/escalator/ramp) and Model 4 (predicting usability in the context of future horizontal end connectivity improvement), RF generated 90, 47 and 135 models, while GBM generated 225, 225 and 450 models, respectively. Next, all the generated models were sorted in decreasing order of AUC, and the model with the highest AUC value in each case was selected as the final optimized model. The final model summary is given in Table 11.

Table 11 Summary of the best model hyper-parameters

The optimized model results (Table 11) revealed that both RF and GBM reached their best solution using fewer than 220 trees. The RF models used a maximum depth ranging from 10 to 16, while GBM achieved the best optimized models with a maximum depth of 4 to 10. Additionally, the models trained on the different contexts used the whole range of column sample rate per tree and sample rate; thus, no distinct pattern was observed.

The measures used to assess the performance of the models are Accuracy, LogLoss, AUC (Area Under Curve), MSE (Mean Squared Error) and the Fbeta measure (a good performance measure for unbalanced classes; in this study, it was estimated only for model performance on the test data set), which are described in detail in the glossary section. Further, model prediction accuracy was estimated and reported separately. The model performance on the training data is shown in Table 12.
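For a binary outcome coded as 1 (Yes) and 0 (No), three of these measures can be computed from the predicted probabilities as in the following Python sketch. The standard textbook definitions are used; the 0.5 classification threshold and the example values are assumptions, and the study's own software may apply a different threshold.

```python
import math

def accuracy(labels, probs, threshold=0.5):
    """Share of cases whose thresholded prediction matches the observed label."""
    predictions = [1 if p >= threshold else 0 for p in probs]
    return sum(int(y == z) for y, z in zip(labels, predictions)) / len(labels)

def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the observed labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def mean_squared_error(labels, probs):
    """Mean squared difference between labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)

# Hypothetical observed outcomes and predicted probabilities of FOB use.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.4, 0.6]
print(accuracy(observed, predicted), log_loss(observed, predicted), mean_squared_error(observed, predicted))
```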

Table 12 Summary of model performance on training data

The results in Table 12 show that, for all four models, GBM performed best in comparison to GLM and RF. Also, the cross-validation estimates for each final RF and GBM model (Table 13) showed that the AUC ranged between 0.72 and 0.98 for RF and between 0.72 and 0.97 for GBM, which indicated good overall model prediction for both approaches.

Table 13 Summary of tenfold cross-validation mean estimates of final models

Further, to obtain model performance on unseen data, the final models were tested on the remaining 20% (n = 109) test data set. Table 14 shows the model performance summary on the test data set.

Table 14 Summary of model performance estimated on the test dataset

The performance summary in Table 14 revealed that the optimized GBM models (Fbeta measure: 0.61–0.97) performed best overall in comparison to the GLM (Fbeta measure: 0.51–0.93) and RF (Fbeta measure: 0.54–0.97) models on the same test data.

4.9 Applications of advanced soft computing techniques in transportation and its comparison with the current study

Applications of different advanced soft computing techniques in the transportation engineering domain are summarized in Table 15. The results of the present study highlighted that GBM could be one of the best choices for modeling the usability of pedestrian FOBs. The current GBM models showed a classification power ranging from 77.42 to 97.80%. In past studies (refer to Table 15), better prediction accuracy was obtained using boosting-based algorithms in different study domains. For example, the study by Ha et al. (2019) revealed that GBM could predict travel mode choice behavior with 95.1% accuracy. Similarly, Mousa et al. (2018) modeled lane-changing behavior with an advanced boosting-based algorithm called XGBoost and achieved a highly accurate model (99.7% accuracy). The other studies listed in Table 15 also showed the effectiveness of boosting-based algorithms in different study domains.

Table 15 Different studies in transportation sector using advanced soft computing techniques

Compared with other domains, the use of such advanced algorithms in pedestrian research is limited. Hence, the current study tried to fill this gap and showed the effectiveness of such algorithms in pedestrian research, where they could act as a better alternative when model quality (an accurate model) is the main goal.

5 Variable importance analysis

In the modeling process, variable importance was further estimated for each optimized model using the GBM method. The importance was estimated by calculating the relative influence of each variable, which depends on whether the variable was selected for splitting during the tree-building process and on how much the squared error decreased as a result. The variable importance obtained from each final selected model is illustrated in Fig. 6. The scaled importance values, ranging from low (0) to high (1), were arranged in descending order to identify the most important factors influencing FOB use.
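The scaling and ranking step can be illustrated with a few lines of Python. The sketch assumes that the scaled importance is the relative importance divided by the largest relative importance, so that the top variable receives a value of 1; the variable names and numbers are hypothetical and are not results of the study.

```python
def scaled_importance(relative_importance):
    """Scale relative importances by the largest one and sort in descending order."""
    top = max(relative_importance.values())
    scaled = {name: value / top for name, value in relative_importance.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)

# Hypothetical relative importances (e.g., summed squared-error reductions).
relative = {"age": 10.0, "stair_width": 40.0, "gender": 5.0, "daily_frequency": 30.0}
for name, value in scaled_importance(relative):
    print(f"{name:16s} {value:.3f}")
```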

Fig. 6 Variable importance plot (scaled importance)

5.1 Future use concerning mobility friction improvement

In the future usability model corresponding to mobility friction (Model 1: obstruction), the importance plot revealed that stair width and daily frequency of use were the most crucial factors influencing the use of FOBs (Fig. 6, plot A). Usually, narrow stair width and high pedestrian flow increased the mobility friction. The presence of vendors, beggars and standing pedestrians also reduced the effective walkway width, which further reduced the use of the FOBs; this was in accordance with the findings reported by Pasha et al. (2015). Feedback obtained from respondents during the survey (refer to Table 6) also confirmed that the mobility friction was land-use specific and occurred mainly at PTT, commercial and residential locations. A study by Saha et al. (2011) confirmed that in Central Business District (CBD) and PTT locations, the presence of unwanted people discouraged the use of overpasses. It was further noticed that the type of location, age, existing walk environment, and security also played a vital role in usability choice. Although the presence of people (vendors, standing pedestrians, and beggars) usually gave a sense of safety and security, this might not be true under all scenarios. At night, the presence of vendors and beggars might not encourage pedestrians to use FOBs because of the prevalence of illegal activities and fear of victimization (see feedback in Table 6). The study by Malik et al. (2017) highlighted a similar concern: people felt insecure in the presence of many beggars and shops.

5.2 Usability concerning safety and security

Perceived safety and security is one of the most prominent factors related to overpass use (Räsänen et al. 2007). The future usability prediction model (Model 2) concerning perceived safety and security (i.e., CCTV installation and security personnel deployment) revealed walk environment (i.e., the facility surroundings and whether they are pleasant or not) and gender as the most crucial predictors influencing the use of FOBs across Indian cities (see Fig. 6, plot B). The significant determinants of safety and security differed with the time at which pedestrians used the facility. In the daytime, safety and security concerns arose among people in highly crowded areas (such as PTT), where pickpocketing and theft were frequently noticed (see feedback in Table 6). At night, the security-related issues (such as the walk environment being uncomfortable because of illegal activities) were primarily gender-specific (related to female pedestrians) and deterred them from using the FOBs (Malik et al. 2017; Pasha et al. 2015). Moreover, the absence of CCTV and security personnel at PTT and commercial locations (see feedback in Table 6) demotivated users and increased the perceived fear of victimization. Insufficient security in terms of the absence of lighting (Malik et al. 2017; Pasha et al. 2015), the perceived risk of getting robbed (Hasan and Napiah 2014; Malik et al. 2017; Villaveces et al. 2012) and criminal activities (Villaveces et al. 2012; Saha et al. 2011) were previously found to deter the use of FOBs among pedestrians. Past studies, as well as the concerns expressed by the current respondents, also support the fact that the most common measures to enhance safety and security are the provision of proper lighting (Hasan and Napiah 2014; Malik et al. 2017; Villaveces et al. 2012; Pasha et al. 2015) and removal of advertisement banners (Koepsell et al. 2002; Malik et al. 2017; Oviedo-Trespalacios and Scott-Parker 2017). Moreover, the provision of surveillance systems such as CCTV cameras along with proper placement of security personnel (Gallegos 2012) and stricter laws (Hidalgo-Solórzano et al. 2010; Sabet 2013) also strengthens the safety perception among pedestrians and motivates them to use the facility frequently.

5.3 Usability concerning vertical end connectivity (lift/escalator/ramp)

The variable importance obtained from Model 3 (predicting usability concerning vertical end connectivity) illustrated that future usability was highly related to design-related parameters such as steepness of stairs (Mutto et al. 2002; Sabet 2013; Saha et al. 2011), narrow width of stairs (Hasan and Napiah 2014) and absence of escalators/ramps (Desriani and Komardjaja 2008; Hasan and Napiah 2014; Räsänen et al. 2007; Rizati et al. 2013), which caused discomfort to the users (Sharples and Fletcher 2001; Hasan and Napiah 2014, 2018; Saha et al. 2011). These reported facts also support the findings obtained from Model 3 (refer to Fig. 6, plot C), where the dimensions of the stairway (i.e., number of steps, tread dimension, width, and riser dimension) and comfort played a significant role in deciding the future use of the FOBs. Further, location type, i.e., where the facility is situated (see Fig. 6, plot C), also played a pivotal role in the choice of use. This finding is consistent with previous studies, where researchers reported that the usability of FOBs in commercial and educational areas was significantly higher than in residential and shopping areas (Desriani and Komardjaja 2008; Hasan and Napiah 2017; Räsänen et al. 2007; Rizati et al. 2013). Further, the age of the respondent was found to be a prevalent determinant, as with increasing age pedestrians felt uncomfortable putting in the extra effort to climb stairs (Rankavat and Tiwari 2016). Provision of short stairs, ramps (Desriani and Komardjaja 2008) and lifts/escalators (Demiroz et al. 2015; Hasan and Napiah 2018; Räsänen et al. 2007), both when designing new elevated facilities and on existing FOBs, would reduce extra effort and enhance comfort, leading to an increase in the use of FOBs.

5.4 Usability concerning horizontal end connectivity

The future usability concerning horizontal end connectivity (Model 4) was mainly influenced by the length of travel, comfort, daily frequency of use and age of the pedestrian (see Fig. 6, plot D). The traveled length was found to be the most influential factor in deciding the use of the FOBs. Mutto et al. (2002) highlighted that extra traveled distance negatively influenced pedestrian behavior while choosing a crossing facility. Other studies used similar predictors, but instead of traveled distance they used travel time as a function of the covered distance, which was also directly related to perceived comfort. Past studies identified that when the traveled length increased significantly compared to at-grade facilities and the time needed to cover the distance was more than 50% longer (Anciaes and Jones 2018; Hasan and Napiah 2014; Malik et al. 2017; Rankavat and Tiwari 2016; Wu et al. 2014), people tried to avoid FOBs and used the nearest illegal exits available, attempting to cross through median openings or jumping over fences (Demiroz et al. 2015; Desriani and Komardjaja 2008). This indicated that additional length and detour distances would discourage pedestrians from FOB use. The current model findings also revealed that pedestrians’ perceived comfort was another important factor. Comfort was derived from whether proper horizontal end connectivity gave the pedestrians easy and direct access to their destination. In this regard, proper signboards mentioning the connecting locations (i.e., entry and exit location names) would likely give pedestrians initial information about their destination and provide comfort to existing as well as new users (Desriani and Komardjaja 2008).

6 Sensitivity analysis

To understand the robustness of the developed models, a sensitivity analysis was carried out. In the sensitivity analysis, the top parameter of context 1 (refer to Fig. 7), i.e., stair width, was varied between 0.66 and 3.74 m with a base value of 2.22 m (i.e., varied between −70% and +70%). The other parameters, from daily frequency to gender (refer to Fig. 6, plot A, Model 1), were fixed as per their frequency in the questionnaire survey.
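A one-at-a-time sensitivity sweep of this kind can be sketched in Python as follows. The response function below is a purely hypothetical logistic placeholder and is NOT the fitted GBM of the study; only the sweep range (0.66 to 3.74 m) is taken from the text, while the number of steps and all coefficients are assumptions.

```python
import math

def hypothetical_use_probability(stair_width_m, other_effects=-1.0, slope=1.2):
    """Placeholder logistic response; stands in for a trained model's prediction."""
    return 1.0 / (1.0 + math.exp(-(other_effects + slope * (stair_width_m - 1.5))))

def sweep(predict, low, high, steps):
    """Evaluate `predict` on `steps` evenly spaced values between low and high."""
    width = (high - low) / (steps - 1)
    grid = [low + i * width for i in range(steps)]
    return [(x, predict(x)) for x in grid]

# Vary stair width only; all other predictors stay fixed inside the placeholder.
curve = sweep(hypothetical_use_probability, low=0.66, high=3.74, steps=8)
for stair_width, probability in curve:
    print(f"{stair_width:.2f} m -> {probability:.3f}")
```

In an actual analysis, `predict` would wrap the trained model, with the remaining predictors held at their fixed survey values.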

Fig. 7 Sensitivity analysis for Model 1 (Context: Mobility Friction)

The sensitivity results for Model 1 (Fig. 7) showed that, for both genders and age groups (23–59 years), usability starts increasing when the stair width is above 1.5 m, mainly because pedestrians feel more comfortable using wider FOBs (above 1.5 m). Most stairways are open to bi-directional movement, and under Indian conditions the average shoulder depth is considered to be 60 cm (Singh et al. 2016a, b); hence, when two pedestrians are moving in opposite directions, a stair width above 1.5 m is preferable. Further, the usability of FOBs increases most for young males.

Similarly, in the sensitivity analysis for context 2 (i.e., safety and security), two perception scenarios (good and bad) were tested to understand the sensitivity to gender, which is reported as one of the important predictors. In the good scenario, pedestrians use the facility twice a day, believe that safety and security are satisfactory, and are subjected to few obstructions. In the bad scenario, pedestrians use the facility occasionally, believe that safety and security are poor, and encounter many obstructions. Both scenarios represent FOBs situated in a commercial zone.

The result of the sensitivity test for Model 2 (refer to Fig. 8) showed that when pedestrians had a bad perception or experience of the facility, usability dropped compared to the good-perception scenario, and the magnitude of the reduction was larger for older people.

Fig. 8 Sensitivity analysis for Model 2 (Context: Safety and Security)

Sensitivity tests for Model 3 (vertical connectivity) and Model 4 (horizontal connectivity) were also conducted. For vertical connectivity, it was found that as the number of steps increases, pedestrians need to put in more effort while climbing up as well as down, which eventually demotivates them from using the facility; the highest impact is observed on the choice of aged pedestrians. Similarly, the sensitivity analysis of horizontal connectivity revealed a negative relationship between length and facility use: when the travel length becomes significantly longer than the at-grade crossing distance, pedestrians might be reluctant to use the grade-separated facilities.

7 Conclusion

In the current study, information was obtained through an interviewer-administered questionnaire survey and field measurement sessions near fourteen overpass or Foot Over Bridge (FOB) locations under different land-use types (commercial, residential, educational, and public transport terminal) across six Indian cities. In total, 552 valid survey samples were collected from the respondents. The analysis revealed that the majority of pedestrians using the FOBs were young (13–45 years), regular users (using the facility twice or more daily) and mostly male (~ 65–75%). The occupation of the users further influenced the preference of use: students and working professionals were more likely to use the FOBs than people in other occupations.

In the present study, machine learning techniques, namely the generalized linear model (GLM), random forest (RF) and gradient boosting machine (GBM) algorithms, were compared to find the optimal solution that accurately predicts the primary factors affecting the use of FOBs. Among these techniques, GBM outperformed the other two in terms of prediction performance on the unseen (test) data set for identifying the essential parameters affecting pedestrians’ choice of using the elevated facility under four different contexts (i.e., mobility friction, safety and security, vertical end connectivity and horizontal end connectivity). The major conclusions drawn from this study are as follows:

(a) One of the most crucial factors that decided the usability was safety and security, which was gender-specific and depended on the existing walk environment. The feedback provided by the respondents revealed that the unavailability of CCTV cameras and security personnel, along with the prevalence of antisocial activities, were significant concerns that influenced the usability choice.

(b) The age of the pedestrian played a significant role in the choice of FOBs when the decision was derived regarding ease in vertical movement (climbing stairs) and when longer travel distance (length) was a matter of concern.

(c) Gender acted as a significant influencer when the usability choice was solely dependent on perceived safety and security.

(d) Sensitivity analysis showed that stair width above 1.5 m increased the usability preference of pedestrians.

(e) An overall good perception/experience regarding the facility increased the usability preference.

(f) As the number of steps to climb increased, the usability preference decreased among aged pedestrians.

(g) The design parameters of FOBs, such as the number of stairs to climb, stair width, width of the walkway and length of the FOB, were associated with the perceived comfort and also determined whether a pedestrian would choose the facility. The provision of stairs with short risers and escalator/lift/ramp would enhance the comfort of pedestrians and motivate them to use FOBs more frequently.

(h) The feedback provided by the users revealed that apart from improving security, removing obstructions and installing lifts/escalators also affect the decision of FOB use. The respondents further expressed their concern regarding an immediate need for proper lighting and shade, along with regular maintenance of the facilities, to increase the future usability.

8 Advantages of boosting ML model in FOB research

The use of advanced soft computing models (boosting-based algorithms) reduces the prediction uncertainty in pedestrian research. The boosting-based algorithm works well compared to RF- or GLM-based algorithms, as the nodes in every tree take a different subset of features for making the best split, which makes the trees uncorrelated. Additionally, each new tree considers and corrects the errors made by the previous trees, which helps in achieving better prediction accuracy.
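How each new tree corrects the errors of the previous ones can be illustrated with a deliberately simplified Python sketch. It uses one-split regression stumps and a squared-error loss on a single predictor, which is a simplification and not the H2O GBM configuration of the study; the data, function names, number of trees, and learning rate are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the one-split regression stump that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides of the split non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_trees=50, learn_rate=0.1):
    """Fit stumps sequentially, each one to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]  # current errors
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [p + learn_rate * (left_mean if x <= threshold else right_mean)
                       for x, p in zip(xs, predictions)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, history

# Hypothetical data: stair width (m) and whether the FOB would be used (1/0).
widths = [0.8, 1.0, 1.2, 1.6, 2.0, 2.4, 3.0, 3.4]
used = [0, 0, 0, 1, 0, 1, 1, 1]
base, stumps, history = boost(widths, used)
print(f"training MSE after 1 tree: {history[0]:.4f}, after 50 trees: {history[-1]:.4f}")
```

The training error shrinks as trees are added, which is why early stopping and cross-validation are needed to guard against overfitting.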

The findings of these models would ultimately provide researchers, planners, and policymakers with new insight into factors associated with the use of FOBs. The outcomes of this study could be used to improve the existing users’ experience (by improving the existing facilities or constructing better elevated facilities in the future), which could encourage them to use the facility more often and attract new pedestrians to use FOBs. To achieve these goals, it is also essential to make the facilities more attractive and user-friendly, so that they provide pedestrians with a safe and comfortable crossing experience. Attracting new pedestrians and enabling existing users to keep using the facility could indirectly reduce at-grade road crossing risk. Further, awareness campaigns and strict enforcement are useful tools to eliminate accidents due to illegal road crossing.

9 Study limitations and future scope

Some of the significant challenges in the current study were low response rate, duration of data collection (i.e., restricted to a single day), language diversity, and the number of locations covered.

In the future, the research can be extended by considering a broader population and covering more types of locations (e.g., institutional and recreational areas) across other Indian cities under different climatic conditions. Additionally, studies can be carried out to compare at-grade and grade-separated pedestrian facilities (FOBs or subways), using both questionnaire and videography survey techniques.

9.1 Glossary

Area Under Curve (AUC) Performance measure used for binary classifiers. Values typically range from 0.5 to 1.0; the higher the value, the better the model performance. It is obtained from the plot of true positive rate against false positive rate (the ROC curve).
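As an illustration, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes the AUC through this pairwise interpretation; the function name and example values are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = would use the FOB) and predicted probabilities.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```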

Cross-Validation (CV) In cross-validation, the total training sample is divided into k blocks; each of the k blocks is used in turn as a validation data set, and the rest are used for training. The process repeats k times, with a different part of the training data set being the validation set each time. The error of the final model is obtained as the average value over all k models.
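The fold construction can be sketched in Python as follows, here with k = 10 and 443 training samples as in the present study. The random assignment, the seed, and the function name are assumptions for illustration and do not reproduce H2O's fold assignment.

```python
import random

def k_fold_indices(n_samples, k=10, seed=7):
    """Return k (training, validation) index pairs with random fold assignment."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal blocks
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits

splits = k_fold_indices(443, k=10)
print([len(validation) for _, validation in splits])
```

Every sample appears in exactly one validation block, and the k validation errors are averaged to give the cross-validated estimate.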

Pruning Pruning is a technique that chops off sections of a tree that are not very powerful in classifying examples; it avoids overfitting and helps in improving model prediction accuracy.

Gini Index In tree-based models, the Gini Index is used as one of the common impurity measures in feature splitting (Fawagreh et al. 2014), as shown in Eq. 2. The lower the Gini value, the higher the purity of the split.

$$ {\text{Gini}}\left( t \right) = 1 - \mathop \sum \limits_{i = 1}^{N} P\left( {C_{i} /t} \right)^{2} $$
(2)

where t is a condition, N is the number of classes in the data set, and \( C_{i} \) is the ith class label in the data set.
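As a small worked illustration of Eq. 2, the Gini value of a node can be computed from the class labels it contains, taking the class probabilities as the observed class shares in that node. The Python function name and the example labels below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: one minus the sum of squared class shares in the node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["Yes", "Yes", "Yes", "Yes"]))  # pure node
print(gini(["Yes", "No", "Yes", "No"]))    # evenly mixed binary node
print(gini(["Yes", "Yes", "Yes", "No"]))   # mostly one class
```

A pure node gives a value of 0, while an evenly mixed binary node gives the maximum value of 0.5.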

F-measure The F-measure is defined as the weighted harmonic mean of its precision and recall (Buckland and Gey 1994; Powers 2007; Yu-Wei 2015; Golakiya et al. 2019), shown in Eq. 3. A value closer to one indicates a better performing classification model. The measure for precision and recall is illustrated as a binary contingency table (refer to Table 16).

Table 16 Binary contingency table

where Precision and Recall can be defined as:

Precision = \( \frac{A}{A + B} \) and Recall = \( \frac{A}{A + C} \)

Similarly,

$$ F_{\text{measure}} = \, 2 \, *\frac{{{\text{Precision}}*{\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(3)
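The measure can be computed from the contingency counts as in the Python sketch below. From the definitions of precision and recall given above, A is taken to be the number of true positives, B the false positives and C the false negatives. The general Fbeta form is the standard textbook definition, which reduces to Eq. 3 for beta = 1; the value of beta used in the study is not stated here, so the default below is an assumption, as are the example counts.

```python
def precision_recall(a, b, c):
    """Precision A/(A+B) and recall A/(A+C) from the contingency counts."""
    return a / (a + b), a / (a + c)

def f_beta(a, b, c, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta = 1 gives Eq. 3)."""
    precision, recall = precision_recall(a, b, c)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

# Hypothetical counts: 6 true positives, 2 false positives, 6 false negatives.
print(precision_recall(6, 2, 6))
print(round(f_beta(6, 2, 6), 3), round(f_beta(6, 2, 6, beta=2.0), 3))
```

A beta above 1 weights recall more heavily than precision, which lowers the score in this example because recall is the weaker of the two.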