1 Introduction

Transit trip planners are among the most important web applications in advanced public transportation systems (APTS) for disseminating transit information and increasing public transit ridership. Recent research on transit trip planners has focused primarily on three areas. The first is the design of trip planning algorithms (also known as transit path finding algorithms). The second is trip planner system architecture design. For example, Sun et al. (2011) used a service-oriented architecture to create a web-based transit trip-planning system, and Cherry and Hickman (2006) detailed a transit itinerary planner built on Arc Internet Map Server (ArcIMS). The third area is transit data model development. For example, to search efficiently for optimal transit paths, Huang and Peng (2002a) developed an object-oriented geographic information system (GIS) data model to support their proposed trip planning algorithms. Although system architecture and data models are both important, trip planning algorithms are the fundamental component of transit information services, because they carry out the trip planning and produce the optimal routes.

In practice, Google Maps provides a powerful trip planning tool that considers various travel modes; however, few technical or white papers document how it works. OneBusAway (2015) is another well-known trip itinerary planner, designed primarily for surface public transportation in five major US cities. Documents and papers discussing OneBusAway have focused on improving transit attractiveness by providing real-time transit information (Ferris et al. 2010; Brakewood et al. 2014, 2015). OpenTripPlanner, a leading open source trip planner that relies on open data standards such as the General Transit Feed Specification (GTFS) for transit schedule data, was designed for multimodal trip planning (OpenTripPlanner 2016). In academia, most previous research on trip planning has relied on mature graph theory and graph-based shortest path algorithms, because transit networks can be simplified as static networks consisting of nodes, arcs, and weights/costs. These algorithms include the Floyd–Warshall algorithm, Dijkstra’s algorithm, and the A* search algorithm (Leiserson et al. 2009). Of these, Dijkstra’s algorithm is the most popular (Huang and Peng 2002b) because it is less computationally intensive and well suited to online and real-time system implementation. Beyond these conventional graph-based algorithms, recent studies (e.g. Fu and Rilett 1998; Chen and Ji 2005; Chen et al. 2013) have developed reliability-based path finding algorithms. However, these algorithms were primarily used to find the shortest or most reliable paths in freeway or arterial networks. Transit network operations are based on timetables, as opposed to the unconstrained flow of vehicles on freeway or arterial networks, so transit path finding algorithms need to consider transit schedule information; without it, they may not represent reality.
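Dijkstra’s algorithm, the most popular of these, can be sketched as follows on a hypothetical toy network in which stops are nodes and mean link travel times (in minutes) are non-negative arc weights. The network and the function name are illustrative assumptions, not taken from any cited planner; Python’s standard `heapq` module serves as the priority queue.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path cost from source to every reachable node.

    graph maps each node to a list of (neighbor, cost) pairs; costs must be
    non-negative (here, hypothetical mean link travel times in minutes).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a smaller cost
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical toy network: origin O, destination D, intermediate stops A and B.
toy = {
    "O": [("A", 4.0), ("B", 2.0)],
    "B": [("A", 1.0), ("D", 7.0)],
    "A": [("D", 3.0)],
}
costs = dijkstra(toy, "O")
print(costs)  # {'O': 0.0, 'A': 3.0, 'B': 2.0, 'D': 6.0}
```

This baseline uses fixed arc weights only; it has no notion of timetables or travel time variability.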

To incorporate transit information (especially schedules) into the determination of optimal trip paths, headway-based and schedule-based trip planning algorithms have been proposed (Fu et al. 2012; Nuzzolo and Crisalli 2004; Schmöcker 2006). Most headway-based algorithms were derived from graph-based shortest path algorithms and perform best under frequent-service conditions (Huang and Peng 2002b), while schedule-based algorithms depend heavily on transit schedules. Both types are usually built on transit assignment models. For example, Wong and Tong (1998) proposed a three-stage schedule-based network model to estimate time-dependent transit origin–destination matrices and claimed that their model also generates route guidance information for passengers. Friedrich et al. (2001) used branch and bound techniques to assign transit flows on a schedule-based network; their model could potentially be used for trip planning. Most transit assignment models were proposed to address transit assignment issues and rely on heuristic algorithms to obtain numerical solutions. Reaching optimal or near-optimal solutions generally requires a large number of iterations, so computation time grows significantly with the number of iterations and the network size. Consequently, the original transit assignment models (which seek network-level transit equilibrium) may be unsuitable for finding optimal paths in online or real-time applications.

The above-mentioned transit trip planning algorithms simplify transit networks as graphs and do not consider transit performance measures (such as on-time performance), so several assumptions may be required to enable their use. For example, Huang and Peng (2002b) listed five assumptions for their models: “(1) there is no congestion in the transit system; (2) bus is the only transit mode; (3) buses run on time; (4) transfers only take place at nodes in the path-searching process; and (5) walking time for transfer at a node is constant.” Although such assumptions are generally reasonable, trip planning models should ideally require as few simplifications as possible. Here, assumptions (1) and (3) could be replaced by incorporating the probability of on-time arrival at stops, and assumption (5) could be addressed by assigning a reasonable probability distribution to walking time. Assumption (4) implies that the only type of transfer is bus-to-bus at a single transit stop.

Therefore, the main objectives of our study are (1) to propose a transit path finding algorithm suitable for online and real-time applications; and (2) to avoid assumptions such as those listed above by incorporating not only transit trip travel time but also a probability-based, link-level reliability measure. Walking is also considered in the trip planning process to account for passengers who walk between transit routes. Our study thus expands travelers’ transfer options to include not only bus-to-bus transfers at a single stop but also walking transfers between two different stops. Although both streetcars and buses operated in our study region, only one streetcar route existed, so only buses were considered. The proposed model finds optimal paths by considering both travel time and travel time reliability, and it can also help travelers determine a suitable buffer time before departure.

This paper begins by detailing a chance constrained model that incorporates a link-level transit reliability measure, followed by a numerical method for solving the model. Network construction and the relevant transit measures are then introduced, followed by the results of three groups of experiments. The paper ends with conclusions and recommendations regarding the proposed model and its implementation.

2 Modeling framework

2.1 Chance constrained model

Many factors affect bus operation, including passenger boarding, traffic conditions, and traffic signal configurations, so bus travel time is generally difficult to estimate. In addition, passenger-experienced transit travel time usually includes bus travel time, walking time (either between transit stops or between the origin/destination and stops), and waiting time at stops. The latter two components add complexity and uncertainty to the estimation of passenger-experienced transit travel time. To handle this uncertainty, chance constrained programming was used; it expresses the constraints of a mathematical programming model as probability levels that must be attained, allowing decision makers to consider objectives in terms of the probability of their attainment (Olson and Wu 2010). In our research, travel time uncertainty is addressed by extending the deterministic linear programming model to a stochastic model with chance constraints that guarantee the probability of arriving at the destination on time. For simplicity, the bus travel time between two adjacent transit stops is referred to as link travel time, and the travel time from the origin to the destination is referred to as path travel time. Path travel time includes the link travel times along the path.

For stochastic and time-dependent transit networks, the objective function is path travel time, which includes link travel time, transfer time (waiting time at stops), and walking time. However, few universally valid models of bus link travel time in urban environments have been developed, since buses are significantly affected by traffic conditions, roadway conditions, traffic signal control, company policies, etc. (Acer et al. 2012). A nonparametric method for estimating the probability distribution can therefore provide greater flexibility and fidelity with fewer assumptions. For details about the estimated travel time distributions, we refer to our previous research (Yang and Wu 2016). The calculation of link travel time is described in Sect. 3.1. The assumptions made for this study are listed below, and the notation used in the model is given in Table 1.

Table 1 Notation
  1.

    The modes studied here were bus and walking. Walking was considered only for transfers between buses.

  2.

    Link travel time, transfer time, and walking time between any consecutive stops were treated as random variables. Their interdependency was not considered; that is, they were assumed to be independent.

  3.

    Bus transfer time was assumed to follow a uniform distribution, with lower and upper limits determined by bus timetables.

  4.

    Walking time was computed from the distance between two nodes and the walking speed, which was assumed to follow a Normal distribution \( N(\mu , \sigma^{2} ) \). Reasonable values of \( \mu \) and \( \sigma \) are 1.35 and 0.2 m/s, respectively (Chandra and Bharti 2013). Since the first and second moments of the reciprocal of a Normal random variable do not exist, the mean walking time was estimated by \( distance/\mu \), and the standard deviation of walking time by the estimator \( distance \cdot \left(\frac{1}{\mu - 0.67449\sigma } - \frac{1}{\mu + 0.67449\sigma }\right)/\lambda \), with \( \lambda = 1.34898 \) in our study. This value equals the interquartile range of the standard Normal distribution (2 × 0.67449), so the estimator converts the interquartile range of walking time into a standard deviation.
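The moment calculations in assumptions 3 and 4 can be sketched as follows. Speeds are in m/s, so times are in seconds. The function names, the 400 m walking distance, and the 0 to 600 s transfer window (a wait bounded by a 10-min headway) are hypothetical inputs chosen for illustration, not values from the study.

```python
# Walking-speed parameters from assumption 4 (m/s); times are in seconds.
MU_SPEED = 1.35
SIGMA_SPEED = 0.2
Z_Q3 = 0.67449        # upper-quartile point of the standard Normal
LAMBDA = 2 * Z_Q3     # 1.34898, the interquartile range of the standard Normal


def walking_time_moments(distance_m, mu=MU_SPEED, sigma=SIGMA_SPEED):
    """Mean and IQR-based SD estimate of walking time (s) over distance_m metres."""
    mean = distance_m / mu
    # Walking time falls as speed rises, so the speed quartiles mu -/+ Z_Q3*sigma
    # map to the walking-time quartiles; their spread divided by LAMBDA
    # estimates the SD without needing moments of 1/speed.
    upper_quartile = distance_m / (mu - Z_Q3 * sigma)
    lower_quartile = distance_m / (mu + Z_Q3 * sigma)
    return mean, (upper_quartile - lower_quartile) / LAMBDA


def uniform_transfer_moments(lower_s, upper_s):
    """Mean and SD (s) of a transfer time assumed Uniform(lower_s, upper_s)."""
    mean = (lower_s + upper_s) / 2
    sd = (upper_s - lower_s) / 12 ** 0.5
    return mean, sd


# Hypothetical inputs: a 400 m walking transfer and a wait within a 10-min headway.
walk_mean, walk_sd = walking_time_moments(400)
wait_mean, wait_sd = uniform_transfer_moments(0, 600)
print(f"walk: {walk_mean:.1f} s, SD {walk_sd:.1f} s; wait: {wait_mean:.0f} s, SD {wait_sd:.1f} s")
```

These per-link means and standard deviations are the quantities that the path-level sums in the chance constraint consume.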

For this stochastic network, the objective of minimizing expected total travel time can be expressed as shown in Eq. (1).

$$ \hbox{min } ET = \mathop \sum \limits_{{\left( {i,j} \right) \in A, m \in M}} x_{ijm} E_{t,ijm} $$
(1)

The standard flow balance constraints in Eq. (2) ensure that the solution forms a feasible path from origin O to destination D.

$$ \sum\limits_{j:\left( {i,j} \right) \in A}\sum\limits_{m \in M} x_{ijm} - \sum\limits_{j:\left( {j,i} \right) \in A}\sum\limits_{m \in M} x_{jim} = \left\{ \begin{array}{ll} 1, & \text{if } i = O \\ 0, & \text{if } i \notin \{ O, D\} \\ -1, & \text{if } i = D \end{array} \right. $$
(2)

The chance constraint is introduced here to guarantee that the on-time arrival probability is equal to or greater than a predefined confidence level \( c \in [0, 1] \). The choice of confidence level reflects a user’s attitude towards arriving at the destination within the anticipated travel time: higher confidence levels mean that on-time arrival weighs more heavily in the user’s route choice, while lower confidence levels mean greater acceptance of the risk of being late.

$$ P\left\{ {ET \le (AT - CT)} \right\} \ge c $$
(3)

Travel time uncertainties are typically represented by probability distributions. If the travel times between consecutive stops are independent, the mean and variance of path travel time are the sums of the means and variances of the links on the path, and by the Central Limit Theorem the path travel time approximately follows a Normal distribution. Hence, following Li et al. (2013), the chance constraint can be rewritten as the following equivalent deterministic constraints, where \( z_{c} \) is the standard Normal quantile associated with confidence level \( c \).

$$ \sum\limits_{{\left( {i,j} \right) \in A,m \in M}} {x_{ijm} E_{t,ijm} } + z_{c} \sqrt {\sum\nolimits_{{\left( {i,j} \right) \in A,m \in M}} {x_{ijm} } (SD_{t,ijm} )^{2} } \le AT - CT $$
(4)
$$ x_{ijm} \in \left\{ {0,1} \right\} $$
(5)
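The reduction from Eq. (3) to a deterministic check can be illustrated numerically: for a Normal path travel time with mean μ and standard deviation σ, the requirement P{T ≤ AT − CT} ≥ c holds exactly when μ + z_c σ ≤ AT − CT. The sketch below assumes a path is given as a list of (mean, SD) link pairs in minutes and takes z_c = Φ^{-1}(c), the one-sided standard Normal quantile; that convention, the function names, and the example path are our assumptions, since the notation table is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def path_moments(links):
    """Mean and variance of path travel time from (mean, SD) link pairs.

    Means always add; variances add because links are assumed independent.
    """
    mean = sum(m for m, _ in links)
    variance = sum(sd ** 2 for _, sd in links)
    return mean, variance


def anticipated_travel_time(links, c):
    """Smallest budget B with P{T <= B} >= c when T ~ Normal(mean, variance).

    Assumes z_c = Phi^{-1}(c) (one-sided); c must lie strictly in (0, 1).
    """
    mean, variance = path_moments(links)
    z_c = NormalDist().inv_cdf(c)
    return mean + z_c * sqrt(variance)


def satisfies_chance_constraint(links, c, budget):
    """Deterministic check of Eq. (3), with budget = AT - CT."""
    return anticipated_travel_time(links, c) <= budget


# Hypothetical three-link path, (mean, SD) in minutes.
path = [(10.0, 2.0), (5.0, 1.0), (8.0, 3.0)]
print(round(anticipated_travel_time(path, 0.5), 2))    # 23.0: z_c = 0 gives the mean
print(round(anticipated_travel_time(path, 0.995), 2))  # about 32.64
print(satisfies_chance_constraint(path, 0.995, 30.0))  # False
```

Because the square root couples all links on the path, the check is applied to whole candidate paths rather than link by link.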

Although only bus and walking modes were considered in our study, the chance constrained model should be flexible enough to accommodate other travel modes and types of uncertainty. For instance, additional transport modes (e.g. metro trains and automobiles) could be integrated into a transit network by merging the individual mode networks and connecting them through shared or overlapping nodes. Different probability distributions could also be included in the chance constrained model. Moreover, because the model reduces to a shortest path problem with one additional constraint, it is relatively simple to implement on top of existing efficient shortest path algorithms.

2.2 Solution method

At this point, the chance constrained model has been transformed into a classic shortest path problem with an additional upper limit on travel time. Numerous algorithms have been developed for this problem category in static and stochastic networks. Classic shortest path algorithms such as those of Dijkstra, Bellman, and Dreyfus address networks with deterministic arc weights, while exact and heuristic algorithms have recently been proposed for time-dependent networks (Ji et al. 2011; Liu et al. 2014). All of these algorithms seek a single optimal or near-optimal path, which limits the alternatives offered to travelers.

In practice, several paths may have the same objective value and satisfy the confidence level requirement, so it is preferable to rank several good options from which passengers can choose according to their own preferences. Ranking paths in increasing order of length is known as the k-shortest path problem, a natural and long-studied generalization of the shortest path problem (Hershberger et al. 2007). The problem was originally examined by Hoffman and Pavley (1959), but nearly all early attempts to solve it led to exponential time algorithms (Hershberger et al. 2007). The best-known algorithm is due to Yen (1971); implemented with modern data structures, it runs in \( O(kn(m + n\log n)) \) worst-case time for a network with \( n \) nodes and \( m \) links, essentially performing \( O(n) \) single-source shortest path computations for each output path. Based on these considerations, a k-shortest path framework was used in the experiments. In each of the k iterations, the feasibility of constraint (4) is checked, and a path that does not satisfy it is not stored as a solution. The framework is shown in Fig. 1. In iteration k, the spur path S is the shortest path from the spur node \( \nu \) to the destination; the spur node \( \nu \) is taken from the previous shortest path (k − 1), and the corresponding root path R is the node sequence from the origin to \( \nu \).

Fig. 1 K-shortest path solution framework
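A compact sketch of the framework in Fig. 1, assuming Yen’s algorithm with mean link travel time as the arc weight and the deterministic form of the chance constraint (mean + z_c·SD ≤ AT − CT) as a feasibility filter. The graph layout `{node: {neighbor: (mean, sd)}}`, the names `yen_feasible` and `budget` (standing for AT − CT), and the one-sided quantile z_c = Φ^{-1}(c) are our assumptions, not details from the paper. Every generated path is kept internally so that spur generation stays correct, but infeasible paths are not returned as solutions.

```python
import heapq
from math import sqrt
from statistics import NormalDist


def shortest_path(graph, source, target, banned_edges, banned_nodes):
    """Dijkstra on mean link travel time, skipping banned edges and nodes."""
    heap = [(0.0, source, [source])]
    settled = set()
    while heap:
        cost, u, path = heapq.heappop(heap)
        if u == target:
            return path
        if u in settled:
            continue
        settled.add(u)
        for v, (mean, _sd) in graph.get(u, {}).items():
            if v in settled or v in banned_nodes or (u, v) in banned_edges:
                continue
            heapq.heappush(heap, (cost + mean, v, path + [v]))
    return None


def path_stats(graph, path):
    """Sum of link means and of link variances along a path."""
    links = [graph[u][v] for u, v in zip(path, path[1:])]
    return sum(m for m, _ in links), sum(sd ** 2 for _, sd in links)


def yen_feasible(graph, source, target, k, c, budget):
    """Up to k paths, in increasing mean travel time, that pass the chance check."""
    z_c = NormalDist().inv_cdf(c)  # assumed one-sided quantile, 0 < c < 1

    def feasible(path):
        mean, variance = path_stats(graph, path)
        return mean + z_c * sqrt(variance) <= budget

    first = shortest_path(graph, source, target, set(), set())
    if first is None:
        return []
    found, candidates = [first], []          # all Yen paths; candidate heap
    result = [first] if feasible(first) else []
    while len(result) < k:
        prev = found[-1]
        for i in range(len(prev) - 1):
            root = prev[: i + 1]             # root path R ends at spur node prev[i]
            banned_edges = {(p[i], p[i + 1]) for p in found if p[: i + 1] == root}
            banned_nodes = set(root[:-1])    # keeps the spur path loopless
            spur = shortest_path(graph, prev[i], target, banned_edges, banned_nodes)
            if spur is None:
                continue
            path = root[:-1] + spur
            if path not in found and all(path != p for _, p in candidates):
                heapq.heappush(candidates, (path_stats(graph, path)[0], path))
        if not candidates:
            break
        _, best = heapq.heappop(candidates)
        found.append(best)
        if feasible(best):                   # infeasible paths are not stored as solutions
            result.append(best)
    return result


# Hypothetical network: {stop: {next_stop: (mean, SD)}} in minutes.
net = {
    "O": {"A": (10.0, 6.0), "B": (12.0, 1.0)},
    "A": {"D": (10.0, 6.0), "B": (1.0, 0.5)},
    "B": {"D": (12.0, 1.0)},
}
print(yen_feasible(net, "O", "D", k=1, c=0.95, budget=30.0))  # [['O', 'B', 'D']]
```

In this toy network the fastest path on average (O, A, D) is rejected at c = 0.95 because of its high variance, and the slower but steadier path through B is returned instead. A production planner would also need time-dependent link measures and timetable-based transfer times, which this sketch omits.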

3 Study site and data preparation

Transit service data help transit agencies make decisions regarding operations and planning. Manually collecting transit service data has been a popular approach for the past several decades, but emerging techniques now allow decision makers and researchers to collect transit service information automatically. For instance, GPS can be used to locate transit fleets in real time, and Automated Vehicle Location (AVL) systems are built on GPS technology. The transit service data collected from AVL systems contain not only fleet location information but also transit-related information (e.g. trip, route, and bus stop arrival time). However, individual transit agencies may define their own AVL data formats to satisfy their operational and planning requirements. Google encourages transit agencies to exchange and share transit service information using the format defined in the General Transit Feed Specification (GTFS), which is therefore becoming increasingly popular in the United States. The specification defines two data formats: GTFS-static and GTFS-realtime. The GTFS-static data contain transit facility and schedule information. For example, Fig. 2 shows an overview of bus stops in Tucson, Arizona, whose locations were extracted from the GTFS-static data; these data were also used to build the network for trip planning and path choice. Real-time transit fleet information is encoded in the GTFS-realtime data, which can be used to estimate transit service measures. Two such measures were selected, and their estimation is detailed in Sect. 3.1. Sun Tran, which manages transit service in the Tucson area (more than 30 routes and 2000 bus stops), has implemented both GTFS data formats and made them accessible to the public. Both types of GTFS data were collected from August 2014 to June 2015 and used in our study.

Fig. 2 Bus stops and transit network in Tucson (background image from OpenStreetMap)

3.1 Transit service measures

Two commonly used transit service measures were estimated from the GTFS data: the mean link travel time (also known as stop-to-stop travel time) and transit service reliability, defined here as the variance of link travel time. Travel time reliability is typically measured by time of day (TOD) and day of week (DOW) (Yang et al. 2014; Yang and Wu 2016). The transit service in Tucson used separate timetables for weekdays and weekends because of noticeable differences in transit demand, so a dummy variable \( w \) was used to distinguish weekdays from weekends. Therefore, transit service reliability was measured by TOD and DOW.

$$ \overline{{TT_{n, l}^{t, w} }} = \frac{1}{K} \mathop \sum \limits_{k = 1}^{K} TT_{n, l, k}^{t, w} $$
(6)
$$ \overline{{TTR_{n, l}^{t, w} }} = \frac{1}{K} \mathop \sum \limits_{k = 1}^{K} \left( {TT_{n, l, k}^{t, w} - \overline{{TT_{n, l}^{t, w} }} } \right)^{2} $$
(7)

where \( \overline{TT_{n,l}^{t,w}} \) and \( \overline{TTR_{n,l}^{t,w}} \) are the average and variance of link travel time given \( n \), \( l \), \( t \), and \( w \); \( n \) is the route number; \( l \) denotes the lth link on Route \( n \); \( t \) is the time of day (TOD); \( w \) is 0 for weekends and 1 for weekdays; and \( TT_{n,l,k}^{t,w} \) is the kth of the \( K \) estimated link travel times given \( n \), \( l \), \( t \), and \( w \).
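Eqs. (6) and (7) amount to a group-by over the key (n, l, t, w). A minimal sketch, assuming the link travel time observations have already been derived from the GTFS-realtime data; the function name, the hourly TOD bins, and the sample values are hypothetical.

```python
from collections import defaultdict


def link_measures(observations):
    """Eqs. (6)-(7): mean and variance of link travel time per (n, l, t, w).

    observations: iterable of (route n, link index l, TOD bin t,
    weekday flag w, observed link travel time).
    """
    groups = defaultdict(list)
    for n, l, t, w, travel_time in observations:
        groups[(n, l, t, w)].append(travel_time)
    measures = {}
    for key, times in groups.items():
        K = len(times)
        mean = sum(times) / K                                # Eq. (6)
        variance = sum((x - mean) ** 2 for x in times) / K   # Eq. (7), divides by K
        measures[key] = (mean, variance)
    return measures


# Hypothetical observations in seconds; t is an hourly TOD bin, w = 1 for weekdays.
obs = [
    ("9", 1, 17, 1, 120.0),
    ("9", 1, 17, 1, 150.0),
    ("9", 1, 17, 1, 90.0),
    ("9", 1, 6, 1, 100.0),
]
measures = link_measures(obs)
print(measures)
```

Eq. (7) divides by K rather than K − 1, so the sketch does the same; a single observation therefore yields a variance of zero, which a practical implementation might want to flag.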

3.2 Network construction

Figure 3 illustrates a conceptual transit network consisting of three transit routes and 13 transit stops. Multiple routes may travel on the same link and pass through the same stops, so link travel time and link travel time reliability are estimated separately for each route and link. For example, westbound Routes A and C both travel on the link between Stops 1 and 2. \( TT_{A, 1}^{t, w} \) and \( TTR_{A, 1}^{t, w} \) represent the travel time and travel time reliability, respectively, on the first link of Route A for a given TOD and \( w \), while \( TT_{C, 1}^{t, w} \) and \( TTR_{C, 1}^{t, w} \) represent the corresponding measures for Route C on the same link. Static transit stop information was extracted from the GTFS-static data, and links were constructed from pairs of consecutive stops on each route. Link travel times and link travel time reliability were estimated from the GTFS-realtime data. For the walking network, link travel time and reliability were estimated as described in Sect. 2.1. Finally, the transit and walking networks were combined and connected through the static stops, the constructed links, and the corresponding link travel time and travel time reliability.

Fig. 3 Transit network construction
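The link construction step can be sketched from the GTFS-static `trips.txt` and `stop_times.txt` files, whose `trip_id`, `route_id`, `stop_id`, and `stop_sequence` fields are defined by the specification. The function name and the in-memory sample feed (echoing Routes A and C sharing the Stop 1 to Stop 2 link in Fig. 3) are hypothetical, and the sketch omits walking links and the travel time attributes attached later.

```python
import csv
import io
from collections import defaultdict


def build_route_links(trips_file, stop_times_file):
    """Map each route_id to its directed links (from_stop_id, to_stop_id).

    Links join consecutive stops of each trip, ordered by stop_sequence
    (which increases along a trip but need not be consecutive integers).
    """
    route_of_trip = {row["trip_id"]: row["route_id"] for row in csv.DictReader(trips_file)}
    stops_by_trip = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        stops_by_trip[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))
    links = defaultdict(set)
    for trip_id, stops in stops_by_trip.items():
        stops.sort()
        for (_, a), (_, b) in zip(stops, stops[1:]):
            links[route_of_trip[trip_id]].add((a, b))
    return dict(links)


# Hypothetical miniature feed: Routes A and C share the Stop 1 -> Stop 2 link.
trips = io.StringIO("route_id,service_id,trip_id\nA,WK,t1\nC,WK,t2\n")
stop_times = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,08:00:00,08:00:00,1,1\n"
    "t1,08:03:00,08:03:00,2,2\n"
    "t1,08:06:00,08:06:00,3,3\n"
    "t2,08:10:00,08:10:00,1,1\n"
    "t2,08:12:00,08:12:00,2,2\n"
)
route_links = build_route_links(trips, stop_times)
print(route_links)
```

Keying links by route keeps the shared physical link separate for each route, matching the route-specific \( TT \) and \( TTR \) measures above.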

4 Experiments and validation

Based on the data and network construction approach described above, the transit network in the Tucson area was modeled. The transit bus network consisted of 2332 bus stops and 3529 links. Two modes were considered, walking and bus, so optimal paths could include both. Three groups of experiments were designed to demonstrate the effects of transit service uncertainty on path choice. Because catching a flight is a time-constrained activity, it was used as a test case for the impact of transit reliability, and the Tucson International Airport was set as the destination of the experiments.

Note that only regular (i.e. non-holiday) transit services were considered in our study, because regular services play a more important role than irregular services in riders’ daily lives. In addition, irregular services (e.g. holiday service) could not be modeled well in our study because (1) limited data were available for holiday transit service, and (2) long holiday headways (usually 30 min in our case studies) left transit service temporarily unavailable at some times.

4.1 Experiment group 1: effects of chance constraints on path choice

This experiment group was designed to investigate the effects of chance constraints on path choice. The departure time was 5 pm on a weekday, when traffic usually experiences recurrent congestion, and the confidence level of the chance constraint was set at 99.5 %. Four scenarios were created by combining two origins [the University of Arizona (UA) Mall bus stop (Stop 100) and the Kain/Kimberly PI bus stop (Stop 13912)] with whether or not the chance constraint was considered. These two origins were chosen for illustration because their long distance from the airport made multiple transfer options available. Also, at the selected peak hour (5 pm on a weekday), travel times from these two stops to the airport were generally higher than during non-peak hours.

  • Scenario 1a: Stop 100; do not consider chance constraint;

  • Scenario 1b: Stop 100; consider chance constraint;

  • Scenario 2a: Stop 13912; do not consider chance constraint;

  • Scenario 2b: Stop 13912; consider chance constraint;

A summary of the results is listed in Table 2, the paths are visualized in Fig. 4, and several findings are noted below.

Table 2 Comparison of results with and without the chance constraint
Fig. 4 Optimal paths without and with consideration of the chance constraint

  1.

    In both Scenario 1 and Scenario 2, the optimal travel time was higher when the chance constraint was considered.

  2.

    Walking was preferred in Scenario 1b because it was more reliable than riding and waiting for buses. Routes 9 and 25 were both used in Scenario 1; the major difference between Scenarios 1a and 1b was the mode used to reach Stop 10862. Scenario 1a took a bus to this stop (it is served by Route 25), whereas Scenario 1b walked to it. Bus travel time is usually shorter than walking time, but congested traffic can make bus arrivals less predictable and reliable, so walking can serve as an alternative that avoids traffic congestion and ensures on-time arrival. Thus, walking became the optimal choice when the chance constraint was considered. The optimal paths from Stop 100 in Scenarios 1a and 1b are shown in Fig. 4a.

  3.

    More reliable paths were chosen in Scenario 2. The walking times in Scenarios 2a and 2b were similar (approximately 1.4 and 1.8 min, respectively). Route 6, chosen in Scenario 2a, runs on a busier roadway, whereas Route 19, chosen in Scenario 2b, runs on a roadway with relatively light traffic. Although the optimal travel time in Scenario 2a was slightly shorter than in Scenario 2b, Scenario 2b may be the better path choice when the chance constraint, with its higher on-time arrival confidence level, is considered. The optimal paths from Stop 13912 in Scenarios 2a and 2b are shown in Fig. 4.

4.2 Experiment group 2: effects of confidence levels on path choice

The second group of experiments was designed to investigate the effects of different on-time arrival confidence levels on path choice. Once again, the destination was the Tucson International Airport. Three origins were selected: the UA Mall bus stop (Stop 100), the Kain/Kimberly PI bus stop (Stop 13912), and the 1st Ave/Rillito Park bus stop (Stop 12900). Since traffic conditions greatly affect transit reliability, two weekday departure times, 6 am and 5 pm, were used; transit service was considered reliable at 6 am and less reliable at 5 pm. Thus, six scenarios were created from the three origins and the two TODs, and several on-time arrival confidence levels were tested for each scenario.

  • Scenario 1a: Stop 100; departure time: 6 am

  • Scenario 1b: Stop 100; departure time: 5 pm

  • Scenario 2a: Stop 13912; departure time: 6 am

  • Scenario 2b: Stop 13912; departure time: 5 pm

  • Scenario 3a: Stop 12900; departure time: 6 am

  • Scenario 3b: Stop 12900; departure time: 5 pm

Figure 5 and Table 3 show the optimal anticipated travel times for each scenario, and several findings are summarized below.

Fig. 5 Optimal anticipated travel time vs. predefined confidence level

Table 3 Basic Statistics
  1.

    The optimal anticipated travel time increased with the on-time arrival confidence level. For example, in Scenario 1 the optimal anticipated travel time was 82.5 min when the chance constraint was not considered and increased to 109.3 min when the on-time arrival confidence level was set at 99.5 %. The same increasing trend can be observed in all scenarios. This trend matches travelers’ intuition: for a fixed arrival time, the more time planned for the trip, the higher the on-time arrival confidence level.

  2.

    For the same on-time arrival confidence level, the optimal anticipated travel time for the 5 pm departure was greater than that for the 6 am departure. For example, without the chance constraint, the optimal anticipated travel times were 82.45 min for the 6 am departure and 90.55 min for the 5 pm departure. Generally, transit service was more reliable in the early morning than during peak hours.

  3.

    Approximately 30 % additional planned time could ensure on-time arrival at a relatively high confidence level. Table 3 lists the optimal anticipated travel times for the six scenarios without the chance constraint and at the 99.5 % confidence level. Although the absolute time differences varied, the percentage differences suggest that adding about 30 % to the planned travel time as a buffer allows trips to arrive on time at a 99.5 % confidence level.
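As a quick check using the Scenario 1 values quoted in finding 1, the relative buffer at the 99.5 % confidence level is

$$ \frac{109.3 - 82.5}{82.5} = \frac{26.8}{82.5} \approx 0.325 $$

that is, roughly 32.5 % additional planned time, in line with the approximately 30 % buffer suggested above.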

4.3 Experiment group 3: weekend vs. weekday

The third experiment group was designed to compare optimal anticipated travel time and path choice on weekends and weekdays. Because traffic is light on weekends, weekend transit service was expected to be reliable and similar to early-morning service. However, Fig. 5 shows large differences in optimal anticipated travel time between 6 am on weekdays and 10 am on weekends. The major difference between the weekday and weekend timetables was the bus headway: typically 10 or 15 min on weekdays, but 60 min on weekends. Larger headways resulted in longer waiting times at bus stops and thus in longer optimal anticipated travel times. Additionally, in all scenarios the optimal anticipated travel time increased much faster in the confidence range [0.95, 0.995] than in the other selected ranges, because the quantile difference in this range is much larger (\( z_{0.995} \) ≈ 2.81, \( z_{0.95} \) ≈ 1.96, \( z_{0.90} \) ≈ 1.64).
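The steep growth of the quantile near c = 1 can be reproduced with Python’s `statistics.NormalDist`. The quoted values (2.81, 1.96, 1.64) coincide with the standard Normal quantile evaluated at (1 + c)/2; the one-sided quantile Φ^{-1}(c), which is what a constraint of the form P{T ≤ B} ≥ c calls for when T is Normal, gives about 2.58, 1.64, and 1.28. Since the notation table is not reproduced here, the sketch below computes both conventions; either way, the quantile, and hence the buffer, rises sharply as c approaches 1. This is an illustrative check, not the authors’ implementation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
quantiles = {}
for c in (0.90, 0.95, 0.995):
    one_sided = std_normal.inv_cdf(c)            # Phi^{-1}(c)
    two_sided = std_normal.inv_cdf((1 + c) / 2)  # Phi^{-1}((1 + c) / 2)
    quantiles[c] = (one_sided, two_sided)
    print(f"c = {c}: one-sided {one_sided:.2f}, two-sided {two_sided:.2f}")
```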

As the results in Sect. 4.2 show for the three origins, approximately 30 % additional planned time is suggested as a buffer for on-time arrival at the airport at a 99.5 % confidence level, during both non-peak hours (e.g. 6 am on weekdays) and peak hours (e.g. 5 pm on weekdays). Figure 5 also shows the buffer time required for the three origins at different confidence levels for non-peak and peak weekday hours. The required buffer time is higher on weekends, however, because the weekend timetables produce much longer waiting times. Note that: (1) only two transportation modes, bus and walking, were considered, so these results serve mainly as a reference for passengers who have only these two travel options, and they may vary with several factors (e.g. the traffic peak hour, planned bus routes, and timetables); (2) as the chance constraint of the proposed model indicates, the additional buffer time depends strongly on the links involved and the standard deviations of their travel times; and (3) an airport was used as the destination because of its strict on-time requirements and passengers’ strong desire for on-time arrival, but the proposed chance constrained model could also be applied to other origin–destination pairs.

5 Conclusions

Transit systems are underutilized in the United States. Previous studies have shown that low fares and highly accessible information can attract more travelers to transit. With new technologies, tracking and collecting transit fleet information in real time has become easier, which helps agencies both improve operations and release real-time information. To further encourage travelers to use transit, an efficient decision tool is needed to help them plan trips. APTS applications such as transit trip planners are one way for travelers to access and use transit information. In our study, a data-driven decision framework for transit path optimization was proposed and implemented. Its advantages are highlighted below.

  • Not only transit travel time but also transit travel time reliability was considered when planning optimal trips. These two important measures can inform passengers of both anticipated travel times and the on-time arrival confidence level. Passengers can use this information to better plan their trips.

  • The two transit measures were combined in a chance constrained decision model to obtain travel paths under uncertainty. The chance constraint was then transformed into an equivalent deterministic constraint based on the approximately Normal distribution of path travel time.

  • Walking was considered when passengers needed to transfer between bus routes. Incorporating the walking mode gives passengers more detail for trip planning and can help them plan more reliable trips.

  • The proposed reliability-based model is equivalent to conventional models (e.g. Dijkstra’s algorithm) when the confidence level is zero, because the chance constraint is then always satisfied. The model therefore falls back to a conventional trip planner when limited reliability data are available, ensuring robustness regardless of data availability.

GTFS-static and GTFS-realtime data were collected for the Tucson, Arizona area and used for path optimization. Both data types were used to estimate link travel time and the corresponding link travel time reliability in the transit network. Three groups of experiments with several scenarios were conducted. The results suggest that:

  • Optimal anticipated travel time increased with increasing on-time arrival confidence level. Essentially, more reliable planned transit paths usually involve longer anticipated travel times. Approximately 30 % additional time can serve as a reference for allocating traveling buffer time to ensure a high on-time arrival confidence level to the Tucson International Airport.

  • Walking was preferred over a transit detour when transferring between buses because of its relatively high reliability; the chance constrained decision model gives more weight to more reliable modes.

  • Given the same on-time arrival confidence level, additional time was required when traveling during peak hours, compared with non-peak hours, indicating that congested traffic results in less reliable transit service.

Future research could extend the framework to intercity trip planning using aviation networks. Other transportation modes, such as bikes and trains, could also be considered, although new challenges may arise, including the additional data required to quantify real-time travel uncertainty for these modes and the higher computational cost of a larger multimodal network. Other statistical models, such as the Gaussian mixture model, could be employed to quantify the uncertainty in travel data and to account for the correlation of travel times between consecutive links.