1 Introduction

Passenger demand plays an essential role in tactical planning and operational control in transportation, especially in public transport, because transit vehicles have to stop for passengers to board and alight. Transit tactical planning and operational control, as defined in [9], concerns the decisions that design the exact transit services, e.g. service frequencies and timetables, and the decisions that control the operating service, especially in real time. Modelling the expected number of passenger arrivals at transit stops is essential for these studies. For instance, the total or mean waiting time, which is estimated from knowledge of passenger demand, is often used as the main objective function in public transport tactical planning and operation studies [3, 8, 9, 10].

The expected number of passenger arrivals can be explicitly linked to the estimation of aggregated passenger counts within a time period. The literature currently offers two major lines of research for this problem: long-term and short-term passenger demand estimation. Long-term demand estimation models aim to support long-term transit planning practice, such as four-step demand modelling [19], route planning and frequency setting [9]. These models are developed to approximate passenger demand in the long term for transit strategic planning, rather than for the tactical planning and operational control problems discussed in this chapter. The other line of research, short-term demand estimation, favours data-driven and black-box methods and mainly aims at prediction. Examples include Neural Networks [4, 20], Support Vector Machines [23] and time-series models [18]. While these methods have demonstrated accuracy and robustness, the majority of them provide predictions rather than an analytical connection between passenger demand and explanatory variables. For transit tactical planning and operational studies, data-driven models for short-term prediction may not be as useful as analytical models, because analytical models can form part of a holistic framework in which researchers can estimate passenger demand given changes in explanatory variables. Existing data-driven methods generally use aggregated counts at previous time steps to predict the count at the next time step, relying on the underlying dynamic relationship between adjacent time steps.

A question of particular interest is how passengers arrive at transit stops. Transport researchers are generally interested in modelling and simulating the exact passenger arrival times at transit stops. This information is helpful for various purposes, for instance, to estimate the total travel time of a passenger from the moment of arrival at a transit stop to the moment of alighting from a transit vehicle. Existing studies in transit planning and operational control usually assume a known passenger arrival rate, i.e. the number of passengers arriving at a transit stop per time unit. The arrival rate allows a convenient simulation of passenger arrivals under one of two approaches: (a) deterministic or (b) stochastic point process. The deterministic approach assumes that passengers arrive uniformly at transit stops, so that the number of boarding/arrived passengers is simply the product of the passenger arrival rate and the time headway between consecutive vehicles. This approach has been used in many earlier studies such as [10, 13]. References [6, 7] also use a variation of this approach, in which a dimensionless parameter represents the marginal increase in vehicle delay resulting from a unit increase in headway. The stochastic point process approach assumes that passengers arrive randomly at stops with a stable arrival rate. In the majority of existing studies, this point process is a Homogeneous Poisson Process (HPP), which models the passenger arrival times using only the arrival rate and the time interval between consecutive arrivals, regardless of the interval starting time. The HPP is widely used to model systems with stochastic events, such as the presence of connected vehicles in traffic [25] or traffic incidents [1]. A growing number of studies in public transport have also adopted this stochastic approach, such as [12, 17, 24]. There is considerable evidence that the HPP assumption for passenger arrivals is reasonable for high-frequency services, such as those with scheduled headways up to 10–15 min [9]. At longer headways, another line of research considers passengers who time their arrivals according to the schedule and service reliability [2, 11]. In this study, we assume that passengers do not consult the schedule prior to arriving at transit stops, so the use of a stochastic point process such as the HPP remains valid.

In the literature, existing stochastic processes for public transport assume a stable passenger arrival rate, i.e. an intensity that does not change over time. A common approach to bring time into consideration is to define exogenous time intervals and keep the passenger arrival rate constant within each interval. This approach has limited accuracy, because the resulting arrival process is not continuously time-dependent but rather a superposition of multiple independent HPPs [22]. The non-homogeneous Poisson Process (NHPP), which allows the arrival rate to depend continuously on time, is a substantial advance over the HPP in terms of versatility and accuracy for modelling the passenger arrival process. NHPP models are not popular in public transit studies, but have been used elsewhere, such as in software reliability [14] and finance [5].

This chapter proposes two analytical methods to model the expected rate of passengers arriving at transit stops. After the literature review, the first part of the chapter concerns the modelling of exact passenger arrival times using a time-varying point process model. The second part concerns the modelling of aggregated counts of passenger demand using a time-varying Poisson Regression model, which estimates how many passengers arrive at a stop in a specific time period under certain conditions. Only aggregated counts of passenger demand are required to train this model. Finally, we show the model calibration process using synthetic simulated data, followed by a case study using observed Smart Card data from New South Wales, Australia.

2 Modelling Exact Arrival Times with Point Process

In this section, we briefly recap the fundamentals of point processes and the celebrated Poisson process, which will be used to 'count' and further evaluate passenger demand. This section serves as the building block for the realistic modelling of passenger demand in later sections, which includes periodicities in demand.

2.1 A Representation of Point Processes

A point process is a mathematical construct that records the times at which events happen, which we shall denote by \(T_1,T_2, \ldots \). For example, \(T_1\) represents the time when passenger 1 arrives at a bus stop, \(T_2\) represents the following passenger arrival, and so on. \(T_k\) can be interpreted as the time of occurrence of the kth event, in this case the kth arrival. In this chapter, we refer to \(T_i\) as event times. Formally, we define a counting process \(N_t\) as a random function defined on time \(t\ge 0\) and taking non-negative integer values \(0,1,2,\ldots \), with \(N_0=0\). \(N_t\) is piecewise constant and has jumps of size 1 at the event times \(T_i\). The Poisson process can be defined as follows:

Definition 19.1

(Poisson process) Let \((Q_k)_{k\ge 1}\) be a sequence of independent and identically distributed Exponential random variables with parameter \(\lambda \) and define the event times \(T_n=\sum _{k=1}^{n}\,Q_k\). The process \((N_t\,,\,t\ge 0)\) defined by \({N}_t:=\sum _{k\,\ge 1\,}\mathbbm {1}_{\{t\ge T_k\}}\) is called a Poisson process with intensity \(\lambda \).
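To make this construction concrete, the following minimal Python sketch simulates a homogeneous Poisson process by accumulating Exponential inter-arrival times; the function name, rate and horizon are illustrative choices rather than values from the chapter.

```python
import numpy as np

def simulate_hpp(lam, horizon, rng=None):
    """Simulate event times of a homogeneous Poisson process with rate lam on [0, horizon]."""
    rng = np.random.default_rng() if rng is None else rng
    times = []
    t = 0.0
    while True:
        t += rng.exponential(1.0 / lam)   # i.i.d. Exponential(lam) inter-arrival times Q_k
        if t > horizon:
            break
        times.append(t)                   # event time T_n = Q_1 + ... + Q_n
    return np.array(times)

# Example: passengers arriving at an average rate of 0.5 per minute over a 60-minute window
arrivals = simulate_hpp(lam=0.5, horizon=60.0, rng=np.random.default_rng(42))
print(len(arrivals), "arrivals; mean inter-arrival time:", np.diff(arrivals).mean().round(2))
```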

Memoryless Property

Note that the sequence \(Q_k\) is known as the inter-arrival times, and it can be interpreted as follows in our modelling context: the first passenger arrives at time \(Q_1\), the second arrives \(Q_2\) time units after the first, and so on. One can show that this construction means that passengers arrive at an average rate of \(\lambda \) per unit time, since the expected time between event times is \(\frac{1}{\lambda }\). Suppose we were waiting for the arrival of an event, say another bus passenger arriving at a bus stop, where the inter-arrival times follow an Exponential distribution with parameter \(\lambda \). Assume that r time units have elapsed and during this period no events have arrived, i.e. there are no events during the time interval [0, r]. The probability that we will have to wait a further t time units is given by

$$\begin{aligned} p (Q>t+r\,|\,Q>r)&=\frac{p(Q>t+r\,,\,Q>r)}{p(Q>r)} \nonumber \\&=\frac{p(Q>t+r)}{p(Q>r)}=\frac{\exp ({-\lambda (t+r)})}{\exp ({-\lambda r})}\nonumber \\&=\exp ({-\lambda t})=p(Q>t). \end{aligned}$$
(19.1)

Equation (19.1) shows that the Exponential distribution has no memory, which is one of the special properties of the Poisson process. Strictly speaking, memorylessness is a property of a distribution rather than of a process: it means that the distribution of the waiting time until the next event does not depend on how much time has elapsed already.

Moment Generating Functions

We now look at a particular kind of transformed average. The moment generating function \(\varphi \) of a random variable X is defined as \(\varphi _X(s):=E[e^{sX}]\). We now compute the moment generating function of a Poisson random variable \(X\sim Pois(\lambda )\):

$$\begin{aligned} \varphi _X(s)=E[e^{sX}]=\sum _{k=0}^{\infty }e^{sk}p(X=k)=\sum _{k=0}^{\infty }\frac{e^{sk}e^{-\lambda }\lambda ^k}{k!}=e^{-\lambda }\sum _{k=0}^{\infty }\frac{(\lambda e^s)^k}{k!}=e^{\lambda (e^s -1)}. \end{aligned}$$
(19.2)

Moment generating functions are important because each distribution possesses a unique moment generating function, which means that we can infer the distribution from its moment generating function. In addition, the moment generating function of a sum of independent random variables is the product of the moment generating functions of the individual random variables.

2.2 Non-homogeneous Poisson Process

The Poisson process, as defined so far, is characterised by a constant arrival rate \(\lambda \). This is equivalent to assuming, for example, that the arrival rate of public transport passengers at stops is the same regardless of whether it is midnight or a peak period. It is more useful to extend the Poisson process to a more general point process in which the arrival rate varies as a function of time. Note that the intensity then depends on the arrival time itself, not just on the inter-arrival time. This type of process is the non-homogeneous Poisson process (NHPP).

Definition 19.2

The point process N is said to be an inhomogeneous Poisson process with intensity function \(\lambda (t)\ge 0\), \(t\ge 0\), if

$$\begin{aligned} p(N_{t+h}=n+m\,|\,N_t=n)&=\lambda (t) h+o(h)\qquad&\mathrm {if}\qquad m=1, \nonumber \\ p(N_{t+h}=n+m\,|\,N_t=n)&=o(h)&\qquad \mathrm {if}\qquad m>1,\nonumber \\ p(N_{t+h}=n+m\,|\,N_t=n)&=1-\lambda (t) h + o(h)&\qquad \mathrm {if}\qquad m=0. \end{aligned}$$
(19.3)

Note that if the point process N is an NHPP with intensity function \(\lambda (t)\), then \(N_t\) follows a Poisson distribution with parameter \(\int _0^t\lambda _u\,du\), i.e. \( p(N_t=n)=\frac{1}{n!}\exp \left( -\int _0^t \lambda _u\,du\right) \left( \int _0^t \lambda _u\, du \right) ^n\). One can also show that the number of points in the interval [s, t] follows a Poisson distribution with parameter \(\int _s^t\lambda _u\,du\), i.e. \( p(N_t-N_s=n)=\frac{1}{n!}\cdot \exp \left( -\int _s^t \lambda _u\,du\right) \left( \int _s^t \lambda _u\, du \right) ^n\).
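As an illustration of these count distributions, the sketch below evaluates \(p(N_t-N_s=n)\) by numerically integrating an assumed intensity function; both the intensity and the numbers plugged in are ours, not the chapter's.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import poisson

def nhpp_count_pmf(n, s, t, intensity):
    """P(N_t - N_s = n) for an NHPP: Poisson pmf with mean equal to the integrated intensity over [s, t]."""
    mean, _ = quad(intensity, s, t)                  # numerical integral of lambda_u over [s, t]
    return poisson.pmf(n, mean)

# Assumed intensity (arrivals per minute) that ramps up towards a peak
intensity = lambda u: 0.2 + 0.01 * u
print(nhpp_count_pmf(5, 0.0, 30.0, intensity))       # probability of exactly 5 arrivals in [0, 30]
```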

We can see that the exact event times are needed to calculate moments in the NHPP setting. The next section proposes a public transport demand model that simulates the dynamic and stochastic arrival process of public transport passengers.

2.3 The Proposed Time-Varying Intensity Function for Dynamic and Stochastic Passenger Arrival Process

We propose the following parametric form for the passenger demand rate:

$$\begin{aligned} \lambda _t = pc^p\,t^{p-1}+\varepsilon , \end{aligned}$$
(19.4)

where \(c>0\) and \(p\in \mathbb {R}\). The parameter \(\varepsilon \) is usually taken to be a fixed small constant that keeps the rate bounded away from zero, since a negative rate of demand is nonsensical. This function is rich enough for several reasons. When \(p=1\), the intensity reduces to a constant, which specifies the parameter of the Exponential inter-arrival times; if the data respect this, they follow a homogeneous Poisson process. When \(p<1\), the intensity is a decreasing curve (see Fig. 19.1), which we interpret as a decreasing rate of demand. Finally, the intensity function also handles the case \(p>1\), which corresponds to an increasing rate of demand. In summary:

  • it reduces to a constant when \(p=1\), and hence is able to recover the Poisson process should the data respect this,

  • when \(p<1\), the rate of demand is decreasing,

  • when \(p>1\), the rate of demand is increasing.

Figure 19.1 shows a plot of this intensity. This is clearly a generalisation of the HPP, in which the rate can be constant (as in the HPP) or vary over time.

Fig. 19.1
figure 1

A proposed NHPP model with time-varying intensity function
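The short sketch below evaluates the intensity of Eq. 19.4 in the three regimes discussed above; the parameter values are illustrative and are not the ones used later in Table 19.1.

```python
import numpy as np

def intensity(t, c, p, eps=1e-3):
    """Proposed intensity of Eq. 19.4: lambda_t = p * c**p * t**(p-1) + eps."""
    return p * c**p * np.power(t, p - 1) + eps

t = np.linspace(0.1, 10.0, 100)     # start slightly above zero: t**(p-1) diverges at t = 0 when p < 1
for p in (0.75, 1.0, 1.25):         # decreasing, constant and increasing rate of demand
    lam = intensity(t, c=1.0, p=p)
    print(f"p = {p}: lambda ranges from {lam[0]:.3f} to {lam[-1]:.3f}")
```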

2.4 Likelihood Function for Nonhomogeneous Poisson Process

One of the main problems in modelling a nonhomogeneous Poisson process is inferring its parameters from data so that we have a calibrated model for passenger arrival demand. Let \(N_t\) be a counting process on [0, T] for \(T<\infty \) and let \(\{T_1,T_2,\ldots ,T_n\}\) denote the set of event times of \(N_t\) over the period [0, T]. Then the data likelihood L (see [21] for instance) is a function of the parameter set \(\theta \):

$$\begin{aligned} L(\theta ) = \prod _{j=1}^{n}\lambda (T_j)e^{-\int _{0}^{T}\lambda _x \,dx}. \end{aligned}$$
(19.5)

Let \(\varTheta \) be the set of admissible parameters modulating the intensity of the nonhomogeneous Poisson process. The maximum likelihood estimate can be found by maximising the likelihood function in Eq. 19.5 over \(\theta \in \varTheta \). Concretely, the maximum likelihood estimate \(\hat{\theta }\) is defined as \(\hat{\theta }=\arg \max _{\theta \in \varTheta } L(\theta )\). It is customary to maximise the log of the likelihood function:

$$\begin{aligned} l(\theta ) = \log L(\theta ) = -\int _{0}^{T}\lambda _x \,dx + \sum _{j=1}^{n}\log \lambda (T_j) \end{aligned}$$
(19.6)

Equivalently, the negative of this log-likelihood can be minimised with standard optimisation packages.
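A minimal calibration sketch, assuming the intensity of Eq. 19.4 with \(\varepsilon \) held fixed: the negative of Eq. 19.6 is minimised with `scipy.optimize`. The placeholder event times below are synthetic and only stand in for observed arrival data.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, event_times, T, eps=1e-3):
    """Negative of Eq. 19.6 for lambda_t = p * c**p * t**(p-1) + eps."""
    c, p = theta
    if c <= 0 or p <= 0:
        return np.inf                          # the closed-form integral below assumes c, p > 0
    lam = p * c**p * np.power(event_times, p - 1) + eps
    integral = c**p * T**p + eps * T           # closed-form integral of the intensity over [0, T]
    return integral - np.sum(np.log(lam))

# Placeholder data: uniformly scattered event times, so the fit should recover p close to 1
rng = np.random.default_rng(0)
T = 100.0
event_times = np.sort(rng.uniform(0.0, T, size=200))

res = minimize(neg_log_likelihood, x0=[1.0, 0.8], args=(event_times, T), method="Nelder-Mead")
print("estimated (c, p):", res.x)
```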

3 Modelling Aggregated Passenger Demand with Time-Varying Poisson Regression

In this section, we argue that the collective point process framework can also be formulated as a time-varying Poisson Regression model to estimate the count of passengers arriving at public transport stops. Aggregated counts of passengers are assumed to follow a Poisson distribution, which is consistent with the Poisson Process assumption (Definition 19.2). We then propose a time-varying formulation of Poisson Regression to model the aggregated passenger counts at different times of the day.

3.1 A Representation of a Generalised Linear Model: Poisson Regression

One of the most common types of regression, ordinary least squares, assumes that the dependent variable Y is normally distributed around its expected value and can take any real value, including negative values. Another type, Logistic Regression, assumes a binary 0-or-1 dependent variable. These models are often unsuitable for count data, such as aggregated passenger counts, where the data are intrinsically non-negative integers.

Poisson Regression is widely considered the benchmark model for count data. It assumes that the dependent variable Y has a Poisson distribution and that the logarithm of its expected value can be modelled by a linear combination of the regressors X. It is a type of Generalized Linear Model (GLM). Let k be the number of independent variables (regressors); X is a k-dimensional vector \(X = (X_1, X_2, \ldots , X_k)\), whose components can be continuous or categorical. Poisson Regression can be written as a GLM for counts:

$$\begin{aligned} \log (\mu )=\beta _0+\beta _1 x_1+\beta _2 x_2+\cdots +\beta _k x_k=x^{T}\beta \end{aligned}$$
(19.7)

The dependent variable Y has a Poisson distribution, that is \(y_i\sim Poisson(\mu _i)\) for \(i=1, \ldots , N\). The Poisson distribution has only one parameter \(\mu \), which determines both the conditional mean and the conditional variance: \(\mathbb {E}(y|x)\) and Var(y|x) are equal in the Poisson regression model. The exponential mean function can be written as:

$$\begin{aligned} \mathbb {E}(y|x)=\mu = \exp (x^{T}\beta ) \end{aligned}$$
(19.8)

Under the GLM framework and assuming an independent sample of N pairs of observations (\(y_i, x_i\)), the regression coefficients \(\beta _j\) can be estimated using Maximum Likelihood Estimation (MLE). It is worth reiterating that MLE finds the parameters that maximise the probability that the specified model generated the observed sample. Given the observed data, the joint probability distribution of the sample is the product of the individual conditional probability distributions:

$$\begin{aligned} f(y_{1},\displaystyle \ldots , y_{N}|x_{1}, \ldots , x_{N};\beta )=\prod _{i=1}^{N}f(y_{i}|x_{i};\beta ) \end{aligned}$$
(19.9)

As in the previous section, Eq. 19.9 is called the likelihood function, which is often written in the shorter form:

$$\begin{aligned} L=L(\beta ;y_{1}, \ldots , y_{N},\ x_{1}, \ldots , x_{N}) \end{aligned}$$
(19.10)

MLE maximises this likelihood function with respect to the parameters \(\beta \):

$$\begin{aligned} \hat{\beta }=\arg \max _{\beta } L(\beta ;y_{1}, \ldots , y_{N},\ x_{1}, \ldots , x_{N}) \end{aligned}$$
(19.11)

It is often more convenient to maximise the logarithmic transformation of the likelihood function, as it replaces products with sums, which simplifies differentiation and improves numerical stability. We define the \(\log \)-likelihood function of Poisson Regression as:

$$\begin{aligned} \begin{aligned} \ell (\beta ;Y,\ X)&= \log \prod _{i=1}^{N}f(y_{i}|x_{i};\beta ) \\&= \sum _{i=1}^{N}\log f(y_{i}|x_{i};\beta ) \\&= \sum _{i=1}^{N}-\exp (x_{i}'\beta )+y_{i}x_{i}'\beta -\log (y_{i}\ !) \end{aligned} \end{aligned}$$
(19.12)

The estimated regression coefficients \(\beta _j\) that maximise the log-likelihood are found by computing the k first derivatives of the \(\log \)-likelihood function with respect to \(\beta _{1}, \beta _{2}, \ldots , \beta _{k}\) and setting them equal to zero.

$$\begin{aligned} s_{N}(\beta ;y,\ x)=\frac{\partial \ell (\beta ;y,x)}{\partial \beta }=\sum _{i=1}^{N}[y_{i}-\exp (x_{i}'\beta )]x_{i} \end{aligned}$$
(19.13)

We define \(\hat{\beta }\) as the value of \(\beta \) that solves the first order conditions:

$$\begin{aligned} s_{N}(\hat{\beta };y,\ x)=0 \end{aligned}$$
(19.14)

The system of k equations defined by Eqs. 19.13 and 19.14 has to be solved with a numerical iterative algorithm because it is non-linear in \(\beta \). A number of algorithms are well implemented in various statistical packages, such as Newton-Raphson, Broyden-Fletcher-Goldfarb-Shanno (BFGS), Nelder-Mead and Simulated Annealing.
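As one concrete and widely used way of solving these first-order conditions, the sketch below fits a Poisson GLM with `statsmodels`, whose iteratively reweighted least squares routine performs exactly this kind of Newton-type iteration; the data and coefficients are synthetic and purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic sample: two regressors and an assumed coefficient vector (illustrative values only)
rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))   # adds the intercept column for beta_0
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))         # y_i ~ Poisson(exp(x_i' beta)), as in Eq. 19.8

model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()                           # iteratively solves the score equations of Eq. 19.14
print(result.params)                           # estimates should be close to beta_true
```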

3.2 Time-Varying Poisson Regression Model

Since we are concerned with the time dimension of the passenger arrival process, the arrival pattern can be considered a time series \(Y_t\). Autoregressive time-series approaches such as [18] and Neural Network approaches such as [4] show high accuracy and robustness, but focus on short-term demand prediction rather than on an analytical formulation, which is more useful for statistical studies. This section proposes an analytical model for public transport planning and operational control: a time-varying formulation of Poisson Regression that captures the variation of passenger arrivals at transit stops. We call this model the Time-varying Poisson Regression (TPR) model.

We are interested in modelling the counts of passenger demand throughout the day. One can observe from aggregated passenger demand data that this count variable has a periodic, sinusoidal-like pattern with two demand peaks at the AM and PM rush hours, gradually reducing to a plateau during off-peak periods. This bimodal distribution of passenger demand is well observed and analysed in the literature [15]. A natural modelling approach to capture this sinusoidal pattern is a Fourier series:

$$\begin{aligned} f(x)= \frac{1}{2}a_0 + \sum _{n=1}^{\infty } a_n \cos (nx) + \sum _{n=1}^{\infty } b_n \sin (nx), \end{aligned}$$
(19.15)

where

$$\begin{aligned} a_0= \frac{1}{\pi } \int _{-\pi }^{\pi } f(x) dx , \end{aligned}$$
(19.16)
$$\begin{aligned} a_n= \frac{1}{\pi } \int _{-\pi }^{\pi } f(x) \cos (nx) dx, \end{aligned}$$
(19.17)
$$\begin{aligned} b_n= \frac{1}{\pi } \int _{-\pi }^{\pi } f(x) \sin (nx) dx. \end{aligned}$$
(19.18)

Here we assume that the dependent variable Y is both Poisson distributed and time dependent, that is \(y_t\sim Poisson(\mu _t)\), where \(t=1, \ldots , N\) indexes the time of day. The time-varying formulation of our Poisson Regression model can be written as:

$$\begin{aligned} \log (\mu _t)=\alpha _0+ \sum _{k=1}^{K} \left[ \beta _k \ \cos \left( k \frac{2 \pi }{T} t\right) + \gamma _k \ \sin \left( k \frac{2 \pi }{T} t\right) \right] \end{aligned}$$
(19.19)

The harmonic terms \(\sin (k \frac{2 \pi }{T} t)\) and \(\cos (k \frac{2 \pi }{T} t)\) are added to capture the daily demand pattern. K is the number of harmonics; a larger K generally increases the accuracy but also the complexity of the model. If t is measured in minutes, then T equals \(24 \times 60\) min.

We further increase the adaptability of the model to observed passenger demand data by adding time-invariant independent variables to the model in Eq. 19.19. These variables do not have a time-varying formulation; many practical variables fall into this group, such as weather, day-of-the-week, events or travel cost. In its general form, the TPR model can be formulated as:

$$\begin{aligned} \log (\mu _t)=\alpha _0+ \sum _{k=1}^{K} \left[ \beta _k \ \cos \left( k \frac{2 \pi }{T} t\right) + \gamma _k \ \sin \left( k \frac{2 \pi }{T} t\right) \right] + \sum _{v=1}^{V} \xi _v \ x_v \end{aligned}$$
(19.20)

where V is the number of time-invariant independent variables; a larger V generally increases the model complexity. Whether a time-invariant variable \(x_v\) is included in the model is decided by considering its correlation with the other variables and its contribution to the prediction of the dependent variable \(\log (\mu _t)\).

The TPR model in Eq. 19.20 has both time-varying and time-invariant independent variables. The next section will discuss the parameter estimation procedure of this model using MLE.
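Below is a minimal sketch, under assumed illustrative data, of how the design matrix implied by Eq. 19.20 can be assembled and the coefficients estimated with a Poisson GLM; the function name and all numerical values are ours.

```python
import numpy as np
import statsmodels.api as sm

def tpr_design_matrix(t_minutes, x_invariant, K, period=24 * 60):
    """Columns of Eq. 19.20: intercept, K cosine/sine harmonic pairs, then time-invariant variables."""
    cols = [np.ones_like(t_minutes, dtype=float)]
    for k in range(1, K + 1):
        cols.append(np.cos(k * 2 * np.pi * t_minutes / period))
        cols.append(np.sin(k * 2 * np.pi * t_minutes / period))
    return np.column_stack(cols + [x_invariant])

# Illustrative inputs: a minute-of-day value per observation and two time-invariant covariates
rng = np.random.default_rng(2)
t = rng.integers(0, 24 * 60, size=1000).astype(float)
x = rng.normal(scale=0.2, size=(1000, 2))
X = tpr_design_matrix(t, x, K=3)

# Counts generated from an assumed coefficient vector, then refitted with a Poisson GLM
beta = rng.normal(scale=0.3, size=X.shape[1])
y = rng.poisson(np.exp(X @ beta))
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)
```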

4 Simulated Experiments

In this section, we describe numerical experiments for the NHPP and TPR models using synthetic simulated data. We first generate the synthetic data using predefined parameters and then fit the proposed models to this simulated data. The models perform well if they recover the predefined parameters.

4.1 Non-homogeneous Poisson Process (NHPP)

This subsection discusses the simulation of data from NHPP with predefined parameters as well as the parameter estimation process for NHPP.

Simulation of a Nonhomogeneous Poisson Process Using Predefined Parameters

Given predefined parameters, we briefly explain how to apply the thinning method [21] to simulate an NHPP. Thinning imitates the trajectory of the counting process over time. Given an NHPP with time-dependent intensity function \(\lambda _t\), we choose a constant \(\lambda ^{*}\) such that

$$\begin{aligned} \lambda _t\le \lambda ^{*},\quad \text {for all } t,\quad 0\le t\le T, \end{aligned}$$
(19.21)

for some maturity \(T<\infty \). We then simulate a homogeneous Poisson process with the designated rate \(\lambda ^{*}\) through a sequence of independent and identically distributed Exponentially distributed random variables, each having mean \((\lambda ^{*})^{-1}\). We then look at the simulated event times of the homogeneous Poisson process and assign some of these to be event times of the nonhomogeneous Poisson process with intensity function \(\lambda _t\). An event time at a particular time t in the homogeneous Poisson process is also taken to be an event time in the nonhomogeneous Poisson process with probability \(\frac{\lambda (t)}{\lambda ^{*}}\), independent of the history up to and including time t, and is discarded otherwise. Hence, the set of event times of the nonhomogeneous Poisson process constructed in this way is a subset of the event times of the homogeneous Poisson process. The resulting pseudo-algorithm reads as follows (a code sketch follows the algorithm):

  1. Set \(T_0\leftarrow 0\) and \(T^{*}\leftarrow 0\), where \(T^{*}\) denotes the current event time of the homogeneous Poisson process with intensity \(\lambda ^{*}\).

  2. For \(j=1,2,\ldots , n\): generate an Exponential random variable \(\mathscr {E}\) with mean \((\lambda ^{*})^{-1}\) and set \(T^{*}\leftarrow T^{*}+\mathscr {E}\). Then generate a uniform random variable U on [0, 1] and accept the event time (\(T_j=T^{*}\)) if \(U<\frac{\lambda (T^{*})}{\lambda ^{*}}\), and reject it otherwise. The sequence of accepted times \(T_j\) is a realisation of the event times of a nonhomogeneous Poisson process with rate \(\lambda _t\).
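The following Python sketch implements the thinning algorithm above; the bounded sinusoidal intensity is an illustrative stand-in (any intensity works, provided \(\lambda ^{*}\) dominates it over the whole window).

```python
import numpy as np

def simulate_nhpp_thinning(intensity, lam_star, T, rng=None):
    """Thinning: simulate an HPP with rate lam_star, keep each event with probability intensity(t) / lam_star."""
    rng = np.random.default_rng() if rng is None else rng
    t, accepted = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_star)         # candidate event time of the homogeneous process
        if t > T:
            break
        if rng.uniform() < intensity(t) / lam_star:  # accept as an NHPP event time
            accepted.append(t)
    return np.array(accepted)

# Illustrative bounded intensity (arrivals per minute); lam_star must satisfy lam_star >= sup_t intensity(t)
lam = lambda t: 2.0 + 1.5 * np.sin(2 * np.pi * t / (24 * 60))
events = simulate_nhpp_thinning(lam, lam_star=3.5, T=24 * 60, rng=np.random.default_rng(3))
print(len(events), "simulated NHPP event times")
```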

Numerical Experiments

We set the parameters for the NHPP model in Eq. 19.4 as in Table 19.1:

Table 19.1 Parameters for NHPP

The thinning simulation described above is then performed for the intensity function \(\lambda _t = 0.304 \cdot t^{-0.25}+\varepsilon \). The simulated arrival times are then used to estimate the parameters of the proposed NHPP model in Eq. 19.4; the calibrated parameters should be as close as possible to the predefined parameters in Table 19.1. Figure 19.2 shows the calibration results: the calibrated parameters are very similar to the predefined ones.

Fig. 19.2
figure 2

Calibrated and true trajectory of the proposed NHPP intensity function

4.2 Time-Varying Poisson Regression (TPR)

This subsection describes the generation of synthetic simulated data and the parameter estimation process for the time-varying Poisson Regression model.

Data Generation Process

The TPR model has \(1+2 \times K + V\) parameters, where K is the number of harmonics and V is the number of time-invariant independent variables; the complexity of the model depends on the values of K and V. In this section, we generate the synthetic data using 3 harmonics (\(K=3\)) and 3 time-invariant variables (\(V=3\)). The time-invariant variables \(x_v\) are normally distributed with zero mean and standard deviations of 0.1, 0.2 and 0.3, respectively. Table 19.2 shows the chosen parameters for the synthetic simulation.

Table 19.2 Parameters for synthetic simulation data

We simulate 100 days of data, with time varying from 4 AM to 10 PM every day; each sample is an aggregated passenger count over a 15-min interval. Figure 19.3 shows the simulated passenger demand, i.e. the aggregated counts over the 15-min time windows, for the first 3 days of the dataset.

Fig. 19.3
figure 3

Synthetic simulated data of passenger demand
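The data generation step can be sketched as follows; the coefficient values below are arbitrary stand-ins for those in Table 19.2, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
period = 24 * 60                                # one day, in minutes
t = np.arange(4 * 60, 22 * 60 + 1, 15)          # 73 fifteen-minute bins from 4 AM to 10 PM
t_all = np.tile(t, 100).astype(float)           # 100 simulated days

# Three time-invariant covariates with standard deviations 0.1, 0.2 and 0.3
x = np.column_stack([rng.normal(0.0, s, size=t_all.size) for s in (0.1, 0.2, 0.3)])

# Assumed coefficients (alpha_0, beta_k, gamma_k, xi_v) standing in for Table 19.2
alpha0, betas, gammas, xis = 1.5, [0.6, -0.3, 0.2], [0.4, 0.1, -0.2], [0.5, -0.4, 0.3]

log_mu = alpha0 + x @ np.array(xis)
for k, (b, g) in enumerate(zip(betas, gammas), start=1):
    log_mu += b * np.cos(k * 2 * np.pi * t_all / period) + g * np.sin(k * 2 * np.pi * t_all / period)

counts = rng.poisson(np.exp(log_mu))            # aggregated passenger count per 15-min interval
```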

We use this synthetic simulated data to estimate the parameters of four TPR models, ranging from simple to complex. The details of each model are as follows:

\(\bullet \) H1V1

The first model (H1V1) is a simple model with one harmonic and one time-invariant variable.

$$\begin{aligned} \log (\lambda _t)=\alpha _0+ \beta _1 \ \cos \left( \frac{2 \pi }{T} t\right) + \gamma _1 \ \sin \left( \frac{2 \pi }{T} t\right) + \xi _1 \ x_1 \end{aligned}$$
(19.22)

Table 19.3 shows the parameter estimates for Model 1.

Table 19.3 Estimated parameters for Model 1

\(\bullet \) H0V3

The second model ignores the effect of the harmonics. This model only includes 3 time-invariant variables.

$$\begin{aligned} \log (\lambda _t)=\alpha _0+ \sum _{v=1}^{3} \xi _v \ x_v \end{aligned}$$
(19.23)

Table 19.4 shows the parameter estimates for H0V3.

Table 19.4 Estimated parameters for H0V3

\(\bullet \) H3V0

The third model ignores the effect of the time-invariant variables. This model only includes the 3 harmonic levels.

$$\begin{aligned} \log (\lambda _t)=\alpha _0+ \sum _{k=1}^{3} \left[ \beta _k \ \cos \left( k \frac{2 \pi }{T} t\right) + \gamma _k \ \sin \left( k \frac{2 \pi }{T} t\right) \right] \end{aligned}$$
(19.24)

Table 19.5 shows the parameter estimates for H3V0.

Table 19.5 Estimated parameters for H3V0

\(\bullet \) H3V3

The last model includes 3 harmonic levels and 3 time-invariant variables.

$$\begin{aligned} \log (\lambda _t)=\alpha _0+ \sum _{k=1}^{3} \left[ \beta _k \ \cos \left( k \frac{2 \pi }{T} t\right) + \gamma _k \ \sin \left( k \frac{2 \pi }{T} t\right) \right] + \sum _{v=1}^{3} \xi _v \ x_v \end{aligned}$$
(19.25)

Table 19.6 shows the parameter estimates for Model H3V3.

Table 19.6 Estimated parameters for H3V3

Model Comparison

The results in Tables 19.3, 19.4, 19.5 and 19.6 show the model performance. It is clear that H3V3 has the parameters closest to the actual parameters of the synthetic simulation. We further evaluate the goodness-of-fit of each model by comparing their Akaike Information Criterion (AIC) statistics in Table 19.7.

Table 19.7 Goodness-of-fit of the proposed models

As expected, H3V3 shows the best fit among the proposed models, because it incorporates all the determinants in the data, namely the 3 harmonics and 3 time-invariant variables. H1V1 and H0V3 fit considerably worse due to the lack of harmonic terms, with H1V1 fitting slightly better than H0V3 thanks to its single harmonic. The time-invariant variables further improve the goodness-of-fit, which can be seen by comparing the AIC statistics of H3V0 and H3V3, since the only difference between them is the time-invariant variables.

We also simulate one day's worth of new aggregated data to evaluate the performance of each Poisson Regression model. The data is simulated using the same parameters as in Table 19.2 for 73 time periods of 15 min each. The new simulated data is fed to models H1V1 to H3V3 to predict the counts. Figure 19.4 shows the new data and the estimation results from H1V1 to H3V3. One can easily see that H0V3 does not capture the sinusoidal pattern of the data. H1V1 captures some of the pattern with limited accuracy, for instance that demand in earlier time periods is larger than in later time periods. H3V0 captures the sinusoidal pattern of the data, including the difference between the two peak periods around 8:00 and 16:00. Only H3V3 captures both the sinusoidal pattern and the deviations from it introduced by the time-invariant variables; in fact, H3V3 provides a very close estimation of the simulated data.

Fig. 19.4
figure 4

Comparison of different Poisson Regression model performance on simulation data

5 Case Study

This section describes a case study in which the proposed models are implemented on an observed dataset. We use transportation domain knowledge to decide on the explanatory variables and to process the data for the models.

5.1 Case Study Site and Dataset

This chapter uses aggregated Smart Card data from New South Wales (NSW), Australia for the case study. A Smart Card is a microchip card, typically the size of a credit card, that is widely used for public transport ticketing around the world; examples include the Oyster Card in London, the Opal Card in Sydney and the Myki Card in Melbourne. This chapter uses 14 days of Smart Card data, consisting of over 2.4 million transactions across large metropolitan areas in NSW, including Sydney, Newcastle and Wollongong, from February to March 2017. The data consists of all bus transactions in the aforementioned metropolitan areas. Each data record contains the following fields:

  • \(Card_{ID}\): the unique Smart Card ID, which has been hashed into a unique number

  • \(T_{on}\): the time when the passenger with \(Card_{ID}\) boards a bus

  • \(T_{off}\): the time when the passenger with \(Card_{ID}\) alights from a bus

  • \(S_{on}\): the stop/station ID of \(T_{on}\)

  • \(S_{off}\): the stop/station ID of \(T_{off}\).

We focus our case study on estimating aggregated passenger counts using the Time-varying Poisson Regression (TPR) model proposed in Sect. 19.3, because the timestamps in the Smart Card data are the boarding and alighting times of passengers rather than the passenger arrival times required by the model in Sect. 19.2. The objective is to estimate an aggregated count of passengers per time period for each travel choice between an origin-destination pair. Transit providers can use the proposed TPR model to estimate the change in passenger demand given changes in explanatory variables such as travel time or transfer time.

The next few subsections describe the required steps to process the input data for the proposed TPR model.

5.2 Journey Reconstruction Algorithm

For each individual passenger, the first step is to reconstruct the full public transport journey, with transfers, from origin to destination out of the individual Opal card transactions. This step is essential because Smart Card data only include tap-ons and tap-offs, while we are interested in modelling a completed journey between an origin and a destination. A completed journey naturally gives us the following explanatory variables for the TPR model:

  • Travel time tt: the time gap between the first tap-on and the last tap-off of a journey

  • Transfer time tf: the time gap between a tap-off from a bus to a tap-on to another bus to continue the journey

  • Travel distance d: the Euclidean distance between the first tap-on and the last tap-off

  • Distance from the origin to CBD \(d_o\): the Euclidean distance from the origin to the Sydney CBD

  • Distance from the destination to CBD \(d_d\): the Euclidean distance from the destination to the Sydney CBD

The journey reconstruction algorithm is based on the time and distance gaps between individual tap-ons and tap-offs. Figure 19.5 shows the proposed journey reconstruction algorithm, which is based on [16]. We revise the algorithm proposed in [16] by adding the distance gap \(\varDelta d\), which is set to 500 m. \(\varDelta d\) ensures that the transfer time is spent only on walking and waiting, rather than on any side activity using a private vehicle.

Fig. 19.5
figure 5

Journey reconstruction algorithm

The time gap \(\varDelta t\) is set to 60 min because, in Sydney, passengers receive a fare discount if they transfer within 60 min of their last tap-off, so the majority of passengers continue their journeys within this time frame. The following steps describe the trip reconstruction process; a code sketch follows the list.

  • Step 1: Query all the Opal transactions of an individual passenger i. A binary indicator RID is initialised to zero.

  • Step 2: For each transaction, discard it if it is a tap-on reversal, i.e. the tap-on and tap-off are at the same location.

  • Step 3: If RID equals zero, define a variable Origin Location and set it to the current tap-on location. We also assign a new unique Journey ID, change RID to one and move to the next transaction. Otherwise we move to Step 4.

  • Step 4: With RID equal to one, the current transaction is assigned the current Journey ID if it satisfies three conditions: (1) the time gap \(\varDelta t\) between the current tap-on and the last tap-off is less than 60 min, (2) the distance gap \(\varDelta d\) is less than 500 m, and (3) the current tap-off is different from the Origin Location. Otherwise, we assign a new Journey ID and set RID back to zero.

  • Step 5: The journey reconstruction for passenger i finishes after their last transaction of the day; otherwise we move to the next transaction.
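Below is a compact pandas sketch of this reconstruction logic for a single passenger's transactions. The column names (t_on, t_off, s_on, s_off) and the assumption of a pre-computed distance column gap_m are ours; the actual case-study implementation may differ.

```python
import pandas as pd

MAX_GAP_MIN = 60    # Delta t: maximum time gap between a tap-off and the next tap-on
MAX_GAP_M = 500     # Delta d: maximum distance gap between a tap-off and the next tap-on

def reconstruct_journeys(txns: pd.DataFrame) -> pd.DataFrame:
    """Assign a JourneyID to each transaction of one passenger.

    txns is assumed to be sorted by tap-on time with columns t_on, t_off (timestamps),
    s_on, s_off (stop IDs) and gap_m (metres between this tap-on and the previous tap-off).
    """
    txns = txns[txns.s_on != txns.s_off].copy()          # Step 2: discard tap-on reversals
    journey_id, origin, prev_off = 0, None, None
    ids = []
    for row in txns.itertuples():
        start_new = (
            origin is None                                        # first transaction of the day
            or (row.t_on - prev_off).total_seconds() / 60 > MAX_GAP_MIN
            or row.gap_m > MAX_GAP_M
            or row.s_off == origin                                # returning to the origin location
        )
        if start_new:                                    # Steps 3-4: open a new journey
            journey_id += 1
            origin = row.s_on
        ids.append(journey_id)
        prev_off = row.t_off
    txns["JourneyID"] = ids
    return txns
```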

5.3 Data Processing

After journey reconstruction, the remaining data processing to prepare the inputs for the TPR model is straightforward. The variables \(tt, tf, d, d_o\) and \(d_d\) are calculated directly from each completed journey. We then aggregate the completed journeys according to their start time and their AlternativeID to produce passenger demand counts. The AlternativeID is an indicator of the route choice, defined so that passengers from the same area who make similar choices share the same AlternativeID. Table 19.8 shows an example of the data used for the case study.

The AlternativeID, as shown in Table 19.8, is coded in the format: [Origin Zone ID, Destination Zone ID, Mode, Route of the first tap-on, Zone of the first tap-on, Zone of the first tap-off, Route of the last tap-on, Zone of the last tap-on, Zone of the last tap-off]. The Count is the total number of passengers who travelled within the same time period and made the same travel decision as indicated by the AlternativeID.

Table 19.8 Examples of processed data for the case study

5.4 Case Study Modelling Results

We use the five explanatory variables described in Sect. 19.5.1 as the time-invariant variables of the TPR model described in Sect. 19.3. The dataset is randomly divided into a training dataset containing 90% of the data points and a testing dataset containing the remaining 10%. We develop TPR models with 3, 4 and 5 harmonics and 5 time-invariant variables, named H3V5, H4V5 and H5V5 following the convention of Sect. 19.4.2. We then compare them using the Root Mean Square Error (RMSE) as the criterion, calculated as follows:

$$\begin{aligned} RMSE = \sqrt{\frac{1}{D}\sum \limits _{i = 1}^{D} \left( c_{i} - \bar{c}_{i}\right) ^{2}} \end{aligned}$$
(19.26)

where \(c_i\) and \(\bar{c}_{i}\) are the actual and estimated counts, respectively, and D is the total number of data points in the testing dataset. The RMSE thus measures the mean error of our predictions compared to the observed values. The models are trained on the training dataset and then evaluated on the testing dataset (Table 19.9).
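For completeness, Eq. 19.26 translates directly into a small numpy helper (the example counts below are invented):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error of Eq. 19.26."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

print(rmse([3, 5, 2], [2.5, 6.0, 2.0]))   # illustrative counts only
```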

Table 19.9 Estimation errors with different TPR models

H5V5 shows better performance than H3V5 and H4V5. Table 19.10 shows the estimated parameters of H5V5. Most of the parameters are significant.

Table 19.10 Estimated parameters for H5V5

The values, and especially the signs, of the coefficients of the explanatory variables \(d_o, d_d, d, tt\) and tf provide insights into bus passenger demand in NSW, Australia. The positive signs of \(d_o\) and d show that the further passengers are from the Sydney CBD and the longer the travel distance, the more likely a journey is to be made by bus. Conversely, the negative sign of \(d_d\) shows that the closer a journey ends to the CBD, the less likely it is to be made by bus. This is because the Sydney CBD is well serviced by other public transport modes such as train, light rail and ferry, so bus travel serves more distant areas. The negative signs of travel time tt and transfer time tf show that passengers care about these factors: if transit providers can offer services with shorter travel and transfer times, bus patronage will increase. Passengers are most concerned about travel distance and transfer time, as shown by the estimated coefficients of d and tf being considerably larger in magnitude than the others.

6 Conclusion

The inference of the expected number of passenger arrivals at transit stops is essential for transit tactical planning and operational control studies. We propose a non-homogeneous Poisson Process (NHPP) framework to model exact records of passenger arrival times, and discuss the simulation and calibration of this model. To estimate the aggregated count of passengers arriving at transit stops, given the time of day and other explanatory variables, this chapter proposes a time-varying Poisson Regression (TPR) model. This model uses aggregated counts of passenger demand within a time period, together with several other variables, to estimate passenger counts. The numerical experiments using synthetic simulated data show the calibration process for the parameters of both the NHPP and TPR models.

We also use domain knowledge to implement the TPR model in a case study using observed Smart Card data from New South Wales, Australia. Transportation domain knowledge is used to define the important explanatory variables for the TPR model and to process the data. The variables of travel time, transfer time and distance are the most important for explaining bus passenger demand. Domain knowledge has also been used to obtain insights into the factors that affect bus patronage in NSW, Australia. By analysing the values and signs of the coefficients of \(d_o, d_d, d, tt\) and tf, we found that passengers are more likely to use a bus when the journey is long and starts further from the Sydney CBD. They are less likely to use a bus if the travel time or transfer time is large, or if the journey is also serviced by other modes of transport such as train, light rail or ferry.

The proposed analytical models are useful as part of a transit tactical planning and operational control framework to estimate passenger demand at transit stops. Future work includes the use of observed data, a more involved formulation of the NHPP model and possibly the inclusion of an autoregressive term in the TPR model.