1 Introduction

Emergency Medical Service (EMS) consists of pre-hospital medical care and transport to a medical facility. Almost all EMS requests arrive by phone, through calls to an emergency number. The urgency of each request is evaluated and the location is obtained. Then, an ambulance is dispatched to the call site and, if needed, the patient is transported to a medical facility. Demand for such services is constantly increasing throughout the world, driven by population growth and aging, while governments exert continuous pressure to reduce health care costs; thus, an efficient use of resources is fundamental to guarantee a good quality of service while maintaining economic sustainability.

Several optimization models for planning EMS systems have been developed in the literature (see Bélanger et al. [3] for an extensive review). However, EMS demand is highly variable, and this uncertainty should not be neglected when planning the activities. Hence, it is fundamental to accurately predict the future number of emergency calls and their interarrival times.

The goal of this paper is thus to propose and validate a Bayesian model to predict the number of emergency calls in future time slots. The number of calls is described by means of a generalized linear mixed model, and the inference is based on the posterior density of model parameters, which is obtained through a Markov Chain Monte Carlo simulation scheme. Then, predictions are given in terms of their posterior predictive probabilities.

We demonstrate the applicability of the approach using the data available for the city of Montréal, Québec, Canada. Results show convergence of the approach, a good fit, and low prediction errors.

The paper is organized as follows. A review of previous works dealing with stochastic modeling of EMS calls is presented in Sect. 2; the general features of an EMS system and the typical structure of the demand dataset are described in Sect. 3. Then, the Bayesian model is proposed in Sect. 4, and its application to the Montréal case is presented in Sect. 5. Conclusions of the work are finally given in Sect. 6.

2 Literature Review

Several studies deal with EMS call prediction under a frequentist approach. A survey of the works published before 1982 can be found in Kamenetsky et al. [10], who also presented regression models to predict EMS demand as a function of population, employment, and other demographic variables. Socio-economic parameters such as median income and the percentage of people living below the poverty line have been considered by Cadigan and Bugarin [5]. More recently, McConnell and Wilson [12] focused on the increasing impact of age distribution on EMS demand, while Channouf et al. [6] developed ARIMA models.

To the best of our knowledge, Bayesian approaches have not been considered for the EMS demand yet, even though they have been successfully applied in the health care literature. In fact, Bayesian approaches allow combining the available data with prior information within a solid theoretical framework, and results can be used as prior information once new data are available, which are important features in health applications. A good example of application to another health care service (i.e., the home care service) can be found in [1, 2].

3 Problem Description

An EMS system consists of an operations center and a certain number of ambulances, including the related staff. Ambulances are located in predetermined sites, ready to serve EMS requests. Requests arrive at the operations center via telephone, where they are evaluated. If a call requires an intervention, it is assigned to one of the available vehicles. The aim of an EMS is to serve all calls as fast as possible, maximizing the number of calls served within a given threshold that depends on the type of area (urban or rural).

For this purpose, due to the high uncertainty related to EMS calls, the decision maker needs accurate estimates of the demand as input for any optimization model underlying ambulance dispatching.

The typical EMS dataset includes several pieces of information about the calls and the service provided. To develop a prediction model, we focus on the calls. Three types of information are available:

  • Type: required service and patient characteristics; this information is usually summarized into a priority associated with the call.

  • Arrival time: day and time of the call.

  • Coordinates: latitude and longitude of the call, or alternatively the address.

Usually, for management purposes, the territory is divided into zones; thus, coordinates are translated into the zone z (\(z=1,\dots ,Z\)) of the call. Moreover, concerning the arrival times, in this work we group the time into slots. Thus, day i (\(i=1,\dots ,I\)) and slot t (\(t=1,\dots ,T\)) are associated with the call, and for each day i we record the number of calls \(N^{i}_{z,t}\) that arose in slot t and zone z. In particular, we consider slots of two hours, i.e., \(T=12\).
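The aggregation of raw calls into the counts \(N^{i}_{z,t}\) can be sketched as follows. This is only an illustration: the record layout (a timestamp paired with a zone index), the example dates, and the convention that slot 1 starts at midnight are assumptions, not details given by the dataset description.

```python
from collections import Counter
from datetime import date, datetime

SLOT_HOURS = 2  # two-hour slots, hence T = 24 // 2 = 12


def slot_of(timestamp: datetime) -> int:
    """Return the 1-based slot t in 1..12 (assumes slot 1 starts at midnight)."""
    return timestamp.hour // SLOT_HOURS + 1


def count_calls(calls, first_day: date) -> Counter:
    """Count calls per (day i, zone z, slot t), with 1-based day and slot.

    `calls` is an iterable of (timestamp, zone) pairs and `first_day` is the
    calendar date of day i = 1. Cells without calls are simply absent from
    the Counter and read back as 0.
    """
    counts = Counter()
    for timestamp, zone in calls:
        day = (timestamp.date() - first_day).days + 1
        counts[(day, zone, slot_of(timestamp))] += 1
    return counts


# Hypothetical call records: (timestamp, zone index).
calls = [
    (datetime(2020, 1, 1, 0, 30), 3),
    (datetime(2020, 1, 1, 1, 59), 3),
    (datetime(2020, 1, 1, 14, 5), 7),
    (datetime(2020, 1, 2, 23, 10), 3),
]
counts = count_calls(calls, first_day=date(2020, 1, 1))
```

Storing only the non-empty cells is a convenience for sparse data; a dense array indexed by (i, z, t) would work equally well.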

4 The Bayesian Model

We propose the following generalized linear mixed model for the number of calls \(N^{i}_{z,t}\):

$$\begin{aligned} N^{i}_{z,t} \mid \lambda ^{i}_{z,t} \overset{ind}{\sim } \text {Poisson} \left( \lambda ^{i}_{z,t} \right) \end{aligned} \tag{1}$$
$$\begin{aligned} \log \left( \lambda ^{i}_{z,t} \right) = \beta _1 p_z + \beta _2 a_z + \sum _{k = 1}^{K} \beta _{3,k} \phi _{k,z} + \beta _4 h_i + \gamma _t \end{aligned} \tag{2}$$

where \(p_z\) and \(a_z\) are the population and the area of zone z, respectively; \(h_i\) is a binary covariate equal to 1 if day i is a holiday and 0 otherwise; and \(\varPhi _z = \left[ \phi _{k,z} \right] \) is a dummy vector of dimension K describing the type of zone z.

Zones z are classified into \(K+1\) types (e.g., residential, commercial, industrial); \(\phi _{k,z}=1\) if zone z is of type k (with \(k=1,\dots ,K\)) and 0 otherwise, while \(\phi _{k,z}\) is always equal to 0 if zone z is of type \(K+1\), to avoid identifiability problems.

Model (1) and (2) is a generalized linear mixed model with four fixed-effects parameters \(\beta _1\), \(\beta _2\), \(\pmb {\beta _3}\) and \(\beta _4\) (where \( \pmb {\beta _3}\) is K-dimensional), and a random-effects parameter \(\gamma _t\). The latter takes into account the similarity of the number of calls in different zones during the same time slot t. In this formulation \(\lambda ^{i}_{z,t}\) is the parameter responsible for EMS calls: the higher the parameter \(\lambda ^{i}_{z,t}\) is, the higher the expected number of calls is.

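Finally, independent non-informative priors, i.e., Gaussian distributions with 0 mean and large variance equal to 100, are chosen for \(\beta _1\), \(\beta _2\), \(\beta _4\), \(\gamma _t\), and for the components of vector \(\pmb {\beta _3}\):

$$\begin{aligned} \beta _1,\ \beta _2,\ \beta _{3,k},\ \beta _4,\ \gamma _t \overset{iid}{\sim } \mathcal {N} \left( 0, 100 \right) , \qquad k=1,\dots ,K, \quad t=1,\dots ,T \end{aligned}$$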
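To make the structure of the model concrete, the unnormalized log posterior implied by Eqs. (1) and (2) and by the Gaussian priors can be written in a few lines of plain Python. This is a sketch for illustration only, not the program used for inference; the function names and the tuple layout of `params` and `observations` are assumptions introduced here.

```python
import math

PRIOR_VARIANCE = 100.0  # prior variance stated in the text


def log_rate(beta1, beta2, beta3, beta4, gamma, p_z, a_z, phi_z, h_i, t):
    """Linear predictor of Eq. (2): log(lambda) for one (day, zone, slot) cell.

    beta3 and phi_z have length K; gamma has length T and t is 1-based.
    """
    zone_effect = sum(b * phi for b, phi in zip(beta3, phi_z))
    return beta1 * p_z + beta2 * a_z + zone_effect + beta4 * h_i + gamma[t - 1]


def poisson_log_pmf(n, log_lam):
    """log P(N = n) for a Poisson variable with rate exp(log_lam), as in Eq. (1)."""
    return n * log_lam - math.exp(log_lam) - math.lgamma(n + 1)


def gaussian_log_prior(value):
    """Log density of a zero-mean Gaussian with variance PRIOR_VARIANCE."""
    return (-0.5 * math.log(2 * math.pi * PRIOR_VARIANCE)
            - value ** 2 / (2 * PRIOR_VARIANCE))


def log_posterior(params, observations):
    """Unnormalized log posterior: sum of log priors plus Poisson log likelihood.

    params = (beta1, beta2, beta3, beta4, gamma);
    observations = iterable of (n, p_z, a_z, phi_z, h_i, t) tuples.
    """
    beta1, beta2, beta3, beta4, gamma = params
    total = sum(gaussian_log_prior(v)
                for v in [beta1, beta2, beta4, *beta3, *gamma])
    for n, p_z, a_z, phi_z, h_i, t in observations:
        total += poisson_log_pmf(
            n, log_rate(beta1, beta2, beta3, beta4, gamma,
                        p_z, a_z, phi_z, h_i, t))
    return total
```

An MCMC sampler only needs this quantity up to an additive constant, which is why the normalizing constant of the posterior never has to be computed.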

5 Application to the Dataset

The data used in this work are those used in [4, 7, 11]. They refer to EMS calls that arose in the city of Montréal and the nearby suburb of Laval, Québec, Canada, i.e., a region with about 2.3 million inhabitants and a territory of 744 km\(^2\). According to these data, the region is divided into \(Z=595\) demand zones. In addition to the EMS data, information from the Municipality of Montréal has been used to define the vector \(\varPhi _z\) for each zone. Eleven different types of zone are present, as described in Table 1; moreover, to avoid collinearity due to the low number of zones belonging to some types, types are regrouped as follows:

  • Residential (\(k = 1\));

  • Workplace, regrouping commercial, office, industrial and institutional (\(k = 2\));

  • Street (\(k = 3\));

  • Other, regrouping park, agricultural, empty space, water, and golf field.

Finally, the population data have been divided by 1,000 so that they are of the same order of magnitude as the other covariates.
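The regrouping and the population scaling can be sketched as a small encoding step. The lowercase string labels below are assumed spellings of the eleven zone types named in the list above; the group "Other" is taken as the reference type \(K+1\), so its dummy vector is all zeros.

```python
K = 3  # number of dummy components; "Other" is the reference type K + 1

# Group index k for each of the eleven zone types (None = reference "Other").
# The label spellings are assumptions made for this illustration.
GROUP_OF_TYPE = {
    "residential": 1,
    "commercial": 2, "office": 2, "industrial": 2, "institutional": 2,
    "street": 3,
    "park": None, "agricultural": None, "empty space": None,
    "water": None, "golf field": None,
}


def zone_dummies(zone_type):
    """Return the dummy vector [phi_1, ..., phi_K] for a zone type."""
    k = GROUP_OF_TYPE[zone_type]
    return [1 if k == j else 0 for j in range(1, K + 1)]


def scaled_population(inhabitants):
    """Population in thousands, as used for the covariate p_z."""
    return inhabitants / 1000.0
```

For example, `zone_dummies("office")` yields `[0, 1, 0]`, while any type in the "Other" group yields `[0, 0, 0]`.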

5.1 Descriptive Statistics

The dataset consists of 2,606,100 observations for \(N^{i}_{z,t}\) (\(I = 365\) days, \(Z = 595\) zones and \(T = 12\) slots) together with the related covariates.

Tables 1 and 2 report the main information about the data. Moreover, Fig. 1 shows a map of the territory together with the number of calls.

Table 1 Total number of calls and empirical mean divided by the type of zone
Table 2 Empirical mean and standard deviation of the number of calls divided by time slot
Fig. 1 Map of the city of Montréal together with the total number of calls. The number of calls for each zone is represented by a point in the center of the zone. Green points correspond to lower numbers of EMS calls, while red points represent higher numbers of EMS calls, according to the division in quartiles reported in the legend

5.2 Posterior Analysis

5.2.1 Convergence Analysis

The model is implemented in Stan (http://mc-stan.org/), which uses the Hamiltonian Monte Carlo algorithm to reduce the correlation between successive draws and obtain faster convergence of the chains. We ran 5,000 MCMC iterations, with a burn-in of 1,000 iterations and a final sample size of 4,000.

Traceplots, autocorrelations and the Gelman–Rubin convergence statistics (\(\hat{R}\)) have been considered to verify that convergence is achieved. Moreover, we have estimated the Monte Carlo Standard Error (MCSE) with the MC error, the Naive SE and the Batch SE. See [8, 9] for further information.

Results show that \(\hat{R}\) is equal to 1 and that the MCSE is always less than 5% of the posterior standard deviation for all parameters. Moreover, the traceplots are stable and the autocorrelations are low, showing that the convergence of the chain is satisfactory.
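For reference, the two diagnostics can be computed from raw draws as sketched below. This uses the classic Gelman–Rubin formula for several chains of equal length and a simple batch-means estimate of the MCSE; it is an assumption that these match the exact variants reported here, and the number of chains and of batches are likewise illustrative choices.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for m chains of equal length n."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean(variance(chain) for chain in chains)   # W: mean within-chain variance
    between = n * variance(chain_means)                  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n          # pooled variance estimate
    return math.sqrt(pooled / within)


def batch_means_se(draws, n_batches=20):
    """Batch-means Monte Carlo standard error of the posterior mean estimate."""
    size = len(draws) // n_batches
    batch_means = [mean(draws[b * size:(b + 1) * size])
                   for b in range(n_batches)]
    return math.sqrt(variance(batch_means) / n_batches)


def mcse_is_small(draws, threshold=0.05):
    """Check the criterion used in the text: MCSE below 5% of the standard deviation."""
    return batch_means_se(draws) < threshold * math.sqrt(variance(draws))
```

Values of \(\hat{R}\) close to 1 indicate that the between-chain and within-chain variability agree, which is what one expects once the chains have reached the same stationary distribution.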

5.2.2 Credible Intervals of Model Parameters

Inference for each model parameter is reported in terms of the posterior 95% credible interval (CI).

CIs of the fixed-effects parameters are reported in Table 3. The population parameter \(\beta _1\) has a positive effect, thus increasing the number of calls, while the area parameter \(\beta _2\) has a negative effect. This is in agreement with the considered data, in which zones with large areas have small population densities; thus, the higher the population density of a zone is, the higher the number of calls is. Vector \(\pmb {\beta _3}\) gives the effect of the zone type; results show that workplace zones and streets have more EMS calls, followed by residential zones. Finally, the CI of parameter \(\beta _4\) suggests that a lower number of calls is to be expected during holidays.
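As a reading aid for these intervals, the log link in Eq. (2) implies that each coefficient acts multiplicatively on the expected number of calls. The short derivation below follows from Eqs. (1) and (2) alone; the primed index \(i'\) is notation introduced here to denote a non-holiday day compared with a holiday day i, for the same zone z and slot t:

$$\begin{aligned} \frac{\mathbb {E}\left[ N^{i}_{z,t} \mid h_i = 1 \right] }{\mathbb {E}\left[ N^{i'}_{z,t} \mid h_{i'} = 0 \right] } = \frac{\exp \left( \beta _1 p_z + \beta _2 a_z + \sum _{k=1}^{K} \beta _{3,k} \phi _{k,z} + \beta _4 + \gamma _t \right) }{\exp \left( \beta _1 p_z + \beta _2 a_z + \sum _{k=1}^{K} \beta _{3,k} \phi _{k,z} + \gamma _t \right) } = e^{\beta _4} \end{aligned}$$

Hence a credible interval for \(\beta _4\) lying entirely below zero corresponds to a ratio below 1, i.e., fewer expected calls on holidays. By the same argument, a zone of type k has \(e^{\beta _{3,k}}\) times the expected calls of a zone of the reference type \(K+1\) with the same population and area.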

Table 3 95% CIs for the fixed-effects parameters

Posterior CIs for the random-effects vector \(\gamma _t\) are reported in Fig. 2. They suggest a clear distinction between the time slots: a higher number of calls arrives during the day (slots \(t=5,\dots ,11\)), while a lower number of calls arrives during night hours.

Fig. 2 95% CIs for the random-effects vector \(\gamma _t\)

5.2.3 Cross-Validation Prediction

A cross-validation approach is adopted to validate the model, by partitioning the complete dataset. The first 90% of the days (with \(i=1,\dots ,I - \tilde{I}\)) is used as the training set to determine the posterior density, while the remaining 10% (with \(i=I - \tilde{I} + 1,\dots ,I\)) is used as the testing set. The predictive distributions of each \(N^{i}_{z,t}\) (with \(i=I - \tilde{I} + 1,\dots ,I\)) are computed, and the predictions are compared with the corresponding observed data.
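The split and the predictive step can be sketched as follows, assuming that posterior draws of \(\log \lambda ^{i}_{z,t}\) for a testing cell are already available from the sampler. The helper names are hypothetical, the number of testing days is passed explicitly because the rounding of the 10% share is not specified, and the Poisson generator is Knuth's multiplication method, chosen here only to keep the example free of external libraries.

```python
import math
import random


def split_days(n_days, n_test):
    """Chronological split: the last n_test days form the testing set (1-based days)."""
    train = list(range(1, n_days - n_test + 1))
    test = list(range(n_days - n_test + 1, n_days + 1))
    return train, test


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small rates typical of a single zone and slot.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def predictive_draws(log_rates, rng):
    """One predictive count per posterior draw of log(lambda) for a given cell."""
    return [sample_poisson(math.exp(lr), rng) for lr in log_rates]
```

Drawing one count per posterior draw propagates both the parameter uncertainty and the Poisson variability into the predictive distribution of each cell.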

The accuracy of the predictions is evaluated in terms of the global Mean Absolute Error (MAE), defined as:

$$\begin{aligned} \textit{MAE} ~ = ~ \frac{1}{\tilde{I}ZT} ~ \sum _{i=I - \tilde{I} + 1}^{I} ~ \sum _{z=1}^{Z} ~ \sum _{t=1}^{T} ~ \left| N^{i~obs}_{z,t} - \hat{N}^{i}_{z,t} \right| \end{aligned}$$

where the product \(\tilde{I}ZT\) is the number of observations in the testing set, and \(N^{i~obs}_{z,t}\) and \(\hat{N}^{i}_{z,t}\) represent the observed number of calls and the number predicted by the model (median of the predictive distribution) at day i, zone z and slot t, respectively. The obtained value is 0.078, i.e., more than one order of magnitude below one call per zone and slot, showing a good fit of the model.

We have also detailed the MAE for each combination of type of zone k and time slot t. Results in Table 4 show quite similar values, whose maximum is 0.111, confirming a good fit of the model that does not significantly deteriorate for any pair (k, t).
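A minimal sketch of this computation is given below, assuming the observed counts and the predictive draws are stored in dictionaries keyed by (i, z, t); this storage layout and the function name are illustrative choices.

```python
from statistics import median


def mean_absolute_error(observed, predictive):
    """Global MAE over the testing cells.

    observed:   dict mapping (i, z, t) to the observed count
    predictive: dict mapping (i, z, t) to a list of predictive draws
    The point prediction for each cell is the median of its draws.
    """
    errors = [abs(observed[key] - median(predictive[key])) for key in observed]
    return sum(errors) / len(errors)


# Tiny hypothetical example with two testing cells.
observed = {(1, 1, 1): 0, (1, 1, 2): 2}
predictive = {(1, 1, 1): [0, 0, 1], (1, 1, 2): [0, 1, 1]}
mae = mean_absolute_error(observed, predictive)
```

Restricting the keys of `observed` to the cells of one zone type and one slot gives the detailed MAE for that combination.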

Table 4 MAE for each combination of type of zone k and time slot t

5.2.4 Comparison with the Mean Estimate

In this section we compare the outcomes of the proposed model with those of a very simple frequentist approach, in which the predictions are simply given by the historical means. This approach gives as a predictor the mean number of calls for the specific combination of type of zone z, time slot t and holiday parameter h. MAE values are computed considering the same training and testing sets as in Sect. 5.2.3.

The global MAE of the frequentist approach is equal to 0.145, while the values grouped by z, t and h are reported in Table 5. Results show that the MAE under the frequentist approach is higher, being about twice the MAE under the proposed Bayesian approach (0.145 versus 0.078). This further confirms the good fit of the proposed model to the data.
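The historical-mean predictor can be sketched in a few lines. The row layout (zone type, slot, holiday flag, count) is an assumption made for illustration, as is the reading that the means are grouped by zone type rather than by individual zone.

```python
from collections import defaultdict


def fit_historical_means(training_rows):
    """Mean count for each (zone type, slot, holiday) combination in the training set.

    training_rows: iterable of (zone_type, slot, holiday, count) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of counts, number of rows]
    for zone_type, slot, holiday, count in training_rows:
        cell = totals[(zone_type, slot, holiday)]
        cell[0] += count
        cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def baseline_mae(means, test_rows):
    """MAE of the historical-mean predictor on the testing rows.

    Raises KeyError if a testing combination never occurs in the training set.
    """
    errors = [abs(count - means[(zone_type, slot, holiday)])
              for zone_type, slot, holiday, count in test_rows]
    return sum(errors) / len(errors)
```

Because the mean of a sparse count is typically a small positive fraction, this predictor incurs a nonzero error on every cell with zero calls, whereas a median-based prediction can be exactly zero there.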

Table 5 Comparison of the MAE between the proposed Bayesian model and the mean frequentist approach, grouped by type of zone z, time slot t, and holiday h

6 Conclusions

This paper presents a first attempt to deal with stochasticity in the EMS calls by using the Bayesian approach. A generalized linear mixed model has been proposed, with the aim of identifying relevant effects that influence the calls and giving predictions of future EMS calls.

Results from the Montréal case suggest that population, area and type of zone have a strong impact. Moreover, as expected, the time slot has a relevant effect, with lower predicted numbers of calls during the night. Finally, the model performs well when used to make predictions, as documented by the low MAE values under cross-validation.

Moreover, the model is general, and can be easily applied to describe EMS demand in other cities. For rural settings, however, we expect that some modifications are necessary to account for rare events in an environment characterized by a usually low demand. Another extension will be to consider the area of each zone as an offset/exposure term of the Poisson regression.