1 Introduction

Crowding refers to the accumulation of clients in a service system, which translates into virtual or physical states of waiting for services to be completed for the clients and into the piling up of service work to be performed for the system and its resources. Although primarily used in the context of Emergency Departments (EDs) and their patients (Morley et al. 2018), it can also refer to travelers on a train (Preston et al. 2017), customers in a supermarket (Luo and Shi 2020), vehicles in a traffic jam (Mohandas et al. 2009), planes in air traffic (Kostiuk et al. 2016), as well as citizens in crowding events (Guo et al. 2022). In industrial systems such as a factory or a set of factories, this phenomenon can also translate into the accumulation of stocks of products (Graves and Willems 2003). Despite their varying natures, all these crowding situations are determined by the amplitudes and patterns of an arrival demand and by the adequacy of a service or production response. For service systems, crowding is a phenomenon that needs to be regulated, as over-crowding leads to poor-quality services, with the dissatisfaction of clients, their potential endangerment in the healthcare context, and stress on service resources. Under-crowding must also be considered, as it leads to the underuse of service resources and thus to the waste of economic resources.

The quantification of crowding constitutes a first step toward its management. Traditional general crowding measures of a system involve a comparison of its arrival process, which indicates the total number of arrivals over time and corresponds to the demand for the system, and its departure process, which indicates the total number of departures over time and is the result of the service offer response to the arrival demand. Average patient waiting time and average occupation levels constitute the most common crowding measures for a service system, notably in queuing systems where they can be related through Little’s law and their behavior can be studied against arrival rate and service rates (Shortle et al. 2017; Little and Graves 2008). Average waiting time is a crowding measure relative to the global arrival rate that gives the point of view of clients in the system, as their satisfaction is often highly related to it. Average occupation level can be directly compared to the capacity of the service system and captures real-time perceived crowding in a manner other than waiting time. Model-specific measures such as probability of balking and reneging behaviors due to client impatience (Cochran and Broyles 2010), as well as the probability of refusal or blocking due to full capacity can sometimes be more pertinent as they convey the failure of the system to deliver services (Shortle et al. 2017; Asaduzzaman et al. 2010; Bruin et al. 2007). In Queuing Theory, congestion is another model-based measure that compares, through a ratio, the arrival rate of a system to its maximal service rate and indicates as such its long-term capacity to deal with a given mean arrival rate. In particular, it becomes an interesting indicator when looking at high-scale systems where occupation and waiting time no longer adequately capture system performances. 
Indeed, congestion can measure the performance of a serial stationary system by identifying the bottlenecks that limit its maximum service rate.

Specific ED crowding scores have also been designed to capture this complex concept more precisely (Ahalt et al. 2018). They follow two main approaches. The first, direct approach defines a measure as an intuitive combination of crowding-related variables of the ED. The Emergency Department Work Index (EDWIN) is one such measure; it compares the mean ESI triage level of the patients present in an ED to the number of beds available. Although such measures have a more intuitive meaning, measures defined through a statistical approach with regression-based formulas are more accurate at capturing crowding as assessed empirically by experts in the field. The National Emergency Department OverCrowding Score (NEDOCS), introduced by Weiss et al. (2004), is one of the most commonly used regression-based crowding scores. For this score, crowding assessments made by doctors and nurses across many EDs in the United States were paired with the current states of the EDs, and a regression analysis yielded a linear combination of five occupation and waiting variables that estimates these assessments. Although potentially useful for ED crowding management, these scores are limited by their specificity and by their data requirements: some inputs, such as the number of staff available, are not tracked in every ED. Considering these limits, their pertinence compared to general occupation or waiting time measures is still to be proven, as the latter are as effective, if not better, at predicting the consequences of crowding, such as the rate of patients who left without being seen (LWBS), the diversion of ambulances, or the rate of short-term readmissions (Wang et al. 2017).

Considering the shortcomings of existing crowding measures, the objective of this study is to construct a new measure respecting a set of properties. First, this measure must be general, meaning that its use can be extended to any other service system. Using arrival and departure data, it must be able to capture the performances of a system from its own global perspective and on a time-window basis. More precisely, using the system perspective, it must propose an alternative to average waiting time, which uses the patient perspective.

To meet these requirements, this study proposes a new congestion measure that compares the arrival and departure processes in the form of a ratio of arrival load to departure load. It adapts the traditional theoretical congestion measure used in queuing theory to a data-driven performance measurement context. The research questions of this study therefore relate to the characterization of this new measure and its pertinence over other existing crowding measures. Specifically, this study investigates how this measure can better capture the crowding of a system by analyzing its mathematical properties. In particular, it looks at how the measure could be better suited than occupation and waiting time measures to monitoring crowding on a short time scale from a system perspective, notably by considering crowding in terms of the system's service performance. Furthermore, this study examines its capacity to capture real-time crowding in an Emergency Department, including its ability to predict the consequences of crowding, such as the rate of LWBS patients.

Consequently, this study brings a set of important scientific contributions by establishing a mathematical framework to represent and study crowding, by defining and characterizing a new congestion measure and its relevance for measuring system crowding and by introducing a queuing fluid limit model to capture core system crowding dynamics whose relevance is shown for the Emergency Department system in a data-driven context.

The rest of this article is organized as follows. Section 2 gives a more detailed overview of the literature. Section 3 presents the methods with the definition of the original congestion measurement and the protocol for studying its relevance in an ED. Section 4 presents the results of the protocol with the theoretical, model-based, and empirical properties of congestion including its comparison and relationships with other crowding measures. Section 5 presents a discussion of the interpretation of the results regarding the use and relevance of congestion to other parameters as well as its limitations. Section 6 presents the conclusions of the study, placing congestion measurement in a general context for use, with its extension to other systems. Additionally, the reader is strongly invited to look at the supplementary material where appendices give some supplementary results, and most importantly the mathematical proofs of all the theoretical results.

2 Literature overview

This section presents a more in-depth view of the literature on crowding by examining the study of ED crowding, which is the system tested in this study, and the use of crowding in queuing network models.

2.1 Emergency Department crowding and related measures

Being a core element for monitoring and optimization, Emergency Department crowding measures are regularly reviewed (Hwang et al. 2011; Stang et al. 2015; Morley et al. 2018; Rasouli et al. 2019; Jones et al. 2021; Badr et al. 2022; Pearce et al. 2023). With an analysis of 90 studies, the most recent review by Badr et al. (2022) showed that ED crowding, through the use of different measures, is related to multiple outcomes: mainly perception of care for all studies and quality of care for 75% of studies, and secondarily mortality for 45% of studies. Most studies rely on measures that are easy to compute and to communicate, such as ED occupancy and ED LOS (Length Of Stay, or total waiting time), although definitions are sometimes heterogeneous across EDs, as for boarding time and the number of boarders. More complex crowding scores have also been introduced and used to better capture the notion of ED crowding in a single monitoring indicator. Indeed, crowding in the ED is often viewed as a more complex issue than simply a rise in the global number of patients or in waiting times in the system. Using the point of view of ED field experts, mainly ED physicians and nurses, Weiss et al. showed that ED crowding is correlated with multiple internal occupation levels and with waiting times in specific parts of ED patients’ pathways, in relation to ED and hospital bed capacities (Weiss et al. 2004). This led to the creation of the National Emergency Department Overcrowding Score (NEDOCS), which embeds five ED crowding measures that are measurable in most ED systems, using a linear regression method to combine them to best fit the empirical real-time evaluation of ED crowding across the United States (Weiss et al. 2004, 2005). Many other scores exist, such as EDWIN, READI and SONET, which rely on the triage priority level of patients and on the number of working physicians (Ahalt et al. 2018; Hoot et al. 2007; Chandoul 2015; Wang et al. 2015, 2017). Although they have not necessarily been statistically tested, they embed notions that are highly related to the ED crowding situation and intuitively match the point of view of ED field experts.

The scores described above may give a more accurate characterization of the congestion situation than an overall occupancy level or an average waiting time. However, they have two main limitations. The first limitation is that they are specific to ED systems and can therefore only be used to manage ED crowding. In addition, not all EDs necessarily have the data required to calculate their values, and even when they do, it can still be quite difficult to track them in real time (Cheng et al. 2021). The second limitation is the limited evidence of their advantage over the overall occupancy level, which is easier to track and relevant to all types of EDs. Indeed, it has been shown that global occupancy can predict the consequences of overcrowding in EDs, such as the diversion of ambulances or the proportion of patients who left without being seen (LWBS) by a doctor, equally well or better (Hoot et al. 2007; Kulstad et al. 2010; Wang et al. 2017; Gorski et al. 2021). Specific ED crowding measures are not necessarily easier to interpret; they are often strongly correlated with overall occupancy and can only add information on the crowding state in rare cases. For example, with NEDOCS, a rare case would correspond to the unlikely situation where overall occupancy is normal or low compared to the historical level, while the number of patients waiting for a hospital bed is abnormally high in absolute terms.

In this context, this study primarily examines the relationship between the new congestion indicator and the general measure of average emergency department occupancy and average waiting times (LOS).

As mentioned in the previous paragraphs, ED crowding can also be assessed based on an indirect indicator of its consequences. The Proportion of Patients who Left Without Being Seen (PLWBS) is one of the main ED measures to judge the consequences of crowding and is measurable in most EDs (Clarey and Cooke 2012). Other parameters such as the rate of ambulance diversion and the rate of short-term readmissions to the ED (within one to seven days) have also been used, but they occur rarely and, for readmissions to the ED, have multiple causes among which the effect of crowding is relatively minor (Hoot et al. 2007; Wang et al. 2015, 2017). On the contrary, LWBS patients occur quite frequently, with rates measured between 0.36% and 15% in the national health system according to the study by Clarey and Cooke (2012). Although designating a specific dysfunction of the ED, this patient behavior corresponds to the impatience present in many service systems and is modeled in queuing models in several forms, with balking and reneging mechanisms being the most common (Shortle et al. 2017). Balking is the behavior of a client or patient who does not enter a system because they consider it too busy and believe that their waiting time would be too long or the service quality too degraded. It is modeled as the probability of not entering the system for a given occupancy level and results in a proportional reduction in service compared to initial arrivals. Reneging is a different behavior where patients leave the system after waiting too long, resulting, unlike balking, in additional non-zero occupation on the part of that patient. Since balking cannot be captured upon entrance to the ED if the patient remains outside, LWBS can be seen primarily as a reneging behavior and secondarily as a balking behavior when patients see the occupancy status of the ED. The review by Clarey and Cooke and other studies on patients who left without being seen further confirm this (Bambi et al. 2011; Clarey and Cooke 2012; Liu et al. 2014; Li et al. 2019). Indeed, from the point of view of the hospital system, PLWBS was mainly related to waiting times according to 44.8–86% of the patients surveyed, and secondarily to indirect mechanisms influencing waiting time, overcrowding of EDs, untrained doctors, and large hospitals. From the patient’s perspective, LWBS has been linked to younger patients, poor patients and foreign patients, whose proportions depend mainly on the demographics of the ED’s catchment area and on potential seasonal effects. Although the health risks and severity of patients who LWBS tend to be low, this phenomenon has an economic cost because these patients use ED resources and a doctor’s time without achieving the goal of treatment or prescription that would potentially resolve their healthcare request. Because of all the points mentioned above, a congestion indicator for EDs should be evaluated against the prediction of PLWBS to measure its relevance.

2.2 Congestion

Congestion is historically used in stationary queuing systems and is the ratio of the arrival rate to the maximum service rate \(\rho = \frac{\lambda }{\mu _{\max }}\) (Shortle et al. 2017). For example, in \(M|M|1\) queues, \(\mu _{\max } = \mu\) and the stationary occupancy level is \(L = \frac{\rho }{1 - \rho }\), and in \(M|M|c\) queues, \(\mu _{\max } = c\mu\) and \(L = c\rho + \left( \frac{\left( c\rho \right) ^{c}\rho }{c!(1 - \rho )^{2}} \right) \cdot \left( \frac{\left( c\rho \right) ^{c}}{c!(1 - \rho )} + \sum _{n = 0}^{c - 1}\frac{\left( c\rho \right) ^{n}}{n!} \right) ^{- 1}\), where \(c\rho = \frac{\lambda }{\mu }\) is the offered load. An interesting property of congestion is the general interpretation of its value as performance capacity, where \(\rho \ge 1\) corresponds to the saturation of a given stationary system and \(\rho = 0.2\), for example, most often indicates underutilization. In addition, in a stationary queuing network, more precisely in Jackson networks, one can find the congestion of any set of stations using the arrival and service performance information of each individual station. In particular, the maximum service rate of the unsaturated network is \(\mu _{\text {network}} = \min \{\lambda _{0}:\exists i \in I,\ \lambda _{i} = \mu _{i}\} = \min _{i \in I}\left( \frac{\mu _{i}}{\sum _{k \in K}n_{k,i}q_{k}} \right)\) with \(\lambda _{0}\) the arrival rate of the network, \(\lambda _{i}\) and \(\mu _{i}\) the arrival and maximal service rates of the stations \(i \in I\), \(q_{k}\) the probability of station circuit \(k \in K\) in the network and \(n_{k,i}\) the number of times the circuit \(k \in K\) passes through the station \(i \in I\). In the context of blocking systems, where the system is composed of stations with limited capacities, congestion is very useful for finding the bottleneck that limits the overall production capacity, or productivity, of the system. This has given rise to buffer allocation methods that are well studied in the operational research literature (Gershwin and Schor 2000; Ouazene et al. 2013; Koizumi et al. 2005). In the healthcare context, Cochran and Bharti introduced a concept close to congestion to minimize the gap in utilization between hospital units and to find a balance in the allocation of beds across an entire hospital (Cochran and Bharti 2006).
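As an illustration of the stationary formulas in this paragraph, the following Python sketch computes the mean occupancy of an \(M|M|c\) queue and the maximum service rate of a network with its bottleneck station. It uses the standard textbook form of the \(M|M|c\) formula written with the offered load \(r = \lambda /\mu = c\rho\); the function names, station names and numerical values are hypothetical and only serve the example.

```python
from math import factorial


def mmc_mean_occupancy(lam, mu, c):
    """Stationary mean number in system L for an M|M|c queue.

    lam: arrival rate, mu: service rate of one server, c: number of servers.
    Returns infinity when the congestion rho = lam / (c * mu) is >= 1.
    """
    rho = lam / (c * mu)              # congestion: arrival rate / maximum service rate
    if rho >= 1:
        return float("inf")           # saturated system, no stationary regime
    r = lam / mu                      # offered load, r = c * rho
    # Probability of an empty system
    p0 = 1.0 / (r**c / (factorial(c) * (1 - rho))
                + sum(r**n / factorial(n) for n in range(c)))
    return r + (r**c * rho) / (factorial(c) * (1 - rho) ** 2) * p0


def network_max_rate(mu, circuits):
    """Maximum external arrival rate an unsaturated network can sustain.

    mu: dict station -> maximal service rate mu_i.
    circuits: list of (q_k, visits_k) where q_k is the circuit probability and
              visits_k is a dict station -> n_{k,i} (visits of circuit k to station i).
    Returns (mu_network, bottleneck_station).
    """
    limits = {}
    for station, mu_i in mu.items():
        visits = sum(q * n.get(station, 0) for q, n in circuits)
        if visits > 0:
            limits[station] = mu_i / visits   # lambda_0 at which this station saturates
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical two-station example (rates per hour)
rates = {"triage": 12.0, "doctor": 6.0}
paths = [(0.7, {"triage": 1, "doctor": 1}),   # 70% of patients see the doctor once
         (0.3, {"triage": 1, "doctor": 2})]   # 30% see the doctor twice
mu_network, station = network_max_rate(rates, paths)
print(mmc_mean_occupancy(1.0, 1.0, 2), mu_network, station)
```

With one server the first function reduces to \(L = \rho /(1 - \rho )\), which gives a quick consistency check of the multi-server expression.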

Congestion as defined in the previous paragraph is applicable only to stationary systems. To relax this stationarity constraint and to plan staff resources and schedules in an ED, Green et al. used a SIPP lag approach with a delayed congestion indicator in an \(M_{t}|M|c\) queue model where the arrival process is a non-homogeneous Poisson process with a variable arrival rate \(\lambda (t)\) (Green et al. 2006; Defraeye and Van Nieuwenhuyse 2016). The congestion value is then defined as \(\rho (t) = \frac{\lambda \left( t - S_{\text {lag}} \right) }{c(t)\mu }\) with \(\mu\) the service rate of a server, \(S_{\text {lag}}\) the service offset introduced, and \(c(t)\) the chosen number of servers at the instant \(t\). The average occupancy level is then estimated over time using the classic queuing formulas mentioned in the previous paragraph. More generally, the arrival and departure processes can be modeled as Coxian processes, that is, their local behavior can be modeled through a birth-death process with an instantaneous arrival rate \(\lambda (t)\) (of "birth") and an instantaneous departure rate \(\delta (t)\) (of "death"), where congestion \(\rho (t) = \frac{\lambda (t)}{\delta (t)}\) corresponds to an instantaneous probability ratio, or relative risk, on the nature of a new system state change event at the instant \(t\). \(\rho (t)\) is then the ratio between the probability that this event is an arrival and the probability that this event is not an arrival, that is, a departure. Although more flexible, these definitions of congestion still depend on theoretical models of the system where its stochastic behavior is known. As such, they cannot measure congestion directly on a real system from observed events.
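The lagged congestion \(\rho (t) = \lambda (t - S_{\text {lag}})/(c(t)\mu )\) can be sketched in a few lines of Python. The sinusoidal arrival rate, the constant staffing level and all numerical values below are assumptions made for the example, not parameters taken from the cited studies or from the ED under study.

```python
from math import pi, sin


def lagged_congestion(arrival_rate, servers, mu, lag, t):
    """SIPP-lag style congestion rho(t) = lambda(t - lag) / (c(t) * mu).

    arrival_rate: function u -> lambda(u), the time-varying arrival rate.
    servers: function u -> c(u), the number of servers on duty.
    mu: service rate of one server; lag: service offset S_lag.
    """
    return arrival_rate(t - lag) / (servers(t) * mu)


# Hypothetical daily pattern: 8 arrivals/hour on average, peaking at 12 per hour
def lam(u):
    return 8.0 + 4.0 * sin(2.0 * pi * u / 24.0)


def staff(u):
    return 4            # constant staffing level, for illustration only


# Congestion at t = 8 h with a 2 h lag: uses the arrival rate observed at t = 6 h
print(lagged_congestion(lam, staff, mu=3.0, lag=2.0, t=8.0))
```

The sketch makes the model dependence visible: both \(\lambda (\cdot )\) and \(\mu\) must be supplied by a fitted model, which is the limitation discussed in the paragraph above.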

In this context, the objective of this study is to define a congestion measure that can characterize the congestion behavior of the system over a time window without using a prerequisite model of service behavior.

3 Methods

This section introduces an original congestion measure with its interpretation and justification. It also describes its evaluation protocol with its mathematical, experimental, and empirical properties, particularly using a data correlation analysis from an Emergency Department to support them.

3.1 Definition of the new congestion measure

This sub-section and the study rely on a mathematical framework to characterize system crowding based on the notions of arrival A(t) and departure D(t) processes commonly used in queuing theory and presented in Table 1. Traditional queuing measures are also defined in a data-driven context for an observed window \([t-h,t]\), with the average occupation \(L_h (t)\) (on \([t-h,t]\)), the average waiting time of patients departing \(W_{(d,h)} (t)\) in \([t-h,t]\), and the average waiting time of patients arriving \(W_{(a,h)} (t)\) in \([t-h,t]\). The arrival entities are called patients but the framework is also valid in other contexts, namely for clients in a service system and products in a production system.

Table 1 List of mathematical queuing notions used for this study

The crowding measures listed in Table 1 quantify the service performance of a system, with waiting time describing its capacity to serve a given patient in a certain time period and occupation level describing its capacity to empty itself against the accumulation of the patients who have arrived in the system. Although these measures are pertinent for quantifying crowding using minimum information, the occupation measure does not directly express the performance of a system relative to the arrival rates. The average waiting time does, but solely from the patient perspective and not from the time-window-based system perspective, especially on small scales. Moreover, these two crowding measures tend to lose meaning in larger-scale systems, where the interpretation of an occupation level or a waiting time is less obvious for the manager of a system.

As such, the main objective of this study is to define a measure that can be used to quantify crowding with a more general meaning and to test it against usual measures. To this end, as explained in the introduction, this study proposes a congestion measure that quantifies service response performances by looking at the ratio between arrival and departure processes.

As detailed in the literature overview in Sect. 2, the historical definition of congestion in queuing theory is a ratio between a constant arrival rate and a constant service rate that indicates the usage capacity of the system. As this ratio tends towards 1, the system becomes saturated and both asymptotic occupation and waiting time, related by Little’s Law \(L=\lambda W\) (Little and Graves 2008), tend to infinity with a nonlinear relation to congestion. However, when monitoring real systems, the focus is on transient local behaviors where arrival and service rates are dynamic stochastic notions that cannot be measured directly. As such, this study defines and characterizes a new congestion measure that can be used in this data-driven context. To be an operational measure, the congestion measure must compare the arrival and departure processes over a window \([t_1,t_2] \subset {\mathbb {R}}^+\), that is, compare \(\Delta A([t_1,t_2])=A(t_2 )-A(t_1 )\) to \(\Delta D([t_1,t_2])=D(t_2 )-D(t_1 )\) using a ratio approach. However, the direct use of the ratio of these two values as a measure is problematic. Indeed, the congestion values obtained are relative to the local context, so that the consequences of congestion on two time windows of the same length on the same system can be very different depending on the initial context of crowding, captured by the level of occupancy and the number of arrivals in each window. To keep this contextual relativity, which is essential for extending it to a general context, this study proposes a new measure of congestion that uses a ratio of arrival and departure loads, given in Definition 1.

Definition 1

Given the local arrival and departure processes defined in Table 1, congestion is defined as follows:

$$\begin{aligned} \begin{aligned}&\forall t,h \in {\mathbb {R}}^{+},\ t - h> 0,\int _{t - h}^{t}{\Delta A_{t - h}( u )}du > 0, \\&\rho _{h}(t) = \ \frac{\int _{t - h}^{t}{\Delta A_{t - h}( u )}du}{\int _{t - h}^{t}{\Delta D_{t - h}( u )du}} = \ \frac{\int _{t - h}^{t}{\left( A(u) - D(t-h)\right) du}}{\int _{t - h}^{t}{\left( D(u) - D(t-h)\right) du}} \end{aligned} \end{aligned}$$
(1)
Fig. 1 Illustration of congestion components with the arrival and departure patterns of the Troyes ED on 1st February 2019 (here \(h=24\) h, \(\int _{t - h}^{t}{\Delta A_{t - h}( u )}du = 117.61\) in 1a, \(\int _{t - h}^{t}{\Delta D_{t - h}( u )}du = 74.46\) in 1b and \(\rho _h(t)=1.58\))

According to Definition 1, the congestion measure looks at the evolution of the system from an initial time point \(t_{0} = t - h\) and accounts for the initial system state by using calibrated arrival and departure processes. Congestion measures the delay or mismatch between the arrival and departure processes for a specific time scale using a ratio approach rather than the difference approach used by average occupation \(L_h(t)\), with \(hL_h(t)=\int _{t - h}^{t}{\Delta A_{t - h}( u )}du-\int _{t - h}^{t}{\Delta D_{t - h}( u )}du\). Equivalently, it can be seen as the ratio between the average number of arrivals, considering initial occupation as initial arrivals, and the average number of departures. Alternatively, it can also be seen as the ratio between the highest occupation load, that is, the worst crowding situation, from an initial point \(t_{0} = t - h\) and the avoided occupation load. Figure 1 illustrates the integrals that are involved in congestion measurements with the example of the one-day evolution of the system under study (Troyes ED). The value \(t_{0} = t - h\) is taken at the beginning of the day and congestion corresponds to the ratio between the red area and the green area.
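Since \(A\) and \(D\) are step functions of the observed event times, the two loads of Definition 1 can be computed exactly from timestamps. The Python sketch below is one possible reading, assuming that \(A(u)\) and \(D(u)\) count the arrivals and departures with timestamps up to and including \(u\) and that the calibrated processes are \(A(u) - D(t-h)\) and \(D(u) - D(t-h)\); the function name and the toy timestamps are hypothetical.

```python
from bisect import bisect_right


def congestion(arrivals, departures, t, h):
    """Arrival load, departure load, congestion rho_h(t) and mean occupation L_h(t).

    arrivals, departures: sorted lists of event times (same time unit as t and h).
    Assumes A(u) and D(u) count events with timestamp <= u.
    """
    t0 = t - h
    a0 = bisect_right(arrivals, t0)       # A(t - h)
    d0 = bisect_right(departures, t0)     # D(t - h)
    a1 = bisect_right(arrivals, t)        # A(t)
    d1 = bisect_right(departures, t)      # D(t)

    # Integral of A(u) - D(t - h): initial occupation held during h,
    # plus (t - a) for each arrival a inside the window.
    arrival_load = (a0 - d0) * h + sum(t - a for a in arrivals[a0:a1])
    # Integral of D(u) - D(t - h): (t - d) for each departure d inside the window.
    departure_load = sum(t - d for d in departures[d0:d1])

    if departure_load == 0:
        # Infinite when there is demand but no departure, undefined otherwise
        rho = float("inf") if arrival_load > 0 else float("nan")
    else:
        rho = arrival_load / departure_load
    mean_occupation = (arrival_load - departure_load) / h   # h * L_h = load difference
    return arrival_load, departure_load, rho, mean_occupation


# Toy example (times in hours)
arr = [0.5, 1.0, 2.0, 3.0]
dep = [1.5, 2.5, 3.5]
print(congestion(arr, dep, t=4.0, h=4.0))   # window starting with an empty system
print(congestion(arr, dep, t=4.0, h=2.0))   # window starting with two patients present
```

In the second call the two patients already present at \(t - h\) contribute to the arrival load over the whole window, which is how the measure keeps track of the initial crowding context.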

The range of values that congestion, as defined by (1), can take is \([1, + \infty [\ \cup \left\{ + \infty \right\}\). Two problematic cases exist. When the departure load is zero, the congestion is infinite; in the special case where the arrival load is also zero, the congestion is undefined. An arbitrary value could be assigned in this case, but measuring service performance has no concrete meaning during a period where no service can be performed due to the absence of demand. This remark extends to waiting time measurements, which also require the departure and/or arrival of at least one patient in the selected time window. In empirical observations, these problems can mostly be avoided by using intervals long enough to allow the system to react to a positive arrival load. More precisely, using the new congestion measure requires a lag \(h\) such that arrival and departure loads can both be observed at the times \(t\) of interest. However, these exceptional cases complicate the stochastic analysis, by making the expected value of congestion either undefined or infinite in a discrete-event theoretical model, and require special monitoring in data analysis. Nevertheless, asymptotic models (where the lag \(h \rightarrow +\infty\)) and fluid limit continuous queuing models can address the study of this measure using a deterministic modeling approach.
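The lower bound of 1 can be checked from the identity \(hL_h(t)=\int _{t - h}^{t}{\Delta A_{t - h}( u )}du-\int _{t - h}^{t}{\Delta D_{t - h}( u )}du\) given after Definition 1. The short derivation below is a sketch that assumes a positive departure load and uses only quantities already defined:

$$\begin{aligned} \rho _{h}(t) = \frac{\int _{t - h}^{t}{\Delta D_{t - h}( u )}du + hL_{h}(t)}{\int _{t - h}^{t}{\Delta D_{t - h}( u )}du} = 1 + \frac{hL_{h}(t)}{\int _{t - h}^{t}{\Delta D_{t - h}( u )}du} \ge 1, \quad \text {since } L_{h}(t) \ge 0. \end{aligned}$$

With the values reported in Fig. 1, the load difference is \(117.61 - 74.46 = 43.15\) and \(\rho _h(t) = 1 + 43.15/74.46 \approx 1.58\), which matches the congestion value given in the caption.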

3.2 Evaluation protocol of congestion pertinence

As the knowledge of the congestion measure combined with that of the arrival processes is equivalent to the knowledge of occupation and approximate waiting time, the main goal of the evaluation procedure is to show that the congestion measure is a useful metric on its own, that is, without having to consider any other variable. To be sufficiently extensive, the evaluation procedure is split into three parts. The first part presents the mathematical properties of congestion that can be observed on a particular sample or realization of a system process with its arrivals and departures. More precisely, it shows the characteristics of this measure in relation to other crowding measures on a large temporal scale with the lag \(h \rightarrow + \infty\). The second part looks at the behavior of \(\rho _{h}(t)\) on a small temporal scale, with fixed lag \(h\), through the fluid limit model of the theoretical queuing models of type \(M_{t}\left| M_{t} \right| + \infty\) (Shortle et al. 2017). All necessary proofs for these two parts, and for the supplementary results, are available in the Supplementary Material. The use of this limit model avoids the issue encountered by stochastic models, where congestion can be infinite or undefined, and allows concrete analytic formulations and pertinent properties. Finally, a case study is performed to illustrate and support the results of the first two parts using the observations made on the 279,594 patient stays between 2017 and 2021 in the Troyes ED. The following Sect. 3.3 gives additional details on the data analysis and the ED context.

This study analyzes the behavior of the new measure at different time scales compared to average occupation \(L\), average waiting times \(W_{d}\) and \(W_{a}\), and the proportion of patients who left without being seen (PLWBS). It also analyzes its behavior in a series queuing network, with its propagation and the identification of a bottleneck. This evaluation relies on a correlation analysis through the Pearson correlation \(r_{xy} = \frac{\sum _{i = 1}^{n}{(x_{i} - {\overline{x}})(y_{i} - {\overline{y}})}}{\sqrt{\sum _{i = 1}^{n}\left( x_{i} - {\overline{x}} \right) ^{2}}\sqrt{\sum _{i = 1}^{n}\left( y_{i} - {\overline{y}} \right) ^{2}}}\) in cases where a linear regression model is used. When directly comparing two variables through their rank values, the Spearman correlation \(r_{S} = 1 - \frac{6\sum _{i = 1}^{n}d_{i}^{2}}{n(n^{2} - 1)}\) is used, with \(d_{i}\) the difference between the ranks of the two variables for observation \(i\) and \(n\) the total number of observations. The correlation \(r_{S} \in [- 1,1]\) is strongest when \(r_{S} = 1\) or \(r_{S} = - 1\) and weakest when \(r_{S} = 0\); its interpretation is similar to that of the Pearson correlation, but strong monotonic non-linear relationships are better measured with the Spearman correlation. Given an estimation \({\widehat{Y}}\) and the observations \(Y = \left( y_{n} \right) _{n \in \{ 1...N\}}\), the evaluation also relies on two error indicators: the mean absolute error \(\text {MAE}(Y) = \text {MAE}\left( {\widehat{Y}},Y \right) = \frac{1}{N}\sum _{n = 1}^{N}{|{{\widehat{y}}}_{n} - y_{n}|}\) and the mean absolute percentage error \(\text {MAPE}(Y) = \text {MAPE}\left( {\widehat{Y}},Y \right) = \ \frac{\text {MAE}(Y)}{\frac{1}{N}\sum _{n^{'} = 1}^{N}y_{n^{'}}}\).
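The four statistical indicators above can be written directly from their formulas. The following Python sketch is a plain transcription for illustration, not the analysis code of the study; the function names are hypothetical, the Spearman version assumes there are no tied values (the condition under which the \(d_i\) formula is exact), and MAPE follows the definition given here, that is, MAE divided by the mean of the observations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation r_xy of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x))
                  * sqrt(sum((b - my) ** 2 for b in y)))


def ranks(values):
    """Rank of each value (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman correlation r_S = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))


def mae(y_hat, y):
    """Mean absolute error between estimations y_hat and observations y."""
    return sum(abs(a - b) for a, b in zip(y_hat, y)) / len(y)


def mape(y_hat, y):
    """MAE normalized by the mean observation, as defined in the text."""
    return mae(y_hat, y) / (sum(y) / len(y))


# A monotonic but non-linear relationship: Spearman is 1, Pearson is below 1
x = [1, 2, 3, 4, 5]
y = [v**2 for v in x]
print(pearson(x, y), spearman(x, y))
print(mae([2, 2, 5], [1, 2, 3]), mape([2, 2, 5], [1, 2, 3]))
```

The quadratic example shows why both correlations are reported: a perfectly monotonic relationship can still have a Pearson correlation below 1.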

3.3 Emergency Department under study

To evaluate its pertinence for service systems, the practical use of the original congestion metric must be tested and analyzed on specific cases. This study begins this investigation by applying this crowding measure to an Emergency Department. More particularly, the data involved in this study come from the Troyes Hospital ED in Eastern France from 2017 to 2021. This is the largest hospital in the Aube Department of France, which has a population of 310,000 inhabitants and a medical density of 234.1 physicians per 100,000 inhabitants. With 279,594 ED visits during the study period (2017 to 2021), including 62,082 ED visits in 2018, and an average use rate of 250 to 330 visits per 1,000 inhabitants within the hospital’s service area, it belongs to the top quartile of EDs in France in terms of arrival volume. It thus constitutes a good initial point of examination, as the variations in crowding patterns are associated with those of arrival rates, which are in turn expected to be associated with the global volumes of arrivals. Figure 2 illustrates the patient flow design of this ED with two circuits, the short circuit being a fast track for low-acuity patients.

Fig. 2 ED patient flow design (ED of Troyes)

Like most modern EDs in France and the United States, this ED can be broken down into four service stages:

  1. A triage process where the patient goes through an administrative registration, indicating their arrival in the ED, followed by an initial examination of their health status and health problem by a triage nurse. In most EDs, this examination leads to a triage classification, generally with five classes. In France, the CIMU triage scale is used to facilitate the management of ED patient pathways (Chauvin et al. 2016). In the Troyes ED, there are five CIMU classes ranging from 5 to 1, and priorities are given with intervention time limits of 240 min, 120 min, 60 min, 20 min and 0 min respectively.

  2. 2.

    An initial examination phase where the patient waits in the waiting room before being moved to a consultation box to be seen by a doctor. Together with the triage phase, this step is crucial, as the risks of patients leaving without treatment or having health complications are much greater before they are first seen by a doctor. The “Door-to-Doc” time is therefore an important measure to monitor (D2D) (Cochran and Roche 2009).

  3. 3.

    A supplementary examination phase where the doctor who saw the patient gives him/her a prescription to fill in and/or out of the ED. This is mainly for biological exams (blood tests), radiological exams, surgical and other treatments such as drugs. During this phase, the doctor takes all the necessary steps to reach a diagnosis, often summarized by an ICD-10 code.

  4. 4.

    A final phase where the patient is held in an ED after the diagnosis has been made. This happens mainly for patients requiring hospitalization and waiting for a hospital bed, but it also happens for other reasons that prevent patients from going directly home, such as supplementary exams results, treatments or simply the need for some rest.

All patients whose stays ended “normally” were considered for this four-stage service breakdown, i.e., all patients who did not leave the ED without completing their stay and whose real waiting time in the system was known. This represents a total of 279,594 stays over the study period from 2017 to 2021. For patients who left without having to complete all the service steps, or who skipped a step, their arrival and departure times were considered identical to the last departure from the service step they left. For example, patients needing resuscitation pass through the triage system and are thus counted, but are then transferred to a resuscitation room where they are not accounted for, and they usually do not pass through the other service stages.

4 Results

4.1 General and long-term behavior of congestion: sample path characterization

Congestion can be characterized as a function of its lag parameter \(h \in {\mathbb {R}}^{+}\) as well as of time \(t \in {\mathbb {R}}^{+}\). The complete characterizations of \(\rho _{h}\left( t_{0} + h \right)\) as a function of \(h\) and of \(\rho _{h}(t)\) as a function of \(t\) are given in Supplementary Material section 1.2. To use congestion \(\rho _{h}(t)\) as a monitoring tool for crowding evaluation, one first has to select a pertinent lag \(h \in {\mathbb {R}}^{+}\). Theorem 1 gives the main sample path result of the characterization of \(\rho _{h}\left( t_{0} + h \right)\) as a function of \(h\).

Theorem 1

Asymptotic equivalence between \(\rho _{h}\left( t_{0} + h \right)\) and \(W_{d,h}\left( t_{0} + h \right)\). Given the existence of an arrival rate \(\lambda = \lim _{h \rightarrow + \infty }\frac{A(h)}{h} \in {\mathbb {R}}^{+}\) (that is \(A(h)\sim h\lambda \ as\ h \rightarrow + \infty\)) and of a finite asymptotic average waiting time \(\lim _{h \rightarrow + \infty }W_{d,h}\left( t_{0} + h \right) = \ \omega \in {\mathbb {R}}^{+}\), the asymptotic behavior of congestion can be related to waiting time through \(f(x)=1-x^{-1}\):

$$\begin{aligned} \frac{h}{2}f\left( \rho _{h}\left( t_{0} + h \right) \right) \sim \ W_{d,h}\left( t_{0} + h \right) \ as\ h \rightarrow + \infty \end{aligned}$$
(2)

Theorem 1, through (2), gives the central asymptotic relationship between congestion and average waiting time. This result helps to understand the long-term behavior of congestion and can be extended further with Corollaries 1 and 1.2 (the latter in Supplementary Material), as it leads to several equivalences between congestion and average waiting time.

Corollary 1

Given the hypotheses and context of Theorem 1 and two systems 1 and 2, which may be one and the same system, the difference between two congestion measures and the difference between the two corresponding waiting times are related as follows:

$$\begin{aligned} \begin{aligned} \frac{h}{2}( \rho _{1,h}\left( t_{1} + h \right)&- \ \rho _{2,h}\left( t_{2} + h \right) ) \\& \sim\left( W_{1,d,h}\left( t_{1} + h \right) - W_{2,d,h}\left( t_{2} + h \right) \right) as\ h \rightarrow + \infty \ \end{aligned} \end{aligned}$$
(3)

Taking a single system for ease of notation, one can relate the arithmetic mean of a set of congestion measures with the corresponding arithmetic mean of a set of average waiting time measures:

$$\begin{aligned}{} & {} \frac{h}{2}f\left( \frac{1}{|K|}\sum _{k \in K}^{}{\rho _{h}\left( t_{k} + h \right) } \right) \sim \frac{1}{|K|}\sum _{k \in K}^{}{W_{d,h}\left( t_{k} + h \right) }\ as\ h \rightarrow + \infty \ \end{aligned}$$
(4)
$$\begin{aligned}{} & \ \frac{h}{2}\Bigl ( \frac{1}{|K_{1}|}\sum _{k \in K_{1}}^{}{\rho _{h}\left( t_{k} + h \right) } - \ \frac{1}{|K_{2}|}\sum _{k^{'} \in K_{2}}^{}{\rho _{h}\left( t_{k^{'}} + h \right) }\Bigr ) \nonumber \\ &\quad \sim \Bigl ( \frac{1}{|K_{1}|}\sum _{k \in K_{1}}^{}{W_{d,h}\left( t_{k} + h \right) } - \frac{1}{|K_{2}|}\sum _{k^{'} \in K_{2}}^{}{W_{d,h}\left( t_{k^{'}} + h \right) } \Bigr )\ as\ h \rightarrow + \infty \ \end{aligned}$$
(5)

The main consequence of Corollary 1 is the ability to directly map the variations of mean congestion to the variations of mean average waiting time at large scales. Theorem 1 also explains the behavior of congestion in series queuing networks, more precisely tandem queues, as shown in Corollary 1.2 of Supplementary Material Section 1.2.
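As a quick numerical illustration of relation (3), one can borrow the stationary \(M|M| + \infty\) fluid limit of Corollary 2, where congestion tends to \(1 + \frac{2}{h\mu }\) and the windowed average waiting time tends to \(\frac{1}{\mu }\). The sketch below is only a sanity check under that model assumption; the function names and the two service rates are hypothetical and not taken from the study.

```python
def stationary_congestion(h, mu):
    """Limit congestion 1 + 2/(h*mu) of the stationary M|M|inf fluid model, result (11)."""
    return 1.0 + 2.0 / (h * mu)


def stationary_wait(mu):
    """Limit windowed average waiting time 1/mu, result (12)."""
    return 1.0 / mu


h = 24.0                 # lag in hours
mu_1, mu_2 = 0.25, 0.20  # hypothetical service rates per patient (per hour)

# Left-hand side of (3): scaled difference of the two congestion values.
lhs = (h / 2.0) * (stationary_congestion(h, mu_1) - stationary_congestion(h, mu_2))
# Right-hand side of (3): difference of the two average waiting times.
rhs = stationary_wait(mu_1) - stationary_wait(mu_2)

print(lhs, rhs)
```

In this particular model the two sides coincide for any lag \(h\), since \(\frac{h}{2}\left( \frac{2}{h\mu _{1}} - \frac{2}{h\mu _{2}} \right) = \frac{1}{\mu _{1}} - \frac{1}{\mu _{2}}\); in general, (3) only holds asymptotically.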

4.2 Short-term behavior of congestion: asymptotic characterization by fluid modeling

This section deepens the analysis of the congestion measure \(\rho _{h}(t)\) by looking at its properties under specific models. Although some assumptions of these models might oversimplify the behavior of real service systems, they make it possible to observe and quantify some behaviors of this new measure that may not otherwise be seen. In particular, Theorem 2 considers an approximation of congestion \(\widetilde{\rho _{h,\epsilon }}(t)\) that is always defined \(\forall t\in {\mathbb {R}}^{+*}, h\in {\mathbb {R}}^{+*}\), even in the absence of arrivals or departures. As in the example given in Supplementary Material section 1.3, when \(\rho _{h}(t)\) is defined, one has, by definition, \(\widetilde{\rho _{h,\epsilon }}(t)\overset{{\mathcal {P}}}{\rightarrow }\rho _{h}(t)\) as \(\epsilon \overset{{\mathcal {P}}}{\rightarrow }0\).

Theorem 2

Convergence of \(\widetilde{\rho _{h,\epsilon }}(t)\) to deterministic \(\rho _{h}(t)\). Given the stochastic point processes \(\Delta A_{t - h}(t)\) and \(\Delta D_{t - h}(t)\), their expectations \(\Lambda _{h}(t) = {\mathbb {E}}(\Delta A_{t - h}(t))\) and \(M_{h}(t) = {\mathbb {E}}(\Delta D_{t - h}(t))\), and a random variable \(\epsilon\) such that \(M_{h}(t) = O(\Lambda _{h}(t))\), \(Var(\Delta A_{t - h}( t) ) = o(\Lambda _{h}(t)^{2})\), \(Var(\Delta D_{t - h}( t) ) = o(\Lambda _{h}(t)^{2})\) and \(\epsilon \overset{{\mathcal {P}}}{\rightarrow }0\ as\ \Lambda _{h}(t) \rightarrow + \infty\), we have the following convergence in probability results:

$$\begin{aligned}&\frac{1}{\Lambda _{h}(t)}\int _{t - h}^{t}{\Delta A_{t - h}(u)}du\overset{{\mathcal {P}}}{\rightarrow }1\ as\ \Lambda _{h}(t) \rightarrow + \infty \end{aligned}$$
(6)
$$\begin{aligned}&\frac{1}{M_{h}(t)}\int _{t - h}^{t}{\Delta D_{t - h}(u)}du\overset{{\mathcal {P}}}{\rightarrow }1\ as\ \Lambda _{h}(t) \rightarrow + \infty \end{aligned}$$
(7)
$$\begin{aligned}&\widetilde{\rho _{h,\epsilon }}(t)\overset{{\mathcal {P}}}{\rightarrow }\rho _{h}(t) = \frac{\Lambda _{h}(t)}{M_{h}(t)}\ as\ \Lambda _{h}(t) \rightarrow + \infty \end{aligned}$$
(8)

Under reasonable assumptions, this theorem implies that as the arrival load grows, congestion, as given by Definition 1 and extended by \(\widetilde{\rho _{h,\epsilon }}(t)\), is well approximated by the deterministic congestion \(\rho _{h}(t)\) of Theorem 2, which is the ratio of the expectations of the arrival and departure loads. In particular, the traditional and well-studied \(M_{t}\left| M_{t} \right| + \infty\) queuing models, as well as finite-server models with service rates proportional to queue length, satisfy these assumptions. Consequently, such models can be approximated by a deterministic fluid model characterized by the differential equation \(\frac{{dl}(t)}{{dt}} = \ \frac{{dA}(t)}{{dt}} - \frac{{dD}(t)}{{dt}} = \lambda (t) - \mu (t) l(t)\), as shown in Corollary 2.5 of Supplementary Material section 1.3. To facilitate the study of this limit behavior, particularly on the local evolution scale where \(\lambda (t)\) and \(\mu (t)\) do not vary much, the stationary case \(M|M| + \infty\) is presented below. Corollary 2 shows that, from an initially empty state, the system converges to a constant occupation \(\frac{\lambda }{\mu }\) with windowed average waiting time \(\frac{1}{\mu }\), which corresponds to a congestion of \(1 + \frac{2}{h\mu }\) and is coherent with relation (2).

Corollary 2

Fluid system approximation of \(M|M| + \infty\). Given the arrival rate \(\lambda (t) = \lambda \in {\mathbb {R}}^{+ *},\ \forall t \in {\mathbb {R}}^{+}\) and service rate \(\mu (t) = \mu \in {\mathbb {R}}^{+ *},\ \forall t \in {\mathbb {R}}^{+}\), the model \(M|M| + \infty\) converges to the fluid model described by the differential equation:

$$\begin{aligned}{} & {} \frac{{dl}(t)}{{dt}} = \ \frac{{dA}(t)}{{dt}} - \frac{{dD}(t)}{{dt}} = \lambda - \mu l(t) \end{aligned}$$
(9)
$$\begin{aligned}{} & {} l(t) = \frac{\lambda }{\mu }\left( 1 - e^{- \mu t} \right) \underset{t \rightarrow + \infty }{\rightarrow }\frac{\lambda }{\mu } \end{aligned}$$
(10)

This results in the following congestion and average waiting time:

$$\begin{aligned}{} & {} \rho _{h}(t) = \frac{h\mu + 2\left( 1 - e^{- \mu (t - h)} \right) }{h\mu + \frac{2}{h\mu }\left( e^{- \mu (t - h)}\left( 1 - e^{- \mu h} - h\mu \right) \right) }\underset{t \rightarrow + \infty }{\rightarrow }\ 1 + \frac{2}{h\mu } \end{aligned}$$
(11)
$$\begin{aligned}{} & {} W_{d,h}(t) = \frac{\frac{1}{\mu }(h + \left( t + \frac{2}{\mu } \right) e^{- t\mu } - \left( t - h + \frac{2}{\mu } \right) e^{- (t - h)\mu })}{h - \frac{1}{\mu }(e^{- \mu (t - h)} - e^{- \mu t})}\underset{t \rightarrow + \infty }{\rightarrow }\frac{1}{\mu } \end{aligned}$$
(12)
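The closed forms (10), (11) and (12) are easy to evaluate numerically, which gives a direct way to see how quickly the limits \(\frac{\lambda }{\mu }\), \(1 + \frac{2}{h\mu }\) and \(\frac{1}{\mu }\) are reached. The following is a minimal sketch that simply transcribes these three expressions; the function names and the parameter values are illustrative assumptions, and the formulas are used for \(t \ge h\).

```python
import math


def occupation(t, lam, mu):
    """Fluid occupation l(t) from (10), starting from an empty system."""
    return lam / mu * (1.0 - math.exp(-mu * t))


def congestion(t, h, mu):
    """Windowed congestion rho_h(t) transcribed from (11), for t >= h."""
    s = t - h
    num = h * mu + 2.0 * (1.0 - math.exp(-mu * s))
    den = h * mu + 2.0 / (h * mu) * math.exp(-mu * s) * (1.0 - math.exp(-mu * h) - h * mu)
    return num / den


def avg_wait(t, h, mu):
    """Windowed average waiting time W_{d,h}(t) transcribed from (12), for t >= h."""
    s = t - h
    num = (h + (t + 2.0 / mu) * math.exp(-t * mu) - (s + 2.0 / mu) * math.exp(-s * mu)) / mu
    den = h - (math.exp(-mu * s) - math.exp(-mu * t)) / mu
    return num / den


def f(x):
    """f(x) = 1 - 1/x, as used in relation (2)."""
    return 1.0 - 1.0 / x


lam, mu, h = 6.25, 0.2, 24.0  # illustrative values (per hour, per hour, hours)
for t in (24.0, 48.0, 240.0):
    print(t, occupation(t, lam, mu), congestion(t, h, mu), avg_wait(t, h, mu))

# Relation (2) on the stationary limit: (h/2) * f(1 + 2/(h*mu)) approaches 1/mu as h grows.
for h_big in (24.0, 240.0, 2400.0):
    print(h_big, h_big / 2.0 * f(1.0 + 2.0 / (h_big * mu)))
```

For a finite lag, \(\frac{h}{2}f\left( 1 + \frac{2}{h\mu } \right) = \frac{h}{h\mu + 2}\), which is why the agreement with \(\frac{1}{\mu }\) in the second loop is only asymptotic in \(h\).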

From the more general \(M_{t}\left| M_{t} \right| + \infty\) fluid limit model, one can also show several key results. The first one, presented in Corollary 3, shows the asymptotic behavior of congestion when considered at small scales as a function of local arrival rate \(\lambda (t)\) and local service rate \(\mu (t)\).

Corollary 3

Local behavior of congestion. If \(l\left( t_{0} \right) > 0\) and \(\mu (t)\) and \(\lambda (t)\) are continuous functions, the congestion measure satisfies the following equivalence:

$$\begin{aligned} \begin{aligned} \rho _{h}\left( t_{0} + h \right) \ {}&\sim \ \frac{\lambda (x)}{2\mu (x)l\left( t_{0} \right) } + \frac{1}{\mu (x)h}\ \ \sim \ \frac{1}{\mu (x)h}\ \\ \ {}&\forall x \in \left[t_{0},t_{0} + h \right]\text {\ as\ }h \rightarrow 0 \end{aligned} \end{aligned}$$
(13)

Corollary 3 links the local service performances to the local behavior of congestion when the system is not initially empty. On a small temporal scale where \(\frac{1}{\mu (x)h} \gg \frac{\lambda (x)}{2\mu (x)l\left( t_{0} \right) }\), congestion is directly related to local service performance (\(\mu (t)\)). However, on a medium temporal scale where \(\frac{1}{\mu (x)h}\) and \(\frac{\lambda (x)}{2\mu (x)l\left( t_{0} \right) }\) are of the same order of magnitude, congestion is also related to the state of local crowding evolution given by the ratio \(\frac{\lambda (x)}{2\mu (x)l\left( t_{0} \right) }\sim \frac{1}{2}\frac{dA(x)}{dD(x)}\text {\ as\ }h \rightarrow 0\), which corresponds to traditional congestion as defined in the literature overview of Sect. 2. In particular, if the service rate is constant (\(\forall x \in {\mathbb {R}}^{+},\mu (x) = \mu \in {\mathbb {R}}^{+}\)), the evolution of \(\rho _{h}\left( t_{0} + h \right)\) directly reflects the state of local crowding evolution.

A second result states that, contrary to the average occupation \(L_{h}(t)\), the behavior of congestion \(\rho _{h}(t)\) is independent of the total arrival \(\Lambda _{h}(t) = \Delta A_{t - h}(t)\) if the relative distribution of arrivals \(g_{[t - h,t]}(u)\) is known. This is shown in Corollary 4 below, which links the behavior of the total arrival \(\Lambda _{h}(t) = \Delta A_{t - h}(t)\) to the behavior of congestion \(\rho _{h}(t)\).

Corollary 4

Independence of \(\rho _{h}(t)\) and \(\Lambda _{h}(t)\) given \(g_{[t - h,t]}(u)\). Given a service rate \(\mu (t)\) and the relative distribution of arrivals \(g_{[t - h,t]}(u)\) in \([t - h,t]\), one can deduce the congestion with:

$$\begin{aligned} \rho _{h}(t) = f^{- 1}\left( \frac{\int _{x = t - h}^{t}{\int _{u = t - h}^{x}{e^{- \int _{u}^{t}{\mu (v)\text {dv}}}g_{[t - h,t]}(u)\text {du}}\text {dx}}}{\int _{x = t - h}^{t}{\int _{u = t - h}^{x}{g_{[t - h,t]}(u)\text {du}}\text {dx}}} \right) \end{aligned}$$
(14)

This corollary shows the relative nature of congestion which, under a given distribution \(g\), reflects the service performance of a system as captured by the service rate \(\mu (t)\). More precisely, the term \(e^{- \int _{u}^{t}{\mu (v)\text {dv}}}\) shows that past service performances are exponentially more important than current ones, as the crowding, or average occupation, that they avoid increases over time. The average occupation \(L_{h}(t)\) also captures the service performances but, under the same distribution \(g\), is directly proportional to the arrival total \(\Lambda _{h}(t)\):

$$\begin{aligned} L_{h}(t) = \frac{\Lambda _{h}(t)}{h}\int _{x = t - h}^{t}{\int _{u = t - h}^{x}{e^{- \int _{u}^{t}{\mu (v)\text {dv}}}g_{[t - h,t]}(u)du}dx} \end{aligned}$$
(15)

As such, this corollary also shows that congestion isolates service performances relative to arrival load better than average occupation does, while accounting for the distribution of service performances, where recent performances are yet to have an important impact on crowding. However, congestion is still dependent on the distribution of arrivals \(g_{[t - h,t]}\) in the window of interest \([t - h,t]\) and thus captures a combination of information about the impact of the distribution of the arrival rate and about the distribution and amplitude of the service rate, which furthermore often depends on occupation level and indirectly on past arrival rates. Under an unspecified distribution \(g_{[t - h,t]}\), the lower bound \(f^{- 1}\left( \frac{L_{(t-t_0)}(t)}{\Delta A_{t_{0}}(t)} \right)\) and upper bound \(f^{- 1}\left( \frac{L_{(t-t_0)}(t)}{l(t_{0})} \right)\) on \(\rho _{(t-t_0)}(t)\) defined in Proposition 2 of Supplementary Material section 1.2 are still valid. Concretely, for a constant \(\Lambda _{h}(t)\), the lower bound corresponds to the limit case where the arrival load is generated by the initial (positive) occupation, which leaves time to see the exponential service response of the system, and the upper bound is infinite, corresponding to the limit case where all the arrival load is created at the end of the window \([t - h,t]\) with a null initial occupation. Some examples of distribution \(g_{[t - h,t]}\) with the associated behavior of \(\rho _{h}(t)\) are given in Propositions 3 and 4 of Supplementary Material section 1.2. They illustrate effects such as, for a constant service rate and a given arrival load, congestion increasing when arrivals are more concentrated at the end of the window.

Corollary 4 shows that the new congestion measure also embeds information about the relative distribution of the arrival rate over the window of information. If one wants to specifically isolate the local service performances \(\mu (t)\), two other methods exist. The first one involves the use of the approximate average waiting time of departed patients, defined in Sect. 3.1, with \(\frac{1}{{{\widetilde{W}}}_{d,h}(t)} = \frac{D(t) - D(t - h)}{hL_{h}(t)}\ \ \sim \frac{dD(t)}{dt}\cdot \frac{1}{l(t)} = \mu (t)\ as\ h \rightarrow 0\) and \(\rho _{{{\widetilde{W}}}_{d,h_{1}}(t),h_{2}} = 1 + \frac{2{{\widetilde{W}}}_{d,h_{1}}(t)}{h_{2}}\) the corresponding \(M|M| + \infty\) asymptotic congestion. A second possibility is to use a dichotomous fitting strategy with an \(M_{t}|M| + \infty\) estimation \({{\widehat{\mu }}}_{h}(t)\), given below in Definition 2, using the measures given by \(\rho _{h}(t)\) and \(g_{[t - h,t]}(u)\).

Definition 2

Given the functions \(\rho _{h}(t)\) and \(g_{[t - h,t]}(u)\), the local service performance rate \({{\widehat{\mu }}}_{h}(t)\), if it exists, is defined using an \(M_{t}|M| + \infty\) fluid limit model:

$$\begin{aligned} {{\widehat{\mu }}}_{h}(t) = \text {arginf}_{\mu > 0}\left\{ \left| f\left( \rho _{h}(t) \right) - \frac{\int _{x = t - h}^{t}{\int _{u = t - h}^{x}{e^{- (t - u)\mu }g_{[t - h,t]}(u)du}\text {dx}}}{\int _{x = t - h}^{t}{\int _{u = t - h}^{x}{g_{[t - h,t]}(u)du}\text {dx}}} \right| \right\} \end{aligned}$$
(16)

And the associated \(M|M| + \infty\) limit congestion, using two potentially different lags \(h_{1}\) and \(h_{2}\), is defined as:

$$\begin{aligned} \rho _{{{{\widehat{\mu }}}_{h_{1}}}(t),h_{2}} = 1 + \frac{2}{h_{2}{{\widehat{\mu }}}_{h_{1}}(t)} \end{aligned}$$
(17)
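The dichotomous fitting of Definition 2 can be sketched as a plain bisection on \(\mu\). The code below is an illustration under stated assumptions rather than the implementation used in the study: it takes the ratio in (16) exactly as written (with the exponent \(-(t - u)\mu\)), evaluates it with a midpoint rule, and relies on the fact that this ratio decreases as \(\mu\) increases. The names `model_ratio` and `fit_mu`, the search bracket and the uniform arrival distribution are all hypothetical choices.

```python
import math


def f(x):
    """f(x) = 1 - 1/x, the transform applied to congestion in (16)."""
    return 1.0 - 1.0 / x


def model_ratio(mu, g, t, h, n=2000):
    """Ratio of the two double integrals in (16), exponent -(t - u) * mu as written there.

    Swapping the order of integration turns each double integral over
    {t-h <= u <= x <= t} into a single integral weighted by (t - u).
    Both integrals are approximated with a midpoint rule on n cells.
    """
    du = h / n
    num = 0.0
    den = 0.0
    for i in range(n):
        u = t - h + (i + 0.5) * du
        w = g(u) * (t - u)
        num += math.exp(-(t - u) * mu) * w
        den += w
    return num / den


def fit_mu(rho, g, t, h, lo=1e-6, hi=10.0, steps=20):
    """Bisection on mu: model_ratio decreases in mu, so the root of ratio - f(rho) is bracketed."""
    target = f(rho)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if model_ratio(mid, g, t, h) > target:
            lo = mid  # modeled ratio too large: the service rate must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform(u):
    """Hypothetical relative arrival distribution: uniform on a 6 h window."""
    return 1.0 / 6.0


mu_true = 0.2
# Synthetic congestion generated from the same model, so the fit should recover mu_true.
rho = 1.0 / (1.0 - model_ratio(mu_true, uniform, t=6.0, h=6.0))
mu_hat = fit_mu(rho, uniform, t=6.0, h=6.0)
rho_limit = 1.0 + 2.0 / (6.0 * mu_hat)  # expression (17) with h_2 = 6 h
print(mu_hat, rho_limit)
```

With 20 halvings of a bracket of width 10, the estimate is resolved to about \(10^{-5}\), which is one reason a fixed number of steps is a convenient stopping rule for this kind of fit.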

Theorem 2 and the associated Corollaries 2, 3 and 4 (and 2.5 in Supplementary results) enable a comparison of the new congestion measure and the average waiting time at small temporal scales. Indeed, in this context, the traditional use of the average waiting time of patients who depart the system to measure service performances is problematic. It is true that, under an \(M|M| + \infty\) model assumption, this measure converges to the inverse of the service rate as shown in result (12) and should converge in the more general case of the \(M_{t}|M| + \infty\) model as \(h \rightarrow + \infty\) from an initial \(t_{0} > 0\). However, by using the patient rather than the system perspective, this measure is ineffective at capturing local service performances in an \(M_{t}\left| M_{t} \right| + \infty\) queuing system for a given time window \([t_{0},t_{0} + h]\) on the medium and small temporal scales. Using an \(M_{t}\left| M_{t} \right| + \infty\) fluid limit model example, Corollary 5 shows why this is the case.

Corollary 5

Local behavior of average waiting time. Given an \(M\left| M_{t} \right| + \infty\) fluid limit model with \(\lambda (t) = \lambda \in {\mathbb {R}}^{+ *},\ \forall t \in {\mathbb {R}}^{+}\) and \(\mu _{u}(t) = \mu _{1}\mathbbm {1}_{t < t_{0}} + \mu _{2}\mathbbm {1}_{t \ge t_{0}} \in {\mathbb {R}}^{+ *},\ \forall t,u \in {\mathbb {R}}^{+}\), the time point \(t_0\) marking a sudden change of service rate, the average waiting time \(W_{d,h}\left( t_{0} + h \right)\) can be decomposed into:

$$\begin{aligned} W_{d,h}\left( t_{0} + h \right) = \frac{\omega _{1,h}\left( t_{0} + h \right) + \omega _{2,h}\left( t_{0} + h \right) + \omega _{3,h}\left( t_{0} + h \right) }{\gamma _{1,h}\left( t_{0} + h \right) + \gamma _{2,h}(t_{0} + h)} \end{aligned}$$
(18)

Here \(\omega _{1,h}\left( t_{0} + h \right)\) is the measured waiting-time load of entities that arrived before \(t_{0}\), counted after \(t_{0}\); \(\omega _{2,h}\left( t_{0} + h \right)\) is that of entities that arrived after \(t_{0}\), counted after \(t_{0}\); and \(\omega _{3,h}\left( t_{0} + h \right)\) is that of entities that arrived before \(t_{0}\), counted before \(t_{0}\). Similarly, \(\gamma _{1,h}\left( t_{0} + h \right)\) is the arrival load of entities that arrived before \(t_{0}\) and \(\gamma _{2,h}\left( t_{0} + h \right)\) is that of entities that arrived in \(\left[t_{0},t_{0} + h \right]\). With this decomposition, one has the following results:

$$\begin{aligned}{} & {} \frac{\omega _{1,h}\left( t_{0} + h \right) + \omega _{2,h}\left( t_{0} + h \right) }{\omega _{1,h}\left( t_{0} + h \right) + \omega _{2,h}\left( t_{0} + h \right) + \omega _{3,h}\left( t_{0} + h \right) }\ \sim \frac{2h\mu _{1}^{2}}{\mu _{2}} \rightarrow 0\ as\ h \rightarrow 0\ and\ t_{0} \rightarrow + \infty \end{aligned}$$
(19)
$$\begin{aligned}{} & {} W_{d,h}\left( t_{0} + h \right) \sim \frac{\omega _{3,h}\left( t_{0} + h \right) }{\gamma _{1,h}\left( t_{0} + h \right) }\sim \frac{1}{\mu _{1}} + \frac{2h\mu _{1}}{\mu _{2}} \rightarrow \frac{1}{\mu _{1}}\ as\ h \rightarrow 0\ \text {and }t_{0} \rightarrow + \infty \end{aligned}$$
(20)

Corollary 5 shows that as the lag \(h\) decreases, the proportion of waiting load belonging to \(\left[t_{0},t_{0} + h \right]\) decreases to 0, being proportional to \(\frac{h\mu _{1}}{2}\) near \(t_{0}\). In the proposed model with a sudden service rate change at \(t_{0}\), this translates into \(W_{d,h}\left( t_{0} + h \right)\) primarily measuring the past service performance \(\mu _{1}\) and not the most recent one \(\mu _{2}\). In the case of the ED, if one considers a supplementary rate \(\mu _{\text {LWBS}}\) as a valid model to capture LWBS patients, the proportion of patients arriving after \(t_{0}\) who leave without being seen, which is also their probability of doing so, corresponds to \(PLWBS = \frac{\mu _{\text {LWBS}}}{\mu _{2} + \mu _{\text {LWBS}}}\) and is thus directly related to \(\mu _{2}\), which can only be captured with the help of congestion \(\rho _{h}\left( t_{0} + h \right) \sim \frac{1}{h\mu _{2}}\) as \(h \rightarrow 0\ and\ t_{0} \rightarrow + \infty .\)
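To make the dependence of \(PLWBS\) on the most recent service rate concrete, the expression \(\frac{\mu _{\text {LWBS}}}{\mu _{2} + \mu _{\text {LWBS}}}\) can be evaluated for a few service rates. This is a small sketch assuming, as in the paragraph above, that LWBS behavior is modeled by an additional rate competing with the service rate; the numerical values are hypothetical and not estimated from the Troyes data.

```python
def plwbs(mu_service, mu_lwbs):
    """Share of patients leaving without being seen when an LWBS rate competes with service."""
    return mu_lwbs / (mu_service + mu_lwbs)


mu_lwbs = 0.01  # hypothetical LWBS rate per patient (per hour)
for mu_2 in (0.25, 0.20, 0.15):
    # A lower service rate mu_2 leaves more time for the LWBS rate to act.
    print(mu_2, round(plwbs(mu_2, mu_lwbs), 4))
```

The probability increases as \(\mu _{2}\) decreases, so an indicator that still reflects the earlier rate \(\mu _{1}\) would misjudge it after a change of service rate.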

4.3 Case study of congestion: the Troyes Emergency Department

This last results subsection complements the theoretical properties shown previously by examining them in the context of the Troyes Emergency Department. The main results are presented below and some additional detailed statistics are given in Supplementary Material section 1.4.

4.3.1 General and long-term behavior

The general and long-term behavior of ED crowding can be measured through the analysis of monthly means of the crowding indicator as Fig. 3 illustrates over the five-year period. For ease of interpretation, congestion and average arrival load are studied with the monthly means of their daily values, that is with a lag of 24 h and \(t_{0} + h\) taken every day at 00:00. The years 2020 and 2021 were marked by the COVID-19 pandemic when waiting times increased by approximately two hours, while arrival load saw high fluctuations with low values during the lockdowns.

At this scale, the evolutions of mean daily congestion \({\overline{\rho }}_{\text {day}}\) and average waiting time \(W_{d,month}\) are almost identical, with a Spearman correlation of 0.972, a Pearson correlation of 0.983 and the linear regression equations \(W_{d,month} = -11.4 + 10.8{\overline{\rho }}_{\text {day}}\) and \({\overline{\rho }}_{\text {day}} = 1.06 + 0.09W_{d,month}\). This observation can be linked to Corollary 1 with result (5). Result (4), as well as Corollary 2 with results (11) and (12), leads to the interpretation that, on a large scale, a 0.083 increase in average daily congestion is comparable to a one-hour increase in average waiting time, as \(\frac{d(1 + \frac{2}{h\mu })}{d\left( \frac{1}{\mu } \right) } = \frac{2}{h} = \frac{1}{12}\,\text {hrs}^{- 1}\) \((h = 24\,\text {hrs})\), which is coherent (\(\frac{1}{12} \approx 0.083\), close to the fitted slope of 0.09) with the linear regression model observed.
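The comparison between the theoretical sensitivity \(\frac{2}{h}\) and the fitted regression slopes amounts to a few lines of arithmetic, sketched here with the values reported in this paragraph. The variable names are illustrative only.

```python
h = 24.0                     # lag in hours
theoretical_slope = 2.0 / h  # d(rho)/d(W) in the stationary M|M|inf limit
fitted_slope = 0.09          # slope of mean daily congestion on W_d,month (reported regression)
inverse_fitted = 1.0 / 10.8  # slope implied by the reverse regression of W_d,month on congestion

print(round(theoretical_slope, 4))  # 0.0833
print(round(inverse_fitted, 4))     # 0.0926
```

Both empirical slopes (0.09 and about 0.093) sit slightly above the theoretical value of about 0.083 per hour of waiting time, which is the sense in which the regression is coherent with the fluid limit.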

Outside of the pandemic period, that is in 2017–2019 (the pandemic greatly affected LWBS behaviors), the evolutions of congestion and PLWBS are strongly related, with a Spearman correlation of 0.638, a Pearson correlation of 0.666 and the linear regression equation \(PLWBS = -0.16763 + 0.13797{\bar{\rho }}_{day}\). This leads to the interpretation that, on a large scale, during this period and for this system, an increase of 0.1 in daily congestion is comparable to an increase of 1.38 percentage points in the proportion of patients with LWBS behavior, or about 2.3 patients per day considering the average number of daily arrivals of 164 over the period 2017–2019.

Fig. 3 Large-scale temporal graphs of ED crowding from the Troyes ED between 2017 and 2021 (congestion (h = 24 h) and PLWBS in 3a, congestion and waiting time (hours) in 3b, occupation and arrival load (h = 24 h) in 3c)

In a similar way, the evolutions of average occupancy \(L\) and mean daily arrival load \(\frac{1}{24\,h}\int _{}^{}{\Delta A}\) are very similar, with a Spearman correlation of 0.751, a Pearson correlation of 0.775 and the linear regression equations \(\frac{1}{24\,h}\int _{}^{}{\Delta A} = 47.3 + 1.5L\) and \(L = - 2.71 + 0.36\frac{1}{24\,h}\int _{}^{}{\Delta A}\). This relationship can be explained by Little’s Law, where the average occupation depends on both arrival rate and average waiting time, and by the fact that variations in global arrival rate are much greater than those of average waiting time here. Finally, during the stable period from 2017 to 2019, the relationship between mean congestion and arrival load is weaker but still present, with a Spearman correlation of 0.371, a Pearson correlation of 0.482 and a linear regression equation \({\overline{\rho }}_{\text {day}} = 1.15 + 0.0032\frac{1}{24\,h}\int _{}^{}{\Delta A}\).

4.3.2 Short term behavior

To derive the results of Sect. 4.2 on the short-term behavior of congestion, a fluid model was used. Figure 4 analyzes the quality of the approximation depending on the arrival load of the window studied. In particular, the stationary asymptotic congestion approximation of \(M|M| + \infty\) was studied for arrival loads between 1 and 10,000 using 100 replications per value of arrival load studied. \(\widetilde{\rho _{h,\epsilon }}(t)\) was designed with an arbitrary threshold value of 100. Using a service rate per patient similar to that of the Troyes ED, \(\mu = \mu (t) = \frac{\Lambda _{h}(t)}{30h}\), a log-linear relation between the error of approximation and the arrival load, \(MAPE \approx \frac{1}{\sqrt{\Lambda _{h}(t)}}\), is observed with a high Pearson correlation of 0.99. A sinusoidal model with \(\lambda (t) = \lambda \left( 1 - \sin \left( \omega t \right) \right) ,\ \ \lambda = 150\,day^{- 1},\ \ \omega = 1,\ \mu = \frac{24}{5}\,day^{- 1}\) mimicking day-scale ED behavior is presented with \(MAPE = 0.08\), coherent with the previous relation. Focusing on the year 2019 for stability and for computation time (of \(\rho _{{{{\widehat{\mu }}}_{h_{1}}}(t),h_{2}}\)), the Troyes ED was modeled at different scales of six hours, one day, and one week. This was done by replacing the observed departure process with one generated using the general formula of Corollary 2.5 (Supplementary Material section 1.3), where the arrival rate \(\lambda (t)\) is seen as a sum of Dirac distribution functions \(\delta _{\text {Dirac}}\) at the points of patient arrivals, and which approximates the local service rate per patient with \(\mu = \frac{1}{{{\widetilde{W}}}_{d,h}(t)}\). Using those results, a six-hour window was chosen for the local evolution analysis, as it is the smallest lag that always yields a positive departure load while being an integer divisor of 24 h. With the exception of the ED 6–12 h model with a \(\text {MAPE}\) of 0.20, all the other models have a low MAPE between 0.02 and 0.098, which is lower than in the \(M|M| + \infty\) case.
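The empirical relation \(MAPE \approx \frac{1}{\sqrt{\Lambda _{h}(t)}}\) gives a quick way to anticipate the quality of the fluid approximation for a given arrival load. The sketch below only tabulates this rule of thumb; the function name is hypothetical, and taking an arrival load of about 150 for the sinusoidal model (from \(\lambda = 150\,day^{-1}\) over a one-day window) is an assumption made here for illustration.

```python
import math


def mape_rule_of_thumb(arrival_load):
    """Empirical relation reported above: MAPE is roughly 1 / sqrt(arrival load)."""
    return 1.0 / math.sqrt(arrival_load)


for load in (1, 100, 150, 10_000):
    print(load, round(mape_rule_of_thumb(load), 3))
```

An arrival load of 150 gives a value of about 0.082, in line with the \(MAPE = 0.08\) reported for the sinusoidal model, and the error is divided by 10 each time the arrival load is multiplied by 100.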

Fig. 4 Quality of approximation of a fluid model depending on the arrival load

Fig. 5 Mean evolution of 6 h congestion throughout the day (with ED 2019 data)

Fig. 6 Ratio of internal waiting load given the lag of observation

Subsection 4.2, with Corollary 3, shows that local congestion depends on both the amplitude of the inverse service rate and the ratio of arrival to service rate. Figure 5 shows how congestion with a lag of six hours evolves throughout the day, where one additional hour of average waiting time corresponds asymptotically to 1/3 of additional congestion. The theoretical evolutions of the \(M|M| + \infty\) and sinusoidal models with arrival amplitude \(\lambda = 6.25~\text {hrs}^{- 1}\) and stationary service rate value \(\mu = \frac{1}{5}~\text {hrs}^{- 1}\), which are close to the 2019 long-term observed values of the ED, are shown in Figs. 4, 5 and 6 for comparison. The congestion \(\rho _{6~\text {hrs}}\) observed in the ED (in blue) has a much larger amplitude, rising above 4 at 12:00 a.m., than the sinusoidal model, which only rises to 3. This is related to the variation in service rate amplitude captured by \(\rho _{{{\widetilde{W}}}_{d,6\text {hrs}}(t)}\) and more accurately, but with the same behavior, by \(\rho _{{{\widehat{\mu }}}_{6\text {hrs}}(t)}\), which uses a 20-step dichotomous algorithm to obtain \({{\widehat{\mu }}}_{6\text {hrs}}\). As such, the rise in congestion is the consequence of a rapid rise in arrivals in the early morning and a decline in local service rate. This evolution can be associated with the slowdown effect of having to treat more patients in parallel and with the fact that the rate of service or departure of a patient is very low at the beginning of their stay, specifically during the first hour, as they must go through the various stages of the ED. The mean evolution of congestion is also available in Supplementary Material section 1.4. Furthermore, the peak of congestion \(\rho _{6\text {hrs}}\) is delayed by approximately one hour compared to the other two indicators \(\rho _{{{\widetilde{W}}}_{d,6\text {hrs}}(t)}\) and \(\rho _{{{\widehat{\mu }}}_{6\text {hrs}}(t)}\), which shows that it takes approximately this time for the speedup of the global service rate (or departure rate) to begin improving the crowding situation.

Finally, Fig. 6 illustrates that the congestion measure is much more pertinent at the local six-hour scale compared to average waiting time \(W_{d,6~\text {hrs}}\) where around half of the waiting load at this scale is outside the window of interest \([t - 6~hrs,t]\).

5 Discussion

This study presents congestion as an original indicator of real-time crowding. Its mathematical expression and properties show that it is a measure of system performance, or more precisely of service rate per patient, relative to an arrival context over a given time window. It is also a function of the measure of system arrival load relative to system performance context. This study has shown that the new congestion measure behaves similarly to average waiting time on the long-term scale. It can be used as an alternative indicator when considering a system perspective, which can be more pertinent in large scale systems and in general for system management. However, in the short term, the fluid limit model analysis shows that congestion is much more pertinent for capturing service performances in a given window of time. Furthermore, as further detailed in Supplementary Material results section 1.4, it is a better local predictor of the proportion of LWBS patients, which makes it a good measure for ED crowding management.

5.1 Comparison of crowding measures: making the right choice

This study aims to characterize the different general crowding metrics that can be used on a service system as defined in Sect. 4. It constitutes a guide for choosing a crowding measure depending on the monitoring objective. In particular, three types of objectives can be distinguished: the tracking of service demand, of the service offer, i.e. service rate performances, and of the crowding resulting from the response of the service offer to the demand. These monitoring objectives can further be pursued from two types of perspective, namely the global system perspective and the patient perspective, and at different temporal scales, ranging from the short-term local evolution, which corresponds to a few hours in the case of an ED, to the long-term evolution, from one week to multiple years in the case of an ED.

The following paragraphs discuss the general guidelines for choosing a crowding measure depending on the perspective and the temporal and system scales chosen, as they can all be useful depending on the monitoring objective.

On the large temporal scale, particularly in asymptotic theoretical models, the method Sect. 3 and Theorem 1 show that the three types of monitoring can be extracted through several indicators. Indeed, the demand can be captured by four indicators, \(\Delta A_{t_{0}}(t) \sim \Delta D_{t_{0}}(t) \sim \frac{2}{h}\int _{t_{0}}^{t}{\Delta A_{t_{0}}(u)}du \sim \frac{2}{h}\int _{t_{0}}^{t}{\Delta D_{t_{0}}(u)}du\) as \(t \rightarrow + \infty\), for the system perspective, and by the mean number of arrivals and departures observed by patients during their stay for the patient perspective. The service offer performances are captured by average waiting time, \(W_{d}\left( \left[t_{1}, t_{2} \right]\right) = W_{d,(t_2-t_1)}(t_2) \sim {\widetilde{W}}_{d}\left( \left[t_{1}, t_{2} \right]\right) \sim {\widetilde{W}}_{a}\left( \left[t_{1}, t_{2} \right]\right) \sim W_{a}\left( \left[t_{1}, t_{2} \right]\right)\) as \(\left( t_{2} - t_{1} \right) \rightarrow + \infty\), with \(W_{d}\left( \left[t_{1}, t_{2} \right]\right)\) and \(W_{a}\left( \left[t_{1}, t_{2} \right]\right)\) for the patient perspective, by the inverse of average waiting time corresponding to a service rate per patient, and by the new congestion measure \(\rho _{h}\left( t \right)\) and its other formulations \(f(\rho _{h}\left( t \right) )\), \(f^{- 1}(\rho _{h}\left( t \right) )\) adjusted on \(h\) with \(\frac{h}{2}f\left( \rho _{h}\left( t + h \right) \right)\) for example (\(f(x) = 1-x^{-1}\)). Finally, the interaction between demand and service offer can be captured by the average occupation \(L_{h}\left( t \right)\) for the system perspective, and, for the patient perspective, by the average occupation perceived by the patient during their stay in the system. 
Its value is greater than \(L_{h}\left( t \right)\) as patients cannot observe an empty system, and it becomes higher as the patients concentrate in the same periods, such as in the ED in the middle of the day. Additionally, the service performances also reflect indirectly the interaction between arrival demand and service offer as the latter is often a function of occupation level, like with the finite servers in traditional queuing models. With the equivalence results shown in Sect. 4.1, derived from Little’s law on the large temporal scale, the choice between average waiting time and congestion is primarily a matter of perspective and interpretation. In particular, congestion and its other formulations can be more useful with large-scale complex systems where average waiting time may not have a clear meaning.

On a medium and short temporal scale, particularly in a data-driven context and for statistical analyses, when the equivalences of measures are not yet pertinent, the choice of a crowding monitoring measure requires more attention because each measure conveys different information. If the monitoring targets the demand or the interaction between demand and service offer, \(\Delta A_{t_{0}}( t )\), \((A(t) - A(t_0))\), \(\frac{2}{h}\int _{t_{0}}^{t}{\Delta A_{t_{0}}(u)}du,\ \frac{2}{h}\int _{t_{0}}^{t}{(A(t) - A(t_0))}du\) and \(L_{(t-t_0)}\left( t \right)\) and their patient perspective equivalents are still pertinent tools. However, if the monitoring targets the service performances, the patient perspective given by \(W_{d}\left( \left[t_{1},\ t_{2} \right]\right)\) and \(W_{a}\left( \left[t_{1},\ t_{2} \right]\right)\) is often no longer adequate, as a high percentage of the measured waiting time falls outside the targeted window. Supplementary Material section 1.4 illustrates this with average waiting times computed with a lag of 6 h, which vary very little during a day. Their approximate versions \(\ {{\widetilde{W}}}_{a}\left( \left[t_{1},\ t_{2} \right]\right)\) and \({{\widetilde{W}}}_{d}\left( \left[t_{1},\ t_{2} \right]\right)\), which give a system perspective, are thus more pertinent. In particular, as shown in the results Sect. 4.3 with Fig. 5, \(\frac{1}{{{\widetilde{W}}}_{d}\left( \left[t_{0},\ t \right]\right) } = \frac{\Delta D_{t_{0}}( t )}{h}\) gives a good approximation of the service rate per patient that would be measured in a \(M_{t}|M| + \infty\) model during a given window. It shows that the service performances can be captured using the departure process \(\frac{\Delta D_{t_{0}}( t )}{(t-t_0)}\) compared to the crowding situation measured by \(L_{(t-t_0)}\left( t \right)\). 
\(\frac{\Delta D_{t_{0}}( t )}{(t-t_0)}\) and \(\frac{2}{(t-t_0)^{2}}\int _{t_{0}}^{t}{\Delta D_{t_{0}}( u )}du\) could also be used directly as \(L_{(t-t_0)}\left( t \right) \rightarrow \ + \infty\) when the system service rate is not limited by the arrival demand or the occupation level. Although this is not pertinent in the ED case study presented here, where the service rate per patient matters much more, it may be pertinent for manufacturing companies, where the global service rate of the production system is often limited only by the machines, since the material supply can be controlled so that the system always has raw materials to work with. However, using the discrete values of \(\frac{\Delta D_{t_{0}}( t )}{(t-t_0)}\) (resp. \(\frac{\Delta A_{t_{0}}( t )}{(t-t_0)}\)) does not account for the distribution of departures (resp. arrivals) within the time window \(\left[t_{0},t \right]\) and can also be problematic if their number is very low.

In this context, the new congestion measure \(\rho _h ( t ) = \ \frac{\int _{t - h}^{t}{\Delta A_{t - h}(u)}du}{\int _{t - h}^{t}{\Delta D_{t - h}(u)du}}\) and its other formulations \(f\left( \rho _h ( t ) \right)\) and \(f^{- 1}\left( \rho _h ( t ) \right)\) avoid this issue by only considering the integrals of the arrival and departure process, which are continuous values that consider their distribution in the time window \(\left[t-h,t \right]\). Corollaries 3 and 4 are central to the understanding of this new congestion. The comparison of two congestion observations under similar distributive functions of arrival \(g_{[t - h,t]}(u)\) will compare the service rate performances while considering its distributions through \(e^{- \int _{u}^{t}{\mu (v)dv}}\). The comparison of two congestion measures under similar service performances, as shown with the sinus model in Sect. 4.3.2, will compare the distributions \(g_{[t - h,t]}(u)\) where the worst congestion will be associated with the system with the least initial crowding occupation. Indeed, its crowding situation will worsen more than the other system due to the time it takes for the global departure rate, or service rate, to catch up. When both arrival density function and service rate function change, local congestion will indicate a combined effect, indicating a worsening of the crowding situation.
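The window computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's implementation: the function name and the toy timestamps are hypothetical, and the sketch assumes that \(\Delta A_{t-h}(u)\) counts the clients present at \(t-h\) plus the arrivals in \((t-h,u]\), while \(\Delta D_{t-h}(u)\) counts the departures in \((t-h,u]\). Because both processes are step functions, each event at time \(s\) contributes exactly \((t-s)\) to the corresponding integral.

```python
from bisect import bisect_right


def window_congestion(arrivals, departures, t, h):
    """Return (rho, mean_occupation) over the window [t - h, t].

    Assumption (not restated in this section): Delta A_{t-h}(u) counts the
    clients present at t - h plus the arrivals in (t - h, u], and
    Delta D_{t-h}(u) counts the departures in (t - h, u].
    """
    arrivals = sorted(arrivals)
    departures = sorted(departures)
    start = t - h

    # Clients already in the system when the window opens.
    present_at_start = bisect_right(arrivals, start) - bisect_right(departures, start)

    # Step functions integrate exactly: an event at time s adds (t - s).
    arrival_load = present_at_start * h + sum(
        t - a for a in arrivals if start < a <= t
    )
    departure_load = sum(t - d for d in departures if start < d <= t)

    rho = arrival_load / departure_load if departure_load > 0 else float("inf")
    mean_occupation = (arrival_load - departure_load) / h
    return rho, mean_occupation


# Toy data: same counts (2 arrivals, 1 departure) in a 4-hour window,
# but the single departure happens late in one case and early in the other.
early = window_congestion([1.0, 2.0], [3.0], t=4.0, h=4.0)
late = window_congestion([1.0, 2.0], [1.5], t=4.0, h=4.0)
```

In both toy cases the discrete ratio of departures per hour is identical (one departure in four hours), whereas the integral-based ratio differs because it reflects when the departure occurred inside the window.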

With these characteristics, the congestion \(\rho _h ( t )\) offers a ratio-based alternative to the difference-based measure of average occupation \(L_h ( t )\). Congestion values are more generalizable because they do not depend directly on the arrival volumes \(\Lambda _{h}(t)\) in a \(M_{t}\left| M_{t} \right| + \infty\) fluid limit context. The fluid model also shows that this congestion embeds the more traditional notion of congestion \(\frac{dA(t)}{dD(t)}\) as \(\Lambda _{h}(t) \rightarrow + \infty\), while avoiding the issues described earlier with the direct use of the discretized values \(\Delta A_{t_{0}}( t )\) and \(\Delta D_{t_{0}}( t )\).
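The link between the ratio and the difference can be written out explicitly. The following is a sketch under one assumption that this section does not restate, namely that \(L_h(t)\) is the time average over \(\left[t-h,t \right]\) of \(\Delta A_{t-h}(u) - \Delta D_{t-h}(u)\). With \(f(x) = 1-x^{-1}\) as above:

\[
h\,L_h(t) = \int _{t-h}^{t}{\Delta A_{t-h}(u)}du - \int _{t-h}^{t}{\Delta D_{t-h}(u)}du = \left( 1 - \frac{1}{\rho _h(t)} \right) \int _{t-h}^{t}{\Delta A_{t-h}(u)}du = f\left( \rho _h(t) \right) \int _{t-h}^{t}{\Delta A_{t-h}(u)}du .
\]

Under this reading, \(f\left( \rho _h(t) \right)\) is the share of the arrival load in the window that is not matched by departure load, which is why it does not scale with the arrival volume in the way \(L_h(t)\) does.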

Consequently, the benefits of the new congestion measure can be summarized as follows. It is a general measure that can be used on any system where arrival and departure data are available. It focuses on service performance within a local time window, contrary to waiting time, which captures out-of-window information, and to occupation, whose values are a direct consequence of past accumulation. It is thus more sensitive to local crowding evolution, as it captures the capacity of a system to empty itself in a given time window. As such, this measure is a candidate for better real-time crowding monitoring of a system. However, these benefits come with drawbacks: a more abstract meaning than occupation and waiting time; a more complex mathematical representation and properties, which notably require fluid modeling to be understood; and a focus on the local evolution of crowding rather than on its local state, which is better captured by occupation for the system perspective and by mean waiting time for the patient perspective. The new congestion measure should therefore be used when the focus of crowding monitoring is to judge the adequacy of the system response over a given period, whatever the situation inherited from past dynamics. Occupation and average waiting time are best used for long-term evaluation, when considering a patient perspective, and when system capacity monitoring is required.

5.2 Isolating service performances: a first step toward a crowding diagnostic and optimization tool

Although the new congestion \(\rho _h (t)\) does not directly depend on the arrival volumes \(\Lambda _{h}(t)\) in a \(M_{t}\left| M_{t} \right| + \infty\) fluid limit context, the service performances captured by \(\int _{u}^{t}{\mu (v)dv}\) often do. The results of Sect. 4.3.2 illustrate this with Fig. 5, where service performances are isolated with the indicators \(\rho _{{{\widetilde{W}}}_{d,6\text {hrs}}(t)}\) and \(\rho _{{{\widehat{\mu }}}_{6\text {hrs}}(t)}\), or \({{\widetilde{W}}}_{d,6\text {hrs}}(t)\) and \({{\widehat{\mu }}}_{6\text {hrs}}(t)\). More generally, to evaluate a crowding situation beyond the natural consequences of arrival volume variations, a manager would want to assess the excess crowding caused by abnormal service performances. The rates captured by \(\frac{1}{{\widetilde{W}}_{d,h}(t)}\), \({\widehat{\mu }}_h(t)\) and \(\frac{1}{h\rho _h ( t )}\) with the fluid limit model exploited in this study constitute a first step toward this diagnostic. To extend this analysis, a \(M_{t}\left| M_{t} \right| + \infty\) model (fluid limit or stochastic) can be used where the service rate \(\mu _{m}(t)\) per patient \(m\) is modeled as a function of other local variables such as the occupation level \(l(t)\), past arrival rates, the current waiting time of the patient concerned \(w_{m}(t)\), the mean waiting time of patients present \({\overline{w}}(t)\) or the number of other resources present in the system such as the number of doctors \(l_{\text {doctor}}(t)\) or nurses \(l_{\text {nurses}}(t)\). Using this data-driven crowding model strategy, a manager could compare the behavior of service performances in any time window against other known behaviors, identify crowding situations caused by an abnormally low (or high) service rate per patient, focus on them, and evaluate other potential factors of congestion. 
Using simulation-optimization metaheuristics, they could then obtain an optimized staffing plan that minimizes congestion under constraints on staff costs and on the minimal and maximal length of medical shifts. Future studies on data-driven modeling of congestion should therefore focus on this type of approach to obtain a general data-driven crowding method.
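As a rough illustration of the modeling strategy described above, the fluid limit of an infinite-server queue with a state-dependent service rate can be integrated numerically from the standard balance equation \(\frac{dl(t)}{dt} = \lambda (t) - \mu (t, l(t))\,l(t)\). The sketch below uses a plain Euler scheme. The function names, the rate functions and all numerical values are hypothetical and are not fitted to any data from the study.

```python
def simulate_fluid(arrival_rate, service_rate, horizon, dt=0.01, initial_occupation=0.0):
    """Euler integration of dl/dt = lambda(t) - mu(t, l) * l(t).

    `arrival_rate(t)` and `service_rate(t, l)` are hypothetical callables
    supplied by the user; nothing here is fitted to real data.
    Returns a list of (time, occupation) pairs.
    """
    steps = int(round(horizon / dt))
    occupation = initial_occupation
    path = [(0.0, occupation)]
    for k in range(steps):
        t = k * dt
        drift = arrival_rate(t) - service_rate(t, occupation) * occupation
        occupation = max(0.0, occupation + dt * drift)  # occupation cannot be negative
        path.append(((k + 1) * dt, occupation))
    return path


# Illustrative (made-up) rates: 10 arrivals per hour and a per-patient
# service rate that degrades as occupation grows.
def constant_arrivals(t):
    return 10.0


def degrading_service(t, occupation):
    return 0.5 / (1.0 + 0.01 * occupation)


final_time, final_occupation = simulate_fluid(
    constant_arrivals, degrading_service, horizon=100.0
)[-1]
```

With these made-up rates the occupation settles where arrivals balance departures, that is where \(10 = 0.5\,l/(1+0.01\,l)\), giving \(l = 25\). Replacing `degrading_service` with a function of staffing levels would be one way to explore staffing plans, under the same caveats.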

Considering the properties established in this study, some general guidelines can already be proposed to optimize the new congestion measure without using a numerical approach, particularly to minimize its maximum value over a given period of time. These guidelines could also be used to minimize occupation and waiting time but are presented here in a data-driven context for a local time window with the properties demonstrated in this study. The first guidelines relate to the demand side of the system. Reducing global demand, by redirecting patient flow to other non-scheduled services for example, will minimize the congestion for systems whose service performance \(\mu (t)\) is a decreasing function of occupation and past arrival rates. Additionally, reducing the variability of the arrival demand, such that the global volume is distributed more evenly over the period, will also minimize the congestion, as abrupt changes will cause congestion to peak as suggested by Corollary 4. The second guidelines relate to the service performance side of the system. As seen with Corollary 3, the value of congestion depends asymptotically on the ratio between local arrival and local service rate and on the limit inversely proportional to the local service rate per patient. As such, increasing the service rate per patient by increasing available staff locally, particularly by considering its value against the local arrival rate, will make it possible to reduce the new congestion measure.

5.3 Congestion as an ED crowding score

Section 5.1 showed the relevance of congestion for measuring crowding in a general context. However, its relevance has also been specifically tested for EDs. Section 4.3, complemented by Supplementary Material section 1.4, illustrated multiple important findings directly related to the mathematical properties of Sects. 4.1 and 4.2. Specifically, Sect. 4.3.1 showed the almost identical nature of waiting time and the new congestion measure on a monthly scale using a lag of 24 h. Section 4.3.2 showed, however, that the two behave very differently on a short time scale. Indeed, with the finding that a fluid approximation of the ED is valid at six hours of lag, the results show that the new congestion measure is better at capturing local crowding evolution because it only captures information within the targeted six-hour window, whereas around half of the waiting load is outside the window when using the waiting time of departed patients. As further illustrated by the almost stagnant value of average waiting time during the day in Supplementary Material section 1.4, the new congestion measure captures local service performance variations much more accurately. Particularly with Fig. 5, the congestion value of around 2.0 at 5 a.m. doubles to above 4.0 at 12 a.m. for the 2019 ED data, showing that the arrival occupation load increases greatly during this period, by more than 100% relative to the departure load, and that the system is unable to balance it out with the departure load until late in the night.

One of the major arguments found in this study in favor of using the new congestion as a pertinent ED crowding score is its comparison, through a correlation analysis, with one of the major consequences of ED crowding, namely LWBS patients. The literature has already shown the link between the number of patients with LWBS behavior and the occupancy level as well as the EDWIN score on the day scale with a Spearman correlation of 0.771 and 0.67 respectively in the study by Kulstad et al. (2010). This link was also shown with the average NEDOCS score throughout the day (intra-day scale) giving the Spearman correlation of 0.66 in the study by Weiss et al. (2005). Given the relative nature of congestion, the present study examined the rate or proportion of LWBS (PLWBS) on the daily scale, that is the ratio of the number of LWBS patients to the total number of patients who arrived during the day. PLWBS is more relevant to measure because one would expect the number of LWBS patients to be primarily proportional to the total number of patients who arrived during the day and secondarily to congestion. This fact explains why, apart from the specificity of the context, the correlation values of the present study are lower.

Outside the context of pandemic lockdowns where patient behaviors are very different, this study shows in result Sect. 4.3 and Supplementary Material section 1.4 that congestion is more linearly related to PLWBS than day-level occupancy level, using an intra-day linear regression, with a Pearson correlation of 0.51 versus 0.48 respectively. As such, congestion is specifically pertinent to ED crowding assessment. In addition, given the internal specificities of the ED system, an even more specific, but more complex, congestion measure could be designed that would correlate with PLWBS. Indeed, one could specifically examine the proportions of patients with low triage levels in the parts of the ED just before seeing a physician with their waiting times or congestion. These patients are most likely to leave due to their low acuity and the fact that their care pathway has not yet begun (Wiler et al. 2013). While more effective in specifically targeting PLWBS as an objective measure to be inferred or predicted, this would make the measurement specific to EDs and it would cease to be a good general measure of crowding like congestion.
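The kind of comparison reported above can be sketched as follows. This is only an illustration of the computation of PLWBS and of a Pearson correlation: the daily records are invented toy values, not study data, and summarizing each day by a single congestion value is an assumption made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def plwbs(lwbs_count, arrivals_count):
    """Proportion of LWBS patients among the patients who arrived that day."""
    return lwbs_count / arrivals_count


# Toy daily records (invented for illustration, not study data):
# (congestion value for the day, LWBS count, total arrivals).
days = [(2.0, 3, 150), (2.5, 5, 160), (3.0, 6, 150), (3.5, 9, 180), (4.0, 10, 160)]
congestion = [d[0] for d in days]
proportions = [plwbs(d[1], d[2]) for d in days]
r = pearson(congestion, proportions)
```

Dividing the LWBS count by the number of arrivals removes the part of the relationship that is driven purely by daily volume, which is the reason given above for preferring PLWBS to the raw count.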

5.4 From traditional theoretical congestion to the new data-driven congestion

One of the main ambitions that led to the construction of this congestion measure was to transfer its initial use in theoretical queueing models to a statistical data-driven measure based solely on the observation of data of arrivals and departures occurring over a time window and not on assumptions of service organization. Indeed, these assumptions allow traditional congestion to be based on the notion of maximum asymptotic service level, which is very useful for sizing purposes in a system with stationary Markovian hypotheses and a predefined service organization, but loses its meaning and relevance to describe empirical measures of crowding locally (Cochran and Roche 2009; Cochran and Broyles 2010). To achieve this main ambition, this study showed that a relevant congestion measure \(\rho _h ( t ) = \ \frac{\int _{t - h}^{t}{\Delta A_{t - h}( u )}du}{\int _{t - h}^{t}{\Delta D_{t - h}( u )du}}\) can be formulated. While a more conventional ratio \(\frac{A(t) - A(t-h)}{D(t) - D(t-h)}\) has the notion of a saturated (\(>1\)) or unsaturated (\(<1\)) system such as traditional congestion indicates, its use to describe crowding by the direct application of arithmetic means over time becomes problematic. Indeed, the values produced by this ratio can only be compared and correctly combined by considering the initial crowding context, particularly with \(l(t - h)\), in which the measurement is made as the quantity of departures \(D(t) - D(t-h)\) is bound by \(l(t - h) + A(t) - A(t-h)\) and often is proportional to it. Moreover, over time, any stable service system alternates between phases of saturation and phases of desaturation so that an arithmetic mean loses meaning. The use of a non-arithmetic mean based on this context with \(\rho _h ( t )\) (see Supplementary Material section 1.1) proposes a solution to this problem at the cost of a change in the original meaning and its interpretation. 
Indeed, like \(L_h ( t )\), this new indicator measures the delay between the arrival load and the departure load from an initial state taken at \(t-h\), but using a ratio instead of a difference. This definition still encompasses the system’s ability to cope with the demand, but no longer indicates a notion of exceeding service capacity as in the case of traditional congestion. This abstract notion of service capacity is difficult to observe locally in a data-driven context with flexible service systems such as EDs, where the service performances adapt to the crowding context in a much more complex manner than in \(M|M|1\) or \(M|M|c\) models.

Despite the shift that has occurred in the meaning with this new congestion, it maintains an asymptotic relationship between global congestion and the bottleneck congestion of the sub part of a series system, or tandem queue, that can be observed with traditional congestion (Gershwin and Schor 2000; Ouazene et al. 2013). In the real system, these bottleneck mechanisms result either in blocking mechanisms with the bottleneck slowing down the departures of the previous stages (Koizumi et al. 2005), or in infinite waiting times at the bottleneck with the inability to catch up with the departures of the previous stages. In relation to what the previous paragraph described, the concept of bottleneck based on empirical observations of a real dynamic system such as an ED without service assumptions is more delicate. As such, the system global congestion has a more complex relationship with the congestion of the system bottleneck, as shown in corollary 1.2 (Supplementary Material section 1.2). Although the relationship is less direct than an equality between the global and the maximum congestion, it still relates to the idea that a service system can only go as fast as the slowest of its service parts. Furthermore, this study showed that the congestion of a service stage has more influence on overall congestion when its bottleneck order is higher, with the example of additional examinations being the most responsible for overall congestion in the ED under study. Asymptotically, as conveyed by corollary 1.2, this can be related to the fact that waiting times in the bottleneck of additional examinations are longer on average and vary more, which inevitably results in greater variations in the overall average wait times that have been associated with congestion.

As with traditional congestion, the new congestion measure can easily be extended to other service, transportation, and production systems, provided that the definitions of arrivals in \(A(t)\) and departures in \(D(t)\) and the time window size \(h\in {\mathbb {R}}^{+}\) are chosen so that meaningful crowding is observed. For a factory with open workshops, for example a wafer fabrication facility in semiconductor manufacturing, this measure could be used to capture crowding in a workshop, with an arrival corresponding to the start of the waiting process for a wafer batch and a departure corresponding to either the start or the end of the operation performed on the batch. Considering a processing time of around one hour, a lag \(h\) of 6–12 h seems reasonable. A second, very different example would be to measure the real-time performance of an IT server, with the arrival of HTTP requests (or requests in more complex protocols) as the arrival process and the sending of their responses as the departure process. Considering a service time of 100 ms (which is slow), a lag of around 1 s would be appropriate.
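For the IT server example, the measure would have to be updated online as requests and responses are logged. The following minimal sketch shows one possible way to do this with a sliding window. The class and method names are hypothetical, events are assumed to be recorded in time order, and the same reading of the arrival and departure loads is assumed as elsewhere in this discussion: requests already in flight at \(t-h\) count toward the arrival load for the whole window.

```python
from collections import deque


class SlidingCongestion:
    """Online estimate of rho_h(t) from time-ordered request/response events."""

    def __init__(self, h):
        self.h = h
        self.arrivals = deque()      # arrival times still inside the window
        self.departures = deque()    # departure times still inside the window
        self.in_system_at_start = 0  # requests in flight when the window opens

    def record_arrival(self, time):
        self.arrivals.append(time)

    def record_departure(self, time):
        self.departures.append(time)

    def _evict(self, now):
        # Events at or before the window start become part of the initial state.
        start = now - self.h
        while self.arrivals and self.arrivals[0] <= start:
            self.arrivals.popleft()
            self.in_system_at_start += 1
        while self.departures and self.departures[0] <= start:
            self.departures.popleft()
            self.in_system_at_start -= 1

    def rho(self, now):
        self._evict(now)
        arrival_load = self.in_system_at_start * self.h + sum(
            now - a for a in self.arrivals
        )
        departure_load = sum(now - d for d in self.departures)
        return arrival_load / departure_load if departure_load > 0 else float("inf")


# Toy trace in seconds with a 1 s lag: two requests, each answered 100 ms later.
monitor = SlidingCongestion(h=1.0)
for request_time, response_time in [(0.1, 0.2), (0.2, 0.3)]:
    monitor.record_arrival(request_time)
    monitor.record_departure(response_time)
first = monitor.rho(1.0)    # window (0.0, 1.0]
second = monitor.rho(1.15)  # window (0.15, 1.15]
```

The sums are recomputed at each call for clarity. A production monitor would more likely maintain running totals, which is a design choice left outside this sketch.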

5.5 General limitations

This study takes an original look at congestion with the new congestion indicator \(\rho _h ( t )\) and its analysis both through theoretical properties and by testing its relevance on a real system compared to other measures. However, the design of this study as well as the indicator has certain limitations.

The first limitation lies in the validity of the fluid model approximation. The results of Sect. 4.3.2 show that this approximation is very pertinent to the ED studied, as the volumes of arrivals are sufficiently high to analyze the behavior of the congestion patterns at the six-hour scale. However, this might not be the case for other systems, including other EDs, where the arrival volumes might require a larger scale, such as a 12-h or day scale, for the approximation to be pertinent, at the cost of some temporal locality in the view of the crowding situation. Conversely, as the arrival volumes become smaller, the aggregated view given by the congestion measure and the system perspective becomes less pertinent, and it becomes easier to look at individual services to evaluate a crowding situation. Even when the fluid model is valid, computing \({{\widehat{\mu }}}_{6~\text {hrs}}(t)\) has memory and computational costs that must be managed: the non-optimized algorithm used to compute it every 30 min on the 2019 Troyes ED data took 10 h to complete on a modern CPU (Intel Core i7 vPro).

As mentioned in the previous paragraph, the second limitation of this study is that its case study is monocentric. Although the population and the ED in question are comparable to others in France (Naouri et al. 2018), with arrival and service behaviors common to most EDs worldwide (Whitt and Zhang 2017; Salmon et al. 2018), the results are still subject to selection bias. That said, the five-year span of the Troyes ED study allowed crowding to be assessed adequately over a long time period. Furthermore, backed up by an in-depth theoretical study, the general nature of the indicator suggests that most of the observations discussed here are transferable to other ED systems and to other service systems. Future work should therefore extend this study by testing the measure on other systems, in particular larger-scale systems where the concept of waiting time is not appropriate. In addition, confounding biases regarding the link between congestion and other parameters were largely avoided by considering the change in arrival contexts, particularly present in the COVID-19 pandemic years of 2020 and 2021. These confounding biases could be further addressed by using local moving averages and standard deviations to obtain standardized values for each measure over the five-year period, as is planned for a future study. Measurement biases are not very relevant here because patient arrivals and departures are systematically recorded with good reliability.
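The standardization with local means and standard deviations mentioned above could take the following form. This is a hedged sketch and not the planned method: the function name is hypothetical, the window is assumed to be trailing (only past values are used), and the input values are invented.

```python
from statistics import mean, pstdev


def rolling_zscores(values, window):
    """Standardize each value against the `window` values that precede it.

    Returns None where fewer than `window` past values exist or where the
    local standard deviation is zero.
    """
    scores = []
    for i, value in enumerate(values):
        past = values[max(0, i - window):i]
        if len(past) < window:
            scores.append(None)
            continue
        spread = pstdev(past)
        scores.append((value - mean(past)) / spread if spread > 0 else None)
    return scores


# Toy daily congestion values (invented for illustration).
scores = rolling_zscores([2.0, 4.0, 2.0, 4.0, 5.0], window=4)
```

Here the last value is compared with a local mean of 3.0 and a local standard deviation of 1.0, so it is reported as two standard deviations above its recent context. A centered window would be an equally plausible choice for retrospective analysis.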

Congestion has a high potential for practical use in a large-scale system to better isolate service performances and relative crowding situations, but it does not replace other measures that may have more meaning in specific systems. For example, if we look at a hospital unit with a limited number of beds, occupancy gives a direct idea of the current saturation of the system. In addition, caution should be exercised when interpreting congestion in large-scale series systems, as some congestions may be more significant at certain stages than others, even if they are lower. Indeed, in the case of the ED, triage and initial examinations are less congested according to the measure introduced but more critical for patient safety and satisfaction.

6 Conclusion

The description and management of a large-scale complex service system requires the right approach and paradigm. The concepts of congestion and bottlenecks offer a concrete and practical means of theoretically capturing the performance of a system in the context of stationary queuing systems, or tandem queues, broken down into a series of steps. These concepts are crucial to providing mathematical tools that are more appropriate than level of occupancy and waiting time, which become limited in a large-scale system and on a short temporal scale. This study proposes an adaptation of the congestion concept as a statistical indicator applicable to specific time windows. It demonstrates the relevance of the approach by examining the properties of the indicator as a congestion measure relative to the arrival context and by testing it on a real Emergency Department system. In particular, using a fluid mechanistic perspective, it introduces a general data-driven modeling paradigm. This paradigm allows real-time analysis of crowding, where a \(M_{t}\left| M_{t} \right| + \infty\) queuing system and its fluid-limit model can be used and adapted with a statistical model to fit the behavior of the system performances. It contrasts with classic stochastic queuing approaches using models such as \(M|M|1\) or \(M|M|c\), which require more assumptions, are less generalizable, and provide asymptotic results that are often not pertinent for analyzing the transient behaviors of local crowding patterns.

Future work will focus on the modeling of congestion in the Troyes Emergency Department with the new data-driven paradigm, which will enable many other types of studies on the simulation of system congestion behavior at multiple time scales, its forecasting, its optimization, and its diagnostics. Other systems, notably healthcare systems, will also be tested with this paradigm to extend the work beyond Emergency Department systems.