1 Introduction

The growth of personalized data from Electronic Health Records (EHR) and biomarkers has motivated us to investigate how to harness this individualization for problems in healthcare where the modeling of uncertainty is also critically required. We present a framework that uses quantile regression forests (QRF) to generate individualized distributions that can be integrated into three optimization paradigms. We demonstrate the effectiveness of our individualized optimization approach in terms of basic theory and practice. Specifically, we focus on operating room scheduling because it is exactly the type of problem that we expect can benefit greatly from our framework.

The operating room (OR) is well known to be the most significant source of both revenue and cost for hospitals [46]. As a major driver of hospital operations and their efficiency, ORs are often affected by significant schedule volatility as they swing from low utilization to high levels of overtime [15, 37, 40]. Scheduling surgeries remains a challenge because surgical durations suffer from a high degree of variability due to patient-, procedure-, and surgeon-related factors [27]. Studies on surgery scheduling, accordingly, often entail stochastic durations characterized by specified distributions as probabilistic estimators. These studies commonly defer to distributions at a population level, where each patient is treated equally as coming from a single “generating” pool or a limited number of categories/types [14, 15, 35, 41]. The reason is that it can be difficult to model the distribution of case times for every procedure type because there are a large number of distinct procedures, and many are known to have few observed instances [17, 36, 47]. Other studies have proposed Bayesian methods to obtain estimators even when data is limited [16, 18, 34]. These approaches still do not include, among others, patient-specific factors (e.g., comorbidities, body mass index). As a result, to the best of our knowledge, highly personalized case time distribution estimates have not been used for optimization in this domain.

Recently, decision-making problems like portfolio optimization [1, 21], assortment planning [26], and medical dosing [4] have introduced the use of individualized information by leveraging machine learning (ML) methods. Following this idea, we introduce individualized estimates via ML for use in optimization under uncertainty. Unlike the works above, we require the ML model to give distributional predictions to quantify the uncertainty individual to each case. Our proposed use of QRF accomplishes this because the method naturally outputs a prediction of the entire distribution. The benefit of ML methods like QRF is that they allow us to include patient, procedure, surgeon, and hospital factors for prediction, even for unobserved feature values, and in a nonlinear manner beyond classical statistical tools [3]. In this vein, we mention that, in the robust optimization paradigm, the closest work to ours is [50], which suggested several ML approaches to create uncertainty sets. However, in contrast to [50], our chosen ML approach of QRF accommodates not only robust optimization but also other optimization modeling methodologies.

We show that the common paradigms of optimization under uncertainty, specifically sample average approximation (SAA), robust optimization (RO), and distributionally robust optimization (DRO), are all amenable to incorporating QRF. These optimization approaches depend on a decision-maker’s risk tolerance, and QRF can be integrated into the respective formulations to enhance efficacy. Among these paradigms, SAA represents a risk-neutral decision process as a stochastic optimization method that approximates an expectation objective by sampling [28, 45]. RO, which requires only a support set for the random variables, is risk-averse as it focuses on the worst-case outcomes [5, 7]. DRO lies somewhere between SAA and RO in that it considers the worst-case performances among partially informed probability distributions [6, 13, 53].

As a demonstration of our approach, we consider the problem of setting surgery start times. We formulate SAA, RO, and DRO scheduling models using QRF to navigate the difficult trade-off that arises from the stochasticity of patient case durations. Namely, surgeries occur in sequence, so if a job’s start time is set too late, then the job before it may finish with too much time left over. Such an event creates idling and can lead the last surgical case in the block to run into overtime. On the other hand, a job scheduled too early likely cannot start on time because it needs to wait for the previous job to finish first. This delay can cause patient dissatisfaction and adverse health outcomes.

Theoretically, we illustrate how the statistical consistency of our formulations (with respect to their target guarantees) follows from the known properties of QRF. In the RO and DRO settings, we also study uncertainty sets naturally deduced from QRF and their tractability for the surgery problem. To complement these results, we conduct empirical experiments in collaboration with Memorial Sloan Kettering Cancer Center (MSKCC) in New York, NY, USA. We test our formulated models using real-patient data and numerically highlight the strengths of our QRF-optimization integrated framework.

The remainder of this paper is organized as follows: Section 2 introduces our setting and notation for surgery scheduling. Section 3 reviews QRF and Section 4 demonstrates how we integrate QRF into the three optimization approaches as well as their corresponding consistency results. Section 5 provides a numerical example to test our proposed methodology. We conclude with Section 6.

2 The surgery scheduling model

We consider a single operating room scheduling problem where the number of planned operations and their sequence for a given day are fixed. A common practice in surgery departments like those in cancer centers is that appointments for elective surgeries are made a few days before the surgery. Accordingly, we assume knowledge of the biometric characteristics of the incoming patients (namely the “features” in the predictive model). Our optimization model uses predicted distributions of the surgery durations, built from available patient features, to determine the best starting time for each surgery to account for the uncertainty of surgical durations. The model’s objective captures the previously mentioned trade-off between the minimization of patient wait times and the occurrence of overtime. We chose to employ a more basic model because its analysis will show that QRF is suitable for use with the three optimization methodologies considered in this paper. This is aligned with other works in the scheduling literature (e.g., [20, 30, 35]) for clearer presentation of a proposed methodology. Moreover, we share a similar justification with the previous literature. Our objective can include idle times without significant deviation of notation if their cost rates are identical to those of the waiting times. Overall, the model setting is general enough to apply to problems in other areas where the adjustment of time allowances for a sequence of jobs is used for improved operational behavior or cost containment [42, 52].

We follow similar notation to [14]. The model’s decision is to select the start time of each surgery for a set of n elective surgeries in a given sequence on a given day for a given operating room. Let T be the scheduled closing time of the day and \(z_{i}\) denote the random duration (respectively, the outcome) of the ith surgery. The decision maker has to set the surgery time allowance of the ith case, \(x_{i}\), such that the first surgery starts at time zero, the second at \(x_{1}\), the third at \(x_{1} + x_{2}\), and so on. That is, assuming that the start time for the first surgery is zero, the start time of any subsequent surgery is scheduled at the sum of the time allowances of all previous surgeries. We also denote by \(Z=(z_{i})_{i=1,\ldots ,n}\) and \(X=(x_{i})_{i=1,\ldots ,n}\) the vectors of surgery durations and surgery time allowances. Based on these assumptions, the waiting time \(w_{i}\), defined as the difference between the actual starting time (which occurs whenever the previous procedure ends) and the scheduled starting time, and the overtime l can be represented as:

$$\begin{aligned} w_{i} = \max \{0,w_{i-1} + z_{i-1} - x_{i-1}\}, \qquad i = 2, \ldots, n\end{aligned}$$
(1)
$$\begin{aligned} l = \max \{0,w_{n} + z_{n} - x_{n}\} \end{aligned}$$
(2)

We assume that the first surgery always starts on time and, hence, \(w_{1} =0.\) Through rescaling, without loss of generality, we assume each unit of waiting time costs one unit for any of the subsequent surgeries. Therefore, the waiting time cost for each patient is precisely how long that patient has to wait from their appointed time until the start of their surgery. Lastly, an overtime cost of \(\phi\) per unit time is incurred if we run later than time T. Given the cost function \(f(X,Z) = \sum ^{n}_{i=2} w_{i} + \phi l\) and the definitions above, a surgery scheduling model, assuming hypothetically that Z is known, is constructed as:

$$\begin{aligned} \min _{x \in \mathcal {X}} \quad&f(X,Z) = \sum ^{n}_{i=2} w_{i} + \phi l \nonumber \\ s.t. \quad&w_{i+1} =\max \{ 0,w_{i} +z_{i} - x_{i}\} \qquad i= 1, ..., n-1\nonumber \\&l =\max \{0, w_{n} +z_{n} - x_{n}\} \end{aligned}$$
(3)

where \(\mathcal {X}=\{x_{i}\ge 0 \ \forall i, \sum ^{n}_{i=1} x_{i} \le T\}\) and \((x_{1},...,x_{n})\) are the surgery time allowance decision variables. By a monotonicity argument on the objective function in terms of \(w_{i}\)’s and l, it is standard to see that Eq. 3 can also be reformulated as

$$\begin{aligned}&\underset{\underset{w,l}{x \in \mathcal {X}}}{\min } \quad f(X,Z) \nonumber \\ s.t. \quad&w_{2} \ge z_{1} - x_{1} \nonumber \\ \quad&w_{i+1} \ge w_{i} +z_{i} - x_{i} &&\qquad i= 2, ..., n-1\nonumber \\&l \ge w_{n} +z_{n} - x_{n}\nonumber \\&w_{i}, l \ge 0 &&\qquad i= 2, ..., n \end{aligned}$$
(4)

where \((w_{2},...,w_{n},l)\) are introduced as auxiliary decision variables. Note that the closing time T is used in the feasible region \(\mathcal X\). For completeness, we show in Appendix 1 that idle times can be considered in this model by adding \(\sum _{i=1}^{n-1} (x_{i} - z_{i}) + w_{n}\) to the objective function when their cost rates are identical to those of the waiting times. When Z is stochastic, we can replace the objective of Eq. 3 with either \(E[f(X,Z)]\) or \(\min \{q:P(f(X,Z)\le q)\ge 1-\delta \}\), where \(E[\cdot ]\) and \(P(\cdot )\) are the expectation and probability taken with respect to Z. The former is an expected value formulation, and the latter is a percentile formulation. Common approaches like SAA and RO provide approximate solutions to these formulations, as we describe in Section 4.
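To make the recursion in Eqs. 1 and 2 concrete, the following minimal Python sketch (with illustrative numbers only, not values from our data) evaluates the waiting times, overtime, and cost \(f(X,Z)\) for a given allowance vector X and realized duration vector Z:

```python
import numpy as np

def schedule_cost(x, z, phi):
    """Waiting times, overtime, and cost f(X, Z) following Eqs. 1 and 2.

    x   : array of surgery time allowances (x_1, ..., x_n)
    z   : array of realized surgery durations (z_1, ..., z_n)
    phi : overtime cost per unit time (waiting cost is normalized to 1)
    """
    n = len(x)
    w = np.zeros(n)                       # w_1 = 0: the first surgery starts on time
    for i in range(1, n):                 # Eq. 1: w_i = max{0, w_{i-1} + z_{i-1} - x_{i-1}}
        w[i] = max(0.0, w[i - 1] + z[i - 1] - x[i - 1])
    l = max(0.0, w[-1] + z[-1] - x[-1])   # Eq. 2: overtime of the last case
    return w, l, w[1:].sum() + phi * l    # f(X, Z) = sum_{i>=2} w_i + phi * l

# Illustrative numbers (minutes), not from the data set:
x = np.array([60.0, 90.0, 120.0])
z = np.array([75.0, 80.0, 150.0])
w, l, cost = schedule_cost(x, z, phi=1.5)
print(w, l, cost)                         # waiting times, overtime, total cost
```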

3 Conditional distributions and quantile regression forests

Under stochasticity, we need the distribution of each \(z_{i}\) to be tailored to the characteristics of each patient to ensure model Eq. 4 is solving a problem that considers the subtle differences across patient cases. Letting \(\Xi\) denote the patient’s feature vector (e.g., gender, age, etc.) in the feature space \(\mathcal B\), we approximate \(F(z|\xi )=F(z|\Xi =\xi )\), the distribution function of the surgery duration given \(\Xi =\xi\).

Studies on machine learning (ML) methods for patient-specific surgical duration estimates have mainly focused on point estimators (i.e., the mean response) (e.g., [3, 44, 48]), and such models are typically not directly capable of predicting the probability distributions needed for optimization models involving uncertainty. Although bootstrap-type sampling can generate such predictions in basic linear models (see Section 3.5 of [23]), we consider more elaborate quantile-based ML methods to be a more direct way of accomplishing this in complicated modeling environments. The simplest quantile method to estimate \(F(z|\xi )\) is linear quantile regression (LQR) (see [29] for details), but it relies on linearity assumptions. Thus, we are motivated to study QRF based on its ability to handle more complex, high-dimensional data settings [38].

QRF extends the widely known ensemble method of random forests (RF) (see [9] for details) and shares a similar procedural design. RF models are built to output mean predictions by bootstrap aggregating (or bagging) decision trees, which is known to achieve stable prediction [19] and possess variance reduction properties [10]. Leveraging this structure, rather than taking the mean of the responses within a leaf as the output of a tree, QRF takes their empirical distribution [33]. We refer to [38] for a more complete overview of QRF.

We close this section by stating that QRF recovers the true conditional distribution as the number of observations in the training data increases under appropriate regularity conditions. That is, the conditional distribution of z constructed from QRF is \(\hat{F}(z|\xi )\), and it is known to converge in probability to \(F(z|\xi )\) (see Theorem 1 of [38] for specific details). The next section will show how this consistency property can be leveraged to provide asymptotic guarantees for various optimization under uncertainty formulations that take into account individualized information.
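As a rough illustration of how such a conditional distribution can be obtained in practice, the sketch below fits an ordinary random forest and computes Meinshausen-style leaf weights by hand; the feature matrix, the duration-generating model, and all parameter choices are synthetic placeholders rather than the ones used later in this paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A minimal sketch in the spirit of Meinshausen's QRF [38]: fit an ordinary random
# forest, then form the conditional distribution of a new patient's duration as the
# weighted empirical distribution of training durations sharing leaves with that
# patient.  The features and duration model below are synthetic placeholders.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))                                        # hypothetical patient features
y_train = np.exp(4.5 + 0.4 * X_train[:, 0] + 0.2 * rng.normal(size=500))  # durations in minutes

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=10, random_state=0)
forest.fit(X_train, y_train)
train_leaves = forest.apply(X_train)          # (n_train, n_trees) leaf membership

def conditional_quantiles(x_new, probs):
    """Weighted empirical quantiles of y_train given the feature vector x_new."""
    new_leaves = forest.apply(x_new.reshape(1, -1))[0]        # leaf id per tree
    weights = np.zeros(len(y_train))
    for t, leaf in enumerate(new_leaves):
        in_leaf = train_leaves[:, t] == leaf
        weights[in_leaf] += 1.0 / in_leaf.sum()               # Meinshausen-style weights
    weights /= weights.sum()
    order = np.argsort(y_train)
    cdf = np.cumsum(weights[order])                           # estimated F(z | x_new)
    return np.interp(probs, cdf, y_train[order])

print(conditional_quantiles(X_train[0], probs=np.array([0.1, 0.5, 0.9])))
```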

4 Individualized optimization under uncertainty

This section presents our integration of QRF into three optimization formulations that capture the stochasticity of the surgery duration in an individualized fashion. We first consider an expected value formulation and SAA in Section 4.1. Then we move to a percentile formulation and RO in Section 4.2, and finally revisit the expected value formulation and DRO in Section 4.3.

We aim to provide theoretical justification for the validity of our approach by showing asymptotic guarantees for each method, as is the standard for algorithms of this nature. Through our analysis, we also hope to convey our reasoning on how ML methods can be incorporated into existing optimization modeling paradigms to enhance the quality of decisions, in particular, with respect to individualization. Moreover, our theoretical results suggest that the proposed approach can be widely applied to problems beyond just the one we consider in this paper. On the other hand, an admitted limitation of our analysis is that it characterizes only asymptotic behaviors. Therefore, our numerical results in the following section aim to offer empirical support for the practical use of our approach in the OR scheduling setting, where limited sample sizes commonly exist.

4.1 Expected value optimization and sample average approximation

Our first considered formulation is Eq. 3 but with an expected value objective function

$$\begin{aligned} E[f(X,Z)|\xi _{1},\ldots ,\xi _{n}]=\int f(X,Z)\prod _{i=1}^{n}F(dz_{i}|\xi _{i}) , \end{aligned}$$
(5)

which minimizes the average overall scheduling cost conditional on the feature \(\xi _{i}\) of each patient, whose surgery duration is \(z_{i}\). In this objective, we assume the durations of all patients are conditionally independent, and we approximate the distribution \(F(z|\xi _{i})\) by QRF, namely \(\hat{F}(z|\xi _{i})\).

For convenience, we denote by \(\hat{E}[\cdot |\xi _{1},\ldots ,\xi _{n}]\) the conditional expectation under independent \(\hat{F}(z|\xi _{i})\)’s. Exact computation of \(\hat{E}[f(X,Z)|\xi _{1},\ldots ,\xi _{n}]\) in this setting can be demanding. Thus, we use sample average approximation (SAA) to approximate the problem. In particular, we generate a set of scenarios under \(\prod _{i=1}^{n}\hat{F}(z_{i}|\xi _{i})\), denoted \(\mathcal {S} = \{s_{1}, ..., s_{K}\}\). Every scenario \(s \in \mathcal {S}\) is associated with a realization \(Z(s)=(z_{i}(s))\), with \(w_{1}(s) = 0\),

$$\begin{aligned} w_{i}(s) = \max \{0,w_{i-1}(s) + z_{i-1}(s) - x_{i-1}\}, \qquad i = 2, \ldots, n \\ l(s) = \max \{0,w_{n}(s) + z_{n}(s) - x_{n}\}. \end{aligned}$$

From Eq. 4 we can approximate the expected value problem via SAA as

$$\begin{aligned} \underset{\underset{w,l}{x \in \mathcal {X}}}{\min } \quad \sum ^{K}_{k=1}\frac{1}{K}\big (\sum ^{n}_{i=2} w_{i}(s_{k}) + \phi l(s_{k})\big ) \end{aligned}$$
(6)
$$\begin{aligned} s.t. \quad w_{i+1}(s_{k}) \ge w_{i}(s_{k}) + z_{i}(s_{k}) - x_{i} \qquad i= 1, \ldots, n-1,\ k= 1, \ldots, K \end{aligned}$$
(7)
$$\begin{aligned} l(s_{k}) \ge w_{n}(s_{k}) +z_{n}(s_{k}) - x_{n} \qquad k= 1, ..., K \end{aligned}$$
(8)
$$\begin{aligned} w_{i}(s_{k}),\ l(s_{k}) \ge 0 \qquad i= 2, \ldots, n,\ k= 1, \ldots, K \end{aligned}$$
(9)

where K is the number of scenarios independently generated in the SAA. Formulation Eq. 6 naturally combines the distributional prediction of QRF into the expected value minimization. As noted previously, although modeling case durations by distribution fitting is standard, utilizing the variety of data to capture the particularities across different case durations is more challenging. In this regard, unlike other approaches that may also consider nonidentical distributions in surgery scheduling, the QRF generates a distribution for every case unique to each patient as it is conditional on all their available health information. Our numerical study in Section 5.2 shows this is highly beneficial to solution quality. Under standard conditions, the SAA problem Eq. 6 has a solution and optimal value that converge to those of Eq. 5, as we formally state next.

Theorem 1

Let H(X) be the objective function Eq. 5 (suppressing \(\xi _{1},\ldots ,\xi _{n}\) for convenience), and \(H^{*}\) be the optimal value when solving Eq. 3 with Eq. 5 as the objective function. Let \(\tilde{X}^{*}\) be an optimal solution to Eq. 6 where the scenarios are drawn from the distribution \(\prod _{i=1}^{n}\hat{F}(z_{i}|\xi _{i})\). Assume that z and \(\xi\) satisfy the conditions in Appendix 1, and moreover that z is bounded a.s. within \(\mathcal A\subset \mathbb R_{+}\). We have \(H(\tilde{X}^{*})\overset{p}{\rightarrow }H^{*}\) as \(K,N\rightarrow \infty\), where N is the observation size in building the QRF and K is the scenario size in SAA.

For proof of Theorem 1 see Appendix 1.
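As an illustration of the SAA step, the following sketch builds and solves the linear program in Eqs. 6-9 from a matrix of sampled scenarios; it assumes the cvxpy modeling package purely for convenience (any LP solver would do), and the scenario-generating distributions shown are synthetic stand-ins for draws from \(\prod _{i=1}^{n}\hat{F}(z_{i}|\xi _{i})\):

```python
import numpy as np
import cvxpy as cp

def solve_saa(samples, phi, T):
    """Solve the SAA formulation of Eqs. 6-9.

    samples : (K, n) array of scenario durations z_i(s_k), e.g. drawn from the
              QRF conditional distributions of the day's n patients
    phi     : overtime cost per unit time
    T       : scheduled closing time, so that sum_i x_i <= T
    """
    K, n = samples.shape
    x = cp.Variable(n, nonneg=True)            # time allowances
    w = cp.Variable((K, n), nonneg=True)       # w[:, i] plays w_{i+1}; w[:, 0] is w_1 = 0
    l = cp.Variable(K, nonneg=True)            # overtime per scenario

    cons = [cp.sum(x) <= T, w[:, 0] == 0]
    for i in range(n - 1):                     # Eq. 7: w_{i+1} >= w_i + z_i - x_i
        cons.append(w[:, i + 1] >= w[:, i] + samples[:, i] - x[i])
    cons.append(l >= w[:, n - 1] + samples[:, n - 1] - x[n - 1])     # Eq. 8
    obj = cp.Minimize(cp.sum(w[:, 1:]) / K + phi * cp.sum(l) / K)    # Eq. 6
    cp.Problem(obj, cons).solve()
    return x.value

# Hypothetical example: 3 patients, 1000 synthetic scenarios.
rng = np.random.default_rng(1)
scenarios = rng.lognormal(mean=[4.2, 4.6, 4.9], sigma=0.3, size=(1000, 3))
print(solve_saa(scenarios, phi=1.5, T=scenarios.sum(axis=1).mean()))
```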

4.2 Robust optimization

Robust optimization (RO) provides an alternative paradigm to account for the uncertainty in the surgery duration. In the RO paradigm, we replace the stochasticity with a so-called uncertainty set or ambiguity set, which is a deterministic set that, intuitively, captures the likely realization of the surgery duration (other interpretations are possible, e.g., [5], page 33 discussion point B).

Formulation Eq. 4 bears resemblance to single-server queues, for which [2] introduced an RO approach to estimate relevant quantities. Following their framework, we introduce \(\underline{\Gamma }_{k,i}\) and \(\overline{\Gamma }_{k,i}\) and assert that the surgery times belong to the uncertainty set

$$\begin{aligned} \tilde{\mathcal {U}} = \left\{ (z_{1}, z_{2}, ..., z_{n}) \biggm | \underline{\Gamma _{k,i}} \le \sum _{j=k}^{i-1} z_{j} \le \overline{\Gamma _{k,i}} \text { ,} \quad \forall k < i \le n \right\} \end{aligned}$$
(10)

The reason why we consider a set on \(\sum _{j=k}^{i-1} z_{j}\), instead of other possible candidates (e.g., merely \(z_{k}\) themselves), is that the objective function, upon rewriting, depends explicitly on these quantities which gives rise to easy bounds. However, this convenient choice can be plausibly replaced by others, which result in convex optimization problems that can still be handled by standard solvers.

RO considers

$$\begin{aligned} \min _{x \in \mathcal {X}} \max _{z \in \tilde{\mathcal {U}}} \left( \sum _{i=2}^{n} w_{i} + \phi l\right) \end{aligned}$$
(11)

where \(w_{i}\) and l satisfy Eqs. 1 and 2. Note that the waiting time of the ith patient can be expressed as

$$\begin{aligned} w_{i} = \max \{w_{i-1} + z_{i-1} - x_{i-1}, 0\} = \max _{1 \le k < i} \left( \sum _{j=k}^{i-1}(z_{j} - x_{j}), 0\right) \end{aligned}$$
(12)

and it is also trivial to see that \(l=w_{n+1}\). Putting these into Eq. 11 gives

$$\begin{aligned}&\min _{x \in \mathcal {X}} \max _{z \in \mathcal {U}} \left( \sum _{i=2}^{n} \max _{1 \le k< i} \left( \sum _{j=k}^{i-1}(z_{j} - x_{j}), 0\right) \right. \nonumber \\&\left. + \phi \max _{1 \le k < n+1} \left( \sum _{j=k}^{n}(z_{j} - x_{j}), 0\right) \right) . \end{aligned}$$
(13)

Switching the order of maximizations, Eq. 13 is upper bounded by

$$\begin{aligned}&\min _{x \in \mathcal {X}} \left( \sum _{i=2}^{n} \max _{1 \le k< i} \max _{z \in \mathcal {U}} \left( \sum _{j=k}^{i-1}(z_{j} - x_{j}), 0\right) \right. \nonumber \\&\left. + \phi \max _{1 \le k < n+1} \max _{z \in \mathcal {U}} \left( \sum _{j=k}^{n}(z_{j} - x_{j}), 0\right) \right) . \end{aligned}$$
(14)

The innermost maximization can be easily seen to be attained at the upper bounds imposed in \(\mathcal {U}\), so that Eq. 14 is equivalent to

$$\begin{aligned}&\min _{x \in \mathcal {X}} \left( \sum _{i=2}^{n} \max _{1 \le k< i} \left( \overline{\Gamma }_{k,i-1} - \sum _{j=k}^{i-1} x_{j}, 0\right) \right. \nonumber \\&\left. + \phi \max _{1 \le k < n+1} \left( \overline{\Gamma }_{k,n} - \sum _{j=k}^{n} x_{j}, 0\right) \right) . \end{aligned}$$
(15)

Now, denoting \(Q_{i} = \max _{1 \le k < i} \left( \overline{\Gamma }_{k,i-1} - \sum _{j=k}^{i-1} x_{j}, 0\right)\), we can reformulate Eq. 15 into the following linear program

$$\begin{aligned} \min _{x \in \mathcal {X}} \quad & \sum _{i=2}^{n} Q_{i} + \phi Q_{n+1}\nonumber \\ s.t. \quad & Q_{i} \ge \overline{\Gamma }_{k,i-1} - \sum _{j=k}^{i-1} x_{j} \qquad i = 2, \ldots ,n+1,\ 1 \le k < i \nonumber \\ & Q_{i} \ge 0 \qquad \qquad \qquad \qquad \ \ i= 2, \ldots , n+1. \end{aligned}$$
(16)

The question remains how to calibrate \(\overline{\Gamma }_{k,i}\)’s. Using the idea of data-driven RO (e.g., [8]), one can set \(\overline{\Gamma }_{k,i}\)’s so that

$$\begin{aligned} \hat{P}\left( \underline{\Gamma }_{k,i} \le \sum _{j=k}^{i-1} z_{j} \le \overline{\Gamma }_{k,i} \text { ,} \forall k < i \le n \Bigg |\xi _{1},\ldots ,\xi _{n}\right) \ge 1-\delta , \end{aligned}$$
(17)

where \(\hat{P}(\cdot |\xi _{1},\ldots ,\xi _{n})\) refers to the probability under \(\prod _{i=1}^{n}\hat{F}(z_{i}|\xi _{i})\). Notice that in fact only the \(\overline{\Gamma }_{k,i}\)’s are used; the \(\underline{\Gamma }_{k,i}\)’s can be dropped by adapting our subsequent analysis to a one-sided bound instead of a two-sided one in a straightforward manner. The guarantee of Eq. 17 can be translated to the optimal value of Eq. 11, and hence Eq. 16 provides an upper bound to

$$\begin{aligned} \min _{x \in \mathcal {X}}\min \{q: \hat{P}\left( f(X,Z) \le q|\xi _{1},\ldots ,\xi _{n}\right) \ge 1-\delta \} \end{aligned}$$

Namely, the optimal \((1-\delta )\)-quantile of \(f(X,Z)\) under \(\prod _{i=1}^{n}\hat{F}(z_{i}|\xi _{i})\). Theorem 2 below details a more elaborate version of this claim, taking into account the Monte Carlo noise that we discuss next.

To find \(\underline{\Gamma }_{k,i}\) and \(\overline{\Gamma }_{k,i}\), we find \(\hat{\Gamma }\) such that

$$\begin{aligned} \hat{P}\left( \left| \frac{\sum _{j=k}^{i-1} z_{j}-\sum _{j=k}^{i-1} \hat{\mu }_{j}}{\sqrt{\sum _{j=k}^{i-1}{\hat{\sigma }_{j}^{2}}}}\right| \le \hat{\Gamma }\text {,}\ \forall k < i \le n \Bigg |\xi _{1},\ldots ,\xi _{n}\right) \ge 1-\delta \end{aligned}$$
(18)

where \(\hat{\mu }_{j}\) and \({\hat{\sigma }_{j}^{2}}\) are the mean and variance from \(\hat{F}(\cdot |\xi _{j})\). The quantity inside the absolute value in Eq. 18 is a centered and normalized version of \(\sum _{j=k}^{i-1}z_{j}\) of the kind that appears in the central limit theorem. The choice of \(\hat{\Gamma }\) in Eq. 18 then implies that one can choose

$$\begin{aligned} \underline{\Gamma }_{k,i}=\sum _{j=k}^{i-1} \hat{\mu }_{j}-\hat{\Gamma }\sqrt{\sum _{j=k}^{i-1}{\hat{\sigma }_{j}^{2}}} \end{aligned}$$

and

$$\begin{aligned} \overline{\Gamma }_{k,i}=\sum _{j=k}^{i-1} \hat{\mu }_{j}+\hat{\Gamma }\sqrt{\sum _{j=k}^{i-1}{\hat{\sigma }_{j}^{2}}} \end{aligned}$$
(19)

Now, to find the \(\hat{\Gamma }\) that satisfies Eq. 18, we can use quantile estimation (e.g., [22]). Simulate, say, K i.i.d. copies of Z, and for each copy calculate

$$\begin{aligned} \max _{k<i\le n}\left| \frac{\sum _{j=k}^{i-1} z_{j}-\sum _{j=k}^{i-1} \hat{\mu }_{j}}{\sqrt{\sum _{j=k}^{i-1}{\hat{\sigma }_{j}^{2}}}}\right| \end{aligned}$$
(20)

Find the \(\lfloor K(1-\delta )\rfloor\)-th order statistic of the K copies of Eq. 20, call it \(\tilde{\Gamma }\). When K is large, \(\tilde{\Gamma }\) will roughly satisfy Eq. 18. Note that, desirably, this calibration approach based on quantile estimation does not require a large n in Eq. 18, which could actually be relatively small in applications.
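A minimal sketch of this calibration and of the resulting linear program Eq. 16 is given below. The indexing is written in 0-based Python slices rather than the 1-based notation of Eqs. 16-20, the order statistic is approximated with a sample quantile, and cvxpy is again assumed only as a convenient LP interface:

```python
import numpy as np
import cvxpy as cp

def calibrate_gamma_bar(samples, mu, sigma2, delta):
    """Calibrate Gamma-tilde via Eqs. 18-20 and return the upper bounds of Eq. 19.

    samples    : (K, n) i.i.d. draws of Z under the product of QRF distributions
    mu, sigma2 : per-patient means and variances under F-hat(.|xi_j)
    Returns gamma_bar[(s, e)], an upper bound on z[s] + ... + z[e-1] (slice s:e).
    """
    K, n = samples.shape
    stats = []
    for s in range(n):                         # statistic (20) for every partial sum
        for e in range(s + 1, n + 1):
            psum = samples[:, s:e].sum(axis=1)
            m, v = mu[s:e].sum(), sigma2[s:e].sum()
            stats.append(np.abs((psum - m) / np.sqrt(v)))
    # approximately the floor(K(1-delta))-th order statistic of the max in (20)
    gamma_tilde = np.quantile(np.max(stats, axis=0), 1 - delta)
    return {(s, e): mu[s:e].sum() + gamma_tilde * np.sqrt(sigma2[s:e].sum())
            for s in range(n) for e in range(s + 1, n + 1)}

def solve_ro(gamma_bar, n, phi, T):
    """Solve the linear program Eq. 16 given the calibrated upper bounds."""
    x = cp.Variable(n, nonneg=True)
    Q = cp.Variable(n, nonneg=True)            # Q[e-1]: bound on w_{e+1} (e < n) or overtime (e = n)
    cons = [cp.sum(x) <= T]
    for e in range(1, n + 1):
        for s in range(e):                     # Q >= bound on partial sum of z minus the matching allowances
            cons.append(Q[e - 1] >= gamma_bar[(s, e)] - cp.sum(x[s:e]))
    obj = cp.Minimize(cp.sum(Q[:-1]) + phi * Q[-1])
    cp.Problem(obj, cons).solve()
    return x.value

# Hypothetical usage with QRF means/variances of 3 patients (minutes):
rng = np.random.default_rng(2)
mu, sigma2 = np.array([70.0, 95.0, 130.0]), np.array([15.0, 20.0, 30.0]) ** 2
draws = rng.normal(mu, np.sqrt(sigma2), size=(5000, 3))
gb = calibrate_gamma_bar(draws, mu, sigma2, delta=0.05)
print(solve_ro(gb, n=3, phi=1.5, T=mu.sum()))
```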

Using the above formulations and procedure gives the following:

Theorem 2

Assume that z and \(\xi\) satisfy the conditions in Appendix 1. Let \(\tilde{H}^{*}\) be the optimal value of formulation Eq. 16, with \(\overline{\Gamma }_{k,i}\) calibrated from Eq. 19 and \(\hat{\Gamma }\) approximated by \(\tilde{\Gamma }\), the \(\lfloor K(1-\delta )\rfloor\)-th order statistic of Eq. 20 among the K samples of Z generated under \(\prod _{i=1}^{n}\hat{F}(z_{i}|\xi _{i})\). Assume that the true conditional distribution \(F(z|\xi )\) is continuous and strictly increasing in z. Let \(H^{*}\) be the optimal value for \(\min _{x \in \mathcal {X}}\min \{q: P\left( f(X,Z)\le q|\xi _{1},\ldots ,\xi _{n}\right) \ge 1-\delta \}\). Then

$$\begin{aligned} \liminf _{N\rightarrow \infty ,K\rightarrow \infty }\tilde{H}^{*}\ge H^{*} \end{aligned}$$

The proof of Theorem 2 is in Appendix 1.

Note that \(\min _{x \in \mathcal {X}}\min \{q: P\left( f(X,Z)\le q|\xi _{1},\ldots ,\xi _{n}\right) \ge 1-\delta \}\) means minimizing the \((1-\delta )\)-quantile of \(f(X,Z)\). Theorem 2 stipulates that the RO formulation can be viewed as a conservative approximation of this optimization. Also, note that when the continuity assumption on \(F(z|\xi )\) is removed, we can modify the above procedure slightly by inflating our obtained \(\tilde{\Gamma }\) by an arbitrarily small constant, i.e., we use \(\tilde{\Gamma }+\epsilon\) for some small \(\epsilon\), and our argument (detailed in the proof) will carry through to give the same guarantee as in Theorem 2.

4.3 Distributionally robust optimization

We next consider distributionally robust optimization (DRO). This approach targets an expected value objective, as in the case of SAA, but under only partial information about the distributions. More specifically, it optimizes the worst-case expected value among all distributions in an uncertainty set or ambiguity set that represents the partial information. In our scheme, we assume the quantiles of each patient’s surgery duration distribution at a list of given probability levels are known. This information can be drawn from the QRF, achieving individualization.

The motivation for using DRO is that its solution can be more robust to some hidden uncertainty. In our circumstance, for instance, imposing enough quantile information means we know each patient’s marginal surgery duration distribution, but we do not assume any dependency structure among the patients. The DRO solution is thus robust against this hidden stochasticity that is not revealed by the individualization, which focuses only on the prediction for each patient.

More concretely, for each patient i, we choose a sequence \(q_{i1}<q_{i2}<\cdots <q_{im}\), and set \(r_{ij}=\hat{F}(q_{ij}|\xi _{i})\), which can be read off from the QRF. Roughly speaking, \(q_{ij}\) is the \(r_{ij}\)-th quantile of the duration distribution given \(\xi _{i}\). For convenience, we assume that \(r_{im}=1\), so that \(q_{im}\) is the upper limit of the support of the data. We consider the uncertainty set

$$\begin{aligned} \hat{\mathcal U}=\{P\in \mathcal P:P(z_{i}\le q_{ij})=r_{ij},\ i=1,..,n, j=1,..,m\} \end{aligned}$$
(21)

where \(\mathcal P\) is the set of all probability distributions supported on \(\mathbb R_{+}^{n}\). Here, the constraints indicate that the marginal \(r_{ij}\)-th quantiles for patient i are known to be \(q_{ij}\) under the QRF. They also include the information that the largest possible duration under \(\hat{F}(\cdot |\xi _{i})\) is estimated to be \(q_{im}\).
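A small sketch of how such a quantile grid could be assembled is given below; it only assumes access to some conditional quantile predictor, such as the hypothetical helper sketched in Section 3, and the probability levels are illustrative:

```python
import numpy as np

def quantile_grid(features, probs, cond_quantiles):
    """Assemble q[i, j] = r_{ij}-th conditional quantile of patient i's duration (Eq. 21).

    features       : (n, p) array with the feature vectors xi_i of the day's n patients
    probs          : increasing probability levels r_{i1}, ..., r_{im} (last level = 1)
    cond_quantiles : callable (xi, probs) -> conditional quantiles under F-hat(.|xi)
    """
    return np.vstack([cond_quantiles(xi, probs) for xi in features])

# Hypothetical usage with the QRF helper sketched in Section 3:
# probs = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 1.0])
# q = quantile_grid(X_schedule, probs, conditional_quantiles)
```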

We seek to solve

$$\begin{aligned}&\min _{x \in \mathcal {X}} \max _{P \in \hat{\mathcal {U}}} E_{P} \big [f(X,Z)\big ]. \end{aligned}$$
(22)

We approach Eq. 22 using the technique in [35], which first replaces \(f(X,Z)\) with the optimal value of a linear program (LP) given by

$$\begin{aligned} \begin{aligned} \max _{y} \quad&\sum ^{n}_{i=1} (z_{i} - x_{i})y_{i}\\ s.t. \quad&y_{i} - y_{i-1} \ge -1&&\qquad 2\le i \le n \\&y_{n} \le \phi \\&y_{i} \ge 0&&\qquad \forall i=1,...,n. \end{aligned} \end{aligned}$$
(23)

This can be derived from the definition of \(w_{i}\) and l in Eqs. 1 and 2 by considering the dual of the resulting LP, where \(y_{1},\ldots ,y_{n}\) are the dual variables. A slight change of notation to (23) also leads to the dual formulation (see Appendix 1) when considering idle times. Then problem Eq. 22 can be rewritten as

$$\begin{aligned} \min _{x \in \mathcal {X}} \max _{P \in \hat{\mathcal {U}}} E_{P} \big [\max _{y \in \Omega } \sum ^{n}_{i=1} (z_{i} - x_{i})y_{i} \big ], \end{aligned}$$
(24)

where \(\Omega\) refers to the constraint set in Eq. 23, and \(y=(y_{1},\ldots ,y_{n})\). We now focus on the inner maximization problem, \(\max _{P \in \hat{\mathcal {U}}} E_{P} \big [\max _{y \in \Omega } \sum ^{n}_{i=1} (z_{i} - x_{i})y_{i} \big ]\). The following lemma transforms it into a more manageable form:

Lemma 1

The dual representation of the optimization \(\max _{P \in \hat{\mathcal {U}}} E_{P} \big [\max _{y \in \Omega } \sum ^{n}_{i=1} (z_{i} - x_{i})y_{i} \big ]\) is

$$\begin{aligned}&\min _{\rho } \max _{y \in \Omega } \sum ^{n}_{i=1} \max _{j=1,\ldots ,m}\bigg \{(q_{ij} - x_{i})y_{i} - \sum ^{m}_{k=j} \rho _{ik} \bigg \}\nonumber \\&+ \sum ^{n}_{i=1} \sum ^{m-1}_{j=1} r_{ij} \rho _{ij} + \sum ^{n}_{i=1} \rho _{im}, \end{aligned}$$
(25)

where \(\rho _{11},\ldots ,\rho _{nm}\) are the dual variables corresponding to the quantile constraints.

For proof of Lemma 1 see Appendix 1.

By Lemma 1, problem Eq. 22 can be represented by

$$\begin{aligned}&\min _{x \in \mathcal {X}, \rho } \max _{y \in \Omega } \sum ^{n}_{i=1} \max _{j=1,\ldots ,m}\bigg \{(q_{ij} - x_{i})y_{i} - \sum ^{m}_{k=j} \rho _{ik} \bigg \} \nonumber \\&+ \sum ^{n}_{i=1} \sum ^{m-1}_{j=1} r_{ij} \rho _{ij} + \sum ^{n}_{i=1} \rho _{im}. \end{aligned}$$
(26)

We want to transform Eq. 26 to a linear program. Note that \(\max _{y \in \Omega } \sum ^{n}_{i=1} \max _{j=1,\ldots ,m}\bigg \{(q_{ij} - x_{i})y_{i} - \sum ^{m}_{k= j} \rho _{ik} \bigg \}\) is a convex maximization problem over the polyhedron \(\Omega\), and it suffices to consider its extreme points for an optimal solution. This allows us to follow the method discussed in Proposition 2 of [35] to reduce it to solving an LP relaxation of an integer program using the following proposition.

Proposition 1

Problem Eq. 22 can be reformulated as the following linear program

$$\begin{aligned} \min _{x,\rho ,\lambda } \quad & \sum ^{n}_{i=1} \lambda _{i} + \sum ^{n}_{i=1} \sum ^{m-1}_{j=1} r_{ij} \rho _{ij} + \sum ^{n}_{i=1} \rho _{im}\nonumber \\ s.t. \quad & \mu _{io} \ge q_{ij}\pi _{io} - \sum ^{m}_{k= j} \rho _{ik} \qquad 1\le i \le n,\ i \le o \le n+1,\ 1 \le j \le m \nonumber \\ & \sum ^{\min \{o,n\}}_{i=g} \mu _{io} \le \sum ^{\min \{o,n\}}_{i=g} x_{i}\pi _{io} + \lambda _{i} \qquad 1\le g \le n,\ g\le o \le n+1 \nonumber \\ & \sum ^{n}_{i=1} x_{i} \le T\nonumber \\ & x_{i} \ge 0 \qquad 1\le i \le n, \end{aligned}$$
(27)

where \((\lambda _{1},...,\lambda _{n})\) are the dual variables, \(\pi _{io}=y_{i}\) for all \(i \in [g,o]\) and any \(g\le o\). Also, \(\mu _{io}= \max _{j=1,\ldots ,m} \{q_{ij}\pi _{io} - \sum ^{m}_{k = j} \rho _{ik}\}\) for \(o=1,...,n+1\).

For proof of Proposition 1 see Appendix 1.
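The reduction above rests on the LP representation of \(f(X,Z)\) in Eq. 23. As a quick numerical sanity check of that building block, the following sketch evaluates the LP directly (again assuming cvxpy merely as a solver interface) so that its value can be compared against the recursive computation of Section 2 on arbitrary illustrative inputs:

```python
import numpy as np
import cvxpy as cp

def cost_via_lp(x, z, phi):
    """Evaluate f(X, Z) as the optimal value of the dual LP in Eq. 23."""
    n = len(x)
    y = cp.Variable(n, nonneg=True)
    cons = [y[i] - y[i - 1] >= -1 for i in range(1, n)] + [y[n - 1] <= phi]
    prob = cp.Problem(cp.Maximize((z - x) @ y), cons)
    prob.solve()
    return prob.value

# Arbitrary illustrative numbers (minutes); the value should agree with the
# recursive computation of w_i and l in Eqs. 1 and 2 (e.g., the schedule_cost
# sketch in Section 2).
x = np.array([60.0, 90.0, 120.0])
z = np.array([75.0, 80.0, 150.0])
print(cost_via_lp(x, z, phi=1.5))
```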

The following shows that the DRO under the quantile information drawn from the QRF gives an asymptotic bound for the true expected value:

Theorem 3

Assume that z and \(\xi\) satisfy the conditions in Appendix 1. Suppose that the true conditional distribution function \(F(z|\xi )\) is continuous and strictly increasing in z. Let \(H^{*}\) be the optimal value of Eq. 3 with objective being Eq. 5, and \(\hat{H}^{*}\) be the optimal value of Eq. 27. Then for any \(\epsilon >0\), we have

$$\begin{aligned} P\left( H^{*}\le \hat{H}^{*}+\epsilon \right) \rightarrow 1 \end{aligned}$$
(28)

as \(N\rightarrow \infty\), where N is the data size in constructing the QRF.

For proof of Theorem 3 see Appendix 1.

Note that Theorem 3 does not require strong duality of the dual formulation in Eq. 25, as the upper bound in Theorem 3 can be obtained as long as Eq. 25 upper bounds its primal moment problem. However, we do need strong duality of an associated moment problem when the quantiles are set to be the true quantities, and this property is guaranteed by the continuity and strict monotonicity of \(F(z|\xi )\).

Finally, also note that, unlike SAA and RO discussed before, our DRO approach does not require running simulation from the QRF.

5 Numerical study

Table 1 Features (covariates) in the MSKCC dataset grouped into four main categories that were used for training the predictive models

To examine the effectiveness of our approach, a numerical study was conducted using real-patient data from Memorial Sloan Kettering Cancer Center (MSKCC), New York, NY, USA, one of the leading cancer treatment and research institutions in the world. The center has surgical operations across thirteen different services and in a total of forty different operating rooms. Our data contains the recorded surgeries for all services from 2010 to 2016 with 129,742 distinct patients, but for our analysis, we only used those in Urology (URO). Within the data, we considered hospital, patient, operational, and temporal factors as done by [39] to achieve highly accurate case duration predictions. Table 1 details the features corresponding to these factors; note that they include patient-specific information like age, gender, race, BMI, treatment history, and comorbidities to achieve individualization.

The surgical durations used in our analysis are defined as the recorded times measured from wheels-in to wheels-out. While no outliers were removed, each case in our data has time stamps similar to those described by [34] and, likewise, observations with discrepancies between recorded patient and operation times were removed. Our final dataset contained 23,176 observations after also excluding patients with any missing time records. Table 2 summarizes the number of surgeries, unique primary surgeons, and unique current procedural terminology (CPT) codes, along with the number and proportion of inpatient surgeries and the historical durations of surgeries in the final dataset. Table 3 summarizes key demographics of the patients in the data set.

Table 2 Characterization of operational data for URO service in MSKCC
Table 3 Demographics of the patient data from the URO service

5.1 Test setup

The individualized distributions from QRF were evaluated for each optimization model in Section 4 by comparing their performance against equivalent models using alternatively constructed distributions. Three distribution-construction benchmarks were considered for our experiments and analysis. The first was a single distribution fitted to all case durations in the training set. The second benchmark stratified the data and fitted a distribution for each primary surgeon, since durations for the same procedure can vary significantly between surgeons. Linear quantile regression (LQR) was used as the third benchmark so that the QRF can be compared to a similar method. Fitting distributions by CPT codes was not considered as a benchmark because this approach follows closely MSKCC’s planned times, which are a benchmark considered in Section 5.4. MSKCC’s scheduling system estimates these hospital planned times (H-PT) by using the median durations of the 20 most recent historical cases filtered by the CPT, then surgeon, and then operating room. The system will relax the filtering requirements for the operating room and surgeon if they do not yield any matches. Specific samples may also be excluded based on the system’s defined outlier threshold and, on occasion, these planned times can be overridden by a lead surgeon. We constructed predictions using the median from log-normal distributions fitted for each CPT (all had p-values > 0.1 using the Anderson-Darling test) and compared them with the H-PT. For CPTs with single instances, the lone value was taken as the prediction. Our analysis presented in Table 13 of Appendix B found no statistical difference between the H-PT and the CPT approach (see Appendix Table 14). Hence, we proceeded without considering CPT-fitted distributions, given that we can infer their performance by examining the H-PT. We set aside \(80\%\) of the total available data for training/fitting each distribution approach and saved the remaining \(20\%\) for use in evaluating the performance of the optimization methods. For the split, we used the most recently recorded observations in our data as the test set.

For the possible distributions to use, we considered the normal and log-normal distributions as they have historically been found to be good models for surgery durations [36, 56]. Note that these studies constructed distributions by conditioning on procedure type, while our distributions are not. For this reason, we fit a normal, log-normal, Weibull, and gamma distribution using all cases in the training set to verify whether our setting is consistent with previous findings. Table 4 summarizes the estimated parameters for each model and their p-values from the Anderson-Darling (AD) test. The results indicate that each distribution was a poor fit, which is not unexpected when the sample size is large and the true distribution is not in the considered family. Since the log-normal model had the best p-value, we further checked how well this log-normal model would fit the test set with the AD test and found it to be an acceptable fit (p-value = 0.056). Lastly, we followed the procedure of [49] and obtained comparable results in that log-transformed data was considered a good fit in \(71.1\%\) of the Shapiro-Wilk tests while it was only \(62.3\%\) for non-transformed data. As a result, we proceeded with using the log-normal distribution to fit over all cases and over cases grouped by the primary surgeon. The QRF model was built using 10-fold cross-validation for hyper-parameter tuning, while LQR requires no tuning. Both models used the same set of features described in Table 1, and no additional preprocessing was applied to the training set beyond encoding categorical variables for LQR.
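For readers who wish to reproduce a similar setup on their own data, the sketch below outlines the chronological train/test split and the log-normal fitting step described above; the column names are placeholders for the corresponding fields in the MSKCC extract, and scipy's Anderson-Darling routine reports a statistic and critical values rather than a p-value:

```python
import numpy as np
import pandas as pd
from scipy import stats

def chronological_split(df, frac_train=0.8):
    """Keep the most recent 20% of cases as the test set, as in our experiments."""
    df = df.sort_values("case_end")            # "case_end" is a placeholder column name
    cut = int(frac_train * len(df))
    return df.iloc[:cut], df.iloc[cut:]

def lognormal_fit_and_ad(durations):
    """Fit a log-normal by fitting a normal to log-durations; AD test on the logs."""
    logs = np.log(durations)
    mu, sigma = logs.mean(), logs.std(ddof=1)
    ad_result = stats.anderson(logs, dist="norm")   # statistic and critical values
    return (mu, sigma), ad_result

# Hypothetical usage on a raw case table with "case_end", "duration_min", "surgeon":
# train, test = chronological_split(raw_cases)
# params_all, ad_all = lognormal_fit_and_ad(train["duration_min"].to_numpy())
# params_ps = {s: lognormal_fit_and_ad(g["duration_min"].to_numpy())[0]   # L-Norm-PS
#              for s, g in train.groupby("surgeon")}
```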

Table 4 Parameter estimates and goodness of fit summary

5.2 Numerical results

We break down the results for the URO surgery schedules into regimes that correspond to the number of cases performed on a surgical day. Specifically, we organized the cases each day in the data set based on their operating room to yield schedules with the number of surgeries ranging from 2 to 11. A breakdown of the total number of surgeries corresponding to each case size is presented in Table 12 of Appendix B. The tables we show in this section further bin the schedule sizes into pairs of two to generate five different regimes to summarize our results. For a breakdown of waiting and overtime costs for schedules of each case size separately, we refer to Appendix Figures 5, 6, 7, 8, 9 and 10. One should keep in mind that because OR time is extremely costly, most surgical days have as many cases as can fit into the day. Therefore, regimes that on average consist of more surgeries in a day reflect lower average case durations.

Fig. 1 Box-plots for the SAA method comparing the L-Norm, L-Norm-PS, LQR and QRF models’ out-of-sample error for the different surgery regimes organized by the number of cases scheduled each day

Table 5 Weighted percentage improvement of the out-of-sample performance error when the QRF is used rather than the L-Norm, L-Norm-PS, LQR models in mean and percentiles for the Individualized Sample Average Approximation Optimization framework
Fig. 2 Box-plots for the RO method comparing the L-Norm, L-Norm-PS, LQR and QRF models’ out-of-sample error for the different surgery regimes organized by the number of cases scheduled each day

Table 6 Weighted percentage improvement of the out-of-sample performance error when the QRF is used rather than the L-Norm, L-Norm-PS, LQR models in mean and percentiles for the Individualized Robust Optimization framework

The results presented in this section are based on the out-of-sample performance cost computed using Eqs. 1 and 2 and the realized surgery durations from the test set. Consistent with [35], all results assume equal penalties for waiting and overtime, which is sufficient to illustrate the type of behavior each approach exhibits. For the intended OR surgical day length, T, we did not use MSKCC’s official hours. Their hours include planned late rooms, which extend the official hours each day for a varying set of rooms, and this data is not collected consistently enough to be available. Instead, we chose T to be the sum of the actual surgical durations for a given day because this is the minimal possible time a room would run. MSKCC considers any minute past what is necessary for the OR to be waste, so treating everything past this defined T as overtime remains consistent with their view. While the total planned time in each model’s solution will be exactly equal to T, for our experiments, setting it this way still allows the realized objective costs to capture the accumulated inaccuracy of the distributions provided by a prediction approach for the individual cases. In Section 5.3, we use the median of historical cases for T to establish that our methodology is practical.

Fig. 3 Box-plots for the DRO method comparing the L-Norm, L-Norm-PS, LQR and QRF models’ out-of-sample error for the different surgery regimes organized by the number of cases scheduled each day

We start with the SAA framework discussed in Section 4.1. As a preliminary, we solved the problem repeatedly with sample sizes ranging from 10 to 6000, each with 5 replications, to evaluate the convergence of the SAA model’s solutions. We use \(90\%\) of the training data to train our QRF model and select four instances of sizes 3, 5, 7, and 9 from the remaining \(10\%\) of the training data for validation. The validation set was split in the same manner as our test set, based on the chronological ordering of the cases. Appendix Figure 4 displays the trade-off between the size of samples drawn from QRF distributions and the consistency of the solution cost for the SAA model. At 5000 samples, the model’s objective cost converges with minimal variance, and its solution is good compared to the actual case durations from the validation data. Moreover, it was observed that 5000 samples was a generous quantity for use with the surgery distributions derived from the benchmark approaches. Therefore, to define the schedule of start times for each problem instance, we implemented SAA by drawing, for every surgery in that surgical day, 5000 samples from the surgery duration distributions derived from each approach considered (i.e., L-Norm, L-Norm-PS, LQR, and QRF).

Table 7 Weighted percentage improvement of the out-of-sample performance error when the QRF is used rather than the L-Norm, L-Norm-PS, LQR models in mean and percentiles for the Individualized Distributionally Robust Optimization framework

In Figure 1, the distribution of overall costs for each of the four models, stratified by the number of surgeries in the surgical day, is shown. Our individualized approach using QRF yields lower scheduling costs (in minutes) than the other three models. More concretely, we can see from Table 5 that the improvement in the solution cost tends to increase for instances with a larger number of surgeries. For a global comparison, we computed a weighted average (by the number of instances) of the performance of each model across all the categories of cases per day. We found QRF yields a \(55.1\%\), \(56.6\%\), and \(23.1\%\) lower average cost than L-Norm, L-Norm-PS, and LQR, respectively, across all case sizes.

Analogous to the approach above for SAA, we next consider the RO framework discussed in Section 4.2. For each schedule, we implemented RO using uncertainty sets based on Eq. 10 derived from L-Norm, L-Norm-PS, LQR, and QRF. We set \(K = 5000\) and \(\delta = 0.05\) for each model. Figure 2 depicts the distributions of overall costs for each of the four models grouped by the number of surgeries each schedule has in the test set. Here, we see that the RO approach’s conservative nature incurs a higher cost than the SAA approach. Table 6, however, still shows QRF again noticeably outperforming the L-Norm and L-Norm-PS models. The percentage improvement of QRF over these two models is also even more significant than under the SAA framework as the surgery size grows. On the other hand, LQR is a closer competitor to QRF under RO, where for schedules with two to three surgeries it outperforms QRF by 1.2% on average. This is not necessarily unexpected, given that both LQR and QRF are ML approaches capable of predicting individualized distributions. The fact that QRF is only outperformed slightly by LQR for one regime but outperforms LQR for all other, larger-size regimes indicates our approach remains favorable, especially when handling larger-size schedules. Altogether, QRF leads to \(64.2\%\), \(68.8\%\), and \(2.0\%\) lower total cost than L-Norm, L-Norm-PS, and LQR based on weighted averages over all regimes.

Finally, we consider the DRO framework in Section 4.3. We implemented DRO with constraints dictated by percentiles \(\{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,\) \(0.9,0.95\}\) of the surgery duration distributions derived from L-Norm, L-Norm-PS, LQR and QRF. Figure 3 once again shows the improvement of QRF over the other three models regarding the distributions of overall costs. In particular, Table 7 shows that, over all cases, QRF obtains \(67.1\%\), \(62.8\%\), and \(22.8\%\) lower out-of-sample cost than L-Norm, L-Norm-PS, and LQR, respectively.

Table 8 Setting parameter T with historic median based on schedule case size. The results show the weighted percentage improvement of the out-of-sample performance error when the QRF is used rather than the L-Norm, L-Norm-PS, LQR models for each optimization framework

5.3 Sensitivity analysis of T

To understand the influence of the T parameter in our models, we conduct a sensitivity analysis to determine its effect on the out-of-sample performance. First, we rerun our models with the original T values used in our numerical experiments scaled by \(\{ 0.90, 0.95, 1.05, 1.10 \}\). In Appendix B, Tables 20, 21 and 22 detail the results corresponding to this scaling for the SAA, RO, and DRO approaches. The results indicate a generally positive correlation between the QRF model’s improvement over the other benchmark models and the scaling value. The association was nonlinear only for the RO framework with LQR. Even these results, however, show that the QRF approach performs better than LQR for larger values of T. Averaging across all case sizes, we find by varying T that QRF improves upon L-Norm, L-Norm-PS, and LQR for all considered optimization methodologies by no fewer than \(35.0\%\), \(33.8\%\), and \(0.8\%\), respectively.

The findings indicate that QRF will still be beneficial so long as MSKCC can set T to reasonably reflect the cases each day. Next, we provide results to show that MSKCC can realistically see the benefits of utilizing QRF in practice. In particular, each model was rerun with T set to the historical median sum of actual surgical durations in the training data based on their case size. Table 23 in Appendix B presents the median values found from the training and test set.

Table 8 presents the percentage improvement in using the QRF approach over the other benchmarks in each optimization framework with our median-based T. As in previous results, QRF strongly improves upon the standard distribution-fitting benchmarks across all optimization frameworks. The QRF demonstrated a better out-of-sample performance than L-Norm overall by \(33.9\%\) for SAA, \(54.4\%\) for RO, and \(61.9\%\) for the DRO framework. For L-Norm-PS, the QRF approach averaged across regimes was \(30.2\%\), \(59.0\%\), and \(54.6\%\) better under the SAA, RO, and DRO frameworks, respectively. LQR was the closest approach to QRF, having performed \(1.1\%\) better in the RO framework and \(9.2\%\) and \(16.7\%\) worse in the SAA and DRO frameworks. Even if MSKCC used the simple idea of medians to set T, the QRF approach can still typically provide superior results to other benchmarks. The results suggest that using a more sophisticated idea to estimate T more accurately will only lead to better performance with QRF.

Table 9 MAE (RMSE) in minutes of the median, minimum, and maximum values of the samples drawn from QRF, LQR, L-Norm, and L-Norm-PS
Table 10 Using the median corresponding to distributions from QRF, L-Norm, L-Norm-PS, LQR, and H-PT to Predict Case Durations

5.4 Insights

For insight into the improvement numbers, we analyzed each of the four approaches in terms of prediction accuracy, separately from the optimization task. This is because for each of the percentage improvements shown in Tables 5, 6, and 7, all models were equal aside from their parameters. Note that the mean absolute error (MAE) and root-mean-square error (RMSE) presented in this section are in units of minutes and are generally referred to as error(s).

Table 9 displays the MAE and RMSE from the true durations in the testing data for the median, min, and max values of the 5000 samples drawn as described in Section 5.2. The results displayed in Table 9 show that even the extremes of the drawn samples (i.e., the largest sampled value for each case) of the QRF are, on average, significantly closer to the true durations than those of L-Norm and L-Norm-PS. We see, then, that QRF gives a distribution of the surgery time that has less variability, and this smaller variability, in turn, means smaller uncertainty in the optimization. As a result, the solution of the model needs to “hedge” against a smaller variability of scenarios and is less conservative. This is the opposite of fitting a single or limited number of distributions from the population. Although every patient’s durations, when aggregated, do form a “pooled” distribution, patients and procedures correlate with the cases’ specifics. Thus, it is intuitive to see why having some patients share the same duration distribution with identical parameters can lead to more inaccurate modeling and hence poorer optimization solutions. For additional analysis of each distribution model on the test data, see Tables 18 and 19 in Appendix B.

We further extended our analysis to include MSKCC’s planned times (H-PT) so that five approaches (QRF, LQR, L-Norm, L-Norm-PS, and H-PT) can be compared to the true durations in the test set. The median was taken as the prediction to be consistent with how MSKCC’s planned times are derived when evaluating for accuracy. In addition, the median was also found to produce the lowest mean absolute error (MAE) for QRF and LQR. Table 10 provides the correlation, MAE, and RMSE associated with each approach. We also provide in Table 10 the MAE from the subset of predicted times that were greater than their observed duration (over-predicted) and the subset of predicted times that were less than their observed duration (under-predicted). By ordering the MAE corresponding to each approach (excluding H-PT) from smallest to largest, a structure similar to the results in Tables 5, 6, and 7 can be seen in that QRF performs best, followed by LQR, L-Norm-PS, and L-Norm. Table 10 also details the number of non-identical distributions constructed over our entire dataset to highlight the association between individualization and accuracy. Note that these are based on the quantiles for QRF and LQR, the distribution parameters for the log-normal distributions, and assumed for H-PT. The errors of the over-predicted and under-predicted groups further support the notion that QRF produces less variable distributions.

Table 11 Breakdown of QRF vs. H-PT predictive power for procedures with 10 or less total instances in the entire data

As discussed previously, a known challenge in using CPT codes is that there exists a large number of them, a majority of which have limited or no historical data [16, 34]. The results of Table 10 show QRF outperforms the H-PT, which comes from a model stratified by CPT code. To understand the reason more precisely, we examined whether QRF is better suited than H-PT for predicting the duration of procedures with little to no historical data. Before proceeding, we note that if a procedure has zero past observations, MSKCC’s planning system will use similar surgeries performed by surgeons to impute a value for the H-PT. Likewise, ML methods can extrapolate for procedures with no historical observations by examining other relevant features. For analysis, we focused on CPT codes associated with ten or fewer historical observations and compared the predictive power of QRF versus H-PT in the two breakdowns presented in Table 11. The first breakdown is predicted durations for procedures with at least one instance in the training data, while the second is for procedures with zero instances present in the training data.

The QRF slightly underperforms the H-PT when predicting the durations of previously unseen procedures but noticeably outperforms the H-PT if just a few records of a procedure are available. Interestingly, we find that all procedures with zero instances in the training data were part of secondary services scheduled through the URO department. Such events can occur when patients have urologic cancer that metastasized to regions like the thorax. As mentioned in Section 5.1, MSKCC’s scheduling system primarily filters by CPT to predict surgery durations. This means secondary URO services could have jointly used CPTs with another department. Therefore, we reason that H-PT is likely better than QRF at predicting previously unseen procedures because H-PT uses historical records outside the URO department. It is still promising, however, that QRF could achieve statistically the same MAE as H-PT with fewer observations (t-test: p-value \(= 0.242\)).

Among the important/significant variables (see Table 17 in Appendix B) that showed up in both QRF and LQR were the relative value unit (RVU) measurements. RVUs are standard scale measures currently used in the U.S. by the Centers for Medicare and Medicaid Services to determine physician fees (Hsiao et al., 1992). They are based mainly on the relative time, including pre-procedure time, that surgeons have previously taken to complete a specified service. As a result, RVUs are a proxy for the expected procedure duration that our QRF model can use without explicit knowledge of CPT codes. While the LQR model considered the primary surgeon as a significant variable, it is interesting that the QRF model did not. QRF’s ability to accommodate complex, nonlinear associations makes it possible that our model also identifies surgeons implicitly through trends in the number and value of RVUs along with other possible feature interactions. Thus, we believe our model incorporates the essential characteristics of the procedure and surgeon in large part through RVUs, and this allows it to make estimates even for cases without CPTs in the historical data.

6 Conclusion

The integration of decision-making and individualization is a critically important area of research for advancing healthcare delivery. In the context of surgery scheduling, the duration of each case in a given day varies due to its distinct characteristics, which leads to significant uncertainty. Previous studies have commonly captured this uncertainty by fitting case duration distributions at the aggregate level. However, this approach cannot generate individualized distributions that characterize the subtle differences in uncertainty across cases. Individualized distributions require conditioning on a patient’s unique features.

This paper shows that QRF is an alternative to distribution fitting that yields duration distributions individualized to every case. We present a framework to incorporate the tailored distributions generated from QRF into SAA, RO, and DRO models. As theoretical justification, reformulations and consistent statistical guarantees are derived for each optimization-under-uncertainty approach. We further conduct a case study using MSKCC data for empirical support of our framework. The numerical results show that QRF and its individualized duration distributions can lead to optimization model solutions that significantly outperform those based on distributions fitted at an aggregate or stratified level.

Our primary objective is to show the value of the QRF prediction method and the potential benefit individualized uncertainty modeling can bring to decision-making problems. Based on our numerical experiments, an alternative ML method to QRF can perform slightly better on specific case sizes depending on the optimization framework. In searching for the best ML model for a given hospital, we note that bootstrapping is a potentially viable way to obtain individualized prediction distributions from non-quantile-based ML models. This approach, however, may not always be appropriate. For instance, the formalized approach to using bootstrap samples with linear regression models is limited in requiring random errors to be homoscedastic [12]. One benefit of a quantile method like LQR is that it is known to outperform least-squares linear models when errors are non-Gaussian [29]. Even more appealing than LQR is QRF, given that it does not require any assumptions on the error distribution or the functional relationship between the outcome variable and its predictors. Ultimately, whether it is QRF, LQR, or bootstrap sampling, choosing which approach to use is a multi-faceted modeling decision, and we are not advocating that QRF is always the best ML method to use. This paper strongly suggests that QRF, as a theoretically compatible approach for various optimization modeling frameworks, strikes a nice balance between model flexibility, computation, and performance. As a result, we believe any hospital not currently using individualized surgery schedules can readily obtain value in practice from individualized schedules through a QRF.

We foresee several potential directions to follow based on our study. It would be interesting to further understand how QRF predictions can impact solutions when the objective has waiting, idle, and overtime costs that are non-identical and how choosing the quantiles can trade off between idle and waiting times. Other penalty and cost types, and their sensitivities, can also be considered. We also plan on a follow-up study to understand the effectiveness of each optimization approach, their comparisons, and trade-offs in practice as we continue our collaboration with MSKCC. In this regard, we are excited by the future direction of researching how our individualized framework can be further enhanced to provide practical value in building decision-support tools. For surgery scheduling, this could include additional ML methods for setting parameters in the optimization models that depend on the complexity of the surrounding cases. As another direction, we could extend our approach to other healthcare problems such as scheduling chemotherapy infusion treatment sessions [11] or operating room planning [24]. Precision medicine and personalized, prescriptive analytics are evolving fields that can use sophisticated data-driven decision-making methods. We hope our framework and results encourage further exploration of individualized optimization under uncertainty in healthcare.