Introduction

Receiver Operating Characteristic (ROC) analysis has been widely used as a tool for assessing the discriminant performance of biomarkers. Based on a univariate or combined-to-univariate marker, the ROC curve is a plot of the true positive rate versus the false positive rate at each possible cut point; it summarizes the sensitivity and specificity of a binary classifier when marker measurements are continuous. In nonparametric, semiparametric or parametric models, the ROC curve and its associated measures, such as the area under the curve (AUC) or the partial area under the curve (pAUC), have been used as indices for evaluating the predictive accuracy of markers or diagnostic tests [17]. In the statistical literature, different measures have been developed to summarize and compare the predictive accuracy of biomarkers ([2, 8] among others).

This paper considers situations in which multiple markers \((M_{1},M_{2},\ldots,M_{k})\) are available for classification of disease state. The research interest is to establish criteria and tools for assessing predictive accuracy based on multivariate markers or multivariate test measurements, \((M_{1},M_{2},\ldots,M_{k})\), from observed data or a training data set. The proposed work includes at least two types of applications: (i) to quantify the result of dual or multiple readings from a single diagnostic test, or readings from multiple tests; (ii) to evaluate the predictability of combined multiple markers for a disease, where each marker characterizes a specific biological function for the disease. For the first type of application, (i), multiple readings are employed either to reduce the uncertainty of test classification or to compare multiple diagnostic modalities [9, 15]. Applications of the second type, (ii), are important when multivariate markers are used as prognostic measurements for predicting or understanding the disease.

To analyze multiple marker data, several approaches have been developed to handle the correlation structure of marker measurements for different research goals. Perhaps the most common approach is to combine multiple markers into a single composite score using a logistic regression model, and to evaluate the predictability of the markers through the one-dimensional composite score [14]. For high-dimensional markers, or when markers come from different biological sources, it may not be analytically appropriate to combine the markers into a composite score; in such situations, a tree-based regression model could serve as a good alternative for identifying a classification rule. The tree-based classification method is sometimes referred to as recursive partitioning, which is frequently used in data mining, machine learning and clinical practice as a predictive model [3, 22]. For example, Baker [1] and Etzioni et al. [6] considered discretized markers by keeping the marker values in multi-dimensional settings and proposed new definitions for ROC curves.

When markers are continuous, Jin and Lu [13] considered bivariate markers and proposed to use the area under the upper boundary of ROC region to evaluate diagnostic utilities. Jin and Lu’s work can be viewed as an extension of Baker’s approach [1] from discrete markers to continuous markers. Wang and Li [21] defined an ROC function for bivariate continuous markers via generalized inverse set of the quantile function FP, where the ROC function possesses a conditional expectation expression. In this paper, we generalize Wang and Li’s results from bivariate marker to multivariate marker setting, and develop methods and inference for ROC analysis.

Assume a k-dimensional marker vector \((M_{1},M_{2},\ldots,M_{k})\) is available and the disease state is determined by a sequence of arbitrarily combined and-or classifiers with positivity specified in either direction of marker values; for example, \(I((M_{1} \geq m_{1}\mbox{ or }M_{2} < m_{2})\mbox{ and }(M_{3} < m_{3}\mbox{ or }M_{4} \geq m_{4}))\). This extension links to potential applications related to classification trees with binary decision diagrams. The research interest is to establish criteria and tools for assessing predictive accuracy based on multivariate markers, \((M_{1},M_{2},\ldots,M_{k})\). Specifically, the ROC function is extended from the univariate case to the multivariate case, and a weighted ROC (WROC) function is introduced for examining predictive accuracy with arbitrarily combined and-or classifiers.

Let \((M_{l1},M_{l2},\ldots,M_{lk})\), l = 0, 1, be the marker vector for a non-diseased or diseased subject. Let the arbitrarily combined and-or classifier be expressed as \(I\{(M_{l1},M_{l2},\ldots,M_{lk})\) \(\in D(m_{1},m_{2},\ldots,m_{k})\}\) with \(D(m_{1},m_{2},\ldots,m_{k}) \subseteq {R}^{k}\) defined as the region for marker-based positivity. To simplify notation and formulation, hereafter we shall use bold face \(\mathbf{m}\) to represent the vector \((m_{1},m_{2},\ldots,m_{k})\), and let \({\mathbf{m}_{l}}\) and \({\mathbf{M}_{l}}\), l = 0, 1, represent the vectors \((m_{l1},m_{l2},\ldots,m_{lk})\) and \((M_{l1},M_{l2},\ldots,M_{lk})\). Define the false and true positive rates respectively as

$$\displaystyle\begin{array}{rcl} FP(\mathbf{m}) = P\{{\mathbf{M}_{0}} \in D(\mathbf{m})\}\ , \ \ \ \ TP(\mathbf{m}) = P\{{\mathbf{M}_{1}} \in D(\mathbf{m})\}& & {}\\ \end{array}$$

The research interest is to extend rules and tools from univariate marker to multivariate marker setting for assessment of predictive accuracy of markers.

Using the US Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set as an example, the biomarkers of interest include measurements from different biological systems related to neuroimaging, genetics, CSF (cerebrospinal fluid) and cognition. As the k markers are identified from different biological sources, it may not be appropriate to combine them using, say, a linear combination of the measurements. The and-or classifier also signifies the importance of interaction between markers. For example, in an Alzheimer’s Disease study in which the authors are currently involved (the BIOCARD study at Johns Hopkins School of Medicine), decreases in CSF Amyloid beta-42 and/or increases in total tau or phosphorylated-tau (p-tau) are hypothesized to be strong predictors of AD or AD-related symptoms. It would be interesting to keep the k markers in a multivariate setting and explore their respective roles and interactions nonparametrically.

The paper is organized as follows. Section “Univariate Marker Case” briefly reviews some of the fundamental definitions and properties for univariate ROC analysis, where emphasis is placed on those which will be extended to multivariate setting. In sections “Multivariate Markers: ROC, WROC and AUC” and “Other Types of ROC and WROC Functions”, a set of ROC and ROC-related functions are introduced with discussion focused on contrasting features between univariate and multivariate cases. Section “Nonparametric Estimation” considers nonparametric estimators for ROC-related functions, AUC and concordance probabilities. Simulation and a real data analysis are presented in section “Simulation and Data Example” to illustrate the applicability of the proposed procedures. Section “Discussion” concludes the paper with a brief discussion.

Univariate Marker Case

In this section we consider the univariate marker case, k = 1. Suppose the disease outcome D takes binary values 0 or 1, and M is a continuous marker variable. Let \(M_{0}\) and \(M_{1}\) respectively represent the marker variable from the non-diseased (D = 0) and diseased (D = 1) groups. Define \(TP(m) = P(M_{1} > m) = P(M > m\vert D = 1)\) as the true positive rate (sensitivity), and \(FP(m)=P(M_{0}>m) = P(M>m\vert D = 0)\) the false positive rate (1 − specificity). Assume \(M_{0}\) and \(M_{1}\) are independent. Define \(F_{0}(m) = 1 - FP(m)\) and \(F_{1}(m) = 1 - TP(m)\) respectively as the cumulative distribution functions of \(M_{0}\) and \(M_{1}\).

There are multiple ways to define the ROC function for a univariate marker. A mathematically simple definition \(ROC(q) = TP[{FP}^{-1}(q)]\), q ∈ [0, 1], evaluates the magnitude of the true positive rate at a controlled false positive rate through the inverse functional mapping between FP and TP. The comparison of two ROC functions from different markers should thus be interpreted as the comparison of TP values at the same FP rate. The partial area under the ROC curve for false positive rate less than p, 0 ≤ p ≤ 1, is defined as \(AUC(p) =\int I(0 \leq q \leq p)ROC(q)\,dq\). The area under the ROC curve is defined as the total area with the FP rate ranging from 0 to 1, that is, AUC(1). Define the partial concordance probability as \(CON(p) = P(M_{1} > M_{0},FP(M_{0}) \leq p).\) For the univariate marker model, the quantile variable \(Q_{0} = FP(M_{0})\) is Uniform[0, 1] distributed and thus CON(p) can be calculated using the probability measure on \((M_{1},Q_{0})\) and is simplified to

$$\displaystyle\begin{array}{rcl} CON(p)& =& P(M_{1} >{ FP}^{-1}(Q_{ 0}),Q_{0} \leq p) =\int _{ 0}^{p}\int I(m_{ 1} >{ FP}^{-1}(q))\ dF_{ 1}(m_{1})\ dq {}\\ & =& \int _{0}^{p}ROC(q)dq = AUC(p) {}\\ \end{array}$$

Thus, an alternative way to define ROC(p) is to obtain it as the derivative of the partial concordance probability with respect to p, namely \(ROC(p) = CON'(p)\). By definition, CON(p) can also be expressed as

$$\displaystyle\begin{array}{rcl} CON(p)& =& \int \int I(m_{1} > m_{0})I(FP(m_{0}) \leq p)\ dF_{1}(m_{1})dF_{0}(m_{0}){}\end{array}$$
(1)

The equivalence between CON(p) and AUC(p) has led to development of nonparametric approaches for estimating AUC(p) using the formula in (1). Dodd and Pepe [4] showed that the partial area under curve possesses a concordance probability expression: Let \(p_{0}^{{\ast}} = FP({TP}^{-1}(p_{0}))\) and assume \(p_{0}^{{\ast}} < p_{1}\), then

$$\displaystyle{ \int I(p_{0}^{{\ast}}\leq q < p_{ 1})ROC(q)dq = P(M_{1} > M_{0},\ {FP}^{-1}(p_{ 1}) < M_{0} \leq {TP}^{-1}(p_{ 0})) }$$
(2)

Thus, the partial concordance probability coincides with the partial AUC restricted to the region where the false positive rate is less than \(p_{1}\) and the true positive rate is greater than \(p_{0}\). As proposed by Dodd and Pepe [4], by plugging the empirical distributions of \(M_{0}\) and \(M_{1}\) into (1) and (2), the partial area under the curve can be estimated by nonparametric U-statistics. The above properties will be extended to the multivariate marker case for further analytical developments.
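The plug-in idea for formula (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the estimator code of Dodd and Pepe [4]: the function name `partial_auc_ustat` and the simulated normal samples are assumptions made here, and FP is estimated by the empirical proportion of non-diseased values strictly above each point.

```python
import numpy as np

def partial_auc_ustat(m0, m1, p):
    """Plug-in estimate of CON(p) = P(M1 > M0, FP(M0) <= p) from formula (1).

    m0: marker values of non-diseased subjects; m1: marker values of diseased subjects.
    """
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    # Empirical FP at each non-diseased value: share of non-diseased values strictly above it.
    fp_hat = (m0[None, :] > m0[:, None]).mean(axis=1)
    # Concordance indicator I(m1_j > m0_i), arranged as an (n0, n1) array.
    concordant = m1[None, :] > m0[:, None]
    # Average over all (i, j) pairs, keeping only pairs with FP-hat(m0_i) <= p.
    return float((concordant & (fp_hat[:, None] <= p)).mean())

# Illustrative data (assumed here): N(0, 1) for non-diseased, N(1, 1) for diseased.
rng = np.random.default_rng(0)
m0_sample = rng.normal(0.0, 1.0, size=200)
m1_sample = rng.normal(1.0, 1.0, size=200)
print("partial AUC, p = 0.2:", partial_auc_ustat(m0_sample, m1_sample, 0.2))
print("full AUC,    p = 1.0:", partial_auc_ustat(m0_sample, m1_sample, 1.0))
```

Setting p = 1 recovers the usual Mann-Whitney form of the AUC estimate, since the constraint on the false positive rate is then always satisfied.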

An alternative approach can be adopted by reversing the roles of true and false positive rates to define a function similar to the ROC function:

$$\displaystyle{{ ROC}^{\;{\ast}}(q) = FP[{TP}^{-1}(q)],\ \ q \in (0,1) }$$
(3)

By property of composite function, it is seen that

$$\displaystyle{{ ROC}^{\;{\ast}}(q) ={ ROC}^{\;-1}(q) }$$
(4)

Clearly, since the mapping ROC(q) is one-to-one, the function \({ROC}^{\;{\ast}}(q)\) contains the same amount of information as ROC(q). Graphically, ROC(q) and \({ROC}^{\;{\ast}}(q)\) are symmetric with respect to the diagonal line which connects points (0, 0) and (1, 1). Thus, \(ROC(q) +{ ROC}^{\;{\ast}}(1 - q) = 1\) and the sum of the area under the ROC curve and the area under the \({ROC}^{\;{\ast}}\) curve equals 1. In section “Other Types of ROC and WROC Functions”, for the multivariate marker model, a function parallel to \({ROC}^{\;{\ast}}(q)\) will be introduced and some interesting relationships similar to or different from those of the univariate marker case will be explored.

Multivariate Markers: ROC, WROC and AUC

Now consider continuous markers and classification rule in multivariate setting. Suppose \({\mathbf{M}_{0}}\) and \({\mathbf{M}_{1}}\) are independent k-dimensional marker vectors from non-diseased group (D = 0) and diseased group (D = 1) respectively. Define

$$\displaystyle{FP(\mathbf{m}) = P\{{\mathbf{M}_{0}} \in D(\mathbf{m})\}\ ,}$$
$$\displaystyle{TP(\mathbf{m}) = P\{{\mathbf{M}_{1}} \in D(\mathbf{m})\}\ .}$$

Let \(F_{0}(\mathbf{m}) = P(M_{01} \leq m_{1},M_{02} \leq m_{2},\ldots,M_{0k} \leq m_{k})\) be the cumulative distribution function for the non-diseased population, and \(F_{1}(\mathbf{m}) = P(M_{11} \leq m_{1},M_{12} \leq m_{2},\ldots,M_{1k} \leq m_{k})\) the cumulative distribution function for the diseased population. Define the quantile variable \(Q_{0} = FP({\mathbf{M}_{0}})\) and denote by \(H_{0}\) the distribution function of \(Q_{0}\). As an important feature of multivariate markers, in general \(Q_{0}\) is not uniformly distributed. The distribution of \(Q_{0}\) depends on the classifier as well as the probability structure of \({\mathbf{M}_{0}}\), and therefore varies from marker vector to marker vector.
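As a worked illustration of why the quantile variable is generally not uniform, consider a special case assumed here only for illustration: the k components of \({\mathbf{M}_{0}}\) are independent and continuous, and the classifier is the "and" rule \(D(\mathbf{m}) =\{ \mathbf{x}: x_{1} > m_{1},\ldots,x_{k} > m_{k}\}\). Writing \(U_{j}\) for the marginal survival function of \(M_{0j}\) evaluated at \(M_{0j}\) itself, the \(U_{j}\) are independent Uniform[0, 1] variables, and

$$\displaystyle{ FP({\mathbf{M}_{0}}) =\prod _{ j=1}^{k}U_{ j},\qquad \frac{d} {dq}P\{FP({\mathbf{M}_{0}}) \leq q\} = \frac{{(-\log q)}^{k-1}} {(k - 1)!},\quad 0 < q < 1, }$$

because \(-\log \prod _{j}U_{j}\) is a sum of k independent standard exponential variables and is therefore Gamma(k, 1) distributed. For k = 1 the density is identically 1 (the uniform case). For k = 2 the density is \(-\log q\) with distribution function \(q(1 -\log q)\), which places more probability near q = 0, and the skewness increases with k.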

Definition of ROC Function

When marker measurements are multivariate, the function \(FP({\mathbf{M}_{0}})\) is not a one-to-one transformation, which implies that the ROC function for a univariate marker, \(TP({FP}^{-1}(q))\), cannot be used for the multivariate marker case. Wang and Li [21] considered bivariate marker models and defined an ROC function via the generalized inverse set of the quantile function FP, where the ROC function possesses a conditional expectation expression. For multivariate markers, instead of using the generalized inverse set to conceptualize the ROC function, the ROC function is defined as the average of the true positive rate conditioning on the set of marker values with false positive rate q, where the conditional average is calculated with respect to the non-diseased population:

$$\displaystyle{ ROC(q) =\mathrm{ E}[TP({\mathbf{M}_{0}})\ \vert \ FP({\mathbf{M}_{0}}) = q\ ] }$$
(5)

There are a few characteristics of ROC(q) in (5), which may or may not be similar to characteristics of the ROC function for univariate marker:

  • The value of the ROC function in (5) is bounded between 0 and 1.

  • The function ROC(q) may not be an increasing function in q, 0 ≤ q ≤ 1.

  • If the distributions of \({\mathbf{M}_{0}}\) and \({\mathbf{M}_{1}}\) are the same (i.e., the marker vector is non-predictive for disease), then for each Borel set \(D(m_{1},m_{2},\ldots,m_{k})\), one has \(TP(m_{1},m_{2},\ldots,m_{k}) = FP(m_{1},m_{2},\ldots,m_{k})\). This implies \(TP({\mathbf{M}_{0}}) = FP({\mathbf{M}_{0}})\) with probability one and

    $$\displaystyle{\mathrm{E}[TP({\mathbf{M}_{0}})\ \vert \ FP({\mathbf{M}_{0}}) = q\ ] = q.}$$

    Thus, if the markers are non-predictive for disease, the ROC function coincides with the diagonal line which connects points (0, 0) and (1, 1), which is similar to the ROC function for univariate marker.

  • When the markers are predictive subject to the classifier \(D(m_{1},m_{2},\ldots,m_{k})\), it means that \(TP(m_{1},m_{2},\ldots,m_{k}) \geq FP(m_{1},m_{2},\ldots,m_{k})\) for each \((m_{1},m_{2},\ldots,m_{k}) \in {R}^{k}\), and this implies \(TP({\mathbf{M}_{0}}) \geq FP({\mathbf{M}_{0}})\) with probability one and

    $$\displaystyle\begin{array}{rcl} ROC(q) =\mathrm{ E}[TP({\mathbf{M}_{0}})\ \vert \ FP({\mathbf{M}_{0}}) = q\ ] \geq \mathrm{ E}[FP({\mathbf{M}_{0}})\ \vert \ FP({\mathbf{M}_{0}}) = q\ ] = q,& & {}\\ \end{array}$$

    for 0 ≤ q ≤ 1. Thus, the ROC function is above the diagonal line if the markers are predictive for disease.

WROC and AUC

In using the ROC function, a question of interest is whether the function in (5) can be used for comparisons of markers’ predictive accuracy at the population level. To address the question, we recall that for a univariate marker the area under the ROC curve is calculated with the uniform distribution on the q-axis (i.e., the FP-axis). For multivariate markers, the ROC function defined in (5) can be used to compare the performance of the true positive rate locally by conditioning on \(FP({\mathbf{M}_{0}}) = q\). To evaluate multivariate markers’ predictability unconditionally, the evaluation should take into account the distribution of \(Q_{0}\) in addition to the conditionally defined ROC function.

Using the probability distribution of \(Q_{0}\), the AUC can be naturally defined as the area under the ROC curve subject to Lebesgue integration with measure \(H_{0}\) on the q-axis, namely \(AUC =\int ROC(q)\,dH_{0}(q)\), or equivalently,

$$\displaystyle{ AUC =\int _{ 0}^{1}ROC(q) \cdot h_{ 0}(q)\ dq }$$
(6)

where \(h_{0}(q)\) is the derivative of \(H_{0}(q)\), which is assumed to exist. Define

$$\displaystyle{WROC(q) = ROC(q) \cdot h_{0}(q)}$$

as the weighted ROC (WROC) function. Note that WROC(q) is the unconditional average of the true positive rate with fixed false positive rate q:

$$\displaystyle{ WROC(q) =\mathrm{ E}[TP({\mathbf{M}_{0}})I(FP({\mathbf{M}_{0}}) = q)]\ . }$$
(7)

It is seen that AUC is interpreted as area under WROC curve with uniform measure over the unit interval [0, 1]. Subsequently, the partial area under WROC curve can be defined as

$$\displaystyle{ AUC(p) =\int _{ 0}^{p}WROC(q)dq, }$$
(8)

which can be used for comparison of markers in terms of their population-average predictability.

The concordance probability is naturally defined as \(CON = P({\mathbf{M}_{1}} \in D({\mathbf{M}_{0}}))\). Next we prove the equivalence between the concordance probability and the area under WROC curve, which is an extension of a property for univariate marker [4]:

$$\displaystyle\begin{array}{rcl} CON& =& P({\mathbf{M}_{1}} \in D({\mathbf{M}_{0}})) =\int \int I({\mathbf{m}_{1}} \in D({\mathbf{m}_{0}}))\ dF_{1}({\mathbf{m}_{1}})dF_{0}({\mathbf{m}_{0}}) \\ & =& \int TP({\mathbf{m}_{0}})\ dF_{0}({\mathbf{m}_{0}}) =\int _{ 0}^{1}\mathrm{E}[TP({\mathbf{M}_{ 0}})\ \vert \ Q_{0} = q] \cdot h_{0}(q)dq \\ & =& \int _{0}^{1}WROC(q)dq = AUC {}\end{array}$$
(9)

With an additional constraint on the false positive rate p, 0 ≤ p ≤ 1, the partial concordance probability can be expressed as

$$\displaystyle\begin{array}{rcl} CON(p) = P({\mathbf{M}_{1}} \in D({\mathbf{M}_{0}}),FP({\mathbf{M}_{0}}) \leq p)\ ,& & {}\\ \end{array}$$

where the full concordance probability corresponds to the special case p = 1. The partial concordance probability is

$$\displaystyle\begin{array}{rcl} CON(p)& =& P({\mathbf{M}_{1}} \in D({\mathbf{M}_{0}}),FP({\mathbf{M}_{0}}) \leq p) \\ & =& \int \int I({\mathbf{m}_{1}} \in D({\mathbf{m}_{0}}))I(FP({\mathbf{m}_{0}}) \leq p)\ dF_{1}({\mathbf{m}_{1}})dF_{0}({\mathbf{m}_{0}}) \\ & =& \!\int TP({\mathbf{m}_{0}})I(FP({\mathbf{m}_{0}}) \leq p)\ dF_{0}({\mathbf{m}_{0}})\! =\!\int _{ 0}^{p}\mathrm{E}[TP({\mathbf{M}_{ 0}})\ \vert \ Q_{0} = q] \cdot h_{0}(q)dq \\ & =& \int _{0}^{p}WROC(q)dq = AUC(p) {}\end{array}$$
(10)

The equivalence between CON(p) and AUC(p) is again an extension of the result from univariate marker model to multivariate marker model. Further, with the restrictions that the false positive rate is less than or equal to p and that the true positive rate is greater than q, the formula in (10) can be extended to

$$\displaystyle\begin{array}{rcl} CON(p,q)& =& P({\mathbf{M}_{1}} \in D({\mathbf{M}_{0}}),FP({\mathbf{M}_{0}}) \leq p,TP({\mathbf{M}_{1}}) > q) {}\\ & =& \int \int I({\mathbf{m}_{1}} \in D({\mathbf{m}_{0}}))I(FP({\mathbf{m}_{0}}) \leq p,TP({\mathbf{m}_{1}}) > q)\ dF_{1}({\mathbf{m}_{1}})dF_{0}({\mathbf{m}_{0}})\,{}\\ \end{array}$$

which is a useful formula for constructing a U-statistic in estimation of the concordance probability with two-sided constraints. It is also clear that CON(p, 0) = AUC(p).

Nonparametric Estimation

Suppose the observations include independent samples of iid copies of \({\mathbf{M}_{0}}\) and iid copies of \({\mathbf{M}_{1}}\), where marker vectors are represented by \(\{{\mathbf{M}_{i,0}}: i = 1,\ldots,n_{0}\}\) and \(\{{\mathbf{M}_{j,1}}: j = 1,\ldots,n_{1}\}\), and realization values by \(\{{\mathbf{m}_{i,0}}: i = 1,\ldots,n_{0}\}\) and \(\{{\mathbf{m}_{j,1}}: j = 1,\ldots,n_{1}\}\), respectively from the non-diseased and diseased populations. In this section we consider nonparametric approaches for estimation of ROC, WROC, AUC and CON. Denote by \(\widehat{TP}\), \(\widehat{FP}\), \(\hat{F}_{1}\) and \(\hat{F}_{0}\) respectively the empirical estimates of the corresponding functions. For those p with \(\widehat{FP}({\mathbf{m}_{i,0}}) = p\), initially one can use the crude empirical estimate \(\widehat{TP}({\mathbf{m}_{i,0}})\) to estimate ROC(p). Alternatively, we can consider the ROC function in its form as a conditional expectation in (5), \(ROC(q) =\mathrm{ E}[TP({\mathbf{M}_{0}})\vert FP({\mathbf{M}_{0}}) = q]\), and construct a kernel average estimate, which can be thought of as a smoothed version of the crude empirical estimate:

$$\displaystyle\begin{array}{rcl} \widehat{ROC}(p) = \frac{\int \widehat{TP}({\mathbf{m}_{0}}) \cdot k(\frac{p-\widehat{FP}({\mathbf{m}_{0}})} {b} )\ d\hat{F}_{0}({\mathbf{m}_{0}})} {\int k(\frac{p-\widehat{FP}({\mathbf{m}_{0}})} {b} )\ d\hat{F}_{0}({\mathbf{m}_{0}})} = \frac{\sum _{i=1}^{n_{0}}\widehat{TP}({\mathbf{m}_{i,0}}) \cdot k(\frac{p-\widehat{FP}({\mathbf{m}_{i,0}})} {b} )} {\sum _{i=1}^{n_{0}}k(\frac{p-\widehat{FP}({\mathbf{m}_{i,0}})} {b} )} \,& & {}\\ \end{array}$$

where the kernel k(⋅ ) is a mean zero density function and b is a bandwidth [7].

Note that the ROC function in (5) is defined as the average of true positive rate given a fixed value of the false positive rate, where the calculation of the conditional expectation is through the two one-dimensional variables \(TP({\mathbf{M}_{0}})\) and \(FP({\mathbf{M}_{0}})\). Thus, the ‘curse of dimensionality’ does not occur when the ROC function is estimated nonparametrically. A nonparametric estimator of WROC(p) can be constructed by estimating the derivative of CON(p) in (10) using kernel estimation technique:

$$\displaystyle\begin{array}{rcl} \widehat{WROC}(p) = \frac{1} {b}\int \widehat{TP}({\mathbf{m}_{0}}) \cdot k(\frac{p -\widehat{ FP}({\mathbf{m}_{0}})} {b} )\,d\hat{F}_{0}({\mathbf{m}_{0}}) = \frac{1} {n_{0}b}\sum _{i=1}^{n_{0} }\widehat{TP}({\mathbf{m}_{i,0}}) \cdot k(\frac{p -\widehat{ FP}({\mathbf{m}_{i,0}})} {b} )& & {}\\ \end{array}$$

which is seen to be the same as the product of \(\widehat{ROC}(p)\) and the kernel estimate of \(h_{0}(p)\),

$$\displaystyle{\frac{1} {b}\int k(\frac{p -\widehat{ FP}({\mathbf{m}_{0}})} {b} )\,d\hat{F}_{0}({\mathbf{m}_{0}})\ .}$$
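The kernel estimators of ROC(p), WROC(p) and the density of the quantile variable can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the positivity region is taken to be the "and" rule in which every marker exceeds its threshold, the kernel is Gaussian, and the bandwidth b = 0.05, the simulated data and all function names are hypothetical choices made here.

```python
import numpy as np

def empirical_rates(m0, m1):
    """Return FP-hat(m_{i,0}) and TP-hat(m_{i,0}) for every non-diseased point.

    Assumed region: D(m) = {x : x_j > m_j for all j}.
    """
    # in0[i, l] is True when non-diseased subject l falls in D(m_{i,0}).
    in0 = np.all(m0[None, :, :] > m0[:, None, :], axis=2)
    # in1[i, j] is True when diseased subject j falls in D(m_{i,0}).
    in1 = np.all(m1[None, :, :] > m0[:, None, :], axis=2)
    return in0.mean(axis=1), in1.mean(axis=1)

def gaussian_kernel(u):
    """Standard normal density, a mean-zero kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_roc_wroc(m0, m1, p, b):
    """Kernel estimates (ROC-hat(p), WROC-hat(p), h0-hat(p)) with bandwidth b."""
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    fp_hat, tp_hat = empirical_rates(m0, m1)
    w = gaussian_kernel((p - fp_hat) / b)          # kernel weight of each non-diseased point
    n0 = len(m0)
    roc_hat = (tp_hat * w).sum() / w.sum()         # weighted average of TP-hat
    wroc_hat = (tp_hat * w).sum() / (n0 * b)       # unnormalised kernel sum
    h0_hat = w.sum() / (n0 * b)                    # kernel density estimate of FP-hat values
    return roc_hat, wroc_hat, h0_hat

# Illustrative data (assumed here): three independent normal markers per group.
rng = np.random.default_rng(1)
m0_sample = rng.normal(0.0, 1.0, size=(200, 3))
m1_sample = rng.normal(1.0, 1.0, size=(200, 3))
roc, wroc, h0 = kernel_roc_wroc(m0_sample, m1_sample, p=0.1, b=0.05)
print("ROC-hat:", roc, "WROC-hat:", wroc, "h0-hat:", h0)
```

Because the smoothing is done over the one-dimensional values of the estimated false positive rate, the cost of the kernel step does not grow with the number of markers k; only the computation of the empirical rates does.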

Based on the equivalence between AUC(p) and CON(p), a nonparametric estimator of AUC(p) can be obtained:

$$\displaystyle{ \widehat{AUC}(p) =\int \int I({\mathbf{m}_{1}} \in D({\mathbf{m}_{0}}))I(\widehat{FP}({\mathbf{m}_{0}}) \leq p)\ d\hat{F}_{1}({\mathbf{m}_{1}})d\hat{F}_{0}({\mathbf{m}_{0}})\ }$$
(11)

With the restriction that the false positive rate is less than or equal to p and the true positive rate greater than q, the formula in (11) can be extended to

$$\displaystyle\begin{array}{rcl} \widehat{CON}(p,q)& =& \int \int I({\mathbf{m}_{1}} \in D({\mathbf{m}_{0}})) \cdot I(\widehat{FP}({\mathbf{m}_{0}}) \leq p,\widehat{TP}({\mathbf{m}_{1}}) > q)\ d\hat{F}_{1}({\mathbf{m}_{1}})d\hat{F}_{0}({\mathbf{m}_{0}}) {}\\ & =& \frac{1} {n_{0}n_{1}}\sum\limits_{i=1}^{n_{0} }\sum\limits_{j=1}^{n_{1} }I({\mathbf{m}_{j,1}} \in D({\mathbf{m}_{i,0}})) \cdot I(\widehat{FP}({\mathbf{m}_{i,0}}) \leq p,\widehat{TP}({\mathbf{m}_{j,1}}) > q),\end{array}$$

where the estimator has the form of a U-statistic [12].
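The double-sum form of the estimator translates directly into array operations. The sketch below is illustrative only: it assumes the "and" positivity region in which every marker exceeds its threshold, and the names `in_region` and `con_hat` are hypothetical helpers introduced here.

```python
import numpy as np

def in_region(x, m):
    """Boolean array r with r[a, b] = True when point x[b] lies in D(m[a]).

    Assumed region: D(m) = {x : x_j > m_j for all j}.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.all(x[None, :, :] > m[:, None, :], axis=2)

def con_hat(m0, m1, p, q):
    """Two-sample estimate of CON(p, q) as an average over all (i, j) pairs."""
    fp_hat = in_region(m0, m0).mean(axis=1)    # FP-hat(m_{i,0}), length n0
    tp_hat = in_region(m1, m1).mean(axis=1)    # TP-hat(m_{j,1}), length n1
    concordant = in_region(m1, m0)             # [i, j] = I(m_{j,1} in D(m_{i,0}))
    keep = (fp_hat[:, None] <= p) & (tp_hat[None, :] > q)
    return float((concordant & keep).mean())

# Illustrative data (assumed here): two independent normal markers per group.
rng = np.random.default_rng(2)
m0_sample = rng.normal(0.0, 1.0, size=(150, 2))
m1_sample = rng.normal(1.0, 1.0, size=(150, 2))
print("CON-hat(0.2, 0.1):", con_hat(m0_sample, m1_sample, 0.2, 0.1))
print("CON-hat(1.0, 0.0):", con_hat(m0_sample, m1_sample, 1.0, 0.0))
```

Note that in a finite sample the empirical true positive rate can equal 0 at some diseased points (for instance the componentwise largest one), so the estimate at q = 0 can fall slightly below the estimate of AUC(p) in (11), even though the two population quantities agree.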

Theorem 1.

Let \(N = n_{0} + n_{1}\) . Assume \(0 <\lim _{N\rightarrow \infty }n_{0}/N =\lambda < 1\) . Then, for p,q ∈ [0,1], (i) \(\widehat{\mathit{CON}}(p,q)\) converges to CON(p,q) in probability as N →∞, and (ii) \(\sqrt{N}\{\widehat{\mathit{CON}}(p,q) -\mathit{CON}(p,q)\}\stackrel{d}{\rightarrow }\mathrm{Normal}(0{,\sigma }^{2})\) , where σ 2 is specified in the Appendix.

The asymptotic results require that N be large and that \(n_{0}/N\) converge to a limit λ with 0 < λ < 1. This condition is generally satisfied under random sampling, where disease status D could be either random or fixed, corresponding respectively to prospective and retrospective (case-control) studies. When D is random, N is the total sample size and \(n_{0}/N\) converges to \(P(D = 0) =\lambda\), 0 < λ < 1, with probability 1, and the asymptotic normality holds with the usual interpretation.

Other Types of ROC and WROC Functions

Similar to the use of (3) in the univariate marker case, for multivariate markers we may want to consider a function with the roles of the true and false positive rates reversed. Define \(Q_{1} = TP({\mathbf{M}_{1}})\), and let \(H_{1}\) and \(h_{1}\) respectively be the distribution function and density function of \(Q_{1}\). Then, similar to the structure of ROC(q), where \(ROC(q) =\mathrm{ E}[TP({\mathbf{M}_{0}})\ \vert \ FP({\mathbf{M}_{0}}) = q\ ]\), for multivariate markers we may define

$$\displaystyle{{ROC}^{\;{\ast}}(q) =\mathrm{ E}[FP({\mathbf{M}_{ 1}})\ \vert \ TP({\mathbf{M}_{1}}) = q\ ]\ .}$$

In general, as one of the main features that distinguish univariate from multivariate ROC inference, the functional transformation \({ROC}^{\;{\ast}}(q)\) is not one-to-one and therefore does not have an inverse functional relationship with ROC(q). Further define

$$\displaystyle{\overline{ROC}(q) =\mathrm{ E}[FN({\mathbf{M}_{0}})\ \vert \ TN({\mathbf{M}_{0}}) = q\ ]\ \ \mbox{ and }\ \ {\overline{ROC}}^{\;{\ast}}(q) =\mathrm{ E}[TN({\mathbf{M}_{ 1}})\ \vert \ FN({\mathbf{M}_{1}}) = q\ ]}$$

where \(FN(\mathbf{m}) = P({\mathbf{M}_{1}}\notin D(\mathbf{m}))\) is the false negative rate and \(TN(\mathbf{m}) = P({\mathbf{M}_{0}}\notin D(\mathbf{m}))\) is the true negative rate. The weighted functions corresponding to \({ROC}^{\;{\ast}}(q)\), \(\overline{ROC}(q)\) and \({\overline{ROC}}^{\;{\ast}}(q)\) can be defined in the same way as the WROC function: for 0 < q < 1,

$$\displaystyle\begin{array}{rcl} & & \qquad WROC(q) = ROC(q) \cdot h_{0}(q);\ \ {WROC}^{\;{\ast}}(q) ={ ROC}^{\;{\ast}}(q) \cdot h_{ 1}(q) {}\\ & & \overline{WROC}(q) = \overline{ROC}(q) \cdot h_{0}(1 - q);\ \ {\overline{WROC}}^{\;{\ast}}(q) ={ \overline{ROC}}^{\;{\ast}}(q) \cdot h_{ 1}(1 - q) {}\\ \end{array}$$

These weighted ROC functions serve to study the predictive accuracy of multivariate markers from different perspectives. For example, \({WROC}^{\;{\ast}}(p)\) serves to study the performance of the false positive rate with the true positive rate controlled at value p. It is shown in the Appendix that

$$\displaystyle\begin{array}{rcl} & & \qquad \quad ROC(q) + \overline{ROC}(1 - q) = 1;\ \ {ROC}^{\;{\ast}}(q) +{ \overline{ROC}}^{\;{\ast}}(1 - q) = 1 {}\\ & & WROC(q) + \overline{WROC}(1 - q) = h_{0}(q);\ \ {WROC}^{\;{\ast}}(q) +{ \overline{WROC}}^{\;{\ast}}(1 - q) = h_{ 1}(q){}\\ \end{array}$$

Thus, the function ROC provides the same amount of information as \(\overline{ROC}\), and similarly \({ROC}^{\;{\ast}}\) is as informative as \({\overline{ROC}}^{\;{\ast}}\). Also, with knowledge of \(h_{0}(q)\), WROC(q) provides the same amount of information as \(\overline{WROC}\) for predictive accuracy, and with knowledge of \(h_{1}(q)\) a similar argument applies to the relationship between \({WROC}^{\;{\ast}}\) and \({\overline{WROC}}^{\;{\ast}}\). Essentially, each pairwise relationship can be thought of as a conjugate partnership.

For evaluation based on the partial area under the curve, subject to either smaller FP (FP ≤ p) or larger TP (TP > q), the weighted ROC functions of choice should be WROC and \({\overline{WROC}}^{{\ast}}\), so that maximization of the area under the curve makes sense. These two weighted ROC functions, together with their corresponding ROC functions, are used in our simulation to study the performance of the proposed criteria and methods for multivariate markers. Note that the partial concordance probability for true negativity is \({\overline{CON}}^{{\ast}}(p) = P({\mathbf{M}_{0}}\notin D({\mathbf{M}_{1}}),\ FN({\mathbf{M}_{1}}) \leq p)\). By a technique similar to that employed in section “WROC and AUC”, it can be proved that this concordance probability coincides with the area under the \({\overline{WROC}}^{{\ast}}(p)\) function, \({\overline{CON}}^{{\ast}}(p) ={ \overline{AUC}}^{{\ast}}(p)\), and therefore a U-statistic \(\widehat{{\overline{CON}}^{{\ast}}}(p)\) can be constructed to estimate \({\overline{CON}}^{{\ast}}(p)\).

In case of requiring both FP ≤ p and TP > q, these ROC or WROC functions cannot be used for evaluation, but CON(p, q) can be used and estimated by the technique described in section “Multivariate Markers: ROC, WROC and AUC”. For estimation of \({\overline{ROC}}^{\;{\ast}}\), \({\overline{WROC}}^{\;{\ast}}\) and \({\overline{CON}}^{{\ast}}(p,q)\), nonparametric estimates can be constructed using methods similar to those for the functions ROC, WROC and CON(p, q). Also, a property similar to Theorem 1 can be established for \(\widehat{{\overline{CON}}^{{\ast}}}(p)\) by the same technique.

Remark.

By setting \(M_{l1} = M_{l2} =\ldots = M_{lk}\), l = 0, 1, the univariate marker model can be viewed as a degenerate case of multivariate markers. For this degenerate case, the quantile variables \(Q_{0} = FP({\mathbf{M}_{0}})\) and \(Q_{1} = TP({\mathbf{M}_{1}})\) both follow the Uniform[0, 1] distribution, and \(\overline{ROC}(q) = FN({TN}^{-1}(q))\) and \({\overline{ROC}}^{\;{\ast}}(q) = TN({FN}^{-1}(q))\). In this case, each of the WROC functions coincides with its ROC counterpart. Further, besides the relationships \(ROC(q) + \overline{ROC}(1 - q) = 1\) and \({ROC}^{\;{\ast}}(q) +{ \overline{ROC}}^{\;{\ast}}(1 - q) = 1\), it is seen that \({ROC}^{\;{\ast}}(q) ={ ROC}^{\ -1}(q)\), which implies that each of the four ROC functions provides the same amount of information as the other three functions for predictive accuracy of the marker.

Simulation and Data Example

Simulation

To show the performance of predictive accuracy for multivariate markers, we conduct simulation studies under different scenarios. We compare ROC and WROC curves for multivariate markers under each scenario, along with the weight function \(h_{0}(q)\). We also compare the univariate and multivariate marker cases to evaluate the gain and loss from using multiple markers.

Since this paper is a generalization of the bivariate ROC analysis of Wang and Li [21], we take k ≥ 3 markers for evaluation. For simplicity, we take k = 3. Consider the simulation model where \((M_{01},M_{02},M_{03})\) and \((M_{11},M_{12},M_{13})\) follow multivariate normal distributions. By convention we assume that a higher marker value indicates presence of disease. Let \(N_{1} = 200\) be the number of diseased individuals and \(N_{2} = 200\) be the number of non-diseased individuals. We generate data so that \((M_{01},M_{02},M_{03})\) has mean (0, 0, 0) and unit standard deviations, and \((M_{11},M_{12},M_{13})\) has mean (1, 1, 1) and unit standard deviations. Let \(\boldsymbol{\rho _{l}} = (\rho _{l12},\rho _{l23},\rho _{l13})\), l = 0, 1, where \(\rho _{lij}\) denotes the correlation between \(M_{li}\) and \(M_{lj}\). We consider different scenarios according to different correlations \(\boldsymbol{\rho _{l}}\). The ROC analysis for a univariate marker is based on data generated from the distribution of \(M_{l1}\), the bivariate ROC analysis is based on data generated from the distribution of \((M_{l1},M_{l2})\), and the multivariate ROC analysis is based on data generated from the distribution of \((M_{l1},M_{l2},M_{l3})\).
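The data-generating step of this design can be sketched with NumPy as follows. The sketch only reproduces the stated means, unit standard deviations, correlations and sample sizes; the function names, the seed and the choice of the 0.5 scenario in the usage lines are assumptions made here for illustration.

```python
import numpy as np

def correlation_matrix(rho12, rho23, rho13):
    """3 x 3 correlation matrix; with unit standard deviations it is also the covariance."""
    return np.array([[1.0, rho12, rho13],
                     [rho12, 1.0, rho23],
                     [rho13, rho23, 1.0]])

def simulate_markers(rho0, rho1, n0=200, n1=200, seed=0):
    """Draw non-diseased (mean 0) and diseased (mean 1) trivariate normal marker vectors.

    rho0, rho1: tuples (rho_12, rho_23, rho_13) for the two groups.
    """
    rng = np.random.default_rng(seed)
    m0 = rng.multivariate_normal(np.zeros(3), correlation_matrix(*rho0), size=n0)
    m1 = rng.multivariate_normal(np.ones(3), correlation_matrix(*rho1), size=n1)
    return m0, m1

# Moderately correlated scenario: all pairwise correlations equal to 0.5 in both groups.
m0_sample, m1_sample = simulate_markers((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

# Univariate, bivariate and trivariate analyses use nested subsets of the columns.
uni0, bi0, tri0 = m0_sample[:, 0], m0_sample[:, :2], m0_sample
print(uni0.shape, bi0.shape, tri0.shape)
```

In the perfectly correlated scenario the covariance matrix is singular, so the three generated markers are identical copies of one another within each subject.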

Figures 1, 2 and 3 exhibit simulation results when \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} =\boldsymbol{0}\), \(\boldsymbol{0.5}\) and \(\boldsymbol{1}\) respectively. As discussed in section “Other Types of ROC and WROC Functions”, WROC is the conjugate partner of \(\overline{WROC}\) and \({WROC}^{\;{\ast}}\) is the conjugate partner of \({\overline{WROC}}^{{\ast}}\), and with knowledge of \(h_{0}(q)\) and \(h_{1}(q)\), each member of a pair provides the same amount of information for prediction as its partner. The weighted ROC functions reported should include only WROC and \({\overline{WROC}}^{{\ast}}\) so that maximization of the area under the curve makes sense.

Fig. 1 Simulation for classifier \(I(M_{1} > m_{1}, M_{2} > m_{2}, M_{3} > m_{3})\) with \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} =\boldsymbol{0}\)

Fig. 2 Simulation for classifier \(I(M_{1} > m_{1}, M_{2} > m_{2}, M_{3} > m_{3})\) with \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} =\boldsymbol{0.5}\)

Fig. 3 Simulation for classifier I(\(M_{1} > m_{1}\), \(M_{2} > m_{2}\), \(M_{3} > m_{3}\)) with \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} = {\boldsymbol 1}\)

When \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} = {\boldsymbol 0}\), the three markers are mutually independent, so the use of all three markers is expected to be more informative than one or two markers alone. Figure 1 shows a clear pattern of gain and loss as the number of markers increases. The gain in WROC(q) for small values of q, when compared with the univariate ROC curve, is substantial for the multivariate ROC curve but only moderate for the bivariate ROC curve. Similarly, the loss in WROC(q) for large values of q is substantial for the multivariate ROC curve but only moderate for the bivariate ROC curve. This phenomenon can partly be explained by the right skewness of the weight function \(h_{0}(q)\): the distribution of FP is uniform in the univariate case, but it places more probability on smaller values in the bivariate case, and the inclusion of the third marker makes the weight function more skewed. By the equivalence between the partial concordance probability and the partial area under the WROC curve, we find that multivariate markers outperform the univariate and bivariate markers in the region of small FP. The function \({\overline{WROC}}^{{\ast}}\) for multivariate markers shows the opposite direction of gain and loss compared with the univariate or bivariate case: there is a loss in \({\overline{WROC}}^{{\ast}}(q)\) for small values of q (FP) and a gain for large values of q, which is due to the left skewness of the weight function \(h_{1}(1 - q)\).
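The skewness argument can be checked numerically. The sketch below assumes, as a reading of the text rather than a definition restated in this section, that the false positive quantile \(Q_{0}\) is the joint survival function of the non-diseased markers evaluated at a non-diseased marker vector. Under independence this is a product of k independent uniform variables, whose mean is \((1/2)^{k}\). The helper names `normal_sf` and `joint_survival_quantile` are hypothetical.

```python
import math
import numpy as np

# Standard normal survival function 1 - Phi(x), applied elementwise.
normal_sf = np.vectorize(lambda x: 0.5 * math.erfc(x / math.sqrt(2.0)))

def joint_survival_quantile(k, n=100_000, seed=0):
    """Simulate Q_0 = S_0(M_01, ..., M_0k) for k independent N(0, 1) markers.

    Under independence the joint survival function factorizes, so Q_0 is a
    product of k independent Uniform(0, 1) variables.
    """
    rng = np.random.default_rng(seed)
    markers = rng.standard_normal((n, k))
    return normal_sf(markers).prod(axis=1)

for k in (1, 2, 3):
    q0 = joint_survival_quantile(k)
    # The theoretical mean of a product of k independent uniforms is (1/2)**k.
    print(k, round(float(q0.mean()), 3), round(float(np.median(q0)), 3))
```

As k grows, both the mean and the median of the simulated \(Q_{0}\) move toward zero, which is the concentration of probability at small FP described above.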

When \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} = {\boldsymbol 0.5}\), the three markers are moderately correlated. As in the case \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} = {\boldsymbol 0}\), the distributions of \(Q_{0}\) and \(Q_{1}\) still place more probability on small values, so we observe the same pattern of tradeoff between gain at small FP and loss at large FP.

When \(\boldsymbol{\rho _{0}} =\boldsymbol{\rho _{1}} = {\boldsymbol 1}\), the three markers are identical and provide the same information as a single marker (or two markers). The ROC (WROC) functions for the multivariate case coincide with the ROC function for the univariate case (Fig. 3). The univariate case can thus be viewed as a degenerate case of multivariate markers.
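The degenerate case can be illustrated with empirical rates of the and-classifier. The following is a small sketch on simulated data, with hypothetical names (`and_classifier_rates`, `cuts`) and arbitrary thresholds chosen for illustration: when the three markers are copies of one variable, only the largest threshold matters, so the trivariate rule reproduces a univariate rule.

```python
import numpy as np

def and_classifier_rates(diseased, non_diseased, thresholds):
    """Empirical (FP, TP) of the classifier I(M_1 > m_1, ..., M_k > m_k)."""
    thresholds = np.asarray(thresholds)
    tp = np.mean(np.all(diseased > thresholds, axis=1))
    fp = np.mean(np.all(non_diseased > thresholds, axis=1))
    return fp, tp

rng = np.random.default_rng(7)
# rho = 1: the three markers are copies of a single normal variable.
m0 = np.repeat(rng.normal(0.0, 1.0, size=(200, 1)), 3, axis=1)
m1 = np.repeat(rng.normal(1.0, 1.0, size=(200, 1)), 3, axis=1)

cuts = (0.2, 0.5, -0.1)
multi = and_classifier_rates(m1, m0, cuts)
# With identical markers, the rule reduces to M_1 > max(m_1, m_2, m_3).
uni = and_classifier_rates(m1[:, :1], m0[:, :1], [max(cuts)])
print(multi == uni)
```

The two (FP, TP) pairs agree for any choice of thresholds, which is why the multivariate curves collapse onto the univariate ROC curve in this scenario.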

A Data Example

We apply the proposed methods to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) for multivariate ROC analysis. The ADNI study is a research project that focuses on “changes of cognition, function, brain structure and function, and biomarkers in elderly controls, subjects with mild cognitive impairment, and subjects with Alzheimer’s disease” (quoted from http://adni.loni.ucla.edu/). The study is supported by the NIH, private pharmaceutical companies, and nonprofit organizations. The enrollment target was 800 participants (200 normal controls, 400 patients with amnestic mild cognitive impairment (MCI), and 200 patients with mild Alzheimer’s disease (AD)) at 58 sites in the United States and Canada. Participants were enrolled on a rolling basis and evaluated every six months. One of the major goals of the ADNI study is to identify biomarkers that are associated with progression from MCI to AD, and to determine which biomarker measures (alone or in combination) are the best predictors of disease progression. Sensitivity and specificity for both cross-sectional and longitudinal diagnostic classification were considered important statistical measures for assessing biomarkers in disease progression [18].

Investigations of the risk of progressing from MCI to AD dementia have largely focused on measures from the following categories: demographics, cognition, apolipoprotein E (APOE), magnetic resonance imaging (MRI), and cerebrospinal fluid (CSF) data. Demographic variables include age, education and gender. Cognitive measures represent five domains: memory, language, executive function, spatial ability, and attention. Neuroimaging measures include brain volume, ventricular volume, and bilateral hippocampal volumes. The CSF variables include T-tau, Aβ42, p-tau181, the ratio of the first two variables, and the ratio of the last two variables.

For this section, we selected three markers for illustration: hippocampus volume, memory score and executive function. To account for censoring, we used a reduced sample to create a time-independent binary disease outcome (D = 0, 1), choosing the 24th month as the cut-off time to define disease state. Of the 274 subjects who had complete data for the three markers, 49 were lost to follow-up before 24 months, so we focused on the 225 subjects with follow-up time longer than 24 months: there were 89 failures (D = 1) and 136 survivors (D = 0) at the 24th month. Let \(M_{1}\) be hippocampus volume, \(M_{2}\) be executive function score, and \(M_{3}\) be memory score. Figure 4 compares the diagnostic performance of the three markers \((M_{1},M_{2},M_{3})\), the bivariate markers \((M_{1},M_{2})\), and the univariate marker \(M_{1}\). If the classifier is I(\(M_{1} > m_{1}\), \(M_{2} > m_{2}\), \(M_{3} > m_{3}\)), there is a gain for small values of FP and a loss for large values of FP. The partial AUC plot indicates that the multivariate markers produce a higher partial concordance summary than the univariate marker when q < 0.6, and a higher partial concordance summary than the bivariate markers when q < 0.3. In diagnostic testing, it is crucial to keep the false positive rate low to avoid unnecessary monetary costs. Thus, if the prognostic capacity is evaluated in terms of partial AUC, the multivariate marker consisting of hippocampus volume, executive function and memory score together would be considered to perform much better than hippocampus volume alone.
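The construction of the binary outcome at the 24-month cut-off can be sketched as below. This is only one plausible reading of the procedure described: it assumes a subject counts as a failure if progression is observed by the cut-off, as a survivor if followed beyond the cut-off, and is excluded if censored earlier. The function name `landmark_outcome`, its arguments, and the toy data are hypothetical and are not ADNI records.

```python
import numpy as np

def landmark_outcome(time, event, cutoff=24.0):
    """Binary outcome at `cutoff` months; NaN marks excluded subjects.

    time  : follow-up time in months (event or censoring time)
    event : 1 if progression was observed at `time`, 0 if censored
    Assumption: subjects censored at or before the cutoff are excluded.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    d = np.full(time.shape, np.nan)
    d[(event == 1) & (time <= cutoff)] = 1.0   # failure by the cutoff (D = 1)
    d[time > cutoff] = 0.0                     # event-free past the cutoff (D = 0)
    return d

# Hypothetical toy data for five subjects.
d = landmark_outcome(time=[6, 12, 30, 36, 18], event=[1, 0, 0, 1, 0])
keep = ~np.isnan(d)   # subjects retained in the reduced sample
```

A subject whose progression occurs after the cut-off is still a survivor at month 24 under this rule, which is what makes the outcome time-independent.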

Fig. 4 (\(M_{1},M_{2},M_{3}\)) = (hippocampus, executive function, memory), with classifier I(\(M_{1} > m_{1}\), \(M_{2} > m_{2}\), \(M_{3} > m_{3}\))

Without restriction on the false positive rate, the AUC of the multivariate WROC curve is 0.358 (SE: 0.022) and the AUC of the multivariate \({\overline{WROC}}^{{\ast}}\) curve is 0.964 (SE: 0.024); the AUC of the bivariate WROC curve is 0.437 (SE: 0.030) and the AUC of the bivariate \({\overline{WROC}}^{{\ast}}\) curve is 0.906 (SE: 0.030); the AUC of the univariate ROC curve is 0.658 (SE: 0.040). Standard errors of the AUC estimates were calculated by the bootstrap method.
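For the univariate case, a bootstrap standard error of the AUC can be sketched as follows. The resampling scheme is not specified in the text, so the sketch assumes resampling with replacement within each disease group; the function names, the number of bootstrap replicates, and the simulated stand-in data are likewise assumptions. It does not cover the WROC-based AUCs, whose estimators are defined elsewhere in the paper.

```python
import numpy as np

def auc_mann_whitney(diseased, non_diseased):
    """Univariate AUC as the U-statistic P(M_1 > M_0) + 0.5 * P(M_1 = M_0)."""
    x = np.asarray(diseased, dtype=float)[:, None]
    y = np.asarray(non_diseased, dtype=float)[None, :]
    return float(np.mean((x > y) + 0.5 * (x == y)))

def bootstrap_se(diseased, non_diseased, n_boot=500, seed=0):
    """Bootstrap SE of the AUC, resampling each group separately (assumed)."""
    rng = np.random.default_rng(seed)
    diseased = np.asarray(diseased, dtype=float)
    non_diseased = np.asarray(non_diseased, dtype=float)
    stats = [
        auc_mann_whitney(rng.choice(diseased, diseased.size),
                         rng.choice(non_diseased, non_diseased.size))
        for _ in range(n_boot)
    ]
    return float(np.std(stats, ddof=1))

rng = np.random.default_rng(1)
m1 = rng.normal(1.0, 1.0, 89)    # simulated stand-in for the D = 1 group
m0 = rng.normal(0.0, 1.0, 136)   # simulated stand-in for the D = 0 group
auc, se = auc_mann_whitney(m1, m0), bootstrap_se(m1, m0)
```

The group sizes 89 and 136 mirror the data example only in size; the values themselves are simulated.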

Discussion

Existing ROC methods to incorporate multiple markers typically consider a composite score based on combined markers by modeling the relationship between the marker vector \(\mathbf{M}\) and the binary outcome D [14], where \(P(D = 1\vert \mathbf{M}) = p(\mathbf{M})\) is used as the optimal score to identify the combination of multiple markers for classifying the disease outcome. By the Neyman-Pearson lemma, the optimality of \(p(\mathbf{M})\) is a general property that holds without any dimensionality constraint on \(\mathbf{M}\). When the linear logistic regression model assumption holds, the optimal classification rule, \(p(\mathbf{M})\), becomes equivalent to the regression function \({\boldsymbol \beta M}\) under the logit link. Thus, the optimality of this one-dimensional linear classification score relies heavily on the logistic regression model assumption. In this paper, we extend tools from a univariate marker to multivariate markers for evaluating the predictive accuracy of markers in a nonparametric setting based on tree-based classification rules.
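The equivalence under the logistic model can be written out in one line. The display below is a sketch in notation assumed for this illustration (an explicit intercept \(\beta_{0}\), a transpose on \(\boldsymbol{\beta}\), and a cut point c); it uses only the fact that the logit function is strictly increasing, so thresholding \(p(\mathbf{M})\) and thresholding the linear score define the same family of classifiers and hence the same ROC curve.

```latex
\[
\operatorname{logit} p(\mathbf{M})
  = \log\frac{p(\mathbf{M})}{1 - p(\mathbf{M})}
  = \beta_{0} + \boldsymbol{\beta}^{\top}\mathbf{M}
\quad\Longrightarrow\quad
\bigl\{\, p(\mathbf{M}) > c \,\bigr\}
  = \bigl\{\, \boldsymbol{\beta}^{\top}\mathbf{M} > \operatorname{logit}(c) - \beta_{0} \,\bigr\},
\qquad 0 < c < 1 .
\]
```

If the logit of \(p(\mathbf{M})\) is not linear in \(\mathbf{M}\), the second set is no longer a level set of \(p(\mathbf{M})\), and the linear score loses the optimality guaranteed for \(p(\mathbf{M})\).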

The proposed ROC and WROC functions, together with the AUC, are intended to measure the average performance of the and-or classifier among all possible combinations of true positive rate for a given false positive rate, for evaluating the predictability of markers and comparing curves; they may not reflect the optimized use of markers for clinical decisions. Although the proposed approach is not designed to achieve optimality as a decision rule, such as the one proposed by Jin and Lu [13], our methods and inferential results are much more structural, accessible and workable. The proposed ROC and WROC functions enjoy the advantage of preserving the distributional structures of the markers, and the associated summary measures such as AUC or partial AUC are appropriate for evaluating the performance of the and-or classifier among all possible combinations of marker values; this feature is similar to the univariate marker case. These summary measures are useful in applications, since many biomarker studies (such as the ADNI study and two other Alzheimer’s Disease studies in which the authors are currently involved) place their research emphasis largely on understanding the predictability of biomarkers in the target population, and less on optimizing clinical decision rules.

The evaluation takes into account the distributions of the quantile variables \(Q_{0}\) and \(Q_{1}\) in the non-diseased and diseased populations, which leads to the equivalence between AUC and CON, a property similar to the univariate marker case. We also provide estimation procedures using nonparametric smoothing estimators for the ROC and WROC functions, and a U-statistic for the AUC. For applications of the proposed analysis, since the ‘curse of dimensionality’ is not a concern for nonparametric estimation of ROC, WROC and other related quantities, the usual random split into a training sample (for model fitting) and a test sample (for creating the ROC curve and calculating the AUC) would be as proper as it is in the univariate marker case, and is therefore advisable.

For future research, similar to the considerations for univariate ROC analysis [16, 20], it would be interesting to develop methodology that adjusts for covariates such as age, sex or other demographic factors for bivariate or multivariate markers.

Also, given that the disease outcomes typically change with time, it would be interesting to extend the ROC analysis for high-dimensional markers to accommodate time-to-disease information using the ‘survival-tree methodology’ [22], along the lines of extending ROC techniques from binary disease outcome model to right-censored survival data model in univariate marker settings [5, 10, 11, 19].