1 Background

To date, the statistical models commonly used to examine medical, health, psychological, and socio-behavioral outcomes depend on linear regression and a continuous-change approach (Chen & Chen, 2015, 2019; Chen, Stanton, Chen, & Li, 2013). In the real world, however, these outcomes are rarely linear and continuous, given the nature of medical, health, and behavioral outcomes and the multiple, complex influences of environmental, behavioral, psychological, and biological factors (Chen, Lin, Chen, Tang, & Kitzman, 2014; Chen et al., 2010; Witkiewitz, van der Maas, Hufford, & Marlatt, 2007; Xu & Chen, 2016). What might appear to be small and inconsequential changes in one of these factors can lead to abrupt and sudden changes in an outcome (Thom, 1975). Under these conditions, a linear and continuous approach seriously limits the predictability of the influence of hypothesized factors on a particular outcome variable (Chen & Chen, 2015, 2019; Chen, Wang, & Chen, 2019); a new paradigm that incorporates nonlinear and discrete behaviors is therefore needed to fill this knowledge gap.

1.1 Cusp Catastrophe for Nonlinear Discrete Systems

To account for nonlinearity and discrete characteristics in low-dimensional scenarios, researchers often turn to natural extensions of the linear regression model, including kernel regression and regression/smoothing splines (Berk, 2008; Faraway, 2009; Guastello & Gregson, 2011). In addition to these nonparametric methods, other techniques for use with high-dimensional data include additive models, multivariate adaptive regression splines, random forests, neural networks, and support vector machines. These techniques have been discussed extensively elsewhere (Chen & Chen, 2017; Faraway, 2009). Despite their strengths, these nonparametric methods have no mechanism to identify and incorporate medical, health, and behavioral outcomes with sudden, discrete changes and multiple modes. The cusp catastrophe model is one that is capable of quantifying such a mechanism.

As a complement to many traditional analytical approaches, the cusp catastrophe model offers distinct advantages given its capacity not only to handle complex linear and nonlinear relationships simultaneously in a high-order probability density function but also to incorporate sudden jumps in outcome measures, as outlined by Zeeman (1977) and Gilmore (1981). Catastrophe theory was proposed in the 1970s (Thom, 1975) to understand complicated sets of behaviors that include gradual, continuous changes as well as sudden, discrete, or catastrophic changes. The cusp catastrophe model has been used extensively in a wide range of research fields, including the modeling of tobacco use (Xu & Chen, 2016), adolescent alcohol use (Clair, 1998), changes in adolescent substance use (Mazanov & Byrne, 2006), binge drinking among high school and college students (Chen et al., 2019; Guastello, Aruka, Doyle, & Smerz, 2008) and adults (White, Tapert, & Shukla, 2017), problem drinking among persons living with HIV (Witkiewitz et al., 2007), sexual initiation and condom use among young adolescents (Chen et al., 2010, 2013), nursing turnover (Wagner, 2010), HIV prevention (Xu, Chen, Yu, Joseph, & Stanton, 2017), therapy and program evaluation (Guastello, 1982), health outcomes (Chen et al., 2014), and accident processes (Guastello, 1989; Guastello & Lynn, 2014).

1.2 Established Methods for Cusp Catastrophe Modeling

Historically, three main approaches have been established to implement cusp catastrophe modeling for data analysis.

The first method tests whether the outcome variable follows a cusp catastrophe by inserting regression coefficients into the deterministic cusp model; it was operationalized by Guastello using a polynomial regression approach (Guastello, 1982; Guastello et al., 2008). This method is straightforward to understand, and the analysis can be completed using any software package with regression analysis functionality (Guastello & Gregson, 2011).

The second method uses a stochastic differential equation from Cobb and his colleagues (Cobb, 1981; Cobb & Zacks, 1985; Grasman, van der Maas, & Wagenmakers, 2009) with maximum likelihood estimation implemented in the R package “cusp”. Since the method was established by Cobb and implemented through Grasman’s work, this approach has been named the Cobb-Grasman cusp modeling approach (Chen et al., 2019).

The third method takes a different route, solving the deterministic cusp catastrophe model with a statistical approach. Unlike the Cobb-Grasman approach described above, this method casts the deterministic cusp catastrophe directly into classical multiple regression, with the outcome modeled through a latent variable and the two control variables each modeled as a linear combination of observed predictors. This modeling approach also provides a method for estimating the cusp region (Chen & Chen, 2017). The Chen-Chen method has been used to model harm perception and social influence on binge drinking among high school students in the United States (Chen et al., 2019). In Chap. 15 of this book, this method was used to model prostate-specific antigen (PSA), a biomarker of prostate cancer in men.

1.3 Need for Methods to Model Binary Data

All the methods described above for cusp catastrophe modeling are for continuous outcome variables; to the best of our knowledge, no method is available for other types of outcomes. To fill this methodological gap, in this chapter we develop a method to analyze binary outcomes with the cusp catastrophe model. In our previous research, we developed a regression-based approach to solve the cusp catastrophe model for continuous outcomes (Chen & Chen, 2017) and used it to analyze binge drinking among youth (Chen et al., 2019). The new method uses the same regression-based approach, with the continuous outcome replaced by a binary outcome, to conduct cusp catastrophe modeling of binary data in the framework of statistical logistic regression.

2 An Overview of the Cusp Catastrophe Model

Catastrophe theory was proposed in the 1970s by Thom (1975) and popularized over the next two decades by several leading researchers (Cobb, 1981; Cobb & Ragade, 1978; Cobb & Watson, 1980; Cobb & Zacks, 1985; Gilmore, 1981; Thom & Fowler, 1975; Zeeman, 1977). Thom (1975) originally proposed the catastrophe theory to understand complicated phenomena that included both gradual, continuous change and sudden, discontinuous or catastrophic change.

2.1 Deterministic Cusp Model

To apply this model in research, the deterministic cusp catastrophe model can be specified with three components: two control factors (i.e., α and β) and one outcome variable (i.e., y). This model is defined by a dynamic system:

$$ \frac{dy}{dt}=-\frac{dV\left(y;\alpha, \beta \right)}{dy} $$
(16.1)

where V, commonly called the potential function, is defined as

$$ V\left(y;\alpha, \beta \right)=-\alpha y-\frac{1}{2}\beta {y}^2+\frac{1}{4}{y}^4 $$
(16.2)

In this potential function V, α is the asymmetry or normal control factor, and β is the bifurcation or splitting control factor. Both α and β are linked to determine the outcome variable y on a three-dimensional response surface. When the right side of Eq. (16.1) moves toward zero, change in the outcome y also tends toward zero with change in time; this status is called equilibrium. Setting dy/dt = 0 in Eq. (16.1) shows that the equilibrium surface is defined by α + βy − y³ = 0. In general, the behavior of the outcome y (i.e., how y changes with time t) is complicated, but all subjects tend to move toward the equilibrium surface.

2.2 Characteristics of the Cusp Catastrophe Model

Figure 16.1 graphically depicts the equilibrium surface that reflects the response plane of the outcome measure (y) at various combinations of the asymmetry control factor (the measure of α in Fig. 16.1) and the bifurcation control factor (the measure of β in Fig. 16.1).

Fig. 16.1 Cusp catastrophe model for outcome (y) in the equilibrium plane with an asymmetry control variable (the measure of α) and a bifurcation control variable (the measure of β)

As shown in Fig. 16.1, dynamic changes in y have two stable regions (attractors): the lower area in the front left (lower stable region) and the upper area in the front right (upper stable region). Beyond these stable regions, y becomes sensitive to changes in α and β. The unstable region can be projected onto the control plane (α, β) as the cusp region. The cusp region is bounded by line OQ (the ascending threshold) and line OR (the descending threshold) of the equilibrium surface. In this region, y becomes highly unstable with regard to changes in α and β, jumping between the two stable regions when (α, β) approaches the two threshold lines OQ and OR. In Fig. 16.1, paths A, B, and C depict three typical but distinct pathways of change in the health outcome measure (y). Path A shows that in situations below the cusp point O, a smooth relation exists between y and α. Path B shows that in situations beyond the cusp point O, if α increases to reach and pass the ascending threshold line OQ, y will suddenly jump from the lower stable region to the upper stable region of the equilibrium plane. Path C shows that a sudden drop occurs in y as α declines to reach and pass the descending threshold line OR.

The cusp catastrophe model can be used as both a qualitative and a quantitative analytical method in research to investigate the relationship between predictors and outcome variables (e.g., behaviors or health outcomes). The qualitative approach focuses on identifying the five catastrophe elements (i.e., catastrophe flags) outlined by Gilmore (1981), whereas the quantitative approach uses numerical data to statistically fit the model.

3 Implementation of a Cusp Catastrophe Model

As described in the Introduction, since the introduction of the cusp catastrophe model, three quantitative approaches have been developed and used to implement the model for data analysis: Guastello’s polynomial regression, the Cobb-Grasman stochastic differential equation implemented in the R package “cusp”, and the Chen-Chen approach that casts the cusp catastrophe model into nonlinear regression.

3.1 Guastello’s Polynomial Approach

Specifically, as the first implementation, Guastello’s approach is derived by reformulating the cusp dynamic system in Eq. (16.1) from differential equation form into a difference equation system, as outlined in Guastello (1982) and Guastello et al. (2008). Since its first publication, this approach has been widely used in analyzing research data because it can be implemented in common statistical software packages, including SAS, SPSS, STATA, and R. This approach made it possible for the first time for many researchers to model social and behavioral issues with cusp catastrophe modeling. Guastello’s approach is suitable for longitudinal data with the outcome variable measured at two time points that are not very far from each other.
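To illustrate the idea, below is a minimal sketch in R of one common specification of the difference-equation regression. The data frame df, the variable names (z1 and z2 for the standardized outcome at two close time points, a and b for candidate asymmetry and bifurcation measures), and the particular polynomial form are illustrative assumptions rather than a definitive implementation of Guastello’s method.

# Guastello-style polynomial (difference-equation) regression sketch.
# Assumes df holds the standardized outcome at two close time points (z1, z2)
# plus candidate asymmetry (a) and bifurcation (b) measures; names are illustrative.
df$dz <- df$z2 - df$z1                       # change score between the two time points

# Cusp difference model: dz ~ z1^3 + z1^2 + b*z1 + a
cusp_fit   <- lm(dz ~ I(z1^3) + I(z1^2) + I(b * z1) + a, data = df)
# A competing linear difference model for comparison
linear_fit <- lm(dz ~ z1 + a + b, data = df)

summary(cusp_fit)$r.squared                  # cusp is supported if this clearly
summary(linear_fit)$r.squared                # exceeds the linear R-squared

Under this approach, a markedly larger R² for the cusp polynomial than for the linear alternative is typically taken as evidence of cusp structure.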

3.2 Cobb-Grasman’s Approach

The second approach to implementing the cusp catastrophe model is Cobb-Grasman’s stochastic differential equation method (hereafter “SDECusp”). In the SDECusp approach, the deterministic cusp model in Eq. (16.1) is first extended with a probabilistic/stochastic Wiener process. With this extension, the modeling process incorporates measurement error in the outcome variable. Using this approach, the response surface of the cusp catastrophe is modeled as a probability density function in which the bimodal nature of the density corresponds to the two states of the outcome variable. Mathematically, Cobb and his colleagues (Cobb & Ragade, 1978; Cobb & Watson, 1980; Cobb & Zacks, 1985; Hartelman, van der Maas, & Molenaar, 1998; Honerkamp, 1994) cast the deterministic cusp model in Eq. (16.1) into a stochastic differential equation (SDE) as follows:

$$ dz=-\frac{\partial V\left(z;\alpha, \beta \right)}{\partial z}\, dt+ dW(t) $$
(16.3)

where dW(t) is a white-noise Wiener process with variance σ².

This extension is in fact a special case of general stochastic dynamical systems modeling with a constant diffusion function defined by dW(t). Since the SDE in Eq. (16.3) cannot be solved analytically, computational implementation of this stochastic model is limited. However, at the equilibrium state, as time (t) approaches infinity, it becomes easier to estimate the probability density function of the corresponding limiting stationary stochastic process. In other words, the probability density function of the outcome measure (y) can be expressed as follows:

$$ f(y)=\frac{\Psi}{\sigma^2}\mathit{\exp}\left[\frac{\alpha \left(y-\lambda \right)+\frac{1}{2}\beta {\left(y-\lambda \right)}^2-\frac{1}{4}{\left(y-\lambda \right)}^4\ }{\sigma^2}\right] $$
(16.4)

where the parameter Ψ is a normalizing constant and λ determines the origin of y.

With this probability density function, regression predictors can be incorporated as linear combinations that replace the canonical asymmetry factor (i.e., α) and bifurcation factor (i.e., β). Note that, as the distribution of a limiting stationary stochastic process, the probability density function in Eq. (16.4) is independent of time t; it can thus be used to model cross-sectional relationships, with the advantage of detecting and quantifying a potential cusp comprising both sudden and continuous states. Moreover, the probability density function allows the well-known statistical theory of maximum likelihood to be used for model parameter estimation and statistical inference. The R package “cusp” has been developed to implement this SDECusp approach (Grasman et al., 2009). The SDECusp model with the R package “cusp” is extremely well suited for use with cross-sectional data. We have used this SDECusp model extensively for research and publications (Chen, Lin, et al., 2014; Chen et al., 2013; Diks & Wang, 2016; Katerndahl, Burge, Ferrer, Wood, & Becho, 2015; Xu & Chen, 2016; Xu et al., 2017; Yu et al., 2018).
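For illustration, a minimal sketch of fitting an SDECusp model with the “cusp” package is shown below; the data frame df and the predictors x1 and x2 are illustrative assumptions, not a specific data set from this chapter.

# Fit the SDECusp model with the R package "cusp" (Grasman et al., 2009).
# install.packages("cusp")
library(cusp)

fit <- cusp(y ~ y,               # the measured state (outcome) variable
            alpha ~ x1 + x2,     # asymmetry factor as a linear combination
            beta  ~ x1 + x2,     # bifurcation factor as a linear combination
            data = df)

summary(fit)                     # ML estimates, fit statistics, pseudo-R-squared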

3.3 Chen-Chen’s Cusp Regression Approach

As the third approach, Chen and Chen (2017) developed a cusp catastrophe nonlinear regression model (“RegCusp”) for continuous data, a conceptual model guided by the statistical theory of nonlinear regression (Seber & Lee, 2003). Following Eq. (16.1), the RegCusp model can be formulated as follows:

$$ {y}_i={Y}_i+{\varepsilon}_i, $$
(16.5)

where y_i (i = 1, …, n) are the observed outcome values and ε_i are the residuals from the n observations, assumed to be normally distributed as ε_i ~ N(0, σ²).

Mathematically, the latent variable Y_i in Eq. (16.5) is one of the real roots of the deterministic cusp catastrophe equation:

$$ {\alpha}_i+{\beta}_i{Y}_i-{Y}_i^3=0, $$
(16.6)

where α_i and β_i are the two control variables, which are discussed later in the section on cusp catastrophe conventions. For observed data with p independent variables (x_1, …, x_p) and outcome y_i, α_i and β_i are the control variables for the ith subject.

In modeling analysis, the two control variables α_i and β_i are modeled in a way similar to the Cobb-Grasman approach (Cobb & Zacks, 1985; Grasman et al., 2009), with x_{0i} ≡ 1 for the intercept:

$$ {\alpha}_i={a}_0+{a}_1{x}_{1i}+\dots +{a}_p{x}_{pi}=\sum \limits_{j=0}^p{a}_j{x}_{ji} $$
(16.7)
$$ {\beta}_i={b}_0+{b}_1{x}_{1i}+\dots +{b}_p{x}_{pi}=\sum \limits_{j=0}^p{b}_j{x}_{ji} $$
(16.8)

With the formulations of Eqs. (16.5)–(16.8), a nonlinear regression method can be used to estimate the model parameters a = (a_0, a_1, …, a_p) and b = (b_0, b_1, …, b_p) in Eqs. (16.7) and (16.8). The parameters can be estimated by maximum likelihood, with the likelihood function formulated as follows:

$$ L\left(\boldsymbol{a},\boldsymbol{b},{\sigma}^2\mid data\right)={\left(\frac{1}{\sqrt{2\pi }\,\sigma}\right)}^n\exp \left(-\frac{\sum_{i=1}^n{\left({y}_i-{Y}_i\right)}^2}{2{\sigma}^2}\right) $$
(16.9)

With the likelihood function defined in Eq. (16.9), the theory of likelihood estimation can be readily applied to estimate the RegCusp parameters as well as to conduct the associated statistical inferences on parameter significance and model selection.

4 Cusp Catastrophe Modeling of Binary Data

To establish the logistic cusp catastrophe regression model, we start with the binary data structure, then introduce the logistic cusp catastrophe regression, the conventions and algorithm for parameter estimation, and the method for cusp region estimation.

4.1 The Binary Data Structure

Suppose data from n participants are available as data = (y_i, x_{1i}, …, x_{pi}) (i = 1, …, n), where y_i is the observed binary (0/1) outcome from the ith participant and x_{1i}, …, x_{pi} are the corresponding p independent variables. Then y_i is binary distributed as:

$$ {y}_i\sim Binary\left({p}_i\right) $$
(16.10)

where p_i = Pr(y_i = 1) is the probability of observing category 1.

4.2 The Binary Cusp Catastrophe Model

We make use of logistic-type regression to link the probability p_i to the latent variable Y_i, such that

$$ {p}_i=\frac{\mathit{\exp}\left({Y}_i\right)}{1+\mathit{\exp}\left({Y}_i\right)} $$
(16.11)

where the latent variable Y_i (the logit of p_i) is one of the real roots of the deterministic cusp catastrophe equation:

$$ {\alpha}_i+{\beta}_i{Y}_i-{Y}_i^3=0, $$
(16.12)

where α_i and β_i are the two control variables, which are discussed later in the section on cusp catastrophe conventions.

The two control variables α_i and β_i are modeled in a way similar to SDECusp, as linear combinations of multiple independent variables in Eqs. (16.7) and (16.8).

4.3 Maximum Likelihood Estimation

With the formulations of Eqs. (16.10)–(16.12), a maximum likelihood procedure can be developed to estimate the model parameters a = (a_0, a_1, …, a_p) and b = (b_0, b_1, …, b_p) in Eqs. (16.7) and (16.8), as well as to conduct the associated statistical inferences on parameter significance and model selection. Based on the theory of maximum likelihood estimation, the parameters are estimated by solving the system of gradient equations, and their associated variances can be obtained from the Fisher information matrix or the Hessian matrix.

Specifically, we construct the likelihood function from Eq. (16.10) as follows:

$$ L\left(\boldsymbol{a},\boldsymbol{b}\mid data\right)=\prod \limits_{i=1}^n{p}_i^{y_i}{\left(1-{p}_i\right)}^{1-{y}_i} $$
(16.13)

Maximizing the likelihood function defined in Eq. (16.13) is equivalent to maximizing the log-likelihood function:

$$ \begin{aligned}\log L\left(\boldsymbol{a},\boldsymbol{b}\mid data\right)&=\sum \limits_{i=1}^n\left[{y}_i\log \left({p}_i\right)+\left(1-{y}_i\right)\log \left(1-{p}_i\right)\right]\\ &=\sum \limits_{i=1}^n\left[{y}_i{Y}_i+\log \left(1-{p}_i\right)\right]\end{aligned} $$
(16.14)

4.4 Cusp Catastrophe Conventions

The cusp catastrophe model is not a traditional statistical model in which each combination of independent variables is associated with one and only one outcome value. In fact, the RegCusp model formulated from Eq. (16.6) and the LogisticCusp model formulated from Eq. (16.12) can have one, two, or three roots for each (α_i, β_i) combination, depending on its location on the control plane defined by Eqs. (16.7) and (16.8). These three roots can be solved analytically as follows:

$$\begin{aligned} {Y}_1&=\frac{1}{6}\frac{\nabla^{2/3}+12\beta }{\nabla^{1/3}},\qquad {Y}_2=\frac{1}{12}\frac{\sqrt{3}I{\nabla}^{2/3}-12\sqrt{3} I\beta -{\nabla}^{2/3}-12\beta }{\nabla^{1/3}},\kern1em \mathrm{and}\\ {Y}_3&=-\frac{1}{12}\frac{\sqrt{3}I{\nabla}^{2/3}-12\sqrt{3} I\beta +{\nabla}^{2/3}+12\beta }{\nabla^{1/3}}\end{aligned} $$
(16.15)

where \( I=\sqrt{-1} \) is the imaginary unit, \( \nabla =108\alpha +12\sqrt{3\Delta} \), and Δ = 27α² − 4β³ is the well-known Cardan discriminant. Which root to choose for the latent variable Y in Eq. (16.12) when fitting the likelihood function in Eq. (16.13) is determined using the Cardan discriminant.

From Eq. (16.15), it can be derived that when Δ > 0, Eq. (16.12) has one real root, and when Δ ≤ 0, Eq. (16.12) has three real roots. Among the latter, there are three cases: (a) if α = β = Δ = 0, the three roots are the same; this is referred to as the cusp point (labeled O in Fig. 16.1); (b) if Δ = 0 but α ≠ 0 or β ≠ 0, two roots are the same; these combinations form the two lines OQ and OR bounding the cusp region (Fig. 16.1); and (c) if Δ < 0 and α ≠ 0 or β ≠ 0, the three roots are distinct, which characterizes the cusp region between OQ and OR, also indicated in Fig. 16.1. Therefore, the LogisticCusp model is no longer within the traditional domain of mathematical and statistical modeling, and further investigation is needed to identify its statistical properties.

To select the correct root for the cusp catastrophe model described by Eq. (16.12), we used two modeling conventions: the delay convention and the Maxwell convention. The delay convention selects the root of \( \frac{dV\left(y;\alpha, \beta \right)}{dy}=0 \) in Eq. (16.1) that is closest to the observed y. The Maxwell convention selects the root of \( \frac{dV\left(y;\alpha, \beta \right)}{dy}=0 \) in Eq. (16.1) corresponding to the minimum of the associated potential function \( V\left(y;\alpha, \beta \right)=-\alpha y-\frac{1}{2}\beta {y}^2+\frac{1}{4}{y}^4 \) in Eq. (16.2).
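As a concrete illustration of root selection, the sketch below solves the cusp equilibrium equation numerically in R and applies the Maxwell convention; the helper name cusp_root is ours, and the tolerance on the imaginary parts is an illustrative choice.

# Solve alpha + beta*Y - Y^3 = 0 (Eq. (16.12)) and select a root by the
# Maxwell convention: the real root minimizing V in Eq. (16.2).
cusp_root <- function(alpha, beta, tol = 1e-8) {
  roots <- polyroot(c(alpha, beta, 0, -1))        # coefficients in increasing powers
  real_roots <- Re(roots[abs(Im(roots)) < tol])   # keep the (near-)real roots
  V <- -alpha * real_roots - 0.5 * beta * real_roots^2 + 0.25 * real_roots^4
  real_roots[which.min(V)]                        # root at the minimum potential
}

cusp_root(2, 2)     # Delta = 27*4 - 4*8 > 0: a single real root
cusp_root(0.1, 3)   # Delta = 0.27 - 108 < 0: three real roots, Maxwell picks one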

4.5 Cusp Region Estimation

Based on the discussion above, the boundary of the cusp region depicted in Fig. 16.1 can be constructed from Δ = 0. Since Δ = 27α² − 4β³, solving Δ = 0 gives \( \beta =\sqrt[3]{27{\alpha}^2/4} \). Therefore, for the asymmetry parameter α ranging from a lower limit (say, α_LowerLimit) to an upper limit (say, α_UpperLimit), β can be calculated as \( \beta =\sqrt[3]{27{\alpha}^2/4} \), which corresponds to the two lines OQ and OR forming the boundary of the cusp region (Fig. 16.1).

When α = β = 0, then Δ = 0; this point is the cusp point (labeled O in Fig. 16.1). When Δ < 0, the values of (α, β) are within the cusp region, and when Δ > 0, the values of (α, β) are outside the cusp region.

The cusp region in the (α, β) coordinate system can be easily transformed into the original data coordinate system of interest based on the estimated Eqs. (16.7) and (16.8). For example, if the interest is in (x_1, x_2), we can plug the estimated Eqs. (16.7) and (16.8), with x_1 and x_2 varying and the other xs fixed, into \( \beta =\sqrt[3]{27{\alpha}^2/4} \) and solve for x_2 as a function of x_1. This is illustrated in the real data analysis in Sect. 6.
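A minimal sketch of this transformation in R follows. The coefficient values are illustrative placeholders, not estimates from this chapter, and the search interval for the numeric solve is an assumption; a finer search would be needed to trace both boundary branches.

# Map the cusp boundary Delta = 0, i.e. beta = (27*alpha^2/4)^(1/3),
# back to the (x1, x2) plane using estimated Eqs. (16.7) and (16.8).
a <- c(-1.0, 0.6, 0.4)    # placeholder estimates: alpha = a[1] + a[2]*x1 + a[3]*x2
b <- c(-2.0, 0.5, 1.0)    # placeholder estimates: beta  = b[1] + b[2]*x1 + b[3]*x2

boundary_x2 <- function(x1, interval = c(-10, 10)) {
  f <- function(x2) {
    alpha <- a[1] + a[2] * x1 + a[3] * x2
    beta  <- b[1] + b[2] * x1 + b[3] * x2
    beta - (27 * alpha^2 / 4)^(1/3)     # zero exactly on the boundary
  }
  uniroot(f, interval)$root             # numeric solve for x2 given x1
}

x1_grid <- seq(0, 3, by = 0.1)
x2_boundary <- vapply(x1_grid, boundary_x2, numeric(1))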

4.6 Numeric Search Algorithms for Parameter Estimates

There are several methods that can be used to maximize the log-likelihood function in Eq. (16.14). We make use of the R function “optim”. Its default method is an implementation of that of Nelder and Mead (1965), which uses only function values; it is robust but relatively slow, and it works reasonably well for non-differentiable functions. Another commonly used method is a quasi-Newton method (also known as a variable metric algorithm), specifically the one published simultaneously in 1970 by Broyden, Fletcher, Goldfarb, and Shanno and hence known as BFGS. BFGS uses both function values and gradients for the optimization. Specifically, with the log-likelihood function in Eq. (16.14), the parameters a = (a_0, a_1, …, a_p) and b = (b_0, b_1, …, b_p) are estimated by solving the system of 2p + 2 gradient equations:

$$ \begin{aligned} {\left(\begin{array}{c}\frac{\partial \log L}{\partial \boldsymbol{a}}\\ {}\frac{\partial \log L}{\partial \boldsymbol{b}}\end{array}\right)}_{\left(2p+2\right)\times 1} &={\left(\frac{\partial \log L}{\partial {a}_0},\frac{\partial \log L}{\partial {a}_1},\dots, \frac{\partial \log L}{\partial {a}_j},\dots, \frac{\partial \log L}{\partial {a}_p},\frac{\partial \log L}{\partial {b}_0},\frac{\partial \log L}{\partial {b}_1},\dots, \frac{\partial \log L}{\partial {b}_j},\dots, \frac{\partial \log L}{\partial {b}_p}\right)}^{\prime}\\ &=0\end{aligned} $$
(16.16)

where (·)′ in Eq. (16.16) denotes the vector transpose. The partial derivatives in Eq. (16.16) can be derived as \( \frac{\partial \log L}{\partial {a}_j}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial {Y}_i}{\partial {a}_j}-\frac{1}{1-{p}_i}\frac{\partial {p}_i}{\partial {a}_j}\right] \) and \( \frac{\partial \log L}{\partial {b}_j}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial {Y}_i}{\partial {b}_j}-\frac{1}{1-{p}_i}\frac{\partial {p}_i}{\partial {b}_j}\right] \) for all j = 0, 1, …, p. The partial derivatives \( \frac{\partial {Y}_i}{\partial {a}_j} \) and \( \frac{\partial {Y}_i}{\partial {b}_j} \) can be derived from Eq. (16.12) as \( \frac{\partial {Y}_i}{\partial {a}_j}=-\frac{x_{ji}}{\beta_i-3{Y}_i^2} \) and \( \frac{\partial {Y}_i}{\partial {b}_j}=-\frac{x_{ji}{Y}_i}{\beta_i-3{Y}_i^2} \), and the partial derivatives \( \frac{\partial {p}_i}{\partial {a}_j} \) and \( \frac{\partial {p}_i}{\partial {b}_j} \) can be derived from Eq. (16.11) as \( \frac{\partial {p}_i}{\partial {a}_j}={p}_i\left(1-{p}_i\right)\frac{\partial {Y}_i}{\partial {a}_j} \) and \( \frac{\partial {p}_i}{\partial {b}_j}={p}_i\left(1-{p}_i\right)\frac{\partial {Y}_i}{\partial {b}_j} \).

Equation (16.16) is highly complicated, and there is clearly no analytical solution of the 2p + 2 gradient equations in Eq. (16.16) for the 2p + 2 parameters a = (a_0, a_1, …, a_p) and b = (b_0, b_1, …, b_p). Therefore, a numerical iterative search algorithm has to be used to obtain the parameter estimates. We make use of Newton’s method (Nocedal & Wright, 1999) to solve Eq. (16.16) iteratively, using the following scheme with a large number of iterations s = 1, …, S (e.g., S > 1000):

$$ \left(\begin{array}{c}{a}^{\left(s+1\right)}\\ {}{b}^{\left(s+1\right)}\end{array}\right)=\left(\begin{array}{c}{a}^{(s)}\\ {}{b}^{(s)}\end{array}\right)-{\left(\begin{array}{c}\frac{\partial^2 logL}{\partial {a}^2},\frac{\partial^2 logL}{\partial a\partial b}\\ {}\frac{\partial^2 logL}{\partial a\partial b},\frac{\partial^2 logL}{\partial {b}^2}\end{array}\right)}_{\left(\begin{array}{c}{a}^{(s)}\\ {}{b}^{(s)}\end{array}\right)}^{-1}{\left(\begin{array}{c}\frac{\partial logL}{\partial a}\\ {}\frac{\partial logL}{\partial b}\end{array}\right)}_{\left(\begin{array}{c}{a}^{(s)}\\ {}{b}^{(s)}\end{array}\right)} $$
(16.17)

Note that on the right side of Eq. (16.17), \( \left(\begin{array}{c}\frac{\partial \log L}{\partial a}\\ {}\frac{\partial \log L}{\partial b}\end{array}\right) \) and \( \left(\begin{array}{c}\frac{\partial^2 \log L}{\partial {a}^2},\frac{\partial^2 \log L}{\partial a\partial b}\\ {}\frac{\partial^2 \log L}{\partial a\partial b},\frac{\partial^2 \log L}{\partial {b}^2}\end{array}\right) \) are the gradient vector in Eq. (16.16) and the Hessian matrix, both evaluated at the sth iterate of the parameters \( \left(\begin{array}{c}{a}^{(s)}\\ {}{b}^{(s)}\end{array}\right) \). The Hessian matrix is a (2p + 2) × (2p + 2) matrix whose elements are the associated second derivatives. Specifically, in the Hessian matrix,

  • The upper-left matrix \( \frac{\partial^2 \log L}{\partial {a}^2} \) is a (p + 1) × (p + 1) matrix with diagonal elements \( \frac{\partial^2 \log L}{\partial {a}_j^2}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial^2{Y}_i}{\partial {a}_j^2}-\frac{1}{1-{p}_i}\frac{\partial^2{p}_i}{\partial {a}_j^2}+\frac{1}{{\left(1-{p}_i\right)}^2}{\left(\frac{\partial {p}_i}{\partial {a}_j}\right)}^2\right] \) for all j = 0, 1, …, p, and off-diagonal elements \( \frac{\partial^2 \log L}{\partial {a}_j\partial {a}_k}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial^2{Y}_i}{\partial {a}_j\partial {a}_k}-\frac{1}{1-{p}_i}\frac{\partial^2{p}_i}{\partial {a}_j\partial {a}_k}+\frac{1}{{\left(1-{p}_i\right)}^2}\frac{\partial {p}_i}{\partial {a}_j}\frac{\partial {p}_i}{\partial {a}_k}\right] \) for all j, k = 0, 1, …, p and j ≠ k.

  • The upper-right matrix \( \frac{\partial^2 \log L}{\partial a\partial b} \) is the same as the lower-left matrix, a (p + 1) × (p + 1) matrix with elements \( \frac{\partial^2 \log L}{\partial {a}_j\partial {b}_k}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial^2{Y}_i}{\partial {a}_j\partial {b}_k}-\frac{1}{1-{p}_i}\frac{\partial^2{p}_i}{\partial {a}_j\partial {b}_k}+ \frac{1}{{\left(1-{p}_i\right)}^2}\frac{\partial {p}_i}{\partial {a}_j}\frac{\partial {p}_i}{\partial {b}_k}\right]\) for all j, k = 0, 1, …, p.

  • The lower-right matrix \( \frac{\partial^2 \log L}{\partial {b}^2} \) is a (p + 1) × (p + 1) matrix with diagonal elements \( \frac{\partial^2 \log L}{\partial {b}_j^2}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial^2{Y}_i}{\partial {b}_j^2}-\frac{1}{1-{p}_i}\frac{\partial^2{p}_i}{\partial {b}_j^2}+\frac{1}{{\left(1-{p}_i\right)}^2}{\left(\frac{\partial {p}_i}{\partial {b}_j}\right)}^2\right] \) for all j = 0, 1, …, p, and off-diagonal elements \( \frac{\partial^2 \log L}{\partial {b}_j\partial {b}_k}=\sum \limits_{i=1}^n\left[{y}_i\frac{\partial^2{Y}_i}{\partial {b}_j\partial {b}_k}-\frac{1}{1-{p}_i}\frac{\partial^2{p}_i}{\partial {b}_j\partial {b}_k}+\frac{1}{{\left(1-{p}_i\right)}^2}\frac{\partial {p}_i}{\partial {b}_j}\frac{\partial {p}_i}{\partial {b}_k}\right] \) for all j, k = 0, 1, …, p and j ≠ k.

  • In addition, all the second-order derivatives \( \frac{\partial^2{Y}_i}{\partial {a}_j^2} \), \( \frac{\partial^2{Y}_i}{\partial {a}_j\partial {a}_k\ } \), \( \frac{\partial^2{Y}_i}{\partial {b}_j^2} \), \( \frac{\partial^2{Y}_i}{\partial {b}_j\partial {b}_k} \), \( \frac{\partial^2{p}_i}{\partial {a}_j^2} \), \( \frac{\partial^2{p}_i}{\partial {a}_j\partial {a}_k} \), \( \frac{\partial^2{p}_i}{\partial {b}_j^2} \) and \( \frac{\partial^2{p}_i}{\partial {b}_j\partial {b}_k} \) in the above calculations of Hessian matrix can be similarly obtained using the first-order derivatives from the calculations in Eq. (16.16).

We name the above estimation process “LogisticCusp”, in parallel with the “RegCusp” of Chen and Chen (2017).
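To make the estimation concrete, below is a minimal sketch of the LogisticCusp likelihood and its maximization with optim(). It reuses the cusp_root() helper sketched earlier; X (an n × 2 matrix holding x1 and x2), y (the 0/1 outcome vector), and the starting values are illustrative assumptions, and no numerical safeguards are included.

# Negative log-likelihood of the LogisticCusp model (negative of Eq. (16.14)).
negloglik <- function(theta, X, y) {
  k <- ncol(X) + 1                          # p + 1 coefficients per control factor
  a <- theta[1:k]; b <- theta[(k + 1):(2 * k)]
  alpha <- a[1] + X %*% a[-1]               # Eq. (16.7)
  beta  <- b[1] + X %*% b[-1]               # Eq. (16.8)
  Y <- mapply(cusp_root, alpha, beta)       # latent roots via the Maxwell convention
  p <- plogis(Y)                            # Eq. (16.11)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}

fit <- optim(rep(0.1, 6), negloglik, X = X, y = y,   # 6 parameters when p = 2
             method = "BFGS", hessian = TRUE)
fit$par                                     # estimates of (a0, a1, a2, b0, b1, b2)
se <- sqrt(diag(solve(fit$hessian)))        # standard errors from the Hessian

The analytic gradients and Hessian derived above can also be cross-checked numerically, for example with the numDeriv package (an assumption; any numerical differentiation routine would serve):

library(numDeriv)
grad(negloglik, fit$par, X = X, y = y)      # should be near zero at the MLE
hessian(negloglik, fit$par, X = X, y = y)   # compare with the analytic Hessian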

5 Testing the Logistic Cusp Catastrophe Model Through Monte Carlo Simulation

As the first step to examine the logistic cusp regression method described above, we conducted Monte Carlo simulation studies with known parameters.

5.1 Model Settings for Simulation

To conduct the Monte Carlo simulation, surrogate data are generated using Eqs. (16.10)–(16.12) with the number of observations n = 300. Two (i.e., p = 2) independent variables x_1 and x_2 are simulated independently from the standard normal distribution.

To test whether the novel model can correctly distinguish and determine the model variables, we use the true parameters a = (2, 2, 0) and b = (2, 0, 2) in Eqs. (16.7) and (16.8), where a_2 = 0 represents the correct selection of x_1 in Eq. (16.7) and b_1 = 0 represents the correct selection of x_2 in Eq. (16.8).

5.2 Steps of Simulation Study

The simulation is an iterative process, completed in the following six consecutive steps (a one-replicate sketch in R follows the list):

  • Step 1: With n = 300, simulate x_1 and x_2 from the standard normal distribution;

  • Step 2: With the true parameters a = (2, 2, 0) and b = (2, 0, 2) and the x_1 and x_2 from Step 1, calculate α_i and β_i from Eqs. (16.7) and (16.8);

  • Step 3: With the α_i and β_i from Step 2, solve Eq. (16.12) to obtain Y_i, selecting the root given by the Maxwell convention, i.e., the minimum of the associated potential function V(Y_i; α_i, β_i);

  • Step 4: With the selected Y_i from Step 3, generate the outcome variable y_i using Eqs. (16.10) and (16.11);

  • Step 5: Using the data generated in Steps 1 through 4, form the objective function based on Eq. (16.13) and estimate the parameters a and b by maximum likelihood;

  • Step 6: Repeat Steps 1 to 5 for a large number of simulations (we used 5000) and record the estimated parameters.
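The sketch below assembles one simulation replicate in R from the helpers sketched earlier (cusp_root() and negloglik()); it is illustrative of Steps 1–5, not the exact program used for the reported results.

# One Monte Carlo replicate with n = 300, a = (2, 2, 0), b = (2, 0, 2).
set.seed(1)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n)                      # Step 1
alpha <- 2 + 2 * x1 + 0 * x2                        # Step 2, Eq. (16.7)
beta  <- 2 + 0 * x1 + 2 * x2                        # Step 2, Eq. (16.8)
Y <- mapply(cusp_root, alpha, beta)                 # Step 3, Maxwell convention
y <- rbinom(n, 1, plogis(Y))                        # Step 4, Eqs. (16.10)-(16.11)
X <- cbind(x1, x2)
fit <- optim(rep(0.1, 6), negloglik, X = X, y = y,  # Step 5, BFGS maximization
             method = "BFGS", hessian = TRUE)
fit$par                                             # compare with (2, 2, 0, 2, 0, 2)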

Following the steps described above, we first investigated the default Nelder-Mead optimization and found that the maximum likelihood estimates were unbiased but the Fisher information-based variance estimates lacked efficiency. We then investigated the gradients and Hessian matrix in Eqs. (16.16) and (16.17) with the quasi-Newton method (BFGS) and found that BFGS produced very satisfactory variance estimates. As a routine, we ran the simulation 100,000 times to obtain the modeling results.

5.3 Results and Interpretation

Table 16.1 summarizes the main results from the simulation analysis. It can be seen from Table 16.1 that the parameters are estimated without bias (i.e., the “Mean” and “Median” of the 100,000 estimated parameters are close to the “True” values) and the empirical coverage probabilities (ECP) are reasonable, all exceeding 70%. We also investigated the BFGS estimation with 200,000 simulations and reached similar conclusions.

Table 16.1 Summary of the results for BFGS from 100,000 simulations

Results from the simulation studies indicate that the LogisticCusp performed quite well in estimating the known parameters (the as and bs) for the asymmetry and bifurcation control variables, including the intercepts and slopes, with small differences between the known values and the estimates. For example, the true value of b_1 is 2.0000, and the mean estimate is 2.0370.

6 Modeling Analysis with Real Data: Binge Drinking

The simulation studies above show that the logistic cusp catastrophe regression works well. To further demonstrate the utility of the newly established method, we analyze real data using the logistic cusp regression method validated through the Monte Carlo simulations.

6.1 Data Sources and Variables

Data used for empirical testing were from 1122 youth lifetime drinkers in the 2015 Monitoring the Future Study: A Continuing Study of American Youth (12th-Grade Survey) (ICPSR 36408, URL: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/36408). Of the total sample, 48.6% were male; 50.1% were White and 24.4% were Black; 39.8% were less than 18 years of age and 60.2% were 18 or older. The response variable in this study is the number of times engaging in binge drinking in the past month (denoted by “y”). Based on self-reported data, 848 (75.6%) did not engage in binge drinking in the past month, 130 (11.6%) engaged once, 72 (6.4%) engaged twice, and 72 (6.4%) engaged three or more times. A binary variable for binge drinking (yes/no, denoted by “y_2”) was created for modeling, with participants who engaged in binge drinking at least once in the past month coded as yes and otherwise as no.
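For instance, the recoding step can be sketched as below, assuming the past-month binge count is stored in a column binge_times of a data frame df (illustrative names):

# Derive the binary outcome y2 from the past-month binge drinking count.
df$y2 <- as.integer(df$binge_times >= 1)   # 1 = at least one episode, 0 = none
table(df$y2)                               # expect 848 no and 274 yes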

Perception of alcohol harm was modeled as the asymmetry variable (denoted by “x_1”). The variable was measured using responses to the question: “How much do you think people risk harming themselves (physically or in other ways), if they: (1) take one or two drinks nearly every day? (2) take four or five drinks nearly every day? (3) have five or more drinks once or twice each weekend?” Answer options to these questions were: 0 (no risk), 1 (slight risk), 2 (moderate risk), and 3 (great risk). Items were reverse coded and mean scores (range: 0–3) were computed for analysis, such that 0 indicates the most risk (highest level of perceived harm) and 3 the least risk (lowest level of perceived harm). This measure has been used in MTF research (Johnston, O’Malley, Miech, Bachman, & Schulenberg, 2017), and reported studies indicate that perceived harm is a significant predictor of alcohol use in adolescents (Pedersen, Fjaer, & Gray, 2016).

Frequency of drinking in social settings was modeled as the bifurcation variable (denoted by “x_2”), based on responses to the question: “When you used alcohol during the last year, how often did you use it in each of the following situations?” (1) with 1 or 2 other people; and (2) at a party. Answer options were 0 (not at all), 1 (few times), 2 (sometimes), 3 (most times), and 4 (every time). The highest frequency (range: 0–4) across the two settings was used for modeling analysis. Social setting has been reported as an influential factor for alcohol use in high school and college students (Weitzman, Nelson, & Wechsler, 2003).

6.2 Modeling Analysis

Modeling analysis was conducted using the R program we developed and used in the simulation studies presented in Sect. 5. For comparison purposes, we analyzed the same data with Cobb-Grasman’s SDECusp and Chen-Chen’s RegCusp. In the modeling analysis, the asymmetry variable is the (reverse-coded) perceived alcohol risk, and the bifurcation variable is the social setting for drinking. We consider two types of outcome variables for binge drinking: y as a continuous variable and y_2 as a binary variable. Using the continuous outcome y, we can fit a typical multiple linear regression (“Linear Regression”), the stochastic cusp catastrophe model (“SDECusp”), and the regression cusp catastrophe model (“RegCusp”). With the binary outcome y_2, we fit the LogisticCusp model developed in this chapter.

6.3 Parameter Estimates and Comparison

Table 16.2 summarizes the parameter estimates and their associated standard errors with standardized data on y, x_1, and x_2. Parameter estimates from the linear regression and the three cusp catastrophe modeling methods are all highly statistically significant (p < 0.001), except a_0 from the RegCusp, which is not (p > 0.05).

Table 16.2 Results from linear regression and the three cusp regression modeling methods

6.4 Comparison of the Estimated Cusp Regions

With the SDECusp, RegCusp, and LogisticCusp models, we can estimate the cusp point of the cusp region, denoted by point O in Fig. 16.1. This is done by setting the estimated α and β in Eqs. (16.7) and (16.8) to zero and solving for the corresponding values of x_1 and x_2, which give the estimated cusp point as described in Section “Cusp Region Estimation”. As seen in Table 16.2, the cusp point is estimated at (−1.545, 15.950) for the SDECusp catastrophe model, which is outside the data region: it falls far beyond the range of the two predictor variables (x_1 ranging from 0 to 3 and x_2 from 0 to 4). The estimated cusp points using RegCusp and LogisticCusp are (1.082, 2.483) and (0.996, 1.982), respectively; both are reasonable compared with the cusp point estimated with the SDECusp method. According to the cusp point estimated with the RegCusp, sudden changes in binge drinking behavior would occur only when x_1, the perceived alcohol harm, was slightly greater than 1 (somewhat harmful) and x_2, the frequency of drinking in social settings, was about midway between 2 (sometimes) and 3 (most times). If results from the LogisticCusp are used, the values of the two control variables are somewhat smaller: sudden changes in binge drinking would occur when x_1 approaches 1.0 (perceiving alcohol use as “somewhat harmful”) and x_2 approaches 2 (sometimes drinking in social settings). In other words, with a binary outcome, the estimated sudden change becomes more sensitive than with a continuous outcome.
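Setting α = 0 and β = 0 amounts to solving a 2 × 2 linear system in (x_1, x_2); a minimal sketch follows, with placeholder coefficients rather than the chapter’s estimates:

# Locate the cusp point O by solving alpha = 0 and beta = 0 for (x1, x2).
a <- c(-1.0, 0.6, 0.4)            # placeholder: alpha = a[1] + a[2]*x1 + a[3]*x2
b <- c(-2.0, 0.5, 1.0)            # placeholder: beta  = b[1] + b[2]*x1 + b[3]*x2

A   <- rbind(a[2:3], b[2:3])      # coefficient matrix of the linear system
rhs <- c(-a[1], -b[1])
cusp_point <- solve(A, rhs)       # (x1, x2) at which alpha = beta = 0
cusp_point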

Figure 16.2 graphically illustrates the estimated cusp points and the associated cusp regions for both the RegCusp and LogisticCusp models. As seen in Fig. 16.2, the dashed lines are for the RegCusp model, whose estimated cusp point is at (1.082, 2.483), and the solid lines are for the LogisticCusp model, whose estimated cusp point is at (0.996, 1.982).

Fig. 16.2 Estimated cusp point along with the cusp region for both the RegCusp (black lines) and LogisticCusp (red lines) models

7 Discussion and Conclusions

In this chapter, we report our research successfully establishing the LogisticCusp method for modeling binary outcome variables. The method is grounded in well-established logistic regression to solve the high-order cusp catastrophe model. The innovative use of a latent variable creates a mathematical bridge linking the deterministic cusp catastrophe with statistical logistic regression. By applying the likelihood method and a numerical search approach with either the Maxwell or the delay convention, unbiased parameter estimates can be obtained; and by applying bootstrapping, correct model variances can also be estimated. In addition to validation through simulation, we empirically tested the method using data from a national probability sample of youth with a binary variable for binge drinking. Binary variables are more common than continuous variables in research and are widely used in almost all scientific fields beyond the life sciences, psychology, and behavioral studies. The LogisticCusp provides a new, and to our knowledge the first, tool for researchers to examine challenging questions of this kind with binary outcome variables.

There are several advantages to the LogisticCusp method we developed. First, any binary variable suitable for logistic regression can be used for cusp catastrophe modeling to capture the nonlinearity and discreteness of a phenomenon. Second, both the asymmetry and bifurcation variables in a logistic cusp regression can be modeled as either a single variable or a linear combination of multiple variables, greatly enhancing the flexibility of the modeling analysis. Results from this and previous analyses (Chen et al., 2019) also indicate adequate validity of the estimated cusp point and the corresponding cusp region and two threshold lines with the LogisticCusp method. Beyond assessing the validity of the estimated parameters, determination of the threshold lines provides important data to guide practice in avoiding sudden changes toward unfavorable outcomes and promoting sudden changes toward favorable outcomes. Third, as with other methods, R² or the variance explained by a cusp model can be estimated as in traditional regression analysis, facilitating model comparisons to help determine whether a study variable is nonlinear and discrete or linear and continuous. Last, the method can be executed in R, free of charge.

There are a couple of limitations to the LogisticCusp method. Like many statistical methods that rely on numerical searches for parameter solutions, the LogisticCusp method is sensitive to initial values. Several measures can help determine initial values: (a) generate initial values from parameter estimates for the same data using other methods such as linear regression, logistic regression, RegCusp, and SDECusp; and (b) check whether the estimated cusp point, cusp region, and two threshold lines are within the data range and have a meaningful interpretation. Another limitation is variance estimation. As in RegCusp, the estimated variances tend to be too small for LogisticCusp. Although bootstrapping provides a remedy for this issue, we will conduct further research to understand it.

Despite these limitations, the establishment of the regression-based approach, including the LogisticCusp in this study and the RegCusp in our previous studies (Chen & Chen, 2017; Chen, Chen, & Zhang, 2016), provides an innovative and much-needed approach for researchers to solve a deterministic cusp catastrophe model with a statistical method capable of handling sampling and measurement errors. In addition, the accurate estimation of the cusp point, the cusp region, and the threshold lines advances cusp catastrophe modeling from qualitatively detecting a cusp to quantitatively describing it. We anticipate that the regression-based cusp catastrophe modeling methods we have established will provide a set of powerful analytics to advance medical, health, social, and behavioral studies.