Introduction

Sexual risk prevention during adolescence focuses on a number of behavioral targets, including delaying onset of sexual activity (Tortolero et al., 2010), reducing the number of condomless sex acts (Coyle et al., 2006), reducing the number of partners (Jemmott et al., 2010), and increasing condom use (DiClemente et al., 2009). Some programs focus on specific behaviors such as condom use and abstinence (Boekeloo et al., 1999), while others focus on developmental and contextual processes that influence sexual risk behavior such as emotion regulation (Raffaelli & Crockett, 2003), parent–adolescent communication (Huebner & Howell, 2003), parental monitoring (Huebner & Howell, 2003; Li et al., 2000), and social networks and interpersonal connectedness (Markham et al., 2010). Programs that focus on these important but more distal processes likely impact more than one behavioral target of sexual risk. This impact, however, may be diffuse and may not be apparent when considering a single sexual risk behavior. These important, but diffuse, effects may only be detected when all behaviors are considered (Brown et al., 2011). While there has been guidance on how to assess sexual risk behavior (Dolezal et al., 2012; Fenton, 2001; Fonner et al., 2013; Graham et al., 2005; Schroder et al., 2003a, b), there has been limited guidance on how to integrate information across multiple adolescent risk behaviors. As a result, investigators have used multiple approaches, complicating the accumulation of knowledge across studies (Mullen et al., 2002). The intent of this article is to discuss the challenges of integrating information across multiple adolescent sexual risk behaviors and examine how different approaches to forming composites influence conclusions drawn from randomized trials.

Adolescent sexual risk prevention trials typically measure specific risk behaviors and, to a lesser extent, the interpersonal and intrapersonal processes that influence the behaviors (e.g., beliefs and cognitions, emotion regulation, family processes, peer processes, partner characteristics). Specific behaviors commonly assessed in adolescent prevention trials that are linked to increased HIV risk include age of sexual debut (Falasinnu et al., 2015; Heywood et al., 2015; Slater & Robinson, 2014), recency of sexual activity (Lightfoot, 2012), number of partners (Ashenhurst et al., 2017; Pequegnat et al., 2016), number of sexual acts, and number of condomless acts (Fonner et al., 2013). Other important, but less frequently assessed, behaviors include partner concurrency (Ashenhurst et al., 2017; Le Pont et al., 2003) and alcohol or other drug use before sex. Because there is more consistency among studies in which sexual risk behaviors are included than in the underlying interpersonal and intrapersonal processes contributing to risk, we will focus on quantifying the constellation of sexual behaviors that place individuals at risk of contracting HIV.

Assessing and Summarizing Sexual Risk During Adolescence

Adolescent sexual risk behavior differs from adult risk behavior. For example, adolescent relationships are not as stable as adult relationships (Carver et al., 2003; Connolly & McIsaac, 2009). Compared to adults, adolescents report lower rates of sexual behavior (Herbenick et al., 2010) and higher rates of condom use (Reece et al., 2010) but also have higher rates of sexually transmitted infections (STIs) (Slater & Robinson, 2014). Examining sexual risk behaviors among adolescents presents unique challenges to the assessment of these behaviors as well as the operational definition of a composite index that combines information across multiple risk behaviors. These challenges will be discussed below.

Assessment of Sexual Risk Behaviors

Regardless of the population, researchers studying sexual risk behavior make several important decisions in how to assess behavior, including whether to ask about global estimates of sexual behaviors across a specified recall period (e.g., number of partners, number of sexual encounters), whether to ask about event-level behaviors (e.g., condom use at last sex, alcohol or other drug use before last sex), whether to use categorical response options or ask for open-ended reports of frequency, and how much structure should be provided to assist with recall (e.g., ask questions partner-by-partner, timeline followback, computer-assisted self-interview). There have been a number of quality reviews that have addressed these measurement considerations (Dolezal et al., 2012; Fenton, 2001; Fonner et al., 2013; Graham et al., 2005; Schroder et al., 2003a, b; Scott-Sheldon et al., 2010), and it is not our intent to repeat these recommendations other than to highlight a couple of considerations specific to adolescent populations.

Recall and Numeracy Skills

Compared to adults, adolescents are not fully developed in terms of numeracy (e.g., estimation, casting, counting) or recall (Fenton, 2001; McAuliffe et al., 2010; Napper et al., 2010; Scott-Sheldon et al., 2010). They are also more sensitive to social norms and perceived expectations (Blakemore, 2008; Somerville, 2013). Together, these factors may lead to exaggerated or inaccurate estimates of sexual behavior when they are asked open-ended questions about sexual behavior. Moreover, the amount of error increases as a function of the number of events reported, because recall becomes more challenging and participants may use different cognitive strategies to report frequent (estimation, casting) versus infrequent (counting) events (McAuliffe et al., 2010). Thus, adolescents engaging in fewer behaviors provide more accurate reports of their behaviors than those who engage in more frequent behaviors. For these reasons, it is suggested that sexual risk assessments among adolescents include clear, familiar, and nonjudgmental language and that the recall period be no greater than 3 months (Scott-Sheldon et al., 2010). Providing additional scaffolding, such as asking partner-by-partner, timeline followback, or providing structured response options, will help with accuracy of the recall (Crosby et al., 1996; Weinhardt et al., 1998). Structured response options convey information about expectations and norms (Fenton, 2001), and care is required to ensure that the choice of response options appropriately reflects the expected rate of behavior in the population being assessed so as not to suggest expected or normative responses. Ultimately, response options should be piloted and evaluated using cognitive interview strategies to help ensure that the structure of the question is not unduly influencing responses (Webb et al., 2015).

Fluidity and Sparsity of Adolescent Relationships

The fluidity of adolescent relationships and relatively low frequency of sexual partnerships are additional challenges to assessing change in sexual risk behavior during adolescence (Manning et al., 2014). The fluidity of adolescent relationships complicates attempts to categorize partner status (e.g., steady, friends with benefits, hookups, acquaintance). Therefore, instead of asking adolescents to classify their relationships, it is advisable to directly assess pertinent partner characteristics (e.g., duration, quality, time spent with partner). Beyond the fluidity of adolescent relationships, the low frequency of sexual partnerships can be challenging to analyze and can obscure the benefit of an intervention approach. Even among high-risk samples, a significant number of adolescents will not engage in sexual behavior during a study’s follow-up period and others will have extended periods of time without a sexual partner. The meaning behind the partnerless periods is not always clear. It may be that an intervention enabled some adolescents to choose not to partner or enabled them to partner but not engage in sexual behavior. Alternatively, some adolescents may not have changed their propensity for engaging in sexual risk behavior but may not have had the opportunity to partner during the observation period. The first two explanations would lend support to the efficacy of the intervention while the third would not. Asking solely about sexual partners does not allow researchers to differentiate among such possibilities. Measuring additional contextual information such as the number of nonsexual romantic relationships or about declining or avoiding sexual opportunities may help provide the needed contextual information to interpret the meaning of partnerless periods.

Sexual Risk Composites

Appropriately assessing sexual risk behaviors during adolescence helps improve the quality of the collected data as do choices about summarizing information across multiple types of sexual risk behavior. In the broader adolescent and adult literature, several approaches have been applied to quantifying the constellation of sexual risk behaviors. This variety is partially related to the challenges of defining a measurement model for adolescent sexual risk behavior. During adolescence, sexual behaviors are determined by a myriad of intrapersonal, interpersonal and contextual factors (Cooper, 2010). These factors differ depending on the decision being made by the adolescent and one decision may influence another. For example, the determinants driving the decision to engage in sex may differ from those driving the decision to use a condom or have multiple partners. Research also suggests complex interactions among gender, partner characteristics, relationship context, and condom use (Lescano et al., 2006; Senn et al., 2014; Staras et al., 2013). Differential determinants suggest that sexual behaviors may not be strongly related to one another when other important drivers and dynamics are not taken into account. In addition to having multiple differential determinants, sexual behaviors also relate differentially to health outcomes. For example, the behaviors that place an adolescent at risk for pregnancy are not identical to those that place them at risk for acquiring HIV. These issues pose significant challenges to fitting latent variable measurement models where a latent construct is thought to give rise to the observed indicators. Such models require that the indicators be correlated and assume that the correlation is due to a common construct (Borsboom et al., 2003; Edwards, 2011). 
The low correlations among sexual behaviors may be why researchers attempting to fit such models tend to expand the indicators beyond behavior to include intentions and attitudes (Siegel et al., 2001). The limitations of conventional measurement models have resulted in researchers utilizing composites where behaviors are not presumed to be correlated or related (Bollen & Bauldry, 2011), or pre-defined categories of risk (e.g., low, medium, high). Some of the composite weights or category definitions are defined a priori while others are sample-dependent. Moreover, the resulting composite can be analyzed as a continuous variable or as discrete categories. The advantages and disadvantages of each approach, as well as examples from the literature, are listed in Table 1.

Table 1 Organization of sexual risk composite scores

A priori weights and definitions are determined either by investigators or through validation of composites in samples other than the one being analyzed. Because a priori definitions do not require advanced statistical models, they are the most commonly used approach to creating sexual risk composites. Unfortunately, there is little consistency in the definitions and limited work on developing and validating a unifying definition (Webb et al., 2015). In adolescent sexual risk behavior, composites are typically created by investigators using coding definitions and thresholds (e.g., two or more partners, any condomless sex) to create ordered categories (Bowleg et al., 2014; Brown et al., 2011; Epstein et al., 2014; Graham et al., 2013; Murphy et al., 2009). Definitions vary in the sexual behaviors that are included (e.g., number of partners, number of sexual acts, number of condomless acts, sex under the influence of alcohol or other drugs, participation in the sex trade, oral, anal, or vaginal sex) and in the thresholds that are used to define risk. This variability presents researchers with many options and the temptation to tweak the composite to provide results consistent with their expectations.

Compared with investigator-determined definitions, there are very few examples of investigators using weights that were externally validated in other samples. One example of such a composite is the Vaginal Episode Equivalent Index (VEE; Susser et al., 1998). The VEE is a weighted composite of condomless oral, vaginal, and anal sex acts with weights calibrated to the relative risk of HIV transmission. Unfortunately, the VEE does not include other sexual risk behaviors important to adolescent sexual risk prevention such as multiple concurrent partners and substance use prior to sex. To our knowledge, there is no consistently used, empirically derived approach that captures all the sexual risk behaviors that are targeted by adolescent sexual risk prevention programs. Using a consistent composite index that captures these behaviors could help improve interpretability across studies and populations, but only if the composite values hold the same meaning from one population to the next, which requires replication and cross-validation.

An alternative to using a priori definitions and weights is applying statistical techniques to generate weights or thresholds from the analytic sample. There are numerous strategies for creating sample-dependent weights including ad hoc definitions such as standardizing and then summing rates of behaviors (Fergus et al., 2007; Huang et al., 2012; Wilson & Widom, 2011), unsupervised classification or data reduction algorithms (e.g., latent class analysis, random forest, k-means, latent variable models, principal components analysis), and supervised prediction models where the goal is to predict a particular outcome such as contracting HIV or other STIs (e.g., support vector machines, regularized regression, random forest; Kuhn & Johnson, 2013). Using these approaches can help maximize information in data from any given sample and thus help identify novel and potentially important patterns that may be missed by a priori definitions. Data-driven approaches, however, are also prone to capitalizing on idiosyncrasies of the data to which they are fit, which, if not addressed, can limit generalizability. Protecting against overfitting typically requires cross-validation within a sample, replication within a population, and careful extension to new populations (Kuhn & Johnson, 2013). The time, energy, and sample size requirements for careful and accurate generalization are prohibitive and, thus, often not completed. Consequently, there are very few data-driven or empirically based composites of sexual risk behavior that have been created and validated for use across samples.

There is no consistency in how the various composites are analyzed, with some investigators treating the score as a continuous measure, while others treat the score as categorical. Treating the composite as a continuous risk score assumes a single dimension of sexual risk, which simplifies data summary and analyses. However, a single dimension also obscures the meaning of the composite score due to the potential for different factors predicting the different behaviors that comprise the composite. For example, predictors for higher risk behaviors, such as condomless sex with multiple partners, may differ from factors that predict lower risk behaviors such as decision to have sex or condom-protected sex. These challenges are similar to those of applying latent variable measurement models to sexual risk behavior. While the composite score does not require correlation among indicators, the resulting index lacks an intuitive interpretation, which makes it difficult to discern clinically meaningful change (Denison et al., 2008; Noar, 2008).

Composites can also be treated as categorical. Well-validated classification models provide face-valid, often intuitive, groupings of individuals (e.g., high, moderate, low risk), which facilitates communication of findings to clinicians and policy makers. The intuitive meaning of a well-developed classification comes from highlighting large and typically clinically meaningful transitions between patterns or classes of behavior. For example, transitioning from engaging in condomless sex with multiple partners to consistent condom use with a single partner represents a marked reduction in risk of HIV infection, whereas the benefits of reducing the monthly number of condomless acts from three to two are less clear. However, small changes in risk behaviors may be important and categorical definitions are less sensitive to potentially meaningful change within each class. Treating composites as categorical also introduces additional analytic complexity when examining transitions among multiple classes. To make such analyses interpretable, researchers will often reduce the number of categories to a binary (risk vs. no risk) or focus on transitions that have theoretical relevance such as moves from high-risk to lower risk categories.

Worked Examples

To further illustrate the use of various approaches to calculating a sexual risk composite and to better understand the influences of different approaches on substantive conclusions, we attempted to fit a conventional measurement model with a single continuous latent variable along with three approaches to constructing sexual risk composites using pooled data from four adolescent risk prevention trials for adolescents dealing with mental health challenges.

Method

We used data from four clinical trials of HIV prevention programs designed for adolescents with mental health concerns (R01NR011906; R01MH066641; R01MH63008; and R01MH61149). These programs involved three general approaches to reducing risk: (1) improving HIV prevention skills (condom use, partner negotiation), (2) improving emotion regulation, and (3) improving parent–adolescent communication and parental monitoring. Each trial also included a health promotion control condition that was matched for time and attention to the respective active treatment conditions. The combined dataset included 1735 participants, 1322 of whom reported complete sexual risk data at both baseline and extended follow-up assessment (9–12 months). Demographics for the sample by study are presented in Table 2. Additional details about the studies are included in Appendix 1. We focus our analyses on how findings differ depending on which risk composite is used. Therefore, subsequent analyses used only complete cases (n = 1322) and pooled all active treatment conditions.

Table 2 Baseline characteristics by study

Measures

All studies used audio computer-assisted self-interviews to assess adolescents’ sexual behaviors. Two of the studies used a recall period of 6 months, and two used a recall period of 3 months. All studies included questions for the following sexual behaviors: (1) ever engaged in vaginal or anal sex, (2) recently engaged in vaginal or anal sex (i.e., during the recall period), (3) number of recent sexual partners, (4) number of total vaginal and/or anal sex acts, and (5) number of condomless acts. The behavioral counts were transformed into rates across 3 months to account for differences in recall periods among studies. Behavioral count questions were open responses. As mentioned previously, adolescents’ developing sense of numeracy along with their sensitivity to social desirability may lead to inaccurate and inflated estimates of their behavior. Evidence of potential inflation was seen in the combined dataset where 22 (2%) adolescents reported rates equivalent to one or more sexual events per day in the 3-month period and 23 (2%) reported rates equivalent to more than 23 partners per year. Although such rates are possible, they most likely reflect a significant overestimation of the behavior. To limit the influence of inflated estimates, open-ended responses were transformed into ordered categorical variables using 1, 2, 5, and 8 as cut points for number of partners and 1, 2, 5, 10, 20, and 30 as cut points for both number of protected acts and number of condomless acts. Each of the four trials also assessed functional impairment due to psychiatric symptoms using the Columbia Impairment Scale (CIS; Bird et al., 1993), with 98% of included participants completing the measure at baseline.
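The rescaling and recoding steps described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the analyses. The function names are hypothetical, and the binning rule (a response's category equals the number of cut points at or below its 3-month rate) is an assumption, because the text lists the cut points but not how boundary or fractional values were handled. Under this rule a positive rate below the first cut point (e.g., one act over 6 months, or 0.5 per 3 months) falls into category 0, which may not match the original coding.

```python
from bisect import bisect_right

# Cut points reported in the text.
PARTNER_CUTS = (1, 2, 5, 8)
ACT_CUTS = (1, 2, 5, 10, 20, 30)


def rate_per_3_months(count, recall_months):
    """Rescale a count reported over `recall_months` to a 3-month rate."""
    return count * 3.0 / recall_months


def to_category(rate, cuts):
    """Ordinal category = number of cut points at or below `rate`.

    ASSUMPTION: the source does not state whether cut points are
    inclusive or how fractional rates were treated.
    """
    return bisect_right(cuts, rate)


# Example: 12 condomless acts reported over a 6-month recall period.
rate = rate_per_3_months(12, 6)      # 6 acts per 3 months
category = to_category(rate, ACT_CUTS)
```

Capping the top category in this way means that an implausibly high open-ended report (for example, 90 acts in 3 months) carries no more weight than any other response above the highest cut point.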

Formative Measurement Model

A latent measurement model with ordinal indicators was fit using number of partners, number of protected acts, and number of condomless acts using Mplus 7.3. The model fitting process was complicated by the zero-inflation in the sexual risk variables that resulted in several ill-behaved models. We first attempted to fit the model with a robust weighted least squares estimator (WLSMV). This model produced a nonpositive definite residual covariance matrix, a problem that persisted after simplifying the model to assume equal spacing between categories. We also attempted using a robust maximum likelihood estimator (MLR) with a logit link function, as well as only running the models using participants who were recently active using both WLSMV and MLR estimators, all with similar difficulties. Finally, we ran the model using a Bayesian framework with semi-informative priors. These models showed poor mixing of the MCMC chains. There may be an analytic approach that would address these challenges, but many readily available tools for a latent measurement model were not successful. It appears that using number of partners, number of protected acts, and number of condomless acts did not provide sufficient information to easily fit a latent measurement model. Indeed, studies that have employed measurement models to summarize sexual risk have used many more behaviors (e.g., carrying a condom, pregnancy, unwanted sex, HIV testing) and have included aspects of the intrapersonal and interpersonal processes driving sexual behavior such as intentions or partner communication (Siegel et al., 2001). Because we were not able to generate a reliable model, we did not include results from this measurement model in subsequent analyses.

Risk Composites

We selected three approaches to forming composites from the literature. Each composite used the same variables as the formative measurement model. The first composite (C1) used procedures similar to those used in previous studies that form a continuous risk score (Fergus et al., 2007; Huang et al., 2012; Wilson & Widom, 2011), where each behavioral count was z-scored using the grand mean and SD from the entire dataset (baseline and extended follow-up), thus preserving differences among assessments, treatment conditions, and studies. The resulting z-scores were then summed to form a composite score for each participant at each assessment. The second composite (C2) classified behavior into the following categories: never engaged in vaginal or anal sex (0), no recent sex (1), only one partner and no condomless acts (2), either multiple partners or any condomless acts (3), and both multiple partners and any condomless acts (4). This approach was also similar to previous studies (Bowleg et al., 2014; Epstein et al., 2014; Graham et al., 2013; Murphy et al., 2009).
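To make the first two definitions concrete, the following Python sketch shows one way C1 and C2 could be computed. It is a minimal illustration under stated assumptions, not the authors' code. The function names and the input layout are hypothetical, "multiple partners" is taken to mean two or more, and the sample standard deviation is used for z-scoring because the text does not say which SD formula was applied.

```python
from statistics import mean, stdev


def c1_scores(rows):
    """Sum of z-scores per row (composite C1).

    `rows` holds one tuple per participant-assessment, pooled across
    baseline and follow-up, e.g. (partners, protected, condomless).
    Each column is z-scored with its grand mean and SD, then summed.
    ASSUMPTION: sample SD; every column must have nonzero variance.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [sum((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


def c2_category(ever_sex, recent_sex, n_partners, n_condomless):
    """A priori ordered category (composite C2), coded 0 to 4."""
    if not ever_sex:
        return 0                      # never engaged in vaginal or anal sex
    if not recent_sex:
        return 1                      # no recent sex
    multiple = n_partners >= 2        # ASSUMPTION: "multiple" means two or more
    condomless = n_condomless > 0
    # 2 = neither risk, 3 = exactly one of them, 4 = both
    return 2 + int(multiple) + int(condomless)


# Toy input: three participant-assessments with recoded ordinal values.
toy_rows = [(0, 0, 0), (1, 2, 0), (2, 4, 3)]
toy_c1 = c1_scores(toy_rows)
```

Because the z-scores use the grand mean and SD rather than assessment-specific values, differences between baseline and follow-up, between conditions, and between studies remain visible in the resulting C1 scores.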

Participants were assigned a category for each assessment. The third approach (C3) was to use a mixture model to define classes of participants based on the three ordinal variables used in creating the previous composites. Models were fit using Mplus 7.3. Class enumeration proceeded by fitting ten models while increasing the number of classes from 1 to 10. Each model was estimated using the MLR estimator with a logit link function. Both baseline and extended follow-up assessments were included in the class enumeration process. Nesting of assessment within participant was accounted for using the TYPE = COMPLEX utility. When using TYPE = COMPLEX, only the Bayesian information criterion (BIC) is valid when comparing models. The BIC values indicated that the three- and four-class solutions were similar and both outperformed the other models. The three-class solution identified a class with no recent sex and two sexually active classes that differed primarily in terms of the amount of condomless sex. The four-class solution also produced a class with no recent sex (No Sex) but separated the sexually active participants into three classes: one with higher numbers of partners, condomless sex, and protected sex (Prtnshigh/Sexhigh), one with low number of partners and high amounts of condomless sex (Prtnslow/ClSexhigh), and one with low number of partners and low amounts of condomless sex (Prtnslow/ClSexlow). Profiles for the three- and four-class solutions along with class enumeration statistics are listed in Appendix 2. We retained the four-class solution because it provided a more nuanced description of number of partners and condomless sex.
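For readers unfamiliar with BIC-based class enumeration, the comparison can be sketched as follows. The models themselves were fit in Mplus; this Python fragment only illustrates the standard BIC formula and a textbook parameter count for an unconstrained latent class model with ordinal indicators. The function names are hypothetical, and the category counts in the example (5, 7, and 7) assume that the cut points listed under Measures yield one more category than there are cut points. No log-likelihood values are given here because none are reported in this section.

```python
from math import log


def lca_free_parameters(n_classes, categories_per_indicator):
    """Free parameters in an unconstrained latent class model.

    (n_classes - 1) class proportions plus, within every class,
    (categories - 1) thresholds for each ordinal indicator.
    """
    class_proportions = n_classes - 1
    thresholds = n_classes * sum(c - 1 for c in categories_per_indicator)
    return class_proportions + thresholds


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * log(n_obs)


def best_by_bic(loglik_by_k, categories_per_indicator, n_obs):
    """Return the class count with the lowest BIC.

    `loglik_by_k` maps number of classes -> fitted log-likelihood,
    which must come from the fitted models.
    """
    return min(
        loglik_by_k,
        key=lambda k: bic(
            loglik_by_k[k],
            lca_free_parameters(k, categories_per_indicator),
            n_obs,
        ),
    )


# Parameter count for a four-class model with 5-, 7-, and 7-category indicators.
four_class_params = lca_free_parameters(4, [5, 7, 7])
```

The penalty term grows with each added class, so a more complex solution is preferred only when the gain in log-likelihood outweighs the extra parameters. When two solutions have similar BIC values, as reported for the three- and four-class models, interpretability is a reasonable tiebreaker.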

Results

Comparing Risk Composites

The relationship among the three risk composites is depicted in Fig. 1. The association between the two categorical approaches (C2 and C3) was moderate (Cramer’s V = 0.65) when considering the entire sample and somewhat smaller (Cramer’s V = 0.37) when considering just the recently active. If it was assumed that the categorical composites are ordered, their associations with the continuously scaled composite (C1) were strong for the full sample (Spearman’s rho = 0.99 and 0.86 for C2 and C3, respectively) and somewhat less when only considering those who are sexually active (Spearman’s rho = 0.61 and 0.68). These associations indicate that overall the composites seem to be similar, but there are marked differences in how they classify adolescents who were recently active. It is also clear that the transition into sexual activity has a considerable influence on the relationships among composites.
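As a reminder of what Cramér's V summarizes, the sketch below computes it from a cross-tabulation of two categorical composites. This is the textbook, uncorrected form of the statistic; whether the reported values used a bias correction is not stated. The function name and the toy tables are illustrative and are not data from the trials.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows.

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), where chi2 is the
    Pearson chi-square statistic. Every row and column must have a
    nonzero total, otherwise an expected count of zero is produced.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Toy tables: perfect agreement and complete independence.
v_perfect = cramers_v([[10, 0], [0, 10]])
v_independent = cramers_v([[5, 5], [5, 5]])
```

V ranges from 0 (no association) to 1 (each category of one composite maps onto a single category of the other). The drop from 0.65 in the full sample to 0.37 among the recently active is consistent with much of the full-sample agreement coming from both composites placing inactive participants in the same categories.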

Fig. 1

For the z-score sum composite, behavioral counts were first recoded into categories, z-scored across participants, assessments, intervention, and study, and then summed. Assigned categories were defined a priori. LCA categories were defined using latent class analysis

To examine how findings from observational studies are potentially influenced by the choice of composite, we examined the association between each composite and the CIS at the baseline assessment. Models were fit using Mplus 7.3 with the composite as the dependent variable with CIS and study as independent variables. CIS was standardized using the mean and standard deviation of the total sample. A linear model was fit to the z-scored composite (C1), and the two categorical composites (C2 and C3) were treated as ordered categorical outcomes. Individual study effects were estimated by including the interaction between CIS and study in the model. To examine possible influence on more complex models, we also fit a set of models that included gender and allowed gender to interact with all independent variables. All models were estimated using MLR. Regression coefficients were standardized using the standard deviation of the outcome or the latent variable underlying the ordered or binary categories.

Standardized regression estimates by study as well as the moderation of female gender on the relationships between sexual risk and CIS are depicted in Fig. 2. Overall, there was a small positive association between baseline impairment and sexual risk behaviors with stronger associations for females versus males. Generally, there were only minor differences in these associations based on which composite was used. One important exception was the use of the z-scored (C1) composite in Study 2, which had very little sexual behavior at baseline due to the younger age of the participants. These results suggest that the choice of composite may not greatly influence the estimation of associations in observational studies, so long as there is a reasonable amount of risk behavior observed in the sample. If risk behavior is low, it is likely that the categorical composites (C2 and C3) better represent the uncertainty around the associations.

Fig. 2

Associations between the Columbia Impairment Scale (CIS) and sexual risk composites both within and across studies using data from the baseline assessment of the four clinical trials. The combined estimates included study as a fixed effect. Separate models were run that included female gender as a moderator. Moderation effects represent the difference between female versus male in the association between CIS and the composite measures. Model coefficients were standardized by using the standard deviation of the outcome, or the latent variable underlying the ordered or binary categories

We also examined how choice of composite may influence estimation of treatment effects in clinical trials and on gender as a possible treatment modifier. Each of the composites was defined separately using the baseline and extended follow-up (9–12 months) for all four clinical trials. The extended follow-up was used as the outcome, and each model included baseline, treatment condition, and study. Study-level results were estimated by including the interaction between study and treatment condition. Gender was also included as a potential treatment modifier in a separate set of analyses and allowed to interact with treatment and study. Models were estimated similarly to the CIS analyses. To highlight approaches that treat the categorical composites as nominal data, we included additional definitions for the two categorical composites. Specifically, we classified transitions among classes from baseline to the extended follow-up as follows: (1) reporting low/decreased risk defined as maintaining a low-risk category (i.e., 0, 1 or 2 for C2; or no sex or Prtnslow/ClSexlow for C3) at follow-up or moving down a category from their baseline report, versus (2) reporting high/increased risk defined as a reported high-risk category (i.e., 3 or 4 for C2; or Prtnshigh/Sexhigh or Prtnslow/ClSexhigh for C3) at follow-up or moving up a category from baseline. This definition was analyzed as a binary outcome with a logit link function but did not include baseline in the analytic model. Results are depicted in Fig. 3.
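The binary transition outcome can be illustrated for C2 with the short Python sketch below. The written definition leaves some cases open (for example, a participant who moves from category 4 to category 3 has moved down a category but still reports a high-risk category at follow-up), so the code adopts one possible reading and labels it as such: any change in category is classified by its direction, and participants whose category is unchanged are classified by whether that category is low or high risk. The function name is hypothetical, and this may not match the coding used in the analyses.

```python
# Low-risk C2 categories as given in the text; 3 and 4 are high risk.
LOW_RISK_C2 = {0, 1, 2}


def increased_risk_c2(baseline, follow_up):
    """Return 1 for high/increased risk and 0 for low/decreased risk.

    ASSUMPTION (one reading of the definition):
      - moved down a category        -> 0 (low/decreased)
      - moved up a category          -> 1 (high/increased)
      - unchanged, low-risk category -> 0 (maintained low risk)
      - unchanged, high-risk category -> 1 (remained at high risk)
    """
    if follow_up != baseline:
        return int(follow_up > baseline)
    return int(follow_up not in LOW_RISK_C2)


# Example: one partner with no condomless sex at baseline (2),
# multiple partners and condomless sex at follow-up (4).
example_outcome = increased_risk_c2(2, 4)
```

Because the direction of change is already built into this outcome, it is consistent with the analytic model omitting baseline as a covariate. Applying the same logic to C3 would require first imposing an order on the latent classes.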

Fig. 3

Treatment effects were estimated within each study and across studies using the extended follow-up data, controlling for baseline. The combined estimates included study as a fixed effect. For both categorical composite definitions, we defined change in risk as follows: low/decreased risk = maintained a low-risk category or moved down a category; high/increased risk = reported a high-risk category or moved up a category. Separate models were run that included female gender as a moderator. Moderation effects represent the difference between female versus male in the treatment effect. Model coefficients were standardized by using the standard deviation of the outcome, or the latent variable underlying the ordered or binary categories

None of the composites showed a significant treatment effect; however, the pattern of results differed depending on which composite was used, particularly among the smaller studies, with there being a moderate-to-large difference using the metric of Cohen’s d between the highest and lowest point estimates for Study 1. Moreover, the SEs of the z-scored (C1) composite appeared to vary as a function of the proportion of sexually active participants. For example, only 12% of participants in Study 2 reported sexual activity at the extended follow-up assessment and the confidence intervals for the treatment effects varied considerably among the three composite measures for this study.

The between-composite differences were magnified in the gender analyses with Study 2 showing significant improvement for females versus males on two of the composites (C1 and C2) with large differences (i.e., > 0.5) in the point estimates. There were also large differences among composites for Study 1 and small-to-moderate differences for the remaining two studies and for the combined sample.

Discussion

Creating a composite to summarize sexual risk behavior holds the promise of better understanding treatment effects of prevention programs, particularly for interventions targeting processes that may influence multiple risk behaviors. Although the promise of a composite is recognized in the literature, there is little consistency in how researchers combine dimensions of sexual risk behavior. Results from our worked example suggest that while these approaches may be similar for cross-sectional studies with sufficient variability in risk behaviors, they can influence estimates of treatment effects and their SEs, particularly among studies with smaller samples or low numbers of sexually active participants. These differences are magnified when used to identify treatment modifiers, which complicates the already challenging process of individualizing intervention approaches (Lagakos, 2006). Inconsistencies in how composites are defined further complicate efforts to aggregate findings across trials in a field that is already inconsistent in the assessment, analysis, and reporting of sexual risk behaviors. Improving consistency in how composites are defined requires guidance on how to address the challenges unique to each population of interest.

Challenges to Summarizing Sexual Risk During Adolescence

The worked examples highlight challenges with aggregating data across multiple adolescent sexual risk behaviors. The two principal challenges to producing a single index of sexual risk behaviors are the relatively low proportion of adolescents who engage in sexual risk behavior and the relatively low observed correlations among risk behaviors for those who are sexually active. Even in at-risk samples like the ones used in the worked example, a high proportion of the sample will not have engaged in sexual activity during any given recall period. For example, in the combined dataset, only 59% had ever been sexually active, with only 35% recently active. Among those who were active, the associations among number of partners, number of protected acts, and number of condomless acts were low (ρPrtns/ClSex = 0.14, ρPrtns/PrSex = 0.26, and ρPrSex/ClSex = −0.25). These low correlations suggest that each dimension provides unique information and may not be amenable to latent variable measurement models. This is not to suggest that these dimensions are unrelated, only that without assessing the processes that link the dimensions (e.g., partner characteristics, knowledge, attitudes, relationship quality, relationship duration, risk propensity), the observed simple association among reported behaviors is small.

In adolescent populations, the low number of recent sexual events and low correlations among different types of sexual risk behaviors pose significant challenges to latent variable measurement models and to data-driven approaches to summarizing across sexual risk behaviors. When considering the full sample, data-driven approaches are primarily driven by recent activity versus no recent activity. When considering only the recently sexually active, the low correlations provide limited information for data-driven approaches to distill and summarize. The limited information among core risk behaviors is likely why studies that have employed latent variable measurement models often included additional behaviors such as alcohol and other drug use prior to sex, carrying a condom, unwanted sex, pregnancy, or STIs, and also included some of the drivers of the risk behaviors such as intentions, attitudes, or partner communication (Siegel et al., 2001). While including more behaviors may help with fitting a measurement model, the additional information may further complicate interpretation of the underlying risk score, particularly when behaviors are mixed with intentions or attitudes.

There are statistical models that explicitly address the high number of zeros. These approaches to zero-altered data (Atkins et al., 2013) jointly estimate the process that generates the zeros or excess zeros along with the process that generates the behavioral counts. Typically, zero-altered models estimate two processes, but for adolescent behaviors there are at least three: processes influencing the transition into sexual activity, processes influencing recent activity, and processes influencing the behavioral count. Moreover, the count processes (number of partners, condomless sex, protected sex) share the same zero-altered processes (transition into sexual activity, recent sexual activity) and estimating separate models for each count process will produce different estimates for these shared zero-altered processes. Fitting a joint model of the zero-altered processes and each of the count processes is a formidable analytic challenge. Although zero-altered models help improve estimation of any given risk behavior, they can be difficult to analyze with no clear approach to aggregating across the zero-altered and count processes; consequently, they have not been used when forming risk composites (Aicken et al., 2013; Webb et al., 2015).
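To make the three processes concrete, the following minimal sketch writes out the probability mass function for a single behavioral count under a three-part structure: ever sexually active, recently active given ever active, and a count among the recently active. The zero-truncated Poisson for the count process is an assumption chosen for illustration; the source does not specify a distribution. The function and parameter names are hypothetical.

```python
import math


def truncated_poisson_pmf(y, rate):
    """P(Y = y | Y >= 1) for a Poisson count with the given rate."""
    if y < 1:
        return 0.0
    log_p = y * math.log(rate) - rate - math.lgamma(y + 1)
    return math.exp(log_p) / (1.0 - math.exp(-rate))


def three_part_pmf(y, p_ever, p_recent, rate):
    """Probability of observing count y in a recall period.

    p_ever   : probability of having transitioned into sexual activity
    p_recent : probability of recent activity, given ever active
    rate     : Poisson rate of the count process (assumed distribution)
    """
    if y == 0:
        # Zeros arise from never being active or from no recent activity.
        return (1.0 - p_ever) + p_ever * (1.0 - p_recent)
    return p_ever * p_recent * truncated_poisson_pmf(y, rate)
```

For example, taking 59% ever active and 35% recently active (read here as shares of the full sample, which is an assumption about the reported figures), `three_part_pmf(0, 0.59, 0.35 / 0.59, 2.0)` places 65% of the probability mass at zero regardless of the illustrative rate. Because the two zero-generating processes are shared across partners, condomless acts, and protected acts, fitting this structure separately for each count would re-estimate `p_ever` and `p_recent` each time.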

Recommendations and Future Directions

In this article, we have briefly outlined the most common approaches to forming sexual risk indices and presented worked examples. Results suggested that the choice of index may influence findings from clinical trials but did not indicate which approach is best suited for evaluating adolescent sexual risk behaviors. Part of the difficulty is that there is no clear criterion against which to compare the various approaches to forming a composite. Although it is possible to utilize adverse health outcomes such as STI incidence or unwanted pregnancies to weight the sexual risk behaviors, such models require large datasets with both sexual behaviors and the adverse outcome of interest. Moreover, the models will differ depending on which outcome is used to generate the weights. Ultimately, deciding on a composite approach is not a function of improving measurement accuracy or statistical methodology but of agreeing on common definitions and, short of that, on consistency in assessment, analysis, and reporting of each sexual risk behavior. We offer some recommendations to help improve the quality and consistency of how sexual risk behavior is reported in the literature.

Adding Context

One of the most challenging aspects of studying adolescent sexual risk behavior is the fluidity of adolescent relationships and the low frequency of sexual behavior, which result in inconsistent behavior across time and zero-altered data. The high number of zeros makes it difficult to identify treatment effects because during any given assessment period, it is not clear whether an adolescent is actively lowering exposure to sexual risk behavior or has simply not had the opportunity to demonstrate their propensity for risk due to lack of a partner. Enriching current assessment with additional context, such as asking about nonsexual relationships or asking about successful avoidance of sexual situations, will help provide additional information about those with low levels of risk behaviors (Manning et al., 2014). A richer assessment will help distinguish those who are actively avoiding risk from those who have limited opportunity to manifest risk behaviors. Collecting contextual information may require more time-intensive data collection approaches such as structured interviews, timeline followback, daily diaries, or experience sampling. These more time-intensive measures can be integrated with traditional longitudinal designs to add contextual information that will enhance the accuracy and interpretability of the data (Gioia et al., 2012; Sliwinski, 2008). Care is needed, however, when using daily measures, as the relatively low frequency of sexual behavior during adolescence might result in a high investment of resources and participants’ time for limited yield. Studies using daily assessment approaches to study sexual behavior tend to sample from populations that regularly engage in sexual behavior (Blood & Shrier, 2013; Wray et al., 2016) or pool across participants and time to focus on predictors of sexual events (Blood & Shrier, 2013). It is not clear whether prospective daily assessment is a cost-effective approach to assess individual change in sexual risk behaviors over time.

Scaffolding Recall

Given adolescents’ developing numeracy and heightened sensitivity to social norms and perceived expectations, it is important to appropriately scaffold their recall of sexual behaviors. Such scaffolding may include replacing open-ended responses with carefully calibrated response categories that have been piloted using cognitive interview techniques to ensure accurate communication of expectations and norms. Timeline followback methods (Weinhardt et al., 1998), which are widely used in the study of alcohol and other drug use, can provide much-needed structure to adolescent recall and can help track relationships over time, thus providing contextual information about relationship duration and concurrent relationships (Rizzo et al., 2017). Self-administered versions of timeline followback methods have been developed, which can help minimize the bias due to social desirability and impression management often seen in face-to-face interviews (Collins et al., 2008; Rueger et al., 2012). More work is needed to evaluate the reliability of self-reported timeline followback methods in assessing adolescent sexual risk behaviors (Schroder et al., 2003b), but the strong performance of timeline followback methods across risk outcomes (Hjorthøj et al., 2012; Norberg et al., 2012), populations (Carey et al., 2001; Sobell et al., 2001), and modes of administration (Maisto et al., 2008; Pedersen et al., 2012; Sobell et al., 1996) argues favorably for increased use in sexual risk prevention trials.

Aggregating Across Assessments

One approach to examining group differences in inconsistent and infrequent behaviors is to lengthen the time frame being considered, thus increasing the opportunity to observe the behavior. Expanding time frames can be done by lengthening the recall window or by aggregating across multiple assessments with shorter recall windows. Because expanding recall windows adversely affects accuracy of the recall, we recommend aggregating multiple shorter recall windows. Although aggregating across assessments helps to minimize the number of zeros in the data, it does so at the expense of temporal precision. Aggregated data do not contain information about when events happened, only that they happened. Retaining temporal information requires reducing the time interval over which data are aggregated and reintroduces the challenges of sporadic and zero-altered data. Balancing the relative importance of temporal information versus observing sufficient behavior depends on the frequency of the behavior and the importance of timing to the research question. For clinical trials, when adolescents reduced their risk may not be as critical as whether they reduced their risk following intervention, making aggregating across time a promising approach.
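The trade-off can be seen in a small sketch that sums counts across several short recall windows and compares the share of zeros before and after aggregation. The participant identifiers and counts are invented for illustration, and the helper names are hypothetical.

```python
def aggregate_counts(assessments):
    """Sum each participant's counts across several short recall windows."""
    return {pid: sum(counts) for pid, counts in assessments.items()}


def proportion_zero(values):
    """Share of participants reporting a count of zero."""
    values = list(values)
    return sum(v == 0 for v in values) / len(values)


# Hypothetical condomless-act counts from three short recall windows.
assessments = {
    "p01": [0, 0, 2],
    "p02": [0, 0, 0],
    "p03": [1, 0, 0],
    "p04": [0, 3, 1],
    "p05": [0, 0, 0],
}

single_window = [counts[0] for counts in assessments.values()]
aggregated = aggregate_counts(assessments)

print(proportion_zero(single_window))        # share of zeros in the first window
print(proportion_zero(aggregated.values()))  # share of zeros after aggregation
```

In these invented data, the share of zeros falls from 0.8 in the first window to 0.4 after aggregation, but the aggregated totals no longer show in which window each act occurred.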

Report Each Sexual Risk Dimension

To improve consistency of data reported in peer-reviewed articles for adolescent sexual risk behavior, it is necessary to report each of the most common risk behaviors, such as age of onset (or at least proportion of the sample that has ever been sexually active), any sexual activity during the period of observation, condomless acts, number of partners, and high-risk behaviors (e.g., substance use before sex, sex with someone you just met). Reporting how an intervention influenced each risk behavior enables comparisons among trials, even if each study used different strategies for forming risk composites. Routinely reporting each behavior will also protect against potential bias introduced by selectively choosing which behaviors to report.

Develop Standard Sexual Risk Composites

Significant work is needed to develop and validate standard sexual risk composites. As stated previously, the work of developing a common sexual risk composite is ultimately a measurement problem that requires building consensus around what is considered sexual risk behavior. As a field, we need to answer some fundamental measurement questions to find consensus in how to form a sexual risk composite. These questions include the following: (1) Should the underlying risk propensity be one-dimensional or categorical? (2) Should the model be calibrated to predict a specific health outcome such as STI incidence, or should it be a more general model of sexual risk behavior? (3) Should the model be strictly behavioral or should it include thoughts, attitudes, and beliefs? (4) Which behaviors should be routinely included as sexual risk behaviors? Note that these questions do not include those specific to the assessment of each risk behavior which are covered in several excellent reviews (Dolezal et al., 2012; Fenton, 2001; Fonner et al., 2013; Graham et al., 2005; Schroder et al., 2003a, b).

Statistical simulation might be able to assist in addressing questions about sensitivity to change. For example, it would be helpful to know how each composite responds to changes in one or more of the contributing behaviors. Sensitivity to change could be evaluated by generating a “true” model for sexual life-history data including age of onset, number of relationships, relationship durations, number of sexual acts, number of condomless acts, and number of high-risk behaviors. Various competing composites could then be formed using these simulated data. Manipulating how the life-history data change post-intervention would help inform the sensitivity of each composite to changes in components of the life history.
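A heavily simplified sketch of such a simulation follows. Everything in it is an assumption made for illustration: the generating model covers only partners and condomless acts in a single recall period, the "intervention" simply converts each condomless act to a protected act with probability `shift`, and the two composites are hypothetical stand-ins, not the C1 to C3 definitions used in the worked example.

```python
import random
import statistics


def simulate_person(rng, p_active=0.35, p_condomless=0.4):
    """Draw one hypothetical recall period: (partners, condomless acts)."""
    if rng.random() >= p_active:
        return (0, 0)
    partners = rng.randint(1, 4)
    acts = rng.randint(1, 20)
    condomless = sum(rng.random() < p_condomless for _ in range(acts))
    return (partners, condomless)


def apply_intervention(rng, person, shift):
    """Convert each condomless act to a protected act with probability shift."""
    partners, condomless = person
    kept = sum(rng.random() >= shift for _ in range(condomless))
    return (partners, kept)


def count_composite(person):
    """Hypothetical continuous composite: partners plus condomless acts."""
    partners, condomless = person
    return partners + condomless


def category_composite(person):
    """Hypothetical ordered categories from 0 (no sex) to 3 (highest risk)."""
    partners, condomless = person
    if partners == 0:
        return 0
    if condomless == 0:
        return 1
    return 2 if partners == 1 else 3


def standardized_change(pre, post, composite):
    """Mean change in the composite, in units of its pre-intervention SD."""
    pre_scores = [composite(p) for p in pre]
    post_scores = [composite(p) for p in post]
    diff = statistics.mean(post_scores) - statistics.mean(pre_scores)
    return diff / statistics.pstdev(pre_scores)


def sensitivity(shift, n=2000, seed=1):
    """Standardized change in each composite for a given intervention shift."""
    rng = random.Random(seed)
    pre = [simulate_person(rng) for _ in range(n)]
    post = [apply_intervention(rng, p, shift) for p in pre]
    return {
        "count": standardized_change(pre, post, count_composite),
        "category": standardized_change(pre, post, category_composite),
    }
```

Calling `sensitivity` over a grid of `shift` values would show how quickly each hypothetical composite responds to a change in condomless acts alone. A fuller simulation would add the other life-history components named above and manipulate them separately.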

Agreement on a set of common composites will require more effort in harmonizing definitions, which will take time; in the interim, researchers will continue to use composites. Beyond reporting each dimension of sexual risk, we recommend using a clearly defined, a priori classification of risk behavior. The categorical definition is easily defined and has high face validity that facilitates clear communication of results from one study to the next. When coupled with reports of each dimension, a theoretically defined, categorical risk composite will provide information about treatment changes in overall sexual risk behavior. Including this common composite definition along with outcomes for each risk dimension would provide a foundation that would greatly facilitate comparisons across studies. Consistent reporting will enhance meta-analytic efforts to summarize the effectiveness of current prevention approaches, which have been limited by inconsistent reporting of changes in adolescent sexual risk behaviors following prevention trials.