Forensic assessment done well is a comprehensive process of obtaining information from diverse sources and integrating it into a conceptualization that serves to understand the client, inform decision makers, provide appropriate intervention, and manage future risk. This task is an important part of many legal decisions (e.g., civil commitment evaluations, end-of-sentence evaluations, and allocation of treatment), as the potential danger to society posed by individuals already known to have committed a violent offense is a major concern for courts and forensic practitioners. A critical part of the process is risk assessment, which involves combining multiple risk factors into an overall assessment of the likelihood of an outcome, such as recidivism (Hanson & Morton-Bourgon, 2009). Risk assessment and risk measures have evolved considerably over recent decades (e.g., Hanson, 2005; Harris & Hanson, 2010; Mann, Hanson, & Thornton, 2010), and distinct approaches to, and generations of, risk assessment can be differentiated (Andrews & Bonta, 2010; Bonta, 1996; Heilbrun, 1997).

Heilbrun (1997) argues that there are at least two models of risk assessment: the prediction model and the management model. The prediction model focuses on maximizing the accuracy of the prediction of the outcome; in this model, it does not matter why something predicts the outcome, just that it does. The management model, in contrast, aims at reducing the risk that a specified outcome (e.g., sexual recidivism) will occur. Bonta (1996) has provided a similar but more nuanced characterization of the development of risk assessment in terms of three generations. The first generation consists of unstructured clinical judgment (UCJ), where a clinician gathers information and forms a subjective risk assessment. The weaknesses of this method are its overreliance on personal discretion and its lack of accountability and replicability (Bonta, 1996).

The second generation of risk assessment relies on instruments that combine primarily static (i.e., historical and unchanging), empirically derived risk factors (Bonta, 1996). In these instruments (commonly referred to as actuarial), items are often scored either on a 0–1 dichotomy (absent/present) or with a specified weighting determined by the strength of the item’s relationship to recidivism. The weakness of this generation is that its focus on static factors is assumed to preclude identifying areas to target in treatment to reduce risk, and such instruments cannot reflect positive change (Bonta, 1996).

The third generation evolved from the second to incorporate criminogenic needs (Bonta, 1996), which are dynamic (i.e., changeable) risk factors that, if changed, should alter the likelihood of reoffending (Andrews et al., 1990). Examples of key criminogenic needs (Andrews & Bonta, 2010) include antisocial personality (e.g., aggression, impulsivity) and antisocial attitudes (e.g., negative attitudes toward the criminal justice system, identification with criminals). Third-generation scales are therefore sensitive to offender changes and they also tend to have a stronger basis in theories of offending, as well as empirical evidence (Bonta, 1996). Similar to the second generation, these tools are typically actuarial. Recently, Andrews, Bonta, and Wormith (2006) have suggested that a fourth generation of risk assessment has emerged, which provides a comprehensive guide for human service delivery that spans from intake through to case closure.

In terms of understanding dynamic risk factors (i.e., third- and fourth-generation approaches), Hanson and Harris (2000) have articulated a further distinction between stable and acute dynamic factors. Stable factors constitute relatively enduring problems (e.g., alcoholism, personality disorders), whereas acute risk factors are rapidly changing features indicating imminent risk of reoffending (e.g., intoxication, emotional collapse). The strength of stable risk factors lies in monitoring risk over the medium to long term (e.g., treatment change), whereas acute risk factors are intended for monitoring current risk during a high-risk period (e.g., community supervision).

One area not addressed by Bonta’s (1996) description is the status of structured professional judgment (SPJ). SPJ is a method of risk assessment where explicit risk factors (often both static and dynamic) are scored, but the combination of these items into an overall evaluation of risk is left to the judgment of the clinician (Boer, Wilson, Gauthier, & Hart, 1997). Proponents of SPJ argue that clinical judgment should be incorporated in risk assessment because the statistical approach of actuarial scales is not always appropriate in individual cases (Webster, Douglas, Eaves, & Hart, 1997). SPJ therefore offers the greatest flexibility to respond to unique, case-specific factors. Other researchers, however, have been dismissive of SPJ (Andrews & Bonta, 2010; Bonta, 2002; Quinsey, Harris, Rice, & Cormier, 2006) and classify it as a variation of the first generation of risk assessment (Andrews et al., 2006).

Hanson and Morton-Bourgon (2009) have added to the classification of risk assessment methods by applying a more stringent definition of actuarial scales. Their definition is based on Meehl’s (1954) criteria that actuarial scales involve explicit rules for combining pre-specified items into total scores and empirically derived estimates of recidivism probability linked to each total score (Hanson & Morton-Bourgon, 2009). Given that several tools satisfying the first criterion do not include absolute recidivism estimates, Hanson and Morton-Bourgon (2009) distinguished between actuarial scales (using Meehl’s definition) and mechanical scales. Mechanical scales typically contain factors identified from theory or previous literature reviews, which are combined into a total score based on explicit item weightings, but they do not contain a table of recidivism estimates per score. If an SPJ scale is used by summing items to produce a total score, without forming a summary professional judgment, it is effectively being used as a mechanical scale.

The purpose of this chapter is to discuss the strengths of actuarial risk assessment. First, we will provide greater discussion of ways to conceptualize the risk factors that may be included in risk scales (actuarial or other approaches). Then, we will discuss what types of information actuarial risk scales can provide, how the greater objectivity inherent in actuarial risk scales contributes to understanding important psychometric properties of risk assessment approaches, and how the predictive accuracy of actuarial scales compares to other approaches. These sections will be applicable to any type of offender risk assessment (i.e., any scale designed to predict an outcome among offenders). In the next section, the reader will be introduced to a small sampling of sexual offender risk scales. We focus on sex offender risk scales because we are most familiar with them and because they will serve as examples of the types of scales that could be used with other offender types. Then, results of surveys will be highlighted to illustrate which scales are being used in practice and how the information is being used. Lastly, the practical clinical power of actuarial risk assessment instruments in everyday practice will be discussed.

Conceptualizing Risk Factors: Psychologically Meaningful Risk Factors

As discussed above regarding the generations of risk assessment (Bonta, 1996), risk factors have often been classified as either static or dynamic (with dynamic factors further classified as stable or acute). The assumption has been that only dynamic risk factors can identify treatment targets or be used in risk management models. As an alternative to the static/dynamic conceptualization of risk factors, however, another approach is to focus on psychologically meaningful risk factors (Mann et al., 2010), also sometimes called risk-relevant propensities. In this model, risk factors are indicators of underlying constructs/propensities. For example, self-regulation problems may be an underlying psychological propensity related to recidivism. Certain past and present behaviors, such as substance abuse, job instability, getting into fights, and poor problem-solving, may all be indicators of this propensity. In this model, the distinction between static and dynamic risk factors is simply a heuristic to describe indicators, rather than a fundamental difference between the risk-relevant constructs. For example, a history of car accidents (a static variable) and current substance abuse (a dynamic variable) may both be indicators of the same underlying propensity (poor self-regulation). In other words, psychologically meaningful risk factors can be measured using either static or dynamic risk factors.

Nonetheless, even though static and dynamic risk factors may measure the same constructs, there are practical advantages to distinguishing between them in risk assessment. Conceptually, it is easy to divide risk factors into those that the offender cannot change or manage (static) versus those he/she can (dynamic), with the latter being easier to incorporate into treatment planning (though this does not mean that static risk assessment cannot also inform risk management). Also, the types of information used to assess these risk factors differ. Static risk factors can often be coded easily and reliably from fairly straightforward criminal history information, as well as offender and victim demographics. Interviews with the offender may not be required, which makes these items practical for correctional systems that need to assess and manage large populations with limited resources. In comparison, dynamic risk factors are often more time-intensive to assess. Credible assessments should minimally include detailed reviews of file information (criminal history and personal/social history) and ideally an interview with the offender (e.g., Fernandez, Harris, Hanson, & Sparks, 2014). Other sources of information (e.g., specialized testing, collateral interviews) can also enhance dynamic assessment.

Complicating this distinction further is recent research and theoretical work suggesting the existence of protective factors (e.g., Farrington & Ttofi, 2011; Lösel & Farrington, 2012), which may reduce the risk of recidivism or interact with a risk factor to decrease its association with recidivism. Although the attempt to focus on offender strengths in assessment is admirable and would likely increase the comprehensiveness of the assessment and improve the therapeutic climate, Harris and Rice (2015) have argued that current descriptions of supposedly protective factors are mostly just the opposite end of risk factors and do not reflect new constructs. Consequently, the idea of risk-relevant propensities (Mann et al., 2010) implies that static, dynamic, and/or protective factors can be used to assess the same risk-relevant constructs, thereby informing risk management practices. Assessing changes in risk, however, would certainly require some consideration of dynamic risk factors.

Crime Scene Behaviors as Indicators of Risk-Relevant Propensities

One neglected area of research has been the use of crime scene behaviors as indicators of risk-relevant constructs. Enduring, risk-related individual propensities (e.g., hostility) may manifest themselves in concrete offense behavior (e.g., excessive humiliation, genital injury). Consequently, research that tries to infer offender characteristics from crime scene behavior may be relevant to risk assessment.

Canter and Heritage (1990) were among the first researchers to classify sexual offenders on the basis of observable or directly inferred crime scene behavior alone. In essence, this task consists of analyzing largely observable behaviors and making inferences about the latent (or unobservable) dimensions and themes within the data. Loosely, this process is referred to as Behavioral Thematic Analysis (BTA), a cornerstone of investigative psychology (IP) research (Canter, 2004). BTA has been used as a predictive tool exploring the relationship between behavioral themes and stranger offender characteristics with notable success (e.g., Goodwill, Alison, & Beech, 2009; Häkkänen, Puolakka, & Santtila, 2004; Mokros, 2007; Santtila, Häkkänen, Canter, & Elfgren, 2003).

Studies employing BTA of stranger rape offense details have found the presence of five (Canter & Heritage, 1990), four (Alison & Stein, 2001; Canter, Bennell, Alison, & Reddy, 2003) or three (Canter, 1994; Häkkänen, Lindlöf, & Santtila, 2004) themes of offense behavior. Although the BTA of these previous studies differed in interpretation, it is argued, in line with Wilson and Leith (2001), that each was consistent in finding themes of hostility, criminality, and pseudo-intimacy. The hostility theme is characterized by expressive, non-strategic aggression beyond that necessary to commit the offense. Here, the offender wants to hurt the victim and may perform brutal (sadistic) sexual acts. In the criminality theme, the sexual assault is considered one among many antisocial behaviors the offender commits. Whereas for stranger rapists the pseudo-intimacy theme may represent deviant sexual fantasies involving the victim receiving intense pleasure during the offense and falling in love with the offender, for the acquaintance rapist this theme may represent the misperception of the victim’s sexual intent. However, during the offense both offender types show behaviors frequently present in consensual relationships.

Similarly, studies employing BTA of child molestation offenses have found the presence of three (Canter, Hughes, & Kirby, 1998) or four (Bennell, Alison, Stein, Alison, & Canter, 2001) offense themes. Here, it is argued that these themes can be summarized as fixated (i.e., love, intimate), regressed (i.e., autonomy), aggression (i.e., hostility), and criminality (i.e., control, criminal-opportunist). The themes of criminality and aggression show considerable overlap with the offense behaviors of rapists. The theme of fixation describes offenders who actively create opportunities to offend by grooming potential victims with attention, affection, and gifts and by seeking out suitable targets. The theme of regression describes offenders motivated by non-paraphilic sexual excitation and victim availability, who may choose children as an alternative to age-appropriate partners.

However, the relevance of these behavioral themes as indicators of enduring offender propensities had previously been neglected in the context of risk assessment. Therefore, based on theoretical considerations (e.g., Ward, Polaschek, & Beech, 2005) and the empirical evidence discussed above (e.g., Canter et al., 2003), Lehmann and colleagues developed precise and detailed conceptualizations of target propensities and their theoretical contexts in order to define crime scene behavior-based indicators of these constructs. In a first step, Lehmann and colleagues demonstrated the construct validity of the behavioral themes through correlational analyses with established sexual offending measures, criminal histories, offenders’ motivation, and offense characteristics. For stranger rapists (Lehmann, Goodwill, Gallasch-Nemitz, Biedermann, & Dahle, 2013), the analyses revealed three behavioral offender propensities: sexuality, criminality, and hostility. Statistical analyses indicated that the behavioral theme of criminality significantly predicted sexual recidivism (AUC = 0.64) and added incrementally to Static-99. For acquaintance rapists (Lehmann, Goodwill, Hanson, & Dahle, 2015), results indicated that the behavioral themes of hostility (AUC = 0.66) and pseudo-intimacy (AUC = 0.69) predicted sexual recidivism, with the latter adding incrementally to Static-99. For child molesters (Lehmann, Goodwill, Hanson, & Dahle, 2014), the behavioral themes of fixation on child victims (AUC = 0.65) and (sexualized) aggression (AUC = 0.59) significantly predicted sexual recidivism and added incrementally to Static-99. Recently, the predictive validity of the behavioral theme of fixation was cross-validated with an independent sample (Pedneault, 2014). In sum, the results indicate that crime scene information can be used to assess risk-relevant constructs and can provide relevant information external to the results of actuarial scales.

What Types of Information Can Actuarial Risk Scales Provide?

Risk assessment can include static, dynamic, protective, or crime scene behavior factors as indicators of risk-relevant propensities. Regardless of what types of risk factors are used, how they are combined, or how accurate the scale is, appropriately reporting risk assessment results makes little difference if the decision makers do not understand the information, which is a serious possibility (e.g., Varela, Boccaccini, Cuervo, Murrie, & Clark, 2014). Consequently, there have been important developments in actuarial risk assessment research regarding optimal ways to report and interpret risk assessment information in clinical practice (for a review, see Hilton, Scurich, & Helmus, 2015). An important advantage of actuarial risk assessment instruments is that their scores can be linked to different types of empirically derived quantitative indicators of risk. In contrast, other approaches to risk assessment (e.g., SPJ) solely provide nominal risk categories (e.g., low, moderate, and high risk), and research indicates that nominal risk categories are interpreted inconsistently by professionals (Hilton, Carter, Harris, & Sharpe, 2008; Monahan & Silver, 2003). Three important metrics for risk communication are percentile ranks, risk ratios, and absolute recidivism rates.

Percentiles

Percentiles communicate information about how common or unusual a person’s score is in comparison to a reference population (Crawford & Garthwaite, 2009). Percentiles have the advantage of being fairly easy to define and communicate, and they are consistent with how many types of psychological tests, such as intelligence tests, are reported (for more information, see Hanson, Lloyd, Helmus, & Thornton, 2012). They are particularly helpful for decisions about resource allocation. For example, if a correctional service has sufficient resources to offer treatment to 15 % of its offenders, then all the information required from an offender risk assessment may be a percentile (e.g., the highest risk 15 % should be prioritized for treatment).

A disadvantage of this metric is that the information provided is norm-referenced (i.e., relative to other offenders), whereas risk assessment is often intended to be criterion-referenced (i.e., focused on the likelihood of recidivism). Additionally, the relationship between percentiles and the ultimate outcome of interest (recidivism) is not necessarily linear. In other words, the difference between two risk scores in percentile units may have little to do with the difference between the same scores in terms of the likelihood of recidivism. For example, on Static-99R, scores of −3 and −2 correspond to the 1st and 4th percentiles, respectively (with percentiles defined as a midpoint average; Hanson et al., 2012). In the higher risk range, scores of 7 and 8 correspond to the 97th and 99th percentiles, respectively, a difference similar in percentile terms to that between scores of −3 and −2. In contrast, the expected recidivism rates in routine correctional samples for scores of −3 and −2 differ barely perceptibly (0.9 % versus 1.3 %, respectively), whereas the difference in recidivism rates for scores of 7 and 8 is larger and more meaningful (27.2 % versus 35.1 %; Phenix, Helmus, & Hanson, 2015).
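
To make the midpoint-percentile convention concrete, the sketch below computes percentile ranks from a reference distribution of total scores. The scores are simulated for illustration only (they are not Static-99R norms); the function simply implements the midpoint definition: the percentage of the reference sample scoring below a given value plus half the percentage obtaining exactly that value.

```python
import numpy as np

def midpoint_percentiles(scores):
    """Midpoint percentile rank for each distinct score: the percentage of
    cases below the score plus half the percentage at the score."""
    scores = np.asarray(scores)
    n = scores.size
    ranks = {}
    for s in np.unique(scores):
        below = np.sum(scores < s)
        at = np.sum(scores == s)
        ranks[int(s)] = 100.0 * (below + 0.5 * at) / n
    return ranks

# Hypothetical reference sample of total scores (illustrative only, not real norms)
rng = np.random.default_rng(0)
sample = rng.integers(-3, 9, size=1000)
print(midpoint_percentiles(sample))
```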

Risk Ratios

Risk ratios describe how an offender’s risk of recidivism compares to some reference group (e.g., low risk offenders or offenders with the median risk score). For example, offenders with a Static-99R score of 4 are roughly twice as likely to sexually reoffend as offenders with a Static-99R score of 2 (Hanson, Babchishin, Helmus, & Thornton, 2013). Risk ratios are well-matched to the fundamental attribute being measured by risk scales (scorewise increases in relative risk for recidivism) and are robust to changes in recidivism rates across different samples as well as across different lengths of follow-up (Babchishin, Hanson, & Helmus, 2012a; Hanson et al., 2013). They also have the most potential for combining results from different risk scales because it is possible for them to have a common meaning across scales (Babchishin, Hanson, & Helmus, 2012b; Hanson et al., 2013; Lehmann et al., 2013).

Despite these advantages, risk ratios have rarely been developed or reported for forensic risk scales, although they are commonly used in medical risk communication. Possible barriers to their use include more complex calculations compared to other risk communication metrics (for an example of different types of risk ratios and other decisions required in their calculation, see Hanson et al., 2013), difficulty in communicating them to laypeople (e.g., Varela et al., 2014), and potential for misinterpretation. Specifically, risk is generally overestimated if risk ratios are not properly contextualized with information about base rates (Elmore & Gigerenzer, 2005). In the Static-99R example above, knowing that an offender with a score of 4 is twice as likely to reoffend as an offender with a score of 2 has a very different meaning if the recidivism rate for a score of 2 is 4 % than if it is 40 %.
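
The base-rate point can be illustrated with a few lines of arithmetic. Note that risk ratios can be defined in several ways (e.g., rate ratios, hazard ratios; see Hanson et al., 2013); this minimal sketch simply treats the ratio as a ratio of recidivism rates, with hypothetical numbers.

```python
def absolute_from_ratio(comparison_rate, risk_ratio):
    """Absolute recidivism estimate implied by a risk ratio, treating the ratio
    as a simple ratio of recidivism rates (a simplification for illustration)."""
    return comparison_rate * risk_ratio

# "Twice as likely" means very different things at different base rates
for comparison_rate in (0.04, 0.40):
    implied = absolute_from_ratio(comparison_rate, 2.0)
    print(f"Comparison group: {comparison_rate:.0%} -> target offender: {implied:.0%}")
```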

Absolute Recidivism Estimates

Absolute recidivism estimates are by far the most frequent quantitative metric reported for actuarial risk scales. They are reported in approximately 90 % of assessment reports for preventative detention in Canada, compared to percentiles and risk ratios, which are reported in roughly 40 % and 0 % of cases, respectively (Blais & Forth, 2014). In a survey examining Static-99R reporting practices in sex offender civil commitment evaluations, absolute recidivism estimates were used by 83 % of respondents, compared to roughly one third who used either percentiles or risk ratios (Chevalier, Boccaccini, Murrie, & Varela, 2014).

Absolute recidivism estimates can be generated in a variety of ways, such as from observed recidivism rates for a group of scores (ideally requiring large sample sizes for each score) or using methods such as survival analysis or logistic regression (for discussion, see Hanson, Helmus, and Thornton, 2010). Absolute risk information is easy to understand but hard to obtain with high levels of confidence. Recidivism rates vary based on the follow-up length, so this must be specified. Additionally, there are several practical complications in obtaining good estimates of recidivism, including underreporting of offences, misclassification (e.g., sexual offences pled down to nonsexual violent offences), prosecutorial discretion, and legal/policy/cultural changes over time.
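
As a minimal sketch of one of the methods mentioned above, the following code fits a logistic regression of recidivism status on total score and reads off a predicted recidivism probability for each score over a fixed follow-up period. The data are simulated and the score-recidivism relationship is assumed purely for illustration; real norms require large, representative samples and attention to follow-up time (e.g., via survival analysis).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: total risk scores and recidivism outcomes (0/1) observed
# over a fixed follow-up period (simulated; the coefficients below are assumed)
rng = np.random.default_rng(1)
scores = rng.integers(-3, 9, size=2000)
true_logit = -2.8 + 0.35 * scores
recidivated = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

model = LogisticRegression().fit(scores.reshape(-1, 1), recidivated)

# Predicted recidivism probability for each possible total score
for s in range(-3, 9):
    p = model.predict_proba([[s]])[0, 1]
    print(f"Score {s:>2}: predicted {p:.1%} recidivism over the follow-up period")
```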

Likely due to the myriad factors that influence recidivism, research has found that absolute recidivism estimates were unstable across samples for the Static-99R and Static-2002R (Helmus, Hanson, Thornton, Babchishin, & Harris, 2012), as well as the MATS-1 (Helmus & Thornton, 2014) and the Risk Matrix 2000/Violence scale (but not the Risk Matrix 2000/Sex scale; Lehmann, Thornton, Helmus, & Hanson, 2015). Additional research has also raised concerns about the generalizability of the recidivism estimates for the VRAG (Mills, Jones, & Kroner, 2005; Snowden, Gray, Taylor, & MacCulloch, 2007). Moreover, analyses of two samples found that violent recidivism rates differed between samples after controlling for the VRS-SO pretreatment score (Olver, Beggs Christofferson, Grace, & Wong, 2014). Some solutions have been proposed for using absolute recidivism estimates in light of this variability (e.g., Hanson, Thornton, Helmus, & Babchishin, 2015), but the adequacy of these solutions is not yet known. Minimally, these findings of variability suggest that creating and reporting reliable and generalizable recidivism estimates for actuarial scales are more complicated than previously believed.

Psychometric Properties of Risk Scales

An important advantage of actuarial risk assessment is that (in contrast to UCJ) it is possible to test the psychometric properties of the risk scales. Compared to SPJ, the increased structure and the availability of quantitative risk communication metrics in actuarial scales may provide more options and greater precision for evaluating psychometric properties, as well as stronger results. Professional standards dictate that forensic psychologists should have expertise regarding the psychometric properties, appropriate uses, and strengths/weaknesses of the risk assessment instruments they use (American Psychological Association, 2013; Association for the Treatment of Sexual Abusers, 2014). The ability to comment on the psychometric properties of a risk scale is particularly important when risk decisions have to be defended in court; without this information, the method of risk assessment may be considered inadmissible evidence. This section discusses which psychometric properties are appropriate (and which are not) for evaluating actuarial risk scales and, where applicable, compares actuarial scales to SPJ approaches.

Objectivity and Interrater Reliability

Because actuarial risk assessment scales generally rely on explicitly defined predictor variables with specific scoring rules (e.g., how much weight to give each item), they facilitate more objective, transparent, standardized, and fair assessments. In contrast, UCJ has none of these features. SPJ scales may have explicitly defined predictor variables (contributing to greater objectivity than UCJ), but the subjectivity in how those variables influence the overall judgment is likely to come at the expense of some objectivity, transparency, and standardization. This objectivity should increase interrater reliability, which refers to the consistency in scores across independent raters (i.e., if two different evaluators score the same individual, will they obtain the same results?). Not only does interrater reliability increase the general validity and defensibility of the assessment, but higher interrater reliability has also been associated with significantly higher predictive accuracy in some analyses (Hanson & Morton-Bourgon, 2009). Supporting the idea that the objectivity of actuarial assessment lends itself to higher interrater reliability is a finding from the Spousal Assault Risk Assessment guide (the SARA), where the interrater reliability of the SPJ summary risk rating was considerably lower than that of the total score (summing the items; Kropp & Hart, 2000).

Internal Reliability

Another metric sometimes applied to risk scales is internal consistency, which refers to the degree of interrelatedness among the items (Cortina, 1993). Cronbach’s α (Cronbach, 1951) is one of the most common indices of internal consistency. Unfortunately, despite its frequent use, internal consistency is not an informative metric for actuarial risk scales.

Developing a scale to predict an outcome (e.g., recidivism) is meaningfully different from classical scale construction in psychology. Specifically, most scales in psychology are norm-referenced, which means they are trying to capture how individuals display different amounts of some relevant construct (e.g., Aiken, 1985). Examples include tests of intelligence, ability, or personality. In contrast, risk assessment scales are inherently criterion-referenced, which means they are designed specifically to predict an outcome of interest. This means that some elements of test reliability and validity are not applicable (e.g., internal consistency; Aiken, 1985). In norm-referenced scales, internal reliability increases to the extent that multiple items assess the same construct (e.g., items are highly related to total scores); this may be achieved by including similar items with different wording or reverse scoring.

In contrast, the most important goal of criterion-referenced scales is to predict the outcome. For that reason, it does not make sense (and may be undesirable) to measure only one construct or to include multiple items assessing the same construct. Predictive accuracy and efficiency are instead maximized by including the smallest number of items measuring the most distinct constructs possible, rather than having multiple items assess a single construct. These goals deliberately decrease internal consistency. Consequently, we do not recommend reporting internal consistency to evaluate the reliability of risk scales. Internal consistency is, however, useful for scales designed to assess a single construct (e.g., the Psychopathy Checklist-Revised; Hare, 2003).
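
The following sketch (simulated data, illustrative only) shows why Cronbach's α rewards item homogeneity: a set of items driven by one latent construct yields a high α, whereas a deliberately heterogeneous item set, of the kind a criterion-referenced risk scale favors, yields a low α without being deficient for its purpose.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
n = 500
# Homogeneous items: all driven by one latent construct -> high alpha
latent = rng.normal(size=n)
homogeneous = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(6)])
# Heterogeneous items: distinct constructs, as a criterion-referenced risk
# scale deliberately includes -> low alpha, by design rather than by flaw
heterogeneous = rng.normal(size=(n, 6))
print(round(cronbach_alpha(homogeneous), 2), round(cronbach_alpha(heterogeneous), 2))
```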

Construct Validity

The results of risk scales should have greater meaning and clearer implications for case management decisions when the source of an offender’s risk is identified and understood. This requires knowing what constructs are being measured by actuarial risk scales. Given that risk scales were designed as criterion-referenced (i.e., items were chosen based on their ability to predict the outcome), construct validity has been largely neglected in actuarial risk assessment scales. In recent years, however, greater attention has been paid to construct validity of actuarial risk scales (e.g., Babchishin et al., 2012b; Brouillette-Alarie, Babchishin, Hanson, & Helmus, 2015).

Specifically, items are assumed to predict the outcome because they are an indicator of some kind of latent underlying construct/propensity (Mann et al., 2010). Efforts to improve construct validity may focus on identifying the underlying constructs measured by the items, determining how well the items measure those constructs, and assessing how to best combine constructs into an overall assessment. Consequently, greater focus on construct validity should help improve predictive accuracy (by potentially identifying better indicators of constructs), resolve discrepancies in risk scales, identify optimal ways to combine risk scales, and better identify whether external information is likely to add to the results of an actuarial scale (e.g., Hanson, 2009).

Predictive Validity

Whereas reliability specifies the extent to which risk assessments give consistent results, predictive validity refers to the accuracy of measurement in predicting the outcome. For risk assessment, predictive validity (also called criterion-related validity) is most important. Discrimination and calibration are distinct indices of the predictive validity of a criterion-referenced scale (Altman, Vergouwe, Royston, & Moons, 2009).

Discrimination quantifies the model’s ability to distinguish between recidivists and non-recidivists or, in other words, to rank offenders according to their relative risk of reoffending. This indicates whether higher risk offenders are more likely to reoffend than lower risk offenders. The most commonly recommended and reported statistic for discrimination is the area under the curve from receiver operating characteristic analyses (AUC; Mossman, 1994; Swets, Dawes, & Monahan, 2000). For further discussion of the strengths and weaknesses of other discrimination statistics (such as correlations, Harrell’s c index, and Cox and logistic regression), see Babchishin and Helmus (2015).
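
For readers less familiar with the statistic, the sketch below computes an AUC on simulated data and verifies its interpretation as the probability that a randomly chosen recidivist has a higher score than a randomly chosen non-recidivist (ties counted as one half).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical risk scores and recidivism outcomes (illustrative data only)
rng = np.random.default_rng(3)
recidivated = rng.binomial(1, 0.2, size=1000)
scores = rng.normal(loc=recidivated * 0.8, scale=1.0)  # recidivists score higher on average

print(f"AUC = {roc_auc_score(recidivated, scores):.2f}")

# Equivalent interpretation: probability that a randomly chosen recidivist
# has a higher score than a randomly chosen non-recidivist
rec = scores[recidivated == 1]
non = scores[recidivated == 0]
pairs = rec[:, None] - non[None, :]
print((np.mean(pairs > 0) + 0.5 * np.mean(pairs == 0)).round(2))
```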

In contrast, there is little research on the calibration of risk scales, which refers to the ability of a risk scale to estimate absolute recidivism rates (Helmus, Hanson, et al., 2012). Consequently, there are no well-established statistics for measuring calibration. For example, in 2009 there were at least 63 studies examining the discrimination of Static-99 (summarized in Hanson & Morton-Bourgon, 2009) but only two studies that examined its calibration (Doren, 2004; Harris et al., 2003). One promising statistic for assessing calibration is the E/O index (Gail & Pfeiffer, 2005; Rockhill, Byrne, Rosner, Louie, & Colditz, 2003), which is the ratio of the predicted (expected, E) number of recidivists to the observed (O) number of recidivists (Viallon, Ragusa, Clavel-Chapelon, & Bénichou, 2009; for more discussion of this statistic, see Helmus & Babchishin, 2014). Although calibration statistics have been historically neglected, they represent one of the most promising advantages of actuarial risk scales. Discrimination can be examined with either SPJ or actuarial approaches, but calibration is a unique property of actuarial risk scales, as they are the only approach with empirically derived recidivism estimates associated with total scores.
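
As a minimal worked example of the E/O index with made-up numbers: the expected count is the sum of the scale's predicted probabilities across the sample, the observed count is the number of actual recidivists, and values near 1.0 indicate good calibration.

```python
import numpy as np

def e_o_index(predicted_probs, outcomes):
    """E/O index: expected number of recidivists (the sum of the predicted
    recidivism probabilities) divided by the observed number of recidivists."""
    return np.sum(predicted_probs) / np.sum(outcomes)

# Hypothetical example: published per-offender estimates applied to a new sample
predicted = np.array([0.10, 0.10, 0.25, 0.25, 0.40])  # scale's estimates
observed = np.array([0, 1, 0, 1, 1])                   # actual outcomes (3 recidivists)
print(round(e_o_index(predicted, observed), 2))        # 1.1 / 3 = 0.37 -> underprediction
```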

Predictive Accuracy of Actuarial Scales Compared to Other Approaches

Research across a variety of disciplines (including offender risk assessment) supports the superiority of actuarial prediction schemes over professional judgment (Ægisdóttir et al., 2006; Bonta, Law, & Hanson, 1998; Dawes, Faust, & Meehl, 1989; Grove, Zald, Lebow, Snitz, & Nelson, 2000; Hanson & Morton-Bourgon, 2009; Mossman, 1994). In sex offender risk assessment, for example, recent meta-analytic research (Hanson & Morton-Bourgon, 2009) found that actuarial measures had significantly higher accuracy in predicting sexual recidivism (d = 0.67) than UCJ (d = 0.42), whereas SPJ scales had accuracy closer to UCJ but not significantly different from either of the other two categories (d = 0.46).

This cross-disciplinary literature contradicts the intuitive belief that professionals’ expertise makes them better equipped to handle complex situations and case-specific factors (e.g., Boer et al., 1997). Paradoxically, it appears to be simultaneously true that expertise matters (e.g., experts generally outperform novices) and that actuarial decision algorithms outperform experts, except under certain conditions (Kahneman & Klein, 2009; Shanteau, 1992). An important question, then, is what those conditions are.

In summarizing decision-making and cognitive science literature, Shanteau (1992) found evidence for good expert performance in weather forecasters, livestock judges, astronomers, test pilots, soil judges, chess masters, physicists, mathematicians, accountants, grain inspectors, photo interpreters, and insurance analysts. Poor professional judgments were noted for clinical psychologists, psychiatrists, astrologers, student admissions evaluators, court judges, behavioral researchers, counselors, personnel selectors, parole officers, polygraph judges, intelligence analysts, and stock brokers. Mixed performance was found for nurses, physicians, and auditors. Shanteau (1992) proposed a variety of task features that were associated with poorer performance from experts. He concluded that human behavior is inherently more unpredictable than physical phenomena and that decision-making is particularly difficult for unique tasks, when feedback is unavailable and when the environment is intolerant of error.

Kahneman (2011) provided a more recent summary of the performance of experts across a variety of tasks, with similar conclusions. According to Kahneman and Klein (2009), expert opinion can be expected to outperform actuarial decisions when the environment is regular (i.e., highly predictable), the expert has considerable practice, and there are opportunities to get timely feedback on decisions in order to learn from errors or false cues. These conditions are generally not present in offender risk assessment. The sheer number of diverse predictors of recidivism (e.g., see Andrews & Bonta, 2010, and Hanson & Morton-Bourgon, 2005) suggests that criminal behavior is not highly predictable (i.e., the number of contingencies is effectively infinite; Hanson, 2009), and evaluators do not receive timely feedback on their decisions.

Professional Overrides

Another way to compare the predictive accuracy of actuarial approaches to SPJ is to examine “professional overrides.” A professional override occurs when the results of an actuarial scale are adjusted based on professional judgment. The premise of SPJ scales is that professional judgment is a helpful way to respond to case-specific factors or to apply flexibility in weighting items for a particular individual. Research, however, has consistently found that overrides to actuarial scales decrease their accuracy (Hanson, Helmus, & Harris, 2015; Hanson & Morton-Bourgon, 2009; Wormith, Hogg, & Guzzo, 2012). Research also demonstrates that professional judgment tends to be more conservative, less transparent, and less replicable than actuarial measures (Bonta & Motiuk, 1990). Alexander and Austin (1992) found that overrides are also disproportionately used to increase offenders’ risk ratings. If overrides are a necessary part of correctional policy (e.g., to introduce flexibility), Austin, Johnson, and Weitzer (2005) encourage adopting a general standard whereby only 5–15 % of final assessments should differ from the initial actuarial results. Furthermore, the direction of inconsistencies should be balanced, with half higher and half lower than the original actuarial result. Overall, overrides may offer some advantages (e.g., flexibility), but the research seems clear that they have a negative impact on accuracy. One possible explanation for the disappointing performance of professional judgment in this context is that professionals may be able to accurately identify risk-relevant information that is not incorporated in the risk scale, but they are unable to determine to what extent this new information is correlated with information already in the scale or how much weight to give it.
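
A hypothetical audit of override practices against the Austin, Johnson, and Weitzer (2005) guideline might look like the following sketch; the three-category scheme and the case records are invented for illustration.

```python
from collections import Counter

ORDER = {"low": 0, "moderate": 1, "high": 2}  # assumed three-category scheme

def audit_overrides(records):
    """records: (actuarial_category, final_category) pairs. Returns the override
    rate (guideline: roughly 5-15 %) and the direction counts, which the
    guideline suggests should be roughly balanced."""
    directions = Counter()
    for actuarial, final in records:
        if final != actuarial:
            directions["up" if ORDER[final] > ORDER[actuarial] else "down"] += 1
    rate = (directions["up"] + directions["down"]) / len(records)
    return rate, dict(directions)

cases = [("low", "low"), ("low", "moderate"), ("high", "high"),
         ("moderate", "moderate"), ("moderate", "low"), ("high", "high")]
rate, directions = audit_overrides(cases)
print(f"Override rate: {rate:.0%}, directions: {directions}")
```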

Incremental Validity

Besides predictive accuracy, incremental validity, which assesses the contribution of an additional measure to the prediction of an outcome (e.g., recidivism), is essential information in the context of risk assessment. Additional measures may add incrementally either by improving the measurement of constructs already included (e.g., attitudes, emotional regulation, intimacy deficits) or by assessing new risk-related constructs. The greater objectivity and structure of actuarial risk scales may facilitate easier interpretation of incremental results.

Incremental validity becomes increasingly important as the knowledge base for offender risk assessment expands. As risk scales become entrenched in practice, the threshold for newly developed scales should increase. In other words, if scales are already in use, the onus is on developers of new scales to demonstrate that their scale provides incremental accuracy to standard practice (Hunsley & Meyer, 2003). Unfortunately, statistical power is reduced for tests of incremental validity compared to bivariate predictive validity, and comparisons of scales may require sample sizes in the thousands (Babchishin et al., 2012b). This means that increasingly larger amounts of data are required for smaller gains in accuracy.

Combining Actuarial Risk Instruments

Generally, a comprehensive actuarial risk assessment covering a range of psychological risk factors will yield better predictive accuracy than a less comprehensive assessment (Hanson & Morton-Bourgon, 2009; Mann et al., 2010). Accordingly, multiple risk measures are frequently used to assess offenders’ risk for future offending (Jackson & Hess, 2007; Neal & Grisso, 2014). The use of multiple risk tools is justified on the grounds that they provide incremental information (Babchishin et al., 2012b; Welsh, Schmidt, McKinnon, Chattha, & Meyers, 2008). For some scales, the developers propose starting with a commonly used risk scale and adjusting the overall rating based on the scores of an incrementally valid additional risk instrument (e.g., Helmus, Hanson, Babchishin, & Thornton, 2014). Also, recent research indicates that averaging the risk ratios of different risk tools is a promising approach to obtaining a better overall evaluation of relative risk (Lehmann, Hanson, et al., 2013), as opposed to other approaches such as taking the highest or lowest risk estimate. Hence, a strength of actuarial risk assessment is the inclusion of a range of empirically validated risk factors or scales, which under certain circumstances (see Lehmann, Hanson, et al., 2013) can be combined into an overall judgment of recidivism risk with better predictive accuracy than a single scale.
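
A schematic illustration of the combination logic follows. The exact procedure in Lehmann, Hanson, et al. (2013) involves additional decisions (e.g., how the ratios are standardized and referenced); this sketch, with hypothetical numbers, only contrasts averaging with taking the highest or lowest estimate.

```python
# Hypothetical offender: relative risk (vs. the typical offender) from two scales
risk_ratios = [2.1, 1.4]

averaged = sum(risk_ratios) / len(risk_ratios)  # combine by averaging
highest = max(risk_ratios)                      # alternative: take the highest
lowest = min(risk_ratios)                       # alternative: take the lowest
print(f"averaged={averaged:.2f}, highest={highest:.2f}, lowest={lowest:.2f}")
```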

Selected Examples of Actuarial Risk Scales for Sex Offenders

Below, specific examples of risk scales for sex offenders will be discussed. Note that this is not meant to be an exhaustive list of available scales; rather, these are illustrative examples of the scales with which we are most familiar. This chapter is not intended to provide a detailed review of all available actuarial risk scales.

The Static-99/R

The most commonly used static sex offender risk assessment tools in Canada and the United States are the Static-99 and Static-99R (Hanson & Thornton, 2000; Helmus, Thornton, Hanson, & Babchishin, 2012; Interstate Commission for Adult Offender Supervision, 2007; Jackson & Hess, 2007; McGrath, Cumming, Burchard, Zeoli, & Ellerby, 2010; Neal & Grisso, 2014). The Static-99 and Static-99R are 10-item actuarial scales designed to assess the sexual recidivism risk of adult male sex offenders. The items and scoring rules for Static-99 (Hanson & Thornton, 2000) and Static-99R (Helmus, Thornton, et al., 2012) are identical with the exception of updated age weights in the Static-99R. The scale developers have recommended that Static-99R be used in place of the original scale (Helmus, Thornton, et al., 2012). Static-99/R contains items covering the broad constructs of age and relationship status (i.e., whether the offender has ever lived with a lover for two or more years), sexual deviance (e.g., stranger victims, noncontact sexual offences, prior sex offenses), and general criminality (e.g., number of prior sentencing occasions, index nonsexual violence, prior nonsexual violence) identified in meta-analytic research (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2005).

Accordingly, a strength of the tool is that it uses only risk factors empirically associated with sexual recidivism. Also, explicit rules for combining the factors into a total risk score are provided (A. Harris, Phenix, Hanson, & Thornton, 2003). Another advantage is that, with appropriate training, the scale can be scored quickly based on commonly available demographic and criminal history information, without a detailed file review or an interview with the offender. The website for the scale (www.static99.org) contains an evaluator workbook that includes normative data for interpreting Static-99/R (nominal risk categories, absolute recidivism estimates, percentiles, and risk ratios) and sample reporting templates, and it is regularly updated with more recent research and normative data for the scale. Although Static-99R was designed to predict sexual recidivism, normative data for violent recidivism risk have previously been available for the scale as well. Most recently, Babchishin, Hanson, and Blais (2015) found that the inclusion of so many items assessing sexual deviance dilutes the scale’s predictive accuracy for violent recidivism. Consequently, the developers of Static-99R no longer recommend its use to comment on violent recidivism risk among sex offenders. Instead, they recommend using the BARR-2002R (Brief Assessment of Recidivism Risk-2002R), which was created from a subset of Static-2002R items (see Babchishin et al., 2015).

In terms of the psychometric properties of the scale, recent meta-analyses found moderate accuracy in predicting sexual recidivism for both Static-99 (d = 0.67, k = 63, n = 20,010; Hanson & Morton-Bourgon, 2009) and Static-99R (d = 0.76, k = 23, n = 8106; Helmus, Hanson, et al., 2012). The interrater reliability of Static-99/R reported across different samples has generally been high (ICC > 0.75; see Anderson & Hanson, 2010; Phenix & Epperson, 2015; Quesada, Calkins, & Jeglic, 2013). Risk ratios for Static-99R have been found to be highly stable across diverse samples and time periods (Hanson et al., 2013), although the absolute recidivism rates per Static-99R score have varied significantly across samples (Helmus, Hanson, et al., 2012), which complicates interpretation of the scale. Current recommendations for using Static-99R in light of this base rate variability are discussed by Hanson et al. (2015).

Risk Matrix 2000

The Risk Matrix 2000 (RM2000) has been adopted by the police, probation, and prison services of England, Wales, Scotland, and Northern Ireland (National Policing Improvement Agency, 2010; Social Work Inspection, HM Inspectorate of Constabulary for Scotland, & HM Inspectorate of Prisons, 2009). The RM2000 is an actuarial scale that assesses the recidivism risk of adult male sexual offenders (Thornton et al., 2003). The scale is based on file information only and contains three separate scales: one measuring risk of sexual recidivism (RM2000/S), one measuring risk of nonsexual violent recidivism (RM2000/V), and one combining the first two scales to measure risk of any violent recidivism (RM2000/C).

The scoring of the RM2000/S involves two steps. In step 1, three risk items are scored (number of previous sexual appearances, number of criminal appearances, and age at next opportunity to offend) and offenders are assigned to one of four preliminary risk categories. In step 2, four aggravating risk factors (any conviction for a sexual offense against a male, any conviction for a sexual offense against a stranger, any conviction for a noncontact sex offense, and being single, i.e., never married) are considered. The presence of two or four aggravating factors raises the risk category by one or two levels, respectively. For the RM2000/V, three items are scored (age on release, violent appearances, and any conviction for burglary) and offenders are again assigned to one of the four risk categories. The four nominal risk categories are low, medium, high, and very high risk. To obtain the RM2000/C score, the risk category points for the RM2000/S and RM2000/V are summed and converted into one of the four nominal risk categories.
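
A schematic rendering of the step-2 adjustment described above is sketched below. The step-1 item weights come from the scoring manual and are not reproduced here, so the preliminary category is taken as given; the handling of one or three aggravating factors and of the top category is assumed, since the text specifies only the effect of two or four factors.

```python
CATEGORIES = ["low", "medium", "high", "very high"]

def rm2000_s_step2(preliminary_category, aggravating_factors):
    """Apply the RM2000/S step-2 adjustment: two aggravating factors raise the
    preliminary category by one level, four raise it by two (assumptions: one
    or three factors are treated like zero or two, respectively, and the
    category cannot rise above 'very high')."""
    count = sum(aggravating_factors.values())
    shift = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}[count]
    index = min(CATEGORIES.index(preliminary_category) + shift, len(CATEGORIES) - 1)
    return CATEGORIES[index]

example = {"male_victim": True, "stranger_victim": True,
           "noncontact_offense": False, "never_married": False}
print(rm2000_s_step2("medium", example))  # two aggravating factors -> "high"
```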

In terms of the psychometric properties of the three scales, a recent meta-analysis (Helmus, Babchishin, & Hanson, 2013) found moderate to high accuracy in predicting sexual recidivism for the RM2000/S (mean d = 0.74 in both fixed-effect and random-effects models, k = 15, n = 10,644), in predicting nonsexual violent recidivism for the RM2000/V (after adjusting the weight of the largest study, mean fixed-effect d = 0.98 and random-effects d = 0.96, k = 10, n = 9836), and in predicting any violent recidivism for the RM2000/C (fixed-effect d = 0.81 and random-effects d = 0.80, k = 8, n = 8277).

Recently, Lehmann, Thornton, et al. (2015) developed non-arbitrary metrics for risk communication with the RM2000 (i.e., percentiles, risk ratios, and absolute recidivism estimates) based on combining offenders from four samples drawn from fairly routine (i.e., complete/unselected) settings: England and Wales, Scotland, Berlin (Germany), and Canada (n = 3144). Although there were meaningful differences across these samples in the distribution of Risk Matrix scores, the relative increases in recidivism risk for each ascending risk category were remarkably consistent across samples. Recidivism rates for the median risk category, however, showed some variability across samples for the Risk Matrix 2000 Violence and Combined scales, but not for the Sex scale (Lehmann, Thornton, et al., 2015).

The Crime Scene Behavior Risk Measure

Whereas previous actuarial risk assessment instruments based on static risk factors focused on the criminal history of sexual offenders, recent research indicates that sexual offender risk assessment can be improved by also using crime scene behaviors as indicators of risk for sexual recidivism. The seven items (explicit offense planning, sexualized language, actively seeking victim, no multiple juvenile offenders, approach-explicit, male victim at index offense, and hands-off: victim active) that comprise the Crime Scene Behavior Risk measure (CBR; Dahle, Biedermann, Lehmann, & Gallasch-Nemitz, 2014) showed high predictive accuracy for sexual recidivism with little variation between the development sample (c index = 0.72; n = 995) and the replication sample (c index = 0.74; n = 77).

The interrater reliability of the CBR total score ranged from moderate (ICC = 0.60) in the development sample to excellent (ICC = 0.89) in the cross-validation sample. For risk communication, the authors provide estimated recidivism rates for each CBR score after 5 and 10 years. Further, the CBR was found to provide significant incremental validity and to improve the predictive accuracy of the Static-99R (Dahle et al., 2014). Accordingly, the authors of the CBR recommend using the published nominal risk categories of the Static-99R (Helmus, Thornton, et al., 2012) as an initial assessment of recidivism risk and adjusting the risk level according to the CBR score to obtain a better overall evaluation of recidivism risk. Hence, assessing sexual recidivism risk using different sources of information should yield a better understanding of the recidivism risk posed by a specific offender.

Stable-2007

The Stable-2007 (Hanson, Harris, Scott, & Helmus, 2007) is an interview- and file-review-based instrument designed to assess stable (i.e., medium- to long-term) dynamic risk factors for sexual recidivism, which are unlikely to change without deliberate effort (i.e., treatment targets; Hanson & Harris, 2013). Items are scored on a 3-point scale ranging from 0 (no problem) through 1 (maybe/some problem) to 2 (yes, definite problem). The instrument contains 13 items divided into 5 subsections: significant social influences, intimacy deficits (i.e., capacity for relationship stability, emotional identification with children, hostility toward women, general social rejection/loneliness, and lack of concern for others), sexual self-regulation (i.e., sex drive/preoccupation, sex as coping, and deviant sexual interests), general self-regulation (i.e., impulsive acts, poor cognitive problem-solving, and negative emotionality/hostility), and cooperation with supervision. The total score is obtained by summing all items and can range from 0 to 26 for offenders with a child victim and from 0 to 24 for other offender types (the item emotional identification with children is scored only for offenders with a child victim). The Stable-2007 can inform decisions about treatment targets as well as about moderate- to long-term recidivism potential, with higher scores indicating greater risk of sexual recidivism. In addition to detailed coding rules for each item, the Stable-2007 scoring manual also includes sample interview questions, practice cases, reporting suggestions, and advice for maintaining high quality risk assessments (Fernandez et al., 2014).

Excellent interrater reliability has been found for the Stable-2007 total score (ICC > 0.75; Fernandez, 2008; Hanson et al., 2007). The predictive accuracy of the Stable-2007 for sexual recidivism was found to range from moderate (e.g., AUC = 0.67; Hanson et al., 2015) to high (e.g., AUC = 0.71; Eher, Matthes, Schilling, Haubner-MacLean, & Rettenberger, 2012).

For risk communication, the authors provide nominal risk categories for the Stable-2007 (0–3 = low need, 4–11 = moderate need, and 12 or greater = high need), as well as percentiles (Fernandez et al., 2014). Hanson et al. (2015) found the Stable-2007 to add incrementally to the Static-99R and Static-2002R in most analyses; however, the Static-99R and Static-2002R had higher predictive accuracy than the Stable-2007. Consequently, the scale developers recommend using the Stable-2007 in conjunction with a static scale (Hanson, Helmus, & Harris, 2015). The current evaluator workbook for the Stable-2007 contains 1-year, 3-year, and 5-year recidivism estimates for risk categories based on combining the Stable-2007 with either the Static-99R, the Static-2002R, or the Risk Matrix 2000 (Helmus et al., 2014; Helmus & Hanson, 2013).
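
The published cut scores map onto need levels as in this small sketch (the thresholds are those given above; the function is only a convenience wrapper).

```python
def stable2007_need_level(total_score):
    """Nominal need categories for the Stable-2007 total score, as published
    by Fernandez et al. (2014): 0-3 low, 4-11 moderate, 12+ high."""
    if total_score <= 3:
        return "low need"
    if total_score <= 11:
        return "moderate need"
    return "high need"

for score in (2, 7, 14):
    print(score, "->", stable2007_need_level(score))
```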

Acute-2007

The Acute-2007 (Hanson et al., 2007) is an interview- and file-review-based instrument designed to assess acute dynamic (i.e., rapidly changing) risk factors for sexual recidivism, information that is essential for managing sexual offenders on community supervision. Items are scored on a 4-point scale ranging from 0 (no problem) through 1 (maybe/some problem) and 2 (yes, definite problem) to 3 (intervene now). The Acute-2007 includes seven items (access to victims, sexual preoccupation, hostility, rejection of supervision, emotional collapse, collapse of social supports, and substance abuse), all of which are predictive of general recidivism. For predicting sexual or violent recidivism, however, a subscale of only four items is used (the first four listed above; Hanson et al., 2007). Some subsequent analyses have suggested that the four items of the sex/violence subscale represent more of an approach trajectory toward offending, whereas the three additional items are more indicative of an emotional collapse/avoidant trajectory toward offending (Babchishin, 2013). Scores on the sex/violence subscale can range from 0 to 12, whereas the total of the general recidivism scale can range from 0 to 21, with higher scores indicating a higher likelihood of recidivism. The cut scores for the sex/violence subscale are 0 = low, 1 = moderate, and 2+ = high imminent recidivism risk. For the general recidivism scale, the recommended cut scores are 0 = low, 1–2 = moderate, and 3+ = high.

In the development study, the interrater agreement for the individual Acute items ranged from good to excellent, with a median ICC of 0.90. Feedback from users suggested that the brevity of the item descriptions in the coding manual might be contributing to subjective variability in scoring some items; consequently, a new manual with more comprehensive item descriptions and examples for item scoring is in development. Both the general scale (AUC = 0.72) and the sex/violence subscale (AUC = 0.74) showed a high ability to differentiate between imminent sexual recidivists and non-recidivists in the development sample (Hanson et al., 2007), though the three extra items of the general scale did not predict sexual recidivism. The sex/violence subscale significantly predicted imminent (within 45 days) sexual, violent, and any recidivism after controlling for the combined Static-99/Stable-2007 categories, whereas the general recidivism Acute score added incrementally only to the prediction of violent and general recidivism. Accordingly, the authors constructed specific rules for combining static, stable, and acute factors into three priority levels. For risk communication, relative risk ratios for sexual recidivism within 45 days, based on combined Static-99, Stable-2007, and Acute-2007 scores, are presented for the three priority levels. Recently, Babchishin (2013) investigated the temporal stability of the factor structure of the Acute-2007 and found that observed changes could be attributed to true changes in the risk-relevant propensities assessed by the Acute-2007, rather than to measurement error.

Violence Risk Scale-Sexual Offender Version (VRS-SO)

The VRS-SO (Wong, Olver, Nicholaichuk, & Gordon, 2003) is a 24-item interview- and file-review-based instrument composed of 7 static items (e.g., age at release, prior sex offenses, unrelated victim) and 17 dynamic items, which are scored on a 4-point Likert-type scale ranging from 0 to 3, with higher scores indicating increased risk for sexual recidivism. Factor analysis of the dynamic items generated three factors labeled sexual deviance (α = 0.87; e.g., deviant sexual preference, offense planning, sexual compulsivity), criminality (α = 0.79; e.g., impulsivity, substance abuse, compliance with community supervision), and treatment responsivity (α = 0.72; e.g., insight, treatment compliance, cognitive distortions). The first two factors are consistent with the two major constructs related to sexual reoffending discussed above. All 24 items are used to assess recidivism risk; however, the VRS-SO was also designed to integrate sex offender risk assessment and risk reduction through treatment. Therefore, the dynamic items are used to identify treatment targets and to measure change. Change is measured on the basis of a modified application of the transtheoretical construct of stages of change (SOC; Prochaska, DiClemente, & Norcross, 1992). Progression through the SOC is taken to indicate the extent to which the offender has improved (i.e., changed). Treatment targets (i.e., dynamic items rated 2 or 3) are therefore given a SOC rating at pre- and posttreatment, and the two ratings are compared to quantify change (Olver, Wong, Nicholaichuk, & Gordon, 2007).

The developers of the scale investigated the psychometric properties of the VRS-SO (Olver et al., 2007). Good to excellent interrater reliability was found for the pretreatment (ICC = 0.74) and posttreatment (ICC = 0.79) dynamic item total scores. The predictive accuracy of the VRS-SO total score for sexual recidivism was found to be high at both pretreatment (AUC = 0.71) and posttreatment (AUC = 0.72). Both the VRS-SO static and dynamic item total scores also made unique contributions to the prediction of sexual recidivism after controlling for Static-99. These findings were replicated in an independent validation study (Beggs & Grace, 2010). One limitation of this research is that, as with other scales, Olver, Beggs Christofferson, and Wong (2015) found significant variability in the recidivism rates of two samples even after controlling for the VRS-SO pretreatment score. Such variability poses a challenge for the creation of generalizable recidivism estimates.

Importantly, therapeutic change (i.e., positive change on the dynamic items) was found to be significantly related to reductions in sexual recidivism after controlling for risk and follow-up time (Beggs & Grace, 2011; Olver et al., 2007, 2014). In their most recent risk communication efforts, Olver et al. (2015) applied to the VRS-SO an intuitively useful method of conceptualizing and communicating change: the Clinically Significant Change model, which evaluates offenders’ change relative to external standards of what is “functional” and takes into account whether the change is reliable (i.e., likely reflects more than measurement error). Using this technique, the authors found that Clinically Significant Change provided some unique information in predicting recidivism beyond pretreatment risk scores, and they offered examples of how this approach can facilitate risk communication.
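
The general logic of this approach can be sketched using the widely cited Jacobson–Truax framework: a change is “reliable” if it exceeds what measurement error alone would plausibly produce, and “clinically significant” if the posttreatment score also crosses a functional cutoff. The sketch below is an illustration under that framework only, with invented normative values, and may differ in detail from the parameters Olver et al. (2015) adopted.

```python
import math

# Illustrative Jacobson-Truax-style reliable / clinically significant change.
# The normative values (sd, reliability, functional cutoff) below are
# hypothetical placeholders, not parameters from Olver et al. (2015).

def reliable_change_index(pre: float, post: float, sd: float, reliability: float) -> float:
    """RCI = (post - pre) / standard error of the difference."""
    sem = sd * math.sqrt(1.0 - reliability)   # standard error of measurement
    s_diff = sem * math.sqrt(2.0)             # standard error of the difference
    return (post - pre) / s_diff

def clinically_significant_change(pre: float, post: float, sd: float,
                                  reliability: float, functional_cutoff: float) -> bool:
    """Reliable improvement (RCI beyond -1.96; lower scores = less risk) that
    also crosses the 'functional' cutoff."""
    rci = reliable_change_index(pre, post, sd, reliability)
    return rci <= -1.96 and post <= functional_cutoff

# Hypothetical dynamic-score example: lower scores indicate lower risk.
print(clinically_significant_change(pre=30.0, post=18.0, sd=8.0,
                                    reliability=0.9, functional_cutoff=20.0))  # True
```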

Survey Findings: What Is Used in Applied Practice?

Several surveys have been conducted to assess practical applications of risk assessment (e.g., which scales are used and how the information is incorporated). Examining 111 risk assessment reports for preventative detention hearings in Canada (intended for offenders at high risk of violent recidivism), Blais and Forth (2014) found that over 90 % of experts (appointed by either the prosecution or the court) used an actuarial risk assessment scale, compared to 53 % who used an SPJ scale. The PCL-R (Psychopathy Checklist-Revised), which was designed to assess the construct of psychopathy rather than as a risk assessment scale, was used in over 95 % of risk assessment reports. Among scales designed to assess risk of recidivism, the most commonly used was the Static-99, used in over 60 % of cases, which is surprising given that not all candidates for preventative detention are sex offenders. The next most commonly used scales were the VRAG (Violence Risk Appraisal Guide; 48 % of reports) and the SORAG (Sex Offender Risk Appraisal Guide; 42 % of reports), both of which are actuarial. Other risk scales were used in one quarter or fewer of cases.

In a particularly large study, Singh and colleagues (2014) surveyed 2135 mental health professionals who had conducted at least one violence risk assessment. Half of the respondents were from Europe, followed by 21 % from North America, 5 % from Australasia, and 3 % each from South America and Asia. Among this diverse sample, over 400 different instruments were reported as being used for violence risk assessment, although roughly half had been developed specifically for personal or institutional use only. Among the 12 most frequently used risk scales, half were actuarial and half were SPJ, with the HCR-20 (Historical Clinical Risk Management 20, an SPJ scale) reported as the most commonly used, followed by the PCL-R.

Neal and Grisso (2014) surveyed 434 psychologist and psychiatrist members of various professional associations, mostly from the United States, Canada, Europe, Australia, and New Zealand, who described 868 cases they had completed. The most common types of referrals these professionals dealt with included competence to stand trial, violence risk, sex offender risk, insanity, sentencing, disability, child custody, civil commitment, child protection, and civil tort. Use of structured assessment tools (a category broader than risk assessment tools that can include, for example, personality assessments) varied by the type of assessment being conducted, with the lowest rates of structured tool use reported for competence to stand trial cases (58 %), disability cases (66 %), and civil tort cases (67 %). Sex offender risk cases were most likely to involve structured tools (97 %), followed by child protection cases (93 %) and violence risk cases (89 %).

Among sex offender risk cases, Neal and Grisso (2014) found that the most frequently used tools were by far the Static-99/R and Static-2002/R (which were grouped together in the survey), used in 66 % of cases. The next most commonly used tools were all either designed to assess a single construct or were personality assessments; none were designed for sex offender risk assessment. These included the PCL-R (35 % of cases), the Minnesota Multiphasic Personality Inventory (MMPI; 27 % of cases), the Personality Assessment Inventory (PAI; 23 % of cases), and the Millon Clinical Multiaxial Inventory (MCMI; 17 % of cases). Other sexual or violent risk assessment scales, such as the Sexual Violence Risk-20 (SVR-20), Risk for Sexual Violence Protocol (RSVP), Stable-2007, SORAG, and VRAG, were used in fewer than 15 % of cases. Note that the SVR-20 and RSVP are SPJ scales, whereas the others are actuarial. Similar results were found in a survey of American psychologists conducted by Archer, Buffington-Vollum, Stredny, and Handel (2006). For adult sex offender risk assessments, the Static-99 was still the most commonly used scale (mentioned by roughly half of participants), but with a smaller margin over other frequently used scales, which included the SVR-20, Minnesota Sex Offender Screening Tool-Revised (MnSOST-R), Rapid Risk Assessment for Sex Offense Recidivism (RRASOR), and the SORAG. Note that the Stable-2007 did not exist when this survey was completed. These findings mirror survey results from sex offender civil commitment evaluators (Jackson & Hess, 2007) and sex offender treatment programs (McGrath et al., 2010), which also found the Static-99 to be the most commonly used risk scale by a wide margin. Additionally, among treatment programs, dynamic risk scales were being more widely adopted, with the Stable-2007 the most frequently used (McGrath et al., 2010).

Other important survey findings pertain to how experts use information from risk scales. For SPJ scales, the only information available is a nominal risk category (with the exception of the SARA, which provides some percentile information, although not for the final risk judgment; Kropp & Gibas, 2010). For actuarial scales, it is possible to report absolute recidivism estimates, and some scales also provide percentiles or nominal risk ratios. In their study of Canadian preventative detention hearings, Blais and Forth (2014) found that over 95 % of risk assessment reports mentioned a nominal risk level. For actuarial scales, roughly two thirds of reports mentioned a total score, 37 % reported a percentile, and over 90 % reported absolute recidivism estimates. For SPJ scales, although these scales are not intended to have their risk factors summed, 24 % of reports nonetheless included a mechanical total score from the scale. In a more recent survey of 109 experts who use the Static-99R in Sexually Violent Predator evaluations in the United States (Chevalier et al., 2014), 83 % included nominal risk categories and absolute recidivism estimates in their reports, whereas 35 % included percentiles and 33 % included risk ratios. When asked to rank the importance of the various risk communication metrics, 54 % of the evaluators reported that absolute recidivism estimates provided the most important information about recidivism risk, compared to 25 % who felt the nominal risk categories provided the most important information.

Clinical Advantages to Actuarial Risk Assessment

Psychologists have been instrumental for more than a century in developing, validating, refining, and implementing scientifically rigorous procedures that have advanced our understanding of psychological constructs and our prediction of future behavior. Evidence-based practice, or the practice of providing services that have empirically demonstrated effectiveness for each client’s needs, has become the standard among clinicians and within most organizations and has extended into the field of assessment. Hunsley and Mash (2010) note that evidence-based assessment “relies on research and theory to guide the selection of constructs to be assessed for a specific assessment purpose, the methods and measures to be used in the assessment, and the manner in which the assessment process unfolds” (p. 7). In the area of correctional intervention, the use of evidence-based assessment tools such as actuarial risk measures is the first step in a comprehensive evidence-based approach, which includes assessing the client, formulating a case conceptualization, determining the client’s needs, deciding on and implementing a program of treatment, and monitoring and evaluating the outcome.

Evidence-Based Practice in Correctional Settings

There is extensive research into the basic principles that human services should adhere to in order to have the greatest positive impact. Within correctional work, research shows that the more closely a program adheres to the risk, need, and responsivity principles, the more effective it is in reducing recidivism, whereas programs that do not incorporate these principles can potentially increase recidivism (Dowden & Andrews, 2004; Flores, Russell, Latessa, & Travis, 2005; Lowenkamp, Pealer, Smith, & Latessa, 2006; Smith & Schweitzer, 2012; Wormith, Althouse, Reitzel, Fagan, & Morgan, 2007). Specifically, intervention is most effective when its intensity is proportional to offender risk (the risk principle), when it focuses on criminogenic needs (the need principle), and when it is matched to the learning style and needs of the offender (the responsivity principle).

Consequently, evidence-based assessment is a critical first component of effective correctional intervention (i.e., it addresses the first two principles: risk and need). As part of that approach, risk assessment tools can “facilitate decisions about the intensity of intervention in accordance with risk needs responsivity (RNR) principles” (Hilton, 2014, p. 88), thus maximizing intervention effectiveness. However, Andrews and Dowden (2005) note that inconsistencies or a lack of implementation integrity across providers are related to differences in program outcomes. Risk assessment tools, like any part of an evidence-based intervention, must be implemented with integrity to be maximally effective. For example, two field studies examining the real-world utility of the Static-99 show remarkable variability. In Texas, the Static-99 demonstrated minimal accuracy in predicting sexual recidivism (AUC = 0.57; Boccaccini, Murrie, Caperton, & Hawes, 2009). In contrast, California implemented the scale with rigorous training, mentoring, and ongoing quality control policies (e.g., mandatory recertification of users) and reported exceptionally high predictive accuracy (AUC = 0.82; Hanson, Lunetta, Phenix, Neeley, & Epperson, 2014). The discrepancy between these two American jurisdictions highlights the importance of implementation integrity. Additionally, Hanson et al. (2014) found meaningfully higher predictive accuracy for actuarial risk scales scored by front-line staff who were more committed to the project (defined as those who completed all the requested information). For additional suggestions on best practices for quality control, see Fernandez and colleagues (2014).

In the second half of this chapter, we argue that actuarial measures form a critical part of evidence-based practice and particularly enhance program integrity by providing a standardized and structured approach to the critical first step (assessment) of any correctional intervention. We focus on the advantages of actuarial measures for implementing an effective evidence-based intervention program within a clinical practice, forensic setting, or organization. Adapting Bernfeld, Blase, and Fixsen’s (1990) “multilevel systems perspective,” we discuss the strengths and usefulness of actuarial risk assessment instruments in clinical practice across the four levels important to human service delivery: the client, program, organizational, and societal levels.

The Client Level

Actuarial risk assessment has the potential to offer several direct advantages to the client, including opportunities for a collaborative working relationship with the assessor, an introduction to the therapeutic relationship and to the concept of risk, identification of treatment targets, and the best possible match between the client and the appropriate type of treatment. Shingler and Mann (2006) note that risk assessment offers a unique collaborative opportunity to build rapport and set the stage for subsequent intervention. The first step of their sexual offender intervention program, the Structured Assessment of Risk and Need (SARN; Webster et al., 2006), specifically integrates collaboration into the risk assessment process. Their in-house training encourages assessors to approach the risk assessment as a critical first step in the treatment process and emphasizes that offenders’ experience during a risk assessment can heavily influence their desire to engage in treatment and their trust in the process. Offenders themselves have emphasized that contributing to the assessment, and getting their side represented, is important to their sense of fairness and their confidence in the outcome of the risk assessment process (Attrill & Liell, 2007). A thorough assessment at the front end of treatment, using measures that identify factors empirically related to recidivism, can help focus the client on the issues they must learn to identify and cope with in order to reduce their risk of recidivism (Proulx, Tardiff, Lamoureux, & Lussier, 2000), saving both time and effort as they move through the rehabilitative process. A collaborative approach to risk assessment, particularly one in which risk factors are thoroughly explained and the client contributes to identifying their most relevant treatment needs, gives clients a sense of some control over their assessment and subsequent treatment, in contrast to feeling that assessment and intervention are something done “to them” (Attrill & Liell, 2007; Shingler & Mann, 2006). A structured approach to matching client risk and needs to treatment level can also contribute to a sense of “fairness” in risk assessment, another area identified as important to offenders (Attrill & Liell, 2007).

Although little research has examined offenders’ perceptions of risk assessment, Attrill and Liell (2007) interviewed 60 adult sexual offenders regarding their views of risk assessment. A consistent finding in these discussions was offenders’ concern about the level of skill and training of the professionals completing the assessments. Actuarial measures offer an advantage in this respect: the relevant risk factors are empirically related to recidivism and are combined using a defined weighting scheme. This structured system for weighting items can make it clear to the client that the assessor’s personal biases, level of experience, and skills do not directly influence the assessed level of risk. This contrasts with SPJ tools, which encourage professionals to rely on their experience and skills to examine the risk factors present and determine an overall risk level without a specific structure for combining those factors (Skeem & Monahan, 2011). The structure associated with actuarial tools therefore has the potential to provide offenders with a sense of consistency, transparency, and evenhandedness in the outcome, regardless of the real or perceived qualifications of the assessor.

Critics of actuarial measures note that the specified structure of actuarial tools necessarily limits the “individuality” of risk assessments, a concern also voiced by offenders themselves (Attrill & Liell, 2007). However, as noted earlier in this chapter, the move in recent years toward integrating dynamic risk factors with static risk factors provides room for individualization within the overall risk assessment while maintaining the consistency necessary for defensible implementation integrity. Further, dynamic risk factors allow more attention to positive attributes or strengths, which may foster the therapeutic relationship and help in establishing approach rather than avoidance treatment goals (Mann, Webster, Schofield, & Marshall, 2004). As such, we would argue that actuarial tools, when implemented well, provide the structure and consistency necessary for strong program integrity and limit variability due to assessor experience and knowledge, while still allowing for individuality in the overall assessment.

The Program Level

As described in the first half of this chapter, actuarial risk assessment evolved as an alternative to UCJ, which is widely recognized as less accurate, unreliable, and non-replicable. In fact, concerns about the predictive validity of clinical judgment have resulted in the mandated use of actuarial measures within some organizations (e.g., the SIR-R used at intake by the Correctional Service of Canada; the Structured Assessment of Risk and Need used by HM Prison Service) and legal jurisdictions. Critics of UCJ note that, given its subjective nature, it is difficult to standardize judgments made by a single clinician over time, let alone judgments made by multiple clinicians within one setting. Larger practices and organizations that employ multiple clinicians often face considerable variability in staff members’ prior training and experience. In more isolated or rural areas, clinicians may be called upon to provide assessments only on rare occasions, meaning they bring limited knowledge and expertise to the assessment. The experience and knowledge required to use structured professional judgment tools appropriately and effectively may simply not exist, or may be unrealistic to expect, in these circumstances.

An advantage of actuarial measures, as previously stated, is that they provide clear direction regarding not only the relevant factors but also how to combine those factors into an assessed risk level. Within an intervention program, the detailed manuals that accompany many actuarial tools contribute to consistency in application, can serve as a guide against which assessments are audited, provide a training base for new employees, and can minimize the program “drift” that may otherwise occur when clinicians are left to make decisions without structured direction. Not only do these manuals provide a framework for appropriate training and skill acquisition for clinicians involved in an evidence-based program, but clinicians also report enhanced confidence in assessments based on actuarial measures (A. Schweighofer, personal communication, 2014). In Neal and Grisso’s (2014) survey, the second most common reason psychologists and psychiatrists cited for using structured tools in risk assessment, after ensuring an evidence-based method, was “to improve the credibility of my assessment.” The third most common reason was “to standardize the assessment,” indicating that clinicians themselves perceive value in ensuring that risk assessments have consistent meaning across clinicians, sites, and organizations. Thus, there are substantial advantages to including actuarial tools in evidence-based programs in terms of training, consistency, and implementation integrity.

The Organizational Level

Leschied, Bernfeld, and Farrington (2001) note that there is political and sometimes philosophical opposition to “what works” in effective correctional interventions, and managerial doubts can undermine the impact and effectiveness of a program. A good defense is to rely on tools with strong empirical support and demonstrated consistency and replicability, which leaves less room for argument. Actuarial risk assessment tools meet four goals critical to any organization managing offenders: (1) they identify the level of risk for an individual within a group or population, (2) they identify salient contributing risk factors that are appropriate targets for intervention (assuming dynamic risk assessment is used), (3) they identify strategies that manage or minimize risk, and (4) they communicate risk information (Mills, Kroner, & Morgan, 2011).

The identification of risk level within a population, along with the contributing risk factors, appropriate treatment targets, and strategies for managing risk, has the potential to directly impact policy in relation to management and intervention of offenders within an organization. A clear management framework and consistent structure for handling offenders within an organization (based on their risk assessment results) should result in time and resource efficiencies. Additionally, a standardized approach facilitates the identification of, planning for, and streamlining of staff training needs. Good quality staff development and training along with subsequent supervision can balance inequality in prior qualifications, knowledge, and skill level among staff members (Mann, Fernandez, & Ware, 2011).

Additionally, the fourth goal, risk communication, is critical to the ethical and appropriate management of offenders within an organization. Mills and Kroner (2006) found that risk communicated using high, moderate, and low categories was overestimated by recipients, even when the base rate of offending was provided. They note that subjective risk categories lack “solid empirical meaning” and may lead to under- or overestimates of risk, resulting in suboptimal allocation of resources to offenders managed within the organization. As noted earlier, actuarial measures typically provide multiple ways to quantify risk, including recidivism estimates, percentiles, and risk ratios, along with nominal risk categories. An advantage of actuarial measures is therefore that they provide a common language for risk communication: with appropriate training, risk communication holds the same meaning for everyone within the organization, including decision makers, and can directly inform resource allocation.
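
As a purely illustrative sketch of how these metrics relate to one another, the code below converts a raw score into a percentile rank, an absolute recidivism estimate, and a risk ratio relative to a reference score. The normative score distribution and recidivism estimates are invented for the example and are not published norms for any instrument.

```python
# Illustrative conversion of a raw actuarial score into the risk-communication
# metrics mentioned above: percentile rank, absolute recidivism estimate, and
# a risk ratio relative to a reference score. The normative distribution and
# recidivism estimates below are invented; they are NOT published norms.

from bisect import bisect_left, bisect_right

NORM_SCORES = sorted([0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8])      # hypothetical normative sample
RECIDIVISM_ESTIMATE = {0: 0.03, 1: 0.04, 2: 0.06, 3: 0.08, 4: 0.11,
                       5: 0.15, 6: 0.20, 7: 0.26, 8: 0.33}             # hypothetical 5-year estimates

def percentile_rank(score: int) -> float:
    """Midpoint percentile: percentage below plus half the percentage at the score."""
    below = bisect_left(NORM_SCORES, score)
    at = bisect_right(NORM_SCORES, score) - below
    return 100.0 * (below + 0.5 * at) / len(NORM_SCORES)

def risk_ratio(score: int, reference_score: int) -> float:
    """Recidivism estimate at this score relative to the reference score."""
    return RECIDIVISM_ESTIMATE[score] / RECIDIVISM_ESTIMATE[reference_score]

score = 5
print(f"percentile rank: {percentile_rank(score):.0f}")                     # 71
print(f"absolute estimate: {RECIDIVISM_ESTIMATE[score]:.0%}")               # 15%
print(f"risk ratio relative to a score of 2: {risk_ratio(score, 2):.1f}")   # 2.5
```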

The Societal Level

Controversy about the use of actuarial risk assessment has focused primarily on its use in decisions related to incarceration (e.g., civil commitment) and release (e.g., parole). There is less controversy over the use of risk assessment in treatment planning or the identification of treatment needs using dynamic risk assessment measures, as discussed above. There is very little empirical research on how actuarial risk estimates are consumed by decision makers (Scurich, Monahan, & John, 2012). Identified concerns include that decision makers may be misled into thinking that actuarial tools are more precise than they in fact are (Campbell, 2007), and that the tools may consequently exert excessive or inappropriate influence on decisions that directly affect offenders’ lives. However, this concern does not appear to be supported in recent research. For example, offenders referred for full SVP evaluations tend to have higher risk-measure scores than those who are not referred; mental health evaluators are more likely to conclude that an offender meets the criteria for civil commitment when risk scores are high; and attorneys are more likely to select cases for trial when risk scores are high (Boccaccini et al., 2009; Levenson, 2004; Murrie, Boccaccini, Rufino, & Caperton, 2012), suggesting that actuarial risk scores play an appropriate and essential role in determining whose cases eventually come before judges and jurors (see Boccaccini, Turner, Murrie, Henderson, & Chevalier, 2013).

Once at trial, however, research suggests that mock jurors asked to make decisions in SVP cases are more influenced by testimony based on clinical judgment than by testimony based on risk assessment instruments, and they do not perceive actuarial testimony to be any more scientific than clinical testimony (Krauss, McCabe, & Lieberman, 2012; McCabe, Krauss, & Lieberman, 2010). Similarly, Boccaccini et al. (2013) found that risk-measure scores had little impact on real jurors surveyed after trial in Texas SVP cases. The authors posited that jurors may perceive most offenders eligible for SVP commitment (most of whom are identified through actuarial measures) as “dangerous enough,” or that jurors have retributive motives rather than being concerned with “protecting the public.” Regardless of the explanation, it appears that actuarial measures serve an important purpose at the front end of this process (i.e., helping to ensure that the most restrictive measures are applied to the highest risk offenders), whereas idiosyncratic features may have more influence during the actual legal proceedings. Neal and Grisso (2014) make the interesting argument that current forensic training, which encourages an overly flexible approach to assessment, may be a liability in that it interferes with courts’ ability to use risk assessment information appropriately because courts are “required to become familiar with a bewilderingly wide range of tools” (p. 1417). The authors suggest that this problem could be minimized if clinicians were trained to select tools that are both appropriate to the referral question and have the best psychometric properties.

As we have noted previously, to be valuable, risk assessment results must be communicated to consumers in a clear and comprehensible manner (Heilbrun, Dvoskin, Hart, & McNeil, 1999). A reliable and valid risk assessment is of no use, and may in fact be “worse than useless,” if decision makers misapprehend the results (Heilbrun et al., 1999, p. 94). Interestingly, one study found that “unpacking” actuarial violence risk estimates (i.e., explicitly articulating the extent to which individual risk factors contribute to the overall risk) appeared to help subjects identified as “innumerate” to interpret the results of actuarial risk assessments and to apply group-level risk estimates to the individual case more effectively (Scurich et al., 2012). Given the stakes involved in legal dispositions, we would argue that experts communicating actuarial risk assessment results in high-stakes circumstances have a particular ethical obligation to precede the sharing of results with appropriate education on the meaning of risk.
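
One simple way to “unpack” a total in the sense described above is to present each item’s contribution alongside the total and the nominal category it maps to. The sketch below does this for hypothetical items and cut scores that are not drawn from any particular instrument; it illustrates the communication idea only, not the procedure used by Scurich et al. (2012).

```python
# One simple way to "unpack" an actuarial total for a lay audience:
# list each risk factor's contribution alongside the total and the nominal
# category it maps to. Items, scores, and cut scores are hypothetical.

ITEM_SCORES = {          # hypothetical scored items for one case
    "prior sex offenses": 2,
    "stranger victim": 1,
    "never lived with a partner": 1,
    "young age at release": 0,
}
CATEGORY_CUTS = [(1, "low"), (3, "moderate"), (99, "high")]  # hypothetical cut scores

def unpack(item_scores: dict) -> str:
    """Return a plain-language breakdown of item contributions and the total."""
    total = sum(item_scores.values())
    category = next(label for cut, label in CATEGORY_CUTS if total <= cut)
    lines = [f"  {name}: +{points}" for name, points in item_scores.items() if points > 0]
    lines.append(f"  total = {total} ({category} category)")
    return "\n".join(lines)

print(unpack(ITEM_SCORES))
```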

Conclusions

While controversy remains regarding the use of actuarial risk assessment, actuarial measures continue to provide the most accurate information available, including for legal decision-making (Heilbrun, 1997). Critics argue that because actuarial tools do not account for individual differences within their schemes, clinicians are unable to modify the level of risk based on mitigating factors, and therefore a substantial margin of error is inherent in actuarial measures (Hart & Cooke, 2013; Hart, Michie, & Cooke, 2007). It should be noted, however, that the statistics employed by Hart and colleagues (2007) do not support their position that group data cannot meaningfully inform inferences about individuals (e.g., Hanson & Howard, 2010; G. T. Harris, Rice, & Quinsey, 2008; Mossman & Sellke, 2007; Scurich & John, 2012). It also remains to be determined whether the posited limitations on individuality produce greater error than clinical overrides based on individual items, as applied in SPJ. Further, individuality can be incorporated (at least to some extent) into risk assessment by adding actuarial measures of dynamic risk factors and by ensuring that the risk assessment process involves collaboration with the offender. Good risk assessment should use actuarial risk estimates, implemented with integrity, as an “anchor,” alongside other measures that capture factors relevant to risk management. Actuarial measures do not replace a clinician’s integration and synthesis of information or the selection and implementation of a plan of therapeutic action; rather, they can contribute to each aspect of the process. In other words, “scoring an actuarial risk tool is not a risk assessment” (Hanson, 2009, p. 174).

While some of the advantages of actuarial scales described in this chapter are currently being realized, not all of them are being maximized by the clinicians, programs, and organizations that implement actuarial risk measures. When asked, offenders often report a poor understanding of risk assessment and its benefits to them, and little sense of control over, or impact on, the process (Attrill & Liell, 2007). Further, although many newer risk assessment tools include dynamic risk factors, there continues to be a lack of focus on strength or protective factors in the risk assessment process (Wilson, Desmarais, Nicholls, & Brink, 2010). Wilson et al. note that strengths are not simply the opposite of deficits but capture unique information. This appears to be the next step in risk assessment research.

We also acknowledge that, while the importance of consistency and reliability in risk assessment cannot be overemphasized, actuarial measures work best when the offender being assessed has characteristics similar to those of the measure’s development and validation samples. Regardless of how the measure is chosen (by the clinician, as part of a standardized program, or as mandated by legislation), it is up to the clinician to ensure that the measures used are appropriate to the client being assessed. Actuarial measures are not appropriately applied to every client, and there are circumstances in which the current state of the research means that clinical judgment remains the only option. In the majority of cases, however, anchored risk assessment as part of a comprehensive “case conceptualization” should be used to inform intervention at a more individualized level. In our estimation, the integrated-actuarial approach to risk assessment, when implemented with thought and integrity, holds valuable clinical advantages while leaving sufficient room for individualization.