Sexual harassment remains a persistent and serious problem for Human Resource (HR) professionals, corporate and public policy makers, and the legal profession. Although the Equal Employment Opportunity Commission’s (EEOC) original intent when it defined sexual harassment as a form of sex discrimination under Title VII (EEOC 1980) was to eliminate the phenomenon in the U.S., one could argue that it simply revealed the pervasiveness of sexual harassment in the modern workplace. Indeed, over 11,000 sexual harassment claims were filed with the EEOC in 2011 alone, costing employers over 50 million dollars in monetary payouts (EEOC 2011). Notably, this figure does not include additional monies won by claimants in litigation, nor does it include attorneys’ fees, internal investigation costs, or lost productivity costs. Thus, we can safely assume that the actual cost of sexual harassment to employers is substantially higher.

Current U.S. sexual harassment law effectively makes it management’s responsibility to implement programs reasonably calculated to prevent harassment and to correct it if it occurs, or else face a heightened likelihood of liability. A common element of harassment prevention programs is training. In many respects, harassment training has become effectively, if not explicitly, legally required. Yet there is very little empirical evidence that training is an effective preventative measure (Bisom-Rapp 2001a, b; Dobbin and Kelly 2007). Following Wiener and Hurt’s (1997, 1999) call for a “social analytic jurisprudence” wherein psycho-legal scholars examine the underlying assumptions in law and bring social science research to bear in testing those assumptions, we examine the underlying assumption regarding training’s efficacy in harassment prevention. More specifically, we explore the extent to which training of managers sensitizes them to sexual harassment and, more importantly, enables them to accurately identify harassment and recommend an appropriate action in response.

Our study is grounded firmly in the context of U.S. law where sexual harassment is conceptualized as a form of employment discrimination for which the employer may be held liable. While it may have some application to other countries, it will be less relevant for countries such as Germany (Zippel 2006) or France (Saguy 2003) where sexual harassment is treated as an assault on dignity or a form of sexual violence so that individual offenders might be held liable, but not employers. As a result, German and French employers have taken little action to prevent or correct workplace harassment.

Sexual Harassment Training and the Law

There are at least four ways in which U.S. law has encouraged or effectively mandated sexual harassment training. First, and most important, developing case law interpreting and applying Title VII of the Civil Rights Act has elevated training to an effective defense to charges of harassment in many cases. Second, EEOC Guidelines explicitly call for training. Third, high profile settlement agreements and consent decrees typically incorporate a training component. Finally, at the state level, many state legislatures or governors have enacted statutes or issued executive orders, respectively, that mandate training.

Prevention of discrimination has long been considered one of the primary purposes of Title VII of the Civil Rights Act. At least since the Supreme Court’s explicit endorsement of the law’s preventive aims in Albemarle Paper Co. v. Moody (1975), HR professionals have “recommended two arrows from their professional quiver: grievance procedures and training” (Dobbin and Kelly 2007, p. 1203). As sexual harassment came to be understood as unlawful discrimination, grievance procedures and training seemed especially apt as preventive tools. Grievance procedures might serve to keep complaints internal, encourage informal resolution of complaints and, over time, perhaps reduce the number of complaints and reduce discrimination itself. Finally, it was suggested that “if external complaints were filed, courts would look favorably on organizations that had taken steps to provide internal due process” (Edelman et al. 1999, p. 413). Similarly, training advocates suggested that training could sensitize employees regarding tolerable and intolerable forms of workplace social-sexual conduct, foster greater tolerance and, ultimately, alter workplace behavior and culture in ways that “can prevent, or at least greatly curb, sexual harassment” (Bisom-Rapp 2001b, p. 148). Again, if external harassment complaints were filed, a strong training program might provide evidence of good-faith efforts at prevention. Popovich (1988) summarized four steps that employers could take to deal with sexual harassment in the workplace. These included developing a clear sexual harassment policy, developing and articulating a grievance procedure, educating employees concerning sexual harassment, and providing support to harassment victims.

In 1998 and 1999 the U.S. Supreme Court issued three important decisions dealing with workplace sexual harassment. The three cases—Faragher v. City of Boca Raton (1998), Burlington Industries v. Ellerth (1998), and Kolstad v. American Dental Association (1999)—have important implications for sexual harassment training and mark the culmination of a long-term trend in the law such that sexual harassment training is now effectively legally required.

Faragher and Ellerth were companion cases, issued on the same day, and addressed questions concerning employer liability. The Court ruled that, in cases where managers harass, but no tangible employment action against the victim results, employer liability is subject to an affirmative defense. Specifically, the employer can avoid liability if it can prove “(a) that the employer exercised reasonable care to prevent and correct promptly any sexually harassing behavior; and (b) that the plaintiff employee unreasonably failed to take advantage of any preventative or corrective opportunities provided by the employer or to avoid harm otherwise” (quote appears in both Faragher (p. 807) and Ellerth (p. 765)). The following year, the Supreme Court ruled in Kolstad v. American Dental Association (1999) that the award of punitive damages in a proven disparate treatment case should consider whether discriminatory employment decisions of managerial agents were contrary to the employer’s “good-faith efforts” to comply with Title VII. Using language similar to the “reasonable care” requirement provided by Faragher and Ellerth, the Court ruled in Kolstad that employers may avoid punitive damage awards if they can show that they acted with a “good faith effort” to prevent the harassment and discrimination. Specifically, the Court emphasized the necessity for employers to “educate their personnel” on harassment and discrimination if they wish to avoid punitive damages.

Training is a traditional tool of choice where employers wish to take, or demonstrate, “reasonable care to prevent” and “good faith efforts.” Evidence of effective training programs can now eliminate employer liability for workplace harassment in many situations and allow employers to avoid imposition of punitive damages even in cases of proven harassment and discrimination. Writing shortly after the Faragher and Ellerth decisions, DiLorenzo and Harshbarger (1999) anticipated a much expanded role for training in demonstrating the employer’s “reasonable care to prevent” harassment. Similarly, Buchanan and Wiswall (1999) speculated that the courts would require more than an effective written policy to demonstrate reasonable care and recommended that employers provide extensive training, including training for all new hires, periodic training for all employees, and enhanced training for supervisors. While the Supreme Court has not gone so far as to explicitly mandate training, it has effectively made training a duty for at least larger employers.

The year following the Faragher and Ellerth decisions, the EEOC issued new policy guidance suggesting the employer “provide training to all employees to ensure that they understand their rights and responsibilities” and further recommended periodic supervisory training to “explain the types of conduct that violate the employer’s anti-harassment policy; the seriousness of the policy; the responsibilities of supervisors and managers when they learn of alleged harassment; and the prohibition against retaliation” (EEOC 1999a).

Only a few courts have insisted that training actually be effective before it can shield the employer from liability. For example, in Madison v. IBP, Inc. (2001), the Eighth Circuit found IBP, Inc. liable in a sexual and racial harassment case even though the company provided preventative training. The court reasoned that the training provided was insufficient because it did not have the effect of getting the managers to follow the company’s anti-harassment policy. Such cases suggest that it may not be enough for employers to simply provide training, but employers may need to demonstrate the effectiveness of their training programs in cultivating appropriate managerial responses to incidents of sexual harassment. The fact that only a few courts have been willing to examine the issue of training effectiveness, however, suggests the difficulty in distinguishing between effective and ineffective training programs.

Training is also frequently negotiated as a component of a settlement agreement or consent decree submitted for approval by a court (Bisom-Rapp 2001a; Mathiason and de Bernardo 1998). Several high-profile settlements occurred in the years surrounding the Faragher, Ellerth, and Kolstad decisions. For example, in 1995 the EEOC, on behalf of eight former employees, sued Sears-Roebuck for sexual harassment, retaliation, and constructive discharge. An agreement to provide sexual harassment training to employees for a two-year period was part of the court-approved settlement of the lawsuit (Mathiason and de Bernardo 1998). In 1996, the EEOC filed a class action against Mitsubishi Motors over the sexual harassment of over 300 women in the company’s Normal, Illinois plant. After the court approved the EEOC’s petition for class certification, negotiations ensued culminating in a court-supervised consent decree (EEOC v. Mitsubishi Motor Manufacturing of America 1998). In addition to $34 million in damages to victims, mandatory training on sexual harassment was part of the plan. Finally, in a 1999 consent decree, Ford Motor Company, in addition to providing $8 million in damages to victims of harassment, agreed to spend a projected $10 million to train its employees (EEOC 1999b).

But it is not only the high-dollar high-profile cases where training is made part of the settlement. In scanning the log of EEOC press releases for recent years, one can scarcely go a month without running across at least one or two announcements of the settlement of sexual harassment cases by consent decree. Even in cases where there may be only one or two victims and a damage award of a few thousand dollars, training for all employees in the affected workplace is almost invariably a part of the settlement. The pattern established by these court-approved consent decrees is clear. In addition to providing monetary damages for victims of harassment, employers typically agree to train all their employees on their rights to be free from sexual harassment, often providing enhanced training to supervisors and managers on their responsibilities in dealing with harassment. In conjunction with these measures, employers often agree to upgrade their anti-harassment policies and complaint procedures and agree to periodic reporting to the EEOC or even independent monitoring for a period of time.

Although federal law does not explicitly mandate that employers provide preventative training programs, state legislatures and governors are beginning to explicitly mandate such training. A recent example of this is in California (Coyle and Sumida 2005) where Assembly Bill No. 1825 requires all employers with 50 or more employees to provide mandatory sexual harassment training to their supervisors. The law specifies that training must be provided in an interactive setting using expert trainers and requires that training address state and federal statutes, remedies available to victims of harassment, and practical examples of harassment in the workplace.

At least 22 states have an explicit policy regarding sexual harassment training, though provisions vary widely. Most are embedded in state statute, but several states require training through a governor’s executive order. Policies in ten states merely encourage sexual harassment training, but twelve states require mandatory training for public sector employees, sometimes only for specified groups such as managers and supervisors. Only three states extend their training mandate to private sector employers. Policies in some states say little beyond making training encouraged or required. But several states go further, addressing issues such as the timing of training, minimum number of hours of training, or requiring that refresher training be provided periodically. Only a few states address training methods or content.

Currently, a majority of U.S. employers have adopted anti-harassment policies and grievance procedures, and an increasing number of employers are providing anti-harassment training (Dobbin and Kelly 2007). As early as 1994, the U.S. Merit Systems Protection Board (1995) reported that all federal agencies had a sexual harassment policy and a training program. By 1998, Dobbin and Kelly (2007) estimated that 19 of 20 large employers already had harassment grievance procedures and at least 7 of 10 provided training. On the eve of the Faragher and Ellerth decisions, Mathiason and de Bernardo (1998) asserted, “[t]he handwriting is on the wall about sexual harassment training, and the hand is writing a clear message. Required training is on the way” (p. 26).

Unfortunately, many of the claims for the efficacy of grievance procedures and training were advanced in an empirical vacuum. Dobbin and Kelly (2007) argue that HR professionals “exaggerated the risk faced by the average employer and exaggerated the legal protection to be found in grievance procedures and training” (p. 1206). Bisom-Rapp (2001a) reviewed the existing literature on sexual harassment training (which we also review below) and concluded, “[t]here is, in light of the available research, absolutely no scientific basis for concluding that harassment training fosters employee tolerance and greatly alters workplace culture” (p. 38). While not going so far as to suggest “absolutely no scientific basis,” others have acknowledged that the efficacy of sexual harassment training has not yet been demonstrated or even extensively studied (Beiner 2001; Fitzgerald and Shullman 1993; Grundman et al. 1997; Gutek 1997; Lengnick-Hall 1995; Moyer and Nath 1998; Pryor and McKinney 1995; Pryor and Whalen 1997).

Clearly, American law has come to view preventative training as a vital component in the effort to protect employees from workplace harassment. Yet, it is not clear that preventative training is up to this task (Bisom-Rapp 2001a, b; Dobbin and Kelly 2007). What is clear is that HR research and practice are lagging behind the legal developments in antidiscrimination law. In the decade and a half since the Faragher and Ellerth decisions, there has been very little research on the issue of sexual harassment training. The largely untested assumption lurking behind the law’s reliance on training is that employers know (or should know) what elements constitute an effective training program. To date, however, there is scant empirical evidence to support this assumption.

Efficacy of Sexual Harassment Training Programs

While social science offers few firm conclusions regarding the efficacy of anti-harassment training, the research literature on the topic is not quite as sparse as some critics have suggested (for reviews see Gutek 1997; King et al. 2011; Lengnick-Hall 1995; O’Leary-Kelly et al. 2009; or Wiener and Gutek 1999).

The earliest empirical studies of training effects began emerging around the time of Meritor v. Vinson (1986). Results were encouraging, at least in studies of university residence advisors. Beauvais (1986) and Thomann et al. (1989) administered pre- and post-tests before and after a 2-hour sexual harassment training session. Beauvais found that male participants, but not female participants, became more sensitized to sexual harassment after training. Thomann et al. observed statistically significant shifts in attitudes, beliefs, and knowledge of sexual harassment. Finally, Maurizio and Rogers (1992) tested residence advisors before and after a 2.5-hour training session and found significant increases in knowledge, but only very small shifts in attitudes.

One of the more robust conclusions from the empirical literature is that preventative training influences one’s sensitivity to sexual harassment, or the likelihood of perceiving harassment in any given scenario (Antecol and Cobb-Clark 2003; Beauvais 1986; Blakely et al. 1995, 1998; Bonate and Jessell 1996; York et al. 1997). Males tend to show a greater increase in sensitivity post-program than do females (Antecol and Cobb-Clark 2003), but females tend to be more perceptive of sexual harassment to begin with (York et al. 1997). Gutek et al. (1999) confirmed the general pattern of gender differences in sensitivity to harassment, but found training to a specific standard—a reasonable woman standard—had little effect on these judgments. Meta-analytic studies have also confirmed these gender differences (Blumenthal 1998; Rotundo et al. 2001), but suggest the effects are both small and context dependent. A few studies have examined age differences, generally finding older people more sensitive to harassment than younger people (Baker et al. 1990; Ohse and Stockdale 2008; Terpstra and Baker 1989).

As defined in these studies, sensitivity refers to an increased likelihood of identifying any given scenario as sexual harassment, whether or not it actually contains sexual harassment. Moyer and Nath (1998) have introduced an important distinction, pointing out that increased sensitivity to sexual harassment is not the same thing as increased expertise in accurately identifying harassment. While increased sensitivity to harassment may be a commendable outcome of training aimed at employees in general, when training is aimed at managers who must respond to harassment on the organization’s behalf, expertise may be more important than sensitivity. According to Moyer and Nath, expertise refers to an ability to discriminate by labeling scenarios as sexual harassment only when they contain harassment. Further, their results suggest that too much sensitivity in managers can be harmful to expertise in the sense that it increases false-positive identifications. Participants who received video-based training were more likely to perceive sexual harassment in a collection of hypothetical scenarios than untrained participants, but this advantage was offset by an increase in false-positive identifications (i.e., perceiving sexual harassment in a scenario when none exists). In a second experiment, Moyer and Nath suggested that training using written materials did in fact boost expertise, but this finding was restricted to male participants.
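The distinction between sensitivity and expertise can be made concrete with a small sketch. The scenario key and the two raters below are hypothetical and are not data from Moyer and Nath (1998); the function name `rates` is likewise an illustrative assumption. The sketch shows how a rater who labels nearly everything as harassment can achieve a perfect hit rate while also producing many false positives.

```python
def rates(judgments, truth):
    """Return (hit_rate, false_positive_rate) for binary harassment judgments.

    judgments[i] is True if the rater labeled scenario i as harassment;
    truth[i] is True if scenario i actually contains harassment.
    """
    hits = sum(j and t for j, t in zip(judgments, truth))
    false_pos = sum(j and not t for j, t in zip(judgments, truth))
    n_harass = sum(truth)
    n_clean = len(truth) - n_harass
    return hits / n_harass, false_pos / n_clean


# Hypothetical answer key: 4 scenarios contain harassment, 4 do not.
truth = [True, True, True, True, False, False, False, False]

# A highly sensitive rater labels almost every scenario as harassment.
sensitive = [True, True, True, True, True, True, True, False]
# A more discriminating rater misses one case but raises no false alarms.
expert = [True, True, True, False, False, False, False, False]

sensitive_rates = rates(sensitive, truth)  # (1.0, 0.75)
expert_rates = rates(expert, truth)        # (0.75, 0.0)
```

In this illustration the sensitive rater catches every case of harassment but mislabels three of four benign scenarios, which is the pattern of elevated false positives described above.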

Additional results obtained in the Moyer and Nath (1998) study suggest the possible benefit of incorporating multiple training methods (e.g., videos, written materials, etc.) into the training program. This would certainly be consistent with the broader training and development literature. In a review of the research on transfer of training, Baldwin and Ford (1988) refer to incorporation of multiple methods into the training design as a form of stimulus variability. Specifically, they contend that transfer of training is maximized when a variety of relevant training stimuli are employed. By incorporating multiple methods into the training design, trainees avoid becoming attached to a narrow range of stimuli and responses. As Kazdin (1975) notes, differential reinforcement of various stimuli leads to response generalization and training transfer.

In the sexual harassment literature, the work of York et al. (1997) seems to support this hypothesis. York and colleagues found that participants whose training incorporated both video vignettes and written case analyses were more likely to label a scenario as sexual harassment than participants whose training only consisted of video vignettes. Thus, training designs that incorporate multiple learning methods may heighten sensitivity to harassment behaviors. Although the study did not examine the issue of expertise, the results suggest the potential utility of taking a multiple method approach to training.

One important limitation in each of the studies described above is that sexual harassment sensitivity—or expertise in the case of the Moyer and Nath (1998) study—was measured immediately upon the conclusion of a preventative training program. Thus, these studies failed to examine whether or not the participants retained the sensitivity or expertise over time. In an attempt to address this question, Wilkerson (1999) examined the effect of prior training on the ability to accurately label behavior as sexual harassment. Unlike the previous studies, Wilkerson had participants self-report whether they had received sexual harassment training sometime in the past (i.e., he did not manipulate who would receive training). His results indicated that previously trained participants identified strong cases of sexual harassment more accurately than untrained participants, but not weak cases. Importantly, the Wilkerson study did not address whether training effects dissipate over time (i.e., decay) or the effect of multiple methods. Nevertheless, the study did suggest the potential to retain trained material over time.

Research on the bottom-line question (does training reduce harassment?) is sparse, but it is generally established that employee perceptions of organizational intolerance are associated with reduced reports of harassment (Fitzgerald et al. 1997; Willness et al. 2007). Two studies shed light more directly on the effects of training in signaling organizational intolerance for harassment. Gruber (1998) found that women whose employers adopted “proactive” measures, including providing training and complaint procedures, reported less harassment than women whose employers adopted merely informational measures (posters and pamphlets). In contrast, however, Williams et al. (1999) found that implementation of policies and procedures was associated with reduced reports of harassment, but that training had no independent effect. Thus, the ultimate role of training in reducing harassment remains unclear.

The Current Study and Its Hypotheses

The present exploratory study partially replicates and extends prior research on the effects of sexual harassment training. It is a scenario study, one that asks subjects to make judgments about brief written scenarios, so it shares the strengths and weaknesses of other scenario studies (Lengnick-Hall 1995). We expect to replicate some findings of other scenario studies, even though methodologies may differ. We draw on existing research in exploring effects of various aspects of training—quantity, variety of methods, and recency—on judgments of sexual harassment. But several features of the present study extend previous research in important ways.

First, all of our subjects are managers. While judgments of students or undifferentiated employees undoubtedly matter, judgments of managers are especially important in organizational responses to sexual harassment. A manager’s knowledge of harassment is generally imputed to the firm, so that employers can be held liable for the actions or inactions of their managers. A manager who becomes, or reasonably should have become, aware of sexual harassment and fails to take or initiate prompt and effective corrective action has created a potential legal liability for the organization under the negligence standard. Unfortunately, too many managers are also harassers of their subordinates. Juliano and Schwab (2001) estimate supervisor harassment is alleged in 79% of court cases. Managers who harass create strict (vicarious) liability for their organizations, subject only to the Faragher and Ellerth affirmative defense when no tangible employment action is taken. Thus, management’s ability to respond effectively to instances of sexual harassment is imperative.

Second, following Moyer and Nath (1998), the present study goes beyond judgments of sensitivity to sexual harassment, to the accuracy or expertise in making those judgments. Expertise in making judgments of sexual harassment is especially important among managers, arguably even more important than sensitivity. To assess expertise, we anchor our managers’ judgments to judgments made by subject matter experts. Operationally, expertise is inferred from the degree of manager agreement with the subject matter experts. York (1989), sampling a group of subject matter experts very similar to those used in the present study, found they exhibited high rates of consistency and inter-rater reliability. Inclusion of accuracy measures also permits assessment of the relationship between sensitivity and accuracy, particularly whether increasing sensitivity is associated with increased false-positive identifications (Moyer and Nath 1998).

Third, the present study models a bifurcated decision process similar to that suggested by Plater and Thomas (1998). In Step 1, managers must demonstrate expertise by discriminating between incidents in which sexual harassment is present and those in which it is absent. In Step 2, managers must decide what response is appropriate. Whereas Plater and Thomas (1998) examined judgments of culpability for misconduct in Step 2 of their model, the present study will examine both parts of this bifurcated decision process by assessing manager expertise in accurately identifying sexual harassment (Step 1) and judgments regarding the appropriate action to take in response (Step 2).

In sum, the present study is an exploratory study examining the effects of training quantity, variety, and recency on sensitivity and accuracy of manager judgments, both in identifying harassment and in recommending responses to harassment. While there is ample reason to doubt the effects of training, our hypotheses take the general form that training matters, and the more the better. We adopt this posture, despite some evidence to the contrary, because it is what the law and HR presume to be correct.

Quantity of sexual harassment training is the first training related variable we examine. Previous research has indicated that trained participants are more sensitive to sexual harassment than untrained participants (Antecol and Cobb-Clark 2003; Blakely et al. 1998; Moyer and Nath 1998; Wilkerson 1999). Thus, it makes sense intuitively that there may be a cumulative effect. That is, the more training one receives, the more benefit one accumulates. Consequently, it is predicted that sensitivity and accuracy in identification of instances of sexual harassment will increase as sexual harassment training quantity (i.e., cumulative training hours) increases. Likewise, it is predicted that sensitivity and accuracy in response to instances of sexual harassment will increase as sexual harassment training quantity (i.e., cumulative training hours) increases.

The next training related variable we examine is variety of training methods. The literature seems to advocate the use of multiple methods. York et al. (1997) demonstrated that video vignettes and written materials were better than video vignettes alone. This is a finding consistent with the “stimulus variability” concept described by Baldwin and Ford (1988) where multiple methods not only increase the diversity of the training program, but also increase the potential for training transfer. Thus, it is predicted that sensitivity and accuracy in identification of instances of sexual harassment will increase as sexual harassment training variety increases. Likewise, it is predicted that sensitivity and accuracy in response to instances of sexual harassment will increase as sexual harassment training variety increases.

Finally, we assess the effects of training over time by examining training recency. Nearly all prior studies have assessed training effects immediately or shortly following training. Limited research on sexual harassment training has suggested that participants have the potential to retain trained material (Wilkerson 1999), but no study has examined the decay of learned material over time. This study seeks to investigate whether training recency (i.e., the elapsed time since training) can be used to predict one’s ability to effectively identify and respond to instances of sexual harassment. In general, it is expected that training effects will dissipate over time. As Baldwin and Ford (1988) point out, decreases in the use of trained skills over time can occur for a number of reasons including skill deterioration, lack of motivation, organizational constraints, and lack of rewards. Thus, it is predicted that sensitivity and accuracy in identification of instances of sexual harassment will increase when training is more recent. Likewise, it is predicted that sensitivity and accuracy in response to instances of sexual harassment will increase when training is more recent.

Further, we predict that training recency will moderate the relationship between the other predictors (quantity and variety) and the criterion variables (manager identification and response). The basic premise underlying this hypothesis is that diverse (or lengthy) training programs are most effective when they are relatively recent. Without examining such an interaction, we invariably lose information. For example, we posit that there is a difference between a manager with one hour of training yesterday and a manager with one hour of training 2 years ago. By simply examining main effects, however, we would classify these experiences as equal in one respect (i.e., quantity), and unequal in another (i.e., recency). Thus, it seems proper to look at recency as a moderator of training quantity and variety. Therefore, we predict an interaction such that participants who received more cumulative training hours more recently will exhibit greater sensitivity and accuracy in identifying instances of sexual harassment and in selecting a response. Likewise, we predict an interaction such that participants who received training through various methods more recently will exhibit greater sensitivity and accuracy in identifying instances of sexual harassment and in selecting a response.
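One conventional way to write the moderation hypotheses is as a regression with product terms. The equation below is only an illustrative sketch: the symbols are assumptions introduced here, and it is not necessarily the specification estimated in the path analysis.

```latex
Y = \beta_0 + \beta_1 Q + \beta_2 V + \beta_3 R
    + \beta_4 (Q \times R) + \beta_5 (V \times R) + \varepsilon
```

Here $Y$ stands for any one of the criterion variables (sensitivity or accuracy, for identification or response), $Q$ for training quantity in cumulative hours, $V$ for variety of training methods, and $R$ for recency. If $R$ is coded as elapsed time since training, the hypotheses imply that the benefit of additional hours or methods shrinks as $R$ grows, so $\beta_4$ and $\beta_5$ would be expected to be negative. In the example of two managers who each have one hour of training, $Q$ is identical and only $R$ and the product $Q \times R$ distinguish them.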

Figure 1 presents a theoretical model that summarizes the interrelationship among all hypothesized predictor and criterion variables. Specifically, the model proposes each of the aforementioned correlational predictions. The integrity of the model will be examined using a path analysis.

Fig. 1 Interrelationships among hypothesized predictor and criterion variables

Methods

Participants and Procedures

A management population was sampled. Borrowing a methodology successfully used in previous research (Breaux et al. 2009; Hochwarter et al. 2005; Hochwarter et al. 2007; Lui et al. 2004; Rotundo et al. 2003), students in 15 undergraduate management and psychology classes were asked to recruit one full-time practicing manager and have them complete an online survey. Students were offered extra class credit for their participation and were required to return a contact form for each manager recruited. In total, 209 managers participated in the study. To assess the validity of these data, 20 of the participants were randomly selected and contacted by e-mail or telephone. Once contacted, the participant was asked to confirm his or her completion of the survey and mailing address, which was compared to the contact form returned by the student. Assurances were made that candid responses in no way affected the student’s class credit. One hundred percent of those contacted confirmed both items.

The sample consisted of 129 men (62 %) and 80 women (38 %), with an average age of 42 (SD = 13.23, range = 19–66). Most in the sample were white (95 %). Sixty-nine participants (33 %) had only 1–5 years of management experience, with 35 (17 %) having 6–10 years, 27 (13 %) having 11–15 years, and 78 (37 %) having over 16 years of management experience. Roughly equal numbers reported having less than a bachelor’s degree (39 %) and having earned a bachelor’s degree (47 %); a smaller number of the sample held advanced degrees (14 %). A variety of job titles, industries, and sectors were represented. More than half of our sample held managerial positions in private companies (52.2 %), with smaller percentages in public (41.6 %) or not-for-profit (6.2 %) organizations. Interestingly, almost half (42.6 %) reported having “dealt with an instance of sexual harassment in the workplace,” and 35.4 % reported that they or someone close to them had been a victim of sexual harassment in the workplace. Thus, the variation within the present sample indicates a level of external generalizability.

Materials

Sexual Harassment Identification and Appropriate Response Questionnaire

Each manager was given a questionnaire containing 13 scenarios taken verbatim from the extant sexual harassment literature (Blakely et al. 1995, 1998). Each scenario described an interaction between a male supervisor and a female subordinate. For each, managers were prompted for two responses. First, managers were asked, “Does this behavior constitute sexual harassment?” Responses to the item were scored on a 5-point Likert scale ranging from “clearly not sexual harassment” to “clearly sexual harassment.” Second, managers were posed the question, “Does the situation that has been just described warrant action from you as a manager?” Managers could respond with “no action is necessary,” “wait to see if the problem persists,” “confront the employee/s” or “formally report to the appropriate authority.”

Two sets of criterion scores were constructed, one measuring “sensitivity” and the other “accuracy.” Sensitivity, treated as a continuous variable, was computed by summing the manager responses across the 13 scenarios, separately for the identification question and the appropriate response question. Two sensitivity variables were thus created – identification sensitivity and response sensitivity. Higher scores indicated a greater likelihood of identifying any given scenario as harassment and a tendency to recommend stronger action in response. Accuracy was defined in terms of agreement with subject matter experts (SMEs, described below). Criterion variables were constructed by comparing manager responses to external ratings provided by the SMEs. Manager responses were defined as accurate (hits) if they fell within one standard deviation of the SME rating. Manager responses that fell outside of this prescribed range were defined as inaccurate (misses). Two summated criterion scores were calculated for each participating manager – identification accuracy and response accuracy. Each score ranged from 0 to 13, indicating the number of accurate hits on the identification items and on the response items, respectively.
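The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration and not the scoring procedure actually used in the study: the function name `score_manager`, the use of the mean SME rating as “the SME rating,” and the use of the sample standard deviation of the SME ratings for each scenario are all assumptions made for the example.

```python
from statistics import mean, stdev


def score_manager(manager_ratings, sme_ratings_by_item):
    """Return (sensitivity, accuracy) for one manager on one response scale.

    manager_ratings: one numeric rating per scenario.
    sme_ratings_by_item: for each scenario, the list of SME ratings.
    Assumption: a response is a hit if it lies within one sample SD
    of the mean SME rating for that scenario.
    """
    # Sensitivity: simple sum of the manager's ratings across scenarios.
    sensitivity = sum(manager_ratings)

    # Accuracy: count of scenarios where the rating falls inside the SME band.
    hits = 0
    for rating, sme_ratings in zip(manager_ratings, sme_ratings_by_item):
        sme_mean = mean(sme_ratings)
        sme_sd = stdev(sme_ratings)
        if abs(rating - sme_mean) <= sme_sd:
            hits += 1
    return sensitivity, hits


# Hypothetical data: three scenarios rated by three SMEs and one manager.
sme = [[5, 5, 4], [1, 2, 1], [3, 4, 3]]
sensitivity, accuracy = score_manager([5, 3, 3], sme)
```

With the full instrument, the same function would be applied once to the 13 identification ratings and once to the 13 response ratings, yielding accuracy scores between 0 and 13.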

Prior Training Inventory

The prior training inventory assessed each manager’s experience with sexual harassment training along three dimensions: quantity, variety of methods, and recency. Use of measured predictors, as opposed to experimentally manipulated predictors, enabled us to survey practicing managers with a variety of backgrounds and in a variety of settings. Quantity, variety, and recency were measured and treated as continuous variables. “Training Quantity” was measured as the cumulative number of hours spent in training across all previous training sessions. Variety questions tapped into the specific training methods in each manager’s training history (e.g., videos, written materials, lecture, group discussions, role-play, web-based materials, etc.). “Training Variety” was measured as the number of methods indicated. A final question was asked to determine the time interval between the managers’ last training program and the current survey. “Training Recency” was measured as the number of months since the most recent training session.

Subject Matter Experts

A group of SMEs were contacted and surveyed in order to anchor manager responses to sexual harassment. Criteria for SME selection included both possession of advanced knowledge and practical experience concerning sexual harassment in the workplace. With the help of the Associate Vice Chancellor for Equity, Diversity, and Compliance at a southeastern public comprehensive university, 15 potential SMEs were identified. Of the 15 possible candidates, nine participated in the study. Table 1 displays demographic characteristics of the SMEs. Most worked for public or private universities throughout the State of North Carolina. All SMEs had advanced degrees and held senior leadership positions in their respective work units; two-thirds of the SMEs reported having over 16 years of experience dealing with sexual harassment issues; two-thirds of the SMEs were female and 78 % were white. Our SMEs share professional backgrounds and experiences very similar to the SMEs surveyed by York (1989).

Table 1 SME characteristics (N = 9)

Each SME completed a written questionnaire containing the same 13 scenarios that were presented to the management sample. The SMEs were prompted for the same two responses after each of the scenarios. First, SMEs were asked, “Does this behavior constitute sexual harassment?” Responses to the item were scored on a 5-point Likert scale ranging from “clearly not sexual harassment” to “clearly sexual harassment.” Second, SMEs were asked “Does the situation that has been just described warrant action from you as a manager?” SMEs could respond with “no action is necessary,” “wait to see if the problem persists,” “confront the employee(s),” or “formally report to the appropriate authority.”

Table 2 displays descriptive statistics for the SMEs. Item means and standard deviations for the identification score compared favorably to those reported by Blakely et al. (1995). Interrater reliability estimates were calculated for SME identification ratings and appropriate response ratings. Both ratings exhibited high interrater reliability indicating a low level of measurement error and a high level of agreement among the SMEs (intraclass coefficient alpha = .98, 95 % CI: [.95, .99] for identification and intraclass coefficient alpha = .98, 95 % CI: [.94, .99] for the appropriate response scale).

Table 2 SME descriptive and reliability statistics

Results

Descriptive statistics and correlations among study variables are provided in Table 3. With regard to the predictor training variables, most managers (86.6 %) reported receiving at least one hour of sexual harassment training. Managers reported an average of about 14 total hours of training (quantity: M = 14.06, SD = 32.27), though 45.5 % reported five or fewer hours of training. The average respondent reported experiencing approximately five different training methods (variety: M = 4.71, SD = 2.34). Finally, the elapsed time since training averaged about 3 years (recency: M = 35.39 months, SD = 43.44). The three training variables were moderately correlated with each other (mean |r| = .25).

Table 3 Means, standard deviations, and correlations among major study variables

Means are not reported separately for men and women because there were few statistically significant differences. There were no significant differences between men and women on any of the criterion variables. On the predictor variables, men reported experiencing a somewhat greater variety of training methods than women (t = −2.62, p = .01). Correlations between manager age and the study variables are also not included in Table 3, but are generally consistent with previous research. Older managers tended to exhibit greater sensitivity in identification of sexual harassment (r = .29, p = .001) and in response to it (r = .24, p = .001). However, older managers were also less accurate in identification of harassment (r = −.25, p = .001).

As shown in Table 3, both criterion variables assessing manager expertise indicated moderate levels of agreement with the subject matter experts. The mean identification score was 8.49 (SD = 1.85) and the mean response score was 9.20 (SD = 1.87), both out of a maximum possible score of 13. Managers averaged 3.71 false-positive identifications (SD = 2.21) and 2.07 false-positive responses (SD = 2.15). Comparisons of false-positives and false-negatives indicate a clear tendency for managers to err on the high side in identifying harassment and responding to it (false-negatives = .80 and 1.73 for identification and response, respectively).
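The distinction between hits, false-positives, and false-negatives can be illustrated with a short Python sketch. The helper names `classify_response` and `tally` are hypothetical, and the rule that a false-positive lies more than one SD above the SME rating (and a false-negative more than one SD below it) is an assumption inferred from the description of erring “on the high side.” The last two lines check that the reported means for each scale sum to the 13 scenarios.

```python
def classify_response(rating, sme_mean, sme_sd):
    """Label one manager rating relative to the SME band (hypothetical helper)."""
    if rating > sme_mean + sme_sd:
        return "false_positive"   # manager rated higher than the SME band
    if rating < sme_mean - sme_sd:
        return "false_negative"   # manager rated lower than the SME band
    return "hit"                  # within one SD of the SME rating


def tally(ratings, sme_means, sme_sds):
    """Count hits, false-positives, and false-negatives across scenarios."""
    counts = {"hit": 0, "false_positive": 0, "false_negative": 0}
    for rating, m, s in zip(ratings, sme_means, sme_sds):
        counts[classify_response(rating, m, s)] += 1
    return counts


# Hypothetical example with three scenarios.
example = tally([5, 3, 1], [4.0, 3.0, 3.0], [0.5, 1.0, 1.0])

# Consistency check on the reported means: hits + false-positives +
# false-negatives should equal the 13 scenarios on each scale.
identification_total = round(8.49 + 3.71 + 0.80, 2)
response_total = round(9.20 + 2.07 + 1.73, 2)
```

Both totals come to 13, which is what one would expect if every response is classified into exactly one of the three categories.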

Examination of the correlations between the three predictors and the sensitivity outcome variables revealed modest correlations in the expected directions. Training quantity was not significantly associated with either of the outcome variables, but training variety was significantly positively correlated with both identification sensitivity (r = .15) and response sensitivity (r = .19). Correlations for training recency were not statistically significant, but were in the expected (negative) direction, suggesting that greater elapsed time since training may be associated with reduced sensitivity. Taken together, these results offer modest support for the findings of previous research that training increases sensitivity in identifying sexual harassment, and modest support for extending the findings of previous research to sensitivity in taking action in response to sexual harassment.

A different pattern emerges in examining the correlations between training and the outcome variables associated with manager expertise. Both training quantity and training variety were negatively related to accuracy in identifying sexual harassment (r = −.12 and r = −.15, respectively). However, neither training quantity nor training variety was related to response accuracy, and training recency was unrelated to either of the variables measuring manager expertise. Training variety was associated with increased false-positives in both identification (r = .18) and response (r = .18). A negative relationship between sensitivity and expertise was observed. Managers exhibiting greater sensitivity to sexual harassment tended to accurately identify fewer cases (r = −.43). Similarly, managers inclined to recommend more forceful action in response to harassment were less likely to recommend the appropriate response (r = −.29). Heightened sensitivity was also strongly associated with increased false-positive identifications (r = .89) and false-positive responses (r = .87). Taken together, these results support Moyer and Nath’s (1998) finding that training, while increasing sensitivity, may compromise expertise by inducing false-positive reactions.

Table 4 offers another view of the relationships between our training variables and our variables related to expertise. We transformed each training variable into a dichotomous variable using a median split – quantity (Mdn = 6 h of training), variety (Mdn = 4 methods), and recency (Mdn = 20 months). The hypotheses that greater quantity and variety of training are associated with greater identification accuracy were not supported. Results were significant, but in the direction opposite to that hypothesized: training quantity and variety were associated with reduced identification accuracy. Managers with more training accurately identified fewer cases of sexual harassment than those with less training (t = 3.23, p = .001). Similarly, managers who had more diverse training were less accurate in identifying sexual harassment (t = 2.45, p = .015). Training recency had no effect on identification accuracy. There were no significant differences in response accuracy between those with more training and those with less, nor between those with more diverse training and those with less diverse training. There were significant differences in response accuracy based on recency, but the differences were in the opposite direction of the hypothesis. Managers with more recent sexual harassment training recommended fewer appropriate responses than those with less recent training (t = −2.44, p = .016).
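A median split followed by a two-group comparison can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the text does not say how values exactly at the median were assigned (here they go to the lower group), whether a pooled-variance or unequal-variance t test was used (here pooled), or which group was entered first (here the below-median group, a convention that appears consistent with the signs of the reported t values). The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance


def median_split(values):
    """Return True for values above the median, False otherwise.

    Assumption: values equal to the median fall in the lower group.
    """
    cut = median(values)
    return [v > cut for v in values]


def pooled_t(group_a, group_b):
    """Student's independent-samples t statistic (pooled variance)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical data: training hours and identification accuracy (0-13).
hours = [1, 2, 3, 4, 10, 20, 30, 40]
accuracy = [10, 9, 10, 9, 8, 7, 8, 7]

is_high = median_split(hours)
low_group = [a for a, high in zip(accuracy, is_high) if not high]
high_group = [a for a, high in zip(accuracy, is_high) if high]

# Positive t here means the low-training group was more accurate.
t_stat = pooled_t(low_group, high_group)
```

Dichotomizing a continuous predictor discards information, which is one reason the correlational and path analyses are reported alongside the median-split comparisons.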

Table 4 Identification and response accuracy and false-positive responses across training variables

Training is consistently associated with increased false-positive identifications and responses. More hours of training increased false-positive identifications (t = −4.36, p = .001) and false-positive responses (t = −3.29, p = .001). Greater variety of training increased false-positive identifications (t = −3.29, p = .001) and false-positive responses (t = −2.71, p = .007). More recent training was associated with increased false-positive responses (t = 2.85, p = .005). There were no significant differences in false-positive identifications based on recency.

We hypothesized two theoretical models, one to evaluate the effects of training on manager sensitivity, the other to evaluate the effects of training on manager expertise. The first model specified that quantity, variety, and recency of sexual harassment training, along with the interactions of recency with quantity and variety, would predict identification sensitivity, which, in turn, would predict response sensitivity. To test the hypothesized model, a path analysis was performed to generate maximum likelihood estimates of the standardized path coefficients (see Fig. 2). Whereas training variety had the most consistent effects on sensitivity in the bivariate correlations, when the training variables were simultaneously controlled, the effects of training were expressed through quantity of training. Quantity of training (β = .27) and the interaction between quantity and recency (β = .38) each predicted identification sensitivity. Thus, the more hours of training and the more recently those hours have been received, the greater the sensitivity in identification of harassment. In turn, identification sensitivity predicted response sensitivity (β = .73).
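As a reading aid, the structure of the hypothesized sensitivity model can be written as a pair of standardized regression equations. This is only a sketch: the symbols Q, V, and R (quantity, variety, recency), S_id and S_resp (identification and response sensitivity), the coefficient labels, and the disturbance terms e_1 and e_2 are notation assumed for this illustration and do not appear in the original model specification.

```latex
\begin{align}
S_{\mathrm{id}}   &= \beta_{1} Q + \beta_{2} V + \beta_{3} R
                     + \beta_{4}\,(Q \times R) + \beta_{5}\,(V \times R) + e_{1} \\
S_{\mathrm{resp}} &= \beta_{6}\, S_{\mathrm{id}} + e_{2}
\end{align}
```

In this notation, the estimates reported above correspond to β_1 = .27, β_4 = .38, and β_6 = .73. Under standard path-tracing rules, the implied indirect effect of quantity on response sensitivity would be roughly .27 × .73 ≈ .20, although that quantity is not itself reported.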

Fig. 2 Hypothesized model for sensitivity outcomes with standardized regression weights (maximum likelihood estimates)

In our second theoretical model, we hypothesized that training quantity, variety, and recency, along with the interactions of recency with quantity and variety, would predict accuracy in the identification of sexual harassment, which, in turn, would predict accuracy in the recommendation of appropriate responses. To test the hypothesized model, a path analysis was performed to generate maximum likelihood estimates of the standardized path coefficients (see Fig. 3). None of the training variables predicted identification accuracy. Whereas training quantity and training variety were associated with reduced identification accuracy in the bivariate correlations, when examined simultaneously in the path analysis, their shared variance may have masked their individual relationships with the criterion variable. Finally, identification accuracy predicted response accuracy (β = .31).

Fig. 3 Hypothesized model for accurate outcomes with standardized regression weights (maximum likelihood estimates)

Discussion

Our findings generally conform to prior research indicating that training increases sensitivity to sexual harassment (Antecol and Cobb-Clark 2003; Blakely et al. 1995, 1998; Moyer and Nath 1998; York et al. 1997). Our findings also extend prior research in several ways. First, the generalized finding of increased sensitivity has been particularized to an especially important group—practicing managers. Second, the concept of sensitivity was extended. Prior research, and our identification sensitivity variable, measured sensitivity as the likelihood of identifying any given scenario as harassment. Our second sensitivity variable – response sensitivity – measures the severity of action recommended in response to any given scenario. Our findings suggest training has modest effects in increasing response sensitivity.

Our most striking findings are on effects of training on manager expertise. Before reviewing those results, however, we wish to urge caution regarding our operationalization of the concept of manager expertise – the accuracy variables. This is simply an operational definition, linking manager survey responses to responses of SMEs, and it is not meant to imply that our SME responses are the only correct responses. Sexual harassment is subjectively experienced by all of those involved – victims, harassers, managers and compliance officers, coworkers and confidants. Our accuracy variables might seem to privilege the perspective of our SME compliance officers over the experience of others. But we do not imply that the perspectives of others are any less valid. Still, the perspective of compliance officers is especially important in organizational interventions intended to prevent and correct sexual harassment.

Counter to overly optimistic assumptions about training, training not only failed to improve expertise but also reduced expertise in some ways. Specifically, we found that both increased training quantity and increased training variety decreased manager accuracy in identifying harassment. Our finding that training recency was unrelated to expertise suggests that any harmful effects of training quantity and variety may be durable over time. We did not find that any of the training variables had a meaningful relationship with response accuracy criterion scores. It appears that the effects of training on identification may not spill over to taking appropriate action.

On the surface, our finding that training may reduce expertise might seem to support the widespread skepticism of the efficacy of training among socio-legal scholars (Bisom-Rapp 2001a, b; Dobbin and Kelly 2007; Grossman 2003; Lawton 2004). We provide evidence that as the quantity and variety of training increase, accuracy in identification decreases and the number of errors increases. A conclusion that training has harmful effects on expertise generally could discourage serious training efforts and encourage doing only the absolute minimum legally required (Sherwyn et al. 2001) or engaging in mere “file cabinet compliance” (Lawton 2004, p. 198). We believe such a broadly skeptical conclusion is unwarranted. At most, our findings only reinforce the call for continued examination of the assumptions underlying training mandates and underscore the need to always take seriously potential harmful effects of training. But our findings also support a more optimistic assessment of the potential benefits of training.

First, the effect of training on identification errors was not random. Our findings support those of Moyer and Nath (1998) who found that training may over-sensitize participants to the point that it produces an increase in false-positive identifications. That is, if training increases sensitivity to harassment, and if increased sensitivity compromises expertise, it does so by inducing more false-positive identifications. These non-random identification errors might be characterized as over-reaction to sexual harassment scenarios. From this perspective, training might prove very beneficial to organizations. Managerial over-reaction might be an effective component of an organization’s prevention and correction program. HR professionals and corporate counsel might prefer that managers err in the false-positive direction in identification of harassment. At least for mid-size and larger organizations more likely to have formal prevention and correction programs staffed by professional subject matter experts, presumably capable of accurately vetting reports of harassment, manager sensitivity to harassment may be sufficient. In other words, leave questions requiring expertise to the experts. A program of managerial over-reaction might be effective in curbing sexual harassment, but it would likely generate a number of additional concerns.

Second, we did not find training to affect response accuracy. Even if training could be said to have negative effects on identification accuracy, we need not worry so long as response accuracy is unaffected. Accurate diagnosis may be an important component of a conceptual bifurcated decision model (Plater and Thomas 1998), but ultimately, manager response is the crucial step, and there may be many factors, besides training, that determine the response. Berkley and Kaplan (2009) employed a similar two-step model applied to juror decision-making in harassment cases, concluding “that individuals use different factors when determining guilt and punishment” (p. 207).

One important strength of our study is its survey of practicing managers with a wide variety of experience in a wide variety of organizations. But this strength imposed a major limitation on the study. Our design and sample characteristics prevented us from effectively measuring or manipulating training content (i.e., the substance or information conveyed in the training program rather than the characteristics of the program itself). Future research should control for, measure, or manipulate training content to better examine its role as a predictor of training effectiveness. We know that training can provide a modest boost in sensitivity to sexual harassment, but evidence also suggests that sensitivity might come at the expense of accuracy. Whether training of the kind generally received by employees can improve accuracy remains in doubt. If the goal of manager training is to equip managers with the ability to accurately identify sexual harassment and take appropriate action, preventative training programs intended for a general employee population may not be sufficient. Unfortunately, it is all too easy for employers to pass off generic training content to managers. If accuracy is the goal, then accuracy should be the focus of training content, and not simply sensitivity. To illustrate, specialized training for management personnel could use content similar to that used to collect data in this study. Specifically, managers could be presented with a series of real or fictionalized scenarios and taught how to identify and respond to each scenario. Further, training programs should be evaluated on their claimed merits. If the objective is increased sensitivity, sensitivity should be evaluated. But if the aim is accuracy and expertise – specifically, managers making accurate identifications and taking appropriate actions – then accuracy and expertise should be evaluated.

Another limitation of our study is that the design failed to measure or control for other organization-related factors that might influence manager responses. Training is one preventive initiative, but it is often bundled with other organizational initiatives intended to prevent harassment. It might be expected that organizations with a strong culture of intolerance for harassment, or strong enforcement policies and practices, are also more likely to provide training and take it seriously. Such organizations may foster a climate of reduced harassment (Fitzgerald et al. 1997; Gruber 1998; Willness et al. 2007), but isolating the independent effects of training is challenging. Williams et al. (1999) found that training had no effect independent of other policies and procedures. Training may be among the most common anti-harassment organizational initiatives, but it is certainly not the only available initiative. Future research should examine a broader array of organizational initiatives for their independent effects and their effects in combination.

A third limitation of our study, often overlooked, but commonly associated with the self-report method, is that it glosses over the distinction between knowledge and action. Knowledge of the appropriate response is not the same as executing that action in a real work environment. A decision to take action is likely shaped by myriad factors including internal cognitions, interpersonal relationships with the harasser/victim, and organizational climate (Lengnick-Hall 1995). Clearly, the decision to take appropriate action is a complex one emanating from a host of unidentified variables. Thus, training programs and future research should address the important distinction between knowing what to do and doing it. Bowes-Sperry and Powell (1999) found that observers of sexual harassment were more likely to intervene if they perceived a social consensus that the conduct described was sexual harassment, if they perceived the consequences to the victim as severe, and if they recognized the incident as an ethical issue. Training content anchored to SME judgments concerning harassment behaviors can suggest the existence of a potential consensus. Similarly, it may be advantageous for training programs to highlight the destructive effects of sexual harassment using fictional or fictionalized real-world examples to reinforce both the severity of the phenomenon and the array of ethical issues involved.

Several reviews of the sexual harassment literature emphasize the emerging role of scientific research as an instrument to quell workplace sexual harassment (Gutek 1997; O’Leary-Kelly et al. 2009; Wiener and Gutek 1999). Accordingly, the purpose of this exploratory study has been to inform HR theory and practice concerning the use of preventative training programs. Specifically, the study sought to investigate whether elements of training design and administration (quantity, variety, and recency) could be used to predict management’s ability to (a) accurately identify instances of sexual harassment, and (b) recommend an appropriate response. From both a social science perspective and from an HR perspective, the results of this study are a call to further research. It is clear that we know very little empirically as to what constitutes an effective sexual harassment training program. Moving forward, scholars should continue to work to discover antecedents that will predict management’s ability to accurately identify instances of sexual harassment and respond with an appropriate action. As additional scholarly findings emerge, training professionals and strategic HR planners can better design interventions, including training interventions, to effectively manage sexual harassment.

From an HR perspective, these results are a call for action research aimed at continuous improvement in the organization’s sexual harassment prevention and correction program, including its sexual harassment training programs. The foundation for this action research must necessarily rest on more systematic and rigorous evaluation of training effects (Noe 2010). While some might advise organizations to avoid program evaluation altogether, especially if such evaluation is likely to yield negative results, we encourage an aggressive research agenda. Researchers have long lamented the lack of scientific evaluation of organizational training programs (e.g., French 1953; Saari et al. 1988), concluding that the “most serious problem has been the failure to consider evaluation an ordinary part of the instructional design process” (Goldstein 1993, p. 150). Fortunately, over the last few decades, we have seen considerable improvement in the number of organizations that conduct some evaluation of their training and development programs. Unfortunately, however, the vast majority of these evaluations focused on trainee reactions to the program rather than identifying whether learning had occurred or whether on-the-job behaviors were positively affected (Noe 2010; Sugrue and Rivera 2005). Although trainee attitudes toward and reactions to training programs are not unimportant, clearly organizations must increase their focus on whether training is resulting in actual changes in behavior and improvements in overall organizational effectiveness. Smaller organizations may lack the capacity for such systematic training evaluation, but organizations with the capacity to do so should do so. Weakness of organizational training evaluation certainly contributes to gaps in the scholarly literature. There is likely good evaluation research being conducted in some organizations, but the results are not made available to external audiences. Scholars and organizations could readily partner in pursuit of their common research goals. In the process, perhaps, they might help to purge this unfortunate workplace phenomenon.