1 Introduction

The purpose of this paper is to describe what our ShadowBox® team has learned about cognitive skills training during the past few years. By “cognitive skills,” we mean skills such as decision making, sensemaking, problem detection, and uncertainty management, performed by specialists such as firefighters, pilots, nurses, warfighters, child welfare caseworkers, and others working in complex and demanding jobs. We appreciate that other researchers and practitioners have wrestled with these issues and described powerful methodologies—for example, Hall et al. (1995) developed the Precursor, Action, Result, Interpretation (PARI) method for cognitive skills training (see also Means and Gott 1988). It is beyond the scope of this paper to review the work of other cognitive training research programs. Our focus is on discoveries that emerged as we transitioned from research-based recommendations and demonstration projects to developing and delivering fielded cognitive skills training. We simply want to compile the lessons we have painfully acquired using ShadowBox, because our experiences may be useful for others engaged in training cognitive skills.

ShadowBox is a way for people to see the world through the eyes of experts, without the experts being present. It is a scenario-based approach developed by Hintze (2008). The trainee is given an engaging and realistic scenario, presented as text or video, with decision points interspersed. Their job is to read or watch the scenario unfold and respond to the decision prompts (i.e., decision points). Each decision point presents a question and a small number of options. The decisions can be about which action to take, which cues to monitor most closely, which goals have the highest importance, and so on. The trainee ranks the options from best to worst and writes a rationale statement for their ranking. As part of the training development, a small panel of experts has also read the scenario, ranked the options, and provided their rationale. Their rankings and rationale statements have been synthesized so that once trainees complete a decision point they are shown how the experts ranked the options and, most importantly, why they ranked them that way. Trainees are eager to match the expert rankings, but the real learning occurs when they read the experts’ reasons and appreciate what the experts have noticed. The final step is for the trainees to identify their biggest takeaways from that decision point—what they have learned from the experts.
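To make this structure concrete, the sketch below shows one plausible way to represent a decision point and a trainee response. It is a minimal Python illustration; the class and field names are our assumptions for exposition, not the fielded ShadowBox software.

```python
# Minimal sketch of the data behind a single ShadowBox decision point.
# All names here are illustrative assumptions, not the actual ShadowBox code.
from dataclasses import dataclass

@dataclass
class DecisionPoint:
    question: str              # prompt shown when the scenario pauses
    options: list[str]         # small set of actions, cues, or goals to rank
    expert_ranking: list[str]  # synthesized panel ranking, best to worst
    expert_rationale: str      # why the experts ranked the options this way

@dataclass
class TraineeResponse:
    ranking: list[str]         # trainee's best-to-worst ordering of the options
    rationale: str             # trainee's written justification
    takeaways: str = ""        # recorded after the trainee sees the expert feedback

def reveal_feedback(dp: DecisionPoint, response: TraineeResponse) -> None:
    """Show the expert ranking and rationale only after the trainee commits."""
    print("Your ranking:    ", response.ranking)
    print("Expert ranking:  ", dp.expert_ranking)
    print("Expert rationale:", dp.expert_rationale)
```

The key design point is the ordering: the expert material is revealed only after the trainee has committed to a ranking and a rationale, and the trainee then records their takeaways.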

2 ShadowBox mission statement

When ShadowBox LLC was stood up on 1 August 2014, the mission was very straightforward: use the ShadowBox strategy to provide cognitive skills training, drawing on expert feedback and building scenarios based on a front-end cognitive task analysis (CTA). We developed an electronic version of ShadowBox to enable individuals to train on their own time. We evaluated training success in terms of the trainees’ match to the expert rankings. And we achieved quality control of ShadowBox scenarios by carefully reviewing all scenario materials generated by our clients. Since that time, we have applied ShadowBox to a variety of domains, including law enforcement, the military, petrochemical operators, child protective services caseworkers, and helicopter rescue crews. At first glance, the ShadowBox mission statement seems benign. In reality, it is a minefield: to our surprise, many of its key assertions turned out to be misleading and problematic. Part of the difficulty was terminology, and terminology counts when it leaves potential clients unnecessarily confused or discouraged. But there are also more serious, substantive problems, because in some ways it was the wrong mission.

2.1 What is cognitive skills training?

2.1.1 Cognitive

The term “cognitive” is a problem for us, and for the Naturalistic Decision Making (NDM) community, because it is jargon. Potential clients are often confused by the term because they do not know what it means. Additionally, we suspect most cognitive researchers and trainers would have some disagreements over its meaning.

Despite these concerns, we still use the term “cognitive” because it distinguishes our approach from “procedural” or “rule-based” training, and the term does resonate with some clients. However, when we use it these days we are quick to unpack it, explaining that it covers the following activities: making decisions, making sense of situations, detecting and diagnosing problems, prioritizing and trading off goals, managing attention, anticipating future states, and performing workarounds. See Table 1 for definitions of each cognitive activity. Our clients can relate to these kinds of activities in a way they cannot relate to the term “cognitive.”

Table 1 Defining cognitive activities

We find even more success when we can provide relevant examples of what these specific cognitive activities actually look like in the client’s domain. For example, in our work with police we included an example of an important decision in our CTA interview guide: deciding whether to pursue an assailant or stay with a victim to provide aid.

2.1.2 Skills

Immediately following the term “cognitive” in our mission statement is the word “skills.” We use the term “skills” to describe what ShadowBox targets and strives to cultivate. But what does that mean? Oddly, it sounds very procedural, just the opposite of cognitive. There is danger in creating training that targets a laundry list of skills and sub-skills (e.g., competencies); such lists lead to disjointed and stove-piped training.

Instead of trying to address a laundry list of skills, we seek to shift trainees’ mindsets and to help them develop richer mental models, that is, to think more like the experts. We are more interested in developing expertise than in training specific skills.

The large research team performing the Defense Advanced Research Projects Agency (DARPA) Good Strangers project diligently assembled a list of general virtues that would make warfighters more successful at interacting with civilians (e.g., showing respect, perspective taking, gaining rapport, showing empathy). It was an exhaustive list of behaviors without a clear focus. However, our CTA work identified an overarching mindset shift that seemed to organize the more specific behaviors: the warfighters and police officers who were good strangers had developed a mindset of trying to get civilians to trust them more at the end of an encounter than at the beginning (Klein et al. 2015). As a result, we designed ShadowBox training to move the Marines and soldiers from an authoritarian mindset to a trust-building mindset.

The lesson we have learned is that we want to help people develop richer mental models, more powerful mindsets, and more tacit knowledge (i.e., knowledge that is difficult to put into words, such as perceptual discriminations, pattern recognition, recognition of familiarity, and detection of anomalies). Addressing mindsets can be more powerful and efficient than addressing a larger variety of specific skills, but the benefit goes beyond efficiency. Several of our clients are attracted to ShadowBox because they appreciate the need for radical shifts in the mindsets of their staff members, and ShadowBox is unique in the way it directly tackles mindset shifts. Vanderhaegen and Carsten (2017) describe how cognitive training techniques may lead to unforeseen rules or mental models developed by the user, which they term “dissonances,” with either hazardous or beneficial results.

The goal for each ShadowBox scenario is to foster an “aha!” moment, a discovery stemming from a mindset shift and/or a revision of a mental model. On one occasion we actually heard gasps from the group we were training as they suddenly appreciated how they needed to adjust their mindset and mental model. This discovery process is different from training specific skills. We are still learning how we might achieve these discoveries as we examine which scenarios promote an “aha!” reaction or a revised mental model.

However, these mindset shifts are not always straightforward. Take the challenge of shifting social workers from a procedural to a problem-solving mindset. We had to be careful not to present this shift as good (problem solving) versus bad (procedural). Workers still need to master the procedures, and the more entries in their playbook, the better. So the goal of our training became to integrate “problem solving + procedures,” not to shift from procedures to problem solving. When we failed to respect the importance of procedures, we provoked resistance.

Clearly, there is a lot to learn about how to identify mindset shifts worth addressing, how to help people make mindset shifts, how to measure mindset shifts, and how to differentiate mindsets from other cognitive processes. In future work we hope to advance the concept of mindset shifts to a set of empirically supported practices.

Further, we are not just seeking to alter mindsets and initiate “aha” moments. ShadowBox training can also help people build their tradecraft—acquiring a more complete playbook of procedures for getting things done and gaining a more nuanced understanding of how to adapt the procedures in that playbook.

2.1.3 Training

The notion of training seemed so straightforward until a potential client explained that his organization never did any training! Sure, they occasionally needed to bring new people up to speed. But for them, “training” meant formal training programs, lesson plans, platform instructors (i.e., trainers who lecture to classroom audiences), and so forth. They never did any of this formalized institutional training. So, the term “training” can be ambiguous and misleading without additional clarification. Another client complained that we were describing ShadowBox as a training tool, whereas they also wanted a tool for practice and supervision.

Instead of using the blanket term “training,” we have learned to appraise the needs and capabilities of each client to sort out what kind of solution the client needs and whether we can adapt our training to fit. For example, do they have a system in place with formal lesson plans and instructors? Perhaps we should cognitize their existing training content by taking the scenarios they already have and injecting cognitive challenges (Klein 2017). One client in the petrochemical industry already had a full-mission simulator, so we expanded its scenarios to emphasize the mindset shifts that trainees needed to make, working within the context of the simulation rather than our own software. If facilitators are a scarce resource, perhaps the client needs a personalized, electronic version of the scenarios, including the expert feedback. If trainees are far-flung, ShadowBox may be useful as a pedagogy for distance learning. These considerations help us develop a customized solution that better suits the client’s needs.

People usually think training is about learning rules and procedures. In contrast, we see training as an opportunity to give trainees a wider range of experiences, and to provide an opportunity for them to have “aha” moments as they make discoveries and revise their mental models to better reflect reality. This is an important distinction that should be addressed early on when engaging clients. Rather than training to competencies, ShadowBox and other cognitive skill-based solutions seek to cultivate knowledge, skills, and abilities that the trainee will use on the job. That is, it is a way for trainees to develop the tacit knowledge that will allow them to do their job better. Refer to Kontogiannis (1999) for a more detailed review of the design considerations for cognitive training (e.g., cognitive themes, scenario creation, feedback).

Another lesson we have learned is to be careful with clients who want to use ShadowBox to evaluate workers. While ShadowBox offers a way to critically assess non-technical skills, which is crucial for organizational compliance (Jepsen et al. 2015), it can be counterproductive to use ShadowBox to simultaneously evaluate and train. If ShadowBox gets used for evaluation, it stops being effective for training because workers will no longer enter into the experience with curiosity and an eagerness to explore. Instead, ShadowBox’s value should be appraised by its impact on the trainee’s knowledge and performance. Through the ShadowBox exercise, trainees compare their own decisions and rationales with those of a panel of experts. This allows trainees (and trainers) to identify misconceptions or flawed beliefs about how something works, which can then be repaired and restructured through targeted feedback and follow-up exercises.

2.2 Expert feedback

Next, consider the term “expert.” ShadowBox systematically captures the decisions and reasoning of experts and presents this information to trainees without the experts having to be present. One problem is that no one likes this term. People we consider experts are uncomfortable being labeled as such. And people who compare themselves to the experts, and are ruffled by the comparison, keep asking, “Who are these experts?”

Another problem we have experienced is that finding experts is not always straightforward. Not everyone nominated to a panel of subject matter experts (SMEs) is an expert at the skills ShadowBox trains. In our Good Strangers project (Klein and Borders 2016) we struggled with so-called experts who provided rankings and rationale that did not square with being a good stranger. Although they may have been expert at other aspects of leadership, we had to discard a number of them because they showed no sign of being skilled at de-escalating situations and gaining trust; they relied on an intimidation mindset in dealing with civilians. People are often nominated as experts because of their years of experience rather than their skill level. We had to vet the panel more carefully to ensure that its members had been successful in gaining the cooperation of civilians in foreign countries and cultures. When asking for expert nominations, it is important to articulate the types of skills that matter.

A third problem, one that we have known from the beginning, is that experts do not completely agree with one another, which is why we include the potential for a minority view. We make it clear that the experts are not perfect, and their rankings should not be considered as ground truth. Yet in taking this position we are raising questions about what it means to be an expert, which is beyond the scope of this article.

We would like to replace the term “experts” but have not yet found a suitable replacement. We have considered “respected practitioners,” “skilled practitioners,” “proficient practitioners,” and so forth. Perhaps instead of SME we should use HRP (Highly Regarded Practitioner).

We are also learning how to do a better job of synthesizing the expert feedback and describing it clearly and succinctly for the trainees. This is the critical piece of ShadowBox training because it opens a window into the expert’s head—the way the expert sees the world as reflected in the scenario. Previously, we simply bundled the different comments from the panel members without giving enough attention to the clarity and cohesion of the material. Additionally, we have found it necessary to connect the expert feedback to consensus-based best practices from research and policy, especially in domains in which decisions are frequently subjective and depend on a practitioner’s style or personal preference (e.g., child protective services, law enforcement). For these more subjective domains or decisions, we find it helpful to bring the SMEs together for a discussion, which helps them form a consensus about the preferred answer. Discussion can help SMEs articulate the principles they can all agree upon (e.g., “it is best to keep children with family members whenever possible”) without getting bogged down in jurisdiction-specific minutiae. Although SMEs may not always agree on specific details, tactics, or strategies, we have found that there are common principles that most experts within a domain do agree upon. We aim to bring those principles forward in the expert rationale.

2.3 Front-end CTA

The notion of a front-end CTA has often turned out to be impractical. Clients often feel it is too time-consuming and expensive. Few training departments can afford a front-end CTA; planning, conducting, and analyzing the interviews can take several months. As a result, we are exploring ways to fold the CTA into the scenario construction process. One approach uses the simulation interview strategy described as part of Applied Cognitive Task Analysis (ACTA; Militello and Hutton 1998; Klein and Militello 2004; Gore et al. 2018). The simulation interview presents the interviewee with a challenging incident, followed by questions about tough decisions, shifts in the way the interviewee understood the situation, critical cues, and so forth. The scenario used to structure the interview can be based on incident reports of real cases or on a tailored interview with one or two domain experts. A related approach that may have even greater potential, also drawing on the ACTA work, is the “scenario from hell” method. In this type of interview the SME generates a truly challenging scenario. The SME does not recall an actual incident but instead formulates one based on the kinds of challenges, particularly cognitive challenges, that make the work difficult. The scenario-from-hell method gathers CTA material while simultaneously generating a scenario we can use with ShadowBox. We have also had success with a hybrid method that combines elements of ACTA and the Critical Decision Method—the Knowledge Audit interview (Borders and Klein 2017). Moving forward, we will use these streamlined CTA methods while building ShadowBox scenarios, collecting the cognitive data and constructing the scenarios at the same time.

2.4 Training delivery

Our goal of using an electronic version of ShadowBox to allow individualized training is still active, but we found that many of our clients value the group discussions. These discussions are easily organized using paper-and-pen versions of ShadowBox. However, these group discussions create a need we had not anticipated—to train facilitators at each site.

In response to this need, we developed a facilitator training program for social workers, and early results from pilot projects suggest these facilitators have done very well at conducting the group sessions (Newsome and Klein 2017). We have also developed facilitator training for petrochemical plant controllers. One petrochemical plant is using the scenarios we created (and new ones it has created on its own) for training in group settings. The facilitator projects the scenario to a group of trainees and, at each decision point, uses an electronic clicker survey to poll the group. After each decision point, the facilitator leads a discussion about the trainees’ rankings and selections. When the trainees have completed the scenario, the group conducts an after-action discussion, examining trends to see how the anomaly developed and what, if anything, they could have done to prevent it from developing into an upset.

We have also come to appreciate the importance of ensuring the quality and consistency of facilitators. Not all practitioners can effectively facilitate ShadowBox scenarios—it requires curiosity, the ability to think on one’s feet, and the willingness to challenge flawed beliefs in a non-confrontational way. Effective facilitators stimulate fruitful discussions that generate new insights and build richer and more accurate mental models. We therefore recommend careful vetting of facilitators. Crafting and implementing non-technical skills training, such as ShadowBox, within an organization requires intimate knowledge of the domain and an understanding of the cognitive skills important for job success. We are also exploring job aids, such as scenario-specific facilitation guides that present key themes and mindset indicators, to support ShadowBox facilitators.

Lastly, we believe ShadowBox training is best suited to short, distributed sessions (ideally one, and no more than two, scenarios per session) over an extended period of time. The training scenarios are designed to introduce complex challenges and augment on-the-job experiences without the safety risks often associated with acquiring those experiences on the job. Using ShadowBox, we can present a wide range of situations that the trainee might otherwise never encounter. And through repeated exposure to the experts’ mental models in the form of expert feedback, trainees are encouraged to make new discoveries and restructure their own mental models. Unfortunately, in most of the evaluation studies we have conducted so far, logistical constraints have forced us to introduce all of the scenarios, usually four and sometimes six, during one lengthy training session. We do not recommend this procedure: each scenario provides a cognitive workout for the trainee, and completing more than one or two during a session can be exhausting, possibly limiting insights and knowledge retention.

2.5 Evaluating success

Unlike other cognitive skills training techniques, ShadowBox has a built-in evaluation measure: the match between the trainee’s rankings and selections and those of the expert panel. This internal measure provides a broad quantitative assessment of how well the trainee’s responses align with the experts’—in other words, how closely the trainee thinks and acts like the expert panel in that particular scenario. Hintze (2008) demonstrated that engagement with the expert feedback after each decision can shift the trainee’s subsequent responses toward closer alignment with the experts’. Firefighters who reviewed the expert responses, including the rationale, after each decision point provided responses that aligned more closely with the expert panel’s over the course of the training, compared to a group of firefighters who did not receive expert feedback. More recently, we replicated these findings with warfighters, using traditional paper-and-pen methods and mobile tablets to train the social cognitive skills needed for managing civilian encounters (Klein and Borders 2016). We found that non-facilitated, paper-based ShadowBox training improved trainee performance by 28% compared to a control group that did not receive expert feedback. We observed similar effects when the training was delivered on a mobile tablet: trainees receiving expert feedback improved their performance by 21% over the course of the training. Based on these early findings, our initial expectation was that with more scenarios (and more exposure to expert feedback), trainees would match the experts more closely. However, we have learned it is not that simple. Each scenario might have its own unique dynamics, and there is no reason to believe that the discoveries made on one scenario will transfer to the next.
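Our reports do not specify the exact scoring rule behind the match measure; the sketch below illustrates one plausible metric, the fraction of option pairs the trainee orders the same way as the expert panel (a normalized, Kendall-tau-style measure). The function name and the metric choice are assumptions for illustration, not the actual ShadowBox scoring.

```python
# One plausible trainee-to-expert match score: agreement on pairwise orderings.
# Illustrative assumption only; the actual ShadowBox scoring rule may differ.
from itertools import combinations

def match_score(trainee: list[str], expert: list[str]) -> float:
    """Fraction of option pairs the trainee orders the same way as the experts.

    Both arguments are best-to-worst orderings of the same option labels.
    Returns 1.0 for a perfect match and 0.0 for a fully reversed ranking.
    """
    pos_t = {opt: i for i, opt in enumerate(trainee)}
    pos_e = {opt: i for i, opt in enumerate(expert)}
    pairs = list(combinations(expert, 2))
    agree = sum((pos_t[a] < pos_t[b]) == (pos_e[a] < pos_e[b]) for a, b in pairs)
    return agree / len(pairs)

# Example: the trainee swaps only the experts' top two options.
print(match_score(["B", "A", "C", "D"], ["A", "B", "C", "D"]))  # 0.833...
```

Under this kind of measure, a trainee who swaps only the experts’ top two options scores 5/6 ≈ 0.83, which matches the intuition that small disagreements should cost little.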

For a fair comparison, we now match two scenarios that revolve around the same issues and present one at the beginning of the training program and the other at the end—counterbalanced across trainees, of course. That way we can more accurately determine how much the trainee has learned over the course of the training session(s).
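For readers unfamiliar with counterbalancing, the sketch below shows the logic under our assumptions (a hypothetical helper, not ShadowBox code): alternate which member of a matched scenario pair serves as the pre-test, so that any difficulty difference between the two scenarios averages out across trainees.

```python
# Hypothetical counterbalancing helper for a matched scenario pair (S1, S2).
def assign_pre_post(trainee_ids: list[str]) -> dict[str, tuple[str, str]]:
    """Alternate the (pre, post) order of a matched scenario pair across
    trainees so scenario difficulty differences average out."""
    orders = [("S1", "S2"), ("S2", "S1")]
    return {tid: orders[i % 2] for i, tid in enumerate(trainee_ids)}

print(assign_pre_post(["t1", "t2", "t3", "t4"]))
# {'t1': ('S1', 'S2'), 't2': ('S2', 'S1'), 't3': ('S1', 'S2'), 't4': ('S2', 'S1')}
```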

Some issues do cut across scenarios, such as shifts in mindset, so we do expect some improvement with practice. However, another lesson we learned was that it was a mistake to design scenarios around specific mindsets that we wanted to change. We used to fashion the scenarios and decision points to reflect the mindset shifts of interest. When we did that, the contrived decision points tended to have “right” answers, and trainees learned to game the exercise. Even worse, the scenarios were less interesting and engaging. Currently, we design scenarios around problems rather than solutions. We try to present challenging dilemmas that force the trainee to prioritize and make difficult goal trade-offs. Decision points may focus on framing the problem, distinguishing urgent concerns from important but non-urgent issues, anticipating future problems based on the current situation, and prioritizing actions. Where practical, we incorporate mindset issues in the decision point options we present, but we do not let the mindset shifts dominate the scenarios. We try to use the distractor items (i.e., foils) for the decision points to present flawed beliefs and to reflect mindsets we are trying to alter. In this way, ShadowBox can serve diagnostic purposes by surfacing the weaknesses in the trainees’ mental models.

Many training directors want to go beyond matches to experts or mindset shifts—they want to see improvements in on-the-job performance. While we support this goal, we often run into the problem that our clients cannot easily identify who is doing their job well or poorly. In other words, there are rarely any clear, objective, job-based performance indicators, so there is no easy way for us to demonstrate performance improvements. The best we have come up with is to gather supervisor ratings before and after training, or to compare supervisor ratings for trainees who have received ShadowBox training with ratings for those who have not.

2.6 Scenario quality control

We initially tried to ensure the quality of scenarios by reviewing all scenarios generated by our clients. We were worried that if we let clients make up their own scenarios they might not generate very good ones and the ShadowBox program would get a poor reputation simply because of the low-quality scenarios produced by organizations with little background in cognitive skills training. Therefore, we decided that only scenarios developed with our team, or at least reviewed by our team, would count as ShadowBox.

This policy made a lot of sense from a quality control perspective. However, it made very little sense from a business perspective. Clients were frustrated because they did not want to be tied to us forever, and potential clients were turned off for the same reason. The premise of ShadowBox is that it works around the training bottleneck imposed by unavailable or scarce subject matter experts, yet our policy made us the bottleneck, requiring us to review every new scenario.

Because of this tension, we abandoned that policy and now encourage clients to build their own ShadowBox scenarios. We have also developed a training program, which we are continuing to refine, to teach clients how to generate effective scenarios. At the same time, we have gained a great deal of humility about the difficulty of crafting good scenarios. One trap we have sometimes stumbled into was crafting decision point options that made good sense to us, because we were so familiar with the scenario, without realizing that trainees would interpret the options differently than we expected. We needed to pre-test the items.

3 Conclusion

We think we have learned a great deal by trying to implement ShadowBox training. This paper is only incidentally about the ShadowBox approach. The goal of this paper is to use our experiences to convey lessons about presenting cognitive skills training, regardless of the techniques employed.

Despite all the false starts, we are more enthusiastic about cognitive skills training than we were at the start. Many of our clients had not even considered cognitive issues prior to interacting with us. For them, training was about teaching rules and facts and procedures. The opportunity to address cognitive skills opens up possibilities that they find very exciting. Several use the phrase “game changer.” Our new mission statement is to use ShadowBox scenarios based on tough cases to help learners shift their mindsets, gain insights, build tradecraft, and think like experts. We will see how long this version lasts.