Many worksite managers are troubled by human error among workers. Although they have taken various measures, such as preparing manuals with guidelines and requiring workers to follow the manuals diligently, it has been impossible to eliminate human error. Field managers are vaguely aware of the limitations of the measures implemented so far. Therefore, there are significant expectations of Resilience Engineering, or Safety-II, as a new approach to preventing human error. However, the emphasis that on-site staff ought to be flexible in adapting to changing worksite conditions can also lead to problems. For example, we often hear remarks such as the following from field managers.

  • “We tell our workers that resilience in the workplace is to adjust to a change of circumstances in your work and to act accordingly. We encourage them to become resilient and flexible workers. Consequently, under the pretext of being very busy with given work, they sometimes cease to follow or violate the manuals that describe the procedures and regulations, which consequently leads to accidents.”

  • “To avoid such accidents, we tell workers to adhere to the manuals once again, and they become confused.”

  • “In other words, they are told, on one hand, to be resilient and flexible, while on the other hand, they are also told to go by the book and stick to the manual.”

  • “How do we explain to our workers to be resilient and flexible while adhering to the manual?”

  • “How does one describe a resilient worker?”

On-site managers are not necessarily safety professionals, and they are generally not academics either. Thus, it is necessary to explain to them, in simple terms, the role and significance of resilience and flexibility, as well as their relationship with traditional safety approaches and measures.

Therefore, this chapter presents an easy-to-understand scenario that gives an overview of the safety activities, including resilience, that must be undertaken at the worksite. Furthermore, the story is intended to draw attention to certain key management ideas for resilience.

1 A Story That Explains Safety

How to Explain Safety-I and Safety-II

Whenever an on-site worker plays a certain role in an accident, there is a tendency to use the term human error or failure. Bearing this tendency in mind and without avoiding these terms deliberately, the difference between Safety-I and Safety-II could briefly be explained to workers as follows.

Safety-I

Safety is achieved only by following predetermined procedures. Deviations from such predetermined procedures are considered human errors. If human errors are avoided, work goals can be achieved safely. Thus, the goal of Safety-I is to eliminate all human errors. This approach derives from the ‘centralized control’ mode of safety (Provan et al., 2020) and focuses on adverse events such as mishaps, failures or accidents (Hollnagel, 2014).

Example: Before engaging in high-voltage electrical repair, the main power supply must be shut off. Performing electrical work without shutting off the power supply is a human error and may lead to accidents. Thus, this error must be eliminated.

Safety-II

Be flexible and act according to the current situation (Hollnagel, 2014). If a worker does not have the ability or potential to act according to the situation at hand, accidents may occur. Moreover, even if there were no accidents, in hindsight there may have been a better way to cope with that situation. In that case, the work could also be regarded as a failure. Even if the outcome was acceptable, or even successful, for the client, it may still be considered a failure from the worker’s point of view. In this sense, the level of success is limitless. Thus, resilience is a means of obtaining more desirable results; this approach therefore focuses on success (Hollnagel, 2014, 2018).

Example: Let us consider the case of a doctor performing a medical operation. If the surgery is not performed in accordance with the patient’s condition, it could result in a failure. Even if the surgery is ultimately successful, the doctor may later have regrets upon realizing that there was a better way to operate in which the patient would not have been left with a scar. In this case, the doctor may feel the operation failed. To avoid such a situation, the doctor must build up his or her skills and act more resiliently, with careful and flexible behaviour during operations. It is by refining the potential to read the reality of a situation and make the adjustments needed for the optimal outcome that the doctor becomes a resilient practitioner.

2 Production Activities and the Relationship Between Safety-I and Safety-II

The aforementioned explanations show the differences between Safety-I and Safety-II. However, the relationship between the two, in actual production activities, is still unclear. We present a story to explain this.

Hunting Activity of Primitive Man

Let us consider the hunting activity of primitive man (Fig. 1). The primary reason safety is necessary is to achieve reliable production. First, there is a production activity. We wish for this activity to be carried out safely. This wish implies the hope that the production staff do not suffer injuries and that good-quality service is provided. If safety were the sole concern, production activities ought to be stopped. However, if this were a corporation, the whole purpose of its existence would then be lost.

Fig. 1 Hunting activity of primitive man

This situation is no different from that of primitive man. Let us consider primitive man’s activity of hunting, wherein the production activity is the capture of prey in good quality condition. If man does not hunt, he will die of starvation. Therefore, production activities cannot be stopped.

However, the prey is alive, and live prey must be hunted down under changing weather and field conditions.

Efforts at the Individual Level: The Importance of Learning

Primitive man may have started out by randomly chasing after his prey. It is likely that he slipped and hurt himself in the pursuit. This is akin to a worker’s occupational accident. He may have missed capturing several prey in this manner. This is known as a production mishap. However, there may also have been instances where prey were caught in good quality condition, without any injuries to the man. These are known as successes or the safe fulfilment of production.

Surely, he must have looked back on these accidents and successes and analysed reasons for both. This is known as a root cause analysis. By combining the root cause analysis of failures and successes, primitive man would have learned effective and ineffective (i.e. good and bad) ways of hunting, eventually fine-tuning strategies to locate prey and successfully hunt. These experiences may have been compiled into a guideline manual to share with friends, which could also aid in educating newcomers.

Everyday Learning and Training

Learning the guidance is not the only task one must complete before proceeding to hunt. It is equally important to train the muscles. One must also have learned how to use a spear effectively. Technical skills must be built up through everyday learning and training. However, if all the time is spent on preparation, such as muscle training, and there is not enough time left to hunt, the preparation will be meaningless. In other words, advance preparation that stands in the way of production activities is evidence of getting one’s priorities backwards. On the other hand, going out to hunt without proper muscle training is bound to result in accidents or failure. One can only hunt within the range of the strength gained through muscle training; hence, to catch big game, it is essential to undergo considerable muscle training. Production activities must be commensurate with the abilities acquired by the individual. In other words, one must acquire the abilities and potentials that match the needs of the production activity.

Going Hunting

After sufficient preparation with guidance and physical training, an individual can finally go and hunt. Before hunting, however, he would anticipate how the day’s hunt is likely to develop and would form a strategy for it. At the hunting site, hunters must monitor the situation, be very careful and beware of counterattacks by prey. Situational awareness is indispensable because the prey are ‘live’. At the precise moment of attacking prey that has appeared, hunters need to make effective and immediate decisions; otherwise, the prey may escape. Thus, non-technical skills, such as monitoring, situational awareness and decision making, are also important attributes (Flin et al., 2008).

Effort at the Team Level: Team Efforts

While aiming for big prey that cannot be tackled alone, primitive man must have organised a team to hunt. The size of the team must match the size of the game.

Once the team was organized and before setting out, there must have been a briefing about strategy and the division of roles. The rules for calling out during the course of the hunt would have been set in advance, and hunters would have responded to each other in accordance with the rules. Such communication while chasing the prey cannot be lengthy and must be kept simple. If a strategy was not formed and each person was allowed to act resiliently, without communication, there would be no coordination, and the prey would escape; this may be called a functional resonance accident (Hollnagel, 2012a, 2012b). Upon noticing any suspicious movement of the prey, such as an attempt to strike back, a hunter must immediately notify the others with a loud call. This is known as assertion.

The relationship among team members is also important. If team members are not mutually considerate, one plus one will result in zero. To prevent this, it is necessary to build a good rapport among team members.

An amicable agreement regarding rules for sharing the catch must also be made in advance to avoid future discord among team members. Even so, a team member may be injured while hunting due to an unforeseeable circumstance, despite his efforts. Perhaps he would have received an extra share, notwithstanding the rules. Thus, there must be room to reconsider and revise decisions in an emergency or in an unexpected, exceptional situation. Such decisions, actions, and outcomes will accumulate and eventually be incorporated into manuals as precedents. However, a precedent serves only as a reference and may, in some cases, be changed in the future.

As part of routine activities before hunting, the entire team may have prepared by improving the hunting ground, for example, by cutting the grass so that their feet do not get caught and so that they can easily spot prey. This is known as kaizen or workshop betterment. The preparation requires a leader to provide instructions; however, no one will listen to a leader who is a liar or is inexperienced. Thus, good leadership is essential.

Development of Tools

With the realisation that working with bare hands has its limitations, primitive man developed spears and bows; this is also known as tool development. They may have improved the tools ergonomically to increase their utility and usability. These improved tools would help those who do not have excellent hunting ability. Then, the rules for correct usage of these tools must also have been established; the blade of a spear must not be grasped. Its handle must be grasped. Grasping the blade will surely cause injury. In other words, technical rules are those that must be followed at all times. Perhaps they compiled the rules into a manual, shared it with newcomers, and, with the manual, trained them on how to use the tools.

With time, they must have thought of ideas to hunt in smaller groups, or perhaps not hunt at all; they may have come up with the idea of digging pits to trap prey. This would have been a labour-saving effort and a move towards automation.

Efforts at the Organizational Level

The tribes that worked hard on the aforementioned issues would have hunted well; that is, they would have safely captured a large number of prey and prospered. The tribes that made no effort to tackle these issues would have been unable to catch prey and would have declined due to casualties, which amounts to an organizational accident. Ultimately, safety is an integral part of production and can be considered a means of achieving successful production. In this sense, safety measures are important as the very basis of a tribe’s existence.

Even the tribes that prospered could not afford to be complacent. The population of prey may have gradually declined at the hunting field. In such a case, it may have been necessary for people to migrate. With the introduction of crops, they may have felt that it was better to switch production activities to agriculture rather than rely on hunting. In other words, as the natural and social environments around the tribe or organization slowly change, it is important to pay close attention to signs of change and to encourage the community and organization to adjust accordingly. Organizational resilience is therefore important, and its potential can be assessed with the Resilience Assessment Grid (RAG) (Hollnagel, 2018). Such adjustment may be attributed to organizational culture, especially a culture of flexibility.

3 Lessons That Should Be Learned from This Story

Activities to Ensure Safety

Although the entire aforementioned story is fictitious, it is likely one that readers can agree with.

The story’s lesson is that safety is a means of achieving production success. Furthermore, the story suggests the following specific activities as ways to achieve production safely.

  1. Engage in labour-saving and automation activities.

  2. Develop tools for production and improve them ergonomically for utility and usability.

  3. Conduct improvement or kaizen activities for the production site.

  4. Secure the number of people necessary for that production.

  5. Compile manuals for guidance, rules and regulations. Make these available to everyone. Moreover, make everyone familiarise themselves with these rules and regulations.

  6. Improve each person’s job capabilities and potentials, including technical and non-technical skills.

  7. As part of the production activities of the team, develop non-technical skills for teamwork, such as communication skills, assertion and leadership qualities.

  8. The organisation must be aware of changes in the natural and social environment in which it is placed and subsequently direct the organisation in a way that can adjust to the changes. To do so, however, organizational culture on flexibility may be required.

These may be considered to be safety activities to ensure satisfactory production. Furthermore, based on the definitions of Safety-I and Safety-II, steps 1–5 are mainly Safety-I activities and steps 5–8 are Safety-II-based activities.

The Order of Actions Taken for Safety

The order in which actions are performed at the site is also important. In other words, there exists a safety management process that includes both Safety-I and Safety-II (Komatsubara, 2011). Of the eight items listed, steps 1–7, which are directly related to site safety, must be addressed in this order.

For example, let us consider the case of driving a car. If the road is strewn with stones and rubble, it must be cleared (step 3; see Fig. 2). That is the activity to be undertaken first. Attempting to skilfully navigate a rubble-filled road in the name of resilience (step 6) would be a meaningless exercise. Likewise, it is important to establish traffic laws and be orderly when following them (step 5). Instead, trying to drive resiliently in a chaotic traffic situation (step 6) to achieve safety would be counterproductive.

Fig. 2 If possible, making the road condition better to prevent needless resilience

It is also important to offer behaviour assistance facilities to workers (step 2) before training their resilience potentials (step 6). As Fig. 3 illustrates, if rear-view monitors and navigation systems can be supplied, they must be provided, especially if drivers and workers are novices who do not yet have such resilience potential.

Fig. 3 Supply resilience assistance facilities, especially for novice workers

In other words, at a production site, if Safety-I measures can be executed, they must be undertaken as a priority.
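The ordering rule described in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that a manager has already judged which of steps 1–7 are feasible for a given hazard; the names `SAFETY_STEPS`, `prioritise` and `first_action` are hypothetical and do not come from the cited literature.

```python
# Steps 1-7 from the list above, keyed by the order in which they should be addressed.
# The short labels are paraphrases for illustration only.
SAFETY_STEPS = {
    1: "labour saving and automation",
    2: "tool development and ergonomic improvement",
    3: "kaizen of the production site",
    4: "securing the necessary number of people",
    5: "manuals for guidance, rules and regulations",
    6: "individual technical and non-technical skills",
    7: "team non-technical skills",
}


def prioritise(feasible_steps):
    """Return the feasible step numbers in the order they should be addressed."""
    unknown = set(feasible_steps) - set(SAFETY_STEPS)
    if unknown:
        raise ValueError(f"unknown step numbers: {sorted(unknown)}")
    return sorted(set(feasible_steps))


def first_action(feasible_steps):
    """Return the step to address first, or None if nothing is feasible."""
    ordered = prioritise(feasible_steps)
    return ordered[0] if ordered else None


# Rubble-strewn road: clearing the road (step 3) and skilful driving (step 6)
# are both conceivable, but clearing the road comes first.
print(SAFETY_STEPS[first_action([6, 3])])
```

In the rubble-strewn road example, the sketch selects step 3 before step 6, which mirrors the argument that needless resilience should be avoided where the site itself can be improved.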

4 Questions That Arise

Manual and Resilience

The relationship between Safety-I and Safety-II as safety activities has been explained by the story in the previous section. Now, we will focus on the relationship between resilience and adhering to the manual. This was the field manager’s problem at the outset.

If a manual can be reasonably created, it must be created, and everyone should act in accordance with it. The question that arises is, ‘is it good to merely follow the manual?’ Or, ‘is it good to be slightly flexible with regard to following the manual’s rules to adjust to the situation on hand and act resiliently?’

Case 1 illustrates how an issue was avoided by not acting according to the manual. In this case, if the manual instructions had been strictly followed, it would have resulted in utter chaos.

Case 1: Retreating from a Nuclear Power Plant, 2011

Although it has not been formally reported, during the Great East Japan Earthquake (2011), a nuclear power plant is said to have allowed workers inside the plant building to escape outside without measuring their radiation doses. If individual dosimetry had been performed, as prescribed by the manual, the evacuation would have been seriously delayed, threatening people’s safety.

On the other hand, there are also instances of accidents that have occurred due to the violation of rules in the manual, as in cases 2 and 3.

Case 2: Tokaimura Nuclear Accident in Japan, JCO Plant, 1999

Three workers violated the authorised procedure for producing a small batch of liquid-type uranium fuel, although they had been given the procedure. In an attempt to reduce workload and production time, they used an incorrectly sized tank to mix uranium powder into liquid acid. As a result, a criticality accident occurred.

To resolve the manager’s aforementioned predicament, let us consider the relationship between Safety-I, Safety-II, and the manual.

Types of Manuals

In the example of the primitive hunters, there were manuals in the form of guides, rules, and regulations. These can be categorised into three types according to how strictly compliance is required (Komatsubara, 2016), as follows:

  • Type 1: Technical regulations – No room allowed for resilient behaviour: a technical procedure, with a physical or natural science background.

For example, when preparing dilute sulphuric acid, concentrated sulphuric acid must be added slowly to the water. If this procedure is reversed, bumping will occur due to the heat of hydration, resulting in an accident. This is similar to the spear usage rule in the case of primitive man: The blade must not be grasped. This procedure belongs to Safety-I and must not be made subject to resilient adjustment. The prescribed procedure must be followed under all circumstances. No matter how busy workers are, they should not behave resiliently. Because the procedure rests on physical reasons, workers are not permitted to change it. Established procedures must be followed, and any deviation from the procedure is considered an entirely inadmissible human error.

  • Type 2: Rules – Resilient behaviour is not acceptable under normal circumstances but it is allowed in emergency situations: a predetermined promise of organisations.

Rules are procedures with a sociological background. They are social promises. Traffic laws are simple examples. Vehicles must keep to the left of the road in Japan and the United Kingdom, and to the right in continental European countries and the United States. The system chosen by each society is a predetermined arrangement that results in the smooth flow of traffic, thereby avoiding accidents. Under normal driving conditions, traffic rules must be followed. However, in an emergency, being resilient and deviating from the rule may help achieve better results. In this case, deviation is permissible under the legal concept of necessity in an emergency. In the case of primitive man, the amicable agreement for sharing the catch is in this vein; an extra share for an injured member would be acceptable, notwithstanding the agreement.

  • Type 3: Guides – Resilient behaviour can be accepted, or rather, it is recommended: standard practices that serve as references.

A guide is a manual that defines standard treatment methods. A manual that defines the methods of support and service to customers in a store is an example. Such a manual is a guide and can also be considered a textbook. In the case of primitive man, guidance for chasing and hunting prey would fall into this category. Although the guide sets a standard, store staff, for example, are expected to change their service approach in a resilient manner depending on the nature of the customer or service target, the situation, and the need at the time.

Type 1 requires strict adherence to the procedure. The JCO criticality accident was caused by the violation of procedures (Komatsubara, 2000). Type 2 requires compliance as long as the conditions presupposed by the rule hold. However, in the event of an emergency, it is important to be resilient and take circumstance-appropriate actions rather than following procedures. It is helpful to have an emergency manual, but it is difficult to foresee every type of emergency situation. Hence, an emergency manual should be a Type 3 manual. In Type 3, the manual serves only as a guide and reference. Achieving good results or success requires good resilience.

This is summarized in Fig. 4. Managers should understand the differences between the three types of manuals based on this figure and explain them to the workers.

Fig. 4 Three types of manuals and their treatment in different situation conditions
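The three-type classification can be expressed as a small decision rule. The following is a minimal sketch based only on the descriptions of Types 1–3 in the text; the names `ManualType` and `deviation_permitted` are hypothetical, and the sketch deliberately ignores the worker’s resilience potential, which is a separate condition.

```python
from enum import Enum


class ManualType(Enum):
    TECHNICAL_REGULATION = 1  # Type 1: physical or natural science background
    RULE = 2                  # Type 2: predetermined promise of an organisation
    GUIDE = 3                 # Type 3: standard practice serving as a reference


def deviation_permitted(manual_type: ManualType, emergency: bool) -> bool:
    """Return True if resilient deviation from the manual may be considered."""
    if manual_type is ManualType.TECHNICAL_REGULATION:
        # Type 1 must be followed under all circumstances.
        return False
    if manual_type is ManualType.RULE:
        # Type 2 is followed under normal conditions; deviation only in an emergency.
        return emergency
    # Type 3 is a reference; resilient behaviour is accepted or even recommended.
    return True


print(deviation_permitted(ManualType.RULE, emergency=False))  # False
print(deviation_permitted(ManualType.RULE, emergency=True))   # True
```

A result of `True` means only that the type of manual does not forbid deviation; whether a particular worker should actually deviate is a further question.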

Resilient behaviour in types 2 and 3 is not unconditionally allowed for every worker. It is determined by the relative relationship between the magnitude of the situation change and the worker’s resilience potential. In short, if the worker’s resilience potential is small in relation to the magnitude of the situation change, resilient behaviour is likely to result in undesirable outcomes. It would be beyond his capacity. In order to obtain good outcomes from resilient behaviour, the worker’s resilience potential must be large enough for the magnitude of the change.

Figures 5 and 6 are models showing this relationship. Line A indicates the level of resilience potential of the worker. Line B indicates the level of resilience potential that the situation requires of the worker at that time. Both lines are wavy, indicating their respective dynamics.

Fig. 5 Relation between resilience potential of the worker and situational demands; case of resilient behaviour being allowed

Fig. 6 Relation between resilience potential of the worker and situational demands; case of resilient behaviour not being allowed

If the worker has a rich resilience potential, line A moves up. Workers are allowed resilient behaviour in situations where line B is below line A. On the other hand, line B moves upward as the situation deviates from the normal. When line B rises above line A, the case presented in Fig. 6 results. In this case, since the resilience potential of the worker does not meet the demand of the situation, an undesirable outcome is likely if the worker behaves resiliently. Except in real emergencies in which the worker has no choice but to respond himself, the worker should ask to be replaced by another worker with a higher resilience potential or ask his supervisor for appropriate instructions for coping with the situation.

When line B is far above line A, that is, when the situation is far beyond the worker’s resilience potential, he will perhaps not be willing to behave resiliently. However, in a real emergency, workers may behave resiliently if no one else can deal with the situation, even if they realize that it is beyond their ability. At such a time, regardless of the outcome, the resilient behaviour would be called a heroic act. At the worksite, however, problems arise when line B slightly exceeds line A. In this situation, workers often act despite insufficient resilience potential, saying it is ‘probably OK’. This is what the ETTO (efficiency-thoroughness trade-off) principle describes (Hollnagel, 2012a, 2012b). As a result, accidents often occur. The manager must always tell workers that ‘probably OK’ is ‘never OK’; it is not permitted. When the worker leans towards ‘probably OK’ resilience, he must first seek the permission of managers with rich resilience potential.

Therefore, the on-site manager must confirm the ‘type’ each manual belongs to. Fieldworkers must also be clearly informed of the ‘type of manual’ they are being provided. Type 1 is a manual for Safety-I. Type 2 is a manual for Safety-I under normal circumstances, that is, as long as the premises on which the manual was determined hold. Type 3, by contrast, is a manual for Safety-II.
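The relationship between line A and line B can be sketched as a simple comparison. This is only an illustration under stated assumptions: the text gives a qualitative model, so the numeric `potential` (line A) and `demand` (line B) values, the function name `recommended_response` and the returned labels are all hypothetical. The sketch applies to Type 2 and Type 3 manuals only.

```python
def recommended_response(potential: float, demand: float,
                         real_emergency: bool = False,
                         help_available: bool = True) -> str:
    """Compare the worker's resilience potential (line A) with the demand (line B)."""
    if demand < potential:
        # Line B is below line A: resilient behaviour is allowed.
        return "act resiliently"
    if real_emergency and not help_available:
        # No one else can deal with the situation: the worker has no choice.
        return "act despite insufficient potential"
    # Includes the 'probably OK' zone where demand only slightly exceeds potential.
    return "escalate"


print(recommended_response(potential=5.0, demand=3.0))  # act resiliently
print(recommended_response(potential=5.0, demand=5.2))  # escalate
```

Here "escalate" stands for asking to be replaced by a worker with a higher resilience potential or asking a supervisor for instructions. The boundary case where the two lines coincide is treated conservatively as escalation, which is a choice of this sketch and not a statement from the text.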

5 People Prefer Shortcuts

In general, people prefer to use shortcuts. Figure 7 demonstrates this tendency; while the correct route is to walk on the paved section of the road, many people violate this procedure. People generally employ shortcuts to save workload, time and money in achieving their production goals. This behaviour may be instinctive. Therefore, when simply told to follow manuals, workers violate the manual because of this tendency towards shortcuts. This is a major issue, especially with Type 1 manuals, because an accident will most likely occur, as seen in the JCO criticality accident in Case 2. To prevent such accidents, managers need to actively manage workers. Figure 8 summarizes the management strategies that should be implemented. First, low-workload procedures must be constructed; then, the worker must unquestioningly follow the procedures specified in the manual. Since people prefer a lower workload, the shortcut route should be set as the correct procedure, as shown in Fig. 7.

Fig. 7 People prefer a shortcut to save on workload, time and money (author’s own photo)

Fig. 8 Management strategies that should be undertaken for persuading workers to follow manuals

For technical reasons, it may be necessary to define high-workload procedures. However, this stimulates shortcut tendencies. To avoid the use of a shortcut, a barrier should be created. In Fig. 3, one of the measures mentioned is to build a strong fence. However, even if a physical fence is created, it may be destroyed to create a shortcut. To avoid this, managers must explain the reasons for the procedure to workers, convince them of those reasons and make them consciously follow the correct procedure. Even in the case of the JCO criticality accident, if the workers had recognized that the reason for the troublesome procedure was to avoid a serious criticality phenomenon, there would have been no resilient violation (Komatsubara, 2000).

The AIDA model (Attention or Awareness, Interest, Desire and Action), which is old but still frequently used in marketing and advertising, may be useful for convincing workers. Calling Attention and Interest to the reasons behind the manual stimulates the Desire to understand. It is helpful to explain the danger that arises when the manual is not followed and thus convince workers of the reasons for following it. Through this, it is expected that the workers will perform the Actions specified in the manual.

Resilience is Closely Knit to Safety-I

The manual should be described at three levels: process, activity and operation or motion. It must also be noted that different types of manuals may exist at different levels within the same job. For example, as mentioned above, to prepare dilute sulphuric acid, concentrated sulphuric acid must be slowly added to water. This procedure corresponds to Type 1. However, pouring the acid ‘slowly’ is a resilient action, and therefore, this part of the procedure calls for Type 3. In other words, Safety-I and Safety-II often exist tightly coupled in the same job. The field manager must make sure that workers understand this fact.

6 Becoming a Resilient Person

The larger a person’s resilience potential, the greater their ability to adjust to changes on site; resilient success cannot be obtained beyond the resilience potential they have. On the subject of manuals, it is possible to deviate significantly from a Type 3 manual and achieve remarkable success. Such a deviation could even lead to the creation of a more helpful Type 3 manual or guide. A person with high resilience potential may be called a professional.

Being professional is not merely confined to being knowledgeable. It is impossible to learn all situational responses in advance. Instead, the potential to create answers on the spot, according to the situation, must be cultivated.

Unno (1999) presents a model of skilled technicians, as illustrated in Fig. 9. As the skill level increases, the individual becomes more professional and may be considered a resilient worker.

Fig. 9 Skill levels (Adapted from Unno, 1999)

Learning is important; however, gaining experience alone does not guarantee improvement in a person’s quality as a professional. Moreover, simply learning from production success may sometimes lead to practices that are inappropriate from a safety viewpoint, as described in the following example.

Case 3: Explosion of Paint Spray

Paint oil tends to harden in the winter, making it difficult to use. Therefore, a senior worker warmed a spray can using hot water in an electric kettle to soften the paint. A younger worker noticed this and warmed a spray can using the same method, without giving any thought to the reason. He warmed it for so long that the can exploded.

Awareness of the reasons why things work well or poorly is important. A deep understanding of fundamental principles is essential for good resilience. Indeed, it is important to follow manuals and the guidance they contain, and to learn from successful cases. However, learning the reasons behind those successes and studying the principles behind the manual in depth are also necessary. It is only through this process that the desired resilient behaviour can be acquired. The field manager must tell workers not to confuse art with gimmicks; if good results on the surface were obtained through tricks, workers should be discouraged from learning from them.

7 Conclusion

Resilience is absolutely necessary at every worksite, and Safety-II is indispensable, because dynamic change, to a greater or lesser extent, always occurs. However, to achieve production safety, both Safety-I and Safety-II are imperative. All workers need an overall understanding and explanation of safety. In areas where Safety-I is applicable, Safety-I activities must be conducted first.

Workers must understand that there are three types of manuals with regard to Safety-I and Safety-II. It may be best to explain to workers that a resilient person is a professional who has the potential to take action based on basic rules and principles.