Introduction

There are three goals of neuromuscular blockade monitoring in the operating room. The first goal is to assess the degree of neuromuscular blockade present during surgery. Whether the desired degree of neuromuscular blockade is reached and whether additional doses of paralytic medications are needed can only be determined by ongoing monitoring. The second goal is to assess the degree of neuromuscular blockade that is present when surgery is completed. This information is necessary to determine if pharmacological reversal of neuromuscular blockade is needed. If so, the dose (and sometimes the choice) of reversal medication(s) will be determined by the degree of neuromuscular blockade. The third goal is to verify the adequate recovery of neuromuscular transmission before extubation.

The historical gold standard for neuromuscular blockade monitoring is the force of skeletal muscle contraction. Mechanomyography (MMG) measures force with a strain gauge and continues to be a research method; MMG has never been widely used in clinical practice. Instead, clinicians have used two approaches to indirectly assess muscle contraction—either muscle motion or muscle membrane depolarization. Qualitative assessments of muscle motion consist of either: (1) visual observation of patient-generated (volitional) movement—clinical assessment (“Clinical Assessments”) or (2) visual or tactile observation of non-volitional evoked movement resulting from peripheral nerve stimulation—qualitative monitoring (“Qualitative Assessments — the Peripheral Nerve Stimulator”). Alternatively, muscle activity can be measured quantitatively using several methods (see “Quantitative Monitoring”). The first measures muscle motion by use of small accelerometers typically fixed to the thumb. In principle, based on Newton’s second law of motion (force = mass × acceleration), with a constant muscle mass, muscle force should be directly proportional to muscle acceleration. This is the basis for acceleromyography (AMG). A second method based on motion, which measures the displacement of the thumb, is referred to as kinemyography (KMG). A third method does not measure motion, but instead measures muscle membrane depolarization by use of electromyography (EMG). The summed amplitudes of muscle membrane depolarizations in response to peripheral nerve stimulation are used to assess muscle activity.

Clinical Assessments

In January 2023, the American Society of Anesthesiologists published evidence-based practice guidelines on the monitoring of neuromuscular blockade [1•]. This is the most rigorous evidence-based comprehensive review of neuromuscular blockade monitoring published to date. These guidelines specifically recommended against relying on clinical assessments of neuromuscular blockade. Why? The first and most obvious reason is that they cannot readily be used intraoperatively. The second is that they are quite unreliable when used postoperatively.

Not long after the introduction of neuromuscular blocking drugs into anesthesia practice (early 1940s), clinicians recognized that the inability of a patient to maintain a sustained muscular contraction was a sign of residual neuromuscular blockade. Numerous clinical signs were empirically developed based on this, e.g., sustained head lift > 5 s, sustained hand grip, sustained eye opening, negative inspiratory force, forced vital capacity, and others. Thus, if a patient was not able to sustain a head lift > 5 s (a positive test), the patient was considered to have clinically important neuromuscular blockade (to be “weak”). Unfortunately, the converse is not true; a patient’s ability to perform these various tests does not exclude clinically important residual paralysis.

The generally accepted standard for “adequate” recovery from neuromuscular blockade is based on QUANTITATIVE measurements of the so-called train-of-four (TOF) ratio [2]. The TOF consists of 4 sequential stimuli delivered to the ulnar (or other) nerve at 2 Hz (4 stimuli over 2 s). The administration of a neuromuscular blocking drug produces the well-known dose-related decrease in the amplitude of the 4 responses, with progressive disappearance of the 4th, 3rd, 2nd, and finally all responses. The TOF ratio is the amplitude of the 4th response to a TOF stimulus divided by that of the 1st response; a ratio of < 1.0 indicates “fade.” Although a TOF ratio of > 0.7 was originally considered to represent adequate recovery from neuromuscular blockade [3], a value of ≥ 0.9 is now considered to be the standard of practice [1•, 4]. Starting in the late 1990s, quantitative measurements were used to assess the adequacy of clinical assessments. The universal finding was that many patients are able to successfully perform clinical tests when they have TOF ratios much less than 0.9 [5]. For example, in 12 young healthy volunteers, resting tidal volumes were maintained in all 12 with TOF ratios in the range of 0.4 [6]. Similarly, eye opening and tongue protrusion were present in all 12 subjects at TOF ratios ~ 0.4 [6]. In a different volunteer study, a 5-s head lift was maintained at a TOF ratio of 0.6 [4]. Other studies demonstrate that glottic competence (which protects against aspiration) is impaired with TOF ratios of < 0.9 [4, 7]. Because of this high percentage of misleading results, clinical tests have minimal value for detecting meaningful residual weakness. So, even when patients might appear strong, there is a significant chance they are not strong. These relationships were demonstrated in a clinical setting by Debaene et al. in 526 patients who received a single dose of non-depolarizing muscle relaxant and, without receiving reversal, were extubated based on clinical criteria [8]. Head lift for > 5 s (tested in 331 of 526 patients) and tongue depressor retention (tested in 308 of 526 patients) had low sensitivity to detect TOF ratios < 0.9 with sensitivities of 18% and 14%, respectively [8]. Thus, even in alert patients, clinical tests cannot reliably detect weakness. Stated simply, you cannot believe what you see.
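The TOF definitions above can be illustrated with a short sketch. The function names, the detection threshold, and the sample amplitudes are hypothetical and chosen only for illustration; only the T4/T1 definition and the ≥ 0.9 target come from the text.

```python
from typing import Optional, Sequence


def tof_count(amplitudes: Sequence[float], threshold: float = 0.0) -> int:
    """Count how many of the four evoked responses exceed a detection threshold.

    The threshold is an illustrative assumption, not a value from the text.
    """
    if len(amplitudes) != 4:
        raise ValueError("a train-of-four has exactly four responses")
    return sum(1 for a in amplitudes if a > threshold)


def tof_ratio(amplitudes: Sequence[float]) -> Optional[float]:
    """Return T4/T1, or None when the first response is absent."""
    if len(amplitudes) != 4:
        raise ValueError("a train-of-four has exactly four responses")
    t1, t4 = amplitudes[0], amplitudes[3]
    if t1 <= 0:
        return None  # no first response, so the ratio is undefined
    return t4 / t1


def adequately_recovered(ratio: Optional[float], target: float = 0.9) -> bool:
    """Apply the >= 0.9 standard described in the text to a measured ratio."""
    return ratio is not None and ratio >= target


# Hypothetical amplitudes in arbitrary units: fade is present (ratio 0.7).
example = [10.0, 9.0, 8.0, 7.0]
print(tof_count(example), tof_ratio(example), adequately_recovered(tof_ratio(example)))
```

Under these assumptions, a ratio of 0.7 would have met the original criterion but falls short of the current ≥ 0.9 standard.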

A second problem with clinical signs is that they require patients to be sufficiently recovered from their anesthetics to be able to carry out the requested activity—e.g., “lift your head off the pillow and keep it off the pillow,” or “squeeze my hand as hard as you can and keep squeezing.” Thus, a patient may not be able to sustain a head lift because of residual anesthesia, not because of residual paralysis. This constitutes a false positive test. This common occurrence was also reported by Debaene et al. [8]. Upon arrival to the PACU, the investigators determined that because of residual anesthetic effects, it was not possible to evaluate the head lift and tongue depressor tests in 195 (37%) and 218 (41%) of the patients, respectively. If these sedated patients had not been excluded, the false positive rate for these two clinical tests would have been nearly 50%.

Qualitative Assessments — the Peripheral Nerve Stimulator

In the early years after the introduction of curare, clinicians relied almost entirely on the assessment of breathing (usually qualitatively) or subjective evaluation of “abdominal relaxation” to titrate intraoperative neuromuscular blockade, and clinical assessments to evaluate recovery. The need for something more objective was clear. The first peripheral nerve stimulator (PNS) was described in 1956 [9] and the first commercially available PNS appeared in the mid-1960s (the “Block Aid” monitor) [10]. This device (and others) could provide a single or tetanic stimulus of the ulnar nerve (Footnote 1). However, the assessment of the response to these stimuli was entirely subjective. How did the strength of a “twitch” compare to that seen before giving a paralytic and “how much” fade to tetanus was seen? Then, in 1971, Ali et al. published the now classic train-of-four (TOF) as described above [2]. In addition to their quantitative measurements (e.g., the TOF ratio), they also noted that by observing the degree of fade and counting the number of twitches, clinicians could make a more objective assessment of the depth of blockade. By the 1980s, commercially available peripheral nerve stimulators provided a combination of single twitch, TOF, and tetanus.

Unfortunately, clinicians generally assumed that if “fade” to a TOF represented paralysis, then the absence of fade (4 apparently equal twitches) indicated recovery. Then, in 1985, Viby-Mogensen et al. compared the visual (or tactile) assessment of fade to quantitative measurements of the TOF ratio and observed that fade was generally undetectable when the TOF ratio exceeded 0.4, a substantial degree of continued paralysis [11]. Thus, as with clinical signs, there is a high rate of false negatives: fade is not observed, but residual paralysis is still present. In a clinical setting, this was observed by Debaene et al. in 237 patients (45%) who, on arrival to the PACU, had a TOF ratio < 0.9 [8]. Using a nerve stimulator, only 27 of these 237 patients (11%) were judged to have TOF fade. Of the 85 patients who had a TOF ratio < 0.7, only 23 (27%) were judged to have TOF fade.
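The percentages quoted from Debaene et al. can be reproduced directly from the reported counts. The sketch below is only a worked check of that arithmetic; the helper name and variable names are illustrative, and the counts are those given in the text.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


# Counts as reported in the text.
total_patients = 526
below_0_9 = 237        # TOF ratio < 0.9 on PACU arrival
fade_seen_below_0_9 = 27
below_0_7 = 85         # TOF ratio < 0.7
fade_seen_below_0_7 = 23

incidence = percent(below_0_9, total_patients)               # 45
detected_below_0_9 = percent(fade_seen_below_0_9, below_0_9)  # 11
detected_below_0_7 = percent(fade_seen_below_0_7, below_0_7)  # 27
missed_below_0_9 = below_0_9 - fade_seen_below_0_9            # 210 patients

print(incidence, detected_below_0_9, detected_below_0_7, missed_below_0_9)
```

Put differently, qualitative TOF assessment missed 210 of the 237 patients whose measured TOF ratio was below 0.9.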

Although a PNS and qualitative assessments cannot directly verify full reversal of neuromuscular blockade, they can still provide valuable information to aid management of neuromuscular blockade. Moreover, when used correctly, these devices can reduce the incidence and severity of residual paralysis as compared with routine (unmonitored) clinical practice. For example, TOF count can aid in the titration of neuromuscular blocking drugs during surgery and, hence, avoid overdosage. It can also be used to guide reversal. For example, it has long been known that neostigmine reversal with fewer than 4 twitches is likely to be unsuccessful. Fuchs-Buder et al. and Thilen et al. have demonstrated that if a patient appears to have 4 equal twitches (defined as “shallow blockade”), then full reversal (TOF ratio ≥ 0.90 as measured by quantitative methods) can be achieved in > 60% of patients with neostigmine (although reversal requires at least a full 10 min wait after administration) [12, 13]. However, attempted reversal at deeper degrees of blockade (a TOF < 0.4) with neostigmine may not be successful, and even complete reversal with the recommended doses of sugammadex is not certain (see below).

What would the ideal PNS look like? A good description was provided by Beemer et al. in 1988 [14]. Unfortunately, to the best of our knowledge, only ONE device that meets most of their requirements remains on the market (EZStim III by Halyard — although this device is primarily designed for use in regional anesthesia) as of 2023. There are many other PNS units available — but providers should be cautious about their use, particularly if they do not display delivered current. Such units also rarely autocycle so true “monitoring” (i.e., continuous assessment) is impossible.

As noted, most current PNS units include a feature that allows the delivery of a tetanic stimulus, most commonly at 50 Hz. Like TOF, the observation of fade to tetanus is an indication of remaining paralysis. In general clinical use, however, a limitation of tetanus is the failure of providers to adequately time the delivery of the stimulus; a full 5 s is needed. Observing a very strong contraction after the initial application of tetanus does not exclude fade that might become apparent later. And as with TOF, the failure to observe fade (even with a full 5 s stimulus) does not exclude the presence of residual blockade. For example, Capron et al. showed that fade to tetanus was observable (or felt) only when the measured TOF ratio fell below 0.3 [15]. Similar observations have been made by others. In addition, applying tetanic stimuli to awake (or nearly awake) patients can be extremely painful.

The use of a higher stimulus frequency (100 Hz) may substantially improve this threshold — but we are unaware of current commercial devices that will deliver this frequency of tetanus. In addition, both 50- and 100-Hz tetanus can result in “artificially” augmented TOF response (due to post-tetanic facilitation) for several minutes after delivery and hence cannot be delivered too frequently.

Many PNS units also have a feature that delivers “double-burst stimuli” (DBS). DBS involves two very brief 50 Hz burst stimuli, separated by about 750 ms. Fade is defined as a visible decrement in the response to the 2nd “burst” relative to the first. Studies suggest that compared with TOF ratios, fade to DBS can be observed with TOF ratios up to about 0.6. But again, DBS cannot be used to verify full reversal.

The relationship between these various PNS-derived assessments and MMG-defined quantitative TOF ratios is described in Capron et al. [15] and shown in Fig. 1.

Fig. 1. From Capron Fig. 5 [15]. The top shows the results for different tests in individual patients (an up bar means fade was detected, a down bar means it was not). The bottom graph shows the summary curves derived from logistic regression analyses. Note that none of the assessment methods can reliably detect MMG-defined fade above a ratio of roughly 0.8.

Post-tetanic count (PTC) was introduced by Viby-Mogensen in 1981 and provides a method for assessing deeper degrees of block [16]. This requires the delivery of a full 5 s 50 Hz stimulus, then a pause, followed by a train of 1 Hz stimuli. As blockade deepens, the number of post-tetanic responses decreases and eventually reaches zero with extremely profound paralysis (overdosage?). An ideal PNS should be able to deliver a PTC with the push of a single button. Unfortunately, no available PNS has this capability. PTC can be assessed “manually” but only if the provider can match the described requirements (in particular, a carefully timed full 5 s tetanus and the 1 Hz subsequent stimuli). However, the authors caution against the intentional achievement of “deep block” when only a PNS is available (even with PTC assessments) unless the providers are willing to wait until sufficient spontaneous recovery has occurred (as noted, 4 apparently equal twitches or, at least, minimal fade) before attempting reversal with neostigmine. Reversal from deeper degrees of blockade without quantitative monitoring is not guaranteed to succeed, even with sugammadex. A recent publication by Bowdle et al. showed that as many as 16% of patients failed to fully reverse (as assessed with quantitative measures) with the manufacturer's recommended sugammadex doses [17•].
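The timing requirements of a PTC sequence can be laid out as a simple schedule. This is a sketch for illustration only, not a device specification: the 5 s, 50 Hz tetanus and the 1 Hz single stimuli come from the text, whereas the 3 s pause and the 15 post-tetanic stimuli are assumed values that the text does not specify, and the function names are hypothetical.

```python
from typing import List, Sequence


def ptc_stimulus_times(
    tetanus_s: float = 5.0,    # from the text: a full 5 s tetanus
    tetanus_hz: float = 50.0,  # from the text: 50 Hz
    pause_s: float = 3.0,      # ASSUMPTION: pause length is not given in the text
    n_single: int = 15,        # ASSUMPTION: number of 1 Hz stimuli is not given
    single_hz: float = 1.0,    # from the text: 1 Hz stimuli
) -> List[float]:
    """Return stimulus onset times in seconds for one PTC sequence."""
    n_tetanic = int(round(tetanus_s * tetanus_hz))
    times = [i / tetanus_hz for i in range(n_tetanic)]
    first_single = tetanus_s + pause_s
    times += [first_single + k / single_hz for k in range(n_single)]
    return times


def post_tetanic_count(responses: Sequence[bool]) -> int:
    """Count the post-tetanic stimuli that evoked a detectable response."""
    return sum(1 for r in responses if r)


schedule = ptc_stimulus_times()
print(len(schedule), schedule[250], schedule[-1])
```

With these assumed defaults, the sequence contains 250 tetanic pulses followed by 15 single stimuli, which shows why manual delivery with accurate timing is difficult.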

Given a full understanding of PNS use, along with the caveats and cautions mentioned above, it should be clear that it is possible to manage the depth of blockade using a qualitative device. However, the practicalities and limitations of doing so explain why the 2023 ASA Guidelines strongly recommend the use of quantitative monitoring.

Quantitative Monitoring

The low sensitivity of clinical and qualitative monitoring to detect residual neuromuscular blockade encouraged the (re)-introduction of quantitative monitors. Many devices have been introduced over the decades, several within the last few years. At the present time, commercially available systems are based on either AMG, KMG, or EMG methodologies.

AMG devices have been available since the 1980s (e.g., the “TOF-Watch”) and are likely the most widely used monitors worldwide, although they have not previously been generally adopted in the USA. These are suitable for use on any free-moving muscle [18], most commonly the thumb or big toe (via contraction of the adductor pollicis or flexor hallucis brevis, respectively) (Footnote 2). The sensors are reusable which may reduce overall cost. A drawback is that AMG tends to overestimate the TOF ratio by at least 15%, and a baseline (pre-paralytic) TOF ratio is required for accurate estimation of TOF recovery to 0.9 [19, 20]. This is because the baseline TOF ratio measured with acceleromyography before administration of muscle relaxant usually exceeds 1.0 [21], a behavior explained by Kopman et al. in 2001 [22]. Consequently, it is necessary to correct for this baseline (“normalize” the TOF values) to accurately assess recovery. If this step is omitted (as is usual in clinical practice), a target TOF ratio of ≥ 1.0 is recommended to avoid residual paralysis [23]. Claudius et al. have also reported that AMG measurements are improved if there is some preload on the thumb [21]. However, this requires specialized adaptors and is NOT typically employed in the OR. Finally, AMG may be unusable when free movement of the thumb or toe is not present (e.g., with tucked extremities).
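The effect of baseline correction can be seen in a small worked example. The sketch below assumes that normalization means dividing the measured TOF ratio by the pre-paralytic baseline ratio; the function names and the sample values (a baseline of 1.15 and a raw reading of 0.95) are hypothetical. The ≥ 0.9 normalized target and the ≥ 1.0 uncorrected target are those stated in the text.

```python
from typing import Optional


def normalized_tof_ratio(raw_ratio: float, baseline_ratio: float) -> float:
    """Divide the measured AMG TOF ratio by the pre-paralytic baseline ratio."""
    if baseline_ratio <= 0:
        raise ValueError("baseline ratio must be positive")
    return raw_ratio / baseline_ratio


def amg_recovered(raw_ratio: float, baseline_ratio: Optional[float] = None) -> bool:
    """Apply the recovery targets described in the text.

    With a baseline: normalized ratio >= 0.9.
    Without a baseline: raw ratio >= 1.0.
    """
    if baseline_ratio is None:
        return raw_ratio >= 1.0
    return normalized_tof_ratio(raw_ratio, baseline_ratio) >= 0.9


# Hypothetical readings: a raw 0.95 looks adequate but normalizes to about 0.83.
raw, baseline = 0.95, 1.15
print(round(normalized_tof_ratio(raw, baseline), 2), amg_recovered(raw, baseline))
```

In this hypothetical case an uncorrected reading of 0.95 would appear to exceed 0.9, yet it fails both the normalized target and the uncorrected ≥ 1.0 target.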

KMG devices measure the electrical signal generated by the bending of a piezoelectric sensor strip contained in a specially molded and reusable polymer device which conforms to the contour of the outstretched index finger and thumb. This technology is only available as part of one manufacturer’s monitoring suite. Motamed et al. have shown that KMG agrees fairly well with MMG for monitoring TOF ratio [24]. Other studies have demonstrated strong correlations with both AMG and EMG measurements [25]. However, Hemmerling and Donati point out that only at TOF ratios ≥ 0.7 can the KMG be considered equivalent to other quantitative devices and raised a number of other concerns regarding the technology [26]. Although KMG is very easy to use, it is only available for measuring the response from the ulnar nerve, may not fit all sizes of hands, and requires free motion of the thumb and good strip placement between the fingers [27].

EMG-based devices measure the muscle membrane depolarization following nerve stimulation and are comparable to MMG [28, 29]. Because they monitor membrane depolarization rather than muscle strength or contractile force, they are not limited to only using a free-moving muscle [30]. The newest units function well in a “push and play” mode (e.g., place the electrodes, push start; the devices do almost everything else needed to implement and continue monitoring). Older, manufacturer-specific modules can use disposable electrocardiogram electrodes, but the newer devices require use of single-use electrode sets that increase their cost of use. Although these devices provide excellent data in most cases, electrocautery can sometimes transiently disrupt the response. However, other forms of electrical interference are extremely rare. In addition, excessive adipose tissue or callus over the muscle being monitored can interfere with the ability to detect the small EMG signal in some patients. On the other hand, Bowdle et al. have demonstrated that there is no concern of a “reverse fade” effect as seen with AMG monitors [31], and there is no baseline calibration of any type needed before use, further simplifying use.

The Incidence and Consequences of Residual Paralysis

Much of the discussion above has focused on the use of monitoring to ensure adequate reversal at the end of surgery. In the pre-sugammadex era, the incidence of residual paralysis (either at case-end or on PACU arrival) ranged from 30% to as high as 60%, largely regardless of the neuromuscular blocking agents used [32]. These studies were typically done in an environment without any intraoperative monitoring (or even PNS use). Most of these studies were performed with AMG (nearly always uncalibrated), but similar results have been obtained with EMG. While it is widely believed that the introduction of sugammadex eliminates this problem, there are now several studies demonstrating that residual paralysis after “blind” sugammadex administration (without the use of monitoring) can be as high as 45%. Several more recent studies have shown that the incidence of residual paralysis (with either neostigmine or sugammadex) can be reduced (perhaps to zero?) with the introduction of either intraoperative quantitative monitoring or rigorous protocol-driven PNS use [12, 33, 34, 35, 36]. However, no data exist to allow any conclusions regarding the specific value of any of the three aforementioned technologies vis-à-vis the incidence of residual paralysis.

There is also a substantial literature documenting the adverse consequences of residual paralysis, particularly in older patients with multiple comorbidities or undergoing extensive open body-cavity surgery (laparotomy, thoracotomy) [37]. A few studies have quantified the improvement in morbidity related to reducing the incidence of residual neuromuscular blockade [38, 39, 40, 41], and one paper suggested a substantial positive economic impact [42]. In the latter paper, the authors followed 100 patients who were managed according to the standard qualitative assessment practices of their facility but who were also tested with an EMG-based monitor just prior to extubation. The incidence of residual neuromuscular blockade (TOF ratio < 0.9) was 60%. They then used National Surgical Quality Improvement Program (NSQIP) data from their institution to determine the postoperative complication rates for those who did and did not have residual blockade. They estimated a 66% reduction in postoperative complications with an associated cost reduction of $4.6 million annually by using quantitative monitoring instead of the existing qualitative methods, inclusive of the cost of acquiring and deploying the monitors [42].

The use of quantitative devices in the operating room has seen significant adoption in the last two decades, not just because of their ready availability and their ability to interface directly with EMR systems for automatic documentation but also because of large studies demonstrating improved outcomes when they are used [33, 34, 35, 42]. These factors all weighed heavily in the 2023 guidelines update recommending quantitative over qualitative monitoring for neuromuscular blockade.