Thyroidectomy is one of the most commonly performed surgical procedures, and because postoperative vocal cord paresis or palsy (VCP) is an important surgically related outcome, knowing a patient’s perioperative vocal cord (VC) status is of great importance.1–5 However, routine laryngeal examination of the VCs after thyroidectomy remains controversial.1–5 One reason is that it is relatively invasive and causes patient discomfort and distress.6

Transcutaneous laryngeal ultrasound (TLUSG) is an excellent noninvasive tool for assessing VC function after thyroidectomy.7–10 Our early experience found that TLUSG failed to assess the VCs in only 5.4 % of post-thyroidectomy patients.10 With laryngoscopic validation, the reported sensitivity, specificity, positive predictive value, and negative predictive value were 93.3, 97.8, 77.8, and 99.4 %, respectively.10 Nevertheless, despite its success, a certain percentage of patients have “unassessable” VCs on TLUSG and still require a formal direct laryngoscopic (DL) examination. Furthermore, even for some patients with “assessable” VCs on TLUSG, the findings might be discordant (i.e., inaccurate) compared with the “gold standard” DL. We postulate that a better understanding of the patient and surgical factors leading to VC unassessability and inaccuracy could increase the validity of TLUSG, improve patient selection, and reduce the overall number of DLs required in the future. However, to our knowledge, only two studies have specifically looked at factors leading to poor VC visualization, and no studies have examined factors leading to TLUSG inaccuracy.7,8 Furthermore, it is unknown whether body habitus, neck anthropometry, and position of the cervical incision affect the assessability and/or accuracy of TLUSG. Given these issues, the present study was designed to assess a comprehensive set of patient factors and to determine which of them were associated with TLUSG unassessability and which were associated with TLUSG inaccuracy.

Patients and Methods

Informed consent was obtained from 602 consecutive patients who agreed to participate in the study. All patients were required to have a preoperative laryngoscopic assessment 1 day before thyroidectomy and a postoperative VC assessment by TLUSG followed by confirmatory DL 7–10 days after thyroidectomy. However, because 14 (2.3 %) patients missed their scheduled postoperative VC assessment and 7 (1.2 %) patients later refused confirmatory DL, only 581 (96.5 %) patients were finally analyzed.

Postoperative VC Assessment by TLUSG and DL

TLUSG and DL examinations were done in separate examination rooms by two independent assessors. To minimize bias, patients were instructed not to speak once they had entered the room. As a result, each assessor was unaware of the patient’s voice quality. Details of this setup have been described previously.10 All TLUSG examinations were performed by one person (KPW) using a portable ultrasound (USG) machine (iLook™ 25 Ultrasound System, Sonosite®, SonoSite Inc., Washington, United States) with a 5–10 MHz linear transducer (L25). During the TLUSG examination, the patient was positioned flat with the neck slightly extended and the arms at the sides. After applying an ample amount of gel over the anterior neck, the linear transducer was placed transversely over the mid portion of the thyroid cartilage and scanned cranio-caudally along the sagittal plane until either the true or false VCs were clearly visualized. To optimize the images, the grey-scale was adjusted until the false VCs became hyperechoic and the true VCs became hypoechoic. Both passive (i.e., quiet spontaneous breathing) and active (phonation with a sustained vowel “aa”) movements of the VCs were assessed. After the TLUSG, the patient was directed to a second examination room where a flexible DL (Olympus BF-P40, Bronchoscope, Olympus®, Tokyo, Japan) was performed by an independent endoscopist. Any reduced or absent movement in ≥1 VC on TLUSG or DL was classified as VCP.

Definitions on Assessability and Accuracy

A TLUSG examination was defined as “assessable” if the true and/or false VCs were clearly visualized by the assessor (KPW) (Fig. 1a), whereas it was “unassessable” if ≥1 VC could not be clearly visualized. It was defined as “accurate” if the findings of TLUSG and the subsequent confirmatory DL were concordant. Examples included normal VC movement on both TLUSG and DL, or a VCP identified by TLUSG on the same side as on DL. Conversely, TLUSG was defined as “inaccurate” if the findings of TLUSG and DL were discordant. Examples included VCP on TLUSG but normal VCs on DL, or a VCP identified by TLUSG on the side opposite to that found on DL.
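The definitions above amount to a simple decision rule, which can be sketched as follows. This is only an illustration: the function name `classify_tlusg` and the encoding of each finding as "none", "left", "right", or "bilateral" are assumptions made for the sketch and are not part of the study protocol. In particular, treating any mismatch in the recorded side as discordant is one possible reading of the definitions.

```python
from typing import Optional

# Hypothetical encoding (not from the study): each examination records the
# side of any VCP as "none", "left", "right", or "bilateral". A TLUSG value
# of None means that >=1 vocal cord could not be clearly visualized.
VALID_FINDINGS = {"none", "left", "right", "bilateral"}


def classify_tlusg(tlusg: Optional[str], dl: str) -> str:
    """Label one TLUSG examination against the confirmatory DL finding."""
    if dl not in VALID_FINDINGS:
        raise ValueError(f"unexpected DL finding: {dl!r}")
    if tlusg is None:
        return "unassessable"
    if tlusg not in VALID_FINDINGS:
        raise ValueError(f"unexpected TLUSG finding: {tlusg!r}")
    # Concordant findings (same side, or normal on both) count as accurate;
    # any discordance counts as inaccurate.
    return "accurate" if tlusg == dl else "inaccurate"


examples = [(None, "none"), ("none", "none"), ("left", "left"),
            ("left", "none"), ("left", "right")]
for tlusg, dl in examples:
    print(tlusg, dl, "->", classify_tlusg(tlusg, dl))
```

Under this rule, accuracy is only defined for assessable examinations, which is why the two outcomes are analyzed separately.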

Fig. 1 a Sonographic view of normal symmetrical “assessable” vocal cords using transcutaneous laryngeal ultrasound. FC false cords. b Diagram showing the planned collar incision and its relationship with the hyoid bone, thyroid cartilage, and sternal notch

Parameters for Body Habitus, Neck Anthropometry, and Position of Cervical Incision

All demographic data were collected prospectively in a computerized database. Body weight (in kg) and height (in cm) were measured 1 day before the operation. The hyoid bone, cricoid cartilage, planned neck incision, and sternal notch were marked while the patient lay flat with the neck slightly extended (the same position as for the TLUSG examination). A caliper was used to measure the distances between these landmarks to the nearest millimeter along the sagittal plane in the midline. The neck incision was usually made one to two finger breadths above the sternal notch along the most visible skin crease. Figure 1b shows these anatomical landmarks.

Statistics

Statistical analysis was performed using the SPSS (version 18.0, SPSS, Inc., Chicago, IL) software package. To evaluate factors associated with VC unassessability and inaccuracy, demographics, body habitus, neck anthropometry, and position of the cervical incision were analyzed by logistic regression. Factors that were statistically significant on univariate analysis were entered into the multivariate analysis. P values <0.05 were considered statistically significant.

Results

Table 1 shows patient demographics, indication, and extent of the operation. This cohort comprised mostly females (80.2 %) and ethnic Chinese (82.5 %). The median age at operation was 52 years. The median height and weight were 158 cm and 59.0 kg, respectively, and the median BMI was 23.3 kg/m². Forty-seven (8.1 %) patients had undergone a previous neck operation. Only 35 (6.0 %) patients underwent video-assisted thyroidectomy. Based on the total number of nerves at risk, the overall rates of temporary and permanent recurrent laryngeal nerve palsy were 39/918 (4.2 %) and 7/918 (0.8 %), respectively.

Table 1 Baseline patient demographics, body habitus, neck geometry, position of incision, indication, and extent of operation

Table 2 shows the correlation between postoperative TLUSG and DL findings. Twenty-nine (5.0 %) patients had unassessable VCs; among these, 2 (6.9 %) had confirmed VCP and 27 (93.1 %) had confirmed normal VCs on DL. Of the 552 patients with “assessable” VCs, 44 (8.0 %) had confirmed VCP on DL, but of these, 4 (9.1 %) were “inaccurate” because TLUSG showed normal VCs (i.e., false negatives). Apart from the fact that three of these four patients were male, no significant differences were found between this group and the other assessable patients (data not shown). On the other hand, among the 552 assessable patients, 508 (92.0 %) had confirmed normal VC movement on DL, but of these, 25 (4.5 %) were incorrectly seen by TLUSG as VCP (i.e., false positives). Among those with inaccurate TLUSGs, the majority (25/29) were therefore mislabeled VCPs. In fact, more than one third (25/65 or 38.5 %) of the VCPs diagnosed by TLUSG were actually normal on DL. The overall sensitivity, specificity, and accuracy of TLUSG in diagnosing postoperative VCP were 90.9, 95.1, and 94.7 %, respectively.
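The diagnostic figures in this paragraph follow from a 2×2 table of TLUSG against DL in the 552 assessable patients. The sketch below rebuilds that table from the counts stated in the text (40 true positives, 4 false negatives, 25 false positives, 483 true negatives, the last obtained as 508 − 25) and recomputes the reported percentages. The function name `diagnostic_metrics` is illustrative only and not part of the study's analysis.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),                # VCP on DL detected by TLUSG
        "specificity": tn / (tn + fp),                # normal on DL, normal on TLUSG
        "accuracy": (tp + tn) / (tp + fn + fp + tn),  # all concordant findings
        "false_positive_share": fp / (tp + fp),       # TLUSG VCPs that were normal on DL
    }


# Counts derived from the text: 44 VCPs on DL (4 missed by TLUSG) and
# 508 normal on DL (25 mislabeled as VCP by TLUSG).
metrics = diagnostic_metrics(tp=40, fn=4, fp=25, tn=483)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f} %")
```

With these counts the sketch gives 90.9, 95.1, and 94.7 % for sensitivity, specificity, and accuracy, and 38.5 % for the share of TLUSG-diagnosed VCPs that were normal on DL, in agreement with the figures above.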

Table 2 Correlation between transcutaneous laryngeal ultrasonographic and direct laryngoscopic findings 7–10 days after thyroidectomy

Table 3 shows the univariate and multivariate analyses of factors leading to “unassessable” VCs during postoperative TLUSG. Patients with “unassessable” VCs were significantly older (odds ratio [OR] = 1.047, 95 % confidence interval [CI] 1.02–1.076, p = 0.001) and more likely to be male (OR = 45.09, 95 % CI 13.36–152.2, p < 0.001). They also were significantly taller (OR = 1.153, 95 % CI 1.100–1.209, p < 0.001) and heavier (OR = 1.071, 95 % CI 1.042–1.101, p < 0.001). However, there was no difference in body mass index. In terms of neck measurements, those with “unassessable” VCs on TLUSG had a significantly longer distance from hyoid to cricoid cartilage (OR = 3.223, 95 % CI 1.826–5.690, p < 0.001) but a shorter distance from cricoid to sternal notch (OR = 0.515, 95 % CI 0.363–0.733, p < 0.001) and a shorter distance from cricoid to incision (OR = 0.659, 95 % CI 0.467–0.848, p = 0.002). Surgical indication, type of operation, presence of VCP, and TLUSG experience were not significant factors. On multivariate analysis, only older age (OR = 1.055, 95 % CI 1.016–1.095, p = 0.005), male sex (OR = 13.657, 95 % CI 2.771–67.315, p = 0.001), taller height (OR = 1.098, 95 % CI 1.008–1.195, p = 0.032), and shorter distance from cricoid cartilage to incision (OR = 0.655, 95 % CI 0.461–0.932, p = 0.019) were independent predictive factors for “unassessable” VCs on postoperative TLUSG. The VC assessability rates for patients aged <30 years, 30–40 years, 41–50 years, 51–60 years, 61–70 years, and >70 years were 34/34 (100 %), 85/86 (98.8 %), 133/139 (95.7 %), 163/170 (95.9 %), 87/96 (90.6 %), and 50/56 (89.3 %), respectively. Using the median height (158 cm) as a cutoff, patients taller than 158 cm were significantly more likely to have “unassessable” VCs than those 158 cm or shorter (26/289 or 9.0 % vs. 3/292 or 1.0 %, p < 0.001).
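The stratified rates quoted above can be recomputed directly from the reported counts. The sketch below does so for the age bands and also derives a crude (unadjusted) odds ratio for the height split. That crude odds ratio is an illustration only: it is not reported in the study and should not be confused with the regression odds ratios in Table 3, which come from logistic regression on the measured values. The helper name `crude_odds_ratio` is hypothetical.

```python
# Assessable / total patients per age band, as reported in the text.
assessable_by_age = {
    "<30": (34, 34),
    "30-40": (85, 86),
    "41-50": (133, 139),
    "51-60": (163, 170),
    "61-70": (87, 96),
    ">70": (50, 56),
}

for band, (assessable, total) in assessable_by_age.items():
    print(f"{band:>6}: {assessable}/{total} = {100 * assessable / total:.1f} %")

# The bands should add up to the 552 assessable of 581 analyzed patients.
total_assessable = sum(a for a, _ in assessable_by_age.values())
total_patients = sum(t for _, t in assessable_by_age.values())
print(f"overall: {total_assessable}/{total_patients}")

# Height split at the median (158 cm): unassessable / total per group.
tall_unassessable, tall_total = 26, 289
short_unassessable, short_total = 3, 292


def crude_odds_ratio(events_a: int, total_a: int, events_b: int, total_b: int) -> float:
    """Unadjusted odds ratio of group A versus group B from a 2x2 table."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


height_or = crude_odds_ratio(tall_unassessable, tall_total,
                             short_unassessable, short_total)
print(f"crude OR, taller vs. shorter group: {height_or:.1f}")
```

The band totals reproduce the 552 assessable patients out of 581, which is a useful consistency check on the tabulated counts.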

Table 3 Univariate and multivariate analyses of factors leading to “unassessable” vocal cord during postoperative TLUSG (n = 581)

Table 4 shows the univariate analysis of factors leading to “incorrectly diagnosed” VCs on postoperative TLUSG. Only older age (OR = 1.028, 95 % CI 1.001–1.056, p = 0.040) was a significant predictive factor of incorrect assessment. The accuracy rates of TLUSG in patients aged <30 years, 30–40 years, 41–50 years, 51–60 years, 61–70 years, and >70 years were 34/34 (100.0 %), 82/85 (96.5 %), 128/133 (96.2 %), 150/163 (92.0 %), 83/87 (95.4 %), and 46/50 (92.0 %), respectively. The inaccuracy rate was significantly higher in those aged ≥50 years than in those aged <50 years (7.0 vs. 3.2 %, p = 0.045). Gender, body habitus, neck anthropometry, position of the neck incision, surgical factors, presence of VC palsy, and TLUSG experience were not significant factors.
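As a rough check on the age comparison, the inaccuracy rates can be pooled from the banded counts given above. The sketch assumes that the first three bands (<30, 30–40, and 41–50 years) approximate the group aged <50 years and that the remaining bands approximate the group aged ≥50 years. The published bands do not split exactly at 50 years, so this grouping is an approximation rather than the study's own calculation.

```python
# Accurate / assessable patients per age band, as reported in the text.
accurate_by_age = {
    "<30": (34, 34),
    "30-40": (82, 85),
    "41-50": (128, 133),
    "51-60": (150, 163),
    "61-70": (83, 87),
    ">70": (46, 50),
}

# Assumed grouping: the published bands do not split exactly at 50 years.
younger_bands = ("<30", "30-40", "41-50")
older_bands = ("51-60", "61-70", ">70")


def pooled_inaccuracy(bands):
    """Return (inaccurate, total) pooled over the given age bands."""
    inaccurate = sum(accurate_by_age[b][1] - accurate_by_age[b][0] for b in bands)
    total = sum(accurate_by_age[b][1] for b in bands)
    return inaccurate, total


young_bad, young_total = pooled_inaccuracy(younger_bands)
old_bad, old_total = pooled_inaccuracy(older_bands)
print(f"younger: {young_bad}/{young_total} = {100 * young_bad / young_total:.1f} %")
print(f"older:   {old_bad}/{old_total} = {100 * old_bad / old_total:.1f} %")
```

Under this grouping the pooled counts are 8/252 and 21/300, which is consistent with the 3.2 and 7.0 % quoted above and with the 29 inaccurate examinations overall.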

Table 4 Univariate analysis of factors leading to “inaccurate” postoperative transcutaneous laryngeal ultrasonography (n = 552)

Discussion

To our knowledge, this is not only the first study to examine whether body habitus, neck anthropometry, and position of the cervical incision affect the assessability and accuracy of TLUSG after thyroidectomy but also the largest series of post-thyroidectomy TLUSG with laryngoscopic validation. For TLUSG to become a real alternative to DL examination, it must possess both excellent assessability and excellent accuracy. However, because the two are fundamentally different, they were evaluated separately in the present study.

In terms of VC assessability, similar to previous studies,7–9 our data showed that TLUSG was able to clearly assess VC function in the majority of post-thyroidectomy patients. Of the 581 consecutive patients, 552 (95 %) had “assessable” VCs. This rate appeared slightly higher than those of previous series, which reported assessability rates of 80–90 %.7–9 However, because assessability depends on a multitude of factors, including the assessor’s experience and patient selection, a direct comparison is probably not valid. Nevertheless, similar to other studies, our data also showed that advanced age and male sex were significant independent factors for poorer VC assessability. There was a progressive decrease in VC assessability as patients became older: the assessability rate gradually dropped from 100 % in patients aged <30 years to 89.3 % in patients aged >70 years. Apart from advanced age, male patients also were significantly less assessable than their female counterparts. In fact, after adjusting for other factors, male patients were more than 13 times more likely to have “unassessable” VCs than female patients. Interestingly, of the 466 female patients assessed, only 3 (0.6 %) had unassessable VCs. Therefore, TLUSG is very applicable in young, female patients. These findings concur with other studies (refs). In addition to age and sex, two new independent factors that had not been previously reported were body height (OR = 1.098, 95 % CI 1.008–1.195, p = 0.032) and distance from collar incision to thyroid cartilage (OR = 0.655, 95 % CI 0.461–0.932, p = 0.019). Our data showed that taller patients were significantly less likely to have assessable VCs than shorter patients (p = 0.032). Using the median height as an arbitrary cutoff, patients taller than 158 cm were significantly more likely to have “unassessable” VCs than those whose height was ≤158 cm (9.0 vs. 1.0 %, p < 0.001). Similarly, using the median distance as a cutoff, patients with a distance <60 mm from wound to thyroid cartilage were significantly more likely to have “unassessable” VCs than those with a distance >60 mm (6.9 vs. 3.9 %, p = 0.050).

Because the quality of the TLUSG images depends solely on the propagation of USG waves from the skin through the body of the thyroid cartilage to the VCs and back, we postulate that all four factors are probably associated with wave propagation in one way or another. Because ossification of the thyroid cartilage occurs with increasing age, we believe this might be one explanation for why older patients tend to have a lower assessability rate.11–13 There might be two explanations for why male patients have a lower assessability rate. First, male patients tend to have a more angulated thyroid cartilage, which reduces the physical contact between the linear probe and the skin and, therefore, reduces wave propagation. Second, male patients might have a thicker thyroid cartilage, which makes propagation of USG waves more difficult. Similarly, we hypothesize that taller individuals might have a thicker thyroid cartilage and, hence, less wave propagation. However, to better understand the relationship between age, sex, height, and assessability, further anatomical studies are needed. The distance between the incision and the thyroid cartilage also might be important in VC assessability during TLUSG, because those with a shorter distance (i.e., a higher collar incision relative to the thyroid cartilage) might suffer more postoperative swelling around the thyroid cartilage14 and, thus, less USG wave propagation.

There were some important negative findings worth noting. Neither obesity nor a short neck (i.e., a short distance from hyoid to sternal notch) was a risk factor for VC unassessability. Although the assessor’s experience is presumably important for both assessability and accuracy, our data did not show any significant difference in either assessability or accuracy between the first 50 cases and the subsequent cases. However, further studies on the learning curve for postoperative TLUSG are needed.

In terms of assessment accuracy, older age was the only significant factor. The inaccuracy rate was significantly higher in those aged ≥50 years than in those aged <50 years (7.0 vs. 3.2 %, p = 0.045). Given that the majority of the inaccurate examinations (25/29 or 86.2 %) were VCPs falsely diagnosed by TLUSG, patients with VCP on TLUSG (particularly those aged ≥50 years) would probably benefit from formal laryngoscopic verification.

Our study had several shortcomings. First, although independent factors for VC assessability and assessment accuracy were identified, the actual reasons for these findings remain unexplained. At this stage, we can only hypothesize that some of these factors might have impaired USG propagation, leading to poor assessability and accuracy. We suspect that ossification and thickness of the cartilage might be only part of the explanation. Another shortcoming was that, because all TLUSG examinations were performed by a single person, the reproducibility of our results remains questionable. Nevertheless, our recent study found that our technique could be reproduced by surgeons outside our institution.15 Because the majority of patients were Asian, a multicenter, international study involving other ethnicities and body builds would be worthwhile.16 Because our cohort was relatively slim (median body mass index of 23.3 kg/m²), it would be interesting to see whether these factors remain significant in a more obese population.

Conclusions

Older age, male sex, taller height, and an incision closer to the thyroid cartilage were independent factors for unassessable VCs on TLUSG, whereas older age was the only significant factor for inaccurate TLUSG assessment in the post-thyroidectomy setting. Because more than one third (38.5 %) of the VCPs diagnosed by TLUSG were in fact normal on DL (i.e., inaccurate), patients labeled as VCP on TLUSG would benefit from laryngoscopic verification. Our findings have important implications for future patient selection and, perhaps, technique modification.