Introduction

Critically ill patients almost inevitably suffer substantial and accelerated skeletal muscle loss, which begins within the first few days of intensive care unit (ICU) stay [1,2,3].

This pathological condition could represent a major cause of delayed weaning from mechanical ventilation, and is a well-known predictor of increased in-hospital mortality and morbidity [4, 5]. Muscle mass at the time of ICU admission and discharge also has a significant impact both on patients’ outcomes and on the degree of functional recovery achieved in the medium and long term in survivors [6, 7]. Finally, low muscle mass is associated with increased disability and a higher risk of discharge to long-term care facilities [8, 9].

The pathogenesis of muscle wasting in the ICU is complex. Many factors are involved, such as undernutrition, increased catabolism due to the stress-related cortisol response and systemic inflammation, acute comorbidities (trauma, burns, etc.), immobilization, and the use of sedation/neuromuscular blockers [2].

A key problem in the critically ill is the lack of adequate tools for routine muscle mass evaluation and monitoring at the bedside [10]. In fact, the deranged metabolic milieu, as well as fluid overload and the acute phase response, may significantly interfere with the use of conventional methods for muscle mass evaluation, such as anthropometry and bioimpedance analysis [10]. This is even more true in the case of Acute Kidney Injury (AKI) [10], a frequent complication in this clinical setting, especially when sepsis coexists or develops [11, 12]. Even though Dual Energy X-ray Absorptiometry (DEXA), Computed Tomography scan (CT) and Magnetic Resonance Imaging (MRI) are considered the reference standard techniques for the assessment of skeletal muscle mass and body composition, they cannot be used routinely for this purpose in the ICU [13].

The use of ultrasound (US) for the assessment of muscle mass has aroused considerable interest in recent years. Muscle US is a noninvasive technique that is easily applicable at the bedside even in non-collaborative patients; it is economically advantageous, viable and safe, and does not require specialized staff or X-ray exposure [14,15,16]. Its reliability has recently been well documented in critically ill patients with AKI [15]. In addition, the US technique seems to be little influenced by the rapid and substantial fluid shifts typical of patients with AKI on Renal Replacement Therapies (RRT). In fact, no differences were found in these patients between measurements performed before and after RRT sessions [15], and this feature has also been confirmed in patients with end-stage renal disease on conventional hemodialysis [16]. However, to the best of our knowledge, a formal validation study of US assessment of skeletal muscle against a gold standard technique in the setting of AKI has never been performed. Pending the results of such a study, US measurement of muscle mass may not be safely used in clinical practice in AKI, and numeric values of US measurements may not be compared across different studies.

With this background, we aimed to validate US for the assessment of quadriceps femoris thickness in critically ill patients with AKI, using CT scan as the reference method. For this purpose, we applied a novel analytic approach that allows a detailed assessment, at each muscle site, of the amount of differential and proportional bias between US and CT measurements, as well as of the precision of US measurements in comparison to CT.

Methods

We conducted a cross-sectional observational study in the Renal ICU of the Parma University Hospital. Procedures were performed in accordance with the Declaration of Helsinki. Informed consent was obtained from patients or their next of kin. The study was approved by the Local Ethics Committee Area Vasta Emilia Nord (AVEN). Adult patients with a diagnosis of AKI based on the KDIGO criteria [17] who were consecutively admitted from March 15, 2017 to March 15, 2018, and in whom a CT scan was performed for any medical reason, were eligible. We used the STARD checklist when writing our report [18].

US technique

Quadriceps rectus femoris thickness (QRFT) and quadriceps vastus intermedius thickness (QVIT) were measured by B-mode ultrasonography using a wall tracking ultrasound system (Philips hd7xe) with a 7.5 MHz linear array transducer (L12-3 transducer), as previously described in detail [15]. Measurements were taken on both the right and left quadriceps, with the patient lying in a supine position with both knees extended but relaxed and toes pointing to the ceiling. A metric tape was used to identify and mark the two reference points in each leg. QRFT and QVIT were measured at the border between the upper third (RF,Prox; VI,Prox) and lower two-thirds (RF,Dist; VI,Dist) between the anterior superior iliac spine (ASIS) and the upper pole of the patella [15, 19]. The transducer was placed perpendicular to the long axis of the thigh with a large amount of gel and with no pressure, to avoid compression of the muscle. The assessor was positioned at the side of the patient while performing the measurements, and was allowed to tilt the probe to obtain the best possible image, in which RF and VI would be aligned and centered. Measurements were performed directly on the ultrasound machine while obtaining the images. The vertical diameter of the muscles was measured at the widest point, on the inner edge of the muscle fascia. All thickness measurements are expressed in centimeters (Online resource 1). Ultrasound measurements were performed immediately before or no later than 12 h after the CT scan (the median interval between the CT scan and the US measurement was 3 h, with US performed after CT). The US assessor was blinded to the CT scan results.

CT scan technique

CT scans were performed using a Somatom Definition Flash CT scanner. Patients who needed a CT scan for any medical reason were eligible. When the physician decided that a patient needed a CT scan, the responsible researcher contacted the reference radiologist for the study protocol, who was responsible for arranging the radiological measurements. At the time of the CT scan, the exam was extended to obtain a single slice for each reference point in the legs (two images per patient). There was no limitation regarding the type of CT needed by the patient (abdominal, lung, lung + abdominal or lower limbs). Scans were taken at exactly the same sites used for the US. Sites used for CT scans and US measurements were marked with a temporary plastic electrode. Rectus femoris and vastus intermedius thicknesses were calculated using the Siemens Magic View VE 40 software, after manual outlining with a movable cursor. The radiologist performing the QRFT and QVIT assessments was blinded to the US measurements.

Demographics, clinical data, renal function and outcome

Data were collected as per institutional routine at the time of ICU admission and during ICU stay, with special regard to demographic, clinical and laboratory data, renal function, acute and chronic comorbidities, severity of illness (APACHE II and SOFA scores), and data on renal replacement therapy.

Statistical analysis

Validation studies require at least 100 measurements [20]. In our study, we planned to enroll 30 patients with 8 measurements per patient (four in each leg). Stata SE release 15 (2017, StataCorp, College Station, TX, USA) was used for all analyses, which we carried out in three steps. First, we fitted a bivariate mixed model to the joint CT and US data using the Stata command gsem, with patients included as random effects, in order to estimate the differences in muscle thickness between muscle types (VI vs RF), positions (distal vs proximal) and sides (left vs right). Since we did not find any difference between the left and right sides, in the subsequent analyses we regarded the two sides as duplicate measurements of a constant value. The lack of difference between the right and left sides was an expected finding, since none of the patients had a history of lower limb surgery and none of them were athletes. Second, we used the approach of Taffé [21] to consistently quantify, for each muscle type and position, the amount of differential and proportional bias between US and CT (displayed as “bias plots”) and to compare precision between the two methods (displayed as “precision plots”). These analyses, which were carried out with the program biasplot [22], allowed for heteroscedastic measurement errors (i.e. measurement error changing with the level of the true, latent, value of muscle thickness). Since the differential and proportional biases between US and CT were not statistically significant for any muscle type or position, in the final step we pooled all the data and drew a Bland–Altman plot with 95% limits of agreement. We calculated those limits assuming that the observed differences between US and CT resulted from the sum of the overall mean difference (bias), a random subject effect (heterogeneity) and random error within the subject [23]. For this purpose, we calculated the paired difference between US and CT and fitted a mixed model with muscle type and position as fixed effects.
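The way the limits of agreement follow from the three components named above can be sketched in a few lines of Python. This is only an illustration and not the Stata code used in the study: the function name, the normal quantile of 1.96 and the input values are assumptions chosen to show the arithmetic, and the two variance components are taken as already estimated by a mixed model.

```python
import math


def limits_of_agreement(mean_diff, var_between, var_within, z=1.96):
    """Return (lower, upper) 95% limits of agreement for US minus CT, in cm.

    mean_diff   : overall mean difference (bias), in cm
    var_between : variance of the random subject effect, in cm^2
    var_within  : variance of the within-subject random error, in cm^2
    """
    # The two random components are assumed independent, so their variances add.
    sd_total = math.sqrt(var_between + var_within)
    return mean_diff - z * sd_total, mean_diff + z * sd_total


# Hypothetical inputs, not estimates from the study.
lower, upper = limits_of_agreement(mean_diff=0.0, var_between=0.01, var_within=0.03)
print(f"95% limits of agreement: {lower:.3f} to {upper:.3f} cm")
```

Summing the two variances before taking the square root is what distinguishes this calculation from a Bland–Altman plot on independent pairs, where a single standard deviation of the differences would be used.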

Results

Thirty-four patients were eligible for the study. Four patients were excluded because CT scan images were not available due to technical or clinical problems. We therefore enrolled 30 critically ill patients (17 males) with AKI, and we obtained 233 paired measurements (1 patient had all 4 proximal measurements excluded because the CT image was obtained at the wrong site, 1 patient was morbidly obese and the proximal VI muscle was not visible in either leg, and in 1 patient the image of the proximal VI muscle of the right leg had an artifact that did not allow measurement). Patients were studied within 5 days (range 1–19) of the diagnosis of AKI. The mean age of the cohort was 70 (± 13.6) years. Clinical and demographic data are shown in Table 1. The average APACHE II score was 21 (± 6); the median SOFA score was 7 (2–16). The majority of patients (17/30, 57%) had chronic kidney disease (CKD) prior to ICU admission (AKI on CKD). Sixty percent of patients (18/30) underwent renal replacement therapy (RRT) within the first 24 h after ICU admission. Sixty-seven percent of all patients were oliguric and 27% were septic.
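As a quick consistency check on the counts above, the number of analyzed pairs can be reconstructed from the planned design of 8 measurements in each of 30 patients. The short Python sketch below does only that bookkeeping; the variable names and dictionary labels are illustrative, and the split of each exclusion into a number of pairs is our reading of the text.

```python
PATIENTS = 30
SITES_PER_PATIENT = 8  # RF and VI, proximal and distal, in both legs

planned = PATIENTS * SITES_PER_PATIENT

# Pairs lost, as described in the paragraph above.
excluded = {
    "CT slice at the wrong proximal site (1 patient)": 4,
    "proximal VI not visible in either leg (1 patient)": 2,
    "artifact on the right proximal VI (1 patient)": 1,
}

analyzed = planned - sum(excluded.values())
print(f"planned={planned}, excluded={sum(excluded.values())}, analyzed={analyzed}")
```

The result matches the 233 pairs reported, which remains well above the minimum of 100 measurements cited for validation studies.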

Table 1 Demographic and clinical variables

Table 2 reports the average muscle thickness at each measurement site. Bivariate analysis (Fig. 1) showed that, as expected, both US and CT yielded nearly identical muscle thickness values for the left and right sides (differences of − 0.03 cm [P = 0.32] and − 0.018 cm [P = 0.50] for US and CT, respectively). In contrast, in both US and CT scans, VI differed from RF, and distal differed from proximal measurements, by approximately − 0.3 cm (P < 0.001 for the comparisons of VI vs RF and of distal vs proximal, in both US and CT scans; Fig. 1). The estimated SD of measurement error was approximately 0.2 cm for both techniques, although it was numerically slightly larger for US than for CT (Fig. 1). The overall standard deviation of between-individual differences was approximately 0.35 cm (Fig. 1).

Table 2 Pairwise comparison between ultrasound and CT scan measurements
Fig. 1

Schematic representation of the bivariate mixed model fitted to the joint CT and US data to estimate the differences in muscle thickness between muscle types (VI vs RF), positions (distal vs proximal) and sides (left vs right). There was no difference in muscle thickness between the left and right sides for either CT [− 0.018 cm (P = 0.50)] or US [− 0.03 cm (P = 0.32)]. On the other hand, VI muscle thickness was lower than RF thickness [− 0.286 cm for CT (P < 0.001) and − 0.264 cm for US (P < 0.001)] and distal muscle thickness was lower than proximal thickness [− 0.34 cm for CT (P < 0.001) and − 0.313 cm for US (P < 0.001)]. The variance of the measurement error was 0.042 cm² and 0.052 cm² for CT and US, respectively (the standard deviations, obtained by taking the square root of the variance, were 0.20 cm and 0.23 cm, respectively; P = 0.14). The between-subject variance was 0.12 cm² (standard deviation 0.35 cm). The expressions “Gaussian” and “Identity” in the square boxes indicate that the dependent variables CT and US were analyzed as normally distributed variables (i.e. “Gaussian”), and that the regression model was an ordinary linear regression model on its natural scale (i.e. “Identity”), in cm. Rectangles represent the independent variables, whereas circles represent errors. According to the model there are two kinds of error (i.e. two causes of random variation about the population average at each measurement site), namely measurement error (i.e. intra-patient variability), which is represented by the circle containing the letter “ε”, and error due to inter-patient variability, which is represented by the circle containing “id”; the latter is drawn under a gray-shaded stripe to indicate that this error is shared by all measurements taken from the same patient. Arrows represent what causes a given CT or US measurement to take its specific observed value. Numbers along arrows represent coefficients, numbers close to circles represent the variance of the error, and the number in the square box represents the overall mean in the reference category (i.e. proximal right rectus femoris). For instance, for a given patient, the CT distal vastus intermedius measurement is equal to 1.539 cm, minus 0.286 cm (because the site is vastus intermedius instead of rectus femoris), minus 0.342 cm (because the site is distal instead of proximal), plus/minus the measurement error ε1 in cm, plus/minus the extent in cm to which the patient differed from the average value. CT computed tomography scan, US ultrasound scan, VI vastus intermedius muscle, RF rectus femoris muscle, Prox proximal measurement, Dist distal measurement, ε1 variance (i.e. standard deviation squared) of CT measurement error, ε2 variance (i.e. standard deviation squared) of US measurement error, id between-subject variance (i.e. standard deviation squared) in muscle thickness
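The worked example at the end of the caption can be reproduced for all four CT sites with a minimal Python sketch. It uses only the CT coefficients quoted in the caption; the function and constant names are hypothetical, the two random terms (measurement error and between-patient deviation) are set to zero so that the output is the population average, and the negligible side effect is left out.

```python
# CT coefficients quoted in the Fig. 1 caption, in cm.
CT_REFERENCE_CM = 1.539     # mean at the reference site (proximal rectus femoris)
CT_VI_EFFECT_CM = -0.286    # vastus intermedius instead of rectus femoris
CT_DIST_EFFECT_CM = -0.342  # distal instead of proximal


def expected_ct_thickness(muscle: str, position: str) -> float:
    """Population-average CT thickness (cm) at a site, with random terms set to zero."""
    if muscle not in ("RF", "VI") or position not in ("Prox", "Dist"):
        raise ValueError("muscle must be RF or VI; position must be Prox or Dist")
    value = CT_REFERENCE_CM
    if muscle == "VI":
        value += CT_VI_EFFECT_CM
    if position == "Dist":
        value += CT_DIST_EFFECT_CM
    return value


for muscle in ("RF", "VI"):
    for position in ("Prox", "Dist"):
        print(muscle, position, round(expected_ct_thickness(muscle, position), 3))
```

For the distal vastus intermedius this gives 1.539 − 0.286 − 0.342 = 0.911 cm, the fixed part of the example in the caption.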

Figure 2a–d and Table 3 show the bias analysis comparing US and CT. When comparing US to CT, neither the observed differential bias (between + 0.04 and + 0.26 cm, depending on the muscle site) nor the proportional bias (between 82 and 98% of the reference value, depending on the muscle site) was statistically significant. Beyond statistical significance, the point estimates of the differential and proportional bias of US vs CT were remarkably close to the null value (i.e. 0 cm and 100%, respectively), with the possible exception of the RF, Proximal site (Fig. 2a; Table 3).

Fig. 2

Bias plots showing the bias of US (blue dots and fitted line) vs CT (brown dots and fitted line). The left y-axis shows the US and CT measures (in cm), whereas the right y-axis shows the bias (cm). The x-axis reports the true (latent) value of muscle thickness. The dotted red line refers to the bias, which has to be compared to the horizontal red line representing the ideal line of complete absence of bias. The bias changes linearly as a function of the true (latent) value of muscle thickness. The subtitle at the top reports the bias as an absolute difference in cm (differential bias) and as a relative difference in percentage (proportional bias). Numerically, compared to CT, on US scan RF, Prox was on average + 0.26 cm thicker (differential bias), although the percentage difference of US vs CT was 83% (proportional bias), implying that the bias tended to change with larger absolute values of muscle thickness: the larger the muscle thickness, the less positive the bias. CT computed tomography scan, US ultrasound scan, VI vastus intermedius muscle, RF rectus femoris muscle, Prox proximal measurement, Dist distal measurement (color figure online)
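The statement that the bias becomes less positive as thickness increases can be made concrete with a small Python sketch. It assumes the usual linear form in which the US measurement averages a differential term plus a proportional term times the true value, so that the bias at a true thickness x is differential + (proportional − 1) · x. That parametrization, the function names and the example thickness of 1.0 cm are assumptions made for illustration; only the RF, Prox estimates (+ 0.26 cm and 83%) come from the caption.

```python
from typing import Optional


def bias_cm(true_thickness_cm: float, differential_cm: float, proportional: float) -> float:
    """Bias of US vs CT (cm) at a given true thickness, under a linear bias model."""
    return differential_cm + (proportional - 1.0) * true_thickness_cm


def zero_bias_thickness(differential_cm: float, proportional: float) -> Optional[float]:
    """True thickness (cm) at which the bias line crosses zero, or None if it never does."""
    if proportional == 1.0:
        return None  # bias is constant, so there is no crossing point
    return differential_cm / (1.0 - proportional)


# RF, Prox point estimates quoted in the caption.
print(round(bias_cm(1.0, 0.26, 0.83), 3))
print(round(zero_bias_thickness(0.26, 0.83), 2))
```

Under these assumptions the bias shrinks by 0.17 cm for every additional centimeter of true thickness, which is the declining line described in the caption.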

Table 3 Estimated bias comparing US vs CT

Figure 3a–d reports the precision plots, which confirm the finding reported above: US tended to be a slightly less precise technique than CT over the entire range of muscle thickness values.

Fig. 3

Precision plots comparing the precision of US (blue circles) and CT (brown circles). The y-axis represents precision, which is displayed as the standard deviation σ (i.e. the square root of the variance) of the measurement error in cm. The x-axis reports the true (latent) value of muscle thickness. Compared to CT (brown circles), US (blue circles) tended to be slightly less precise (i.e. to have larger values of σ) over the entire range of values of muscle thickness for any measurement (RF, VI, proximal and distal). CT computed tomography scan, US ultrasound scan, VI vastus intermedius muscle, RF rectus femoris muscle, Prox proximal measurement, Dist distal measurement (color figure online)

Since the differential and proportional bias estimates were similar across VI, RF, proximal and distal sites, and the measurement error was similar for US and CT and approximately constant over the entire range of muscle thickness values, we pooled all the data to draw a Bland–Altman plot with 95% limits of agreement (Fig. 4), which were between − 0.34 and + 0.36 cm.

Fig. 4

Bland–Altman plot showing 95% limits of agreement between US and CT with all data pooled together. The y-axis represents the difference between US and CT, the x-axis their average. The solid horizontal black lines represent the 95% limits of agreement, which were − 0.34 and + 0.36 cm, respectively. The dotted red lines represent the (negligible) bias, both as a constant value (horizontal line, analogous to the differential bias shown above) and as a linear function of the mean (line with declining slope, analogous to the proportional bias shown above). CT computed tomography scan, US ultrasound scan, VI vastus intermedius muscle, RF rectus femoris muscle, Prox proximal measurement, Dist distal measurement (color figure online)

Discussion

In the present study, we report for the first time how the US technique compares to CT scan for the measurement of quadriceps muscle thickness in critically ill patients with AKI. Our study provides evidence that, compared to CT scan, US bias is negligible for most of the measurements, and its precision is close to that of CT scan.

Our data are in accordance with other studies validating the US technique for the assessment of quadriceps muscle mass in clinical settings other than the ICU [24, 25]. In one study in patients with coronary artery disease (CAD), the rectus femoris thickness of 20 patients was measured by US and compared to CT scans [24]. A high correlation between measurements, with low bias and narrow limits of agreement, was found. In another study in patients with chronic obstructive pulmonary disease (COPD), rectus femoris cross-sectional area (RFCSA) assessed by US was compared to the whole quadriceps cross-sectional area (QCSA) assessed by CT [25]. A high intra-class correlation coefficient (ICC = 0.88), with non-significant bias, was found. Recently, muscle US measurements have also been validated against CT scan in patients with chronic kidney disease [26]. Similarly, studies comparing muscle mass measurements obtained by MRI, another gold standard technique, and by US found no difference between the two methods, again confirming very high correlation coefficients and agreement in young and elderly healthy subjects [27,28,29]. In a study comparing quadriceps thickness measured by US and DEXA in COPD patients, US was found to have good reproducibility, and to be more sensitive to changes in muscle mass than DEXA [30].

To our knowledge, this is the first study evaluating the validity of quadriceps muscle thickness assessment by US against a standard reference method in a cohort of critically ill patients. Earlier studies in critically ill patients compared US with muscle biopsies [1], or with muscle strength, as assessed using the Medical Research Council sum score (MRC-SS), or with muscle function, as assessed using the physical function in intensive care test score (PFIT-s) and the ICU mobility scale (IMS) [31]. In these studies, muscle US was able to detect muscle loss [1], and to predict muscle strength and function [31].

Muscle US has been shown to be sensitive enough to detect even small changes in muscle mass during the first 10 days of ICU stay [1, 31, 32]. In addition, muscle wasting, as assessed by bedside US, was able to predict adverse outcomes in surgical ICU patients [33]. In the critical care setting, the stratification of patients at risk of muscle wasting is essential to optimize the clinical and therapeutic management aimed at preventing muscle loss, and muscle US could represent a very useful screening tool in this regard [34]. A recent review analyzed 7 studies including a total of 330 patients admitted to the ICU for at least 7 days and suffering from sepsis and multi-organ failure, in which the authors used US to evaluate muscle thickness or cross-sectional area at the level of the arm, forearm and thigh [14]. Muscle thickness at ICU admission was significantly decreased compared to healthy controls. In addition, decreased quadriceps muscle size as measured by US was an independent risk factor for unscheduled readmission or death in another study [35]. Thus, muscle US could in the future represent an important tool for both nutritional screening and prognostic assessment.

One additional strength of our study is that it provides estimates on US methodology that can be used to develop and implement new protocols. Besides quantifying measurement error (standard deviation of 0.2 cm), we demonstrated that in a relatively old, non-athletic population there is, as expected, no difference in quadriceps muscle thickness between the two legs, that the proximal measurements were thicker than the distal measurements, and that the rectus femoris muscle was thicker than the vastus intermedius by about 0.3 cm. However, in the bias analysis, we noticed that the rectus femoris proximal measurement (RF, Prox) had the largest measurement error. Although this was not statistically significant, it is important to note that the error at this landmark tended to be the largest, suggesting a possible source of error in untrained assessors, who might put more pressure on the probe to visualize the whole muscle. To obtain an accurate image and measurement, it is very important to use abundant contact gel between the probe and the thigh, in order to apply as little pressure as possible. Overall, the US took less than 10 min to set up and complete image acquisition and less than 10 min per image to complete measurement analysis.

It is important to address the limitations of our study, as well as the possible limitations of the use of US for the assessment of muscle mass. First of all, due to the limited number of patients enrolled, we could not explore the prognostic value of quadriceps muscle thickness in the specific population of critically ill patients with AKI. Nevertheless, as reported above, a recent study in the ICU setting suggests that reduced muscle mass as assessed by US may predict adverse outcomes [33]. Secondly, the assessment of muscle thickness by US may be operator dependent. In our study, only one experienced assessor was responsible for all the US measurements. However, a reliability study published by our group on patients in the same clinical setting found high intraclass correlation coefficients (ICC) between inexperienced operators who had received formal training and followed a standardized protocol for obtaining US images and measuring muscle thickness [15].

In conclusion, US is a simple, easily applicable, valid, accurate and reliable method for skeletal muscle evaluation in critically ill patients with AKI. In these patients, quadriceps muscle thickness assessed by US is consistent with CT measures, and could have value both in the clinical practice of nutritional support and, potentially, for risk stratification.

Further studies aimed at defining cut-off values for normal muscle mass are needed, in order to allow the early identification of patients with low muscle mass at ICU admission.