Introduction

Muscle and adipose tissue mass are major determinants of outcomes in acute and chronic conditions such as critical illness [1], chronic heart failure [2], and cancer [3, 4]. Muscle and adipose tissue mass are assessed by body composition measurements. Changes in body composition are important factors for prognosis and for therapeutic guidance in terms of nutrition [5], medication dose [3], and physiotherapy [5]. A decrease in body weight of more than 5% within 6 months, or of more than 10% beyond 6 months, is a major indicator of malnutrition [6]. Knowing whether muscle or fat is being lost is equally important [6]. The reference methods for measuring body composition are based on imaging, such as magnetic resonance imaging (MRI) [7], computed tomography (CT) [8, 9], and dual-energy X-ray absorptiometry (DEXA) [8]. Other methods that cannot be used at the bedside are based on plethysmography using the displacement of air or water [10]. Bedside methods are based on ultrasound [11], anthropometry [12], or bioimpedance measurement [8]. Ultrasound determines adipose tissue thickness more precisely than skinfold measurement [13] and can also be used to determine muscle thickness [11, 14]. An ultrasound examination involves three steps: marking of measuring points, performance of the scans, and evaluation of muscle and adipose tissue thickness [11]. Each of these steps affects the reliability of the thickness measurements. Reliability is influenced by patient- and operator-related factors: the patient's position (use of supports and rotation of limbs), the body side (one side might be easier to scan than the other), the marking of measuring points (using the same site for each scan repetition), the scan performance (tilting of the probe, minimal compression with the probe, and a scanning depth sufficient to visualize the bone surface), and the evaluation of the ultrasound scans (correct identification of the muscle fascia and bone surface) [11, 15].
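The weight-loss criterion cited above can be written as a simple rule. The sketch below is a minimal illustration, assuming that "within 6 months" means a period of at most 6 months and that weights are given in kg; the function name is hypothetical, and the rule covers only the weight-loss component cited in [6], not a complete diagnosis of malnutrition.

```python
def weight_loss_indicates_malnutrition(baseline_kg: float, current_kg: float, months: float) -> bool:
    """Return True if the relative weight loss exceeds the threshold cited in the text.

    Assumption (hypothetical boundary): periods of <= 6 months use the 5% threshold,
    longer periods use the 10% threshold.
    """
    if baseline_kg <= 0:
        raise ValueError("baseline weight must be positive")
    loss_percent = (baseline_kg - current_kg) / baseline_kg * 100
    threshold = 5.0 if months <= 6 else 10.0
    return loss_percent > threshold


# 80 kg -> 75 kg is a 6.25% loss.
print(weight_loss_indicates_malnutrition(80, 75, months=4))   # True: > 5% within 6 months
print(weight_loss_indicates_malnutrition(80, 75, months=12))  # False: not > 10% beyond 6 months
```

The same absolute loss can therefore meet the criterion over a short period but not over a long one.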
In previous reliability studies, it was challenging to identify and quantify which of the above-mentioned patient- and operator-related factors were responsible for the differences in thickness values between repetitions [15]. Only a minority of researchers repeated the marking of measuring points, although repeating it is crucial when analysing the intra- or inter-examiner reliability of ultrasound thickness measurements [15,16,17,18,19,20,21,22]. However, the reliability of marking the measuring points has never been assessed on its own. Therefore, the extent to which repeated marking of the measuring points influences the reliability of the thickness measurements remains unclear. Knowing the reliability of the ultrasound method is important for determining its precision, and knowing its precision is required to detect changes in muscle and adipose tissue thickness accurately. On the basis of their reliability results, Fivez et al. stated that they could detect a 20% decrease in muscle thickness in adult ICU patients and a 30% decrease in paediatric ICU patients [23]. However, they did not state whether they repeated the marking of measuring points, which is crucial for accounting for all the operator-related factors stated above.

Therefore, the aim of this study was to determine the intra- and inter-examiner reliability of marking the measuring points for adipose tissue and muscle thickness, and of performing and evaluating the corresponding ultrasound scans.

Materials/subjects and methods

Study design and population

One hundred and twenty non-critically ill patients were recruited for the reliability assessment of the ultrasound method in the USVALID study at the Medical University of Vienna from 2017 to 2018 (clinicaltrials.gov identifier: NCT03160222). Patients were included in the USVALID study if they had undergone CT at the level of the third lumbar vertebra for any clinical reason. The ultrasound examination had to be performed within 48 h of the CT scan. Patients were excluded if they were younger than 18 years. The study was approved by the Ethics Committee of the Medical University of Vienna and conducted according to the Declaration of Helsinki. Reporting was done according to the STROBE guidelines [24].

Ultrasound examination

An illustrated step-by-step guide with all details of the USVALID ultrasound examination has been published [11]. The methodology of the examination is recapitulated in Supplementary Information, Table S1. In brief, the ultrasound examination comprised three steps:

Step 1 – Marking of measuring points: precise anatomical landmarks were palpated to determine the upper arm and thigh lengths. On each side, the anterior and anterolateral measuring points were marked at 70% of the upper arm length, and the ventral, lateral, and medial measuring points were marked at 50% of the thigh length (Fig. 1). In addition, limb circumferences were measured at 70% of the upper arm length and at 50% of the thigh length on each side, and all limb lengths and circumferences were recorded.

Fig. 1: Marking of measuring points.

Three measuring points are marked on each thigh and two on each upper arm. “Reprinted and adapted from Clinical Nutrition Experimental, 32:38–73, Fischer A, Anwar M, Hertwig A, Hahn R, Pesta M, Timmermann I, Siebenrock T, Liebau K, Hiesmayr M, Ultrasound method of the USVALID study to measure subcutaneous adipose tissue and muscle thickness on the thigh and upper arm: an illustrated step-by-step guide, Copyright (2020), with permission from Elsevier” [11].

Step 2 – Ultrasound scanning: two scans were performed at each measuring point, one in the short-axis plane and one in the long-axis plane. Scanning in both planes helped identify the muscle fascia, especially when oedema was present [11, 14, 20, 25,26,27]. Minimal compression was applied by using a gel pad with additional gel on top of it (Supplementary Information, Fig. S1). To ensure minimal compression, it was also verified that the borders of each scan were blurred (Supplementary Information, Fig. S1). A Siemens Acuson Freestyle ultrasound scanner was used.

Step 3 – Evaluation of muscle and adipose tissue thickness: muscle and adipose tissue thickness were measured on each scan by using the built-in callipers of the ultrasound machine. Scan quality was graded according to the visibility of the muscle fascia and bone surface. In quality 1 scans, the muscle fascia and bone surface were clearly visible. In quality 2 scans, the muscle fascia and bone surface could be discerned. In quality 3 scans, the muscle fascia and bone surface were indistinguishable and no evaluation was possible [11]. After the quality assessment, adipose tissue and muscle thickness were measured at the exact centre of the scan along the shortest possible line from the skin to the bone surface. Adipose tissue thickness was measured including the skin and muscle fascia. Muscle thickness was measured for the entire muscle excluding the muscle fascia, because it is easier to delineate the muscle fascia from muscle tissue than from subcutaneous adipose tissue (Supplementary Information, Fig. S1) [11].
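Step 1 reduces to simple arithmetic once the limb lengths are known. The sketch below is a minimal illustration, assuming (hypothetically) that the 70% and 50% positions are measured from the proximal landmark along the limb; the text states only the fractions, and the landmarks themselves are defined in the published guide [11]. The class and function names are illustrative.

```python
from dataclasses import dataclass

# Fractions of limb length stated in Step 1.
UPPER_ARM_FRACTION = 0.70
THIGH_FRACTION = 0.50

# Measuring points per body side, as listed in Step 1.
UPPER_ARM_POINTS = ("anterior", "anterolateral")
THIGH_POINTS = ("ventral", "lateral", "medial")


@dataclass
class MarkingPosition:
    limb: str
    points: tuple
    distance_cm: float  # hypothetical convention: distance from the proximal landmark


def marking_positions(upper_arm_length_cm: float, thigh_length_cm: float) -> list:
    """Return where the measuring points lie on one body side."""
    return [
        MarkingPosition("upper arm", UPPER_ARM_POINTS, UPPER_ARM_FRACTION * upper_arm_length_cm),
        MarkingPosition("thigh", THIGH_POINTS, THIGH_FRACTION * thigh_length_cm),
    ]


for pos in marking_positions(upper_arm_length_cm=32.0, thigh_length_cm=44.0):
    print(f"{pos.limb}: {', '.join(pos.points)} at {pos.distance_cm:.1f} cm")
```

Because every point is placed at a fixed fraction of a measured length, any error in the length measurement carries over directly into the position of the mark.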

Assessment of intra- and inter-examiner reliability

There were five examiners. Before the study, one experienced examiner trained four novice examiners, and each examiner performed the ultrasound examination 20 times in healthy subjects. The first and second runs of the ultrasound examination together required 2 h. To reduce the burden of the examination for the patients, intra- and inter-examiner reliability were assessed in two separate groups of 60 patients. For intra-examiner reliability, both runs were performed by the same examiner; for inter-examiner reliability, the second run was performed by a different examiner. The first run was performed on both body sides. After the first run, all markings of the measuring points were erased with a disinfectant. The second run was then performed by the same or a different examiner on only one randomly chosen body side, again to reduce the burden for the patient. During the second run, the examiner had to relocate the measuring points by measuring the limb lengths again. The examiner of the second run recorded the limb lengths and the muscle and adipose tissue thickness values on a separate case report form so as not to be influenced by the values of the first run.

Statistical analysis

First, intra- and inter-examiner reliability for marking the measuring points was assessed by comparing limb length measurements from the first and second runs. Second, intra- and inter-examiner reliability for the performance and evaluation of scans was assessed by comparing adipose and muscle thickness values from the first and second runs. For all reliability analyses, measures of correlation (correlation coefficient, CC) and magnitude of the error (Bland–Altman plots) were reported [28, 29].

For each measuring point, a linear mixed model was fitted to the muscle thickness values from the first and second runs, with random effects for the examiner (n = 5) and the patient (n = 60) and a nested effect for the body side (right or left) within the patient. The intra- and inter-examiner CCs were defined as the proportion of the total variance explained by the patient and body side in the linear mixed model. Formally, the inter-examiner CC at a given measuring point was defined as follows:

$$\text{Inter-examiner CC (muscle thickness)} = \frac{V(\text{patient}) + V(\text{body side})}{V(\text{patient}) + V(\text{body side}) + V(\text{examiner}) + V(\text{residual})} \tag{1}$$

where V(patient) = estimated variance of muscle thickness values between patients (n = 60 patients)

V(body side) = estimated variance of muscle thickness values between the right and left body sides of patients

V(examiner) = estimated variance of muscle thickness values due to the examiner (n = 5 examiners)

V(residual) = residual variance of muscle thickness values

For the assessment of intra-examiner reliability, the same examiner was required to perform both the first and second runs of the ultrasound examination. Thus, the equation for intra-examiner CC only accounts for different patients and body sides and not for different examiners:

$$\text{Intra-examiner CC (muscle thickness)} = \frac{V(\text{patient}) + V(\text{body side})}{V(\text{patient}) + V(\text{body side}) + V(\text{residual})} \tag{2}$$

where the parameters are defined as above.

The closer the CC is to 1, the larger the share of the total variance that is attributable to differences between patients and body sides rather than to the examiner or to residual measurement error. In other words, the closer the CC is to 1, the better the intra- or inter-examiner reliability.
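Equations (1) and (2) are ratios of variance components. The sketch below computes both from already estimated variances; it does not fit the mixed model itself (in this study the components come from the linear mixed model described above, fitted in R), and the numbers in the usage lines are invented for illustration, not study results.

```python
def correlation_coefficient(v_patient: float, v_side: float, v_residual: float,
                            v_examiner: float = 0.0) -> float:
    """Share of the total variance explained by patient and body side.

    With v_examiner > 0 this is the inter-examiner CC (Eq. 1);
    with v_examiner = 0 it reduces to the intra-examiner CC (Eq. 2).
    """
    if any(v < 0 for v in (v_patient, v_side, v_residual, v_examiner)):
        raise ValueError("variance components cannot be negative")
    between = v_patient + v_side
    total = between + v_examiner + v_residual
    if total == 0:
        raise ValueError("total variance is zero")
    return between / total


# Invented variance components in cm^2.
print(round(correlation_coefficient(0.40, 0.02, 0.03), 3))                   # 0.933 (intra-examiner)
print(round(correlation_coefficient(0.40, 0.02, 0.03, v_examiner=0.05), 3))  # 0.84 (inter-examiner)
```

Adding examiner variance to the denominator can only lower the coefficient, which is why an inter-examiner CC is expected to be at most as high as the corresponding intra-examiner CC for the same variance components.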

Intra- and inter-examiner CCs for adipose tissue thickness at each measuring point, as well as for limb length and limb circumference, were calculated analogously.

In addition, Bland–Altman plots were used to illustrate the differences in limb length and in muscle and adipose tissue thickness between the first and second runs performed by the same or a different examiner. The lower and upper 95% limits of agreement (LOAs) were calculated as the mean difference − 1.96 × SD and the mean difference + 1.96 × SD, respectively, where SD is the standard deviation of the differences. The Bland–Altman plots for intra-examiner reliability display all examiners, and those for inter-examiner reliability display all combinations of examiners. R version 3.6.3 (or higher) was used for statistical analysis.
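The LOA calculation can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the R code used in the study; it assumes differences are taken as second run minus first run (the text does not state the sign convention) and uses invented muscle thickness values.

```python
import statistics
from dataclasses import dataclass


@dataclass
class BlandAltman:
    means: list         # x-axis of the plot: mean of each pair
    differences: list   # y-axis of the plot: second run minus first run (assumed convention)
    bias: float         # mean difference
    lower_loa: float    # bias - 1.96 * SD of the differences
    upper_loa: float    # bias + 1.96 * SD of the differences


def bland_altman(first_run: list, second_run: list) -> BlandAltman:
    if len(first_run) != len(second_run) or len(first_run) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [b - a for a, b in zip(first_run, second_run)]
    means = [(a + b) / 2 for a, b in zip(first_run, second_run)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD (n - 1 in the denominator)
    return BlandAltman(means, differences, bias, bias - 1.96 * sd, bias + 1.96 * sd)


# Invented muscle thickness values (cm) for four paired scans.
result = bland_altman([2.4, 2.9, 1.8, 3.1], [2.6, 2.8, 1.9, 3.0])
print(f"bias = {result.bias:.3f} cm, LOA = {result.lower_loa:.3f} to {result.upper_loa:.3f} cm")
```

The `means` and `differences` lists are the coordinates one would plot; the two LOAs are drawn as horizontal lines symmetric around the bias.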

Results

Description of the study population and scans

The CONSORT flow diagram of included patients is presented in Fig. 2, and the patients’ baseline characteristics are presented in Table 1. A total of 3552 of the 3600 planned scans (98.7%) could be performed in 120 patients, and 3118 scans (86.6% of the planned scans) were of the best quality (quality 1) (Supplementary Information, Table S2). The measuring points with the highest proportion of best-quality scans were the anterior measuring point of the upper arm and the ventral measuring point of the thigh, where more than 95% of the scans were quality 1 (Supplementary Information, Fig. S2).
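The planned total of 3600 scans follows from the design described in the Methods, although the text does not spell out the calculation. The sketch below is our reconstruction: five measuring points per side, two scan planes per point, both sides in the first run and one side in the second run. It also shows that the percentages refer to the planned scans (3118/3552 would give 87.8%).

```python
# Design figures taken from the Methods (our reconstruction of the count).
PATIENTS = 120
POINTS_PER_SIDE = 2 + 3   # upper arm (anterior, anterolateral) + thigh (ventral, lateral, medial)
PLANES_PER_POINT = 2      # short-axis and long-axis scan
SIDES_FIRST_RUN = 2       # first run: both body sides
SIDES_SECOND_RUN = 1      # second run: one randomly chosen side

scans_per_patient = POINTS_PER_SIDE * PLANES_PER_POINT * (SIDES_FIRST_RUN + SIDES_SECOND_RUN)
planned_scans = PATIENTS * scans_per_patient

performed, quality_1 = 3552, 3118  # counts reported in the Results
print(planned_scans)                              # 3600
print(round(100 * performed / planned_scans, 1))  # 98.7
print(round(100 * quality_1 / planned_scans, 1))  # 86.6
```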

Fig. 2: CONSORT flow diagram for the intra- and inter-examiner reliability assessment of the USVALID study.

Table 1 Baseline characteristics of the study population (n = 120).

Intra- and inter-examiner reliability for measuring limb lengths, i.e. for marking the measuring points

Intra- and inter-examiner reliability was high to very high for measurements of the upper arm length (intra-examiner CC = 0.96, inter-examiner CC = 0.74) and the thigh length (intra-examiner CC = 0.96, inter-examiner CC = 0.85) (Table 2). For thigh length measurements, the lower and upper 95% LOAs were −1.67 and 1.75 cm for intra-examiner reliability and −4.12 and 4.44 cm for inter-examiner reliability (Table 2 and Supplementary Information, Fig. S3).

Table 2 Intra- and inter-examiner reliability for length and circumference measurement on the thigh and upper arm.

Intra- and inter-examiner reliability for performance and evaluation of scans for adipose tissue and muscle thickness

Intra- and inter-examiner reliability for the performance and evaluation of scans depended on the measuring point. For adipose tissue thickness, all measuring points showed high to very high intra- and inter-examiner correlation coefficients ranging from 0.70 to 0.97 (Table 3). For muscle thickness, the lateral and medial measuring points of the thigh and the anterolateral measuring point of the upper arm showed inter-examiner correlation coefficients below 0.65 (Table 4). In contrast, the ventral measuring point of the thigh and the anterior measuring point of the upper arm showed high to very high inter-examiner correlation coefficients ranging from 0.77 to 0.93 (Table 4). For muscle thickness at the ventral measuring point of the thigh, the lower and upper 95% LOAs were −0.45 and 0.57 cm for intra-examiner reliability and −0.81 and 0.80 cm for inter-examiner reliability (Table 4). Bland–Altman plots for intra- and inter-examiner reliability are presented in the Supplementary Information, Figs. S4–S7.

Table 3 Intra- and inter-examiner reliability for adipose tissue thickness.
Table 4 Intra- and inter-examiner reliability for muscle thickness.

Discussion

The intra- and inter-examiner reliability for marking the measuring points was high to very high. The intra- and inter-examiner reliability for the performance and evaluation of scans depended on the measuring point: for adipose tissue thickness, all measuring points showed high to very high intra- and inter-examiner correlations. For muscle thickness, only the ventral measuring point of the thigh and the anterior measuring point of the upper arm showed high to very high intra- and inter-examiner correlations.

Reliability for marking the measuring points

Measuring the limb length to locate the measuring points is a crucial step of the ultrasound examination. Since the intra- and inter-examiner correlation coefficients for limb length ranged from 0.74 to 0.96, the reliability of limb length measurement was considered high to very high. Nevertheless, the upper 95% LOAs for intra- and inter-examiner differences in limb length were up to 1.7 cm and 4.4 cm, respectively. Thus, limb lengths are not easy to measure, even though the anatomical landmarks for identifying the measuring points were defined very precisely [11] and the examiners were trained before the study. Developing a sufficiently simple method with adequately high reliability is indeed a challenge. Instead of using bony landmarks, some researchers determined the measuring points from fixed surfaces (wall, table, floor, wooden box) against which the person stands, and defined the measured distances relative to the person’s body height [16, 17, 30]. However, distances defined relative to body height may not account for differing proportions between the upper and lower body, and a standing position is not feasible in inpatient and intensive care settings. Perin’s group used transparent films with individual reference skin marks (beauty spots, scars, and veins) to relocate the measuring points [17]. This simple approach helps ensure that the same location is measured repeatedly over time, even by different examiners.
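How far a measuring point moves when the limb length is mismeasured can be estimated with a single multiplication. The sketch below is an illustration, not an analysis reported in the study: it assumes, hypothetically, that the proximal landmark is located exactly and only the total length differs between runs, so the mark shifts by the fraction times the length difference. Applied to the upper thigh-length LOAs from the Results, it gives the order of magnitude of the relocation error at the 50% point.

```python
def mark_shift_cm(length_difference_cm: float, fraction: float) -> float:
    """Displacement of a mark placed at `fraction` of the limb length.

    Hypothetical simplification: the proximal landmark is found exactly,
    so the whole length difference is scaled by the fraction.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return fraction * length_difference_cm


# Upper 95% LOAs for thigh length (Results): 1.75 cm intra-, 4.44 cm inter-examiner.
for label, loa in (("intra", 1.75), ("inter", 4.44)):
    print(f"{label}-examiner: mark at 50% of thigh length shifts by up to {mark_shift_cm(loa, 0.50):.2f} cm")
```

Under this simplification, a mark placed at 70% of the upper arm length would move proportionally more than one at 50% for the same length difference.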

Reliability for performance and evaluation of scans

Performance and evaluation of ultrasound scans are further examination steps that affect reliability. For adipose tissue thickness, all measuring points showed high to very high intra- and inter-examiner correlation coefficients ranging from 0.70 to 0.97. For muscle thickness, only the ventral measuring point of the thigh and the anterior measuring point of the upper arm showed high to very high inter-examiner correlation coefficients, ranging from 0.77 to 0.93. These two measuring points also had the highest proportion of best-quality scans. They were the easiest to scan for two reasons. First, they were easily accessible, and no tilting of the probe was necessary to visualize the muscle fascia and bone [11]. Second, the muscle morphology at these two points is easy to recognize. At the less reliable measuring points, the muscle morphology is more complex and varies among patients. Depending on the patient, either the brachialis muscle alone or both the brachialis and triceps muscles may be visualized at the less reliable anterolateral measuring point of the upper arm [11]. Similarly, the sartorius muscle may or may not be visualized in addition to the quadriceps muscle at the less reliable medial measuring point of the thigh [11]. In summary, marking the measuring points, the first step of the examination, was reliable. Performing and evaluating the scans, the second and third steps, were reliable at the ventral measuring point of the thigh and the anterior measuring point of the upper arm. Even though the anatomical location of the measuring point varied somewhat between runs, the final thickness values at these two measuring points were reliable.

The reliability of thickness measurements is lower when each examiner independently marks the measuring point than when scans are performed on an already marked point [18]. Therefore, we compared our results only with similar reliability studies in which examiners repeated the marking of measuring points before performing a second scan. The 95% LOAs found in our study were similar to those in other studies. English et al. reported a mean muscle thickness of 3.1 cm at the anterior thigh, with lower and upper 95% LOAs of −0.88 and 0.72 cm for intra-examiner reliability in stroke patients [15]; the corresponding LOAs at the ventral measuring point of the thigh in our study were −0.45 and 0.57 cm. Paris measured a mean muscle thickness of 3.5 cm over four measuring points on the thigh, with lower and upper 95% LOAs of −0.41 and 0.31 cm for inter-examiner reliability in 16 healthy participants [19]; the corresponding LOAs at the ventral measuring point of the thigh in our study were −0.81 and 0.80 cm. Müller’s group measured a sum of eight subcutaneous adipose tissue thicknesses of 0.6 to 7 cm, with 95% LOAs of ±0.1 to 0.3 cm for intra-examiner reliability in athletes [22]; we found corresponding LOA values of −0.1 to 0.3 cm at the ventral measuring point of the thigh and the anterior measuring point of the upper arm. Paris and Müller averaged or summed thickness values across different measuring points [16, 19, 22, 31]. A large difference in a single thickness value at one difficult measuring point may be less noticeable after averaging across measuring points. Thus, averaging or summing findings across multiple measuring points may overestimate reliability.
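A toy example illustrates how averaging dilutes a single large difference. The numbers below are invented between-run differences at four measuring points of one hypothetical patient, with the fourth point standing in for a difficult measuring point; they are not data from any of the cited studies.

```python
import statistics

# Invented between-run differences (cm) at four measuring points of one patient;
# the fourth point is a "difficult" one with a large discrepancy.
point_differences = [0.1, -0.1, 0.0, 1.2]

largest_single = max(point_differences, key=abs)
averaged = statistics.mean(point_differences)

print(f"largest single-point difference: {largest_single:.2f} cm")
print(f"difference after averaging the four points: {averaged:.2f} cm")
```

With four points, the discrepancy at the difficult point is reduced to a quarter in the averaged value, so LOAs computed from averaged values would look narrower than those computed per measuring point.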

Müller et al. excluded the skin and muscle fascia when measuring subcutaneous adipose tissue thickness in young athletes [16, 32], which may be a more precise approach. They measured subcutaneous adipose tissue both with and without embedded fibrous structures [16, 32]. Reliability was slightly better when the embedded fibrous structures were included, because excluding them required more boundaries to be determined, which is more complex [16]. In our experience, the boundary between the skin and subcutaneous tissue, and that between the subcutaneous tissue and the muscle fascia, are often hard to visualize (Fig. S1) [11]. This is why the USVALID method includes the skin and muscle fascia in the measurement of adipose tissue thickness. We consider a simple, pragmatic, and reliable method most important, even at the cost of including the skin and muscle fascia in this measurement. Our goal was to establish a method that can be easily employed not only in healthy individuals but also in hospitalized patients.

Precision

Our reliability results account for the factors related to marking the measuring point and to performing and evaluating the scans for adipose tissue and muscle thickness. The 95% LOAs can therefore be regarded as representing the precision of our ultrasound examination. At the ventral measuring point of the thigh, the upper 95% LOAs of 0.57 cm (intra-examiner) and 0.80 cm (inter-examiner) correspond to 23% and 32%, respectively, of the mean muscle thickness of 2.50 cm (0.57/2.50 = 0.23; 0.80/2.50 = 0.32). This means that, when the 95% LOAs are considered, only changes in muscle thickness greater than 23% (same examiner) or 32% (different examiners) can be detected.
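The same arithmetic can be used as a simple detection threshold. The sketch below is our illustration of that reasoning, using the simplification in the text (upper 95% LOA divided by mean thickness) and a hypothetical 25% decrease in muscle thickness; the function names are illustrative.

```python
def relative_precision(upper_loa_cm: float, mean_thickness_cm: float) -> float:
    """Upper 95% LOA expressed as a fraction of the mean thickness."""
    if mean_thickness_cm <= 0:
        raise ValueError("mean thickness must be positive")
    return upper_loa_cm / mean_thickness_cm


def change_exceeds_precision(relative_change: float, precision: float) -> bool:
    """True if an observed relative change is larger than the precision limit."""
    return abs(relative_change) > precision


MEAN_THICKNESS_CM = 2.50  # ventral thigh, as stated in the text
intra = relative_precision(0.57, MEAN_THICKNESS_CM)  # about 0.228
inter = relative_precision(0.80, MEAN_THICKNESS_CM)  # about 0.32
print(f"intra-examiner: {intra:.0%}, inter-examiner: {inter:.0%}")

# Hypothetical 25% decrease in muscle thickness between two examinations.
print(change_exceeds_precision(-0.25, intra))  # True: detectable with the same examiner
print(change_exceeds_precision(-0.25, inter))  # False: not detectable with different examiners
```

The example shows the practical consequence of the two LOAs: a decrease that exceeds the intra-examiner limit may still fall within the inter-examiner limit.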

Limitations and generalisability

The examiners performing the scans could not be blinded to the thickness values during evaluation on the ultrasound machine. Because the thickness values were displayed directly in the middle of the scan on the ultrasound machine (Siemens Acuson Freestyle) [11], they could not technically be covered during evaluation. However, the examiner of the second run recorded the values on a separate case report form to avoid being influenced by the values recorded in the first run. We included a large sample of hospitalized patients from different medical and surgical specialities; therefore, our reliability data are applicable to a broad patient population.

Conclusions

Intra- and inter-examiner reliability for marking the measuring points was high to very high. Reliability for the performance and evaluation of scans for adipose tissue and muscle thickness was best at the ventral measuring point of the thigh and the anterior measuring point of the upper arm. Therefore, we recommend measuring adipose tissue and muscle thickness at the ventral measuring point of the thigh and the anterior measuring point of the upper arm.