Introduction

Research has shown that maintaining a proper balance of body fat and muscle composition is a key factor in good health [1]. Inappropriate amounts of fat can greatly increase the risk of cardiovascular disease, diabetes, and even cancer [2–4]. This problem has grown to such an extent that obesity now contributes significantly to the global burden of disease [5]. Conversely, sarcopenia, in which skeletal muscle mass declines relative to body fat, leads to frailty and functional impairment. Sarcopenia is usually an age-related process and is becoming increasingly prevalent in developed countries with progressively ageing populations [6, 7]. However, loss of lean muscle mass alone does not correlate proportionately with declines in muscle strength and function; the quantity of intermuscular adipose tissue (IMAT) may instead be a better predictor [8].

MRI is an imaging technique capable of estimating the volumes of body components. Its main advantage over other techniques (such as DEXA) is its capacity to accurately depict regional body composition without any ionizing radiation. Recent advances in fat and water discrimination (e.g. the Dixon sequence) using 3D multi-echo gradient recalled echo imaging have further improved the soft-tissue contrast and the accuracy with which fat infiltration within skeletal muscle can be measured by MRI [9].

Accurate segmentation of multiple structures is key to regional body composition studies. Fat and water discrimination in MRI enables more accurate and less variable segmentations [10, 11]. However, while the Dixon technique delivers up to four contrasts in one measurement, i.e. in-phase, opposed-phase, water, and fat images (Fig. 1), it remains an open question how these four contrasts can be used efficiently for segmentation. Most studies have not used the combined image contrast space, but rather used a single fat or water image at a time to segment the fat or muscle component separately. Thresholding of a fat image or fat-fraction image is a popular choice for the segmentation of adipose tissue [10, 12–14]. Some authors have adapted atlas/registration-based segmentation methods by registering a water or fat image alone [15, 16].

Fig. 1

Four contrasts generated by the Dixon technique in one measurement: a in-phase, b opposed-phase, c fat image, and d water image

Some studies have used the combined contrast image space for body component segmentation. Makrogiannis et al. applied K-means clustering to the combined space of fat and water image intensities to separate muscle and IMAT in the thigh [11]. Wang et al. also employed K-means clustering on fat and water images to segment adipose tissue in the abdomen [17]. Joshi et al. used the combined fat and water image space to perform registration-based segmentation [18]. However, since the fat and water images are derived secondarily through subtraction/addition, artifacts may be introduced through mis-registration, motion, etc. Including the raw echo images (in-phase and opposed-phase) should therefore be helpful for segmentation. Kullberg et al. used fuzzy clustering on the combined in-phase, fat, and water images, but only for adipose tissue segmentation in the abdomen [19]. Valentinitsch et al. applied a multi-parametric clustering method to combined in-phase, fat, and water images to segment different structures in the calf and thigh [20]. However, their clustering method was applied several times with different combinations of contrast images to segment one structure at a time, and only on single-slice data.

In this study, we present a novel machine learning based segmentation method that fully uses the combined space of all four contrasts generated by the Dixon technique to automatically segment multiple 3D structures in the thigh. We also show that incorporating all four contrasts yields more accurate segmentations than using non-fat-suppressed (i.e. in-phase) MRI alone or the combined fat-only and water-only images (two contrasts).

Materials and methods

Data

MRI scans of the thighs of 190 healthy community-dwelling older adults (age 50–99 years, average age 67.85 ± 7.90 years; 58 men and 132 women) were acquired and analyzed as part of a larger longitudinal study. From this cohort, 40 subjects were randomly chosen to validate the proposed segmentation method. The research protocol was approved by the National Healthcare Group Domain Specific Review Board. Informed written consent was obtained from all subjects prior to all examinations.

All MRI scans were acquired on a 3T scanner (Siemens Magnetom Trio, Germany). Subjects lay on a 6-channel spine-array coil and were covered by a 6-channel body-matrix external phased-array coil. A rapid survey scan was obtained to identify axial slice locations, using the proximal and distal ends of the femur as landmarks. Next, four contrast images (in-phase, opposed-phase, water-only, and fat-only) were acquired for each subject using a 2D modified Dixon (multi-echo VIBE with T2 correction) T1-weighted gradient echo pulse sequence. The in-phase and opposed-phase images are the source echo images, while the water-only and fat-only images were constructed automatically from them. The acquisition parameters were: repetition time (TR) = 5.27 ms, first echo time (TE1) = 1.23 ms, second echo time (TE2) = 2.45 ms, flip angle (FA) = 9°, field of view (FOV) = 440 × 440 mm2, and matrix = 320 × 320. Both left and right thighs were encompassed in the image, with an in-plane resolution of 0.69 × 0.69 mm2 and 72 slices (slice thickness 5 mm, no gap between slices).

Machine learning based segmentation scheme

The key concept of our proposed approach is to fully use the image intensity information provided by the four contrast images of Dixon MRI. The approach can be summarized in two broad steps: (1) a machine learning 3-class (fat, muscle, and background/bone) classifier is trained on labeled samples and subsequently used for voxel classification on target subjects; the classifier is based on a set of image features extracted from all four contrast images. (2) Morphological operations are performed to smooth the result and generate segmentation masks for subcutaneous adipose tissue (SAT), intermuscular adipose tissue (IMAT), muscle, and bone. The overall scheme of the proposed approach is illustrated in Fig. 2.

Fig. 2

Flow chart of the proposed machine learning based segmentation method. After preprocessing, a training set is segmented to assign labels for fat, muscle, and background/cortical bone. The labeled images and the features extracted from the four contrast images of the training set are passed to the extreme learning machine (ELM) for training, and the trained classifier is then used to predict the unseen target. Finally, morphological operations are applied to generate volumetric images of the bone region, subcutaneous fat (SAT), muscle, and intermuscular fat (IMAT)

Preprocessing: MRI intensity inhomogeneity correction

As the proposed scheme is based on the intensity of MR images, the first step was to correct the intensity inhomogeneity caused by magnetic field bias during scanning. In this study, we used a popular automated bias correction algorithm called N4ITK [21]. The algorithm iteratively estimates the multiplicative bias field from the in-phase image by maximizing the high-frequency content of the tissue intensity distribution. Sample images of the original in-phase image, the estimated bias field, and the bias-corrected in-phase image are shown in Fig. 3 to demonstrate the effect of bias correction.
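
For readers who wish to reproduce this preprocessing step, a minimal sketch using the N4 implementation in SimpleITK is given below. The Otsu-based foreground mask, histogram bin count, and file names are illustrative assumptions, not the exact settings used in this study.

```python
import SimpleITK as sitk

# Minimal N4 bias correction sketch (assumed settings, not this study's exact parameters)
in_phase = sitk.Cast(sitk.ReadImage("in_phase.nii.gz"), sitk.sitkFloat32)

# A rough foreground mask via Otsu thresholding keeps the background from biasing the fit
mask = sitk.OtsuThreshold(in_phase, 0, 1, 200)

corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(in_phase, mask)

sitk.WriteImage(corrected, "in_phase_n4.nii.gz")
```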

Fig. 3

Sample images show the effect of bias correction: a original in-phase image, b estimated bias field, and c bias corrected in-phase image

Segmentation of training data set

To provide classification information for machine learning, a set of training subjects is randomly selected and interactively segmented using an active contour based segmentation method [22] available in “ITK-SNAP” [23]. Each voxel of a training subject is assigned a target category t such that:

$$t = \begin{cases} 0 & \quad \text{Background/cortical bone} \\ 1 & \quad \text{Fat} \\ 2 & \quad \text{Muscle} \end{cases}$$
(1)

Feature extraction

For each voxel in both the training data and the unseen data, a feature vector is generated from its intensity and its neighborhood mean, variance, and entropy in each of the four contrast image domains. These features were calculated in 3D: a cube of 5 × 5 × 5 voxels was used to calculate the feature vector of the voxel at its center. This yields 16 image features per voxel sample (4 features × 4 contrasts).
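
As an illustration, the following sketch computes this kind of per-voxel feature vector with NumPy/SciPy. The 16-bin histogram used for the entropy estimate and the use of `generic_filter` are assumptions made for clarity; the paper does not specify its exact implementation.

```python
import numpy as np
from scipy import ndimage

def local_features(volume, size=5):
    """Per-voxel intensity, neighborhood mean, variance, and entropy over a
    size x size x size cube (5 x 5 x 5 here, as in the paper)."""
    vol = volume.astype(np.float64)
    mean = ndimage.uniform_filter(vol, size=size)
    mean_sq = ndimage.uniform_filter(vol ** 2, size=size)
    var = np.maximum(mean_sq - mean ** 2, 0.0)

    def window_entropy(values):
        # Shannon entropy of the window's intensity histogram (16 bins assumed)
        hist, _ = np.histogram(values, bins=16)
        p = hist / hist.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    ent = ndimage.generic_filter(vol, window_entropy, size=size)
    return np.stack([vol, mean, var, ent], axis=-1)  # 4 features per contrast

# Stacking the four Dixon contrasts gives 4 contrasts x 4 features = 16 features per voxel:
# feats = np.concatenate([local_features(c) for c in (in_phase, opp_phase, fat, water)], axis=-1)
```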

Machine learning: extreme learning machine

Supervised machine learning algorithms learn from labeled data in order to make predictions on unseen data. In this work, we used a state-of-the-art machine learning algorithm called the extreme learning machine (ELM) to train on and classify all voxel samples. Compared to other machine learning algorithms, ELM is simple, fast, and achieves high accuracy [24].
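
For illustration, a minimal ELM classifier can be written in a few lines: the hidden-layer weights are drawn at random and kept fixed, and only the output weights are solved in closed form. The hidden-layer size, sigmoid activation, and pseudo-inverse solution below are common ELM choices and are assumptions, not the exact configuration used in this study.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine sketch: one hidden layer with random, fixed
    weights; only the output weights are fitted, by least squares."""

    def __init__(self, n_hidden=500, random_state=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(random_state)

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # X: (n_samples, n_features); y: integer class labels 0..K-1
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]            # one-hot targets
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T   # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

In this sketch, `fit` would be called with the 16-dimensional feature vectors and labels of the training voxels, and `predict` with the feature vectors of the target subject's voxels.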

Generating segmented volumes of SAT, IMAT, muscle, and bone

After ELM prediction, each voxel of the unseen data is assigned a target category, i.e. 0 = background/cortical bone, 1 = fat, 2 = muscle, and the voxel labels are mapped back to the spatial 3D image domain. However, SAT, IMAT, and bone marrow are all labeled as fat, and skin and muscle are both labeled as muscle, because of their similar image intensities. The different types of fat tissue therefore need to be separated. In this step, segmented volumes of SAT, IMAT, muscle, and bone are generated using morphological operations (a sketch of these operations is given after Step 3 below).

Step 1:

Generating segmented volume for muscle

Although skin and some muscle tissue are both categorized as muscle by the ELM, they are separated by SAT and hence not connected. The segmented volume for muscle (\(S_{\text{muscle}}\)) was extracted by selecting the connected component labeled 2 = muscle with the largest number of voxels (Fig. 4b). A mask (\(M_{\text{muscle}}\)) was generated by morphological closing of \(S_{\text{muscle}}\) and filling all the holes within it (Fig. 4c).

Fig. 4

Illustration of generating segmented volumes of SAT, IMAT, muscle, and bone. a Labeled image after ELM prediction. Images (b–g) are shown by overlaying the binary mask onto the in-phase image for better illustration. b Segmented volume of muscle, c mask \(M_{\text{muscle}}\), d segmented volume of the bone region, e mask \(M^{\prime}_{\text{muscle}}\), f segmented volume of SAT, and g segmented volume of IMAT

Step 2:

Generating segmented volume for bone region

We defined the bone region as the bone marrow and its surrounding cortical bone. The cortical bone was selected as the largest connected component labeled 0 = background/cortical bone within the mask \(M_{\text{muscle}}\). The segmented volume for the bone region, \(S_{\text{bone}}\), was then generated by filling the holes within the cortical bone (Fig. 4d). \(S_{\text{bone}}\) was removed from \(M_{\text{muscle}}\) so that only voxels belonging to muscle and IMAT remain, resulting in the mask \(M^{\prime}_{\text{muscle}}\) (Fig. 4e).

Step 3:

Separating SAT and IMAT

In this last step, among voxels labeled 1 = fat, IMAT was defined as the voxels within \(M^{\prime}_{\text{muscle}}\) (Fig. 4f) and SAT as the voxels outside \(M_{\text{muscle}}\) (Fig. 4g).
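
A minimal sketch of Steps 1–3 with scipy.ndimage is shown below. The number of closing iterations and the slice-wise hole filling (assuming axial slices along the first array axis) are illustrative assumptions; the paper does not report its exact structuring elements.

```python
import numpy as np
from scipy import ndimage

def largest_component(binary):
    """Largest 3D connected component of a binary mask."""
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def fill_holes_2d(mask):
    """Fill holes slice by slice (axial slices assumed along axis 0)."""
    return np.stack([ndimage.binary_fill_holes(sl) for sl in mask], axis=0)

def postprocess(pred):
    """pred: 3D label volume from the ELM (0 = background/cortical bone, 1 = fat, 2 = muscle)."""
    # Step 1: muscle volume and mask M_muscle
    s_muscle = largest_component(pred == 2)
    m_muscle = fill_holes_2d(ndimage.binary_closing(s_muscle, iterations=3))

    # Step 2: bone region (cortical bone + marrow) and mask M'_muscle
    cortical = largest_component((pred == 0) & m_muscle)
    s_bone = fill_holes_2d(cortical)
    m_muscle_prime = m_muscle & ~s_bone

    # Step 3: split fat voxels into IMAT (inside M'_muscle) and SAT (outside M_muscle)
    imat = (pred == 1) & m_muscle_prime
    sat = (pred == 1) & ~m_muscle
    return sat, imat, s_muscle, s_bone
```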

Evaluation

Datasets from 40 randomly selected subjects were used for quantitative validation; their characteristics are listed in Table 1. More training data generally yields more accurate segmentation, but it is also more time consuming to prepare. To show that our proposed method works well with limited training data, the 40 subjects were randomly divided into four groups of 10 subjects each. We applied leave-one-out cross-validation (LOOCV) within each group to evaluate the accuracy and variability of the proposed method: out of the 10 subjects, one subject is selected as the target and the other nine as the training set, and this is repeated until each subject has served as the target exactly once, resulting in 10 automatically segmented volumes of the four structures (SAT, IMAT, muscle, and bone region) per group. The automatically segmented volumes were compared with the ground truth using the Dice similarity coefficient (DSC), which quantifies how well two segmentations A and B overlap [25]. All ground truth segmentations were prepared interactively using “ITK-SNAP” [23] and subsequently verified by an experienced radiologist (Dr. C. H. Tan).
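
For reference, the DSC between a predicted mask and a ground truth mask can be computed as in the following short sketch, a direct implementation of DSC = 2|A ∩ B| / (|A| + |B|).

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks A and B."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom > 0 else 1.0
```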

Table 1 Characteristics of validation subjects

To benchmark the accuracy of the four-contrast scheme, the proposed method was also implemented using the in-phase image only (one contrast) and using the fat and water images only (two contrasts). For these comparisons, only the features supplied to the machine learning algorithm were changed; the other stages of the methodology were left unchanged, and the above experiments were repeated.

Results

All segmentations were performed on Windows XP, on an Intel Xeon processor (dual core, 3.00 GHz) with 9 GB RAM. The proposed method was implemented in MATLAB version 7.1.1 (The MathWorks, Inc., Natick, MA, USA) [26]. The MATLAB implementation was not particularly optimized for computational cost or memory usage. The typical execution time for training on nine datasets was less than 2 s, and segmentation took less than 45 s per unseen dataset.

Typical segmented images produced by the proposed method based on four contrasts are shown in Fig. 5. The proposed method robustly classified all the structures (SAT, IMAT, muscle, and bone region) in the thigh images, even in extreme cases with a very thin SAT layer or severe fat infiltration of muscle (Fig. 6).

Fig. 5

Representative sample segmented images using the proposed method with four contrasts. Structures: yellow, bone; blue, muscle; red, IMAT; green, SAT

Fig. 6

Automated segmented images based on scheme with four contrasts superimposed on unsuppressed images in extreme cases: a subject with severe IMAT infiltration, and b subject with very thin SAT layer

DSC values comparing the volumes segmented by the proposed method with the ground truth are presented in Table 2. The proposed method based on four contrasts achieved good segmentation accuracy, with average DSC values of 0.94 ± 0.03 (bone), 0.96 ± 0.03 (SAT), 0.80 ± 0.03 (IMAT), and 0.97 ± 0.01 (muscle), and outperformed the methods based on the non-fat-suppressed image and on the fat and water images. The Friedman test [27], which detects differences in treatments across multiple test attempts, was carried out using Stata 14 for Windows [28]. The results showed significant differences (p < 0.0001) in segmentation accuracy between the three schemes tested. Student's t tests further showed significant improvement for the proposed scheme based on four contrasts over the scheme based on fat and water images (p = 0.0115 for the bone region, p = 0.0001 for SAT, p < 0.0001 for IMAT, and p = 0.0025 for muscle). There were also significant differences between the scheme using fat and water images and the scheme using the unsuppressed image only (Student's t test: p < 0.0001 for the bone region, SAT, IMAT, and muscle).
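
If one wishes to reproduce this comparison on per-subject DSC values, the tests are available in scipy.stats, as in the hedged sketch below; the input arrays (one DSC value per subject per scheme, in matching subject order) are placeholders, not the study's data.

```python
from scipy import stats

def compare_schemes(dsc_four, dsc_two, dsc_one):
    """Each argument is a sequence of per-subject DSC values for one structure under one
    scheme (four contrasts, fat + water, in-phase only), in the same subject order."""
    # Friedman test: non-parametric test for differences across the three paired schemes
    _, p_friedman = stats.friedmanchisquare(dsc_four, dsc_two, dsc_one)
    # Paired t-tests between specific pairs of schemes
    _, p_four_vs_two = stats.ttest_rel(dsc_four, dsc_two)
    _, p_two_vs_one = stats.ttest_rel(dsc_two, dsc_one)
    return p_friedman, p_four_vs_two, p_two_vs_one
```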

Examples of segmented images produced by the proposed segmentation scheme using different contrast images are shown in Fig. 7. The scheme based on four contrasts correctly segmented all the structures, while the scheme based on fat and water images failed to segment the bone region and underestimated IMAT and SAT. The scheme based on the unsuppressed image only did not segment the muscle and fat content accurately.

Fig. 7

Examples of segmented images based on: b four contrasts, c unsuppressed image, and d fat and water images superimposed on a unsuppressed in-phase image

Table 2 Average DSC values of different segmented structures compared to ground truth

Discussion

Our proposed segmentation method is shown to be accurate in segmenting 3D thigh MR images when compared with ground truth segmentations. The machine learning technique proved easy and efficient at using all four contrasts generated by the Dixon sequence. An advantage of the proposed method is that it segments multiple structures (bone region, SAT, IMAT, and muscle) in one pass, which makes it convenient to study the relationships between different structures.

One concern with machine learning techniques is their variability across different training samples. In this study, we used LOOCV to show that our proposed method is consistently accurate when different sets of training data are used to segment different targets. The standard deviation of the DSC was very low across the four groups (40 validations in total): less than 0.03 for all components. This validation demonstrates the high reproducibility of the proposed method.

A key element of this work is the incorporation of all four contrasts generated by the Dixon sequence into a combined analysis. Makrogiannis et al. [11] showed that combined analysis of fat and water images improves the accuracy of tissue decomposition compared with the use of non-fat-suppressed images only. Results from our experiments corroborate their study. In addition, our results suggest that using the four contrast images further improves the segmentation accuracy of the different structures compared with using the fat and water images, especially for the small IMAT structure. This may be because the additional contrast information compensates for the noise and intensity non-uniformity of any single or dual set of MR images, making the algorithm more robust in classifying the different tissues.

Previous studies have shown that automatic muscle and fat segmentation using clustering methods can be challenging in subjects with a very thin SAT layer or very severe fat infiltration of muscle [20]. Such segmentation errors did not appear in our extreme cases when using the proposed segmentation scheme with four contrasts. This may be because we employed a supervised machine learning method for the segmentation. Machine learning techniques (e.g. artificial neural networks) have been shown to outperform fuzzy clustering methods in non-homogeneous regions such as abnormal brains with edema, tumor, etc. [29]. In cases with very thin SAT or a large IMAT area, the SAT and muscle regions become less homogeneous. As a result, a machine learning technique classifies the different tissues better than a clustering method.

We note that a smaller DSC was obtained for the segmentation of IMAT. This is likely due to the small size and irregular shape of the IMAT structure: voxel effects [30] become very significant in each step of the scheme and in the calculation of the DSC, so any error in the procedure results in a relatively large mismatch.

A limitation of this study is that a separate preliminary segmentation of the training set needs to be carried out for each MRI protocol. Nonetheless, once a small set of training data has been segmented for the machine learning algorithm, the scheme becomes fast and automatic and can be applied to any unseen data, making it suitable for large-scale use in routine clinical practice.

Conclusion

This paper presents an accurate machine learning based segmentation method for thigh MR images acquired with the Dixon technique. The method becomes fully automatic after the initial segmentation of training samples. The proposed method incorporates the combined image space of all four contrasts provided by the Dixon sequence, improving the accuracy of tissue classification.