Introduction

Muscle quality is commonly defined as the ratio between muscle strength or power and muscle mass [1–3]. Muscle mass can change significantly in a variety of conditions such as growth [4, 5], aging [6, 7] or training [8]. It can also decrease in pathological conditions including neuromuscular disorders, cardiovascular diseases and diabetes [1, 9]. In this context, muscle size has to be accurately quantified in order to properly investigate the corresponding functional changes. Moreover, given that these changes can be muscle-specific [10–13], the corresponding quantification should ideally be performed individually for each muscle.

Magnetic resonance imaging (MRI) has been recognized as the method of choice because of its noninvasive nature and capacity to distinguish fat, connective and muscle tissue. Considering the high contrast difference between muscle and fat signals, several automatic methods based on the signal intensity threshold have been successfully developed in order to quantify fat-free muscle area [14, 15]. However, muscle size has been considered as a whole so that individual muscle changes such as those occurring in aging [11, 16] or neuromuscular disorders [12] have not been investigated. The automatic segmentation of individual muscles within a limb or within the same muscle group is technically impossible using the same approach considering the poor contrast between connective and muscle tissues. In this context, the gold standard method is based on the manual delineation of regions of interest on contiguous individual slices, which is tedious and time-consuming and has questionable reproducibility. The corresponding results are largely operator-dependent, and it has been reported that measurement of a specific muscle volume using MRI with manual delineation in the forearm would not detect changes below 7 % in healthy adults [17]. In humans, a few alternative methods allowing a selective segmentation of muscles have been proposed in the last few years. A ‘pseudo muscle volume’ has been quantified on the basis of the manual segmentation of a limited number of slices by interpolating the missing data using geometrical [18] or mathematical models [5, 7]. Although these methods can reduce the processing time, they still involve extensive manual work so that a robust automatic method is highly desirable. However, automatic segmentation of individual muscles from MR images is technically challenging given that individual muscles share the same MR properties and that fasciae display very poor contrast. 
So far, very few methods targeting automatic segmentation of individual muscles have been reported, and to our knowledge none of them have been applied to small animal models despite their increasing use in the investigation of functional changes in skeletal muscle in various healthy and pathological conditions. The fully automatic method based on mesh modeling, which has been proposed recently, still requires integration of manual constraints [19]. Other methods based on the utilization of surface landmarks [20] or principal component analysis [21] have provided promising results, but no information related to the volume error has been reported. Interestingly, tissue segmentation methods based on an atlas database have been successfully used in brain, bee brain, heart and prostate as well as whole body fat/water separated MR images [22–27]. In this context, the specific hurdles related to individual muscle segmentation are mainly related to the high voxel anisotropy, i.e., a different spatial resolution in at least one of the three dimensions.

The purpose of the present work was to develop an original segmentation method based on a multi-atlas database and allowing a complete automatic quantification of individual muscles in rat leg.

Materials and methods

Animals

Sixteen Wistar Han rats (2.5 months old, 8 males/8 females) were investigated according to the guidelines of the National Research Council Guide for the Care and Use of Laboratory Animals and the French law on the Protection of Animals. Animals were initially anesthetized in an induction chamber with 4 % isoflurane (Forene®; Abbott France, Rungis, France) mixed in 33 % O2 (0.4 l/min) and 66 % N2O (0.8 l/min) and then positioned supine in a home-built cradle designed to be accommodated within the MR superconducting magnet to investigate functional changes throughout a muscle exercise session [28].

MRI experiment

MRI of the right leg was performed using a fast spin echo sequence (RARE) at 4.7 T on a Biospec 47/30 superconductor magnet (Bruker, Germany). Eighteen images were recorded in order to cover the entire length of the leg from the knee to ankle using the following parameters: rare factor = 6; repetition time = 2000 ms; echo time = 16 ms; field of view = 30 × 32 mm2, matrix size = 256 × 256 pixels; slice thickness = 1 mm, interslice space = 0.5 mm; slice orientation = axial; 8 accumulations for a total acquisition time of 8 min 32 s. Throughout the procedure rats were continuously anesthetized by gas inhalation of 2 % isoflurane mixed in 33 % O2 and 66 % N2O using a home-designed facemask. Body temperature was maintained at 37 °C using an electric blanket controlled by a temperature unit (Harvard Apparatus, MA, USA) connected to a home-made rectal probe [28].

Image analysis

Each set of images was both manually and automatically segmented.

Manual segmentation

Gastrocnemius and plantaris manual delineation was performed on each slice and for each data set (subject) using the FSL software (FMRIB Software Library, Analysis Group, FMRIB, Oxford, UK). The corresponding results were used as the gold standard measurements and are referred to as the reference segmentation database (atlases). In order to assess the reproducibility of this process, manual delineation was performed twice by the same operator and once by a second operator, and the corresponding global muscle volume was compared between these repetitions. The first version of the first operator manual segmentations was used as the ground truth segmentation of the atlas, resulting in 18 ground truth atlas images.

Automatic segmentation: general pipeline

The present automatic segmentation method has been tested using either one (Fig. 1) or multiple atlases (Fig. 2) for both muscles.

Fig. 1

Schematic representation of a standard processing pipeline used for automatic segmentation of gastrocnemius (green) and plantaris (brown) muscles

Fig. 2

Schematic representation of an automatic muscle segmentation procedure using multiple atlases

As indicated in Fig. 1, the set of images to be segmented (i.e., subject images) was initially corrected for bias inhomogeneities as previously described [29], and the background was removed [30]. Then, the images were successively registered to each atlas (one or multiple) using an affine transformation (12 degrees of freedom) [31] followed by a nonlinear procedure [32]. Each step of this standardized procedure was optimized in order to account for the high voxel anisotropy (see below). When using N atlases (Fig. 2), the initial segmentation process provided N intermediate segmentation maps, which were combined using a vote procedure in order to provide the final segmentation map.

Each step of the segmentation process was optimized by testing different options for the nonlinear registration (more specifically the deformation model used), the image upsampling, the vote procedure and the addition of artificial atlases.

Automatic segmentation: specific implementations

Nonlinear registration

The standard full 3D registration method (3D) previously described by Sdika [32] was modified using either a slice-by-slice nonlinear registration (SBS) or an additional constraint, i.e., the deformation along Z was zeroed (2Dc), both constraining the deformation to the XY plane. These constraints were motivated by the fact that the image resolution was higher in the XY plane than in the Z direction, thereby compromising the interpolation process in the Z direction. Both methods assume that the linear registration is reliable for a proper slice alignment and that nonlinear deformation does not occur along Z. These assumptions seem reasonable in our context given that structures in the limb are mostly tubular and do not usually present any drastic morphological changes along the limb length.

Image upsampling

Alternatively, we addressed the voxel anisotropy issue by upsampling the image along Z before the registration processes. This process aimed at improving the resolution in the Z direction and thus the interpolation process. In this case, images were upsampled by a factor of three using a windowed cardinal sine kernel. This rescaling process has been combined with each of the previously described nonlinear registration models, i.e., with the standard 3D (R-3D), 2D constrained (R-2Dc) and slice-by-slice (R-SBS) models.
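The upsampling step can be illustrated on a single intensity profile along Z. The following is a minimal sketch only: it assumes a Lanczos window as one possible windowed cardinal sine kernel, a hypothetical kernel half-width `a = 3` and edge clamping, none of which are specified in the text. The function names are illustrative.

```python
import math

def lanczos(x, a=3):
    """Lanczos-windowed sinc: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def upsample_z(profile, factor=3, a=3):
    """Resample a 1D profile along Z on a grid `factor` times finer.

    Assumption: samples outside the stack are replaced by the nearest
    edge slice (clamping), and weights are normalized to sum to one.
    """
    n = len(profile)
    out = []
    for j in range((n - 1) * factor + 1):
        z = j / factor                   # position in original slice units
        first = math.floor(z) - a + 1    # first slice inside the kernel support
        num = den = 0.0
        for k in range(first, first + 2 * a):
            w = lanczos(z - k, a)
            kk = min(max(k, 0), n - 1)   # clamp index at the stack edges
            num += w * profile[kk]
            den += w
        out.append(num / den)
    return out
```

With 18 slices and a factor of three, this sketch returns 52 samples, and every third sample coincides with an original slice value.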

Vote procedure

The final segmentation of a given subject was obtained by combining the N segmentation maps produced by each atlas using a vote procedure. The final label was the one with the largest number of votes (Fig. 2). A simple majority vote (mv) and a vote weighted by the registration residual (wv) as previously described [24, 33] have been implemented and evaluated.

The weighting function used was:

$$w\left( x \right) = G_{{\sigma_{\text{s}} }} *e^{{ - \frac{{r^{2} }}{{2\sigma_{\text{r}}^{2} }}}}$$
(1)

where r is the residual of the registration at a given pixel, \(G_{{\sigma_{\text{s}} }}\) is a Gaussian function, * is the convolution operator, and \(\sigma_{\text{s}}\) and \(\sigma_{\text{r}}\) are parameters to tune the weighting.
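The two vote procedures can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the spatial Gaussian smoothing of the weights (the convolution in Eq. 1) is deliberately omitted, label maps are represented as flat lists, and the function names and the default `sigma_r` are hypothetical.

```python
import math
from collections import defaultdict

def residual_weight(r, sigma_r):
    """Weight exp(-r^2 / (2 * sigma_r^2)); spatial smoothing is omitted here."""
    return math.exp(-(r * r) / (2.0 * sigma_r ** 2))

def fuse_labels(label_maps, residual_maps=None, sigma_r=1.0):
    """Fuse N candidate label maps voxel by voxel.

    label_maps: list of N flat lists of labels (one per atlas).
    residual_maps: matching list of registration residuals, or None
    for a simple majority vote (every atlas gets weight 1).
    """
    fused = []
    for i in range(len(label_maps[0])):
        votes = defaultdict(float)
        for a, labels in enumerate(label_maps):
            if residual_maps is None:
                w = 1.0
            else:
                w = residual_weight(residual_maps[a][i], sigma_r)
            votes[labels[i]] += w
        # The label with the largest accumulated vote wins.
        fused.append(max(votes, key=votes.get))
    return fused
```

In this sketch, an atlas that is well registered at a given voxel (small residual) can outvote several poorly registered atlases, which is the intended difference from the majority vote.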

Artificial atlas addition

Given that the quality of a multi-atlas segmentation procedure is an increasing function of the number of atlases used, we tested an original approach of artificial atlas creation by shifting the original atlas slices up to ± one (z1) (i.e., slice one became slice two, …) or ± two (z2) slices just before the nonlinear registration. As a result two (with z1) or four (with z2) artificial atlases were added to the database for each original atlas. This translation should compensate for linear registration errors along Z, especially when the nonlinear registration is 2Dc or SBS.
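The creation of shifted atlases can be sketched as below. The padding of slices that move outside the stack is not described in the text; marking them as missing (`None`) is an assumption of this sketch, as are the function names.

```python
def shift_slices(slices, k):
    """Shift a slice stack by k positions along Z.

    With k = 1, slice one becomes slice two, and so on. Positions with
    no source slice are filled with None (assumed: no data available).
    """
    n = len(slices)
    return [slices[i - k] if 0 <= i - k < n else None for i in range(n)]

def augment_atlas(slices, max_shift):
    """Return the original stack plus its copies shifted by up to ±max_shift."""
    return [shift_slices(slices, k) for k in range(-max_shift, max_shift + 1)]
```

Each original atlas therefore yields 2k + 1 stacks for a maximum shift of k slices: three with z1 and five with z2.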

Automatic segmentation evaluation

The different settings proposed in this study were evaluated using a leave-one-out procedure, i.e., each atlas was sequentially used as the subject image and automatically segmented using each of the other subjects as atlases. A final segmentation was obtained from each possibility, and the results presented are related to the corresponding average. The standard deviations were not included in the figures because they were too small.
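The leave-one-out loop can be outlined as follows. The `segment` and `score` callables are hypothetical placeholders for the automatic segmentation and for the comparison with the manual reference. They are not functions from any specific library.

```python
def leave_one_out(subject_ids, segment, score):
    """Segment each subject using all other subjects as atlases.

    segment(subject, atlases) -> automatic segmentation (placeholder).
    score(subject, segmentation) -> quality index, e.g. RO (placeholder).
    Returns the per-subject scores and their average.
    """
    results = {}
    for subject in subject_ids:
        # The subject being segmented never appears in its own atlas set.
        atlases = [s for s in subject_ids if s != subject]
        results[subject] = score(subject, segment(subject, atlases))
    mean = sum(results.values()) / len(results)
    return results, mean
```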

The segmentation quality for each method was quantified using the relative overlap (RO) defined as:

$${\text{RO}}\left( {L_{\text{g}}^{\text{s}} ,L_{\text{a}}^{\text{s}} } \right) = 100\frac{{v\left( {L_{\text{g}}^{\text{s}} \cap L_{\text{a}}^{\text{s}} } \right)}}{{v\left( {L_{\text{g}}^{\text{s}} \cup L_{\text{a}}^{\text{s}} } \right)}}$$
(2)

where \(L_{\text{g}}^{\text{s}}\) is the mask of the structure s (gastrocnemius or plantaris muscles) in the ground truth (i.e., manual segmentation) and \(L_{\text{a}}^{\text{s}}\) is the mask of the structure s in the automated segmentation. v is the volume (in mm3) of the voxels inside the binary mask, i.e., the number of voxels multiplied by the voxel size (0.12 × 0.12 × 1.5 mm3). As a result, a high RO is indicative of a high spatial overlap of the two given segmentations. We also calculated the RO values between the manual delineations performed by the same operator (intra) and two different operators (inter) in order to evaluate the accuracy of the present automatic methods with respect to the accuracy of the reference method.
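As an illustration, Eq. 2 can be computed from two binary masks as in the sketch below. Masks are represented as flat lists of 0/1 values, and the function names are illustrative. Because both masks share the same voxel size, the voxel volume cancels in the ratio and is only needed for absolute volumes.

```python
VOXEL_VOLUME_MM3 = 0.12 * 0.12 * 1.5  # voxel size stated in the text

def mask_volume(mask, voxel_volume=VOXEL_VOLUME_MM3):
    """Volume (mm3) of a binary mask: number of voxels times voxel volume."""
    return sum(1 for m in mask if m) * voxel_volume

def relative_overlap(mask_g, mask_a):
    """Relative overlap (in %): 100 * |intersection| / |union| of two masks."""
    inter = sum(1 for g, a in zip(mask_g, mask_a) if g and a)
    union = sum(1 for g, a in zip(mask_g, mask_a) if g or a)
    if union == 0:
        raise ValueError("both masks are empty")
    return 100.0 * inter / union
```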

Muscle volume quantification

Gastrocnemius and plantaris muscle individual volumes were quantified from both the manual delineation (reference measurements) and fully automatic segmentation results using the setup resulting in the highest RO values. The corresponding relative error was calculated as the volume difference between the two segmentations and expressed in % of the reference measurement.
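The error calculation can be written as the short sketch below. The signed version distinguishes over- from underestimation, whereas the absolute version discards the sign. The function names are illustrative, and the normalization by the reference volume is taken from the definition given in the Results.

```python
def relative_volume_error(v_auto, v_ref):
    """Signed volume error in % of the reference (manual) volume."""
    return 100.0 * (v_auto - v_ref) / v_ref

def absolute_volume_error(v_auto, v_ref):
    """Unsigned volume error in % of the reference (manual) volume."""
    return abs(relative_volume_error(v_auto, v_ref))
```

Averaging signed errors across subjects lets over- and underestimations cancel, which is why a mean volume can agree with the reference even when individual absolute errors are large.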

Statistical analysis

The intra- and interoperator variability of the manual segmentation was quantified from the global volume measurements using the coefficient of variation (CV) and intraclass correlation coefficient (ICC) for both gastrocnemius and plantaris muscles specifically.
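A per-subject coefficient of variation, summarized as mean ± SD across subjects, could be computed as sketched below. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used. The ICC is not covered by this sketch.

```python
import statistics

def coefficient_of_variation(measurements):
    """CV in %: sample standard deviation divided by the mean."""
    mean = statistics.mean(measurements)
    return 100.0 * statistics.stdev(measurements) / mean

def summarize_cv(repeated_volumes):
    """Mean and SD of the per-subject CVs.

    repeated_volumes: one list of repeated volume measurements per subject.
    """
    cvs = [coefficient_of_variation(m) for m in repeated_volumes]
    return statistics.mean(cvs), statistics.stdev(cvs)
```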

Results

Representative transverse images of the rat leg (Fig. 3a) and the corresponding manual (Fig. 3b) and fully automatic segmentations (Fig. 3c) of the gastrocnemius and plantaris muscles are presented in Fig. 3.

Fig. 3

a Representative images of right lower rat leg recorded at the proximal (top) and distal levels (bottom). b Corresponding manual and c fully automatic segmentation of gastrocnemius (G) and plantaris (P) muscles

Reproducibility of the reference measurements

Manual delineation of both muscles by a trained expert required a minimum of 15 min per subject (i.e., 18 slices) and has been performed twice by the same operator and an additional time by a second operator. The corresponding volumes for each subject are presented in Table 1. The intraoperator coefficient of variation (CV) was 1.1 ± 0.8 % and 5.1 ± 4.2 % for the gastrocnemius and the plantaris muscles, respectively. The corresponding ICC was 0.99 (gastrocnemius) and 0.87 (plantaris). While measurements performed by two different operators provided comparable results for the gastrocnemius muscle as indicated by the small CV (2.3 ± 1.4 %) and high ICC (0.97), the corresponding measurements for the plantaris muscle were found to be less reproducible, with a CV of 18.7 ± 7.3 % and a poor ICC (0.41).

Table 1 Individual muscle volumes calculated from manual segmentation

Accuracy of the automatic methods

We calculated the relative overlap index (RO) in order to evaluate the performance of each automatic procedure for both gastrocnemius and plantaris muscle segmentations and for different numbers of atlases, from 1 to 14. Processing time to complete both muscle segmentations for one subject ranged from 10 to 15 min, depending on the initial settings when one atlas was used. This time increased linearly with respect to the number of atlases (e.g., a minimum of 1 h was required to perform the same segmentation with 6 atlases). For the sake of comparison, the RO calculated for the manual segmentations performed by the same operator and two different operators were 88.3 and 83.2 for the gastrocnemius and 81.5 and 68.1 for the plantaris muscle segmentations.

Vote procedure

Table 2 presents the RO values obtained using the different nonlinear registration models (i.e., SBS, 2Dc and 3D) combined with either the majority or weighted vote with respect to the number of atlases for both muscle segmentations. Our results showed that, compared to the simple majority vote, the weighted vote improved the RO when the images were registered using the 2Dc or the SBS model, regardless of the number of atlases used or the muscle segmented. However, the corresponding improvement was globally larger for the plantaris (3.3 ± 2.6 % and 2.2 ± 1.8 % using the 2Dc and SBS models, respectively) as compared to the gastrocnemius (1.1 ± 0.4 % and 0.6 ± 0.2 %) muscle. Using the 3D model, no effect of the vote procedure was observed for the gastrocnemius segmentation while the RO was globally slightly improved (by 1.2 ± 1.5 %) for the plantaris segmentation by the weighted vote procedure. According to these results and for the sake of clarity, only the weighted vote was used for the remaining comparisons.

Table 2 Relative overlap obtained with the different nonlinear registration models combined with the majority or weighted vote procedure with respect to the number of atlases included in the database

Image upsampling

We found a beneficial effect of the upsampling along the Z direction when the images were registered with the 3D model. Using this specific setting, the upsampling improved the RO associated with the gastrocnemius (0.6 ± 0.03 %) and plantaris (1.5 ± 0.07 %) muscle segmentation using 1–14 atlases. This indicates that the image anisotropy is a limiting factor for automatic segmentation using atlases and that the upsampling method can take this issue into account. When only an in-plane registration was performed, i.e., using the 2Dc and SBS models, this beneficial effect no longer existed. More specifically, combined with the SBS model, this procedure decreased the RO by 0.3 ± 0.06 % for both muscles regardless of the number of atlases used. Combined with the 2Dc model, we found a slight improvement for the gastrocnemius segmentation (RO improved by 0.3 ± 0.09 %), while this setting was globally detrimental for the plantaris muscle (−0.1 ± 0.11 %). Accordingly, we excluded the R-SBS and R-2Dc settings from the comparison of nonlinear registration models.

Nonlinear registration models

As illustrated by the higher RO values, our results clearly showed a higher reliability of the automatic segmentation for the gastrocnemius muscle as compared to the plantaris regardless of the initial setting. However, the accuracy levels differed according to the registration methods independently of the muscle segmented (Fig. 4).

Fig. 4

Relative overlap (RO) expressed with respect to the number of atlases used when the images were registered using 3D, R-3D, 2Dc and SBS models. The dashed lines represent RO calculated between two manual segmentations performed by the same operator (intra) and by two different operators (inter), respectively

According to the RO values, the SBS model performed worse than the other nonlinear registration models in our experimental conditions. More specifically, for the gastrocnemius muscle, RO increased with the number of atlases from 80.2, 80.8 and 80.2 to 86.7, 87.3 and 87.4 using the 3D, R-3D and 2Dc models, respectively, whereas the corresponding values using the SBS model were lower, ranging from 75.8 to 84.5 (Fig. 4a). The same difference was observed for the plantaris segmentation (Fig. 4b). For this specific muscle, the RO values ranged from 64.5, 65.9 and 66.1 to 73.2, 74.8 and 76.8 when the images were registered using the 3D, R-3D and 2Dc models, respectively, and from 57.7 to 70.7 using the SBS model (Fig. 4b). Finally, considering both gastrocnemius and plantaris muscle segmentation, the 2Dc model outperformed the other models in our experimental conditions.

Overall, with the exception of the SBS model, our results showed that a minimum of three atlases was necessary to automatically segment either muscle with a level of accuracy similar to the reference method when two operators were involved. However, despite an increased RO with respect to the number of atlases, none of the automatic methods reached the intraoperator threshold for either muscle (Fig. 4). In addition, our results showed that the corresponding improvement was markedly reduced beyond six atlases: from 1 to 6 atlases the RO was improved by 6.4, 6.3, 7.0 and 9.0 % for the gastrocnemius and by 10.0, 10.0, 11.6 and 16.1 % for the plantaris segmentations using the 3D, R-3D, 2Dc and SBS models, respectively, whereas the addition of eight more atlases resulted in a further RO improvement lower than 1.5 and 3 % for the gastrocnemius and plantaris segmentation, respectively (Fig. 4).

Artificial atlas addition

According to the previous results, we evaluated the effect of the addition of artificial atlases on the automatic segmentation quality using the weighted vote combined with the 2Dc nonlinear registration model. As illustrated in Fig. 5, the addition of artificial atlases translated by ± one (z1) or ± two slices (z2) improved the RO when a low number of original atlases was used regardless of the muscle segmented. However, the z1 setup resulted in a larger improvement compared to z2 for both muscles (Fig. 5). In addition, compared to the results obtained without artificial atlases (i.e., z0), the ‘z2’ atlases degraded the RO when more than 4 and 8 original atlases were used, while the ‘z1’ atlases improved the RO up to 8 and 14 original atlases for the gastrocnemius (Fig. 5a) and plantaris (Fig. 5b) segmentation, respectively.

Fig. 5

Relative overlap (RO) calculated using the 2Dc nonlinear registration model expressed with respect to the number of atlases when only original atlases were used (2Dc) or when artificial atlases translated by one (2Dc-z1) or two slices (2Dc-z2) were added to the original atlas database. The dashed lines represent RO calculated between two manual segmentations performed by the same operator (intra) and by two different operators (inter), respectively

With respect to computation, the shift of up to k slices of the atlases and the addition to the vote data set multiply the number of atlases as well as the computation time by 2k + 1. So, when z1 was used, the computation time was three times greater than when only the original atlases were used.

Muscle volume quantification

The relative volume error (expressed in % of the reference measurement) resulting from the automatic segmentation of both gastrocnemius and plantaris muscles is displayed in Table 3 with respect to the number of original atlases, using the 2Dc nonlinear registration and weighted vote with (2Dc-z1) or without (2Dc) artificial atlases. The corresponding absolute volumes in mm3 when using six atlases as well as the percent error are presented in Table 4. We observed a systematic overestimation of the gastrocnemius muscle volume when the segmentation was performed automatically compared to the reference measurements (Table 4). Regarding the plantaris muscle, the absolute volume errors ranged from 0 to 19.5 % and from 0.7 to 18.9 % using the 2Dc and 2Dc-z1 procedures, respectively. However, despite this larger variability, the average volume calculated from the automatic measurements did not significantly differ from the reference method because individual over- and underestimations compensated for each other (Table 4). These errors are in agreement with the volume differences calculated from manual delineation performed by two different operators, i.e., 3.0 ± 2.2 % and 22.9 ± 8.6 % for the gastrocnemius and plantaris muscles, respectively, but did not reach the intraoperator variability (1.6 ± 1.1 % and 6.8 ± 5.3 % for the gastrocnemius and plantaris, respectively).

Table 3 Absolute volume error from automatic segmentation with respect to the number of atlases included in the database
Table 4 Muscle volumes quantified from automatic segmentation when using six atlases and corresponding error compared to the reference measurements

Finally, it is worth noting that in contrast to the improvement we observed for the RO values, the addition of artificial atlases did not improve the accuracy of the volume estimation for either muscle.

Discussion

In this study, we described for the first time a multi-atlas-based segmentation method that can be efficiently used to automatically quantify individual muscle volumes within the limb from rat MR images. We investigated the effects of a number of modifications to the standard method in terms of accuracy (determined by the RO index), with particular attention given to the voxel anisotropy of the images and to the reliability of individual muscle volume quantification.

We showed that an automatic process including a 2Dc nonlinear registration model, weighted vote and minimum of three atlases provided a better RO value than the value obtained from a comparison between manual segmentations performed by two different operators. On that basis, we demonstrated that the accuracy of the automatic segmentation of both gastrocnemius and plantaris muscles outperformed the interobserver manual segmentation variability. This result highlighted the robustness of the method, which only needs a very limited amount of manual work, i.e., on three atlases. The corresponding global muscle volume error was 4.2 and 8.0 % for the gastrocnemius and plantaris muscles, respectively. Interestingly, this error was lower than the interoperator variability for the reference method illustrating the reliability of the present fully automatic method. Finally, we showed that the quality of the automatic segmentation was improved by the addition of artificial atlases especially when a low number of original atlases were used. Of interest, this improvement did not affect the global muscle volume quantification.

The reference segmentation of both gastrocnemius and plantaris muscles was performed three times manually for each slice and each data set, twice by the same operator and an additional time by a second operator. The measurement variability was determined from the global volume quantification using the CV and ICC coefficients. Our results clearly disclosed a muscle-dependent reproducibility, with an interoperator CV much larger (18.7 ± 7.3 %) and ICC (0.41) and RO (68.1) much lower for the plantaris as compared to the gastrocnemius muscle. This discrepancy is likely related to the muscle size given that in our context the plantaris muscle volume was more than six times smaller than that of the gastrocnemius regardless of sex. Such a difference has already been suggested in humans. A <1 % interoperator variability was reported for a large muscle group (i.e., quadriceps) [18], whereas a much larger variability has been reported for smaller muscles such as the flexor (3.3 %) and extensor carpi ulnaris (5.7 %) muscles [17]. This difference can also be related to the unclear distinction of the fascia surrounding the plantaris muscle, especially on the distal part of the leg, making the segmentation on the corresponding images highly operator-dependent. On this basis, it is important to keep in mind that the reproducibility of the manual delineation, and consequently the performance of the automatic segmentation method, is closely linked to the ability to clearly distinguish individual muscles on the MR image, which is itself dependent on the inherent contrast and image resolution.

To our knowledge, no data related to the reproducibility of the manual delineation from small animal MR images have been reported so far. However, both intra- and interoperator CV (1.1 ± 0.8 % and 2.3 ± 1.4 %, respectively) and ICC (0.99 and 0.97, respectively) reported for the gastrocnemius muscle volume are in agreement with values previously reported in humans for different muscles [17, 18, 34]. Although multi-atlas-based approaches are frequently used for the segmentation of soft tissues [22, 24, 25, 33, 35], only one study so far has used such an approach in order to automatically segment a single muscle [36]. More specifically, the authors reported an automatic method dedicated to the delineation of the pectoral muscle from breast MR images using a standard 3D registration procedure but did not target the muscle volume quantification. The accuracy of this automatic segmentation [36], illustrated by a Dice similarity coefficient (DSC) of 0.74 ± 0.06 [i.e., an RO of 0.59, given that RO = DSC/(2 − DSC)], was actually inferior to the accuracy of our results. In addition, further supporting our initial conclusion, these authors reported a low reproducibility of the pectoral muscle manual delineation (RO = 0.54) and indicated that this poor reproducibility might be linked to the difficult distinction among the pectoral muscle, surrounding tissues and intercostal muscles on MR images [36].
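The conversion between the Dice similarity coefficient and the relative overlap used for these comparisons follows from their definitions, DSC = 2|A ∩ B|/(|A| + |B|) and RO = |A ∩ B|/|A ∪ B|. A minimal sketch, with both indices expressed as fractions between 0 and 1 (function names are illustrative):

```python
def dsc_to_ro(dsc):
    """Convert a Dice coefficient (0..1) to a relative overlap (0..1)."""
    return dsc / (2.0 - dsc)

def ro_to_dsc(ro):
    """Convert a relative overlap (0..1) to a Dice coefficient (0..1)."""
    return 2.0 * ro / (1.0 + ro)

print(round(dsc_to_ro(0.74), 2))  # 0.59
print(round(dsc_to_ro(0.92), 2))  # 0.85
```

Since the RO values in the Results are expressed in %, they have to be divided by 100 before applying `ro_to_dsc`.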

Our results clearly showed that, regarding voxel anisotropy, a nonlinear registration constrained in two dimensions (i.e., 2Dc) was more beneficial to the automatic segmentation quality than the standard 3D model, especially given that the image resolution did not allow a clear distinction of a small muscle like the plantaris muscle. Indeed, in the context of an atlas-based segmentation, voxel anisotropy should result in a biased image interpolation along Z, thereby affecting the segmentation mask and the deformation itself. Several factors can explain the superiority of the 2Dc registration as compared to the other models. In the 3D model, the parameters for the deformation are estimated in all three directions, whereas the 2Dc model restricts the deformation to the atlas slices, where the resolution is the highest, while the Z component is set to zero, providing a de facto regularization beneficial to the registration. As a result, this model still takes advantage of the 3D nature of the data and is able to generate a more coherent and robust deformation during the registration as compared to the SBS model, for which no regularization is imposed from slice to slice. Following the same idea, image upsampling along Z before the registration was, as expected, beneficial when it was combined with the standard 3D model. This improvement clearly indicates that image resolution and more specifically voxel anisotropy are limiting factors for the segmentation process using atlas-based approaches. Interestingly, this limitation can be artificially reduced using an upsampling procedure, though at the expense of a longer processing time. Another possibility would be the acquisition of fully isotropic 3D data sets as previously described for the automatic quantification of fat-free muscle volumes [26, 27]. 
The ICCs between automatic and manual segmentation reported by these authors were above 0.89 for all muscle groups and showed that under those acquisition conditions age and BMI (from lean to obese) differences did not affect the multi-atlas method performance [27]. Interestingly, using fully isotropic voxels, Karlson et al. [26] reported a similar accuracy regarding automatic muscle volume estimates as compared to the manual segmentation, despite an eight-fold resolution difference when data were acquired at 1.5 T (voxel dimensions = 3.5 × 3.5 × 3.5 mm3) or at 3 T (voxel dimensions = 1.75 × 1.75 × 1.75 mm3). Although this approach might provide a more accurate registration and, on that basis, an improved segmentation quality, it would be at the expense of both post-processing and acquisition times. In addition, the resulting gain in terms of volume quantification accuracy should only be marginal given that no drastic change in anatomical shape is expected between two consecutive slices. The improvement resulting from the upsampling procedure combined with the standard 3D model was actually not better than the automatic segmentation using the 2Dc model when considering both muscles together.

Based on the RO index, the 2Dc nonlinear registration also provided the most accurate segmentation for both muscles when it was associated with a weighted vote procedure and no upsampling. The corresponding RO values increased from 80.2 to 87.4 and from 66.1 to 76.8 for the gastrocnemius and plantaris segmentation with respect to the number of atlases used (from 1 to 14). The RO values reported for the gastrocnemius segmentation were in agreement with the results reported by Andrews et al. [21] in much larger muscle groups (i.e., knee extensor and flexor muscles) in humans. On the basis of a principal component analysis, they reported good reliability of the automatic segmentation illustrated by an average DSC value of 0.92 ± 0.03 (i.e., RO = 0.85) [21]. However, it has to be kept in mind that the level of accuracy expected from the automatic method has to be considered with respect to the reproducibility of the standard method. Unfortunately, these authors did not report the variability of their manual delineation process [21]. In the present study, although the accuracy of the automatic segmentation was much lower for the plantaris than for the gastrocnemius muscle, the RO values achieved for both muscles outperformed the interoperator RO value using a minimum of three atlases. Similarly, the DSC values from the automatic segmentation reported by Gubern-Merida et al. [36] for the pectoral muscle also outperformed the DSC value calculated from the computed interobserver variability (DSC = 0.70 ± 12), thereby highlighting the reliability of the multi-atlas approach for the individual muscle segmentation. 
It is worth pointing out that in spite of an increased RO value with respect to the number of atlases included in the database, the addition of further atlases beyond six resulted in a minor improvement of the RO values in both muscles, thereby suggesting that the manual work can be minimized to the delineation of six subjects for a high-quality level according to our data (i.e., RO = 86.2 and 74.8 for the gastrocnemius and plantaris segmentation, respectively).
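The relationship between the two overlap indices discussed above can be illustrated with a minimal sketch. It assumes that RO is the intersection-over-union (Jaccard) ratio between the automatic and manual masks, an assumption consistent with the conversion quoted above (DSC = 0.92, i.e., RO = 0.85), and it uses the standard DSC definition. The function names and the flat 0/1 mask representation are illustrative choices and are not taken from the original processing pipeline.

```python
def dice_to_overlap(dsc):
    """Convert a Dice similarity coefficient into a relative overlap
    (assumed here to be the Jaccard index): RO = DSC / (2 - DSC)."""
    return dsc / (2.0 - dsc)


def overlap_to_dice(ro):
    """Inverse conversion: DSC = 2 * RO / (1 + RO)."""
    return 2.0 * ro / (1.0 + ro)


def overlap_metrics(auto_mask, manual_mask):
    """Return (RO, DSC) for two binary masks given as flat sequences of 0/1.

    RO  = |A intersect M| / |A union M|      (assumed definition)
    DSC = 2 * |A intersect M| / (|A| + |M|)
    """
    if len(auto_mask) != len(manual_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, m in zip(auto_mask, manual_mask) if a and m)
    n_auto = sum(1 for a in auto_mask if a)
    n_manual = sum(1 for m in manual_mask if m)
    union = n_auto + n_manual - intersection
    if union == 0:
        raise ValueError("both masks are empty")
    ro = intersection / union
    dsc = 2.0 * intersection / (n_auto + n_manual)
    return ro, dsc


if __name__ == "__main__":
    # Conversion of the DSC value quoted in the text.
    print(round(dice_to_overlap(0.92), 2))
    # Toy example: two 4-voxel masks sharing 2 voxels.
    print(overlap_metrics([1, 1, 1, 0], [0, 1, 1, 1]))
```

Because the conversion is monotonic, ranking methods by DSC or by RO gives the same order; only the numerical scale differs, which matters when values from different studies are compared.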

From a physiological point of view, the accuracy of a segmentation method can also be evaluated on the basis of the error related to the muscle volume measurement. Here, the global volume error for the gastrocnemius and plantaris muscles was 4.2 and 8.0 %, respectively, using three original atlases and slightly decreased to 3.5 and 7.4 % using six original atlases. As observed for the RO values, the benefit associated with the addition of more than six atlases was negligible for the volume quantification of both muscles. More specifically, using 14 original atlases the global volume error was 3.5 and 7.0 % for the gastrocnemius and plantaris muscles, respectively. The present errors were consistent with those previously reported for the commonly used alternative methods based on a reduced number of manually delineated slices [7, 18, 37, 38]. In these models, the truncated cone formula has often been used to describe the limb shape between two consecutive, manually delineated slices (as the limb has a globally conic shape). However, it has been shown that the related bias significantly increased with the gap between slices. Tracy et al. [38] showed in humans that a minimum of 5–6 slices (depending on the thigh length) evenly distributed along the quadriceps muscle was necessary to estimate the corresponding volume within a 4 % error range independent of age or sex. Bias reduction to a value of <2 % [38] or to a point lower than the variability of the reference measure [18] required the manual delineation of 12 slices. This was reduced to seven slices using the Cavalieri method (assuming a cylindrical shape between two consecutive slices) for this specific muscle group. Although considerably reduced compared to the reference method (i.e., manual delineation from ~100 contiguous slices), the corresponding manual work remains tedious, especially for large cohorts. In this respect, Morse et al. [37] reduced the manual work to a single slice segmentation by developing a regression model describing the evolution of the cross-sectional area specific to each muscle composing the quadriceps as a function of muscle length in humans. Depending on the location of the segmented slice, the global error ranged from 10 to 27 %. Although these errors have been considered acceptable for the purpose of individual muscle size quantification, their use in a different context is highly questionable. The investigation of any new population would require a new model validation and would lose the benefit of any manual work previously performed. The present automatic method provides a high level of accuracy despite the large morphological heterogeneity of the population, thereby strongly suggesting the possibility of using a given database of atlases for the automatic segmentation of individual muscles from new populations. However, it should be noted that the present method, like previous automatic approaches [21, 36], does not identify intramuscular fat, so that any comparison of populations with a different relative proportion of intramuscular fat, such as in aging [11, 39] and in various pathological conditions [12], would require an additional thresholding process [14].
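The two geometric models mentioned above for estimating a volume from a reduced number of delineated slices can be sketched as follows. This is a minimal illustration that assumes evenly spaced slices, the textbook truncated cone (frustum) formula and the usual Cavalieri sum; the function names, the example areas and the use of an absolute relative difference as the volume error are assumptions made for illustration and do not reproduce the exact computations of the cited studies.

```python
import math


def truncated_cone_volume(areas, gap):
    """Sum of frustum volumes between consecutive slices:
    V = sum( gap / 3 * (A1 + A2 + sqrt(A1 * A2)) )."""
    return sum(
        gap / 3.0 * (a1 + a2 + math.sqrt(a1 * a2))
        for a1, a2 in zip(areas, areas[1:])
    )


def cavalieri_volume(areas, gap):
    """Cavalieri estimate: each slice area stands for a cylinder of
    height `gap`, so V = gap * sum(A_i)."""
    return gap * sum(areas)


def volume_error_percent(estimate, reference):
    """Absolute relative error (in %) of an estimate against a
    reference volume (assumed error definition)."""
    return 100.0 * abs(estimate - reference) / reference


if __name__ == "__main__":
    # Hypothetical cross-sectional areas (mm^2) on slices 2 mm apart.
    areas = [10.0, 18.0, 22.0, 17.0, 9.0]
    gap = 2.0
    v_cone = truncated_cone_volume(areas, gap)
    v_cav = cavalieri_volume(areas, gap)
    print(v_cone, v_cav, volume_error_percent(v_cone, v_cav))
```

Both estimators interpolate between delineated slices only, so their bias grows as the gap widens and the true shape departs from the assumed cone or cylinder, which is the limitation noted above for models based on few slices.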

Conclusion

In the present study, we reported an original multi-atlas-based automatic segmentation process dedicated to skeletal muscle and offering high accuracy and reliability. We demonstrated that automatic quantification of individual muscle volume in the rat leg can be performed in a population characterized by a wide range of muscle sizes, thereby opening up promising opportunities in humans.