Introduction

Neuromuscular diseases, e.g., progressive spinal muscular atrophy (Werdnig–Hoffmann disease), Duchenne muscular dystrophy, or myositis, are common neurological disorders. Incidence rates are \(10/100{,}000\) for progressive spinal muscular atrophy, \(29/100{,}000\) for Duchenne muscular dystrophy, and \(1/100{,}000\) for myositis.

Ultrasonography is a convenient technique for visualizing healthy and pathological skeletal muscle tissue, providing high-resolution scans with noninvasive real-time image acquisition. Neuromuscular disorders, like myositis, often cause structural muscle changes that can be observed in ultrasound images. It is assumed that the replacement of muscle tissue by fat and an increase or a thickening of fibrous connective tissue are the main reasons for an increased muscle echo intensity in pathological tissue, since the number of reflections within the muscle increases, resulting in partially higher intensities in the ultrasound image [29]. In contrast, only few reflections occur in healthy muscle tissue during image acquisition, resulting in a low echo intensity and thus in rather dark images with more evenly distributed high-intensity peaks. As a consequence, the texture of healthy muscle tissue seems to be more oriented and structured due to higher contrast changes, while pathological muscle tissue appears to be diffuse and unstructured due to its smaller contrast changes.

We address the detection of neuromuscular diseases in ultrasound images using the challenging example of myositis detection. Myositis, or inflammatory myopathy, is a rare disease in which the immune system chronically inflames the body’s muscle tissue. Persistent inflammation progressively weakens or destroys the muscle tissue and is commonly accompanied by pain. Over time, this process may also lead to a loss of muscle mass. Myositis most often appears in childhood, at the age of 5–15, and in adulthood, at the age of 40–60, although it may occur at any age. In contrast to other neuromuscular disorders like progressive spinal muscular atrophy or Duchenne muscular dystrophy, a texture analysis of myositis is more difficult, since the disease may only occur focally and brightness changes in pathological cases are often small [28, 30]. For this reason, we developed a CAD system for detecting myositis in ultrasound images, which can be easily adapted to detect similar neuromuscular disorders.

Under the assumption that pathological muscle tissue appears to be diffuse and unstructured, a texture-based analysis seems adequate for detecting myositis in ultrasound images. A computer-aided approach serves several purposes: it provides an independent opinion, which can be taken into consideration, and it saves time for radiologists. Moreover, it helps young professionals with less experience to identify myositis in ultrasound images. A well-trained process may also be capable of supporting an early diagnosis.

Diagnosis from visual inspection can be difficult for the radiologist in early stages of neuromuscular diseases or if myopathies cause only minor structural changes in muscle groups. Figure 1 shows ultrasound images of pathological muscle tissue with different degrees of severity of myositis. Discrimination between healthy and strongly affected muscle tissue in Fig. 1a, d seems to be rather intuitive, compared to the more difficult discrimination between healthy and little affected muscle tissue in Fig. 1a, b, or little and medium affected muscle tissue in Fig. 1b, c. The latter distinctions are challenging tasks even for automated classification.

Fig. 1 Exemplary stages of myositis on ultrasound images of the biceps brachii showing healthy (a) and pathological muscle tissue with different degrees of severity: b little, c medium, and d strongly affected muscle tissue

Related work

Texture features are well established in many medical applications incorporating ultrasound imaging. Most problems require a combination of statistical and spectral features, since they represent the nature of deterministic tissue-related variation and non-deterministic influences from image generation, respectively [16, 25, 31, 35, 38].

These problems include diagnosis and classification of, for example, diffuse, chronic, alcoholic, fatty liver diseases, and liver cancer in ultrasound images [4, 15, 19, 20, 24, 26]. Such methods often require texture measures that rely on spatial gray level dependence, gray level co-occurrence, fractal dimension, Fourier transform, Laws’ texture energy, discrete wavelet transform, or wavelet packet decomposition features combined with, for example, probabilistic neural networks or support vector machines as classifiers. More recent methods also include an automatic ROI detection or segmentation of the liver [22, 32].

Another important research branch of texture analysis in ultrasound imaging is the detection of breast cancer. Earlier work uses first-order statistics or the Fourier transform to extract texture features [12, 13]. Autocorrelation [6], morphological [5], auto-covariance [16], or Haralick’s [21] features of the breast tissue have also been used. More recently, approaches combine morphological features, generated by active contour or level set segmentation, with texture features derived from autocorrelation functions, first-order statistics, or spatial gray level dependences [34, 39] as well as shearlet transform [40], which have proven to produce good and stable classification rates.

There are several other research areas of texture analysis in ultrasound imaging, e.g., the diagnosis of prostate cancer in transrectal images. Here, methods use, e.g., combinations of features derived from gray level co-occurrence matrices, Gabor transform, or discrete wavelet transform [3, 25, 33]. To detect carotid atherosclerotic plaque in ultrasound images, different approaches exist, which for instance use features based on first- and second-order statistics, fractal dimensions, discrete wavelet transform, wavelet packet decomposition, gray level co-occurrence matrix, Laws’ filter masks, and the Fourier transform [1, 2, 8, 35, 37]. Some geometrical, Gabor, and statistical features as well as combinations of these have also been used for Parkinson’s disease diagnostics in transcranial sonography [7, 17, 31].

Regarding skeletal muscle ultrasound, Pohle et al. [28] proposed an approach for examining a number of different neuromuscular diseases. Although they obtained good classification rates, they did not use a standardized examination procedure. Scan and ROI estimation were performed subjectively by one person for already preselected images. Moreover, at that time it was necessary to record digital ultrasound images on analog S-VHS cassettes. A re-digitization, which is often a source of error, was therefore required for a computer-aided evaluation. Nevertheless, it was shown that a texture-based analysis of ultrasound images may work well for detecting progressive spinal muscular atrophy and Duchenne muscular dystrophy.

Pillen and van Alfen [27] also investigated neuromuscular diseases in ultrasound images. They proposed a quantitative gray scale analysis using first-order histogram-based statistical features, e.g., the mean intensity values of the ROIs. This approach performs well for diseases that cause a regular, increased echo intensity like Duchenne muscular dystrophy. However, methods using only first-order statistics most likely fail for irregular or focally located diseases, as in some cases of myositis.

König et al. [18] proposed a classification approach for detecting myositis with an overall accuracy of \({\ge }85\,\%\). They used a combination of two statistical and two wavelet features, measuring both brightness and structural changes of pathological tissue in ultrasound images. Moreover, they demonstrated that too small ROI sizes (\({<}1.2\,\hbox {cm}^2\)) are unsuitable for such tasks. However, their dataset was sparse and the approach was too sensitive with respect to tissue surrounding the muscle tissue of interest. Furthermore, texture features were only determined by visual inspection of the feature space. To sum up, this approach is not readily applicable to clinical practice.

Materials and methods

We performed texture analysis on a dataset of the biceps brachii (BB) of 7 healthy subjects and 11 subjects affected by myositis. Subjects were aged 40 to 80 years. Images were captured with a Toshiba Aplio XG system using a 9 MHz linear-array transducer with a scan width of about 50 mm and a field depth of 30 mm. We used a standardized acquisition protocol (75 gain, 65 dynamic range) and smooth transducer movement in the distal direction and with respect to orientations in the transverse plane. All scans were captured in the transverse plane, which is perpendicular to the long axis of the muscle, resulting in a more speckled appearance because of reflections caused by the perimysial connective tissue [27]. Two to four frames per subject were captured at a pixel resolution of \(716\times 537\) px from the ultrasound video sequences of the skeletal muscle. To ensure that frames of the same subject are not highly correlated, we captured them at different distal positions on the muscle. Another reason is that myositis may only appear focally and may affect the muscle to varying degrees. Figure 2 shows an example of the different appearance of muscle tissue between captured frames of the same subject. Frames were saved in the Tagged Image File Format (TIFF), resulting in a database of 60 ultrasound images.

Fig. 2 Images of the same subject affected by myositis were captured at different positions in distal direction. The muscle tissue of the images (a) and (b) is visually well distinguishable and not highly correlated

We used a twofold experimental design. First, our pre-studies in the following subsections (Sects. 3.2 to 3.5) were based on images pre-classified by an expert radiologist (E0) to optimize the classification process. The expert E0 additionally rated 25 of the 60 images as visually hard to classify, of which 18 were healthy and seven were pathological cases. Pathological cases were additionally verified by the existing results of diagnostic tests. Second, later experiments evaluated the performance of our final CAD system and were based on the decisions of two further expert radiologists, E1 and E2 (see Sect. 3.6).

All methods were implemented in MATLAB®, partially using the freely available PRTools [11] toolbox.

ROI generation

To ensure that texture analysis is performed on areas of muscle tissue, a segmentation of the muscle would be beneficial. However, automatic segmentation relies heavily on prior knowledge about the muscle and joint. Therefore, we let the radiologist E0 define a ROI around the tissue of interest. We implemented two kinds of ROI placements, shown in Fig. 3, requiring low and medium user effort, respectively.

Fig. 3 Two different ROI shapes used for feature extraction in our CAD system. A rectangular ROI (a) is used to minimize user effort. A closed polygonal ROI (b) can be used on demand as it is more accurate with respect to delineation of muscle tissue. However, it requires slightly higher user effort (8–12 mouse clicks)

The first step of our CAD system is based upon a one-click approach where the radiologist selects the center of the relevant muscle area. From this selection, a rectangular ROI is generated, which covers the majority of the clinically relevant muscle tissue. This ROI size is based on the average muscle size of the biceps brachii and represents only an initial guess. If the suggested ROI is over- or undersized, the radiologist may adjust its size by using the mouse’s scroll wheel. Figure 3a outlines the rectangular ROI approach.

Additionally, we investigated a more precise polygonal ROI definition, which is shown in Fig. 3b. The number of user interactions for polygonal ROIs depends on the desired precision but typically amounts to 8–12 mouse clicks. Such a quasi-segmentation of the muscle is more reliable, since no other types of (surrounding) tissue are included in the texture analysis. The following pre-studies were based on the rectangular and polygonal ROI definitions of expert radiologist E0.

Feature extraction

We investigated several first-order statistical features derived from the intensity histogram, Haralick’s features derived from the gray level co-occurrence matrix (GLCM) [14], and features obtained from the two-dimensional discrete wavelet transform (2D-DWT) [23] for the following reasons:

  • First-order statistics, of which some have already been successfully applied to muscle ultrasound images [27], should describe the intensity changes of pathological muscle tissue appropriately.

  • Haralick’s features have been used on ultrasound images of similar neuromuscular diseases [28] and seem promising for describing structural changes of pathological muscle tissue.

  • Wavelet features are able to describe both intensity and structural information similar to other spectral features that have been successfully applied in other ultrasound image classification tasks (see Sect. 2).

For first-order statistics, we used five features: the mean intensity value, standard deviation, skewness, kurtosis, and entropy of the ROI’s histogram. A bias-corrected formula is used to compute standard deviation, skewness, and kurtosis to cope with differently sized ROIs.
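The five first-order features can be sketched as follows. This is a minimal illustration, not the original MATLAB implementation: the function name `first_order_features` is hypothetical, and the specific bias corrections (the \(n-1\) variance estimator and the common sample-size adjustments for skewness and kurtosis) are assumptions, since the text does not state which formulas were used.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Mean, std, skewness, kurtosis and entropy of a flat list of gray levels.

    Assumes at least four pixels and a non-constant ROI (variance > 0).
    The bias corrections below are one common choice (an assumption).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    # Biased central moments of order 2, 3 and 4.
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n

    # Bias-corrected standard deviation (divides by n - 1).
    std = math.sqrt(m2 * n / (n - 1))
    # Bias-corrected skewness.
    g1 = m3 / m2 ** 1.5
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    # Bias-corrected kurtosis (a normal distribution gives about 3).
    g2 = m4 / m2 ** 2 - 3.0
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0) + 3.0

    # Shannon entropy (in bits) of the normalized intensity histogram.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"mean": mean, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}
```

The corrections matter mainly for small ROIs; for ROIs with many thousands of pixels, the corrected and uncorrected estimates are nearly identical.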

Haralick’s texture features are based on the GLCM, from which several statistical measures can be derived. The GLCM is usually calculated from intensity-quantized images to obtain a statistically reliable estimate of the joint intensity probability distribution. We computed the GLCM from ROIs quantized to 8, 16, and 32 intensity levels, resulting in GLCMs with sizes of \(8\times 8\), \(16\times 16\), and \(32\times 32\) for each direction \(\alpha \) and distance \(d\). The geometric arrangement of considered pixel pairs \((p,q)\) is given by a displacement vector \(\delta = (d, \alpha )\) in polar coordinates, where \(d \in \{1,2,3,4,5\}\) defines the distance from the pixel of interest \(p\) to its paired pixel \(q\), and \(\alpha \in \{0^\circ ,45^\circ ,90^\circ ,135^\circ \}\) defines the horizontal, vertical, and both diagonal directions of \(\delta \). After normalization of the GLCM to relative values, we derived contrast, correlation, energy, and homogeneity as statistical features. A detailed description of these measures can be found in Haralick et al. [14]. The combination of intensity quantizations and displacement vectors \(\delta \) (3 quantizations, 5 distances, 4 directions, and 4 features each) results in a 240-dimensional feature space.
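A minimal sketch of the GLCM computation and the four derived measures is given below. The helper names (`quantize`, `glcm`, `glcm_features`) are hypothetical, and several details are assumptions not stated in the text: the non-symmetric counting of pixel pairs, the pixel offsets used for the diagonal directions, and the homogeneity definition \(\sum p_{ij}/(1+|i-j|)\).

```python
import math

# (row, column) unit steps per direction; image rows grow downwards (assumption).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(img, levels, max_val=255):
    """Map gray values in 0..max_val to 0..levels-1."""
    return [[min(v * levels // (max_val + 1), levels - 1) for v in row]
            for row in img]


def glcm(img, levels, d, angle):
    """Normalized co-occurrence matrix for distance d and direction angle."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * d, dc * d
    rows, cols = len(img), len(img[0])
    m = [[0.0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r][c]][img[r2][c2]] += 1
                total += 1
    if total == 0:  # displacement larger than the ROI
        return m
    return [[v / total for v in row] for row in m]


def glcm_features(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    idx = range(len(p))
    cells = [(i, j, p[i][j]) for i in idx for j in idx]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    correlation = cov / denom if denom > 0 else float("nan")
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity}


# Feature count: 3 quantizations x 5 distances x 4 directions x 4 measures.
N_HARALICK = 3 * 5 * 4 * 4
```

Looping `glcm_features(glcm(quantize(roi, L), L, d, a))` over all levels, distances, and directions would yield the 240 values per ROI described above.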

Using the 2D discrete wavelet transform (2D-DWT), we decomposed the images at different levels with preselected wavelet functions using low-pass (L) and high-pass (H) filters. The 2D-DWT outputs a decomposition vector consisting of the approximation coefficients (LL) at the highest level as well as the horizontal (\(\text {LH}_x\)), vertical (\(\text {HL}_x\)), and diagonal (\(\text {HH}_x\)) detail coefficients at each level \(x\). Decomposition and reconstruction low-pass and high-pass filters can be derived from different wavelet function families, each having different properties such as filter lengths, symmetry, and bi-/orthogonality. We investigated ten wavelets including Haar/Daubechies (Db: 4, 8), coiflets (Coif: 1, 4), and reverse biorthogonal (RBio: 1.3, 3.1, 3.3, 3.5, 4.4) wavelet families. A detailed description of wavelet families and related properties can be found in the work of Daubechies [9]. We used a three-level 2D-DWT (shown in Fig. 4 for an exemplary ROI), for which we computed the energy of the approximation coefficients (LL) and of the horizontal (\(\text {LH}_x\)), vertical (\(\text {HL}_x\)), and diagonal (\(\text {HH}_x\)) detail coefficients. More precisely, we summed up the squared coefficients for each of the ten sub-images (one approximation and \(3\times 3\) detail sub-images) for each wavelet function, yielding 100 features. Additionally, we summed up the horizontal, vertical, and diagonal energies over all levels for each wavelet function (30 further features), ultimately resulting in a 130-dimensional feature space.
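The energy features can be illustrated for a single wavelet with the following sketch. It implements only the orthonormal Haar wavelet by hand and assumes a ROI whose side lengths are divisible by \(2^3\); the function names are hypothetical, and which detail band is called horizontal or vertical is a naming convention chosen here for illustration. The other nine wavelets would need a wavelet library.

```python
def haar_dwt2(img):
    """One level of the orthonormal 2D Haar transform on an even-sized image.

    Returns (approximation, horizontal, vertical, diagonal) sub-images.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * cols for _ in range(rows)]
    hor = [[0.0] * cols for _ in range(rows)]
    ver = [[0.0] * cols for _ in range(rows)]
    dia = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            e, f = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + e + f) / 2.0
            hor[r][c] = (a + b - e - f) / 2.0  # changes between rows
            ver[r][c] = (a - b + e - f) / 2.0  # changes between columns
            dia[r][c] = (a - b - e + f) / 2.0
    return ll, hor, ver, dia


def energy(band):
    """Sum of squared coefficients of a sub-image."""
    return sum(v * v for row in band for v in row)


def wavelet_energy_features(img, levels=3):
    """13 energies for one wavelet: 9 detail bands, 1 approximation, 3 sums."""
    feats = {}
    sums = {"H": 0.0, "V": 0.0, "D": 0.0}
    approx = img
    for lvl in range(1, levels + 1):
        approx, hor, ver, dia = haar_dwt2(approx)
        for name, band in (("H", hor), ("V", ver), ("D", dia)):
            e = energy(band)
            feats[f"{name}{lvl}"] = e
            sums[name] += e
    feats["LL"] = energy(approx)
    for name, total in sums.items():
        feats[f"{name}_sum"] = total
    return feats
```

With 13 energies per wavelet and ten wavelets, this reproduces the stated count of 130 wavelet-based features. Because the Haar filters used here are orthonormal, the ten sub-image energies add up to the energy of the input ROI.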

Fig. 4 Results of a three-level 2D-DWT. The \(\text {LL}_3\) approximation coefficients of level 3 are located in the upper left corner. Horizontal detail coefficients (\(\text {HL}_x\)) of each level \(x\) are shown in descending order from this point toward the upper right corner. The same applies for the vertical detail coefficients (\(\text {LH}_x\)), which are positioned toward the lower left corner and for the diagonal detail coefficients (\(\text {HH}_x\)), which are arranged toward the lower right corner

Feature selection

Overall, we constructed a 375-dimensional feature space, which cannot directly be used for classification because feature vectors will most likely contain unnecessary, correlated, or even misleading information. Therefore, reduction is necessary. Here, the main purpose is the removal of features not contributing to the classification (feature selection). We investigated two feature selection methods: sequential forward feature selection (SFFS) and sequential backward feature selection (SBFS) to determine the features among first-order statistics, Haralick’s features, and wavelet-based features that perform best with respect to class discrimination in feature space.

We used two different objective functions for each strategy in order to detect whether the greedy selectors got trapped in a local optimum. As our first objective, we applied the sum of squared differences between any two samples of the two classes. This leads to selectors for compact features. As a second objective, we applied classification with a linear SVM, providing selectors that target linear separation. The former selectors can be executed much faster than the latter because they evaluate intrinsic properties instead of interactions with a specific classifier. Thus, their results exhibit more generality. The latter, however, can achieve higher accuracy, although their solution often lacks generality.
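A greedy forward selector with the first objective might look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, class labels are taken to be 0 and 1, and the criterion is read as the sum of squared differences between all pairs of samples from opposite classes, which is maximized.

```python
def between_class_ssd(X, y, subset):
    """Sum of squared differences over all pairs of samples from the two
    classes, restricted to the feature indices in subset."""
    total = 0.0
    for xi, yi in zip(X, y):
        if yi != 0:
            continue
        for xj, yj in zip(X, y):
            if yj == 1:
                total += sum((xi[f] - xj[f]) ** 2 for f in subset)
    return total


def forward_selection(X, y, k, criterion=between_class_ssd):
    """Greedily add the feature that maximizes the criterion until k are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: criterion(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Passing a different `criterion`, e.g., a cross-validated classifier accuracy, would give the wrapper-style selector described as the second objective. Backward selection would start from all features and greedily remove one at a time.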

Since the range of feature values varies greatly, the feature selection will not work properly without prior normalization (also known as pre-whitening). Hence, we normalized the range of features by shifting the values to zero mean and rescaling them to unit class variances.
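One possible reading of this normalization is sketched below: each feature is shifted to zero overall mean and divided by the square root of the prior-weighted average of its per-class variances. The function name and the use of the population variance within each class are assumptions made for illustration.

```python
import math


def normalize_features(X, y):
    """Shift each feature to zero mean and scale to unit average class variance."""
    n, dims = len(X), len(X[0])
    classes = sorted(set(y))
    out = [[0.0] * dims for _ in range(n)]
    for f in range(dims):
        col = [row[f] for row in X]
        mean = sum(col) / n
        pooled = 0.0
        for c in classes:
            vals = [v for v, lab in zip(col, y) if lab == c]
            mu_c = sum(vals) / len(vals)
            var_c = sum((v - mu_c) ** 2 for v in vals) / len(vals)
            pooled += len(vals) / n * var_c  # weight by class prior
        scale = math.sqrt(pooled) if pooled > 0 else 1.0
        for i in range(n):
            out[i][f] = (col[i] - mean) / scale
    return out
```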

Depending on the selection strategy and objective function, different wavelet-based and first-order statistical features were ranked with the highest score in almost all investigated cases. However, a stable feature combination could not be found. In contrast, Haralick’s texture features seem to be less suitable for distinguishing between healthy and pathological muscle tissue.

In summary, we selected a 135-dimensional feature subspace of wavelet-based features and first-order statistics, excluding the 240 Haralick features from further analysis because they performed worse in feature selection than the former. These results were independent of the ROI definition approach.

Feature reduction

Since our feature space is still high dimensional, we subsequently applied feature reduction to reduce it to a lower-dimensional representation yielding better generalization. Such a data transformation may be linear or nonlinear depending on the class distinction in feature space. From Fig. 5a, b, we can deduce that a linear transformation is sufficient. Hence, we decided to apply a supervised principal component analysis (SPCA) on the averaged within-class scatter matrix \(S_w\) for our 135-dimensional feature space:

$$\begin{aligned} S_w = \sum _{i=1}^c p_i C_i \end{aligned}$$
(1)

with \(C_i\) being the covariance matrix of class \(i\), \(c\) the number of classes, and \(p_i\) the probability of a sample to belong to class \(i\). In the next step, we solve the eigenproblem

$$\begin{aligned} S_w \mathbf {a} = \lambda \mathbf {a} \Leftrightarrow (S_w - \lambda I)\mathbf {a} = 0 \end{aligned}$$
(2)

and sort the resulting eigenpairs such that the eigenvalues are in descending order.
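Equation (1) can be illustrated with the following sketch, which computes \(S_w\) as the prior-weighted sum of the class covariance matrices. The function name is hypothetical, and estimating \(p_i\) from the class frequencies and \(C_i\) with the \(n_i-1\) denominator are assumptions. Solving the eigenproblem in Eq. (2) is left to a numerical library.

```python
def within_class_scatter(X, y):
    """Averaged within-class scatter matrix S_w = sum_i p_i * C_i."""
    n, dims = len(X), len(X[0])
    s_w = [[0.0] * dims for _ in range(dims)]
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        n_c = len(rows)  # each class needs at least two samples
        mean = [sum(r[f] for r in rows) / n_c for f in range(dims)]
        prior = n_c / n  # p_i estimated from class frequency (assumption)
        for a in range(dims):
            for b in range(dims):
                cov = sum((r[a] - mean[a]) * (r[b] - mean[b])
                          for r in rows) / (n_c - 1)
                s_w[a][b] += prior * cov
    return s_w
```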

Fig. 5 Scatter plots of a two-dimensional projection of our feature space. a Feature combination for polygonal and b for rectangular ROIs. The stars represent samples of pathological (p) and the plus signs samples of healthy (h) muscle tissue. Obtained decision boundaries of \(k\)-NN, Fisher’s classifier, and the linear SVM are depicted after a training on the whole training dataset

The aim of this reduction step was to find a trade-off between the subspace dimension and the preserved variance in the data. Table 1 details this trade-off for the feature spaces of both rectangular and polygonal ROIs. It can be seen that 99 % of the data’s variance could be preserved using only the 22 most significant eigenvalue directions. This means the feature space could already be reduced by a factor of about six. At 80–90 % of the preserved variance, reduction was saturated. Therefore, we used the five most significant eigenvalue directions as features for the next step.

Table 1 Comparison of the size of the feature subspace after a linear transformation to their preserved variance for both rectangular and polygonal ROIs
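The trade-off between subspace dimension and preserved variance described above can be sketched as a cumulative sum over the sorted eigenvalues. The function name is hypothetical, and any eigenvalues passed to it in an example are purely illustrative, not the values behind Table 1.

```python
def components_for_variance(eigenvalues, target):
    """Smallest number of leading eigen-directions whose eigenvalues
    preserve at least the fraction `target` of the total variance."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= target:
            return k
    return len(vals)
```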

Classification

For classification, we tested three approaches: a nonlinear \(k\)-nearest neighbor (\(k\)-NN) classifier, Fisher’s linear discriminant, and a linear support vector machine (SVM). \(k\)-NN selects the class membership of a sample according to the majority of class memberships of its \(k\) nearest neighbors in the training data. As distance metric, we used the Euclidean distance. We optimized the parameter \(k\) using the leave-one-out error of the dataset for both the rectangular and polygonal ROIs. Fisher’s least squares linear classifier attempts to find a linear discriminant function between the two classes along which the ratio of between- and within-class scatter is maximized in the least squares sense [10, 36]. The linear SVM seeks a decision boundary that maximizes the distance to the nearest class sample on both of its sides. For this purpose, a quadratic programming problem must be solved.
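The \(k\)-NN classifier and the leave-one-out selection of \(k\) can be sketched as follows. The function names and the tie-breaking rule (the label seen first among the nearest neighbors wins) are assumptions for illustration. The sketch does not reproduce the PRTools implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_error(X, y, k):
    """Leave-one-out error rate: each sample is classified by all others."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)


def best_k(X, y, candidates):
    """Candidate k with the lowest leave-one-out error (first one on ties)."""
    return min(candidates, key=lambda k: loo_error(X, y, k))
```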

Decision boundaries resulting from training \(k\)-NN, Fisher’s classifier, and the linear SVM on a selected feature subset of our training dataset are shown in Fig. 5. It can be seen that the linear SVM adequately separates healthy and pathological muscle tissue. The decision boundary of \(k\)-NN also seems to provide good separation, in contrast to that of Fisher’s classifier. At first glance, separation of polygonal ROIs (see Fig. 5a) seems to be slightly better than that of the rectangular ROIs (see Fig. 5b). The leave-one-out classification results are shown in Table 2. Feature reduction and classifier training were performed on the \(N-1\) images of the training dataset in each round of validation. Figure 6 indicates the general trend for cross-validation with smaller training set sizes. Here, the linear SVM outperforms Fisher’s classifier and yields results similar to those of \(k\)-NN. Hence, we chose a linear SVM, which provides a more generalized and less overfitted decision.

Fig. 6 Averaged classification error for x-fold cross-validation on rectangular (dashed lines) and polygonal ROIs (solid lines) compared for 100 stratified repetitions each

Table 2 Leave-one-out error rates for trained classifiers

Performance measurements

To obtain a fair judgement of the performance of our CAD system, our performance measurements were undertaken independently by two expert radiologists (E1, E2) classifying our data by the single rectangular and polygonal ROI approach as well as by our CAD system. The resulting dataset was denoted as validation dataset. Note that all pre-study experiments regarding the classification process, i.e., feature selection, reduction, and classifier training (see Sects. 3.3, 3.4, or 3.5), were based on the training dataset and thus on the ROI definition of another field expert (E0). Furthermore, to reduce any mutual dependence, we performed leave-one-out cross-validation. Hence, we excluded the current ROI of the training dataset, our ground truth, from feature reduction and classifier training in a leave-one-out manner. Therefore, our SPCA and linear SVM training were performed on \(N-1\) ROI definitions of our training data defined by expert E0, where \(N\) is the number of total ROIs. Performance was measured for each ROI of our validation dataset defined by experts E1 and E2, finally producing averaged values per expert after all rounds of validation. An ROI-wise leave-one-out cross-validation should be appropriate since texture features of images of the same subject should not be highly correlated due to our proposed standardized image acquisition procedure (see, e.g., Fig. 2).

We used well-established performance measures, namely sensitivity and specificity, for the evaluation. Sensitivity, also called true positive rate, measures the proportion of pathological cases which are correctly classified as such. Specificity, also called true negative rate, measures the proportion of healthy cases which are correctly classified as such by our CAD system. Additionally, we calculated the accuracy of our CAD system as an overall performance measure, since the number of healthy ROIs almost equals that of pathological ROIs.
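These three measures follow directly from the confusion counts, as the sketch below shows. The function name and the label encoding (1 for pathological, 0 for healthy) are assumptions, and both classes are assumed to be present.

```python
def performance(y_true, y_pred):
    """Sensitivity, specificity and accuracy; 1 = pathological, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }
```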

Results and discussion

CAD system

Figure 7 shows the sequence of operations performed in our CAD system. The first step (1) of our system is based upon a one-click approach where the radiologist selects the center of the relevant muscle area. To overcome imprecise ROI definitions and to increase the robustness of our approach, we additionally evaluate eight new ROIs which are shifted versions (along the \(x\)-, \(y\)-, and both diagonal axes) of the placed single rectangular ROI. Figure 8 outlines the shifting of the ROI. The amount of shifting depends on the chosen size of the single rectangular ROI and typically ranged from 1.5 to 4.0 mm. Because of the curved shapes of the biceps brachii in the ultrasound images, our shifted ROIs cover most of the relevant muscle tissue. For each of the ROIs, five first-order statistical and 130 wavelet-based features are extracted; the feature space is reduced by the trained SPCA and then classified by the trained linear SVM. Classifying each of the nine ROIs separately allows us to stabilize the final decision.
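The generation of the nine ROIs can be sketched as follows. The ROI representation `(x, y, width, height)` in pixels and the function name are hypothetical, and the shift is passed in as a parameter because the exact rule that derives it from the ROI size is not given in the text.

```python
def shifted_rois(roi, shift):
    """Return the original ROI followed by its eight shifted copies.

    roi is (x, y, width, height); shift is the displacement in pixels applied
    along the x-axis, the y-axis and both diagonals. The size is unchanged.
    """
    x, y, w, h = roi
    rois = [roi]
    for dy in (-shift, 0, shift):
        for dx in (-shift, 0, shift):
            if dx == 0 and dy == 0:
                continue  # the unshifted ROI is already included
            rois.append((x + dx, y + dy, w, h))
    return rois
```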

To estimate whether the classification is reliable, we map signed distances to the linear SVM boundary using the sigmoidal function \((1+\exp (-d))^{-1}\). This provides us with a class confidence score (CCS) allowing us to judge how likely it is that an unknown sample belongs to either of the classes. In our CAD system, we used the CCS to identify inconclusive ROIs for which a distinct class assignment was not feasible. If the CCS ranges from \(\frac{1}{3}\) to \(\frac{2}{3}\), the considered ROI is set as inconclusive.
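The mapping from signed distance to CCS and the inconclusive band can be sketched as below. The function names are hypothetical. Treating positive distances as the pathological side and including the band limits \(\frac{1}{3}\) and \(\frac{2}{3}\) in the inconclusive range are assumptions, since the text does not specify either.

```python
import math


def class_confidence_score(d):
    """Sigmoid mapping of the signed distance d to the SVM boundary."""
    return 1.0 / (1.0 + math.exp(-d))


def label_from_ccs(ccs, low=1.0 / 3.0, high=2.0 / 3.0):
    """Class label for one ROI, or 'inconclusive' inside the band."""
    if low <= ccs <= high:
        return "inconclusive"
    return "pathological" if ccs > high else "healthy"
```

A sample lying exactly on the decision boundary (\(d = 0\)) receives a CCS of 0.5 and is therefore marked as inconclusive.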

Fig. 7 Workflow of the proposed CAD system

Immediately after the classification of the nine ROIs, our CAD system evaluates the CCS for each of them, whereby the final decision depends on the majority of the non-discarded ROIs. We used this majority voting scheme to overcome imprecise ROI definitions and to increase the robustness of our approach. If the resulting decision of the voting scheme is conclusive, our CAD system outputs the class assignment for the defined ROI. If not, the radiologist is prompted to define a more exact polygonal ROI (2) in a second step. The feature extraction, feature reduction, and classification methods used for the polygonal ROI are identical to those of our one-click voting scheme. If the CCS of the polygonal ROI is still inconclusive after classification, no class assignment is produced by our CAD system, but the resulting CCS is displayed as guidance for the radiologist.
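The majority vote over the non-discarded ROIs might be sketched as follows. The function name is hypothetical, and the handling of edge cases is an assumption: the vote is treated as inconclusive when all ROIs are discarded or when the remaining votes are tied, which the text does not define explicitly.

```python
def vote(ccs_values, low=1.0 / 3.0, high=2.0 / 3.0):
    """Majority decision over the ROIs whose CCS lies outside [low, high]."""
    kept = [c for c in ccs_values if not (low <= c <= high)]
    pathological = sum(1 for c in kept if c > high)
    healthy = len(kept) - pathological
    if pathological == healthy:  # tie, or no conclusive ROI left
        return "inconclusive"
    return "pathological" if pathological > healthy else "healthy"
```

An inconclusive result of this function corresponds to the point in the workflow at which the radiologist is asked for a polygonal ROI.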

Fig. 8 ROI shifting scheme used by our CAD system. The arrows illustrate the eight directions in which the rectangular ROI was shifted, producing eight new shifted versions of the ROI with unchanged size

Experimental results

Results of our experiments are given in Table 3. We also included the results of our one-click voting scheme without the on-demand polygonal ROI definition (voting only) to show the impact of the voting scheme. Moreover, the number of inconclusive ROIs is indicated and all results are given for both radiologists, E1 and E2.

Table 3 Accuracy, sensitivity, specificity, and the number of inconclusive images (#Inconclusive) for single rectangular and polygonal ROIs, our voting scheme only, and our CAD system

As expected, accuracy, sensitivity, and specificity were worst using single rectangular ROIs (72–81 %) and best using polygonal ROIs (92–97 %). However, the user effort of polygonal ROI definition is substantially higher than that of single rectangular ROIs, which renders its application impractical in clinical routine. It can be seen that the accuracy, sensitivity, and specificity of the voting scheme (82–89 %) were higher than those of a single rectangular ROI, which is especially true for the second radiologist, E2. This confirms the need for an improved stability of the user input. Moreover, our voting scheme produced a number of inconclusive images, which could not be reliably classified. These images were further processed with our CAD system, using polygonal ROIs.

Our system finally achieved 85/87 % accuracy, 90 % sensitivity, and 83/85 % specificity, depending on the expert radiologist. Moreover, the number of inconclusive images could be reduced to zero for both expert radiologists. Since the achieved values for both expert radiologists are almost equal, it can be concluded that the influence of user input variability was successfully reduced by our CAD system, although the mouse click positions of both expert radiologists were quite different. Moreover, nearly all of the falsely classified images are the same for both radiologists.

Referring to the problems discussed in Sect. 1 (see Fig. 1), we were still able to correctly classify 17 of the 25 (68 %) muscle images that were visually hard to distinguish, e.g., the ultrasound images shown in Fig. 9c, d. However, the figure also shows exemplary images for which our classification approach fails. The images shown in Fig. 9a, b were falsely classified because they do not meet our previously defined assumptions: texture of healthy muscle tissue is more oriented and structured, whereas pathological muscle tissue appears to be diffuse and unstructured. In Fig. 9b, the scan of the pathological muscle tissue might be mistaken as healthy, because myositis was captured in an early stage or because there is no affected muscle tissue in this particular scan of the muscle, since myositis may often occur focally. In contrast, the healthy scan from Fig. 9a seems to be very diffuse, owing to the high amount of fat and connective tissue occurring in elderly people. To overcome this problem, we might include a patient-specific weight or fat content parameter into the classification process.

Fig. 9 Examples of incorrectly classified images of healthy (a) and pathological (b) muscle tissue in comparison with correctly classified images of healthy (c) and pathological (d) muscle tissue

Conclusion

We developed a computer-aided diagnosis (CAD) system for detecting neuromuscular diseases in ultrasound images, providing radiologists with an independent opinion and a tool that helps train young professionals. We investigated this using the challenging example of myositis detection in scans of the biceps brachii. In our CAD system, rectangular regions-of-interest (ROIs) are placed with a single click, triggering an automatic classifier that was learned on training samples. To deal with a potential misplacement of the ROI, e.g., when non-muscle tissue is included, we evaluated several spatially shifted ROIs and established a majority voting scheme to provide a stabilized decision. The decision of our CAD system is based on a class confidence score, classifying into healthy, pathological, and inconclusive cases. Regarding the latter, the radiologist is asked to specify a more detailed polygonal ROI using a small number (8–12) of clicks on which the actual decision is based. Thus, we combined the minimum effort of the one-click solution, which works for most cases, with a highly accurate polygonal ROI, when needed.

For automatic classification, our CAD system uses a combination of feature extraction, feature reduction, and classification. Within this work, we investigated three kinds of features for the classification task: first-order statistics, wavelet-based features, and Haralick’s features, all of which measure different important aspects of the texture of the muscle tissue, i.e., its contrast and structure, and have therefore been used in several other classification tasks as well [21, 24, 33]. In a preliminary feature selection step, sequential forward and backward feature selection were used, and it was shown that Haralick’s features contribute little, if at all, to the classification. Therefore, we proposed to use only first-order and wavelet-based features. Moreover, we showed that most of the information can be preserved with as few as five linearly combined feature space dimensions, using a supervised principal component analysis for feature reduction. The final five-dimensional feature space separates well between healthy and pathological classes, which was shown for three classifiers: Fisher’s linear classifier, a linear support vector machine, and a nonlinear \(k\)-nearest neighbor algorithm. The latter two performed equally well in our comprehensive experiments, while the former performed worse. We proposed to use a linear support vector machine due to its better generalizability (see Sect. 3.5). Finally, our approach achieves high sensitivity, specificity, and accuracy for single rectangular ROIs (72–81 %, radiologist-dependent) and even higher values for single polygonal ROIs (92–97 %, radiologist-dependent). We also investigated the one-click voting scheme that our CAD system uses. Here, the user is asked to specify the polygonal ROI only if the decision for the one-click voting scheme is inconclusive. As expected, the performance of our CAD system (83–90 %, radiologist-dependent) lies midway between those of the single rectangular and polygonal approaches, but requires substantially lower effort than the polygonal ROI definition. Further work will include the application of our CAD system to other muscles, other types of neuromuscular diseases, and other ultrasound devices. Moreover, an inclusion of patient-specific parameters, e.g., fat content and weight, into the classification process will be of interest.