Introduction

Periodic screening with mammography is considered effective in reducing the number of cancer deaths for women at normal risk [1–3]. When an abnormality is found, it is often evaluated further with additional examinations, such as spot compression mammograms, ultrasonography, and magnetic resonance imaging. Even in such a multimodality diagnostic framework, it may be important to fully assess the findings on mammograms and to compare them with the expected findings in the other imaging modalities. To assist radiologists in interpreting mammograms, presenting the computer-estimated likelihood of malignancy of lesions has been found potentially useful in observer performance studies [4–6]. However, without proper explanation or evidence, the utility of a numerical likelihood may be limited. To complement such numbers, the presentation of reference images has been suggested [7], and a number of research groups have investigated relevant image retrieval methods for the classification of masses and clustered microcalcifications on mammograms and breast ultrasound, nodules in chest radiographs and lung computed tomography (CT), and liver lesions in CT [8–21].

In some previously proposed image retrieval systems, reference images were selected from the suspected diagnosis or disease category of a query [8, 12, 13]. Other image retrieval methods were often based on the closeness of image features [10, 12, 15, 18]; in such systems, images are retrieved on the basis of the similarity of image contents. For both types of systems, the retrieved images should be visually or diagnostically similar to the query for the reference images to be useful. However, the visual similarity between a query image and the retrieved images is usually not evaluated, probably because of the lack of a gold standard. To select visually similar images, some investigators attempted to quantify the subjective similarity of images as judged by radiologists or physicists [14, 16, 17, 19, 22, 23] and to incorporate these judgments in the determination and evaluation of objective similarity measures. Others included a feedback procedure to improve the relevance of the retrieved images [18, 20].

In our previous studies, subjective similarity ratings for pairs of masses and pairs of clustered microcalcifications on mammograms were determined by groups of breast radiologists, and the average subjective similarity ratings were employed as the teacher in supervised learning [19, 24]. These studies showed that the similarity measures obtained by machine learning, called psychophysical similarity measures, agreed better with the subjective similarity than did distance-based similarity measures, as indicated by a higher correlation coefficient between the psychophysical similarity measures and the subjective ratings.

The usefulness of presenting similar images for the distinction between benign and malignant masses by radiologists was evaluated in an observer performance study [25]. It was found that although the presentation of similar images has potential utility, reference images may sometimes confuse radiologists when an unknown case is similar to both benign and malignant images. In our previous studies, only the malignancy or benignity of the reference images was provided. However, although typical malignant masses are known to have distinctive margin characteristics, such as spiculation, masses of some malignant pathologic types can have circumscribed margins. Therefore, information on pathologic types and a graphical relational map could enhance the utility of reference images.

In this study, we investigated a new method for providing a similarity map by use of multidimensional scaling (MDS) [26, 27]. MDS is a multivariate statistical technique that expresses the dissimilarity relationships of data in a lower-dimensional space. It can be a useful tool for visualizing information and for understanding the characteristics of data with respect to their similarity. However, to our knowledge, only a small number of studies [28–33] have explored this technique in the field of medical imaging. It has mainly been used as a visualization tool for understanding the functional connections between brain subregions [28], for finding possible outliers when drawing conclusions about a population mean [29], and for evaluating and comparing different segmentation methods [30]. It has also been used as a means of reducing large amounts of data to three dimensions, for a possible application in the interactive search for a brain atlas [31], and for the visualization of similarity information in color [32]. Wei et al. proposed the use of MDS as a perceptual evaluation tool to compare similarities obtained by radiologists with the results of retrieved images for microcalcifications on mammograms [33].

In this study, we employed MDS to understand the similarity relationships between breast masses of different pathologic types. The relational map between an unknown image and the reference images may serve as an intuitive guide that provides supplemental information to the reference images. Such a map may also be useful for teaching residents and medical students in training. In addition, we attempted to reconstruct the MDS map by modeling each dimension with a combination of image features. None of the above studies used MDS to estimate similarities for new data. With such a model, a new case can be mapped without subjective similarity data to visualize its predicted relationship with the reference data.

Materials and Methods

Image Database

Digital mammograms were obtained at the National Hospital Organization, Nagoya Medical Center, Nagoya, Japan. The institutional review board approved the research protocol. The images were acquired with three types of digital systems, i.e., a phase contrast mammography (PCM) system (Mermaid or Pureview, Konica Minolta Holdings, Inc.), a Fuji digital mammography system (Amulet, Fujifilm Corporation), or computed radiography systems (MAMMOMAT 3000, Siemens, with C-Plate, Konica, or Profect, Fujifilm). The pixel size of the original images was 25, 43.75, or 50 μm, and the pixel values were stored with 10, 12, or 14 bits, depending on the system used. For image processing, the pixel size was adjusted to 50 μm by linear interpolation, and the gray levels were downsampled to 10 bits.
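The two preprocessing steps can be sketched as follows. The text does not state the 2D interpolation kernel or how the bit depth was reduced, so this minimal sketch assumes bilinear interpolation (without any anti-aliasing filter before downsampling) and a right shift that discards the low-order bits; it is an illustration, not the study's implementation.

```python
# Sketch of resampling to a 50-um pixel size and reducing gray levels to 10 bits.
# Assumptions (not from the study): bilinear interpolation, no anti-aliasing,
# and bit-depth reduction by an integer right shift.

TARGET_PIXEL_UM = 50.0
TARGET_BITS = 10


def reduce_bit_depth(image, source_bits, target_bits=TARGET_BITS):
    """Drop the low-order bits so that integer values fit in `target_bits`."""
    shift = source_bits - target_bits
    if shift < 0:
        raise ValueError("source bit depth is lower than the target depth")
    return [[value >> shift for value in row] for row in image]


def resample_bilinear(image, source_pixel_um, target_pixel_um=TARGET_PIXEL_UM):
    """Resample a 2D list of pixel values to a new pixel size."""
    rows, cols = len(image), len(image[0])
    scale = source_pixel_um / target_pixel_um  # e.g. 25 um -> 50 um gives 0.5
    out_rows = max(1, round(rows * scale))
    out_cols = max(1, round(cols * scale))
    resampled = []
    for i in range(out_rows):
        # Location of output row i in input-pixel coordinates, clamped to the image.
        y = min(i / scale, rows - 1)
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        fy = y - y0
        out_row = []
        for j in range(out_cols):
            x = min(j / scale, cols - 1)
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            fx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            out_row.append(top * (1 - fy) + bottom * fy)
        resampled.append(out_row)
    return resampled
```

Because `reduce_bit_depth` works on integers, it would be applied to the stored values before resampling (or after rounding the interpolated values).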

On the basis of the radiologic and pathologic reports, square regions of interest (ROIs) including masses were extracted from craniocaudal and mediolateral oblique views by one of two radiologists if a lesion was visible on the mammograms and not partially cut off by the field of view. The size of the ROIs varied from 168 × 168 to 1,888 × 1,888 pixels. In this preliminary investigation, nine pathologic types with at least five lesions each were considered. The six malignant types were ductal carcinoma in situ (DCIS), invasive lobular carcinoma (ILC), mucinous carcinoma (MC), papillotubular carcinoma (PTC), scirrhous carcinoma (SC), and solid–tubular carcinoma (STC), and the three benign types were cyst, fibroadenoma (FA), and benign phyllodes tumor (PT). Note that PTC, SC, and STC are three subcategories of invasive ductal carcinoma (IDC); IDCs with unknown subcategories were not included. Figure 1 shows examples of masses of these subtypes. The mass database consisted of 322 ROIs of 201 lesions obtained from 186 individuals. The breakdown by imaging system was 21, 17, 39, and 23 % for PCM, C-Plate, Amulet, and Profect, respectively. Table 1 shows the numbers of lesions and ROIs for masses of each pathologic type. The malignant cases were confirmed by biopsy and/or surgery, and the benign cases were confirmed by biopsy or by follow-up with mammography and ultrasonography.

Fig. 1

Examples of mass images of nine pathologic types used in this study

Table 1 The number of lesions and ROIs for masses with different pathologic types

Subjective Similarity Ratings

For creating a similarity map by use of MDS, the distances (dissimilarities) between pairs of masses must be determined. Our goal is to provide a similarity map that is diagnostically reasonable and acceptable to radiologists so that it can be useful in diagnosis and training. Therefore, it was essential to obtain a “gold standard” of similarity (or dissimilarity) from a group of experienced radiologists.

From each of the nine groups, three ROIs with characteristics representative of the subtype were selected as the sample cases for obtaining the subjective similarities. No more than one ROI was selected from the same patient. The effective diameters of these 27 masses, based on the manual outlines of the masses determined by a co-author (CM), ranged from 8 to 34 mm with a mean of 15.5 mm.
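An effective diameter can be computed from a manual outline as sketched below. The exact feature definitions used in the study are given in [17]; this sketch assumes the common definition (the diameter of a circle whose area equals the outlined mass area), a polygonal outline in pixel coordinates, and the 50-μm pixel size after resampling.

```python
import math

PIXEL_MM = 0.05  # assumed 50-um pixels after resampling


def polygon_area(contour):
    """Area (pixels^2) enclosed by a closed outline of (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    for k in range(len(contour)):
        x0, y0 = contour[k]
        x1, y1 = contour[(k + 1) % len(contour)]  # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def effective_diameter_mm(contour, pixel_mm=PIXEL_MM):
    """Diameter of the circle whose area equals the outlined mass area."""
    area_mm2 = polygon_area(contour) * pixel_mm ** 2
    return 2.0 * math.sqrt(area_mm2 / math.pi)
```

For a circular outline of radius 100 pixels (5 mm), the function returns approximately 10 mm, as expected.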

For MDS analysis, the dissimilarities for all pairs must, in general, be determined. Therefore, subjective similarity ratings for all 351 possible pairs were obtained from eight experts certified in breast image reading by the Central Committee on Quality Control of Mammographic Screening in Japan. Their experience in reading mammograms ranged from 4 to 25 years, with a mean of 12 years. Each reader rated all 351 pairs on a continuous scale from 0.0 to 1.0, corresponding to extremely dissimilar and similar, respectively, based on the overall impression for diagnosis, including shape, density, and margin, and with consideration of the predicted pathologic types. The readers were asked not to weigh the size of the lesions, the surrounding normal tissue, or unrelated calcifications, if present. They were told that the masses included nine pathologic types; however, the number of lesions of each type was not revealed.
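For the MDS input, each pair's ratings are averaged over the readers and converted to a dissimilarity by subtracting the average from 1. A minimal sketch, assuming each reader's ratings are stored as a list in the same pair order (all pairs i < j, as produced by `itertools.combinations`); the storage layout is an assumption for illustration.

```python
import itertools


def average_ratings(ratings_by_reader):
    """Mean rating per pair across readers; every reader's list uses the same pair order."""
    n_readers = len(ratings_by_reader)
    return [sum(pair_ratings) / n_readers for pair_ratings in zip(*ratings_by_reader)]


def dissimilarity_matrix(avg_ratings, n_cases):
    """Symmetric n x n matrix: 1 - mean rating off the diagonal, 0 on the diagonal."""
    if len(avg_ratings) != n_cases * (n_cases - 1) // 2:
        raise ValueError("need one average rating per unordered pair")
    matrix = [[0.0] * n_cases for _ in range(n_cases)]
    pairs = itertools.combinations(range(n_cases), 2)
    for (i, j), rating in zip(pairs, avg_ratings):
        matrix[i][j] = matrix[j][i] = 1.0 - rating
    return matrix
```

With 27 masses, `dissimilarity_matrix` expects 351 averaged ratings, one per pair.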

During the reading session, a pair of ROIs was displayed with the rating scale on one monitor (17 in., 1,280 × 1,024 pixels, Eizo Nanao Co., Hakusan, Japan), and the corresponding entire-view mammograms, with squares marking the ROI positions, were displayed on the adjacent monitor (27 in., 2,560 × 1,440 pixels, Dell Inc., Round Rock, TX, USA). Figure 2 shows the observer interface for this reading session. The observers could adjust the contrast and brightness of the ROIs, if desired. Five training cases, including pairs of the same pathologic type and pairs of different types, were provided at the beginning of the session so that the observers could become familiar with the appearances of similar and dissimilar pairs. The order of the pairs was randomized for each observer.
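Enumerating the rated pairs and producing a per-observer random order can be sketched as follows. The study does not describe how the randomization was implemented; seeding one generator per observer is an assumption made here so that each observer's order is reproducible.

```python
import itertools
import random

N_CASES = 27  # nine pathologic types x three ROIs


def all_pairs(n_cases):
    """Every unordered pair (i, j) with i < j that must be rated for MDS."""
    return list(itertools.combinations(range(n_cases), 2))


def presentation_order(pairs, observer_seed):
    """A shuffled copy of the pairs; the per-observer seed is an assumption."""
    order = list(pairs)
    random.Random(observer_seed).shuffle(order)
    return order
```

`all_pairs(N_CASES)` yields the 351 pairs; each observer receives the same set in a different order.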

Fig. 2

An observer interface used for obtaining the subjective similarity ratings. The left monitor shows two ROIs with masses to be rated for their similarity. The right monitor shows the corresponding entire-view mammograms with red boxes specifying the ROIs

MDS Analysis

Kruskal’s nonmetric MDS was performed using existing source code in the R programming language. Given the dissimilarity matrix and the desired number of dimensions, k, the method iteratively fits the data to a k-dimensional configuration by minimizing the stress, S, a measure of fit for which smaller values indicate a better fit, which is given by

$$ S=\sqrt{\frac{\sum\left(d-\widehat{d}\right)^{2}}{\sum\widehat{d}^{2}}}, $$

where d and \( \widehat{d} \) are the input dissimilarities and the estimated dissimilarities (configuration distances), respectively, and the sums run over all pairs of masses. The input dissimilarity for each mass pair was determined simply by subtracting its average subjective similarity rating from 1. The outputs of MDS were the coordinates of each mass on the k axes and the stress. In this study, to keep the model simple and to avoid overfitting, only small numbers of dimensions (k = 2, 3, and 4) were tested.
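In Kruskal's nonmetric MDS, the input dissimilarities enter the stress only through their rank order: at each iteration, the configuration distances, sorted by input dissimilarity, are replaced by their least-squares monotone fit (the disparities), computed with the pool-adjacent-violators algorithm. The sketch shows only that step and the stress formula; it is not the R code used in the study, and the configuration-update step is omitted. Here the disparities play the role of d and the configuration distances the role of \( \widehat{d} \).

```python
import math


def pool_adjacent_violators(values):
    """Least-squares non-decreasing fit to `values` (equal weights)."""
    blocks = []  # each block: [sum of pooled values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Pool backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def disparities(dissimilarities, distances):
    """Monotone fit of configuration distances against the rank order of dissimilarities."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    fitted_in_order = pool_adjacent_violators([distances[k] for k in order])
    result = [0.0] * len(distances)
    for position, k in enumerate(order):
        result[k] = fitted_in_order[position]  # put each value back at its pair's index
    return result


def stress_percent(targets, distances):
    """S from the stress formula, in percent; `distances` play the role of d-hat."""
    numerator = sum((t - c) ** 2 for t, c in zip(targets, distances))
    denominator = sum(c ** 2 for c in distances)
    return 100.0 * math.sqrt(numerator / denominator)
```

In one iteration, `stress_percent(disparities(dissim, dist), dist)` would give the current fit in percent, the scale on which the final stresses in the Results are reported.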

The similarity map obtained by MDS visualizes which masses are considered similar and dissimilar by the group of experts. To reproduce the similarity map without gold standard data, each axis of the configuration was fitted by a linear regression model with image features. In this study, we determined 13 image features characterizing the shape, density, and margin of the masses. The shape features were the area, effective diameter, perimeter, circularity, irregularity, ellipticity, elliptical irregularity, and minor-to-major axis ratio of a fitted ellipse; the density features were the contrast and the standard deviation of pixel values; and the margin features were the average edge gradient, radial gradient index (RGI), and full width at half maximum (FWHM) of the radial gradient histogram. The definitions of these features are provided elsewhere [17]. The image features were determined from manual contours of the masses drawn by a co-author (CM), who was blinded to the pathology results but worked under the guidance of radiologists. Although automated segmentation of the masses is desirable, our segmentation method is not yet sufficiently accurate; this will be the subject of a future study.
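Fitting one MDS axis by ordinary least squares on the image features can be sketched as below. This is a minimal pure-Python illustration that assumes an intercept term and solves the normal equations by Gauss–Jordan elimination; the study does not state which software fitted the regressions. With strongly correlated features such as area, effective diameter, and perimeter, a QR-based solver (for example, R's `lm`) is numerically safer.

```python
def fit_linear_model(features, axis_values):
    """OLS with intercept: axis ~ b0 + sum_j b_j * feature_j.

    features: one feature vector per mass; axis_values: that mass's MDS coordinate.
    Solves (X^T X) b = X^T y by Gauss-Jordan elimination with partial pivoting.
    """
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, axis_values)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, p + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coefficients, feature_row):
    """Fitted axis value for one mass."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], feature_row))
```

One such model would be fitted per MDS axis (four for the 4D configuration), each with 14 coefficients for 13 features.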

The fitted models were evaluated by the mean absolute residuals, i.e., the mean absolute differences between the MDS configuration values and the values fitted by the regression models. To evaluate whether the fitted similarity maps effectively represent the relationships given by the subjective similarities, the correlation between the experts’ average subjective similarity ratings and the distances in the new feature space defined by the regression models was determined. This correlation coefficient was compared with that between the subjective ratings and a similarity measure based on distances in a conventional feature space.
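The evaluation quantities can be sketched as follows. Expressing the residual relative to the range of the axis values is our assumed reading of the percentages reported in the Results; the pair order of the distances must match the order of the averaged ratings.

```python
import itertools
import math


def mean_absolute_residual(observed, fitted):
    """Mean |MDS coordinate - regression fit| over the masses on one axis."""
    return sum(abs(o - f) for o, f in zip(observed, fitted)) / len(observed)


def relative_to_range(observed, fitted):
    """Residual divided by the spread of the axis values (assumed normalization)."""
    return mean_absolute_residual(observed, fitted) / (max(observed) - min(observed))


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_distances(coords):
    """Euclidean distance for every pair i < j, in itertools.combinations order."""
    return [math.dist(coords[i], coords[j])
            for i, j in itertools.combinations(range(len(coords)), 2)]
```

`pearson_r(avg_ratings, pairwise_distances(coords))` is expected to be negative, since highly similar pairs should lie close together.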

Results

A detailed analysis of the subjective similarity ratings is provided elsewhere [34]. Briefly, the average inter-reader agreement on the subjective similarity ratings, in terms of Pearson’s correlation coefficient, was 0.58 (range, 0.43–0.71). The experts generally gave higher ratings to pairs of the same pathologic type than to pairs of different types. However, some masses of different pathologic types were also considered similar, such as cysts and FAs; SCs and ILCs; and DCISs, PTCs, and STCs.

By use of MDS, the similarity (dissimilarity) data were fitted to two-, three-, and four-dimensional configurations. The final stresses were 11.84, 9.18, and 7.16 %, respectively. Figure 3a shows the MDS similarity map of the 27 masses on the first and second axes of the 4D configuration. Cysts and FAs, and also ILCs and SCs, are clustered closely together. DCISs, PTCs, and STCs were placed near one another, although with a wider spread. PTs are located somewhat apart from the other types. One MC is close to the malignant masses, whereas the other MCs are placed very close to the benign ones, which indicates that MCs can be similar to both benign and malignant masses. These observations agree with the average subjective ratings. Similar trends were observed with the 2D and 3D configurations; however, the separations between the pathologic groups were smaller on the third and fourth axes of the 3D and 4D configurations.

Fig. 3

Similarity map for the 27 masses. a MDS similarity map in the first and second axes of the 4D configuration, and b reconstructed map by fitting the linear models with 13 image features in the first and second axes corresponding to a

Each axis of the 4D MDS configuration was fitted by a linear regression model with all 13 features. The mean absolute fitting errors for the first to fourth axes were 0.07, 0.08, 0.06, and 0.06, respectively. Relative to the ranges of the coordinate values, these errors of 6–11 % can be considered relatively small. Figure 3b shows the similarity map reconstructed by the regression models, which corresponds to the MDS map in Fig. 3a. Although some differences between the two maps can be observed, the key features, such as the three groupings of cysts and FAs; ILCs and SCs; and DCISs, PTCs, and STCs, are consistent. On the basis of the coefficients of the regression models, the contributions of some features, such as the area, effective diameter, perimeter, irregularity, elliptical irregularity, and full width at half maximum (FWHM), were generally larger than those of the other features on all four axes. Nevertheless, the correlations between the reconstructed axes were small, indicating that the reconstructed feature components are largely independent.

Using the reconstructed feature map, the dissimilarity between each pair of masses was determined as the distance between them. The relationship between these dissimilarities and the average subjective similarity ratings for the 351 pairs is shown in Fig. 4a. Pearson’s correlation coefficient was −0.87. This strong correlation indicates that the reconstructed map represents the similarity relationships between the masses relatively well and that the new features, defined as linear combinations of the original features, may be useful for determining objective similarity measures.
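Because each reconstructed axis is a linear function of the image features, a new case can be placed on the map without subjective ratings. A minimal sketch, assuming per-axis coefficient lists of the form [intercept, b1, ..., bm] and Euclidean distance for ranking; the function names and numbers are hypothetical illustrations, not part of the study.

```python
import math


def project(feature_row, axis_models):
    """Coordinates of one mass in the reconstructed space (one model per axis)."""
    return [m[0] + sum(b * x for b, x in zip(m[1:], feature_row)) for m in axis_models]


def nearest_references(new_features, reference_coords, axis_models, k=3):
    """The k reference masses closest to a new case, as (index, distance) pairs."""
    query = project(new_features, axis_models)
    ranked = sorted((math.dist(query, c), idx) for idx, c in enumerate(reference_coords))
    return [(idx, dist) for dist, idx in ranked[:k]]
```

With two hypothetical identity models, a case with features (0.9, 1.2) is mapped to (0.9, 1.2) and ranked closest to a reference at (1, 1).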

Fig. 4

Relationship between the average subjective similarity rating and dissimilarity measures based on the distance in the a reconstructed feature space and b conventional feature space

For comparison, a dissimilarity measure based on the distance between a pair of masses in a conventional feature space was determined. Several combinations of features were tested, and a combination of six features provided the similarity measure with the highest correlation to the subjective ratings. The six features were the perimeter, ellipticity, elliptical irregularity, contrast, RGI, and FWHM. The relationship between the average subjective ratings and the dissimilarities based on these six features is shown in Fig. 4b. The correlation coefficient was −0.65, much weaker than that for the dissimilarities in the reconstructed space.
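A conventional feature-space dissimilarity can be sketched as the Euclidean distance between feature vectors. The text does not say how the features were normalized; this sketch assumes z-score standardization so that features with large numeric ranges (such as perimeter) do not dominate, and the dictionary keys are hypothetical labels for the six features.

```python
import itertools
import math
import statistics

# Hypothetical keys for the six selected features.
SELECTED = ["perimeter", "ellipticity", "elliptical_irregularity", "contrast", "rgi", "fwhm"]


def standardize(table, names):
    """z-score each named feature across all masses (assumed normalization)."""
    params = {}
    for name in names:
        values = [row[name] for row in table]
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        if sd == 0:
            raise ValueError(f"feature {name!r} is constant across masses")
        params[name] = (mean, sd)
    return [[(row[name] - params[name][0]) / params[name][1] for name in names]
            for row in table]


def conventional_dissimilarities(table, names=SELECTED):
    """Euclidean distance in the standardized feature space for every pair i < j."""
    z = standardize(table, names)
    return [math.dist(z[i], z[j]) for i, j in itertools.combinations(range(len(z)), 2)]
```

Standardization makes the distances invariant to the units of each feature, e.g. rescaling one feature by 1,000 leaves the dissimilarities unchanged.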

Discussion

In this study, a new method for constructing a similarity map for breast masses by use of MDS was investigated. Using the subjective similarity ratings determined by the experts, the similarity map can visualize the relationships between masses of different pathologic types. The MDS map consistently reflected the experts’ judgments that masses of some pathologic types, such as cysts and FAs; ILCs and SCs; and DCISs, PTCs, and STCs, were similar and that one MC was similar to the malignant masses while the other MCs were similar to benign masses. In this study, only a small number of cases of each pathologic type was used because of the limited number of available cases and the time required for the experts to rate similarity. However, if a larger number of cases can be used, MDS maps may help in understanding the characteristics of masses of different pathologic types in terms of their similarity relationships. Such a map may be useful as teaching material for residents and medical students.

On the basis of the MDS map, the similarity map was reconstructed by linear models with 13 image features. For the 27 masses used in this study, the similarity map could be reproduced with relatively small errors. When the dissimilarity measures were determined by the distances in the reconstructed map, a strong correlation between the average subjective ratings and the dissimilarity measures was obtained. Such a dissimilarity (or similarity) measure can be useful in the selection of similar reference images as a diagnostic aid. Previous methods for determining image similarity were often based on the distance in a feature space. When the dissimilarity measures for the 351 pairs were determined by the distance in the conventional feature space, the correlation between the subjective similarity ratings and the dissimilarity measures was only moderate. In addition, the correlation coefficient (in absolute value) for the distances in the selected six-feature space (−0.65) was only slightly higher than that for the difference in a single feature (−0.63). This was because some of the selected features were highly correlated. Therefore, the reconstruction of the MDS map by linear models may have the advantages of reducing the conventional feature dimensions and creating new, largely independent features that better represent lesion similarity.

One of the limitations of this study, as mentioned above, was that only a small number of cases of each pathologic type was used in the experiment. Because we collected new digital cases for this study, and because of disease prevalence, only small numbers of cases were available for some pathologic types. In addition, for MDS analysis with 27 cases, 351 subjective similarity ratings had to be obtained from each expert. If we were to include five ROIs per pathologic type, the number of pairs would almost triple. Therefore, only a small number of cases was included in this preliminary investigation. However, three ROIs per type may not fully represent a pathologic type. For a better understanding of the similarity relationships among different types of masses and for creating a useful similarity map, a larger number of cases should be included in the future. The method should also be tested with new cases that were not used in the construction of the MDS map.
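The near tripling follows from the number of unordered pairs, n(n − 1)/2, which grows quadratically with the number of cases; with nine types, three ROIs each gives n = 27, and five ROIs each gives n = 45:

$$ \binom{27}{2}=\frac{27\times 26}{2}=351, \qquad \binom{45}{2}=\frac{45\times 44}{2}=990, \qquad \frac{990}{351}\approx 2.82. $$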

In this study, the images were obtained with three different types of digital systems, with four kinds of detectors and three different pixel sizes. Because of the small number of study cases, a reliable analysis of the effect of the imaging systems could not be performed. However, the effect on the observers’ subjective ratings is expected to be small, because all the observers were familiar with images obtained with all of these devices. The image characteristics may differ among the systems, which could affect the image feature values. However, the feature values for the 322 images obtained with the four devices largely overlapped; therefore, this effect is expected to be small.

Conclusion

A new method for visually presenting the similarity relationships between masses of different pathologic types by MDS was investigated. The MDS similarity map and the map reconstructed with linear combinations of the features are potentially useful in diagnosis and training. The reconstructed feature space can be useful for determining similarity measures for selecting similar reference images.