The basic ideas of MDS are easily explained using a small example. Consider Table 1.1. It contains correlations for the frequencies of different crimes in 50 U.S. states. These correlations show, for example, that if there are many cases of assault in a state, then there are also many cases of murder (\(r = 0.81\)). In contrast, the murder rate is not correlated with the rate of larceny (\(r = 0.06\)).

We now scale these correlations via MDS. This means that we try to represent the seven crimes by seven points in a geometric space so that two points lie closer together the greater the correlation between the two crimes that they represent. For this we proceed as follows.

Fig. 1.1 Starting configuration for an MDS of the data in Table 1.1

Fig. 1.2 Measuring distances with a ruler

We take seven cards, and write the name of one crime on each of them, from Murder to Auto Theft. These cards are placed on a table in an arbitrary arrangement as shown in Fig. 1.1. We then measure the distances among all cards (Fig. 1.2) and compare these values with the correlations in Table 1.1. This comparison makes clear that the configuration of cards in Fig. 1.1 does not represent the data in the desired sense. For example, the cards Murder and Assault should be relatively close together, because these crimes are correlated with 0.81, whereas the cards Murder and Larceny should be farther apart, as these crimes are correlated with only 0.06. We therefore try to move the cards repeatedly in small steps (“iteratively”) so that the distances correspond more closely to the data. Figure 1.3 demonstrates in which directions the cards should be shifted, by some small amounts, to improve the correspondence of data and distances.
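The comparison of ruler distances with correlations can be sketched in a few lines of Python. This is only an illustration: the card coordinates are hypothetical (they are not the layout of Fig. 1.1), and only the two correlations quoted above are taken from the text.

```python
import math

# Hypothetical card positions on the table, in centimeters.
# These are illustrative assumptions, not the layout of Fig. 1.1.
cards = {
    "Murder": (0.0, 0.0),
    "Assault": (30.0, 10.0),
    "Larceny": (5.0, 8.0),
}

# The two correlations quoted in the text.
correlations = {
    ("Murder", "Assault"): 0.81,
    ("Murder", "Larceny"): 0.06,
}


def card_distance(positions, a, b):
    """Euclidean ("ruler") distance between two cards."""
    (xa, ya), (xb, yb) = positions[a], positions[b]
    return math.hypot(xa - xb, ya - yb)


def misrepresented_pairs(positions, correlations):
    """List pairs (p, q) where p is more highly correlated than q,
    yet the cards of p are not closer together than the cards of q."""
    bad = []
    for p in correlations:
        for q in correlations:
            if correlations[p] > correlations[q]:
                if card_distance(positions, *p) >= card_distance(positions, *q):
                    bad.append((p, q))
    return bad


# In the arbitrary layout, Murder and Assault are too far apart.
print(misrepresented_pairs(cards, correlations))

# Moving the Assault card toward Murder removes the violation.
moved = dict(cards, Assault=(3.0, 1.0))
print(misrepresented_pairs(moved, correlations))
```

With all seven cards and all 21 correlations, the same check would flag every pair of cards whose distance contradicts the order of the correlations.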

Since iterative modifications of a given configuration by hand can be fairly tedious and since they do not guarantee that an optimal configuration is found in the end, we did not continue these iterations by hand but used an MDS computer program instead. It reports the solution shown in Fig. 1.4.
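To give an idea of what such a program does, here is a minimal sketch of an iterative MDS in plain Python. It rests on several assumptions that are not from the text: the similarity values are hypothetical (they are not the entries of Table 1.1), similarities are converted into target distances by the simple rule `1 - s`, and the points are moved by the Guttman transform used in majorization (SMACOF-type) algorithms. It is one standard way to do this, not necessarily what any particular program does internally.

```python
import math


def pairwise_distances(X):
    """Euclidean distances among all points of a 2-dimensional configuration."""
    n = len(X)
    return [[math.hypot(X[i][0] - X[j][0], X[i][1] - X[j][1]) for j in range(n)]
            for i in range(n)]


def raw_stress(delta, X):
    """Sum of squared differences between target and actual distances."""
    D = pairwise_distances(X)
    n = len(X)
    return sum((delta[i][j] - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(delta, X):
    """One majorization update: all points are moved at once in such a
    way that raw stress cannot increase."""
    n = len(X)
    D = pairwise_distances(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and D[i][j] > 0.0:
                B[i][j] = -delta[i][j] / D[i][j]
        B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
    return [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(2)]
            for i in range(n)]


# Hypothetical similarities among four objects (NOT the values of Table 1.1).
similarities = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
# Assumed conversion: a high similarity becomes a small target distance.
delta = [[1.0 - s for s in row] for row in similarities]

# Arbitrary starting configuration, like cards placed on a table.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.5]]

stress_history = [raw_stress(delta, X)]
for _ in range(50):
    X = guttman_step(delta, X)
    stress_history.append(raw_stress(delta, X))

print(f"stress before: {stress_history[0]:.3f}, after: {stress_history[-1]:.3f}")
```

Unlike moving cards by hand, each update here is guaranteed not to worsen the fit, although the final configuration may still be only a local optimum.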

Table 1.1 Correlations of crime rates over 50 U.S. states

Fig. 1.3 Directions for point movements to improve the MDS configuration

Fig. 1.4 MDS representation of the correlations in Table 1.1 after several iterations

Fig. 1.5 GUI for the MDS module of Systat

One such MDS program is Proxscal, a module of Spss. To use Proxscal, we first save the correlation matrix of Table 1.1 in a file that we call ‘CorrCrimes.sav’. Then we need only a few clicks in Proxscal’s menus or, alternatively, the following commands:

figure a1

The "PROXIMITIES" sub-command informs the program that the data—called proximities in this context, a generic term for both similarity and dissimilarity data—must be interpreted as similarities. That is, small data values should be mapped into large distances, and large data values into small distances. No further specifications are needed. The program uses its default specifications to generate an MDS solution. We will show later how these specifications can be changed by the user if desired.

Many other programs exist for MDS. One example with nice graphics is the MDS module in Systat. Systat can be run using commands, or by clicking on various options in a graphical user interface. After loading the data file with the correlations and calling the MDS procedure, we get the menu in Fig. 1.5. In this menu, we select the variables ‘Murder’, ‘Rape’, etc. and leave all other specifications as they are, except the one for “Regression” (marked by the arrow on the left-hand side), where we request that the MDS program optimize the relation of data to distances in the sense of a least-squares linear regression. (The default is ordinal regression, which is discussed later; see p. 37f.)
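What a least-squares linear regression of distances on data amounts to can be illustrated with a short Python sketch. The similarity and distance values below are hypothetical, and the function name `fit_line` is our own; the sketch shows only the regression step, not how any particular program implements it.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical similarities and the distances of the matching point pairs.
similarities = [0.9, 0.7, 0.5, 0.3, 0.1]
distances = [0.4, 1.1, 1.9, 2.4, 3.2]

intercept, slope = fit_line(similarities, distances)

# Distances predicted from the data under the linear model. The smaller
# the residual sum of squares, the better the configuration fits.
predicted = [intercept + slope * s for s in similarities]
residual_ss = sum((d - p) ** 2 for d, p in zip(distances, predicted))
print(f"slope = {slope:.2f}, residual sum of squares = {residual_ss:.3f}")
```

For similarity data the slope should be negative, because larger data values are to be represented by smaller distances.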

Both computer programs, Proxscal in Spss and the MDS module of Systat, generate essentially the same MDS solution for the correlations in Table 1.1. This solution is not only optimal, but also quite good, as Fig. 1.6 shows: The relation of data and distances is almost perfectly linear (\(r = -\,0.99\)). Hence, the distances among the points of Fig. 1.4 contain the same information as the correlations of Table 1.1. Expressed differently, the data are properly visualized so that one can interpret the distances as empirical evidence: The closer two points in the MDS plane, the higher the correlation of the variables they represent.
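A fit index of this kind is simply the product-moment correlation between the data and the corresponding distances. The following Python sketch computes it for hypothetical values (not those of Table 1.1 or Fig. 1.4); the function name `pearson_r` is our own.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical correlations (data) and MDS distances for four pairs of points.
data = [0.8, 0.6, 0.4, 0.2]
mds_distances = [0.5, 1.0, 1.6, 2.0]

r = pearson_r(data, mds_distances)
print(f"r = {r:.2f}")
```

The sign is negative because high correlations are represented by small distances; a value close to \(-1\) indicates that the distances reproduce the data almost perfectly in the linear sense.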

What has been gained by analyzing the crime data via MDS? First, instead of 21 different numerical indexes (i.e., the \(7 \cdot 6/2 = 21\) correlations among the seven crimes), we get a simple visual representation of the empirical interrelations. This allows us to actually see and, therefore, more easily explore the structure of these data. As shown in Fig. 1.7, the various crimes form certain neighborhoods in the MDS plane: Crimes where persons come to harm are one such neighborhood, and property crimes form another neighborhood. This visualizes, for example, that if the murder rate is high in a state, then assault and rape also tend to be relatively frequent. The same applies to property crimes. Robbery lies between these neighborhoods, possibly because robberies not only damage the victims’ property but can also harm their bodies.

This interpretation builds primarily on the first principal axis. This axis corresponds to the horizontal direction of the graph. (Most computer programs for MDS automatically rotate their graphs so that the coordinate axes of MDS plots correspond to principal axes.) The second principal axis is difficult to interpret in this example. On this axis, Larceny and Robbery are farthest apart. Hence, these two crimes might lead us to a meaningful interpretation of the second dimension. Yet, no compelling interpretation seems to offer itself for this dimension: It may simply represent a portion of the “error” component of the data. So, one can ask whether it may suffice to represent the given data in a 1-dimensional MDS space. This is easy to answer: One simply sets “Dimension\(=\)1” in the GUI in Fig. 1.5 and then repeats the MDS analysis, leaving all other specifications as before, to get the desired solution.

Fig. 1.6 Relation of data in Table 1.1 and distances in Fig. 1.4

Fig. 1.7 MDS solution with two interpretations: neighborhoods and dimensions

Fig. 1.8 A 1-dimensional MDS solution for the crime data

Fig. 1.9 Relation of data in Table 1.1 and distances in Fig. 1.8

Figure 1.8 shows the 1-dimensional solution. It closely reproduces the first principal axis of Fig. 1.4. However, its distances correlate with the data at only \(r = -\,0.88\), i.e., this MDS solution does not represent the data that well. This is also evident from the regression graph in Fig. 1.9, which has a much larger scatter than the graph for the 2-dimensional MDS solution in Fig. 1.6. One should therefore be cautious when interpreting this configuration, because it is partly misleading. For example, Larceny and Auto Theft are much less strongly correlated (\(r = 0.55\)) than Larceny and Burglary (\(r = 0.80\)), but the configuration in Fig. 1.8 does not represent this difference correctly. Rather, the respective two distances are about equal in size.

1.1 Summary

Multidimensional scaling (MDS) represents proximity data (i.e., measures of similarity, closeness, relatedness, etc.) as distances among points in a multidimensional (typically 2-dimensional) space. The scaling begins with some starting configuration. Its points are then moved iteratively so that the fit between distances and data improves until no further improvement seems possible. Computer programs (such as Systat or Proxscal) exist for that purpose. The more precisely the data correspond to the distances in the MDS space, the better the MDS point configuration represents the structure of the proximities. If the fit of the MDS solution is good, it can be inspected visually in an attempt to interpret it in terms of content. A popular approach for doing this is to look for dimensions, mostly principal axes, that make sense in terms of what is known or assumed about the objects represented by the points.