Modern MDS is mainly used for general data analysis, especially for visualizing data. This was not always so. Historically, MDS served a different purpose: It was a psychological model of how persons form judgments about the similarity of objects. In many modern MDS applications, traces of this original model can still be found (e.g., in the way MDS solutions are interpreted or in the terminology used in MDS), even if the scaling method is used as a mere statistical tool. In the following, we begin by discussing a recent application that uses MDS as a visualization tool. Then, we consider typical examples of the early days of MDS.

2.1 MDS for Visualizing Proximity Data

In recent years, MDS has been used predominantly as a tool for analyzing proximity data of all kinds (e.g., correlations, similarity ratings, co-occurrence data). Most of all, MDS serves to visualize such data, making them accessible to the eye of the researcher. Let us consider a typical visualization application of MDS. Figure 2.1 shows a case from industrial psychology. Its 27 points represent 25 items and two indexes from an employee survey in an international IT company (Liu et al. 2004). Two examples of the items are: “All in all, I am satisfied with my pay”, and “I like my work”, both employing a Likert-type response scale ranging from “fully agree” to “fully disagree”. The two indexes are scale values that summarize the employees’ responses to a number of items that focus on their affective commitment to the company and on their general job satisfaction, respectively. The distance between two points in Fig. 2.1 represents (quite precisely) the correlation of the respective variables. As all variables are non-negatively intercorrelated, it is particularly easy to interpret this MDS configuration: The closer two points, the higher the correlation of the variables they represent. Hence, one notes, for example, that since “satisfied with pay” and “satisfied with benefits” are close neighbors in the MDS plane (see lower left-hand corner of the plot), employees rated these issues similarly: Those who were satisfied with one job aspect were also satisfied with the other, and vice versa. In contrast, being satisfied with pay is far from “encouraged to voice new ideas” (see top of the plot), and, hence, these two items are essentially uncorrelated.

Fig. 2.1

MDS representation of the intercorrelations of 25 items and 2 indexes of an employee survey in an international IT company. The grayed area around organizational commitment contains likely drivers of commitment

The value of this MDS configuration is based on the notion that a picture is worth more than 1,000 words or numbers. Indeed, most researchers and practitioners find it much easier to study such a plot than to study a \(27 \times 27\) correlation matrix with its 351 coefficients. It is almost impossible to understand the structure of the data in such large arrays of numbers, while their graphical display in an MDS plane can be explored with considerably less effort.

The fact that 351 correlations can be represented by the distances of 27 points that lie in a merely 2-dimensional space makes clear, moreover, that the data are highly structured. Random data would require much higher-dimensional spaces. Hence, the persons who answered this employee survey must have generated their answers from a consistent system of attitudes and opinions, and not by generating evasive random ratings, because such ratings would not be so orderly interlocked.

The ratings also make sense psychologically, because items of similar content are grouped in close neighborhoods in the MDS space. For example, the various items related to management (e.g., trust management, trust management board, support strategy) form such a neighborhood of items that received similar ratings in the survey.

One also notes that the one point that represents general job satisfaction lies somewhere in the central region of the point configuration. This central position reflects the fact that general job satisfaction is positively correlated with each of the 25 items of this survey. Items located more at the border of the MDS plot are substantially and positively correlated with the items in their neighborhood, but not with items opposite of them in the configuration. With them, they are essentially uncorrelated.

The plot leads to many more insights. One notes, for example, that the employees tend to be the more satisfied with their job overall the more they like their work and the more they are satisfied with their opportunities for advancement. Satisfaction with working conditions, in contrast, is a relatively poor predictor of general job satisfaction in this company.

Because the company suffered from high turnover of its employees, the variable ‘organizational commitment’ was of particular interest in this survey. Management wanted to know what could be done to reduce turnover. The MDS configuration can be explored for answers to this question. One begins by studying the neighborhood of the point representing ‘organizational commitment’ (see the grayed area around the commitment point in Fig. 2.1), looking for items that offer themselves for action. That is, one attempts to find points close to commitment that have low scores and where actions that would improve these scores appear possible. Expressed in terms of the MDS configuration, this can be understood as grabbing such a point and then pulling it upwards so that the whole plane is lifted like a rubber sheet, first of all in the neighborhood of commitment. Managers understand this notion and, if guided properly, they are able to identify and discuss likely “drivers” of the variable of interest efficiently and effectively. In the given configuration, one notes, for example, that the employees’ commitment is strongly correlated with how they feel about their opportunities for advancement (42 % are satisfied with them, see Borg 2008, p. 311f.); with how much they like the work they do (69 % like it); with how satisfied they are with the company overall (88 % satisfied); and, most of all, with how positive they feel about “performance pays” (only 36 % positive). Thus, if one interprets this network of correlations causally, with the variables in the neighborhood of commitment as potential “drivers” of commitment, it appears that the employees’ commitment can be enhanced most by improving the employees’ opinions about the performance-dependency of their pay and about their advancement opportunities. Improving other variables such as, for example, the employees’ attitudes towards management, is not likely to impact organizational commitment that much.

In this example, MDS serves to visualize the intercorrelations of the items. This makes it possible for the user to see, explore, and discuss the whole structure of the data. This can be useful even if the number of items is relatively large, because each additional item adds just one new point to an MDS plot, while it adds as many new coefficients to a correlation matrix as there are variables.
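The counting argument behind this comparison can be made concrete with a few lines of Python. This is only an illustrative sketch; the function name `n_coefficients` is introduced here for the example and is not part of any MDS program.

```python
def n_coefficients(n_vars: int) -> int:
    """Number of distinct off-diagonal coefficients in a symmetric n x n matrix."""
    return n_vars * (n_vars - 1) // 2

n = 27  # 25 items plus 2 indexes, as in Fig. 2.1
print(n_coefficients(n))                          # 351 correlations for 27 variables
print(n_coefficients(n + 1) - n_coefficients(n))  # 27 new correlations for one more item
```

One more item thus adds a single point to the plot but 27 new coefficients to the matrix.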

2.2 MDS for Uncovering Latent Dimensions of Judgment

One of the most fundamental issues of psychology is how subjective impressions of similarity come about. Why does Julia look like Mike’s daughter? How come that a Porsche appears to be more similar to a Ferrari than to a Cadillac? To explain such judgments or perceptions, distance models offer themselves as natural candidates. In such models, the various objects are first conceived as points in a psychological space that is spanned by the subjective attributes of the objects. The distances among the points then serve to generate overall impressions of greater or smaller similarity. Yet, the problem with such models is that one hardly ever knows what attributes a person assigns to the objects under consideration. This is where MDS comes in: With its help, one attempts to infer these attributes from given global similarity judgments.

Let us consider an example that is typical for early MDS applications. Wish (1971) wanted to know the attributes that people use when judging the similarity of different countries. He conducted an experiment where 18 students were asked to rate each pair of 12 different countries on their overall similarity. For these ratings, an answer scale from “extremely dissimilar” (coded as ‘1’) to “extremely similar” (coded as ‘9’) was offered to the respondents. No explanation was given on what was meant by “similar”: “There were no instructions concerning the characteristics on which these similarity judgments were to be made; this was information to discover rather than to impose” (Kruskal and Wish 1978, p. 30). The observed similarity ratings, averaged over the 18 respondents, are exhibited in Table 2.1.

Table 2.1 Mean similarity ratings for 12 countries (Wish 1971)

An MDS analysis of these data with one of the major MDS programs, using the usual default parameters (see Footnote 1), delivers the solution shown in Fig. 2.2. Older MDS programs generate only the Cartesian coordinates of the points (as shown in Table 2.2 in columns “Dim. 1” and “Dim. 2”, respectively, together called coordinate matrix, denoted as \(\mathbf X \) in this book). Modern programs also yield graphical output as in Fig. 2.2. The plot shows, for example, that the countries Jugoslavia and USSR are represented by points that are close together. In Table 2.1 we find that the similarity rating for these two countries is relatively high (\(=\)6.67, the largest value). So, this relation is properly represented in the MDS plane. We note further that the points representing Brazil and China are far from each other, and that their similarity rating is small (\(=\)2.39). Thus, this relation is also properly represented in the MDS solution. Checking more of these correspondences suggests that the MDS solution is a proper representation of the similarity data.

Fig. 2.2

MDS representation of similarity ratings in Table 2.1

If we want to assume that the similarity ratings were indeed generated by a distance model, and if we are willing to accept that the given MDS plane exhibits the essential structure of the similarity data, we can proceed to interpret this psychological map. That is, we now ask what psychologically meaningful “dimensions” span this space. Formally, the map is spanned by what the computer program delivers in terms of “Dimension 1” and “Dimension 2”. These dimensions are the principal axes of the point configuration. However, one can also rotate these dimensions in any way one wants (holding the configuration of points fixed), because any other system of two coordinate axes also spans the plane. Hence, one has to look for a coordinate system that is most plausible in psychological terms. Wish (1971) suggests that rotating the coordinate system in Fig. 2.2 by 45\(^ \circ \) leads to dimensions that correspond most to psychologically meaningful scales. On the diagonal from the North–West to the South–East corner of Fig. 2.2, countries like Congo, Brazil, and India are on one end, while countries like Japan, USA, and USSR are on the other end. On the basis of what he knows about these countries, and assuming that the respondents use a similar knowledge base, Wish interprets this opposition as “underdeveloped versus developed”. The second dimension, the North–East to South–West diagonal, is interpreted as “pro-Western versus pro-Communist”.
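The claim that the axes may be rotated freely rests on the fact that a rotation leaves all Euclidean distances among the points unchanged. The following sketch checks this numerically. The coordinates are hypothetical and are not the values of Table 2.2; rotating the points by 45 degrees is used here as a stand-in for rotating the axes, which is equivalent as far as the distances are concerned.

```python
import math

def rotate(points, degrees):
    """Rotate 2-D points counter-clockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical coordinates, NOT the values of Table 2.2.
X = [(1.0, 0.5), (-0.8, 0.9), (0.2, -1.1)]
X_rot = rotate(X, 45)

# Print each pairwise distance before and after the rotation.
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        before = euclidean(X[i], X[j])
        after = euclidean(X_rot[i], X_rot[j])
        print(i, j, round(before, 6), round(after, 6))
```

The two printed distances agree for every pair, so any rotated coordinate system represents the data equally well and the choice among them can be made on substantive grounds.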

Table 2.2 Coordinates \(\mathbf X \) of the points in Fig. 2.2; Economic development and number of inhabitants show further measurements on these countries in 1971

These interpretations are meant as hypotheses about the attributes that the respondents (not the researcher!) use when they generate their similarity judgments. That is, the respondents are assumed to look at each pair of countries, compute their differences in terms of Underdeveloped/Developed and Pro-Western/Pro-Communist, respectively, and then derive an overall distance from these two intra-dimensional distances. Whether this explanation is indeed valid cannot be checked any further with the given data. MDS only suggests that this is a model that is compatible with the observations.

2.3 Distance Formulas as Models of Judgment

The above study on the subjective similarity of countries does not explain in detail how an overall similarity judgment is generated based on the information in the psychological space. A natural model that explicates how this can be done is a distance formula based on the coordinates of the points. We will discuss this in the context of an example.

Distances (also called metrics) are functions that assign a real value to each pair of elements of a set. That is, they map all pairs of objects \((i,j)\) of a set of objects (here often “points”) onto real values. Distance functions, in the following denoted as \(d_{ij}\), have the following properties:

  1. \(d_{ii}=d_{jj}= 0 \le d_{ij}\) (Distances have nonnegative values; only the self-distance is equal to zero.)

  2. \(d_{ij}=d_{ji}\) (Symmetry: The distance from \(i\) to \(j\) is the same as the distance from \(j\) to \(i\).)

  3. \(d_{ik} \le d_{ij} + d_{jk}\) (Triangle inequality: The distance from \(i\) to \(k\) via \(j\) is at least as large as the direct “path” from \(i\) to \(k\).)

One can check if given values for pairs of objects (such as the data in Table 2.1) satisfy these properties. If they do, they are distances; if they do not, they are not distances (even though they may be “approximate” distances).
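Such a check is easy to automate. The sketch below tests the three properties for a square matrix of values given as a list of lists. The function name `is_distance_matrix`, the tolerance, and both example matrices are assumptions made for illustration; the matrices are not data from this chapter.

```python
def is_distance_matrix(d, tol=1e-9):
    """Return True if the square matrix d satisfies the three distance properties.

    Note: following the formula d_ii = 0 <= d_ij, this check does not require
    strictly positive values for pairs of distinct objects.
    """
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:                          # self-distance must be zero
            return False
        for j in range(n):
            if d[i][j] < -tol:                          # nonnegativity
                return False
            if abs(d[i][j] - d[j][i]) > tol:            # symmetry
                return False
            for k in range(n):
                if d[i][k] > d[i][j] + d[j][k] + tol:   # triangle inequality
                    return False
    return True

# Hypothetical 3 x 3 examples.
ok = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 5 > 1 + 1 violates the triangle inequality
print(is_distance_matrix(ok), is_distance_matrix(bad))  # True False
```

The triple loop makes the check cubic in the number of objects, which is unproblematic for matrices of the size discussed here.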

A set \(M\) of objects together with a distance function \(d\) is called a metric space. A special case of a metric space is the Euclidean space. Its distance function does not only satisfy the above distance axioms, but it can also be interpreted as the distance of the points \(i\) and \(j\) of a multi-dimensional Cartesian space. That means that Euclidean distances can be computed from the points’ Cartesian coordinates as

$$\begin{aligned} d_{ij}(\mathbf X )&= \sqrt{ ( x_{i1} - x_{j1} )^2 + \cdots + ( x_{im} - x_{jm} )^2 } &&\text{(2.1)} \\&= \left({\sum _{a=1}^m (x_{ia} - x_{ja})^2 }\right)^{1/2}, &&\text{(2.2)} \end{aligned}$$

where \(\mathbf X \) denotes a configuration of \(n\) points in \(m\)-dimensional space, and \(x_{ia}\) is the value (“coordinate”) of point \(i\) on the coordinate axis \(a\). This formula can be easily generalized to a family of distance functions, the Minkowski distances:

$$\begin{aligned} d_{ij}(\mathbf X ) = \left( {\sum _{a=1}^m | x_{ia} - x_{ja}|^p}\right)^{1/p}, \ p \ge 1. \end{aligned}$$
(2.3)

Setting \(p=2\), formula (2.3) becomes the Euclidean distance. For \(p=1\), one gets the city-block distance. When \(p \rightarrow \infty \), the formula yields the dominance metric.
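As a minimal sketch, formula (2.3) and its three special cases can be written in Python as follows. The function name `minkowski` and the two example points are assumptions chosen for illustration; the dominance metric is handled as the limiting case by taking the largest intra-dimensional difference.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of formula (2.3) between coordinate tuples x and y."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)                      # dominance metric (limit p -> infinity)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Hypothetical points with intra-dimensional differences 3 and 4.
xi, xj = (0.0, 0.0), (3.0, 4.0)
print(minkowski(xi, xj, 1))         # city-block: 7.0
print(minkowski(xi, xj, 2))         # Euclidean:  5.0
print(minkowski(xi, xj, math.inf))  # dominance:  4.0
```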

As a model for judgments of (dis-)similarity, the city-block distance (\(p=1\)) seems to be the most plausible “composition rule”, at least in case of “analyzable” stimuli with “obvious and compelling” (Torgerson 1958, p. 254) dimensions. It claims that a person’s judgment is formed by first assessing the distance of the respective two objects on each of the \(m\) dimensions of the psychological space, and then adding these intra-dimensional distances to arrive at an overall judgment of dissimilarity.

If one interprets formula (2.3) literally, then it suggests for \(p=2\) that the person first squares each intra-dimensional distance, then sums the resulting values, and finally takes the square root. This appears hardly plausible. However, one can also interpret the formula somewhat differently. That is, the parameter \(p\) of the distance formula can be seen as a weight function: For values of \(p>1\), relatively large intra-dimensional distances have an over-proportional influence on the global judgment, and when \(p \rightarrow \infty \), only the largest intra-dimensional distance matters. Indeed, for \(p\)-values as small as 10, the global distance is almost equal to the largest intra-dimensional distance (see Footnote 2). Thus, one could hypothesize that when it becomes more difficult to make a judgment (e.g., because of time pressure), persons tend to pay attention to the largest intra-dimensional distances only, ignoring dimensions where the objects do not differ much. This corresponds, formally, to choosing a large \(p\) value. In the limit, only the largest intra-dimensional distance matters.
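The weighting effect of \(p\) can be illustrated numerically. The sketch below uses two hypothetical intra-dimensional differences, 3 and 4, and shows how the global distance approaches the larger one as \(p\) grows. The function name and the chosen values are assumptions made for this example.

```python
def minkowski_from_diffs(diffs, p):
    """Minkowski distance computed from absolute intra-dimensional differences."""
    return sum(abs(d) ** p for d in diffs) ** (1.0 / p)

diffs = [3.0, 4.0]     # hypothetical intra-dimensional differences
largest = max(diffs)   # the value the distance approaches as p grows

for p in (1, 2, 5, 10, 50):
    d = minkowski_from_diffs(diffs, p)
    print(f"p={p:>2}  distance={d:.4f}  ratio to largest difference={d / largest:.4f}")
```

For \(p=10\), the distance is about 4.02, that is, less than 1 % above the largest intra-dimensional difference of 4.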

Another line of argumentation is that city-block composition rules make sense only for analyzable stimuli with their obvious and compelling dimensions (such as geometric figures like rectangles, for example), whereas for “integral” stimuli (such as color patches, for example), the Euclidean distance that expresses the length of the direct path through the psychological space is more adequate (Garner 1974).

Fig. 2.3

Three circles with the same radius in the city-block plane, the Euclidean plane, and the dominance plane, respectively

Choosing parameters other than \(p=2\) has surprising consequences, though: It generates geometries that differ substantially from those we are familiar with. What we know, and what is called the natural geometry, is Euclidean geometry. It is natural because distances and structures in Euclidean geometry are as they “should” be. A circle, for example, is “round”. If \(p \ne 2\), circles do not seem to be round. In the city-block plane (with simple orthogonal coordinate axes; see Footnote 3), for example, a circle looks like a square that sits on one of its corners (see left panel of Fig. 2.3). Yet, this geometrical figure is indeed a circle, because it is the set of all points that have the same distance from their midpoint \(M\). The reason for its peculiar-looking shape is that the distances of any two points in the city-block plane correspond to the length of a path between these points that can run only in North–South or West–East directions, but never along diagonals, just like walking from A to B in Manhattan, where the distance may be “two blocks West and three blocks North”. Hence the name city-block distance. For points that lie on a line parallel to one of the coordinate axes, all Minkowski distances are equal (see points \(M\) and \(i\) in Fig. 2.3); otherwise, they are not equal. If you walk from \(M\) to \(j\) (or to \(j^{\prime }\) or \(j^{\prime \prime }\), respectively) on a Euclidean path (“as the crow flies”), the distance is shorter than choosing the city-block path which runs around the corner. The shortest path corresponds to the dominance distance: The largest intra-dimensional difference will get you from \(M\) to the other points. This is important for the MDS user because it shows that rotating the coordinate axes generally changes all Minkowski distances, except Euclidean distances.
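The effect of a rotation on the different Minkowski distances can be verified with a small numerical sketch. The two points below are hypothetical: \(M\) sits at the origin and \(i\) lies one unit away on the first coordinate axis, so that all three distances are equal before the rotation. Rotating the points by 45 degrees stands in for rotating the axes.

```python
import math

def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise about the origin."""
    t = math.radians(degrees)
    x, y = point
    return (math.cos(t) * x - math.sin(t) * y, math.sin(t) * x + math.cos(t) * y)

def minkowski(a, b, p):
    """Minkowski distance; p = math.inf gives the dominance metric."""
    diffs = [abs(u - v) for u, v in zip(a, b)]
    return max(diffs) if math.isinf(p) else sum(d ** p for d in diffs) ** (1.0 / p)

# Hypothetical points: i lies on a line parallel to the first axis through M.
M, i = (0.0, 0.0), (1.0, 0.0)
M_rot, i_rot = rotate(M, 45), rotate(i, 45)

# Print each distance before and after the rotation.
for name, p in (("city-block", 1), ("Euclidean", 2), ("dominance", math.inf)):
    print(name, round(minkowski(M, i, p), 4), round(minkowski(M_rot, i_rot, p), 4))
```

After the rotation, the city-block distance grows to about 1.41 and the dominance distance shrinks to about 0.71, while the Euclidean distance stays at 1.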

To see how the distance formula can serve as a model of judgment, consider an experiment by Borg and Leutner (1983). They constructed rectangles on the basis of the grid design in Fig. 2.4. Each point in this grid defines one rectangle. Rectangle 6, for example, had a width of 4.25 cm and a height of 1.25 cm; rectangle 4 was 3.00 cm wide and 2.75 cm tall. A total of 21 persons rated (twice) the similarity of each pair of these 16 rectangles (see example in Fig. 2.4, lower panel) on a 10-point answer scale ranging from “0\(=\)equal, identical” to “9\(=\)very different”. The means of these ratings over persons and replications are shown in Table 2.3.

Fig. 2.4

Design configuration for 16 rectangles with different widths and heights; lower panel shows two rectangles in a pair comparison

Table 2.3 Dissimilarity ratings for rectangles of Fig. 2.4; ratings are means over 16 subjects and 2 replications

Fig. 2.5

MDS configuration with city-block distances for data of Table 2.3 (points) and design configuration of Fig. 2.4 (squares) fitted to MDS configuration

The MDS representation (using city-block distances) of these ratings is the grid of solid points in Fig. 2.5. From what we discussed above, we know that this configuration must not be rotated relative to the given coordinate axes, because rotations would change its (city-block) distances. Since the MDS representation in Fig. 2.5 is the best-possible data representation, any such rotation would worsen the correspondence of MDS distances and data.

If one allows for some re-scaling of the width and height coordinates of the rectangles, one can fit the design configuration quite well to the MDS configuration (see grid of dashed lines in Fig. 2.5). The optimal re-scaling makes psychological sense: It exhibits a logarithmic shrinkage of the grid lines from left to right and from bottom to top, as expected by psychophysical theory.

The deviations of the re-scaled design configuration and the MDS configuration do not appear to be systematic. Hence, one may conclude that the subjects have indeed generated their similarity ratings by a composition rule that corresponds to the city-block distance formula (including a logarithmic re-scaling of intra-dimensional distances according to the Weber–Fechner law). The MDS solution also shows that differences in the rectangles’ heights are psychologically more important for similarity judgments than differences in the rectangles’ widths.
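The composition rule described here can be sketched as a city-block distance on logarithmically re-scaled widths and heights. The code below is an illustration under stated assumptions and not the model as fitted by Borg and Leutner: the natural logarithm, the function name, and the optional dimension weights `w_width` and `w_height` (defaulting to 1) are choices made for this example. The two rectangles use the dimensions given above for rectangles 6 and 4.

```python
import math

def log_city_block(rect_a, rect_b, w_width=1.0, w_height=1.0):
    """City-block distance on log-scaled (width, height) pairs.

    The weights are hypothetical parameters; unequal weights would express that
    one dimension matters more than the other for the overall judgment.
    """
    d_width = abs(math.log(rect_a[0]) - math.log(rect_b[0]))
    d_height = abs(math.log(rect_a[1]) - math.log(rect_b[1]))
    return w_width * d_width + w_height * d_height

rect6 = (4.25, 1.25)  # width, height in cm, as stated in the text
rect4 = (3.00, 2.75)

print(round(log_city_block(rect6, rect4), 4))
print(round(log_city_block(rect6, rect4, w_height=2.0), 4))  # heights weighted more, for illustration
```

Because differences of logarithms are logarithms of ratios, this rule makes the predicted dissimilarity depend on the ratios of the widths and of the heights, not on their absolute differences in centimeters.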

2.4 MDS for Testing Structural Hypotheses

A frequent application of MDS is using it to test structural hypotheses. In the following, we discuss a typical case from intelligence diagnostics (Guttman and Levy 1991). Here, persons are asked to solve several test items. The items can be classified on the basis of their content into different categories of two design factors, called facets in this context. Some test items require the testee to solve computational problems with numbers and numerical operations. Other items ask for geometrical solutions where figures have to be rotated in 3-dimensional space or pictures have to be completed. Other test items require applying learned rules, while still others have to be solved by finding such rules. One can always code test items in terms of such facets, but the facets are truly interesting only if they exert some control over the observations, i.e. if the distinctions they make are mirrored somehow in corresponding effects on the data side. The data in our small example are the intercorrelations of eight intelligence test items shown in Table 2.4. The items are coded in terms of the facets “Format = {N(umerical), G(eometrical)}” and “Requirement = {A(pply), I(nfer)}”.

Table 2.4 Intercorrelations of eight intelligence test items, together with codings on two facets

Fig. 2.6

MDS solution for correlations in Table 2.4

Fig. 2.7

MDS configuration partitioned by two facets

Fig. 2.8

Schematic radex of intelligence test items

Fig. 2.9

Cylindrex of intelligence test items

A 2-dimensional MDS representation of the data in Table 2.4 is shown in Fig. 2.6. We now ask if the facets Format and Requirement surface in some way in this plane. For the facet Format we find that the plane can indeed be partitioned by a straight line such that all points labeled as “G” are on one side, and all “N” points on the other (Fig. 2.7). Similarly, using the codings for the facet Requirement, the plane can be partitioned into two subregions, an A- and an I-region. For the Requirement facet, we have drawn the partitioning line in a curved way, anticipating test items of a third kind on this facet: Guttman and Levy (1991) extend the facet Requirement by adding the element “Learning”. They also extend the facet Format by adding “Verbal”.
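Whether a two-element facet partitions the plane by a straight line can also be checked computationally. The sketch below uses a simple perceptron as one possible way of searching for such a line; it is not the procedure used by Guttman and Levy. The coordinates are hypothetical and are not those of Fig. 2.6, and the function name `find_separating_line` is introduced here for illustration only.

```python
def find_separating_line(points, labels, max_epochs=1000):
    """Search for a line w1*x + w2*y + b = 0 with label +1 on one side, -1 on the other.

    Returns (w1, w2, b) if a separating line is found, else None. A None result
    only means that no line was found within max_epochs; it is not a strict proof
    that none exists.
    """
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        updated = False
        for (x, y), lab in zip(points, labels):
            if lab * (w1 * x + w2 * y + b) <= 0:   # point on the wrong side (or on the line)
                w1, w2, b = w1 + lab * x, w2 + lab * y, b + lab
                updated = True
        if not updated:                            # every point is on its correct side
            return (w1, w2, b)
    return None

# Hypothetical MDS coordinates of eight items, NOT those of Fig. 2.6.
coords = [(1.0, 0.8), (0.9, -0.3), (0.6, 0.2), (1.2, -0.9),
          (-0.7, 0.5), (-1.1, -0.2), (-0.5, -0.8), (-0.9, 0.9)]
format_facet = ["N", "N", "N", "N", "G", "G", "G", "G"]
labels = [1 if f == "N" else -1 for f in format_facet]

line = find_separating_line(coords, labels)
print("straight-line partition found:", line is not None)
```

A curved partitioning line, as drawn for the Requirement facet, is not covered by this check and would require a more flexible boundary.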

For the intercorrelations of items in this \(3 \times 3 \) design, that is, for items coded in terms of two 3-element facets, MDS leads to structures with a partitioning system as shown in Fig. 2.8. This pattern, termed radex, is often found for items that combine a qualitative facet (such as Format) and an ordered facet (such as Requirement). For the universe of typical intelligence test items, Guttman and Levy (1991) suggest yet another facet, called Communication. It distinguishes among Oral, Manual, or Paper-and-Pencil items. If there are test items of all \(3 \times 3 \times 3\) types, MDS leads to a 3-dimensional cylindrex structure as shown in Fig. 2.9. Such a cylindrex shows, for example, that the items of the type Infer have relatively high intercorrelations (given a certain mode of Communication), irrespective of their Format. It is interesting to see that Apply is “in between” Infer and Learn. We also note that our small sample of test items of Table 2.4 fits perfectly into the larger structure of the universe of intelligence test items.

2.5 Summary

Originally, MDS was a psychological model for how persons arrive at judgments of similarity. The model claims that the objects of interest can be understood as points in a space spanned by the objects’ subjective attributes, and that similarity judgments are generated by computing the distance of two points from their coordinates, i.e. by summing the intra-dimensional differences of any two objects over the dimensions of the space. Different variants of Minkowski distances imply that the intra-dimensional differences are weighted by their magnitude in the summing process. Today, MDS is used primarily for visualizing proximity data so that their structure becomes accessible to the researcher’s eye for exploration or for testing certain hypotheses. Structural hypotheses are often based on content-based classifications of the variables of interest in one or more ways. Such classifications should then surface in the MDS space in corresponding (ordered or unordered) regions. Certain types of regionalities (e.g., radexes) are often found in empirical research.