14.1 Introduction

When Francis Galton, an early exponent of experimental psychology, decided to conduct research into mental imagery, he did so using a questionnaire (Galton 1880, 1907). Questionnaires have remained a staple of imagery research, despite the rise of various neuroimaging methods and their increasing ability to lay bare imagery’s essentially private nature (see Chap. 15). Galton’s so-called breakfast-table questionnaire (reproduced in Table 1 of Burbridge 1994) anticipates a surprising number of the topics that would form the backbone of imagery research when it was resumed almost a century later. Some questions clearly concern the vividness of visual imagery, for example, “Are the colors of the china […] or whatever may have been on the table quite distinct and natural?” Other items reflect imagery processes, for example, maintenance (“Can you deliberately seat the image of a well-known person in a chair and retain it, and see it with enough distinctness to enable you to sketch it leisurely?”) or transformation (“Can you judge with precision of the effect that would be produced upon the appearance of a room by changing the position of the furniture in it?”). Some items even concern what modern imagery researchers would recognize as the capacity of a visual buffer (Kosslyn 1980, 1994), for instance, “Extent of field of view—does it correspond in breadth and height to the real field of view?” and “How much of a [printed] page can you mentally see and retain steadily in view?” Other items tackled auditory imagery of voices and music, olfactory and gustatory imagery, and the vividness of imagined smells or tastes relative to visual images of the related object. 
Thus, the Galton questionnaire reflects many of the themes of later research: the nature of the conscious experience of imagery, the non-unitary nature of imagery and its different components and processes, and the extension of mental imagery beyond vision to other senses and to interactions between imagery in different senses.

Questionnaires are generally used to assess individual differences. Such differences may be theoretically important in that they may reveal separate cognitive systems or strategies that would be masked if all data were pooled (Cornoldi and Vecchi 2003; see also Sect. 14.4 for an example in which reanalyzing data in the light of individual differences led to new information about multisensory representations). Questionnaires may also be important for screening from an applied perspective. For example, the benefit of imagery-based therapies and rehabilitation strategies may be related to individual differences in imagery preferences and abilities. Despite the rise of neuroimaging research into imagery, questionnaires look set to survive as a useful, complementary technique.

In this chapter, we review the usefulness of imagery vividness, which has historically been used as an important index of imagery ability. We then go on to discuss how a more process-oriented approach, grounded in a theoretical model, has been fruitful in visual imagery and how this might be translated to other modalities. We also describe how object and spatial dimensions of visual imagery (Kozhevnikov et al. 2002, 2005) have been extended into the haptic and multisensory domains, using a theory-driven questionnaire. Finally, we review imagery questionnaires that assess imagery in a number of different modalities.

14.2 Vividness of Imagery

As we shall see in the following section, vividness is not necessarily the most important or informative aspect of imagery. However, it is the most intuitively open to introspection, and many early questionnaires operated on the assumption that vividness reflected imagery ability (e.g., Betts 1909). This assumption has persisted so that even today the most commonly used imagery questionnaires require people to generate a mental image and to rate its vividness. These include questionnaires assessing a single imagery modality: visual (Marks 1973: Vividness of Visual Imagery Questionnaire [VVIQ]; Marks 1995: Vividness of Visual Imagery Questionnaire-Revised [VVIQ-R or VVIQ-2]), olfactory (Gilbert et al. 1998: Vividness of Olfactory Imagery Questionnaire [VOIQ]), auditory (Willander and Baraldi 2010: Clarity of Auditory Imagery Scale [CAIS]), and those measuring imagery of movement (Roberts et al. 2008: Revised Vividness of Movement Imagery Questionnaire [VMIQ-2]; Campos et al. 1998: Vividness of Haptic Movement Imagery Questionnaire [VHMIQ]). We are not aware of any modality-specific questionnaires that assess vividness for tactile imagery of passive touch that is unrelated to movement, for haptic imagery of active touch, or for gustatory imagery. However, these senses are addressed in some multimodal questionnaires in which individuals rate their imagery in a number of senses (e.g., Betts 1909; Sheehan 1967; Switras 1978; Schifferstein 2009: see Sect. 14.5). Note that the VHMIQ (Campos et al. 1998) rates imagery of skin and muscle sensations associated with movement in terms of exertion, temperature, pressure, etc., for example, the feel of water moving over the body whilst swimming. It therefore assesses certain aspects of tactile imagery, as it refers to imagined experiences of passive touch.
However, it does not require imagery ratings for purposive (Gibson 1966) or active exploratory touch for the purpose of obtaining information about objects (Lederman and Klatzky 1987), which is how haptics is usually defined.

Vividness is an aspect of our conscious experience of imagery (Dean and Morris 2003) which is not necessarily related to the specific content of the image and might therefore be regarded as a surface property of an image. Most questionnaires equate vividness to the similarity between imagery and perception, i.e., to the realism of the image: visual vividness is “a combination of clarity and liveliness […] the more vivid an image […] the closer it approximates an actual percept” (Marks 1972, p. 83). This definition is evoked in the vividness rating scales used in other modalities: “perfectly realistic and as vivid as the actual odor” (Gilbert et al. 1998: VOIQ), “as intense as executing the action” (Malouin et al. 2007: Kinesthetic and Visual Imagery Questionnaire [KVIQ]), and “perfectly clear and as vivid as the real situation” (Campos et al. 1998: VHMIQ). For auditory imagery, however, Willander and Baraldi (2010) distinguish vividness (clarity and liveliness) from clarity (brightness and sharpness). Their CAIS rating scale for vividness asks “subjectively, how clearly do you hear the sounds” with no reference to realism and no comparison to auditory perception. Nevertheless, CAIS auditory clarity ratings were correlated with VVIQ-2 visual vividness ratings (Campos and Pérez-Fabello 2011), suggesting that they might measure the same construct, i.e., the realism of the image.

Questionnaires rely on self-report and may be subject to confounding factors such as socially desirable responding (see Allbutt et al. 2011, regarding the VVIQ). However, Cui et al. (2007) suggested that vividness could be objectively measured. They employed a task in which participants were required to name color words presented against a background that was either congruent or incongruent in color with the words. Performance on this task correlated with VVIQ scores: highly vivid imagers were less affected by incongruently colored backgrounds. This reverse Stroop effect could be explained if color naming induced color imagery for the vivid imagers, resulting in the color words being easier to see against incongruently colored backgrounds (Cui et al. 2007). Crucially, VVIQ scores strongly predicted both performance on the color naming task and activity in early visual cortex during functional magnetic resonance imaging (fMRI) of a visual imagery task (Cui et al. 2007). Thus, “private” imagery vividness could be read off from the blood-oxygen-level-dependent signal detected in fMRI.

Relationships between vividness ratings and cortical activity have also been found for other modalities. Modality-specific ratings for visual, tactile, gustatory, kinesthetic, and somatic imagery (as measured by the relevant subscales of the Questionnaire Upon Mental Imagery [QMI]: Betts 1909) were correlated with activity in modality-specific cortical regions during visual, tactile, gustatory, kinesthetic, and somatic imagery, respectively (Palmiero et al. 2009; Olivetti Belardinelli et al. 2009). In these studies, vividness ratings related to imagery in the same modality; an interesting question is whether vividness ratings in one modality can predict performance or neural activity in another. Kilgour and Lederman (2002) found no correlation between VVIQ scores and performance on a haptic face recognition task. However, Zhang et al. (2004) asked participants to rate the vividness of their visual imagery specifically during the haptic shape perception task in their study (VI_HS) in addition to completing the VVIQ. While VVIQ and VI_HS ratings were not significantly correlated with haptic shape-selective activity in the lateral occipital complex (LOC) when treated separately, when taken together in a multiple regression they strongly predicted haptic shape-selective activity in the right LOC (Zhang et al. 2004). This suggests that vividness ratings might have cross-modal predictive value.

14.3 Beyond Vividness to the Processes Involved in Imagery

Some early questionnaires equated vividness with imagery ability (Betts 1909), and many studies still divide participants into “good” and “poor” imagery groups on the basis that more vivid imagery is better imagery (Dean and Morris 2003; McAvinue and Robertson 2007). However, this is problematic because vividness, however defined, is only weakly connected to theoretical models of how imagery might work (Dean and Morris 2003). In addition, using a single measure of vividness implicitly treats imagery as an undifferentiated ability at which individuals are either good or bad. This ignores the fact that conscious experience of imagery is the product of a collection of subprocesses—image generation, maintenance, inspection, and transformation (Kosslyn 1980, 1994)—and that there are object and spatial subtypes of imagery (see Sect. 14.4): individuals might vary widely in their ability on these subprocesses and subtypes. The ability to generate vivid imagery is thus only one ability among many. We should not, though, conclude that vividness is unimportant; for example, vividness of motor imagery is a key factor in improving performance (Roberts et al. 2008).

VVIQ scores correlate with a variety of tasks (see Marks 1989) but it is unclear why correlations are, or are not, found, and few studies have found correlations between self-reported vividness and performance on those tasks (usually spatial) that are assumed to rely heavily on imagery (Dean and Morris 2003; McAvinue and Robertson 2007). This may be because (1) vividness is functionally unrelated to performance on these tasks; (2) vividness questionnaires typically require rating of relatively familiar items retrieved or constructed from long-term memory whereas the spatial tests thought to correlate with imagery ability involve constructing novel shapes and holding them in short-term memory; and/or (3) vividness questionnaires do not measure theoretically meaningful imagery processes. With regard to (3), it is worth noting that some visual imagery questionnaires that are still in use predate the detailed exposition of the most advanced model of visual imagery (Kosslyn 1980, 1994; Kosslyn et al. 2006), for example, the QMI (Betts 1909; Sheehan 1967) and the Gordon Test of Visual Imagery Control (Gordon 1949). Such questionnaires may not accurately reflect the current state of knowledge.

Dean and Morris (2003) devised a new questionnaire that required participants to generate an image of either a 2D or 3D novel shape and to imagine it as either static or rotating. Participants then rated their imagery on items that were explicitly cast in terms of Kosslyn’s model of visual imagery. For example, they rated how easy it was to generate, maintain, and rotate the image; how detailed and clear it was; and whether this changed during maintenance and rotation. These process-related ratings predicted performance on spatial tests that are assumed to rely on imagery whereas VVIQ scores did not. In addition, the process-related ratings and VVIQ scores were uncorrelated (Dean and Morris 2003).

One reason for these findings may be that the VVIQ and the Dean and Morris (2003) questionnaire required imagery from different sources. The VVIQ requires imagery of familiar items recalled or constructed from long-term memory. In contrast, the spatial tests used by Dean and Morris, and in the majority of earlier studies that they reviewed, require participants to imagine novel shapes using short-term memory. Thus, ratings in Dean and Morris's (2003) questionnaire may have predicted spatial task performance because the source of the image (novel items constructed in short-term memory) was matched. Their questionnaire is important in demonstrating that participants could introspect successfully about structural aspects of their imagery as well as surface properties of the resulting image, such as vividness. However, to obtain a complete picture, it will be important to apply Dean and Morris's (2003) questionnaire to tasks that require imagery of familiar items, such as everyday scenes and objects, in order to examine individual differences in imagery processes for items stored in long-term memory (McAvinue and Robertson 2007).

Thus, if we want to know how imagery relates to performance, we should not rely on measuring vividness alone. Instead we must determine what imagery subprocesses are relevant to the task and be aware that the source of the image may be important. An important avenue for future research will be to take this process-oriented approach and to apply it to imagery in other modalities. One barrier to this is that other modalities have less well-developed models of imagery, or perhaps none at all. A starting point might be to take what we know of the detailed processes of visual imagery and to see how far these apply to other modalities. This approach raises the problem of whether visual processes can be ported unchanged into other models. For example, whereas generating auditory and visual images might be regarded as similar processes, maintaining an auditory image may be very different from maintaining a visual image, given that auditory images (such as imagining a melody) unfold over time anyway.

The Bucknell Auditory Imagery Scale (BAIS: unpublished but employed by Zatorre et al. 2010) highlights the difficulties in defining imagery processes across modalities and the importance of definitions for process-oriented questionnaires. The BAIS includes a subscale that rates control of auditory images. This asks participants to rate how easily they can change an auditory image, for example, imagining the sound of a dentist’s drill and then the drill stopping and hearing the voice of the receptionist. Here, the concept of control is not clear in terms of imagery processes. Control does not equate solely to maintenance since the image has to be changed, nor does it appear to equate to image transformation since the task can be accomplished by simply switching to a different image. Similarly, transformation of a visual image is normally a transformation in space whereas for an auditory image it would be a frequency or temporal transformation (for example, Zatorre et al. (2010) required the temporal reversal of imagined melodies). It is unclear whether these differences are important. Note, though, that for the melody reversal condition the BAIS scores were correlated with activity in the intraparietal sulcus. This area is known to be involved with spatial transformations (Alivisatos and Petrides 1997) and so may be involved in transformations more generally.

14.4 Object and Spatial Imagery Dimensions in Vision and Touch

A recent example of the more theory-driven use of questionnaires advocated in Sect. 14.3 is the investigation of individual imagery preferences using the Object-Spatial Imagery Questionnaire (OSIQ: Blajenkova et al. 2006). A later version incorporates a verbalizer subscale (the Object-Spatial Imagery and Verbal Questionnaire, OSIVQ: Blazhenkova and Kozhevnikov 2009). A detailed account of visual object and spatial imagery can be found in Chap. 16 (see also Kozhevnikov et al. 2002, 2005). In brief, object imagers tend to create images that are pictorial, vivid, and detailed. Their images integrate the structural property of shape with information about surface properties, such as color, texture, and brightness. By contrast, spatial imagers tend to prefer images that are more schematic and less focused on surface properties. Their images make salient the spatial relations between component parts of objects and support complex spatial transformations.

The representations underlying visual and haptic object recognition share many features (for example, both are orientation dependent and size dependent and are sensitive to changes in surface properties; see Lacey and Sathian 2011, for a review). This raises the question of whether object and spatial imagery dimensions exist for haptics as well as for vision, which was investigated by Lacey et al. (2011). They devised visual and haptic tasks that required discriminating shape across texture changes and texture across shape changes. The same stimuli were used in both tasks for both modalities. In both visual and haptic tasks, when object imagers focused on surface properties they could discriminate texture equally effectively whether shape changed or not. However, when they focused on the structural property of shape, their performance was disrupted if there was a change in texture, indicating that shape and texture tended to be integrated in object imagery. The reverse was true for spatial imagers who, again regardless of modality, could discriminate shape across texture changes but not texture across shape changes, indicating that spatial imagers tended to abstract away from surface properties. Thus, object and spatial imagery dimensions appear to exist in both vision and touch, when each modality is tested alone.

In a second experiment, Lacey et al. (2011) reanalyzed data from an earlier study (Lacey et al. 2010) involving cross-modal discrimination of shape across changes in both texture and orientation. Visuo-haptic cross-modal object recognition is thought to be subserved by an orientation-independent multisensory representation (Lacey et al. 2007; Lacey et al. 2009; but see Newell et al. 2001; Lawson 2009). Inspection of the data revealed two levels of performance: one showed above-chance recognition independent of changes in both orientation and texture; in the other, a change in texture reduced performance to chance whether or not there was also a change in orientation. OSIQ scores obtained from participants recalled from the earlier study showed that these two patterns of performance corresponded to the use of object and spatial imagery (Lacey et al. 2011). Object imagers were impaired by a change in texture but not orientation while spatial imagers were unimpaired by either. Furthermore, imagery preference scores based on the OSIQ correlated with cross-modal performance: a preference for object imagery was associated with worse shape discrimination regardless of orientation (Lacey et al. 2011) and was uncorrelated with within-modal object recognition (Lacey et al. 2007). By contrast, OSIQ-spatial scores were correlated with cross-modal object recognition, whether orientation changed or not, but not with within-modal recognition (although the latter approached significance when orientation changed) (Lacey et al. 2007). Taken together, these studies suggested that construction of multisensory orientation-independent representations was linked to the ability to perform spatial transformations. These might include transformations involved in changes of orientation but also translation between differing frames of reference in vision and touch.

Thus, these studies suggest that object and spatial imagery dimensions extend into the haptic and multisensory domains. However, it is important to note that in Lacey et al. (2011) participants were classified not only by reference to the OSIQ but also by self-report in response to a brief explanation of the difference between object and spatial imagery. These two classifications did not always agree, and, in fact, only self-report predicted performance in both within-modal experiments. There may be several reasons for this. Firstly, the tasks focused on whether people integrated surface properties into their images, but only six of the thirty OSIQ items mention surface properties and only three refer to their being included in an image. Most items relate to the difference between pictorial and schematic images and are thus more about format than content. To address this it will be important to develop a questionnaire with subscales probing different aspects of object and spatial imagery. Secondly, the self-report measure explicitly explained the difference between object and spatial imagery while the OSIQ does not (and like many similar questionnaires was not intended to do so). In addition, of course, the OSIQ is a visual questionnaire being applied to a haptic task on the assumption that the object-spatial dimensions are stable across modalities, i.e., that a visual object imager is also a haptic object imager. This assumption remains to be tested. A benefit of creating haptic and multisensory versions of the OSIQ would be that these could be used to compare early-blind, late-blind, and sighted participants on their object and spatial imaging preferences.

14.5 Multisensory Imagery

To our knowledge, there are no multisensory or cross-modal imagery questionnaires that assess people’s ability to create images in more than one modality at the same time (for example, simultaneously imagining the sight and sound of an orchestra or the sight and smell of food) or the ability to create an image in one modality from perceptual input in a different modality (for example, creating a visual image of an object from haptic input), although the VVIQ rating scale has been applied to visual imagery of haptically perceived objects (Zhang et al. 2004; see Sect. 14.2).

There are, however, several questionnaires that address unisensory imagery in more than one modality. The earliest, after Galton, is the QMI (Betts 1909), which assesses vividness of imagery in seven modalities: visual, auditory, cutaneous (generally tactile, passive touch), olfactory, gustatory, kinesthetic, and “miscellaneous and organic” (i.e., bodily sensations such as hunger). The short-form version (Sheehan 1967) corrects a potential bias in the full QMI: it has five items for each modality, whereas the full-length version had unequal numbers of items per modality and was heavily weighted in favor of visual imagery.
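The weighting issue noted above can be made concrete with a minimal sketch. It assumes hypothetical ratings on an arbitrary scale and two hypothetical scoring functions; it is not the published scoring procedure for the QMI or its short form. It shows only that pooling all items lets a modality with more items dominate an overall score, whereas averaging per-modality means weights each modality equally, and that the two coincide when item counts are equal.

```python
from statistics import mean

def grand_mean_over_items(subscales):
    """Pool every item; modalities with more items carry more weight."""
    return mean(r for items in subscales.values() for r in items)

def mean_of_subscale_means(subscales):
    """Average the per-modality means; each modality counts equally."""
    return mean(mean(items) for items in subscales.values())

# Hypothetical ratings (illustrative only, not real questionnaire data).
# Unequal item counts: six visual items, two olfactory items.
unequal = {
    "visual": [2, 2, 2, 2, 2, 2],
    "olfactory": [6, 6],
}
# Equal item counts: five items per modality.
equal = {
    "visual": [2, 2, 2, 2, 2],
    "olfactory": [6, 6, 6, 6, 6],
}

pooled_unequal = grand_mean_over_items(unequal)     # pulled toward visual
balanced_unequal = mean_of_subscale_means(unequal)  # midway between the two
pooled_equal = grand_mean_over_items(equal)
balanced_equal = mean_of_subscale_means(equal)
```

With equal item counts the choice of aggregation no longer matters, which is one way to read the correction that the short form makes.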

The Survey of Mental Imagery (SMI: Switras 1975, 1978) also assesses seven imagery modalities: visual, auditory, tactile, olfactory, gustatory, kinesthetic, and somesthetic (i.e., bodily sensations). Images are rated for vividness and controllability. The definition of controllability is confusing. It is first described as the “ability to produce precisely the target image” (Switras 1978, p. 379). This might also be related to image generation, particularly as Switras considers controllability and vividness as “sequential steps in which an image must first be produced before one can evaluate its vividness” (ibid., p. 380). However, controllability is also considered to be the “ability to manipulate, modify, and prolong an image” (Switras 1975, p. 33, cited in Grebot 2003), thus also encompassing the separate processes of maintenance and transformation. Factor analysis did not distinguish between vividness and controllability; instead it extracted factors that reflected single modalities, except for kinesthetic-tactile vividness and controllability factors (Switras 1978). In a shortened version of the SMI testing only four modalities (visual, auditory, somesthetic, and kinesthetic), Grebot (2003) identified separate vividness and controllability factors together with a third factor, image formation. However, this still leaves several imagery processes confounded under the single heading of controllability.

A different approach was taken by Schifferstein (2009) in a questionnaire assessing five imagery modalities: visual, auditory, tactile, olfactory, and gustatory. Previous questionnaires had specified what images people were to produce, but these might contain cross-sensory confounds. For example, imagining the smell of a rose might also bring to mind a visual image of the color and shape. In Schifferstein’s questionnaire, participants are instructed to think of a product or an event that involves a characteristic or conspicuous smell, taste, and so on. It is not clear that this approach completely avoids cross-sensory confounds since, for example, Intons-Peterson (1983) reported that auditory imagery was almost always accompanied by involuntary visual imagery. Nevertheless, Schifferstein’s approach does allow participants to think of a personally salient item rather than a forced choice. In addition to vividness ratings, participants rate how well they can imagine, say, a smell; how difficult it was to imagine it; and how strongly they felt that they could really smell it. It is, though, unlikely that these are independent aspects of imagery. How well one can imagine something is related to how difficult it is to imagine, whilst how close the imagery experience is to reality is what vividness is supposed to measure. Indeed all four ratings were highly correlated, and principal components analysis revealed a single factor such that the average of the four ratings was used as the index of image quality (Schifferstein 2009).
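A composite index of this kind, an average of four ratings per modality, can be sketched as follows. Everything specific here is an assumption made for illustration: the 1 to 7 rating range, the reverse-scoring of the difficulty rating so that higher values always mean better imagery, the function name, and the example ratings. The source states only that the four ratings were averaged.

```python
from statistics import mean

# Assumed rating range; the source does not specify the scale.
SCALE_MIN, SCALE_MAX = 1, 7

def image_quality(vividness, how_well, difficulty, realness):
    """Average four ratings into one index.

    Difficulty is reverse-scored (an assumption) so that a higher
    value on every component indicates better imagery.
    """
    reversed_difficulty = SCALE_MIN + SCALE_MAX - difficulty
    return mean([vividness, how_well, reversed_difficulty, realness])

# Hypothetical ratings for one participant (illustrative only):
# (vividness, how well imagined, how difficult, how real it felt)
ratings = {
    "visual": (6, 6, 2, 5),
    "olfactory": (3, 2, 6, 2),
}
index = {modality: image_quality(*r) for modality, r in ratings.items()}
```

Because the four components are so highly correlated, an index like this carries little more information than any single rating, which is the concern raised above.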

Results from all of the above questionnaires rank imagery vividness or quality as greatest for visual imagery and poorest for olfactory and gustatory imagery. But, as discussed in Sect. 14.3, vividness does not appear to provide a good index of imagery ability or reflect the individual contributions of the underlying imagery processes. In addition, concepts such as controllability (Switras 1978) and image quality (Schifferstein 2009) are not well defined. Thus, it is difficult to draw conclusions from these instruments about individual differences in imagery between different modalities or the relative contribution of different processes within a modality. For example, relative to a visual image an olfactory image may be harder to generate and maintain, but we lack theoretically motivated questionnaires that allow such a claim to be tested or that permit individual differences in imagery across different modalities to be measured.

14.6 Conclusions

In this chapter, we have reviewed the usefulness of imagery vividness (i.e., the similarity of imagery to perception) as a source of information about imagery processes and individual differences in imagery ability. We conclude that the strong emphasis on measuring vividness in many imagery questionnaires is not warranted given the relatively weak evidence that vividness predicts imagery performance. We argue that there is a pressing need for theoretically driven questionnaires that address clearly defined component processes of the imagery system in each modality (e.g., Dean and Morris 2003). Visual imagery may be a convenient starting point for this endeavor but we need to be wary of assuming that imagery processes are the same across modalities. At issue is whether it is appropriate to develop process-oriented questionnaires for nonvisual modalities based on our understanding of visual imagery or whether more open-ended theoretical research into imagery within each nonvisual modality is required before appropriate questionnaires can be developed. The reality is probably that, in order to understand the relationship between imagery systems in different modalities, and individual differences therein, both approaches will be required to work in tandem to make progress.