Effects of vision and haptics on categorizing common objects

Humans use multiple sources of sensory information to estimate environmental properties. The eyes, for example, estimate an object’s shape, and the hands supply additional information by means of tactile cues (Hills et al. 2002). As vision and touch (i.e., haptics) are both common ways to interact with objects, some researchers have suggested that haptic and visual percepts overlap at a neural level and share abstract representations of an object (Amedi et al. 2001, 2002; Easton et al. 1997; James et al. 2002; Shimojo and Shams 2001; Zangaladze et al. 1999; Zhou and Fuster 2000). In support of this, a study using functional magnetic resonance imaging (fMRI) revealed that the haptic system activates the visual processing system and may facilitate mental representation of objects or act as part of a cross-modal network encoding information from both vision and touch (James et al. 2002).

At a cognitive level, however, it is unclear how haptically based (as opposed to visually based) categories of objects might be organized, because haptics remains far less explored. Modality may nonetheless have a significant effect on how semantic categories are formed. To further examine the influence of modality on object recognition and categorization, the present study therefore had participants perform naming (i.e., recognition) and sorting (i.e., categorization) tasks with common hand-held three-dimensional (3D) objects (toy animals).

Vision and haptics in categorizing objects

Vision has historically been considered the primary sensory modality (Rock and Victor 1964). While the visual system plays a major role in object recognition, objects also can be recognized through haptic exploration when judging size and shape (James et al. 2002; Norman et al. 2004). Similarity exists between vision and touch (Ernst and Banks 2002; Millar and Al–Attar 2005), as both vision and haptics are based on the extraction of basic features and their spatial arrangement (i.e., contours), which together define an object. Disparate features of objects (e.g., shape and texture), which are analyzed separately, may converge to generate a coherent percept in object recognition (Newell 2004; Newell et al. 2001, 2002).

It is widely accepted that the early stages of visual object recognition involve the extraction of spatial information in the form of oriented edges (contours), which together produce low-level object features or primitives (Lederman and Klatzky 2004). In the process, the object to be categorized is compared for similarity with representatives of different object categories stored in memory (Biederman 1987). Haptic object recognition also relies on edge extraction, but the edges must be traced sequentially with the fingertips. In contrast with vision, this serial exploration of contours is slow and results in poorer performance (Lederman et al. 2004).

Similarity has provided the basis for many theories of categorization in cognitive psychology (Goldstone 1994). In theories of categorization, an object is categorized as an “A” and not a “B” if it is more similar to A’s features than to B’s (Medin and Schaffer 1978; Nosofsky 1986). Current research has shown that objects within the same category must share some common attributes; thus, category learning can be accomplished by identifying the attributes shared by exemplars known to be from the same category and by identifying the attributes discriminating some exemplars known to be from different categories (Hammer et al. 2009). Object similarity—based on the sharing of common attributes or features—has been effective in explaining classification of perceptual stimuli in vision. Indeed, these findings suggest that there exists a specialized visual perceptual system that can group particular objects into different object classes based on their shape. An intriguing question then arises: Does the haptic system likewise group objects into different categories based on shape?
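To make this similarity rule concrete, the exemplar models cited above can be summarized (in simplified form, omitting response-bias terms) by a choice rule in which the probability of classifying a stimulus x as an "A" depends on its summed similarity to the stored exemplars of category A relative to those of category B:

\[
P(A \mid x) = \frac{\sum_{a \in A} s(x,a)}{\sum_{a \in A} s(x,a) + \sum_{b \in B} s(x,b)},
\qquad s(x,y) = e^{-c\, d(x,y)},
\]

where d(x, y) is an (attribute-weighted) distance between stimulus x and exemplar y and c is a sensitivity parameter; x is assigned to category A when this probability is sufficiently high.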

Such a question is embedded in a larger area—namely how people learn haptic categories formulated through touch. Limited research has suggested similar visual and haptic development of category learning. Schwarzer et al. (1999) had children and adults categorize haptic exemplars that varied in four attributes (shape, size, texture, and weight), allowing participants to learn the categories analytically (single attribute) or holistically (overall similarity). Participants more often learned the haptic categories analytically rather than holistically; that is, children preferred a surface attribute such as texture and adults preferred a structure-related attribute such as shape. Schwarzer and colleagues concluded that analytic or holistic processing in category learning develops in a similar manner in both vision and haptics.

Research on touch indicates that haptics is well suited to perceiving 3D structure, as haptic identification can draw on properties such as surface shape, structure, and texture (Klatzky et al. 1993). A more recent study examined haptic and multisensory categorization on the basis of similarity, using concrete categories and novel (unfamiliar) objects. Cooke et al. (2007) conducted a multidimensional scaling analysis of haptic and visual ratings of similarity between pairs of unfamiliar objects and found that shape and texture influenced both modalities: vision weighted shape more heavily than texture when judging similarity, whereas haptics weighted shape and texture about equally. The same perceptual dimensions contributed to the pattern of ratings in both vision and haptics, supporting the expectation that the two modalities share a common representation (Cooke et al. 2007). It may also be that similarities are more difficult to detect through touch than through vision, as haptics is not our dominant sensory modality. These questions underscore the complex and less studied haptic processing involved in object recognition and categorization.
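As a rough illustration of the kind of analysis Cooke et al. (2007) report, the sketch below applies multidimensional scaling to a small, entirely hypothetical matrix of pairwise dissimilarity ratings; the object set, values, and variable names are assumptions for illustration only, not data from that study.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical averaged dissimilarity ratings (0 = identical, 1 = maximally
# different) for four objects, e.g., from one modality's pairwise judgments.
objects = ["dog", "cow", "lion", "crocodile"]
dissim = np.array([
    [0.0, 0.3, 0.6, 0.9],
    [0.3, 0.0, 0.7, 0.8],
    [0.6, 0.7, 0.0, 0.5],
    [0.9, 0.8, 0.5, 0.0],
])

# MDS recovers a low-dimensional configuration whose inter-point distances
# approximate the ratings; the resulting axes can then be inspected for
# interpretable dimensions such as shape or texture.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)

for name, (x, y) in zip(objects, coords):
    print(f"{name:10s} {x:+.2f} {y:+.2f}")
```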

Overview

To examine the influence of modality on object recognition and categorization, the present study had participants perform naming and sorting tasks with common hand-held, 3D objects (toy animals). The stimuli could be sorted on the basis of three attributes that are commonly used in categorizing animals (see Henley 1969), namely size (big vs. small), domesticity (domestic vs. wild), and predation (meat-eater vs. plant-eater). In the present study, it was hypothesized that participants would commit more overall errors using touch in both naming and sorting tasks relative to vision, as haptic edge extraction has resulted in increased error in other studies (Lederman et al. 2004). As Chan and colleagues (1995, 1997) had previously defined “size” as a concrete category and “predation” as an abstract category, we expected to see fewer errors committed in the more concrete category. It was also hypothesized that we would find behavioral evidence of shared representation of objects, specifically that error would increase in both modalities when moving from more concrete categories (e.g., size) to more abstract (e.g., predation). This may be due in part to conceptual overlap at a neural level (Amedi et al. 2001, 2002; Sathian 2004; Zhou and Fuster 2000; Lacey et al. 2009).

Finally, another area of interest in the literature is the effect of gender. Prior results have revealed female superiority in a number of verbal tasks, including naming and word retrieval of categories (Capitani et al. 1999; Kimura 1999), a male advantage in haptic categorization for shape and texture (Cohen and Levy 1986a), and better overall haptic discrimination of categories by males (Cohen and Levy 1986b). Although we did not set out to test gender, we examined this effect in both tasks.

Method

Participants

Fifteen female and 15 male undergraduate students, aged 18–22, at Arizona State University, participated in this study in exchange for course credit. Although 34 students started the experiment, the data for four students were discarded. Two students became ill, and two students were unable to follow instructions during the sorting tasks.

Stimuli

Familiar, common 3D objects were used in all tasks. Stimuli were hand-held toy replicas of animals. The stimuli were realistically textured and, in general, maintained relative real-life size differences. The stimuli were manufactured by Discovery Communication, Inc., and were drawn from a Wildlife Set and a Farm Set. The 12 stimuli (bear, cow, dog, elephant, horse, lamb, lion, pig, zebra, crocodile, monkey, and rhinoceros) were relatively high-frequency animals (Battig and Montague 1969; Van Overschelde et al. 2004). High-frequency, familiar objects were selected to ensure that the items could be easily identified and named by the participants. The stimuli were also chosen because they had been used previously to examine categorization abilities (Chan et al. 1993, 1997, 2001).

Apparatus

A cardboard “blind box” was constructed for haptic object exploration. Participants were seated at a table and inserted both arms and hands under opaque drapery so that their hands were free to explore objects haptically without viewing them. Participants were unable to view any other stimuli while examining an object in the blind box.

Procedure

Vision naming task

Animal stimuli were not visibly arrayed but were hidden behind the cardboard blind box on the table in front of the participants. The participants were unable to view the other stimuli while conducting the naming tasks. The stimuli were presented randomly, one at a time, and participants were asked to name the animal. The researcher recorded the participants’ responses but did not provide feedback regarding accuracy.

Vision sorting task

Stimuli were sorted three times: once each for the dimensions of size (big or small), domesticity (wild or domestic), and predation (carnivore or herbivore). In each sorting task, the two appropriate attributes for the tested dimensions were written on separate index cards and positioned on the table facing the participant. Then, the participant was given each animal, one at a time, and asked to state which attribute best described it.

For example, when sorting by size, two separate index cards, positioned on the table facing the participant, were labeled big or small. The participant was asked to judge which of the two labels better described each of the animals presented (e.g., Is a cow a big or a small animal?). The researcher read to the participant a script that said, “For example, a big animal would be more than average in size, and would be extensive, large, or broad. A small animal would be little or limited in size.” When sorting by domesticity, two separate index cards positioned on the table facing the participant were labeled domestic or wild. The researcher read to the participant a phrase that said, “For example, a domestic animal is an animal pertaining to the family, household, or farm. In contrast, a wild animal would be an animal growing or living in a natural state.” When sorting by predation, two separate index cards positioned on the table facing the participant were labeled carnivore or herbivore. The researcher read to the participant a phrase that said, “For example, a carnivore is an animal that eats meat, while an herbivore is any animal that feeds chiefly on grass and other plants.” Overall sorting errors were manually recorded.

Haptics naming task

Participants inserted both arms and hands under opaque drapery in the blind box so that their hands were free to explore each object haptically without viewing it. Participants were unable to view other stimuli while conducting the naming task. The stimuli were presented randomly, one at a time, in the blind box, and the participant was asked to name the animal. The researcher recorded participants’ responses but did not provide feedback regarding accuracy.

Haptics sorting task

Participants were asked to sort stimuli haptically on the basis of several dimensions. Stimuli were sorted on the same three dimensions as in the vision sorting task; however, participants explored the stimuli in the blind box.

All 12 stimuli were presented in each modality condition. The stimuli were presented for 30 s in touch and 15 s in vision, consistent with the two-to-one haptic-to-visual exposure ratio suggested in prior studies, which avoids placing haptics at an unfair disadvantage (Newell et al. 2001). For both tasks and conditions, participants were tested individually in a quiet room during a 90-min session; overall errors were manually recorded; and the order of modalities was randomized across participants, as was the order of tasks within each modality.

Analysis

Overall

For the naming tasks, a paired t-test was used to compare visual and haptic performance; the dependent variable was naming error. For the sorting tasks, a 2 (Modality) × 3 (Dimension) within-participants repeated-measures ANOVA was used; the dependent variable was sorting error. A Bonferroni adjustment was applied for multiple comparisons, and the Greenhouse-Geisser correction was applied when the assumption of sphericity was violated (with corrected degrees of freedom reported where appropriate). A significance criterion of .05 was used for all tests.
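For readers who wish to reproduce this style of analysis, the sketch below shows how the overall tests might be run in Python with the pingouin library, assuming a hypothetical long-format data file with columns pid, modality, dimension, and error (all file and column names are illustrative; the original analyses were not necessarily run this way).

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per participant x condition.
naming = pd.read_csv("naming_errors.csv")    # columns: pid, modality, error
sorting = pd.read_csv("sorting_errors.csv")  # columns: pid, modality, dimension, error

# Naming task: paired t-test comparing haptic and visual naming error.
haptic = naming.loc[naming["modality"] == "haptic", "error"].to_numpy()
vision = naming.loc[naming["modality"] == "vision", "error"].to_numpy()
print(pg.ttest(haptic, vision, paired=True))

# Sorting task: 2 (Modality) x 3 (Dimension) repeated-measures ANOVA with
# partial eta squared as the effect size. Greenhouse-Geisser-corrected
# values would be consulted if sphericity were violated.
aov = pg.rm_anova(data=sorting, dv="error",
                  within=["modality", "dimension"], subject="pid",
                  effsize="np2")
print(aov)

# Bonferroni-adjusted pairwise follow-up comparisons.
post = pg.pairwise_tests(data=sorting, dv="error",
                         within=["modality", "dimension"], subject="pid",
                         padjust="bonf")
print(post)
```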

Effects of gender

For the naming tasks, a 2 (Modality) × 2 (Gender) mixed-design ANOVA was used, with Modality as a within-participants factor and Gender as a between-participants factor. For the sorting tasks, a 2 (Modality) × 2 (Gender) × 3 (Dimension) mixed-design ANOVA was used. The adjustment for multiple comparisons and the significance criterion were the same as in the overall analysis.
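Under the same assumptions about the data layout as above, the 2 (Modality) × 2 (Gender) analysis of naming error could be sketched as follows. The full 2 × 2 × 3 mixed design for the sorting data requires two within-participant factors plus a between-participants factor, which pingouin's mixed_anova does not handle, so that model would typically be fit in dedicated ANOVA software or via a mixed-effects model.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format naming data with an added 'gender' column.
naming = pd.read_csv("naming_errors.csv")  # columns: pid, gender, modality, error

# 2 (Modality: within) x 2 (Gender: between) mixed-design ANOVA.
mixed = pg.mixed_anova(data=naming, dv="error", within="modality",
                       subject="pid", between="gender")
print(mixed)
```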

Results

Overall

Naming task

There was an effect of Modality [t(29) = −7.90, P < .01], where error was greater in the haptics condition. This suggests that participants found it more difficult to accurately name objects when using touch (M = .15, SE = .01) versus vision (M = .03, SE = .01). See Table 1.

Table 1 Mean naming error across modality

Sorting task

There were effects of Modality (F(1, 29) = 41.67, P < .001, ηp² = .59) and Dimension (F(2, 58) = 31.47, P < .01, ηp² = .52), suggesting greater difficulty in sorting objects haptically (M = .16, SE = .02) versus visually (M = .05, SE = .01), as well as differences in error based on object dimension (M = .06, SE = .01, for size; M = .09, SE = .01, for domesticity; and M = .16, SE = .02, for predation). There was also a Modality × Dimension interaction (F(1.45, 41.94) = 14.70, P < .01, ηp² = .33), with simple effects analyses showing an effect of Dimension in vision (F(1.66, 48.22) = 32.77, P < .01, ηp² = .53) and in haptics (F(1.57, 45.47) = 22.75, P < .01, ηp² = .44). In both vision and haptics, there was greater error for predation than for size (P < .01); that is, error increased from the concrete attribute of size to the more abstract attribute of predation. See Table 2.

Table 2 Mean sorting error across dimension and modality

Effects of Gender

Naming task

There was an effect of Gender (F(1, 28) = 5.65, P < .05, ηp² = .17), where females outperformed males. Females had a mean error of .07 (SE = .01), compared with the male mean error of .11 (SE = .01). The Modality × Gender interaction was not significant (F(1, 28) = .507, P > .05, ηp² = .02).

Sorting task

There were effects of Modality (F(1, 28) = 60.84, P < .001, ηp² = .69), Gender (F(1, 28) = 6.43, P < .05, ηp² = .19), and Dimension (F(2, 56) = 27.59, P < .001, ηp² = .49), along with a significant Modality × Gender interaction (F(1, 28) = 9.48, P < .05, ηp² = .25). Simple effects analysis of this interaction showed an effect of Gender in haptics (F(1, 28) = 9.39, P < .01, ηp² = .80), where females committed more errors (M = .20, SE = .02) than did males (M = .11, SE = .01). The Dimension × Gender (F(2, 56) = .30, P > .05, ηp² = .01) and Modality × Dimension × Gender (F(2, 56) = .10, P > .05, ηp² = .003) interactions were not significant.

Discussion

The present experiment examined how modality influences object recognition and categorization. Because vision has been considered the dominant sensory modality, overall error was expected to be greater in the haptic modality, and the present results supported this expectation. Participants generally found it more difficult to recognize and categorize objects using haptics than using vision. One explanation for this difficulty is that haptic perception does not allow the same quality or speed of processing as visual perception: studies of haptic exploration have found that tracing object edges with the fingers imposes a heavy cognitive load that slows processing and increases errors (Lederman and Klatzky 1987, 2004).

Errors in recognition and categorization

In the present study, then, poor performance in haptic object recognition can be further explained by the slow fingertip exploration of object edges. During the haptic recognition task, participants were observed using their fingertips to explore the objects’ shape; most traced completely around an animal’s contour, seemingly to “figure out” its name. In touch, some participants may have failed to detect important object details. For example, two of the toy animals (the dog and the cow), although dissimilar in size, were similar in shape and head orientation. Participants noted that, in the haptic modality, “discriminating between the cow and the dog was difficult” because of their similarity. Yet the cow had prominent udders, a distinguishing feature that some participants overlooked.

Regarding object categorization, error was expected to increase in both modalities when moving from more concrete categories (e.g., size) to more abstract ones (e.g., predation), perhaps due in part to conceptual overlap at a neural level (Amedi et al. 2001, 2002; Sathian 2004). In line with this prediction, error in both touch and vision increased across dimensions. In prior research, Chan et al. (1995) defined “size” as a concrete category and “predation” as an abstract category. Replicating findings by Chan et al. (2001) in vision, the present study showed that error increased when moving from size to predation, and a similar effect of dimension occurred in touch. This finding may provide behavioral evidence that the visual and haptic systems share perceptual mechanisms.

Shared visuohaptic representation

In support of this, current research indicates that regions previously thought to specialize in processing aspects of visual inputs are also activated during haptic tasks (Lacey et al. 2009; Sathian and Lacey 2007). Psychophysical data indicate that interaction between modalities is the rule rather than the exception in brain function, and neuroimaging provides evidence against modularity and for interaction in areas traditionally thought to be unimodal (Shimojo and Shams 2001). Neurobiological data also suggest that there are points of convergence of the parallel pathways in multimodal brain regions that integrate information from diverse and multiple modalities (Sathian 2004; Zangaladze et al. 1999; Zhou and Fuster 2000). Information related to objects may be integrated in the lateral occipital cortex, an area that is active when a person recognizes an object using either vision or touch. Likewise, category preferences regarding objects, even faces, appear to occur in ventral temporal cortex during use of either modality (Amedi et al. 2002). Importantly, such studies demonstrate that multiple areas (e.g., temporal, parietal, frontal, and insular cortices) take part in binding object-oriented information across modalities. It must be noted, however, that these regions are not equally responsive to all types of sensory information and their combinations; rather, they show a degree of specialization (Amedi et al. 2005). Current research further suggests that direct sharing of information between modalities without recruitment of multisensory regions may be expected only for quick retrieval (e.g., alerting or processing of socially important information such as facial recognition and vocal identification). Conversely, more complex information (i.e., common objects) may be integrated through associative nodes enabling more flexible mappings between information from diverse modalities (Amedi et al. 2005).

Nonetheless, a growing body of research on visuohaptic perception provides evidence for a metamodal model with a multisensory, task-oriented organization (Pascual-Leone and Hamilton 2001). Specifically, evidence demonstrates similarities between visual and haptic object recognition, and the convergence of visuohaptic object processing in the same brain areas suggests that the two modalities share at least a representation of shape (Craddock and Lawson 2009; Lacey et al. 2009). Such shared representation may, however, be constrained by surface-dependent representations or by the degree of spatial congruency between modalities (i.e., spatial information; Woods and Newell 2004).

Spatial information

Woods and Newell (2004) examined unfamiliar objects and found that cross-modal representations of objects are mediated by surface-dependent representations. In a second experiment using scenes of familiar 3D objects, they examined how spatial information is integrated across modalities and viewpoints and found that recognition performance was less efficient when either the modality or the viewpoint changed between learning and test, indicating that there are constraints under which cross-modal integration is optimal for object recognition. Cooke et al. (2007) found that in vision participants treated shape as more important than texture, whereas in touch they relied on both shape and texture. In the present study, it may be that participants, when exploring haptically, focused more on shape and component parts (based on qualitative comments that they were concentrating on shape) and less on texture, which is equally important in touch. Because we examined increasing levels of abstraction, however, it is also possible that semantic knowledge played a part. For instance, the results may have reflected what participants already knew about animals: although there was less error in the concrete category of size, for the more abstract categories participants may have been less aware of particular animals’ habitats or predatory behaviors, further contributing to increased error.

Spatial imagery

Traditionally, visual imagery has been treated as a single ability. Yet Kozhevnikov et al. (2005) and Blajenkova et al. (2006) distinguished subtypes such as object imagery (images linked to color and shape) and spatial imagery (images reflecting the spatial relations among objects and individual object parts). According to Lacey and Campbell (2006), vision and haptics encode spatial properties of objects (e.g., shape, size, and the position of features) in a modality-independent way. Although the object-spatial dimension of haptic representations remains largely unexamined, Lacey et al. (2009) suggest that similarities in the processing of visually and haptically derived representations provide evidence that the two modalities share a common spatial representational system.

Visual cortical engagement in haptic shape perception is well established, but its connection with visual imagery remains a matter of debate. Various experiments have addressed this question by using separate tasks for visual object imagery and haptic shape perception. For example, Lacey et al. (2010) conducted two experiments: the first involved a haptic shape task with unfamiliar (novel) objects, and the second involved familiar objects. The authors found that visual-object-imagery activation overlapped more strongly with activation during haptic shape perception of familiar objects. Deshpande et al. (2010) provided further supporting evidence, using analyses of task-specific activation and connectivity to show that similar neural networks underlie visual imagery and haptic shape perception, but only for familiar objects. Because the present study used highly familiar objects, it is plausible that participants used visual imagery to facilitate recognition and categorization during haptic shape perception.

Gender

Finally, although not a focus of the present study, our results also suggest a possible role of gender in visuohaptic perception: females may have found it more difficult to categorize objects through touch even though they were better at verbal recognition, consistent with prior findings (Capitani et al. 1999; Kimura 1999). The haptic categorization results also accord with prior research revealing male superiority in spatial tasks and, more specifically, better overall haptic discrimination of categories by males (Cohen and Levy 1986b). This factor would benefit from more extensive study.

Summary

In the present experiment, we found further evidence of shared object representations between vision and haptics; specifically, error increased in both modalities when moving from more concrete categories (e.g., size) to more abstract ones (e.g., predation). Notably, this behavioral evidence parallels neurophysiological evidence of shared visuohaptic processing. It remains to be investigated whether the overlap holds for other types of abstraction (e.g., object use or function).