1 Objects for perception

In perception, objects are key. Objects are constituents of the conscious perceptual manifold, they are targets of perceptual attention, and they are subjects of perception-based demonstrative thought. Vision is the paradigm. Humans see objects like corks and cormorants, and seen objects look some specific way, such as cylindrical, mottled, animate, or feathered. Accordingly, objects play a central role in psychological accounts of vision and in philosophical accounts of visual awareness. Recent exemplars of this rich history include, in particular, Strawson (1979), Marr (1982), Spelke (1990), Clark (2000), Scholl (2001, 2002), Campbell (2002), Cohen (2004), Matthen (2004, 2005), Siegel (2006b), Pylyshyn (2007), and Dickie (2010). Model visible objects are medium-sized, three-dimensional, extended, bounded, persisting, cohesive particulars. They occupy regions of space, they are neither too big nor too small, and they last through time.

Extending this beyond vision faces two puzzles. The first is a puzzle about diversity. It concerns the nature of objects for non-visual perception. On one hand, paradigm objects recede. Haptic touch reveals stoppers and feathers, but it is an outlier. In the first instance, humans hear sounds, smell odors, and taste flavors. Each of these things is unlike a bird or a hunk of bark. Sounds, odors, and tastes are unisensory. Thus, it is common to say that objects are perceived only indirectly, if at all, in audition, olfaction, and gustation. Beyond vision, objects play a less critical role. On the other hand, objects remain central in theorizing about non-visual perception. Philosophers and psychologists have posited auditory objects, olfactory objects, and gustatory objects in characterizing perceptual capacities and the structure of perceptual awareness in non-visual modalities. Theorists of perceptual objects for non-visual modalities include, for example, Kubovy and Van Valkenburg (2001), Griffiths and Warren (2004), Humphreys and Riddoch (2007), O’Callaghan (2008), Kubovy and Schutz (2010), Matthen (2010), Nudds (2010), Batty (2015), and Smith (2015). We consciously perceive audible, olfactory, and gustatory items that appear loud, pungent, or bitter. We can perceptually discern, track, demonstrate, and attend to these things.[Footnote 1] If, as the evidence suggests, such objects play a role like that which is so central in vision, they must differ in stark respects from vision’s objects. So, is there any good account according to which we hear, smell, or taste objects in something like the sense that we see objects? What are their natures?

This draws attention to a second puzzle. It is a puzzle about unity, and it concerns objects for multisensory perception. Some forms of multisensory perception target common objects across senses. For instance, I have argued elsewhere that some multisensory effects, revealed by cross-modal illusions, resolve conflicts across modalities. Conflict requires a common subject matter, so performing conflict resolution demonstrates a shared perceptual concern for the common sources of stimulation to multiple senses (O’Callaghan 2012). Moreover, intermodal binding awareness involves perceiving something’s bearing both visible and tactual features, or visible and audible features (O’Callaghan 2014; see also Chapters 2, 5, and 6, especially, in Bennett and Hill 2014). Perceiving apparent intermodal motion involves using two different senses to perceive one thing’s changing location over time (O’Callaghan forthcoming). Each of these multisensory capacities requires that different modalities may share objects and that perception targets shared objects as such. While some are skeptical, such as Spence and Bayne (2015), I maintain that this holds even as a claim about conscious perceptual awareness.[Footnote 2] This calls for an account of such shared multisensory perceptual objects. The obstacle is that perceptible objects typically differ so dramatically across the senses.

This paper resolves the puzzles. It presents a general account of the nature of perceptual objects that applies across sensory modalities, then uses this general account to characterize multisensory perceptual objects. According to this proposal, perceptual objects are structured mereologically complex individuals. Roughly, objects for perception are items that bear perceptible features and have perceptible parts arranged to form a unified whole. This characterization is general enough to apply to diverse forms of non-visual perception. Not only the features but also the structures of perceptible objects differ from one sense to another. This flexibility enables the account to accommodate strictly unisensory objects, common objects, and shared objects for multisensory perception. For instance, one can bimodally perceive a common whole with some parts accessible to one but not both senses. This is so because perceiving an object does not require perceiving each of its attributes or parts.

This account has several strengths. It accommodates objects as consciously perceptible targets of attention and demonstrative thought for hearing, smell, touch, taste, and multisensory perception. However, it avoids identifying perceptual objects with medium-sized dry goods, which is too visuocentric. Moreover, unlike the truism that perception’s objects are whatever is perceived, it provides a substantive, theoretically useful notion of an object for perception. By design, since I am committed to there being such objects, it delivers objects for episodes of conscious multisensory perceptual awareness. This conception also illuminates perceptual processes and mechanisms across the senses that do not engender experience. It works for conscious or unconscious perception.

Several philosophers have posited sensory individuals as perceptible feature bearers (e.g., Strawson 1959; Clark 2000; Cohen 2004; Nanay 2013, §3.3). And I have claimed both that unisensory auditory objects and that multisensory audio-visual objects are mereologically complex individuals with differing structures (O’Callaghan 2008, 2011a, respectively). However, this paper newly generalizes the account to apply in a unified way to varied forms of non-visual and multisensory perception, including olfactory, gustatory, and visuo-tactual awareness. It addresses why such diverse capacities are forms of object perception, and it applies the account to diagnose whether a capacity involves a form of object perception. In addition, this paper provides motivation for accepting the account against the alternatives, and it describes and responds to salient objections. In short, it presents, develops, and defends a unified general account of objects for multisensory perception and awareness.

In Sects. 2 and 3, I consider and reject two venerable proposals for perceptual objects beyond vision. The first is that such objects are best understood as intentional objects. This is too permissive. The next is that such objects are the same material objects we see. This is too restrictive. In Sect. 4, I explain my proposal that perceptual objects are structured mereologically complex individuals. In Sect. 5, I apply it to deliver accounts of objects for several non-visual perceptual modalities. In Sect. 6, I extend the account’s application to multisensory perceptual objects. In Sect. 7, I reply to four objections. Section 8 concludes.

2 Objects of perception

Contemporary philosophical accounts of perception agree that humans consciously perceive objects. According to content views, objects are perceived thanks to their being represented. According to relationist views, objects are constituents of conscious perceptual episodes.

One hypothesis suggested by contemporary work is that perceptual objects just are the objects of perception. This section considers and rejects that suggestion as too permissive.

Start by distinguishing an account of perceptual objects—objects for perception—from an account of the objects of perception. An object of perception is that which is perceived or perceptually represented. It is common to hold that perceptual episodes are directed at or about something, and in that sense are intentional. Thus, the objects of perception may be understood as intentional objects. For instance, Crane (2009) says, “for every intentional state of kind ϕ, there is something on which the ϕing is directed. What the ϕing is directed on is the object of the state. This is what I mean by saying that every intentional state has an object” (454).

One obstacle to identifying objects for perception with intentional objects is the close association between intentional objects and the theory of representation. For a mental state or episode to have an intentional object generally is taken to be compatible with there existing no such object. My hallucinating seeing a dinosaur seems intentional if seeing one is, but it does not require a dinosaur’s current existence. For an episode to have an intentional object is taken to require only that it represents or has content and thus may misrepresent or be inaccurate. However, not every theorist who thinks humans perceive objects believes that perceptual or perception-like episodes, regardless of their accuracy, involve contentful mental states and have intentional objects. Relationists reject this.[Footnote 3]

So, distinguish the theoretically committal interpretation of an intentional object from a neutral construal of an object of perception. The neutral construal does not commit concerning whether inaccurate perceptual or perception-like states misrepresent or have objects. According to the neutral construal, the objects of a perceptual episode simply include that which one perceives or perceptually represents. The objects of perception are its targets—what’s perceived or represented.

The neutral construal of the objects of perception is far too permissive as an account of perceptual objects. It rules out a lot that is imperceptible to humans, such as everlasting world peace, electrons, and Earth’s magnetic field.[Footnote 4] But it still lets in too much. Humans perceive not just objects but also their attributes and qualities (shape, motion, color, pitch, and perhaps tigerhood); relations among them (temporal, spatial, causal); happenings that involve them (collisions, collapses, scratchings); and perhaps states of affairs (her eyelid’s occluding her iris, the recording’s being unbalanced, the wine’s being overly tannic).

Talk of perceiving objects typically contrasts them with perceptible properties, relations, events, and states of affairs. This contrast marks noteworthy differences in the structure of the manifold revealed by perception. For instance, features perceptibly belong to objects, objects are perceptibly bounded and thus differentiated from their surroundings, objects perceptibly persist and survive changes, objects stand in perceptible relations to each other and participate in perceptible events, and objects partly constitute perceptible states of affairs. Moreover, experimental psychologists distinguish object-based from feature-based and location-based attention, investigate object-specific preview effects, and posit object files to explain differing aspects of tracking and reidentification.

“What can be perceived?” is a good question—it concerns the objects of perception. However, a bare inventory of the perceptible is silent about the differing varieties of actual and potential objects of perception. Perceptible objects differ from perceptible attributes, relations, and states of affairs. This paper’s target is an account of perceptible objects among the objects of perception that is theoretically illuminating in characterizing non-visual and multisensory perception (cf. Casati 2015).

3 Material objects

Typically, the objects of perception are taken to include familiar, ordinary objects in the environment, such as books, cups, cars, guitars, noses, and tails. Brewer (2011) maintains that physical objects—“things like stones, tables, trees, and animals: the persisting macroscopic constituents of the world that we live in”—are presented to humans in perception. “[W]e see and otherwise consciously perceive physical objects: they are in this sense elements of perceptual consciousness” (2; see also Brewer 2007, 87). Siegel (2006a) focuses on phenomenological constraints on seeing “paradigm ordinary objects,” such as, “people, horses, trains, and the like” (429). Dickie (2010) argues, in part for empirical reasons, that humans are acquainted perceptually with “ordinary middle-sized objects” and not just an array of features as in the “Old Empiricist View.” Lycan (2014) is enthusiastic (if vigilant) about this idea: “Surely [but watch that ‘surely’ operator!] vision represents everyday objects, not just volumetric shapes and distances. And object-recognition is obviously [!] one of vision’s functions” (312, Lycan’s brackets). In psychology, Spelke’s famous work argues that even human infants segment perceptual arrays into objects. “Object perception does accord with principles governing motions of material bodies: Infants divide perceptual arrays into units that move as connected wholes, that move separately from one another, that tend to maintain their size and shape over motion, and that tend to act upon each other only on contact” (Spelke 1990, 29). Spelke maintains that early object perception honors principles of cohesion, boundedness, rigidity, and no action at a distance that also govern commonsense reasoning about objects in the physical world (see, especially, Spelke 1990, 48–54).

Accordingly, suppose that perceptual objects are material objects in what Anscombe (1965) calls the “modern sense,” which does not apply to debts. “Bodies” works well as a single term because it avoids complicated questions about how to interpret “matter.” So understood, stereotypical perceptual objects are medium-sized, three-dimensional, extended, bounded, cohesive, persisting items.

Still, this might be too materialistic. Humans see things like rainbows, holes, spots of light, images, and shadows that are not material objects. Such items are public, even if they are only accessible by vision. They bear perceptible features and serve as targets for object-based attention and demonstrative thought. They have shapes and sizes, they are bounded and cohesive, and they persist and survive change. But they are not made of matter in the straightforward way that rocks and tires are made of matter. This can affect whether such entities seem to occupy space in three dimensions (fill it up) or behave like good rigid objects. Vision might just mischaracterize such things as material objects, in which case, for instance, rigidity violations should be surprising, and holes might turn out imperceptible. Alternatively, construing objects for perception as familiar material objects is too narrow because it leaves out visibilia like rainbows and shadows. If so, we might use “bodies” more figuratively to include them.

The choice does not matter to my argument. Even understanding perceptual objects in the more inclusive sense is too restrictive. Items that are perceptible with other senses can play a role analogous to visual objects, yet they are not bodies even in the inclusive sense that stretches to include rainbows, shadows, and holes.

For instance, I have argued that sounds are public objects of audition, even if they are merely audible. They are audible bearers of perceptible attributes such as pitch, timbre, loudness, and duration. Multiple distinct sounds are audible at a time, sounds persist and survive change, and sounds can occlude and mask each other. You can attend to a sound in contrast to its features, its location, the material object that makes it, or its audible background. You can track a sound over time and form audition-based demonstrative thoughts about it. You can reidentify the same sound after a period in which it existed but was inaudible to you, as when you wear earphones or leave the room, or the sound’s frequency gets just too high for you (even while keen-eared others can hear it). And, spoken language and musical melodies bind together distinct sounds to form a single audible object. Thus, something closely analogous to visual object perception occurs in hearing that warrants describing it as a form of object perception.[Footnote 5] Researchers posit auditory objects as the targets of auditory object perception.

Nevertheless, humans do not typically hear spatial edges, surfaces, occlusion, or rigidity. In the first instance, human hearing does not target medium-sized, three-dimensional, spatially extended, bounded, and cohesive bodies as such.[Footnote 6] Auditory objects are not bodies. So, construing perceptual objects as bodies is too restrictive.

Similarly, olfaction does not in the first instance involve smelling material objects as such. Ordinary objects do not appear in olfaction to occupy space. Olfaction does not resolve the edges of hibiscus flowers or cinnamon sticks and thus cannot differentiate them from their surroundings. But you can smell an odor that outlasts its source and differs chemically from any of its source’s components. You can attend to an odor and discern its various specific attributes, you can track it through time as it persists and changes, and you can form demonstrative thoughts about it that do not just concern its source. So, odors are among the public objects of olfaction but are not ordinary objects or bodies.

There is a nice disagreement about whether olfaction itself involves a form of object perception and thus warrants talk of perceptual objects. Lycan (2000) says that olfactory experience involves mere modifications of one’s consciousness. Burge (2009) suggests that olfaction is not perception (it does not involve objective perceptual representation) because it fails to appreciate constancies through sensory variation.[Footnote 7] Batty (2010a, b) maintains that olfaction represents, but rejects that it assigns features to distinct objects at one time. According to Batty, olfaction thus fails to solve Jackson’s many properties problem (Jackson 1977; Clark 2000). Olfaction, unlike vision and audition, fails to distinguish distinct feature bearers and thus cannot target and track multiple objects simultaneously. Batty therefore argues that olfactory content is general or existentially quantified in form and does not involve particular objects.

Others maintain that olfaction is a form of object perception. Batty’s recent work reconsiders the case for olfactory objects on empirical grounds. For instance, Stevenson and Wilson (2006, 2007) argue that even static olfaction involves awareness of a “wholistic, unified percept” that is neither a mere mixture of features nor an ordinary object. Such olfactory objects are claimed to be subject to figure–ground effects, a hallmark of object perception, and they can be recognized and reidentified, central functions of object perception. Moreover, active multisensory perceptual exploration may enable or enhance a human’s capacity to discern odors as olfactory objects in space (Batty 2015; see also Matthen 2005; Carvalho 2014; Young unpublished).

Settling whether or not olfaction involves object perception does not matter for now. The point is that there is no debate about whether olfaction seems to present ordinary objects or bodies in the direct or immediate way that vision does—everyone agrees that it does not. The debate is about whether or not smelling is usefully construed as involving something analogous to object perception and, thus, whether odors are objects for perception in a theoretically interesting sense over and above their being (intentional) objects of olfaction. So, the question is: What is the notion of a perceptual object in play when we say there are auditory objects and debate whether or not there are olfactory objects?

4 Mereologically complex individuals

Humans are able perceptually to discern and to target items as distinguished from their surroundings. Such items perceptibly bear certain qualities and attributes and extend in a continuous or otherwise connected manner (where connection need not require contact). They are tracked perceptually as persisting or surviving from moment to moment and place to place despite changing. Typically, they have perceptible parts that also are items in this same sense. If scattered or discontinuously connected, such an item nevertheless is treated and presented perceptually as a single unified whole rather than as a mere plurality.

This core family of perceptual capacities is fruitfully regarded as the object perception suite. It constrains object-based attention, subserves object recognition and reidentification, and feeds perception-based demonstrative thought. And it is found beyond vision. Perceptual objects across the senses are the targets of such families of perceptual capacities.

My proposal is that perceptual objects are structured, mereologically complex individuals. I have suggested a similar account elsewhere (O’Callaghan 2008, §5; 2011a, §4; b, §5.2). Here, having shown that the alternatives are unsatisfactory, I elaborate, extend, and defend it.

First, perceptual objects are individuals.[Footnote 8] By this I mean they are singular, first-order feature bearers. Individuals are, for a range of ways, some specific way or another: they have attributes. Individuals are not merely subjects of predication, and individuals are not simply logical objects, or (n − 1)th-order properties for n > 1.[Footnote 9] Groups, construed as mere pluralities, are not individuals. Moreover, I do not identify individuals with particulars, so as to exclude tropes and to allow abstract perceptual objects.

Second, perceptual objects are mereologically complex. They have perceptible parts treated individually as belonging to and collectively as composing a whole. A cormorant visibly has wings, a beak, and feet that belong to a common visible individual. A yodel audibly includes a high-pitched bit and a low-pitched stretch. Philosophers have devoted a lot of attention to the fact that humans enjoy perceptual awareness of objects and their attributes, and much has been made of differing subject-like and predicate-like roles in cognition and in perception. I am trying to emphasize that the apparent relation of part and whole has a similarly central role in perception and in cognition. Some further clarifications are in order. (1) Here, “mereology” should be understood in the general sense that concerns parts and wholes rather than as committing to a specific mereological system, such as classical mereology. (2) I am not asserting the priority of parts or of wholes over the other. (3) Any perceptible simplicity is the lower limit of complexity.

Third, perceptual objects have structure. The parts of a perceptual object stand in constitutive relations of various kinds. On one hand, some general structural characteristics, such as spatial and temporal relations, govern a given class of perceptual objects, such as visual objects. On the other hand, more specific relations among parts also matter. For instance, typical visual objects are individuated and recognized on the basis of spatial boundedness and temporal continuity. A different arrangement of the same parts need not even be a perceptible object. For perception, then, it might be that composition is not ontologically innocent, in the sense of Lewis (1991) (see also Hawley 2014). A perceptible bird should be treated as differing from a plurality of (innocently fused) perceptible bird parts. (Consider their differing survival conditions.) A bird, for perception, requires a certain perceptible organization. The visible parts must appear to compose or to belong to a common, unified whole.[Footnote 10] This is clearly illustrated by the difference between a plurality of visible birds and those birds’ visibly forming a flock, as with starling murmurations. Classical mereology thus may not suffice for perceptual objects.

This account of perceptual objects has several advantages. First, it is permissive enough to admit sounds, rainbows, holes, clouds, odors, and shadows because it does not identify perceptual objects with ordinary objects or bodies. It also can admit flocks and melodies. It can even admit events like collisions and coronations. The crucial differences across cases are structural. Nevertheless, this account is selective among the objects of perception. Cormorants and corks are in; colors and causality are out.

The account is not visuocentric. As I argue in Sect. 5, it captures what is common to visual objects and auditory objects, but it explains their distinctive differences. Moreover, it provides a set of criteria that help settle whether or not to admit olfactory objects. It also grounds a theory of multisensory perceptual objects, which I present in Sect. 6.

The account is theoretically illuminating. It abstracts to characterize the general features in virtue of which certain items play the psychologically important role of a perceptible object for awareness, attention, and demonstrative thought across sensory modalities. It thus makes room for a fruitful conception of perceptual objects among the objects of perception.

Finally, the account is flexible in the theory of perception because it remains neutral about the metaphysical status of the objects of perception. My framing refers to public objects. However, whether the objects of perception are ordinary physical things, intentional objects, content constituents, or mind-dependent items, there is no barrier to their including structured mereologically complex individuals.

5 Objects beyond vision

Perceptual objects are structured, mereologically complex individuals. Vision and touch are clear enough. They reveal bodies extended in space that persist through time. Visual objects include familiar material bodies, ephemera such as rainbows and holograms, and scattered things like flocks. Some visual objects, such as rainbows and holograms, are strictly unisensory objects. Others, such as material bodies, also are accessible to touch. Tactual objects include material surfaces and bodies.[Footnote 11] Perhaps strong magnetic fields might be unisensory objects for touch. What about perceptual objects beyond vision and touch?

Let’s begin by reviewing the auditory case. This discussion of auditory objects revisits and develops my earlier account from O’Callaghan (2008). The idea of an auditory object is puzzling. In audition, sounds are central. You cannot hear without them. But sounds are unlike ordinary material objects, and you do not typically hear bodies as such, that is, as three-dimensional, extended, bounded, cohesive collections of persisting spatial parts discriminable from a background. Nonetheless, audition, like vision, does carve up the auditory scene into distinct perceptible individuals. This is the core insight of Bregman’s (1990) groundbreaking work on auditory scene analysis, or segregating entangled wave information to yield what he thought of as distinct auditory streams. Like ordinary visible objects, sounds audibly appear to possess specific features. Auditory objects appear to have attributes such as pitch, timbre, loudness, and duration. The cocktail party effect demonstrates that it is possible to distinguish a sound from its audible background. Thus, audition, like vision, satisfies a plausible condition on object perception: perceiving a particular requires being able to differentiate, discriminate, or distinguish it from the surrounding environment (see Strawson 1959; Dretske 1969, 20; Bermúdez 2000, 364; Martin 2007, 706; Siegel 2006a, 434). It also is possible at one time to discern distinct sounds with differing attributes. Thus, audition solves the many properties problem: you can distinguish hearing a loud, high-pitched sound and a soft, low-pitched sound from hearing a loud, low-pitched sound and a soft, high-pitched sound. And, like ordinary objects, sounds audibly persist and survive changes over time: a sound can begin high-pitched and loud and become low-pitched and soft.

What sorts of individuals are auditory objects? Structure matters in object perception in two ways. Structures of relations extrinsic to objects are critical to object individuation, and internal structure among the parts of objects is critical to object recognition.

In vision, space plays both roles. In audition, space does not play these structural roles the way it does in vision. Sounds audibly are located in space—they seem to come from some direction and distance. So, audition is spatial. But space plays a diminished role in individuating auditory objects. Qualitatively matched sounds from separate loudspeakers typically appear as a single audible individual, but qualitatively matched figures on separate pieces of paper appear as distinct visible bodies. Thus, in contrast with vision, spatial separation does not suffice for distinct audible individuals.

Moreover, while sounds can audibly have greater or lesser spatial extent, internal spatial structure plays no role in recognizing audible individuals. Sounds lack audible spatial boundaries, and audible items do not auditorily appear to have any richly detailed internal spatial structure. This is the truth in Strawson’s (1959) claim that sounds are not inherently spatial, though it does not imply that a purely auditory experience must be aspatial.

However, for auditory object individuation, pitch is a structural analog of space in vision. At a time, a difference in pitch suffices for distinct audible individuals. For instance, a high-pitched note and a concurrent low-pitched note from one loudspeaker are audibly distinct items. Nevertheless, they can be heard to be parts of a common individual when played together in a chord. The chord is a mereologically complex individual with a unifying structure. It is like a flock of birds in pitch space.

Time also plays a role in auditory object perception like that of space in vision. For instance, audible individuals have perceptible durations, and they perceptibly begin and end. They are perceptibly extended and bounded in time. Accordingly, just as a spatial boundary can belong to one adjacent visible surface or another but not to both, one note at a temporal boundary can belong to one temporally adjacent sound stream or another but not to both (see O’Callaghan 2008, Figure 16, for illustration). One sound also can audibly appear to persist through masking by another sound—an analog of visual occlusion. For example, imagine listening to a televised police siren interrupted by a brief pulse of static. Hearing the earlier and later bits as belonging to a continuous but masked siren sound differs from simply hearing them as two disjointed sounds.

Moreover, time drives recognition by providing an internal structure for auditory objects. For instance, their differing arrangements of features over time distinguish utterances of “belated” and “tabled.” Police and ambulance sirens differ in their patterns of audible qualities through time. A horse’s neigh would not be the sound it is without exhibiting that type of qualitative pattern through time. The same goes for a melody. Recognizing an audible individual—an auditory object—requires appreciating its temporal profile.

So, in the first instance, an object for audition is a complex but bounded and unified individual with audible parts in time and in pitch space. Auditory objects include unisensory individuals, such as sounds and melodies, as well as happenings, such as utterances, accessible through other senses.

A further difference between visual objects and auditory objects is noteworthy. Sounds and sound streams appear to persist in a manner that differs from that of visible bodies. Sounds and sound streams begin and end, and they require time to unfold and transpire. To be the perceptible sound of a siren or of a spoken word requires characteristic changes in audible features over time. Such audibly persisting individuals need not seem to be the sorts of things that wholly exist at any particular moment in time. This contrasts with visible bodies, which appear to exist in their entirety at each moment. Thus, I am claiming that audible sound streams perceptually appear to persist by perduring—by having temporal parts at different times—but that visible bodies perceptually appear to persist by enduring—by being fully present at each moment. This difference is mirrored by differences in object recognition. We recognize visible bodies by the feature profiles they display at a time, while we recognize sound streams on the basis of a feature profile they display through time.

For perception to treat some object-like individual as enduring, or as “wholly” or “fully” present while persisting, requires only that it treats what’s present at each moment as sufficing for being that thing, while granting its identity across time. It thus targets neither the stage nor the worm as such. For perception to treat an event-like individual as perduring means that it does not treat what’s present at each moment as sufficing for being that thing; instead, some of it occurs at other times. Notice the asymmetry in contrast with spatial parts. Perception does not treat each of an individual’s proper spatial parts as sufficing for being that thing; it regards them as components, with more elsewhere.

These facts about perception shape intuitions about the metaphysics of persistence. However, my account does not imply that there is a fundamental ontological difference between the manners in which objects and events persist. Metaphysically, the difference between object-like and event-like persistence may just be a matter of degree, depending on how much change occurs. (If so, perdurantism about object-like individuals should be unsurprising.) My claim is that perception treats persisting audible items as perduring “event-like” individuals and persisting visible items as enduring “object-like” individuals. For my purposes, both count as structured mereologically complex individuals.

A natural response is that you can see persisting event-like individuals, such as a bottle being uncorked or a cormorant landing on water. I agree, with two qualifications. First, the apparent structure of a visible happening differs from that of a standard visible body. The two belong to differing classes of perceptual objects: what I have dubbed “event-like” individuals and “object-like” individuals. Second, seeing an event requires a visible body. Visible events are perceptible happenings or interactions involving visible bodies. The same does not hold for hearing. Hearing a persisting sound stream does not require that any apparently enduring object-like individual is among its audible constituents.Footnote 12

According to this account, visual, tactual, and auditory objects are mereologically complex individuals with differing structures. As such, they figure in conscious perceptual awareness, they are targets for object-based attention, and they are available as subjects for perception-based demonstrative thought. A strictly unisensory object is accessible only to one sense, but objects for a modality may also include objects perceptible through other senses. For instance, holograms and sounds are unisensory objects, while material bodies are common to sight and touch.

Can we extend the account further, beyond vision, touch, and audition? Consider familiar orthonasal olfaction (through the nostrils), a topic of recent discussion (e.g., Lycan 2000; Batty 2011, 2015; Richardson 2013). Whether there are olfactory objects depends on whether olfaction involves awareness of mereologically complex individuals. The best candidates are odors. Odors appear to have attributes such as being floral, intense, rancid, or sweet, and they perceptibly persist and survive change. But temporal structure is not significant for olfactory individuation or recognition. Odors seem fully present at each time at which they are perceived, so odors are more like bodies than like sounds or symphonies in their apparent manner of persisting.Footnote 13 With active exploration over time, odors do have perceptible spatial boundaries and can appear to differ qualitatively from place to place. So, odors have actively perceptible spatial structure.

Still, fully static orthonasal olfaction does not reveal spatial structure or discriminate odors from their surroundings. And, if a single odor can jointly appear to have several qualities, then it is not clear that static olfaction ever distinguishes distinct individuals at a time, so it may not solve the many properties problem (see Batty 2010a). There is no evident analog of pitch space for distinct individual odors to inhabit at once. Accordingly, in fully static olfaction, external structural relations play no significant role in individuating odors, and consciously discernible internal structure among parts plays no significant role in recognizing odors. Instead, the qualitative profile of an odor at a time drives recognition.

On balance, let’s grant that odors are unisensory olfactory objects, since active olfactory exploration reveals persisting mereologically complex individuals with spatial structure. However, being a form of object perception is not olfaction’s most impressive feature. As object perception, static orthonasal olfaction for humans is degenerate.

6 Multisensory objects

Multisensory perceptual objects are trickier. Multisensory perception sometimes involves awareness of common items or features as such across modalities. This requires being differentially sensitive to the identity or sameness of something perceived through multiple senses. Some cases, such as intermodal binding awareness, involve perceiving features’ jointly belonging to something common. My aim here is not to defend these claims. Instead, I assume them.Footnote 14 This calls for an account of multisensory perceptual objects. The main obstacle is the diversity of perceptual objects across modalities. We see visual objects arrayed in space, touch parts of surfaces of bodies, hear sounds and melodies, smell odors, and taste the flavors of stuff we ingest. What room is there for a general account of the objects on which multisensory perception converges?

Objects for multisensory perception, too, are individuals with parts arranged in a structure. Perceptual objects for a given modality need not have parts or features accessible to an additional modality. Rainbows are intangible visual objects, and sounds are invisible auditory objects. Objects for a modality might be accessible to other senses, as material bodies are objects for vision and for touch. However, multisensory perceptual objects must be perceptible through the use of multiple senses. According to my general account, multisensorily perceiving such an object as such involves using multiple senses to perceive features’ or parts’ belonging to a common whole. We need to describe the characteristics and structures of such unified wholes, as well as their relationships to the various modality-specific objects.

Let’s consider three types of cases. First, take visuo-tactual perception. You can see and feel the baseball in your hand. Further, you can perceptually identify what you see with what you feel. Some visuo-tactile illusions trade on this.

So, what is the problem? In this type of case, the obstacle is not that vision and touch reveal wholly different sorts of objects. You see bodies in space and you touch bodies in space. The obstacle is that how objects look differs from how objects feel—bodies are presented in differing ways in vision and in touch. Sight reveals object after object, a range of colored forms visibly populating space at a distance. In contrast, touch reveals the textured parts of surfaces that make contact with your skin (or with a proxy that does). Moreover, during a typical visuo-tactual episode, vision and touch do not reveal the same parts of any particular object. When you hold a baseball, you see its facing surface but feel its other side. Touch also typically blocks facing parts from view.

The task, then—the multisensory perceptual achievement—is perceptually identifying what is felt with what is seen. It is determining that the leathery parts in contact with your palm belong to the same object as the red-laced surface you see. A key to this is that touch reveals the textured parts to belong to an extended solid body, and sight reveals the facing hemisphere to be part of an extended solid body. In each case, you perceive not just the surface part, but also the fully extended body, which continues out of view and beyond contact. In vision and in touch, you perceive it to be the sort of thing of which there are more parts to be perceived.

To be clear, perceiving the whole does not require perceiving each of its parts, either in sight or in touch. Perceiving an item requires that some of its parts and features are currently perceptible, but it does not require that all of them are. Thus, you can see the baseball even though some of its parts are hidden, and you can touch the baseball even though most of its parts do not contact your hand. Awareness of the whole does not require awareness of each of the parts.

So, vision reveals parts and the whole to which they belong; the same holds for touch. Cross-modal identification then requires taking a stand on whether or not the senses converge on a common whole. In the visuo-tactile case, this depends on the alignment of structures and features between senses. For instance, simultaneity, colocation, and matching surface configuration over time, in part as revealed by patterns of dependence among looks and haptic feels in perceptual exploration, drive intermodal identification. When successful, you multisensorily perceive a common unified whole to have visually and tactually accessible parts and features. Visuo-tactual objects are commonplace bodies extended in space, a subset of visible objects.

Next, take audio-visual perception. In the first instance, you hear sounds and sound streams. Objects for audition are temporally extended event-like individuals that appear to persist by perduring. They are structures of parts in time and in pitch space. On the other hand, you see spatially structured bodies that appear to persist by enduring. So, this is not just a difference in how the same objects appear. Vision and audition present different objects.

Look deeper. In hearing sounds, you can hear their sources. You can hear a collision or the vibration of an object. However, the sound is not heard merely to be a byproduct of the source. Sounds do not perceptually appear wholly distinct from such environmental happenings. Instead, audible environmental happenings, such as the grinding of gears, include sounds. A sound is perceived as a feature that belongs to such an occurrence. Since the sound is an individual, rather than a property, it is an individual part of the more encompassing event. The broader happening involves ordinary things and events, such as bells, vibrations, and gears; and it includes sounds. The sound of an event thus is like the auditory analog of an object’s visible surface. Just as an object’s surface fixes its visible appearance, an event’s sound fixes its audible appearance.Footnote 15

Humans also see events, such as cars’ colliding, hands’ clapping, or gears’ grinding. So, audition and vision can target the same environmental happenings. Nevertheless, audition and vision target events in the environment differently. Hearing happenings involves perceiving their sounds. Seeing happenings does not. Typically, instead, visible objects visibly participate in visible happenings. Generally, however, you do not hear events by or in hearing bodies as such to participate in them; instead, you hear what bodies do. Thus, from their differing modality-specific perspectives, vision and audition reveal differing perceptible features of common happenings in the environment.

Audio-visual perception sometimes not only converges upon a common object but also identifies a shared perceptible object as such. When it does, its perceptual object is an individual with a complex mereological structure. Typically, it involves a visual object’s participating in a happening that is perceptually identified with an audible happening that has an auditory object or a sound as a part. Audio-visual perception thus reveals a temporally extended, event-like individual with visible bodies as participants and sound streams as parts. The multisensory perceptual object is the broader, encompassing happening—the hands’ clapping, the wheels’ screeching, the tuba’s soloing—that you perceive audio-visually.

Not every part or feature of a complex audio-visual object is audible, and not every part or feature is visible. You cannot see the sound or its pitch, and you cannot hear the visible object or its color as such. This is not a barrier to the perceptibility of the whole. Perceiving a thing may require perceiving some of its parts and features, but perceiving a thing does not require perceiving all of its parts or features.Footnote 16 Each sense provides a partial perspective on the complex whole, revealing its differing aspects. Nevertheless, through the coordinated use of multiple senses, the mereologically complex individual becomes a common, unified target for perceptual awareness: a multimodal perceptual object. Some multimodal perceptual objects may only be accessible (or first accessible) as such through multisensory episodes.Footnote 17

Finally, consider flavor perception. Viewing objects for perception as mereologically complex individuals provides tools to resolve disputes about whether some perceptual capacity involves object perception. Flavor perception is a richly multisensory capacity. Perceiving the distinctive flavor of mint, chocolate, or chili depends on tongue-based taste, retronasal olfaction, and trigeminal somatosensation (see Smith 2015; Spence et al. 2015). On one hand, apparent flavor is complex. It involves attributes and aspects that are perceptible through taste, olfaction, and somatosensation. Thus, flavors are fully accessible only through multisensory perception. Nonetheless, flavors typically are unified. In tasting chocolate, the bitterness and cacao are aspects of a single perceptible flavor. Multiple senses work together to reveal a complex but unified gustatory profile.

Are flavors multisensory perceptual objects? This turns on whether they are mereologically complex individuals. The question is whether or not a flavor is a perceptible individual made up of perceptible parts with a certain sort of structure. I say, “No.” Flavors are attributes. Flavors do have a complex structure, including component features drawn from different senses and emergent multisensory characteristics. Nevertheless, flavors are just complex perceptible properties attributed to stuff in your mouth. That stuff also has perceptible texture, temperature, and shape in addition to its flavor. The item or substance is the object for multisensory perception—the mereologically complex individual. Among its attributes is flavor. And flavor is a repeatable. Thus, flavor is not a multisensory perceptual object. To be clear, perceptible flavor is an object of perception or intentional object in the sense of Sect. 2. But it is not an object for perception in the special sense reserved for the targets of object perception in contrast with feature or attribute perception.

We have canvassed three types of multisensory perception. Visuo-tactual object perception is uncomplicated. Vision and touch share objects of a common sort: spatially extended, bounded, cohesive, persisting rigid bodies. Vision and touch reveal differing parts and features of bodies, but in typical cases of multisensory coordination, they converge on and identify common individuals as such. Audio-visual object perception is more complex because perceptual objects for vision and audition differ in structure. Nevertheless, vision and audition do converge on common event-like individuals that involve bodies and have sounds. Multisensory audio-visual objects are hybrids with visible and audible features, participants, and parts that perceptibly belong to a whole with a complex spatio-temporal structure, perhaps only revealed in multisensory episodes. Finally, flavor perception as such is not multisensory object perception because flavors are not individuals. Instead, flavors are complex repeatable attributes perceived multisensorily. The relevant perceptual object is the stuff in your mouth to which flavor, in addition to texture, temperature, and shape, perceptibly belongs. We now have three strategies that provide templates for resolving questions about multisensory perceptual objects understood as structured mereologically complex individuals.

7 Objections and replies

Objection: Treating perceptual objects as mereologically complex individuals with differing structures is too permissive. It does not rule out much as a potential perceptual object. For instance, it admits houses, rainbows, holes, sounds, odors, flocks of birds, and melodies. What is the explanatory value of such an encompassing account?

Reply: The explanatory value is in having a general schematic account that captures what is common to all of these perceptible items, while ruling out a vast store of perceptible qualities, attributes, and features. Each of these items is a perceptually apparent bearer of features that persists through time and is distinguished from other items in its surroundings. This is an important role among the objects of perception. The explanatory task then is to characterize the differing features and structures of things perceived using our several sensory modalities, which we can sort into unisensory, common, and multisensory perceptual objects.

Objection: Mereologically complex individuals include both ordinary objects (material bodies), such as corks and cormorants, and events, such as uncorkings and cormorants’ cries. So, perceptual objects include object-like individuals and event-like individuals, which suggests there are object-like perceptual objects and event-like perceptual objects. Why call the event-like individuals “objects” for perception at all? Why not just speak about objects for perception and events for perception?

Reply: First, perceptual objects are the targets of a central group of perceptual capacities I’ve dubbed the object perception suite, and the object perception suite has as its objects mereologically complex individuals of various sorts. This suite’s targets are differentiated from their surroundings, have features and parts, and persist through time. They anchor perceptual attention, enable recognition and reidentification, and ground demonstrative thought. So, mereologically complex individuals play an important explanatory role in human perception across the senses.

Second, there might be no deep metaphysical difference between objects and events. Objects might just be boring events. If the difference between objects and events is merely apparent—for instance, if it is grounded in differing recognition criteria or perceptual approaches—then treating perceptual objects as mereologically complex individuals does not hypostatize the distinction.

Third, it may be that the distinction between object-like individuals and event-like individuals is explanatorily important in capturing perceptual differences. For instance, it could help to capture the difference between vision and audition to say that vision is object-based and audition is event-based. So, I would like to preserve the distinction between object perception and event perception. Nonetheless, I think both belong to a common, explanatorily illuminating type of perceptual capacity directed at mereologically complex individuals. The objects of this capacity typically have been labeled using determinates of the determinable, perceptual objects, such as visual objects, auditory objects, and olfactory objects. Still, the issue concerning what is properly called an “object” for perception is largely terminological. I am satisfied if the notion of a structured mereologically complex individual is perceptually important.

Objection: What if mereological nihilism is true? Then, there are no mereologically complex individuals, so mereologically complex individuals are imperceptible.

Reply: Mereological nihilism is counterintuitive. If nihilism is true, much more than how we should understand perceptual objects needs revision, so it should be unsurprising that we will need to revisit the intuitive claim that we perceive objects. Still, if nihilism is true, we may say that mereologically complex individuals are mere intentional objects or intentional inexistents, or that apparent awareness of perceptual objects is illusory or misleading. The account thereby helps explain in perceptual terms why the commonsense view is so intuitive.

Objection: What if the objects of perception are mind-dependent? If so, we never perceive mind-independent things and thus do not perceive complex individuals composed of mind-independent parts. This account seems committed to a strong realism about perceptual objects.

Reply: The account is designed to be neutral concerning the general metaphysical status of perceptual objects. Suppose the objects of perception all are mind-dependent sense-data or intentional inexistents. If so, perceptual objects are mereologically complex individuals composed of sense-data or mere intentional individuals. However, this account is not compatible with simple forms of adverbialism that do not capture perceptual awareness as of individuals.

8 Concluding remarks

Object perception involves a suite of perceptual capacities that constrains attention, guides reidentification, subserves recognition, and anchors demonstrative thought in distinctive ways. Such capacities include, for instance, distinguishing items from their surroundings, attributing features in a way that solves the many properties problem, and tracking items over time through change. Objects for perception—perceptual objects—are the targets on which such capacities converge.

Objects for perception are mereologically complex individuals. From sense to sense, their structures may differ. This provides a substantive conception of perceptual objects. It circumscribes the targets of object perception where neutral talk concerning intentional objects and objects of perception cannot. Still, it is more permissive than identifying perceptual objects with material bodies, which is too restrictive and leans too heavily on vision. This account places emphasis on the notion of an individual to which attributes and individual parts perceptibly belong.

This enables us to capture the role of objects in perception beyond vision. Visual objects in the first instance include familiar material bodies, ephemera like rainbows and holograms, and scattered things like flocks. Their structures are spatial, and they perceptually appear to persist by being fully present at each time rather than by having temporal parts. Tactual objects include only material surfaces and bodies. On the other hand, auditory objects include sounds and sound sources. Sound streams are structured in time and pitch space, and they perceptually appear to persist by having temporal parts rather than by being fully present at each moment at which they exist. Odors are olfactory objects, but human olfaction is a degenerate form of object perception. Strictly unisensory objects are perceptible only with one sense, but perceptual objects for a modality are not limited to its unisensory objects. So, objects for a modality may include common objects.

Multisensory perceptual objects are mereologically complex individuals with hybrid structure. Some of their parts and features are perceptible through one sense, and some are perceptible through another. Each sense provides a partial perspective on the whole. The complex whole is perceptible as such through the coordinated use of multiple senses. The key is that perceiving a whole does not require perceiving each of its parts or features. Visuo-tactile objects include the material bodies on which vision and touch converge—a subset of visible objects. Audio-visual objects are environmental happenings that involve bodies and include sounds. Flavors are complex, and they are only fully perceptible using multiple senses; however, flavors are properties, attributed to things we ingest, rather than individuals.

So, this account accommodates perceptual objects beyond vision, and it serves as a criterion to help settle whether or not some perceptual capacity is a form of object perception. It also provides a framework for explicating how perceptual objects and their structures differ across senses. Future work that elucidates the varieties of objects for multisensory perception and their differing structures will be a valuable advance in understanding perception’s grip on things.