1 Introduction

Noë and Thompson (2005) distinguish two groups of theories of visual perception: orthodox and heterodox theories. The orthodoxy states that visual perception is a process whereby the brain builds up detailed representations of the environment on the basis of the sensory inputs delivered by receptors (2005, p. 2). The most obvious example of orthodoxy is Marr’s (1982) computational approach, which characterizes visual perception as information processing at three distinct stages of representation, from the primal sketch to 3D representations (cfr. also Frisby and Stone 2007). In contrast with the orthodoxy, heterodox approaches usually reject or downplay the role of representations in visual perception. Paradigmatic examples of such theories are Gibson’s (1979) ecological optics, Ballard’s (1991) animate vision, and O’Regan and Noë’s (2001a, b) sensorimotor theory (SMT). Defenders of the sensorimotor theory maintain that visual perception is constituted by the active exercise of our sensorimotor skills, which obey a set of specific sensorimotor laws (cfr. Noë 2002, 2004, 2009a, b, 2012; O’Regan 2011, 2014; O’Regan and Noë 2001a, b, c; Bishop and Martin 2014).

The SMT has stirred much debate in recent years (e.g. Block 2005; Clark 2006; Flament-Fultot 2016; Hutto 2005; O’Regan and Block 2012). There have been contrasting reactions to some of the SMT’s claims. It is not clear whether the SMT marks a significant “departure from traditional computational functionalism” (the orthodoxy) or should rather be understood as an enrichment of it (Buhrmann et al. 2013, p. 1). Furthermore, although there have been some recent proposals to extend and apply this approach to robotics (e.g. Hoffman 2014; Maye and Engel 2011), the theory suffers from a severe limitation, due to the lack of any formal or operational definition of its key concepts (Buhrmann et al. 2013; Hoffman 2014). Finally, it is not clear which role the theory ascribes to representations. Although defenders of the SMT seem to admit representations, as I will show in the following pages, they also downplay their role in a way that makes it unclear why and to what extent the visual system relies on representation.

In this study, I argue that we can throw light on the contribution of the SMT in relation to the orthodox theories by elucidating its explanatory structure. While many aspects of the SMT have been the object of intense discussion, to my knowledge there is still no analysis of the kind of explanation of visual perception provided by the SMT. I will begin with an outline of the most important features of the SMT (Sect. 2). A central thrust in my line of argument is to exploit the similarity between the SMT and the dynamical hypothesis (Sect. 3). Indeed, just like proponents of the SMT, some defenders of dynamicism have proposed a radically different way of studying a cognitive system that does not rely on internal information-processing or representations (e.g. Chemero and Silberstein 2008; Van Gelder 1995, 1998; Van Gelder and Port 1995). This claim has fuelled controversy about the explanatory status of dynamical models (e.g. Bechtel 1998; Gervais 2015; Kaplan and Bechtel 2011; Kaplan and Craver 2011; Ross 2015). According to the mainstream view, dynamical models afford covering law or nomothetic explanations that depart from the inherently mechanistic framework that underlies computational or connectionist approaches to cognition (Zednik 2011). I will argue that the standard formulation of the SMT conforms to the framework of nomothetic explanation (Sect. 4). This, however, creates the problem of determining the role of representations within the cognitive system, and exposes the SMT to the “mere description worry.” A closer inspection of the SMT will show that the theory can be upgraded to a mechanistic framework, and that in this way it can eschew some of the problems of the standard formulation (Sect. 5).

2 An outline of the sensorimotor theory

The central notion of the sensorimotor theory (SMT) is that of “sensorimotor contingency,” which O’Regan and Noë define as: “[...] the sensory changes produced by various motor actions [...]” (2001a, p. 941). Of course, vision scientists have long been aware of the importance of sensorimotor regularities in shaping our conscious perceptual experience (e.g. Cliff 1991; McKay 1962). What is distinctive of the SMT, however, is the stronger claim that our perceptual experience is constituted by the active and lawful exercise of our sensorimotor skills: “[...] to see something is to interact with it in a way governed by the dynamic patterns of sensorimotor contingency characteristic of vision [...]” (Hurley and Noë 2003, p. 146). We can capture this claim in the following thesis:

  1. T1:

    Seeing is constituted by the active exercise of our sensorimotor skills.Footnote 1

When an object falls within the reach of the senses, it triggers a sensorimotor reaction in the organism. The exercise of our sensorimotor skills obeys a set of “sensorimotor laws” or “regularities.” It follows that the main task of proponents of the SMT is that of finding the laws that govern the sensorimotor contingencies: “[...] we must direct our investigations not on some ineffable inner event, but rather to the temporally extended activity itself, to the laws that govern this activity” (O’Regan and Noë 2001c, p. 80; my emphasis). According to the original formulation (O’Regan and Noë 2001a, b), the sensorimotor contingencies can be grouped into two categories. The first category is that of sensorimotor contingencies determined by the visual system, whereas the second category is specific to the visual attributes or “features” of the perceived items, such as colors and shapes. The former category of sensorimotor contingencies is “independent of any characterization or interpretation of objects” (O’Regan and Noë 2001a, p. 943), and is the fundamental level of visual sensation. The sensorimotor contingencies determined by visual attributes are specific to visual properties at a perceptual level and are related to the nature of the objects themselves (O’Regan and Noë 2001c, p. 88). Each of the two categories is governed by a distinct set of sensorimotor laws that modulate the corresponding motor outputs that constitute visual perception:

  1. T2:

    The exercise of sensorimotor skills obeys a set of sensorimotor laws.Footnote 2

On this formulation, the theory rejects the fundamental claim of orthodox theories of vision, according to which visual perception would be fully explained by computations or representations in the neural system.Footnote 3 Noë, for example, concedes that the perceptual states exhibit intentionality or aboutness (2012, p. 25), but he denies that perception is constituted by representations.Footnote 4 He clearly rejects the idea of explaining the feeling of perceptual presence by means of the notion of representation (2012, p. 30), and maintains that: “It doesn’t seem to us, when we see, that we represent environmental detail in our heads all at once in the way that detail can be present, all at once, in a picture” (2007, p. 242; cfr. also 2012, p. 31). This can be read as making a claim about perceptual experience: the personal level is not a representation of the external environment. However, in several other passages, both O’Regan and Noë do seem to admit the existence of representations in the cognitive system:

The claim is not that there are no representations in vision. That is a strong claim that most cognitive scientists would reject. The claim rather is that the role of representations in perceptual theory needs to be reconsidered (Noë 2004, p. 22)

[...] I have nothing against representations per se. Information from the outside world must be processed in the brain, and thus it must somehow be represented (O’Regan 2011, p. 64).Footnote 5

(cfr. also Noë 2002, p. 67; O’Regan 2011, p. 62, ft. 1; O’Regan and Noë 2001b, p. 1017; the notion of representation is also used frequently in Philipona et al. 2003). Although they seem to admit representations, sensorimotor theorists are suspicious of the very notion of representation (e.g. O’Regan 2011, p. 64). Indeed, what they seem to object to in the orthodoxy are two specific claims. The first is the claim that all we need in order to explain perceptual presence is a process whereby the brain constructs representations of the external world. The second is the unnecessary postulation of static, photographic, or picture-like representations (Noë calls this the “snapshot conception,” 2004, pp. 35ff; O’Regan speaks of “postcard representations,” 2011, p. 41; cfr. also Sect. 5.2). As they specify, to think that vision is some form of richly detailed static pictorial representation of the environment is a form of Cartesian materialism (Dennett 1991), i.e. a way of conceiving the mind as a stage where completed representations can be shown to an internal spectator. The problem with Cartesian materialism, in its various forms, is that it simply pushes the mystery of conscious visual perception to a specific brain region where the “magic” happens (for analogous considerations, cfr. Pessoa et al. 1998). The SMT’s non-representationalism is formulated in the next thesis:

  1. T3:

    Perception is not constituted by static pictorial representations of the environment.Footnote 6

The SMT, as we will see (Sects. 4.1, 5.2), lays emphasis on the explanatory role of sensorimotor laws rather than on the generation of static internal representations, and rejects static representations in virtue of the active character of perception.

The notion of “activity” plays a central role in the SMT: the perceiver is an agent in a dynamical context (Noë 2006). In various publications, both Noë and O’Regan urge that the perceiver should be understood as an agent:

It is thus only in the context of an animal’s embodied existence, situated in an environment, dynamically interacting with objects and situations, that the function of the brain can be understood (Noë 2009b, p. 65).

...seeing involves actively interacting with the world. (O’Regan 2011, p. 41).

Over the years, Noë has proposed different names for the theory: in collaboration with Susan Hurley (Hurley and Noë 2003) the theory was called “dynamic sensorimotor approach,” whereas more recently he calls it “actionism” (Noë 2012, p. 23). We can enucleate this claim in the following thesis:

  1. T4:

    The perceiver is an agent that is part of a dynamical system.

In order to make an agent consciously visually aware of an object two conditions must be met. The first condition is that the agent’s visual perception must actively exercise her knowledge of the sensorimotor laws or, to put it in other terms, visual perception only occurs “when the organism masters what we call the governing laws of sensorimotor contingency” (O’Regan and Noë 2001a, p. 939, 2001c, p. 82). O’Regan and Noë urge that the knowledge involved in the exercise of our motor skills is not a form of intellectual or propositional knowledge, but is instead a practical knowledge, a form of know-how (Noë 2004, p. 11, pp. 117–122, 2005a, 2012, pp. 147–151; Silverman 2017). The second condition is that there must be an item in the environment that triggers the perceiver’s sensorimotor reactions (Noë 2005b, 2012, p. 25). In this sense, the sensorimotor theory is a form of disjunctivism (McDowell 1982), the claim that genuine perceptual states are different in kind from merely hallucinatory states. This second condition also lays bare the theory’s vehicle externalism. In Noë’s words: “According to active externalism, the environment can drive and so partially constitute cognitive processes. [...]. The mind reaches [...] beyond the limits of the skull” (2004, p. 221, also 2009b, pp. 67–95). The SMT’s commitment to externalism dovetails nicely with the theory’s non-representationalism: the job of the brain is not that of generating static internal representations (T3), but rather “[...] that of facilitating a dynamic pattern of interaction among brain, body, and world. Experience is enacted by conscious beings with the help of the world” (2009b, p. 47). This leads us to the following thesis:

  1. T5:

    Cognitive processes are partly constituted by the environment.

There is much more to be said, of course, but the five theses introduced so far present the theoretical nucleus of the SMT. It is noteworthy that defenders of the SMT usually bring forward their theses by showing the implausibility of the alternative options. Thus, for example, T1 is never explicitly argued for, but its strength follows from a number of other assumptions, in particular the rejection of picture-like representations.

The SMT purports not only to provide a phenomenologically adequate description of our perceptual experience, but also “to offer an explanation of visual consciousness” (Noë 2004, p. 226) with robust empirical support. In this study, I will focus exclusively on the explanatory capacity of the SMT. I maintain that the SMT is a form of nomothetic dynamical theory of visual perception, and that this explanatory structure exposes the theory to some challenges. In order to substantiate this claim, I will now turn to the nature of dynamical system theory.

3 Dynamical system theory and the dynamical hypothesis

Dynamical system theory (DST) is based on the notion of “dynamical system:” a mathematical description of how things change with time (Hotton and Yoshimi 2011). Such systems take the form of models that are expressed by means of differential equations:

[A] typical dynamical model is expressed as a set of differential or difference equations that describe how the system’s state changes over time. Here, the explanatory focus is on the structure of the space of possible trajectories and the internal and external forces that shape the particular trajectory that unfolds over time, rather than on the physical nature of the underlying mechanisms that instantiate this dynamics (Beer 2000, p. 96)

Hotton and Yoshimi (2011) define a dynamical system as a function of the form \(\phi : S \times T \rightarrow S\) (where \(S\) is the set of states of the system and \(T\) the set of times) that satisfies the following properties:

  • There is a time \(t_{0} \in T\) such that, for all states \(s_{0} \in S\), \(\phi(s_{0}, t_{0}) = s_{0}\).

  • For all states \(s_{0} \in S\) and all times \(t_{1}, t_{2} \in T\), \(\phi(s_{0}, t_{1} + t_{2}) = \phi(\phi(s_{0}, t_{1}), t_{2})\).
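As a minimal sketch of these two properties, consider a hypothetical one-dimensional system with exponential decay, \(\phi(s, t) = s e^{-kt}\). This toy system is not taken from Hotton and Yoshimi (2011); the decay rate `K` and the function names are assumptions introduced only for illustration. The check below confirms numerically that this \(\phi\) satisfies both the identity property (with \(t_{0} = 0\)) and the composition property.

```python
import math

K = 0.5  # hypothetical decay rate, chosen only for illustration


def phi(s: float, t: float) -> float:
    """Flow of the toy system ds/dt = -K * s: the state reached from s after time t."""
    return s * math.exp(-K * t)


def satisfies_identity(s0: float, t0: float = 0.0) -> bool:
    """Property 1: at time t0 = 0 every state is mapped to itself."""
    return math.isclose(phi(s0, t0), s0)


def satisfies_composition(s0: float, t1: float, t2: float) -> bool:
    """Property 2: evolving for t1 + t2 equals evolving for t1 and then for t2."""
    return math.isclose(phi(s0, t1 + t2), phi(phi(s0, t1), t2))


if __name__ == "__main__":
    for s0 in (-2.0, 0.0, 1.0, 3.5):
        print(s0, satisfies_identity(s0), satisfies_composition(s0, 0.7, 1.9))
```

Nothing in the sketch mentions representations or mechanisms: the function only says how states evolve, which is all the definition requires.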

The first property says simply that there is a time \(t_{0}\), the present moment, at which each state \(s_{1}, s_{2}, \ldots, s_{n}\) of the system is mapped to itself. The second property says that future states are uniquely determined by the present state (ibid., p. 446). In this sense, dynamical systems are not opposed to more traditional connectionist and computational approaches, and they do not imply a rejection of representations (cfr. Sect. 5). As an example of a dynamical model, we can describe Thelen et al.’s (2001) explanation of the A-not-B error. In this famous experiment, 8- to 10-month-old infants are placed in front of two containers A and B. A small toy is hidden in container A, and the infants correctly reach repeatedly for the toy until they are habituated to its presence. Then, an experimenter hides the toy in container B in plain view. Yet, in spite of the fact that they have seen the toy being hidden in container B, infants will reach for container A. Piaget’s classical explanation was that it is not until they are 12 months old that infants are able to construct reliable mental representations of the perceived objects. Before that, their actions are primarily guided by motor routines. Thelen et al. (2001) called into question the standard explanations: “The A-not-B error is not about what infants have and don’t have as enduring concepts, traits, or deficits, but what they are doing and have done” (ibid., p. 4). They designed a dynamic field model (ibid., pp. 16–20) that traces the evolution of activation levels in the dynamic field as a function of different types of inputs: environmental inputs, task-specific inputs, and memory inputs. However, the model was “neutral as to an anatomical instantiation in the central nervous system; it is a model of the behavioral dynamics” (ibid., p. 28), and it only “captures an integrated behavioral outcome” (ibid., p. 31).

Some proponents of DST in cognitive science (e.g. Beer 2000; Port and Gelder 1995; Van Gelder 1995, 1998), however, argue that the best way to study the human mind is not in terms of information-processing, but with the mathematical tools of dynamical modeling (Van Gelder 1998). This is known as the dynamical hypothesis in cognitive science. Defenders of the dynamical hypothesis emphasize that the DST provides a new research paradigm, in contrast to the traditional computationalist and connectionist approaches in cognitive science. The hallmark of the dynamical hypothesis is the rejection of the computer metaphor of the mind, and therefore of computational and connectionist approaches to the study of cognition. Cognition, it is claimed, is not a process of computing over static, discrete mental representations (Chemero 2000, p. 634; Van Gelder 1998, p. 622). Using the mathematical tools of DST, we may be able to describe or explain (cfr. Sect. 4) the behavior of the target system. In what is widely regarded as the “manifesto” of the dynamical hypothesis, Van Gelder and Port put forward the following claims:

The cognitive system is not a computer, it is a dynamical system. It is not the brain, inner and encapsulated; rather, it is the whole system comprised of nervous system, body, and environment. [DH4] The cognitive system is not a discrete sequential manipulator of static representational structures [DH3]; rather, it is a structure of mutually and simultaneously influencing change. Its processes do not take place in the arbitrary, discrete time of computer steps; rather, it unfolds in the real time of ongoing change [DH1][...] The cognitive system does not interact with other aspects of the world by passing messages or commands; rather, it continuously coevolves with them [DH2]... (Van Gelder and Port 1995, pp. 2–3; my emphases).

According to proponents of the dynamical hypothesis, the novelty of dynamicism is reflected in its explanatory structure. Dynamical systems afford covering-law or nomothetic explanations of non-decomposable systems, whereas defenders of the “orthodoxy” espouse mechanistic explanations (Zednik 2011). I will elaborate on the concepts of covering law and mechanistic explanation in the next sections (Sects. 4–5); for now, it suffices to say that the covering law model achieves scientific explanation of an explanandum by subsuming it under one or more laws of nature plus antecedent conditions. In a dynamical system, the role of laws is taken up by the differential equations that are meant to support counterfactuals (Bechtel 1998, p. 311; Clark 1997, pp. 117–120; Walmsley 2008): they tell us what would happen to the system if things (parameters and variables) had been different. By means of the equations we can “predict and explain subsequent states of the system” (Bechtel and Abrahamsen 2002, p. 267). We can call this the fifth thesis of the dynamical hypothesis:

  1. DH5:

    A dynamical system obeys a set of specific dynamical laws.

Notice that the dynamical hypothesis is a specific philosophical interpretation of DST. Hence, theses DH1–5 are characteristic only of the dynamical hypothesis, rather than of DST in general. The reader will have recognized the similarities between DH1–5 and the theses of the sensorimotor theory (T1–5) (Table 1).

Table 1 The dynamical hypothesis and SMT’s theses

Since the SMT and the dynamical hypothesis share the same theoretical commitments, I claim that the SMT is a version of the dynamical hypothesis. This may cause some confusion, so it needs to be spelled out. The dynamical hypothesis is not a scientific or mathematical approach; it is a philosophical theory or, better, a philosophical interpretation of DST. The original formulation of the SMT due to O’Regan and Noë (2001a, b) does not provide mathematical definitions for its key concepts, as we have seen, but the assumptions and terminology suggest not only that the theory is consistent with the theoretical approach of DST, but also that it can be upgraded into a full-blown DST model (Sect. 4.1). This much suffices for my argument. Both proponents of the standard formulation of the SMT and proponents of the dynamical hypothesis contend that their targets are explained by means of covering law explanations. I will exploit the similarity between the two to show that the problems that beset the dynamical hypothesis apply to the standard formulation of the SMT as well.

4 The explanatory structure of the standard SMT

By “standard formulation of the SMT” I simply mean the current form of the SMT, in contrast with the mechanistic version that I put forward in Sect. 5. I will first show that the SMT conforms to a covering-law or nomothetic model of explanation (Sect. 4.1), and then I will show that this exposes the SMT to the mere description worry (Sect. 4.2).

4.1 A nomothetic explanation

As we have seen, O’Regan and Noë argue that the sensorimotor contingencies obey a set of sensorimotor laws (T2). The concept of sensorimotor law is not explicitly defined in the original paper. However, it is possible to form a better idea of the nature of these laws by discussing the two examples advanced in O’Regan and Noë (2001a): eye rotation for the category of sensorimotor contingencies related to the visual system, and visual shape for the category of sensorimotor contingencies related to specific features.

When we rotate our eyes, the stimulations on the retinas are altered in a lawful way determined by the size of the eye movement, the shape of the retina and the nature of ocular optics (O’Regan and Noë 2001a, p. 941). As the eye moves by voluntary control or simply by saccadic movements, the distal stimulus of a straight line is distorted in such a way as to describe a greater or smaller arc. The alteration of the stimulus on the retina depends not only on the eye rotation, but also on the structure of the retina:

When the line is looked at directly, the cortical representation of the straight line is fat in the middle and tapers off to the ends. But when the eye moves off the line, the cortical representation peters out into a meager, banana-like shape [...] (2001a, p. 941, my emphases).Footnote 7

Alterations of the stimulus and the consequent sensorimotor response would be constrained by different structural laws that are specific to the visual apparatus (cfr. also Noë 2004, pp. 107ff). No law is explicitly mentioned in the text, but we can arguably advance a few suggestions about the nature of these sensorimotor laws: simple mechanical laws governing eye rotation and optical regularities.

The example of visual shape is an instance of the second category of sensorimotor contingencies that are specific to perceptual features. In a dense passage, O’Regan and Noë argue that shape perception would be «the set of all potential distortion that the shape undergoes» (O’Regan and Noë 2001a, p. 942) when we move in relation to the object or when it is the object itself which moves in relation to us. From these movements, the brain would abstract a set of laws that code shape perception. Shape perception would depend on the laws abstracted by the variations produced by body movements. To illustrate their point, the authors discuss the case of perceptual restoration by surgical intervention on patients born with congenital cataract. Helmholtz for example (O’Regan and Noë 2001a, p. 942) cites the case of a patient who, after visual restoration, feels surprise when first observing that a coin seems to change shape when rotated. According to O’Regan and Noë, the “surprise” felt by the patient is due to her new capacity, enabled by the surgical intervention, to abstract specific laws that govern shape distortion.

In the words of O’Regan (2011, p. 127), the quality of a perceptual state is given by the laws that govern the specific sensorimotor skills. It follows that, in order to explain how an agent perceives, we must find the relevant sensorimotor laws. Before I further elaborate on this suggestion, I will briefly discuss Buhrmann et al.’s (2013) dynamical model of the SMT.

Buhrmann et al. (2013) remark that many of the SMT’s concepts are unclear, and that this may lead to “practical uncertainty at the time of designing an experiment or modeling the behavior of a robot” (p. 2). O’Regan and Noë (2001a, b) have developed a philosophical theory, which is consistent with the theses of the dynamical hypothesis, but they did not provide mathematical formulations of their concepts, without which no experiment or scientific investigation can be set up. Buhrmann and his colleagues have thus filled in this gap. The result of this operation, it is claimed, provides also several theoretical insights, for example bringing into clearer view the similarities and differences between the sensorimotor theory and ecological psychology (pp. 11–14).Footnote 8

The first step is to define the organism and its environment as a coupled dynamical system that can be described by a set of differential equations. The environment is described by a function E that assigns changes in the values of an environment state e to each agent’s body position or configuration p in the world, taking also into account its own independent dynamics:

$$\dot{\mathbf{e}} = E(\mathbf{e}, \mathbf{p}) \tag{1}$$

The position vector p describes the body configuration of the agent in relation to its environment. A set of sensors S transforms the environmental states e into sensory states s that modulate the agent’s internal state a.

$$\dot{\mathbf{s}} = S(\mathbf{e}, \mathbf{a}) \tag{2}$$

The sensor states s also depend on internal factors a. The efferent movement-producing signals m are functions of the internal state, and activate effectors in the agent’s body B that in turn lead to changes in body configuration p (here I follow Buhrmann et al.’s description, 2013, pp. 3–4).

$$\dot{\mathbf{a}} = A(\mathbf{a}, \mathbf{s}) \tag{3}$$

$$\dot{\mathbf{m}} = M(\mathbf{a}) \tag{4}$$

$$\dot{\mathbf{p}} = B(\mathbf{m}, \mathbf{e}) \tag{5}$$

Equation (1) describes the agent-environment coupling, (2) the agent’s sensory dynamics, (3) and (4) the internal dynamics, and (5) the body dynamics. Concluding their study, Buhrmann and colleagues say that the four kinds of sensorimotor structures they distinguish present various relevant regularities, captured by the equations, and that «[t]hese regularities [...] are the “laws” or “rules” of [sensorimotor contingencies] that form the basis of» the SMT (p. 14). Not only would the dynamical model capture Noë’s suggestion that «[b]rain, body, and world form a process of dynamic interaction» (2009b, p. 95); Buhrmann et al. have also refined the SMT, extending the categories of sensorimotor contingencies to four.
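To make the coupling expressed by Eqs. (1)–(5) more tangible, the following sketch integrates the five equations with a simple forward Euler scheme. It treats every state as a single number rather than a vector, and the concrete forms of `E`, `S`, `A`, `M`, and `B`, together with all coefficients, the step size, and the initial state, are hypothetical linear stand-ins chosen for illustration; they are not given by Buhrmann et al. (2013).

```python
# Hypothetical linear stand-ins for the functions in Eqs. (1)-(5).
# Only the argument structure (which state depends on which) follows the text.

def E(e, p):  # Eq. (1): environment dynamics, depends on e and body position p
    return -0.1 * e + 0.5 * p


def S(e, a):  # Eq. (2): sensor dynamics, depends on e and internal state a
    return e - 0.2 * a


def A(a, s):  # Eq. (3): internal dynamics, depends on a and sensor state s
    return -a + s


def M(a):  # Eq. (4): motor signal dynamics, depends on a only
    return -0.5 * a


def B(m, e):  # Eq. (5): body dynamics, depends on motor signal m and e
    return m - 0.1 * e


def simulate(state, dt=0.01, steps=1000):
    """Integrate the coupled system with forward Euler; returns a list of (e, s, a, m, p)."""
    e, s, a, m, p = state
    trajectory = [(e, s, a, m, p)]
    for _ in range(steps):
        # Evaluate all derivatives at the current state before updating anything.
        de, ds, da, dm, dp = E(e, p), S(e, a), A(a, s), M(a), B(m, e)
        e, s, a, m, p = (e + dt * de, s + dt * ds, a + dt * da,
                         m + dt * dm, p + dt * dp)
        trajectory.append((e, s, a, m, p))
    return trajectory


if __name__ == "__main__":
    final = simulate((1.0, 0.0, 0.0, 0.0, 0.0))[-1]
    print("final (e, s, a, m, p):", final)
```

The point of the sketch is structural: no variable can be updated without consulting another, so the agent and its environment evolve as one closed loop rather than as an input-output pipeline.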

The first kind of sensorimotor contingency is the sensorimotor environment, and it refers to the set of all possible sensory dependencies on motor states (s, m) for a particular type of agent and environment considered independently of the agent’s internal dynamics (Buhrmann et al. 2013, p. 4). Think for example of how rotations of the head lead to lawful changes in the optic flow on the retina, like expanding when one moves forward, or contracting while moving backwards (cfr. the example of the eye rotation discussed earlier in this section). The second kind of sensorimotor contingency is the sensorimotor habitat: the set of all sensorimotor trajectories traveled by a closed-loop agent for a range of values, taking into account the evolution of internal states a. The regularities of the sensorimotor environment constrain, but do not determine, the regularities or laws of the sensorimotor habitat. The first two categories of sensorimotor contingencies are independent of the agent’s functional context. The third category of sensorimotor contingencies is related to regular patterns that play a crucial role in task performance. Buhrmann et al. (2013, p. 5) call these stable patterns of task-related activities sensorimotor coordination. These contingencies are “determined by a dynamical analysis of the agent within the context of a given task performance” (ibid.), and often play an important role for task performance in the area of autonomous robotics (e.g. Beer 2003). Finally, the last category is that of sensorimotor strategies: the organization of sensorimotor coordination patterns regularly used by agents because they have been evaluated as preferable for achieving a particular goal (ibid.).

Time to take stock. The examples discussed so far point to a covering-law model of explanation similar to the well-known deductive-nomological model (DN) (Hempel and Oppenheim 1948; Salmon 1989). According to the DN model, explanations are deductive arguments in which the explanandum phenomenon figures as the conclusion (in our case, the agent’s visual perception of the object). Among the premises, there must be at least one law of nature plus some antecedent conditions. Consider a simple example. A DN explanation of the fall of a body is achieved by specifying some antecedent conditions (such as the height from which it falls and the body’s mass and structure) plus the law of gravity. Given these premises, we can deduce the explanandum phenomenon, i.e. the fall of a body. Thus, the DN model bestows a central explanatory role on laws of nature. Proponents of the SMT do indeed stress the role of laws, and the necessity of finding out these laws in order to understand how the activity of perceptual experience unfolds. As we have seen (Sect. 2), this is clearly expressed in the words of O’Regan and Noë: “[...] we must direct our investigations not on some ineffable inner event, but rather to the temporally extended activity itself, to the laws that govern this activity” (O’Regan and Noë 2001c, p. 80; my emphasis). The example of the covering law explanation of the fall of a body is echoed in the words of O’Regan: “Like the law of gravity that describes how objects fall, these laws [the sensorimotor laws] describe how changes made by our body provoke changes in the information coming into our sensors” (2011, p. 157). In this and the subsequent passages, O’Regan talks about the brain as deducing features of the outside environment:

Suffice it here to say that it is possible to deduce things about the structure of outside physical objects by studying the laws relating movements that an organism makes to the resulting changes in sensory input.Footnote 9 (O’Regan 2011 p. 45, my emphases)

However, just as the brain deduces things about outside items, so do the scientists who, in order to explain vision from a sensorimotor standpoint, build algorithmic models with which we can deduce characteristics of the external environment (cfr. Philipona et al. 2003). The explanatory structure of the standard SMT may thus be described as follows:

  • LP1: Sensorimotor Laws of the Visual Apparatus.

  • LP2: Sensorimotor Laws of Visual Attributes.

  • AP1: Target object O.

  • AP2: Standpoint of the agent.

    • C. Conscious visual perception of O.

LP1-2 and AP1-2 are premises, from which the conclusion C can be deduced. LP1-2 are the two sets of sensorimotor laws corresponding to the two categories of sensorimotor contingencies. Antecedent conditions are specified in AP1-2. These include the presence of an object within the reach of the senses and the standpoint of the agent. (I borrow the term “standpoint” from Campbell (2009), who uses it to refer to various factors, such as the sense modality involved, the relative orientation of the agent, its distance from the object, etc.; cfr. also Philipona et al. (2003), especially the mathematical appendix, where several of these conditions are mentioned, among others: apertures of diaphragms, position of the light, and the Euler angles for the orientation of the eye.) The scheme is modeled on O’Regan and Noë’s (2001a, b) version of the SMT, but it can easily be extended to accommodate Buhrmann et al.’s (2013) four categories of sensorimotor contingencies, where the differential equations would play the role of laws.
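
The deductive pattern shared by the falling-body example and the scheme above can be written out explicitly. What follows is only an illustrative sketch: the specific law (free fall from rest, neglecting air resistance), the symbols h, g and t, and the numerical values are assumptions introduced here for the sake of the example and are not drawn from the SMT literature.

$$\begin{aligned} &\text{Law:}&& h = \tfrac{1}{2}\, g\, t^{2} \\ &\text{Antecedent conditions:}&& h = 20\ \mathrm{m},\quad g = 9.81\ \mathrm{m/s^{2}} \\ &\text{Explanandum (deduced):}&& t = \sqrt{2h/g} \approx 2.02\ \mathrm{s} \end{aligned}$$

On the reading proposed above, LP1-2 occupy the place of the law, AP1-2 the place of the antecedent conditions, and C the place of the deduced explanandum.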

At this juncture, one worry concerns the status of the sensorimotor laws. The foregoing discussion has shown that O’Regan and Noë are evasive when it comes to clarifying their nature. One potential objection is that the so-called sensorimotor laws are nothing but mere regularities, and since only laws may play an explanatory role, they cannot provide nomothetic explanations. Indeed, the existence and status of laws in biology and psychology is a matter of debate (e.g. Dorato 2012), but the problem can easily be sidestepped by adopting a pragmatic perspective. Woodward (2001), for example, states that the problem can be bypassed if we “focus directly on the question of whether the generalizations of interest are invariant in the right way” (p. 6). Similarly, Mitchell (2000) maintains that biological generalizations are less stable than physical laws, but that they can nonetheless provide causal knowledge and be used to predict, explain, and guide interventions. Woodward and Mitchell disagree on how exactly to characterize such regularities: Woodward emphasizes the notion of invariance under interventions, whereas Mitchell focuses on the degree of stability (for a discussion, cfr. Woodward 2003, pp. 295–307). For our purposes, it suffices to note that characterizing the sensorimotor laws as mere regularities does not pose a significant challenge to the nomothetic structure of the SMT.

4.2 The mere description worry and the role of representations

The idea that dynamic systems provide covering law explanations is called the “mainstream view” by Zednik (2011). The SMT, as we have seen, can easily be accommodated within the nomothetic framework of explanation, both in its “pure” philosophical form and in the dynamical model, independently of how we conceive the sensorimotor laws (as strict regularities or as admitting exceptions). In this section, I will show that it is precisely this nomothetic explanatory structure that apparently justifies the SMT’s non-representationalism. However, it also exposes the SMT to the “mere description worry,” a well-known drawback of covering law explanations.

Let’s start with representations. Notice that reference to representations does not play any significant explanatory role in either the standard SMT or the dynamical hypothesis. This is a feature shared by most dynamical models of cognition: they remain neutral about the actual structure (Clark 1997, p. 118), and (often) make no explicit commitment to representations. Beer makes this clear when he says that whereas computational and connectionist models place the explanatory focus on representations, dynamical models provide a characterization of the internal states that “does not necessarily have any straightforward interpretation as a representation of an external state of affairs. Rather, at each instant in time, the internal state specifies the effects that a given perturbation can have on the unfolding trajectory” (2000, p. 97). Indeed, the nomothetic structure of the SMT does not require any explicit reference to representations, thus making them explanatorily irrelevant (cfr. Chemero 2000; Van Gelder 1995, p. 352).Footnote 10\(^,\)Footnote 11 Both the DST and the SMT take representations to be explanatorily irrelevant, and they both endorse a form of externalism. Both approaches put the explanatory burden on laws or regularities, rather than on representations or internal states. Since representations are (allegedly) explanatorily irrelevant for both the DST and the SMT, we can interpret the latter’s non-representationalism as a form of epistemological anti-representationalism (Chemero 2009, p. 67; cfr. also Chemero 2000). However, as we have seen (Sect. 2), proponents of the SMT seem to assume that the cognitive system does represent the world, and the literature produced in light of the SMT contains many references to representations. But if there are representations, we face the following problem. If representations are explanatorily irrelevant for visual perception, then what exactly is their role?
If we assume that biological agents are the product of evolution, the capability of forming representations requires sophisticated cognitive machinery that is able to capture relevant features of the environment and assemble representations. It would be puzzling if such costly machinery had evolved without making any contribution to perception. Proponents of the SMT, as we have seen, call for a reconsideration of the role of representations; however, they have not clearly spelled out the relation between perceptual experience and representations.

Another, more serious problem with the nomothetic model endorsed by the standard SMT is the mere description worry. Defenders of the dynamical hypothesis usually say that, since dynamical models allow for testable predictions, the regularities we rely on are explanatory (Van Gelder 1998, p. 625; cfr. also Chemero and Silberstein 2008; Walmsley 2008). Yet, it is generally agreed that prediction does not suffice for explanation. For example, the Ptolemaic system can deliver reliable predictions of the positions of planets in the sky, but it does not explain why they move in this way (Craver 2006). In general, merely predicting and describing the behavior of a system by means of generalizations or laws can be very useful for a variety of purposes (e.g. Craver 2006; Hochstein 2013), but it is insufficient for an explanation. The reason is that phenomenological models, i.e. models that merely describe the observable behavior of the target system but refrain from postulating the hidden causes behind it (e.g. Craver 2006, p. 358; Frigg and Hartmann 2009), can afford only a limited number of predictions.Footnote 12 Phenomenological models, as I said, can play a variety of helpful roles in scientific investigation, but they are not explanatory. One way to account for the observed regularities and ground the predictions in a robust explanatory framework is to show that the behavior of the system results from the coordinated activity of underlying mechanisms (Andersen 2011). Another way to ground the predictions in an explanatory framework is to hold that the observed regularities, or laws, are themselves explanatory. However, opting for the latter strategy exposes a model or theory to the mere description worry, a well-known drawback of the nomothetic model of explanation (for a concise discussion, cfr. Craver 2007, pp. 34–40). In short, the problem is that it is unclear why the laws or regularities apply in the first place (Cummins 2000).
One way to resist the mere description worry, while rejecting a mechanistic account, is to stipulate that a covering law account qualifies as a genuine explanation. However, in the words of Zednik: “As long as dynamical explanation is viewed as a form of covering-law explanation [...] the mere description worry looms” (2011, p. 246; cfr. also Gervais 2015).

As we have seen, the standard formulation of the SMT is construed as a search for laws, and hence the mere description worry also looms for the SMT. Mechanisms do provide a way to distinguish between merely phenomenological and genuinely explanatory regularities. Accordingly, in the next section, I will articulate a mechanistic interpretation of the SMT and show that it can reconcile the SMT with the orthodoxy.

5 Towards a mechanistic SMT

Although the mainstream view of dynamical explanation dictates that it conforms to a nomothetic model, recent debates have shown that at least some dynamical models are mechanistic (e.g. Zednik 2011; Gervais 2015). In contrast with covering-law explanations, mechanistic explanations are “how” explanations: they show why a phenomenon occurred by exposing how operating parts arranged in a particular way jointly produce the explanandum (e.g. Bechtel 2008; Craver 2007; Glennan 1996; Machamer et al. 2000; Miłkowski 2013). It should be noted that no one doubts that mechanisms explain; the central issue in the debate about dynamicism is whether non-mechanizable dynamical models are also explanatory (e.g. Kaplan and Craver 2011). Some researchers respond in the affirmative. For example, Ross (2015) argues that Ermentrout and Kopell’s canonical model does not meet Kaplan and Craver’s (2011) 3M requirement (cfr. Sect. 5.1), although it conforms to Batterman’s minimal model explanation. Moreover, it is a matter of debate whether all explanatory regularities are such in virtue of underlying mechanisms (e.g. Andersen 2011; Leuridan 2010).

I will not try to settle the grand debate about whether dynamical models are explanatory even when non-mechanizable. My purpose is more modest. Since everybody agrees that mechanisms explain, I will focus on the SMT and show that it is compatible with a mechanistic approach (Sect. 5.1).

5.1 Mechanizing the SMT

Researchers who espouse mechanistic explanations consider law-like regularities as effects that are themselves in need of explanation (Cummins 2000). Such effects are explained by describing how the mechanism(s) responsible for them generate them. By identifying the mechanism(s), one can not only identify spurious generalizations, but also account for the fact that some “generalizations are explanatory because they describe the causal relationships that produce, underlie, or maintain the explanandum phenomenon” (Kaplan and Craver 2011, p. 612).

There are different concepts of mechanism in the literature, and the very definition of “mechanism” is the object of some controversy, but for my purpose Bechtel’s definition will do: “[A] structure performing a function in virtue of its component parts, component operations, and their organization. The orchestrated functioning of the mechanism is responsible for one or more phenomena” (2008, p. 13; cfr. also Bechtel and Abrahamsen 2005). As Bechtel and Richardson (2010) show, mechanisms are discovered mainly thanks to the heuristics of decomposition and localization. The former consists in either structural decomposition (the discovery of the mechanism’s working parts) or functional decomposition (the decomposition of a complex behavioral phenomenon into a series of simpler behaviors). The heuristic of localization consists in pairing the relevant operations with the corresponding working parts. Of course, the process of discovering and describing a mechanism responsible for a given phenomenon is usually rather complex, as the system under investigation may admit no simple decomposition.

What is controversial is not whether mechanisms explain, but whether dynamical models can provide explanations that are not reducible or convertible to mechanistic explanations. Kaplan and Craver’s (2011) main contention is that “[d]ynamical models do not provide a separate kind of explanation subject to distinct norms. When they explain phenomena, it is because they describe mechanisms” (p. 618). As for the SMT, the mere description worry (Sect. 4.2) looms as long as the standard formulation is cast in terms of a purely nomothetic explanation. However, since one of the primary virtues of the mechanistic view of explanation is that “it neatly dispenses with several well-known problems of predictivism” (Kaplan and Craver 2011, p. 606), showing that the SMT can be mechanized would provide the theory with the means to overcome some obstacles, and also throw light on some interesting developments. So, how can we show that the SMT is compatible with a mechanistic framework of explanation? A useful resource is provided by Kaplan and Craver’s (2011) 3M criterion:

(3M) In successful explanatory models in cognitive and system neuroscience (a) the variables in the model must correspond to components, [operations], properties, and organizational features of the target mechanism that produces, maintains, or underlies the phenomenon and (b) the (perhaps mathematical) dependencies posited among these variables in the model correspond to the (perhaps quantifiable) causal relations among the components of the target mechanism. (Kaplan and Craver 2011, p. 611).Footnote 13

The 3M criterion is designed for cognitive and system neuroscience, but as the authors say, it may easily be extended to other domains of cognitive science (cfr. Kaplan and Bechtel 2011). To reiterate an important point, what is at stake is not whether dynamical models explain, but rather in virtue of what norms or explanatory framework they achieve explanations. I will therefore apply the 3M criterion to the foregoing examples of sensorimotor laws. If the SMT conforms to the 3M criterion, we will be able to show not only that it can provide explanations, rather than mere phenomenological regularities subject to the mere description worry, but also that the SMT represents an interesting complement to the orthodox approaches (cfr. Sects. 1–2). I will start with Buhrmann et al.’s (2013) dynamical model of the SMT.

In Sect. 4.1 I described Buhrmann et al.’s dynamical model. Later in their article, they discuss in more detail a minimal agent built following the tradition of minimal cognition models developed by Beer (e.g. 2003) and the Sussex school (e.g. Harvey et al. 1997). Specifically, they introduce a minimal model of active categorical perception represented below (Fig. 1). Categorical perception refers to the activity of partitioning the world into distinct objects with distinctive properties. The continuous signals received by the sensory organs (natural or artificial) are sorted into discrete categories whose members stand in some resemblance relations (for further references on categorical perception, see Beer 2003, p. 210). In Buhrmann et al.’s model, the agent can move horizontally within a one-dimensional environment that contains two bell-shaped gradients with different widths that are detected thanks to a distance sensor. The agent’s task is to move away from the wide-shaped figure and approach the peak of the narrow-shaped one.

In this model, the environmental states e are described by the Gaussian functions that describe the two shapes:

$$\begin{aligned} \mathbf{e} = E(\mathbf{p}) = h \cdot \exp \left( -\frac{\left( {p-x} \right) ^{2}}{2w^{2}} \right) \end{aligned}$$

where h is the height of the shape, x the position of its peak, \(\pm w\) the maxima of the function’s derivative, and p is the agent’s horizontal position. The sensor S transforms the environmental variables e into sensory states:

$$\begin{aligned} \hbox {s} = S(\hbox {e})=1-\frac{d_{max} -e\left( p \right) }{d_{max}} \end{aligned}$$

where \(d_{max} \) is the maximum distance between the agent and the shapes. The sensory state serves as the input to a neural network composed of two nodes: a and m. Each node is governed by the following equation:

$$\begin{aligned} \tau _i {\dot{\mathrm{y}}}_{i} =-y_i+\mathop \sum \nolimits _{j=1}^n w_{ji}\sigma \left( {y_{j} +\theta _j } \right) \end{aligned}$$

Here \(y_i \) is the activation of node i, \(\tau _i \) its time constant, \(w_{ji} \) the strength of the connection from node j to i, \(\theta _j \) a bias term, and \(\sigma \) the logistic activation function. Further details are not relevant in this context. The equations that describe this simple model meet the 3M requirement. There are four elements: the environment, the sensor, and the two nodes. The sensor and the nodes perform computations described by the corresponding equations. The relations among the equations, moreover, correspond to causal relations between different mechanisms: e.g. the sensor S measures the proximity of the Gaussian shape, so its equation includes the environmental state e, which is in turn defined by the first equation. In other words, the dynamical model described by Buhrmann and collaborators is mechanistic; hence the equations can be used as laws for explanation and prediction because they describe mechanisms.
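
The correspondence claimed here can be made explicit with some simple bookkeeping. The following Python sketch is a toy illustration and not part of Kaplan and Craver’s or Buhrmann et al.’s work: the dictionary and set names, the function `satisfies_3m`, and the decision to treat the motor node as acting back on the environment (because it moves the agent) are all assumptions introduced here. The mapping itself is stipulated by hand; the real work of the 3M criterion lies in justifying such a mapping empirically.

```python
# Hypothetical bookkeeping for the 3M check; every name below is illustrative.

# 3M(a): model variables mapped onto components of the target mechanism.
VARIABLE_TO_COMPONENT = {
    "e": "environment",
    "s": "distance sensor S",
    "y_a": "node a",
    "y_m": "motor node m",
}

# Dependencies among variables (source, target), read off the equations.
MODEL_DEPENDENCIES = {
    ("e", "s"),      # the sensor equation takes e as its argument
    ("s", "y_a"),    # the sensor signal feeds node a
    ("y_a", "y_a"),  # node a is recurrently connected to itself
    ("y_a", "y_m"),  # node a drives the motor node
    ("y_m", "e"),    # assumed: moving the agent changes the sensed state
}

# 3M(b): causal relations among the components of the mechanism.
CAUSAL_RELATIONS = {
    ("environment", "distance sensor S"),
    ("distance sensor S", "node a"),
    ("node a", "node a"),
    ("node a", "motor node m"),
    ("motor node m", "environment"),
}


def satisfies_3m(var_map, dependencies, causal_relations):
    """Return (ok, unmapped_variables, unmatched_dependencies)."""
    variables = {v for dep in dependencies for v in dep}
    # (a) every variable must correspond to some component
    unmapped = sorted(variables - set(var_map))
    # (b) every dependency must correspond to a causal relation
    unmatched = sorted(
        (src, dst)
        for src, dst in dependencies
        if src in var_map and dst in var_map
        and (var_map[src], var_map[dst]) not in causal_relations
    )
    return (not unmapped and not unmatched), unmapped, unmatched
```

A model containing a variable with no counterpart in the mechanism, or a dependency with no corresponding causal relation, fails the check; this is the formal analogue of a merely phenomenological model.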

Fig. 1

Buhrmann et al.’s (2013, p. 7) minimal agent model. The agent, represented as a big circle, can sense the proximity s to objects with different widths w (0.03 and 0.08). The time derivative \(\Delta \)s of the sensor signal provides the input to the node a of the agent’s neural network. The node is recurrently connected to itself and drives the motor node m, which controls the agent’s horizontal velocity.
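
To see how the three equations and the wiring described in the caption fit together as a single causal loop, it may help to sketch them in code. The following Python sketch is only a minimal illustration under stated assumptions: the connection weights, biases, time constants, and the velocity gain are hypothetical and untuned; the external input term added to the node equation and the mapping from motor activation to velocity are assumptions of mine; and only one Gaussian shape is simulated at a time. It is not a reimplementation of Buhrmann et al.’s agent.

```python
import math


def environment(p, h=1.0, x=0.0, w=0.03):
    """Environmental state e = h * exp(-(p - x)^2 / (2 w^2))."""
    return h * math.exp(-((p - x) ** 2) / (2.0 * w ** 2))


def sensor(e, d_max=1.0):
    """Sensory state s = 1 - (d_max - e) / d_max."""
    return 1.0 - (d_max - e) / d_max


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ctrnn_step(y, ext_input, weights, theta, tau, dt):
    """One explicit Euler step of
    tau_i * dy_i/dt = -y_i + sum_j weights[j][i] * sigma(y_j + theta_j) + ext_input[i].
    The ext_input term is an assumption; the node equation in the text omits it."""
    n = len(y)
    out = [logistic(y[j] + theta[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        net = sum(weights[j][i] * out[j] for j in range(n)) + ext_input[i]
        new_y.append(y[i] + dt * (-y[i] + net) / tau[i])
    return new_y


def simulate(p0=-0.2, width=0.03, steps=500, dt=0.01):
    """Run the loop sensor -> node a -> node m -> velocity; return all positions."""
    # Hypothetical parameters, not taken from Buhrmann et al. (2013).
    weights = [[1.0, 2.0],   # from node a: to a (self-connection), to m
               [0.0, 0.0]]   # from node m: no outgoing connections
    theta = [0.0, -1.0]
    tau = [0.1, 0.1]
    gain = 0.2               # assumed scaling of motor output into velocity
    y = [0.0, 0.0]           # activations of nodes a and m
    p = p0
    s_prev = sensor(environment(p, w=width))
    positions = [p]
    for _ in range(steps):
        s = sensor(environment(p, w=width))
        ds = (s - s_prev) / dt       # finite-difference estimate of ds/dt
        s_prev = s
        y = ctrnn_step(y, [ds, 0.0], weights, theta, tau, dt)
        velocity = gain * (logistic(y[1] + theta[1]) - 0.5)
        p += velocity * dt           # movement changes what is sensed next
        positions.append(p)
    return positions
```

Calling `simulate(width=0.03)` and `simulate(width=0.08)` yields two position trajectories that can be compared. With these untuned parameters the agent should not be expected to reproduce the discrimination behavior of the original model; the point is only that each line of the loop corresponds to one component acting on another.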

Let us now turn to the classical examples discussed by O’Regan and Noë (2001a, b). In these cases, we do not have any mathematical or formal description of the relevant sensorimotor contingencies. The SMT, as described by O’Regan and Noë, is a philosophical theory, and as such it does not provide scientific explanations. Rather, it should be understood as the philosophical blueprint, or more aptly as a philosophical model, defining how an abstract sensorimotor model of visual perception works. It can be shown that this abstract model, which provides indications about how to construct a sensorimotor explanation, is also inherently mechanistic. In other words, I suggest that the blueprint of the SMT is mechanistic. In order to show this, I return to the cases of sensorimotor laws discussed earlier (Sect. 4.1).

Consider the case of eye rotation. As we have seen, eye movements alter the representation of the stimulus on the retina in lawful ways, but the eye movements themselves are a classical example of a physical mechanism. The system can be decomposed into a number of parts, such as the eye, the orbit, the muscles, etc., each performing a particular operation or being operated on. The variations of the stimulus on the retina result from the behavior of the ocular mechanism plus the environmental conditions that specify the relevant parameters of the light array that impinges on the retina.

The sensorimotor contingencies related to visual features represent another case of mechanistic decomposition. O’Regan and Noë (2001a, pp. 941–943) focus on shape perception, where the perception of such a feature would be the result of an abstraction from the set of all potential distortions that shapes undergo under different behavioral standpoints (p. 942). Two things are worth mentioning. The first is that it is difficult to fathom out what it means to say that the brain “abstracts” a “set of laws” (cfr. Sect. 4.1). Hence, it is not clear what kind of regularities may be extrapolated for the purpose of explanation and prediction. The second is that a decompositional strategy seems to be assumed by O’Regan and Noë in that they recognize that there are distinct regularities or laws related to different features of conscious visual perception. This requires some explanation.

O’Regan and Noë state that there is a subset of sensorimotor contingencies which “correspond” to visual attributes of sensed objects in a way that is “neural-code-independent”, i.e. it does not depend on some mysterious quality of the neural information related to the nature of the features (O’Regan and Noë 2001a, p. 942). In short, this means that there are distinct sensorimotor contingencies related to distinct features. Since the sensorimotor contingencies are defined as sensory changes, and since perception is, according to the SMT, essentially active, the distinct sensorimotor contingencies can be interpreted as kinds of operations or activities. Hence, there are distinct activities causally related to specific features of outside physical objects, like color, texture, size or shape, i.e. the basic elements out of which our visual perception is composed (for a somewhat old, but still useful review, cfr. Wolfe 1998; Treisman 1988). As O’Regan and Noë remark, “visual consciousness is not a single thing, but rather a collection of task and environment-contingent capacities, each of which can be appropriately deployed when necessary” (2001a, p. 967). If this interpretation is correct, then the SMT seems compatible with a decompositional mechanistic strategy whereby one recognizes a set of operations (related to the distinct features), and then tries to identify the components within the system that are responsible for them. Bechtel and Richardson (2010, p. 18) call this the ‘synthetic’ strategy, which proceeds from a ‘top-down’ perspective, in contrast with an ‘analytic’ strategy based on the prior identification of the component parts, whose role is subsequently specified. One way to proceed is, for example, by assigning the features to specific functional areas in the extrastriate cortex, as shown by Felleman and Van Essen (1991). There is, however, a complication.
Following O’Regan and Noë, I have shown that they distinguish between different feature-specific operations, but not that they assign these operations to specific subcomponents of the system. How should we understand these subcomponents of the system?

The first step is to acknowledge that, for sensorimotor theorists, seeing is partially constituted by the environment (T5, cfr. Sect. 2). From this, it also follows that although the brain plays a necessary role in enabling perception, there is no straightforward “one-to-one correspondence between visual experience and neural activations”, because “seeing is not constituted by activation of neural representations” (O’Regan and Noë 2001a, p. 966; cfr. also Noë and Thompson 2004; Pessoa et al. 1998). The relation between brain and environment is explicitly couched in terms of the dance metaphor: seeing is “somewhat like dancing with a partner” (ibid.; also Noë and O’Regan 2005, p. 567), which suggests that seeing is a process that couples organism and environment. A straightforward implication is that neural correlates of conscious content (e.g. Chalmers 2000) cannot be understood as (minimally) sufficient to generate a specific experience (Noë and O’Regan 2005). This is not to deny that the brain plays a fundamental role in perception, and it does not amount to a rejection of more or less specialized cortical areas. For example, Hurley and Noë (2003) do assume that distinct cortical areas are often associated with distinct kinds of experiences, like intramodal differences (e.g. cortical areas engendering a “red” instead of a “yellow” experience) or intermodal differences (e.g. visual instead of smell experiences). Areas normally associated with a certain qualitative character are called “cortically dominant,” whereas areas that, due to neural plasticity, may take over the function of other areas (for example due to lesions, etc.) are called “deferent.” In order to explain the qualitative differences correlated with the distinct areas, Hurley and Noë refer to a “dynamic sensorimotor approach” (2003, p. 146), according to which different cortical areas are attuned to different sources of input.
In this complex dynamic process of constant interaction between environment and neural structures, the role of the brain and cortical areas is, as we have seen, to “causally enable [...] our embodied mental life” (Noë and Thompson 2004, p. 19; also O’Regan and Noë 2001a, p. 968).

In short, the SMT acknowledges the following:

  • Objects in the environment having particular characteristics (colors, shapes, etc.).

  • Distinct sensorimotor contingencies for specific characteristics.

  • Different neural structures related to different characteristics.

Hence, we have: distinct components (like cortical areas and perhaps objects in the environment), and distinct activities (the specific sensorimotor contingencies), which are causally related to the relevant components via a dynamic process. Since the theory espouses vehicle externalism (T5), it may be the case that physical objects can also be subcomponents of a broader, extended mechanism (cfr. Sect. 5.2). As Noë and O’Regan say: “Just as mechanical activity in the engine of a car is not sufficient to guarantee driving activity (suppose the car in a swamp, or suspended by a magnet), so neural activity alone is not sufficient to produce vision” (2005, p. 584); and “The mechanical substrate is sufficient only given the embodiment of that substrate in a normal vehicle and the appropriate embedding of that vehicle in a normal environment” (Noë 2001, p. 47). In short, the point is not to deny that there are specialized cortical areas, but to ask how we should interpret their degree of autonomy from the environment.

Neither O’Regan nor Noë provides a detailed description of how the components are associated with the activity of sensory changes, and how they are arranged. But this does not threaten the correctness of a mechanistic interpretation. Explaining mechanistically is a process that unfolds over time, and initial sketches of a mechanism often include many black boxes and filler terms that ought to be specified by subsequent research (e.g. Craver and Darden 2013, pp. 64–118). If I am right, however, the SMT strategy of explaining feature perception is paradigmatically mechanistic, since it involves the functional decomposition of a task into a number of sub-operations or functions, and the identification of distinct components, from external objects to cortical areas, that are involved in what seems to be a mechanistic dynamic process (Zednik 2011).

5.2 The SMT as a complement to the orthodoxy

Zednik observes that dynamical models amenable to a mechanistic analysis “resemble computationalist and connectionist cognitive science” (2011, p. 255). Explanation in psychology, for example, often takes the form of functional decomposition of a problem (Cummins 1983), where the subcapacities or subfunctions of the explanandum are assigned to specific operating parts of the cognitive system (Craver 2007) that realize the computations (cfr. Miłkowski 2013, pp. 51–76; Piccinini 2007). Computational models, in other words, “specify the component operations of a mechanism that are [...] localized in neurobiological component parts” (Zednik 2011, p. 241). The same lesson can be applied to the SMT as well. Of course, this does not amount to downplaying the contribution of dynamical models, and as I have insisted earlier (Sect. 3), DST is not inherently opposed to connectionist and computationalist approaches. Rather, this radical opposition is the hallmark of the dynamical hypothesis, i.e. a specific philosophical interpretation of dynamicism, and of the standard formulation of the SMT. If we want to reconcile the SMT with the orthodoxy, T2 (and DH5) should be read as meaning that sensorimotor laws (or the differential equations of a dynamical model thereof) are explanatory because they describe mechanisms. This claim is not entirely original. Cliff (1991), following Lakoff’s (1988) criticism of Smolensky’s (1988) connectionist approach, stressed the importance of sensorimotor contingencies for neural models in order to overcome the problem of the ad hoc semantics of connectionist models; further, he observed that the “sole focus on information processing may omit important factors” (p. 34). Moreover, he claimed that whilst the behavior of a system can be studied as the outcome of computation, models that make no direct reference to computations can still be of interest.
Writing a few years earlier, Grossberg made much the same point: “Without a behavioral linkage, no amount of superb neurophysiological experimentation can lead to an understanding of brain design, because this type of work, in isolation, does not probe the functional level on which an organism’s behavioral success is defined” (1984, p. 389). Nothing in these remarks amounts to the rejection of the orthodoxy advocated by defenders of the dynamical hypothesis and the SMT; rather, they point to its extension. More recently, Hotton and Yoshimi (2011) have used DST to model embodied cognition through the concept of an open dynamical system, in a way that makes it compatible with the orthodoxy rather than opposed to it.

To say that mechanistic sensorimotor explanations explain the sensorimotor behavior of a system by relying on regularities generated by the underlying operating mechanisms does not mean that a full understanding of such mechanisms is always required. Miłkowski (2013, pp. 53–54), for example, states that a mechanistically adequate model of computation must include both an abstract specification of the computation including the relevant variables (the mechanism’s function), and a complete blueprint of the mechanism implementing the relevant computations. But a complete computational explanation is not necessary in most studies on sensorimotor models. For pragmatic reasons, it may be helpful to rely on mechanism sketches (Craver and Darden 2013), i.e. incomplete blueprints of the mechanisms. Such an approach can be a forced choice when the structure of such mechanisms is unknown, when the researchers’ explanatory interest is directed at higher levels of organization of the system, or when the target system comprises a high number of variables that make a full articulation of the underlying mechanisms difficult or impossible. This is consistent with Cliff’s observation that the behavior of the system can be studied as the outcome of computations even though no direct reference to computation is made.

A further implication for the SMT is that, since the theory can be understood as an enrichment of traditional representationalist approaches, it can more easily incorporate representations within the explanatory framework. Remember that the SMT does not reject representations altogether (Sect. 2), although it puts the explanatory burden on the sensorimotor laws. In its standard formulation, the SMT does not successfully address the issue of representations, as it seems to include them in the cognitive system, but denies that they have any explanatory relevance. Within the mechanistic SMT, representations are better integrated within the framework. In the neurosciences, researchers often characterize neural processes as representational (Bechtel 2016). On this reading, much research in cognitive science consists in identifying the representational vehicles and their contents. This is an important aspect of the heuristic of localizing the operations of mechanisms that are seen as control systems (e.g. Bechtel and Richardson 2010).Footnote 14 Moreover, representations do not need to be understood as static and pictorial (at least not if pictorial is taken to be synonymous with ‘static-photographic’). This observation will not surprise neuroscientists and philosophers of neuroscience, however. The fact that representations are highly dynamic is well acknowledged in neuroscience. For example, Nishimoto et al. (2011) explicitly refer to perceptual experience as dynamic. Shadlen and Newsome (1994) discuss how information may be represented, either in the spike rate of neurons or in the timing of individual spikes; Eliasmith (2003) describes neural dynamics in terms of neural representations as control theoretic state variables. Mechanisms seem well suited to account for the dynamic character of representations, as they are inherently dynamic.
More work is needed to integrate a dynamical and active understanding of representations within the orthodoxy, however, and it is likely that this issue will be at the forefront of future research (Bechtel 2012). Finally, given that, as I said, not every computation is a representation, we cannot simply conclude that every computational process in the SMT and in Buhrmann et al.’s dynamical model is related to representations. Still, the claim suffices to show that if we accept representationalism, then at least some processes can be seen as representations and play an important role in the overall explanation of the system’s sensorimotor behavior.

In conclusion, I want to briefly address the problem of externalism. Limitations of space preclude a full exploration of this issue, but in short, the mechanization of the SMT proposed in this study does not itself constitute a counter-argument to the SMT’s externalism. One way to accommodate the claim that the mind is “wide” (to borrow a term from Clark 1997) would simply be to say that mechanisms span the boundaries of the skull and organism and extend into the environment. As Gervais observes, mechanisms can account for cases of embodied or embedded cognition (2015, p. 52) where there is a continuous and smooth interaction between an agent and the environment. Zednik (2011, pp. 257–258) seems to accept that mechanistic dynamical explanations may show that the mind is, indeed, extended beyond the limits of the organism. Although I cannot fully articulate a rebuttal of this suggestion here, I want to invite caution in drawing this conclusion. From the fact that a satisfactory explanation of the sensorimotor behavior of an agent must take into account mechanisms beyond the boundaries of the organism, it does not follow that such mechanisms are cognitive in any relevant sense of the term. For reasons of space, I leave this issue open for further studies.

6 Concluding remarks

To sum up, if my arguments are correct, the SMT subscribes to the dynamical hypothesis in cognitive science, emphasizing the role of the perceiver as an active agent within a dynamic environment. Proponents of the dynamical hypothesis uphold a nomothetic model of explanation that makes the notion of representation redundant, and give explanatory power to the dynamical regularities described by a set of differential equations. The standard formulation of the SMT is construed along the same lines, as a search for sensorimotor laws, and the relevant passages discussed above show that defenders of the standard SMT think it possible to deduce the behavior of a system (its perceptual states) from the sensorimotor regularities. I have shown that, if the SMT endorses a covering law model, it is exposed to the mere description worry and also generates a puzzle about representations. The mere description worry can be avoided if we show that the SMT is consistent with a mechanistic approach, and that the sensorimotor laws are explanatory because they describe the behavior of underlying mechanisms. I have then argued that both a concrete dynamical model of the SMT realized by Buhrmann et al. and the blueprint of the SMT as outlined by O’Regan and Noë conform to the 3M-requirement for mechanistic explanation. The mechanistic SMT has two advantages: it can escape the mere description worry, and it can also better account for the role of representations. This, however, comes at a cost, as my reformulation of the SMT makes it continuous with, rather than opposed to, the orthodoxy in vision science, and thus provides an answer to Buhrmann et al.’s initial question (Sect. 1).

Again, in conclusion, I want to stress that opting for a mechanistic approach is only one way to avoid the mere description worry. Some covering law models, at least, do seem to be explanatory (e.g. Ross 2015; Sect. 5), and there may be other ways to cope with the mere description worry. I leave it open to other researchers to show that the theory does not square well with a mechanistic approach and the orthodoxy. The mechanistic interpretation of the SMT, however, seems a perfectly viable strategy, which finds support in the relevant literature and comes with costs and benefits for the standard formulation. The cost is the continuity of the SMT with the orthodoxy. Yet researchers may want to welcome this continuity as a positive feature. My remarks, after all, call for a revision of the SMT in tandem with the orthodoxy, not a rebuttal. Sensorimotor theorists are right when they say that perceivers are agents in dynamical contexts, and ultimately this lesson can lead to mutual insights for defenders of both the orthodoxy and the SMT. Whether my arguments lead to further consequences for the SMT will be the object of future studies.