1 Bottom-Up and Top-Down Processing as Overarching Approaches

Anyone walking past the Pepsi shelf in a US supermarket in the 1980s had to be prepared for a surprise. At some displays, two six-packs (sometimes a one-liter bottle) tipped forward over the edge of the shelf and seemed about to fall. A motorized device then caught them and returned them to a stable position, and this happened every 30 s. This mechanism, known as the “tipping can,” is considered an almost legendary (and also award-winning) form of sales promotion (Mullen and Johnson 1990). It rests on a few relatively simple assumptions. First, any deviation from what is normal, in particular movement, attracts attention (Garcia-Garcia 2017, p. 107). Second, products presented with attention-grabbing design measures are also more likely to be purchased.

These assumptions underlie various sales promotion measures, which have been researched scientifically but have also been developed by practitioners. For example, setting up an additional display away from the regular shelf to point out the offer seems to be effective. Depending on the product category, sales increases of between 77% and 243% have been reported for such a measure (Inman et al. 2009, p. 20).

A British company has specialized in measures that prompt customers at the point of sale (POS) to release a scent actively. This can be done mechanically, with a pump mechanism attached to the shelf, or electronically (e.g., with sensors), in which case a certain behaviour triggers the scent automatically. Both principles deviate from what is normal or expected, either through something unusual attached to the shelf or through an eye-catching installation (see Fig. 4.1).

Fig. 4.1

Left: “Poparoma,” which is clamped to the shelf and prompts customers to release a scent by pressing a small bellows. Right: “Bespoke Diffusion,” which prompts customers to behave in a way that automatically triggers the scent (here: “Stop and smell #daisylove”). (Courtesy of © The Aroma Company (Europe) Ltd. All Rights Reserved)

Another question worth investigating in more detail is whether unexpected background stimuli can increase attention specifically to those objects that match the stimulus, i.e., that are “congruent” with it. In a first study, Hehn et al. (2019) explored whether a background scent in a store can draw visual attention to a congruent range of goods (Hehn 2007, p. 99). They used a Christmas spice scent in the pre-Christmas season. The target object was a secondary display in the middle of the store that offered the corresponding products (gingerbread, speculoos, dominoes, etc.). According to the eye-tracking data, the share of customers who looked at the display was not significantly higher in the experimental group with scent (n = 25) than in the control group without scent (n = 55), although a strong trend was discernible. Among the 56 customers who had seen the display, however, the experimental group looked at it significantly longer than the control group. Customers also stayed in the store significantly longer when it smelled of Christmas.

According to Gazzaley and Rosen (2018, p. 95) and Meyer-Hentschel Management Consulting (1993, p. 40 f.), salience, surprise and novelty are strong triggers of attention. If such characteristics make stimuli and objects stand out sufficiently from their environment or from what is expected, they can attract attention automatically, quickly and temporarily. Attention can then shift away from the things we intended to focus on and toward objects that involuntarily grab it. In this case, we speak of a bottom-up process (Koch 2005, p. 176; Rösler 2011, p. 83 f.). This is a process of perception that starts from “the very bottom,” namely from the stimuli of the environment. Colloquially, it can also be called distraction (Garcia-Garcia 2017, p. 105).

From this, one could derive initial recommendations on how certain zones in the store should be designed to draw attention to product ranges or impulse goods. Such recommendations, however, assume a stimulus-driven consumer, i.e., a consumer whose antennae are permanently extended for whatever the environment has to offer. They underestimate our ability to tune out irrelevant stimuli, especially when we are pursuing specific consumption goals. Hunger illustrates this. When we are hungry in a shopping mall, for example, we pay more attention to offers that can satisfy our hunger, while shoe stores and electronics stores receive no attention (Scheier and Held 2018, p. 111 f.). Activated consumption goals do not, however, necessarily exclude irrelevant stimuli from attention altogether. Such stimuli are simply perceived casually rather than attentively (Felser 2015, p. 126; Kroeber-Riel and Gröppel-Klein 2019, p. 163 ff.). This is not entirely without consequences, but for now it does not affect the probability of purchase.

When people pursue specific consumption goals, they control their attention deliberately. In this case, we speak of top-down attention (Gage and Baars 2018, p. 298): what is perceived is controlled “from above,” so to speak. In this process, basic individual motivations, our knowledge about the properties of objects (such as shapes and colors) and even particular tasks selectively direct our attention to whatever brings us closer to our goals (Koch 2005, p. 175; Scheier and Held 2018, p. 102).

1.1 From Bottom-Up to Top-Down

Consumer behaviour is thus driven by both processes, bottom-up and top-down. One is stimulus- or data-driven, and the other is goal-, motive- or concept-driven. Every interaction with the environment involves a dynamic interplay of these two processes, and whichever wins this showdown exerts the greatest influence on our perceptions and actions (Gazzaley and Rosen 2018, p. 95).

Whether a surprising element actually attracts attention therefore depends on top-down modulation, i.e., on the perceiver’s existing goals or tasks.

Example

The experiment by Simons and Chabris (1999) demonstrates this impressively. Participants watched a short video of two basketball teams, one in black and one in white jerseys, passing balls to each other. Their task was to count the white team’s passes (top-down attention). After about 20 s, a person in a gorilla costume walked into the picture, stopped briefly, turned to the camera, drummed on his chest gorilla-style and then calmly walked out of the picture again. Observers can usually report the correct number of passes, but they often fail to notice the gorilla. About half of them miss this highly surprising event (the original video and other footage can be found at https://www.theinvisiblegorilla.com/videos.html, last accessed 04/15/2020).

The same is likely to hold for the “gorillas” in the salesroom, whether they are highly conspicuous display material like the “tipping can” or other unexpected things and events. When customers have a specific goal in mind in the store, one must expect them to tune out information that has nothing to do with that goal. For example, Glaholt et al. (2010) cite studies showing that eye movements are strongly influenced by what is later chosen. Apparently, the choice is made early on and shapes how gaze is distributed among the options. This would be an argument for top-down control of eye movements: the goal determines where I look, and outside influences can change that goal (in this case, the later choice) only slightly. Admittedly, the results could also be interpreted the other way round: whatever attracts the most fixations is chosen in the end.

Example

A study by Pieters and Wedel (2007) from the field of advertising supports the first explanation. When the task was to memorize the ad, viewers looked at the advertising images for a long time, whereas the goal of learning about the brand increased attention to the text. This is known as the Yarbus effect: a viewer’s allocation of attention is determined primarily by their goals. DeAngelus and Pelz (2009) replicated Yarbus’s 1967 study and were able to confirm the core of its results.

The probability of overlooking even very conspicuous things and events varies, of course, and some of the factors that influence it are also of interest for store design. Of particular interest in Simons and Chabris’s (1999) experiment was a finding that contradicts some classical notions of attentional control: the gorilla was spotted more often when viewers had to count the black team’s passes rather than the white team’s. In other words, the more similar the gorilla was to the part of the scene on which viewers’ attention was focused, the more likely it was to be noticed. In many contexts, the opposite is true: things that look exactly like their surroundings virtually “sink” into them, and attention tends to go to stimuli that stand out strongly from their surroundings (e.g., Johnston and Hawley 1994). The gorilla findings, by contrast, reveal a different aspect of attentional control. When we focus on something, it is relatively easy to tune out things that are very different from what is currently relevant. Things that are reasonably similar to the relevant information, however, may still be noticed, even if they are actually irrelevant.

Apparently, when we scan the environment, a quick, unconscious check takes place before a stimulus can be discarded (Garcia-Garcia 2017, p. 106; Milosavljevic and Cerf 2008). A stimulus that is very dissimilar to the target may be discarded after much more superficial scrutiny than a similar one. Accordingly, what is meant to attract attention at the POS should not simply be flashy, unusual and novel. It should have a minimum of similarity to what the customer is looking for, so that it is captured by their search strategy.

Another condition for detecting the gorilla is not surprising: the more difficult the current task, the more likely the gorilla is to be overlooked. In the experiment of Simons and Chabris (1999), for example, some observers were asked to count bounce passes and aerial passes separately, and this further decreased the probability of noticing the gorilla. Presumably, then, customers who simply want to browse (which arguably corresponds to an “easy task”) would be more receptive to sophisticated methods of directing attention.

Age appears to be another interesting condition. A robust finding in aging research is that as we get older, we find it increasingly difficult to block out irrelevant things (“loss of inhibition,” Hasher and Zacks 1988). As we age, we become more distractible. This is one of several aspects that make older consumers more impressionable than younger ones (for an overview see, e.g., Felser 2018). Older customers might therefore be less focused and more receptive to environmental stimuli designed to engage attention. Similar findings apply to children and adolescents, whose increased distractibility is attributed to the protracted development of neural mechanisms for cognitive control (Gazzaley and Rosen 2018, p. 120 f.).

1.2 Targets and Distractions: Impulse Buying

If we claim here that customers’ perception and attention are dominated by their goals, then at first glance it seems inexplicable why we quite often buy things we had not planned to buy. Asked whether they buy clothing more spontaneously or more purposefully, 32% of respondents said they buy it spontaneously or rather spontaneously (Statista 2017). Other estimates suggest that a good two-thirds of purchasing decisions are unplanned and made only in the store itself (Inman et al. 2009). If everyone had eyes only for what they actually wanted in the store, they should be able to block out the potential “seducers” successfully. How, then, do these still come into focus and get enough attention to be purchased?

There are several answers to this. One of them can be formulated in “gorilla terms.” Many unplanned purchases concern products that are not different enough from the actual target to be easily ignored, just as the black gorilla was not sufficiently different from the black jerseys that viewers had to attend to. Such products are therefore noticed and examined, and so they receive the necessary attention.

In general, it is of course true that having a shopping goal, for example in the form of a shopping list, reduces impulse buying (Baumeister 2002; Inman et al. 2009). On the other hand, the “goal” by no means always consists of working through a shopping list. According to an Austrian study, only 28% of respondents strictly follow a shopping list when buying for daily needs (Schwabl 2019, p. 5). With or without a shopping list, the actual goal of many customers in food retailing, for example, is a “supply purchase,” and this goal covers considerably more items than those on the list.

Another important aspect, however, is the potential for reward through shopping: unplanned purchases often concern things that are fun, so-called hedonic goods (e.g., Inman et al. 2009). There are at least two reasons for this.

First, shopping is tiring, and not only because of possible physical exertion, time pressure or whining children. The mere task of having to make decisions is exhausting (Vohs et al. 2008). Baumeister (2002) assumes that after prolonged strain, people are less and less able to suppress impulses and exercise self-control. Indeed, impulsive purchases (i.e., those triggered by outside stimulation rather than by the person’s goals) have been shown to increase when consumers are strained, stressed, fatigued or exhausted (Vohs and Faber 2007).

The second reason is related to the first: people who have been under strain for a longer period become increasingly open to rewarding themselves for their efforts. Impulse purchases therefore become more and more likely at the end of a stressful day, whether in the evening or at the end of a shopping trip, whenever it takes place. This is one of several arguments for placing products that are not primarily useful but above all fun around the checkouts, or wherever customers presumably go last.

According to Baumeister’s (2002) theory, any kind of strain consumes self-control resources, not just the strain of shopping. This aspect of his theory has been heavily criticized in recent years (see Friese et al. 2018 for a review). People may not necessarily make increasingly impulsive and uncontrolled decisions the more exhausted they are. However, many people believe that they do. Some believe that they deserve to indulge themselves after a busy day of shopping and that no one can be expected to stay in control all day long. Such people will also buy impulsively in the appropriate situation, even if their self-control resources are still sufficient (Job et al. 2010).

In other words, it often matters less how people actually function than how they think they function.

2 Consumption Targets, Signals and Reward System

From the previous explanations, we can deduce that intended consumption goals have a significant influence on our (selective) perception of the POS (top-down attention). Under certain conditions, external stimuli at the POS distract us from pursuing our goals (bottom-up attention), but this does not always trigger an impulse purchase. The connection between top-down processes and goal pursuit or existing consumption motives is deeply rooted in us. According to Gazzaley and Rosen (2018, p. 80 f.), it can be explained as follows. Stimuli that are particularly relevant for goal attainment have previously been stored in memory and are represented in the brain areas that encode these stimuli. When our ancestors roamed through a forest looking for edible berries, a round, blue pattern was represented in the visual cortex and a typical smell in the olfactory cortex. As soon as they spotted blueberries, the neural representation typical of this stimulus was strengthened in the visual cortex, while representations of irrelevant information were suppressed. As a result, the corresponding shape stood out more clearly from background activity than it would have without an explicit target. Suppressing irrelevant information (ignoring) and reinforcing relevant information (focusing) creates a stronger contrast in the brain, so that relevant signals gain salience and we can act effectively even on subtle information. This contrast between neural patterns largely determines how we experience our environment in light of our goals.

To recognize spontaneously occurring opportunities and dangers reliably even when we are highly concentrated, we must at the same time remain sensitive to distracting bottom-up stimuli. Once detected, these can dominate our consciousness for a certain period of time (Gazzaley and Rosen 2018, p. 40 f.). For example, it would be extremely dangerous if we walked down a street lost in thought and failed to notice an approaching, honking car. Nevertheless, the default mode we are in most of the time is the top-down mode, because it is the only one that allows us to pursue our goals in a focused manner (Gazzaley and Rosen 2018, p. 38 f.).

The nature of consumption goals can be explained in terms of three very basic motivational systems, as described in the Zurich Model of Social Motivation (Bischof 1985). The security system underlies our striving for connection, safety, security and care. Behind the excitement system is the desire for novelty, excitement, variety, discovery and enterprise. The autonomy system makes us strive for control, power, assertion and independence. When we achieve our consumption goals or satisfy our motivational systems, we experience this positively (e.g., as a feeling of security, fun or superiority). Failure, by contrast, throws us into an imbalance of negative experience (e.g., fear, boredom or anger; Häusel 2004, p. 31 f.). These motivational systems are ingrained in us from an early age, but their strength varies with personality and situation. For example, some people like to discover new things (pronounced excitement system), while others always stick to the tried and tested (pronounced desire for security; Scheier and Held 2018, pp. 99, 103).

Scheier and Held (2012, 2018) have taken up this model in order to derive indications for the design of brands and communication measures. Their considerations could in principle also be used for assortment and store design. They explain the choice of certain brands and the non-choice of other brands with the motives that the brands address. We try to assess whether a brand is relevant by looking at the brand codes used, which include sensory codes (e.g., lighting conditions, shapes, colours, smells, sounds, haptics) that give us a sensory experience (Scheier and Held 2018, p. 72). If an existing motive is addressed by the appropriate brand codes, then the corresponding brand is perceived as relevant to achieve one’s consumption goals with it. Via this mechanism, a particular motive can lead to preferential processing (selective perception) of the codes that presumably bring us closer to our consumption goals (Scheier and Held 2018, pp. 103–115).

But which codes are appropriate to address certain motivations? The answer is quite obvious and coincides with the mechanisms of top-down information processing described earlier: We implicitly learn that things belong together when they repeatedly occur together.

Scheier et al. (2010, p. 56 f.) refer to this as the statistics of the environment: “what fires together wires together.”

The representation of certain stimulus features in the brain is based on the expectation that we can achieve our consumption goals with these stimuli, which in turn leads to selective attention to these learned stimuli. Consequently, store and assortment design must take into account the sensory (and other) codes that are associated with the prevailing motivational systems in the target group. For example, an exclusive American fashion chain deliberately uses codes that young people know from clubs and discos (dimmed lights, loud music, spotlights). In this way it appeals to young, hip people who want to experience nightlife with like-minded people (excitement motive). Differentiation from the competition can thus be achieved via the motive systems, as is also possible in brand positioning (Scharf et al. 2022, p. 133 f.).

One question remains: why do people persist in trying to achieve their consumption goals? “Goals are desired states and achieving them is rewarding. We use products to achieve these rewarding goals” (Scheier et al. 2010, p. 87). The better customers are able to achieve their goals with a brand, the greater that brand’s appeal (Scheier et al. 2010, p. 117). In essence, the argument is that the meaning of the codes used by the company is first (implicitly) decoded. These codes then promise reward, provided that the interpretation of the codes ties in with an existing motivational situation. The brand thus becomes relevant for the target group: its codes are not only liked but also signal reward (Scheier and Held 2012, p. 151; Scheier and Held 2018, p. 110 f.). The composition of the assortment (which brands are offered, and how are they coded?) can therefore ensure that consumers perceive the offer preferentially. The same applies to the store design and the store concept (with which differentiating codes can I address shoppers with homogeneous motives?). The prerequisite, however, is that the corresponding motive systems are addressed in a clearly recognizable way and that activating bottom-up stimuli at least do not deviate from them (Kroeber-Riel and Esch 2015, p. 285).

The mechanism of the reward system is quickly explained. If we have experienced a certain action, situation or object as positive, this is marked accordingly in the brain, and we tend to repeat the action or return to the situation or object. If the reward stimulus is not currently attainable, the desire for it increases (Gage and Baars 2018, p. 375). Functionally, the system allows us to make predictions about the success of an upcoming action. According to Roese et al. (2017, p. 214 f.), an offer is then evaluated in three stages. First, based on expectations shaped by experience, we try to predict the potential reward from the available stimulus features (anticipated reward: “can I possibly achieve my consumption goals with this offer?”). In the consumption stage that follows, we experience the product and thus the consequences of our choice (experienced reward: “this offer is a hell of a lot of fun!”). In the final step, the system is updated, and future predictions are adjusted on the basis of the earlier prediction and the experienced reward (“my expectations were disappointed; unfortunately, the product is not as much fun as I had hoped”).

What holds the promise of reward? Besides whatever has rewarded us in the past, another strong driving force is the pursuit of novelty, since novelty brings potential survival benefits (Gazzaley and Rosen 2018, p. 25; Wittmann et al. 2007). Depending on personality, this drive is more pronounced (dominant excitement system) or less pronounced (dominant security system). The neurotransmitter dopamine, often mentioned in this context, is released in increased concentration especially when a reward is unexpected (e.g., something surprising or new) or when something turns out better than expected (Roese et al. 2017, p. 214). This release also marks the experience for the future. On a neuropsychological level, this describes what psychology knows as learning according to the reinforcement principle (Felser 2015, p. 62 f.). This closes the circle and explains the constant conflict between top-down and bottom-up information processing.

3 Multisensuality: When the Whole Is More than the Sum of the Parts

From your last holiday in Greece, you brought back a few bottles of the wine you always enjoyed so much at dinner. Now you open a bottle in your living room at home. It is quite possible that you will be disappointed. The wine was probably not so great as an isolated taste experience. It was great as part of an overall experience, together with the view of the Aegean Sea, the warm summer breeze on your skin, the light scent of lavender from the bush next to your restaurant table and several other little things that made your holiday so unforgettable.

A single stimulus input rarely works on its own, but always in combination with others.

And this effect does not take place only in the mind or “in the imagination”; it can be very physical and tangible. For example, tolerance to drugs has been shown to depend on the environment in which they are consumed. A dose that would be “harmless” in a familiar environment can be a dangerous overdose in another environment (Siegel et al. 1982). Highly automatic bodily responses alone ensure that a consumption experience does not depend on a single input. So it may well be that what you see or hear makes the wine taste different.

This phenomenon, namely that the effect of one sensory perception depends on another sensory perception, is the essential core of a multisensory experience. At least, this is the understanding we follow here (see also Fulkerson 2020). We would like to distinguish it from another understanding, which could perhaps be outlined as follows:

Let’s imagine that a certain colour design could increase customers’ length of stay in the shop by 10%. Furthermore, suppose one could show that a certain scent keeps visitors in the shop 15% longer. If you now use both at the same time, i.e., colour and scent, you achieve an increase of 25%.

If things were that simple, the overall effect of the measures would be nothing more than the sum of their individual parts. That is conceivable, and one could also call it a multisensory effect. Normally, however, the term refers not to an additive but to a multiplicative effect of the individual sensory inputs (Drewing 2017, p. 77). For example, information from one channel can make ambiguous sensory information on another channel unambiguous, or the integration of different inputs can create a completely new event of its own. An example of the former is the view through the train window of a train slowly moving on the neighboring track. Only when vestibular input from the organ of equilibrium is added can we decide whether we or the neighboring train are moving. An example of the latter is the McGurk effect. If we see a face making the mouth movements for “ga-ga” but at the same time hear “ba-ba,” we integrate the two into the perception of “da-da” (both examples cited in Drewing 2017, p. 76 f.). Such changes in perception make it plausible that food and drink, for example, do not always taste the same, depending on what contextual information is added. Incidentally, this contextual information need not itself consist of sensory impressions. Cognitions, i.e., expectations or thoughts, can also change sensory experiences (Litt and Shiv 2012). To this extent, the concept of multisensuality also includes a non-sensory component, namely cognitions.

Where effects of multisensuality occur, the whole is apparently “more than the sum of its parts.” Gestalt psychology adopted this Aristotelian phrase as its motto a good century ago (Müsseler 2017, p. 38). Bottom-up processes typically produce an isolated and cumulative effect of individual stimuli, which initially amounts to no more than the sum of its parts. It is thanks to Gestalt psychology that we know that bottom-up perception is the exception and top-down perception the rule.

4 Multisensuality in Application: The Example of Background Music

Music in the retail environment is a relatively easy marketing tool to use, and it has been well researched (Spence et al. 2014, p. 475). Consumers also seem generally to prefer music to no music, although this preference is, unsurprisingly, stronger for music they personally like (Garlin and Owen 2006). Moderately complex music performs best, although listeners with a stronger affinity for classical music may prefer more complex music (Spence et al. 2014, p. 475).

Music is an interesting topic not only because it is relatively easy to apply in this context. Music shows particularly clearly what it means to work multisensually or to combine different sensory dimensions. This starts with the finding cited above that music has a different effect depending on whether one likes it (see Antonides et al. 2002; Caldwell and Hibbert 2002, for further examples) or whether one knows it (Yalch and Spangenberg 2000).

Music illustrates the problem of a multisensory perspective particularly well. In fact, research on the psychological effect of music comes up against almost insurmountable limits. It is difficult, if not impossible, to consider the individual components of music, such as beat, rhythm, pitch, timbre and instrumentation, in isolation. If it is to be real music, one has to consider a combination of all these elements. As soon as one tries to isolate one dimension, the result (if isolation is possible at all) is no longer music. Thus, most research on the psychology of music compares finished pieces, e.g., German versus French music (North et al. 1999), without being able to name an isolated feature that makes this music “German” or “French.”

The following comments summarize some findings from research on background music in marketing contexts. The intention is not only to make recommendations for the use of music. More than that, we would like to cite music as a particularly succinct example of a marketing tool whose effect can indeed only be understood from a multisensory perspective.

4.1 Effects of Musical Tempo

Characteristics such as tempo or volume are, at least apparently, relatively easy to isolate. Musical taste is also unlikely to have much influence on whether a piece counts as fast or slow. Accordingly, the findings on musical tempo seem clear. The work of Milliman (1982, 1986) indicates that customers stay longer in the sales environment with slow music (less than 72 beats per minute) than with fast music (more than 94 beats per minute). In a supermarket, at least in Milliman’s (1982) study, this effect also translated into more money spent. In a restaurant, however, the findings were more nuanced. Patrons did not eat more when the music was slow, and their longer stay was reflected only in the number of drinks ordered. Other studies, such as Caldwell and Hibbert (2002), found no effect of musical tempo, but did find an effect of how much the guests liked the music.

Nevertheless, tempo still seems to come closest to being a musical variable that influences customer behaviour even in isolation, independently of other characteristics.

This may also be due to the fact that musical tempo has direct physiological consequences: fast music can increase heart rate, blood pressure and breathing rate, especially if listeners also feel subjectively more activated by the fast music (Knöferle et al. 2011, p. 327).

Incidentally, stronger activation is also one of the reasons why a high volume is not recommended, in addition to a high tempo (Spence et al. 2014, p. 475). Here too, however, the available evidence only shows that customers spend less time in the store when the music is loud. It does not show that they are more dissatisfied or buy less (Smith and Curnow 1966).

Yet even musical tempo, seemingly so easy to isolate, apparently does not act independently of other variables. Knöferle et al. (2011), for example, show that the higher sales associated with slow music occur only when the music is in a minor key. According to the authors, this effect rests on a culturally determined fit: “slow” tends to fit minor, and “fast” tends to fit major. When “slow” and minor come together, the fit is experienced as pleasant. Because of the slow tempo, people also stay longer in the store and can therefore spend more. Major and “fast” likewise fit together and are experienced as more pleasant than major and “slow.” Here, however, the higher tempo also shortens the time spent in the store, so the positive effect of the fit does not translate into higher sales.

The cited findings on musical tempo share another problem: different tempos are usually represented by different pieces of music. Fast music is therefore not simply faster. It is a different piece altogether and thus differs from slow music in many other respects as well.

Oakes (2003) attempts to circumvent this problem by presenting the same pieces in both a slow and a fast version. His work also differs from previous studies in two other ways. First, he calibrated the tempos to the subjective perceptions of the target audience (in his case, students). His preliminary studies had found that a tempo of about 114 beats per minute was already experienced as slow, and an average of 145 beats per minute was considered fast. Second, Oakes collected his participants’ assessments in the situation itself rather than in retrospect, as most other studies do.

These are considerable methodological advantages over other studies. However, the study focuses on waiting times, not on time spent in a store shopping and spending money. The results show that the slow version of the same music led to a significantly more positive evaluation of the situation. This is probably largely because the slow music was more relaxing, i.e., less activating. Above all, it also led participants to underestimate the waiting time. We return to this latter point below when we discuss music and the experience of time (Sect. 4.4.4).

4.2 Fit Effects of Music

The findings of Knöferle et al. (2011, Sect. 4.4.1) cited above have already shown two things. Fit with other environmental characteristics plays a major role in the effect of music, and what counts as fitting is culturally determined.

Example

The following examples support this. Regardless of musical taste, classical music signals high quality and sophistication. It therefore fits well with stores that tend to sell expensive, infrequently purchased goods, such as jewellers (e.g., Grewal et al. 2003). North et al. (2003) show that classical music, as opposed to pop music or no music, encouraged restaurant patrons to spend more money. Again, classical music per se is not expected to induce a higher willingness to pay under all circumstances. Rather, the effect is based on the fit between music and product.

Thus, fit seems to be a very important evaluation criterion. One of the best-known studies on this topic shows that customers in a wine shop tended to choose wines that matched the music being played. When mainly French music was played, French wine sold better, and when German music was played, German wine sold better (North et al. 1999). Guéguen and Jacob (2010) demonstrated a similar effect in a flower shop, where sales were higher when romantic music was played than when pop music or no music was played.

“Romantic” is apparently one trait dimension on which music and product can be congruent. The experiments of North et al. (2016) reveal other dimensions. For example, they compared classical with country music and found that both recall and purchase intention were higher when the characteristics of music and product were congruent. The characteristics studied include “sophisticated,” “refined,” “formal,” and “expensive,” which are better expressed by classical music. They also include “pragmatic,” “simple,” and “necessary, essential,” which are more strongly associated with country music.

A fit with the store’s image also seems to have a positive effect. Vida et al. (2007) show that customers stay longer in stores where the music matches the store’s image. However, the authors did not manipulate fit; they relied only on the customers’ subjective judgements. In any case, customers stayed longer in the store if, in their opinion, the music matched the image of the store.

Example

Spangenberg et al. (2005) provide an example of fit across sensory modalities. They investigated the congruence in content between scent and music. Evaluations of a store were more positive when a Christmas scent was released at the same time as Christmas music was played (in the appropriate season, of course). Ratings were lower when the Christmas scent was combined with non-Christmas music.

The cited findings clearly show how strongly cultural conditioning and experience shape the effects of music. Consider the following questions: whether classical music has anything to do with sophistication, and what counts as classical music in the first place; what Christmas smells like; what wine culture has to do with nationality; and what music is typical of a particular nation. All of them rest on many presuppositions. Evidently, they have answers only within certain cultures and against certain backgrounds of experience. In this respect, we are far from being able to identify effects of music that are independent of other effects.

The work of Mattila and Wirtz (2001) shows, in a different sense, that recommendations for musical design depend on a variety of conditions. They manipulated the activation potential of music and scents. What mattered most for the evaluation of the environment was that music and scent raised or lowered activation to the same extent. Whether activation was consistently high or consistently low was less important. The findings cited above suggest that fast or loud music tends to be less favourable. Apparently, however, these disadvantages are mitigated when other environmental factors and stimuli are as activating as the music.

4.3 Cognitive Effects of Music

Music can be distracting. Accordingly, the stimulation provided by background music can quickly become too much. This is especially true when customers face complex purchasing decisions (Park and Young 1986) or have to interact extensively with staff (Dubé et al. 1995).

Particularly strong distraction can be expected from vocal music, that is, music with singing. The human voice and language, whether sung or spoken, direct attention away from a given task (e.g., Zatorre et al. 2002). In several separate experiments, Kang and Lakshmanan (2017) examined what this means for customers exposed to music. Compared with instrumental music, vocal music reduced attention and consequently impaired recall of product features and prices. This effect was particularly damaging for the perception of special offers. When a product is advertised at a reduced price, consumers underestimate the savings anyway (Gupta and Cooper 1992). This is presumably because they assume that advertising always exaggerates. So if you advertise a 20% price reduction in your shop, do not expect your customers to believe that they will really save 20% on their purchase. In most cases, customers deduct a kind of “advertising malus” and estimate the actual savings to be significantly lower than the claimed savings (Schindler 1994). This effect is amplified when customers are distracted. In Kang and Lakshmanan’s (2017) experiments, vocal background music severely disrupted recall of a 25% discount, to the point that participants ignored the special price altogether. Vocal music can thus cause the effect of a discount or special offer to fizzle out.

Here it is worth looking again at the processes underlying the distraction effect. The “gorilla example” (Sect. 4.1.1) has already shown that we are more easily distracted by similar information than by dissimilar information. Vocal music illustrates this principle again, at a much more specific level. When linguistic information has to be processed, additional linguistic information disrupts this processing particularly strongly. The disruption is smaller if the information processed in parallel is visual. For example, people remember sentences poorly if they have to speak other sentences at the same time. Their memory for sentences is less impaired if they instead have to press coloured keys in a certain order (e.g., Buchner and Brandt 2002, p. 528 f.). In short, vocal music is disruptive to the extent that it taxes the language-processing resources that the task at hand also requires.

This gives rise to several limiting conditions. Vocal music should not interfere if no words are sung. Presumably, it would not even interfere if listeners did not understand the language being sung, although Kang and Lakshmanan (2017) did not investigate this. Similarly, vocal music was not distracting when the relevant information was presented visually. Taking price as an example, recall is particularly poor when a discount is presented purely verbally, e.g., through a loudspeaker announcement or a “20% off” label. Recall is better with purely visual cues such as a red price tag. Arabic numerals (e.g., “−20%”) are also remembered better than words (Kang and Lakshmanan 2017).

4.4 Experience of Time and Music

Retailers care about how long customers stay in the store, because this is one of the most important determinants of how much money they ultimately spend (e.g., Vida et al. 2007). For this reason, the way customers experience time also matters. In general, background music seems to shorten experienced time (Antonides et al. 2002; Oakes 2003). Whether and to what extent this happens, however, depends on various boundary conditions.

Findings on how the popularity of music affects the experience of time are contradictory. Some studies suggest that a waiting period, for example, is perceived as longer when preferred music is played, while others find exactly the opposite (see Oakes 2003, p. 687 f., for an overview). The contradiction seems less dramatic once we consider the mental processes that music triggers. When people pay particular attention to an event and perceive many details of it, their memory of the event becomes correspondingly dense in information. If popular music triggers such attention, it will also produce more differentiated memories. Under this condition, time spent with popular music will appear longer, at least in memory. This explanation rests on a particular idea of how our memory works.

Milan Kundera (1999, cited in Ahn et al. 2009, p. 508) described the functioning of human memory as follows: “Memory does not make films, it makes photographs.”

Over the course of an event, memory takes various photographs that serve as “markers.” But what prompts memory to take a photo and thus to mark a particular moment? Above all, such occasions are changes in our mental or physical environment: thoughts, memories, flashes of inspiration, feelings, or external events. In retrospect, memory estimates the duration of an event from the number of these markers.

This explains why a period filled with many individual events seems longer in retrospect. In immediate experience, by contrast, eventful time appears short. It therefore matters a great deal whether we look at the experience of time in the moment or in retrospect, because the biases we show in estimating duration usually run in opposite directions.

Preferred music is indeed likely to trigger more attention and more associations or thoughts, which in turn are likely to serve as salient markers. Familiar music probably has a similar effect. Familiarity certainly has much to do with liking music, but the two are not identical. Yalch and Spangenberg (2000) examined the perceived duration of a shopping trip depending on whether familiar or unfamiliar music was played. Familiarity produced a bias similar to that of popularity: when familiar music was played, customers overestimated how long they had spent in the store. Yet it was when unfamiliar music was played that they actually stayed longer. The exact reasons for this misperception are unclear. Further analysis showed that familiar music was also associated with higher activation. It is therefore possible that increased activation makes time pass more slowly subjectively. Moreover, it is known that the familiarity of events generally makes people more sensitive to their duration (e.g., Morewedge et al. 2009). As with popularity, this could indicate that events accompanied by familiar music are experienced in more detail and therefore seem longer.

The study by Oakes (2003, Sect. 4.4.1) cited above varied musical tempo by technically manipulating the same music so that it played either fast or slow. Oakes used unfamiliar pieces, so neither popularity (it was always the same music, after all) nor familiarity could vary. Any effects could therefore be attributed to tempo alone. First, Oakes’ results show that waiting times are generally overestimated rather than underestimated: subjectively experienced waiting time is almost always longer than actual waiting time. Second, they show that music, whether fast or slow, reduces this overestimation; in other words, it shortens subjective waiting time. A notable exception to the general overestimation occurred when slow music was played during relatively short waits (between 4 and 15 min). Here, the experienced waiting time was even shorter than the actual one. This peculiarity disappeared for longer waits (18–25 min). Even then, however, slow music consistently produced the shortest experienced waiting time.

5 Conclusion: Music or Not? Under What Conditions?

If you ask a scientist for a scientifically based recommendation for a practical situation, the answer often begins with the words “It depends ….” These two words are followed by all the conditions under which recommendation A is more likely to apply, and those under which recommendation B is more likely to apply. Scientifically based recommendations rarely fit on a beer mat, and perhaps for good reason. Beyond the differences between top-down and bottom-up control of perception, there is a further problem. Customers often do not give the mental processes of shopping even the space of a postage stamp, because they perceive and decide as “cognitive misers” (e.g., Kunda 1999).

In any case, anyone who wants to design multisensually will often hear “It depends …,” because that is the essence of multisensuality: the effects of sensory inputs differ considerably from one condition to another. Then again, if you want to achieve something, what could be more practical than knowing what the effectiveness of your measures depends on?

With this in mind, here we attempt to compile some scientifically justifiable recommendations for background music design:

  • In general, background music seems to be perceived positively, so that in most contexts there is more in favour of its use than against it.

  • Although personally preferred music is perceived more positively, one should not always rely on the target group’s favourite music, because …

  • … sometimes the image and values that the music conveys (e.g., seriousness or sophistication; some styles of music also appeal to particular motivational systems) matter more than how much it is liked.

  • … in contexts where customers should underestimate rather than overestimate the time that has passed, popular and familiar music is less advisable.

  • Slow music has more positive effects than fast music under most conditions, e.g., for a longer stay in the sales rooms or for the estimation of waiting times.

  • Vocal music (in a familiar language) is more distracting than instrumental music. In most contexts, this distraction is more of a disadvantage.

  • Congruence or fit between music and its environment usually has more positive effects than its opposite.

These are highly simplified but at least scientifically grounded recommendations for the design of background music. They remain valid until further research questions, refines, or even refutes them. Even as they stand, the recommendations can change with a slight shift in perspective. Consider the perception of waiting time: it matters whether you mean “past” or “passing” time, because estimates made in the situation itself differ from those made in retrospect.

Of course, music is meant only as one example of multisensual design options in general. Nevertheless, we wanted to conclude with some clear guidelines for action, while keeping the conditions “that matter” manageable.