Many grocery products have environmental impacts, and it is increasingly common for information about these impacts to be communicated via a label (Thøgersen 2000). This seems simple enough, yet the relevant decision-making context and psychological mechanisms involved are far from simple. The common act of completing a grocery list involves many decisions made over a short period of time, perhaps under time pressure. These decisions are made about products with multiple attributes, which often trade off against one another. Furthermore, many grocery products purchased during a large, perhaps weekly, household shopping trip will not be purchased for the individual who is making the purchase, but for family members who might be of different generations and may have different preferences, including for environmental attributes. In short, the consumer decision-making context is complex: consumers must juggle many decisions, competing product attributes, and the preferences of others, often under time pressure.

This insight, which we substantiate with reference to previous literature below, has important implications for policies that aim to influence consumer choice through labels to communicate environmental information (hereafter, “eco-labels”). Additional disclosures about environmental attributes increase the volume of information that consumers might process when making decisions. Yet, consumers do not have infinite capacity to process information and are already cognitively taxed by the complexity of the decision-making environment. Hence, the effectiveness of a label may be limited, even among consumers who would like to purchase environmentally friendly products. Given the above, policymakers face an unavoidable trade-off. On the one hand, they may want environmental disclosures to be transparent and informative about environmental impacts. On the other, a surfeit of environmental information may lead to it being disregarded by busy consumers.

One key issue policymakers face, therefore, is the extent to which eco-labels convey detailed environmental information or a more superficial measure of environmental impact. The issue can be informed by empirical research that measures how different eco-labels affect consumer decisions. Empirical evidence may influence not only an eco-label’s design but also whether the label (or aspects of it) should be mandatory, certified for voluntary use, regulated, or left up to producers entirely (provided the label is not misleading).

Typically, however, empirical studies of eco-labelling do not employ an experimental design that explicitly manipulates the complexity of the context in which consumer decisions are made. The contribution of the present paper is to test alternative eco-labels using an experimental design that does incorporate this element. We adapted a method previously developed to assess neuropsychological patients for use in a laboratory online shopping experiment. The study, conducted with a representative sample of consumers in Ireland, required participants to purchase household items based on shopping lists that included items for both themselves and others, under varying time pressures. With this method, we compared a standardized, colour-coded eco-label against a more detailed verbal one, and eco-labels with negatively framed environmental impacts against labels with positively framed impacts, while experimentally manipulating the complexity of the consumer’s task.

Before presenting the detailed method and results, the following sections first motivate our hypotheses with respect to previous literature and then describe the logic of the alternative experimental method that we developed to test them.

Relationship to Previous Literature

The focus of our study is on grocery and everyday household products, because they are consumed in high volumes. Consequently, if consumers purchase more environmentally friendly options, the environmental impact could be large and positive (Upham et al. 2011). Moreover, there is existing evidence that eco-labels on grocery goods can, in general, influence consumer choices (Asioli et al. 2020; Bjørner et al. 2004; Loureiro et al. 2001; Michaud et al. 2013; Teisl et al. 2002).

However, this focus also dictates the context in which consumer decisions are made. Evidence suggests that consumers trade off the accuracy of a decision (i.e., whether it matches their “true preferences”) against the time and effort required to make the decision (Bettman et al. 1998) – a speed–accuracy trade-off. Completing a lengthy shopping list is made more difficult in the presence of time pressure (Park et al. 1989). Thus, compared to larger, once-off purchases, decisions concerning regular household purchases are likely to be accorded substantially less cognitive effort and consumers may be selective about the information they do process (Jacoby 1984). Dual-process psychological models (Evans 2008) posit that the capacity of deliberative information processing is constrained, and when overloaded, more automatic or habitual processes will be used instead. Shoppers form and rely on habits (Wood and Neal 2009). By “habits” here, we do not necessarily mean merely purchasing the same good as previously. Habits can affect decisions involving new products too, e.g., whether the consumer habitually checks labels and scans all of the packets. Since many goods do not demand extensive consideration, consumers may process and evaluate information heuristically, or in line with existing beliefs and preferences (Chaiken and Maheswaran 1994; Maheswaran et al. 1992).

In an everyday shopping context, therefore, environmental information may be given less weight than in a more deliberative context, or even disregarded entirely. This is more likely among consumers who are not intrinsically motivated by environmental concerns (Lindenberg and Steg 2007). Furthermore, in a single shopping trip, a consumer might encounter a large variety of environmental information, relating to production, packaging, transport, and more. Even consumers who wish to make environmentally friendly purchases may give less weight to environmental impact when purchasing multiple grocery and household items.

The starting point for the present investigation is the possibility that these contextual factors interact with the design of eco-labels. Presently, across different countries and markets, eco-label designs differ in the degree to which they incorporate simplification and standardization. At one extreme, some logos are binary indicators granted only to products that meet certain criteria. Examples include the Nordic Swan logo, or labels indicating “dolphin-friendly” tuna. A somewhat less simple eco-label might involve a standardized visual scale, perhaps colour-coded like the EU Energy Efficiency Label for household appliances. Such a label does not convey specific environmental information, but instead standardizes it to permit product comparison via a visual cue. At the other extreme, environmental information can be conveyed in a much more specific manner. This includes a full verbal description (e.g., “Packaging made with 75% recycled content”), as is required by the “Green Guides” of the US Federal Trade Commission (2012). This latter approach is primarily motivated by the desire to prevent “greenwashing” – making products merely appear to be environmentally friendly – by requiring evidence for environmental impacts to be spelled out (see, for example, Hahnel et al. 2015). While this forces firms to be precise in their claims, it also requires consumers to possess sufficient conceptual knowledge to evaluate the specific information (Swim et al. 2014). Consumers differ in their knowledge, and those who understand eco-labels better are more likely to use them (Grunert et al. 2014), so a danger is that providing specific information will be effective only for consumers able to process more complex environmental concepts.

In light of these different approaches, the present study makes a direct experimental comparison between a standardized colour-coded label and specific verbal information, which we refer to as a difference in “format.” Previous empirical evidence regarding the relative effectiveness of these formats in promoting environmentally friendly consumer choices is not conclusive. The EU’s standardized, colour-coded energy label appears to shift consumer preferences towards more efficient appliances (Newell and Siikamäki 2014), although energy efficiency labels indicate lower running costs as well as environmental benefit. Simplified, colour-coded labels for carbon footprint have been shown to have significant but small impacts on sales volumes (Vanclay et al. 2011). A carbon footprint label that added “traffic light” colours was given more weight in a discrete choice experiment (Thøgersen and Nielsen 2016). More generally, work on cue fluency suggests that decision-makers give more weight to attributes that are easier to process (Shah and Oppenheimer 2007), while consumers are known to prefer visual presentations of product information (Townsend and Kahn 2014) and to rate them as more fluent than text-based ones (Yoo and Kim 2014). Colour-coded nutrition labels are noticed (Dodds et al. 2014), attended to (Bialkova et al. 2014) and understood (Grunert and Wills 2007) better than monochrome labels or numerical information, although evidence in relation to their impact on final choices is more mixed (Dodds et al. 2014; Ducrot et al. 2016). Recent work in both laboratory and field has shown that the colour-coded nutri-score label influences the healthiness of consumers’ food choices (Crosetto et al. 2020; Dubois et al. 2020). Considering these studies as a whole, one might reasonably conclude that standardizing and colour-coding an eco-label will increase attention and assist cognitive processing. If so, simplifying environmental information may improve consumers’ decisions while preserving freedom of choice (Sunstein 2014). Furthermore, the approach may be particularly effective in busy household shopping contexts that are cognitively demanding.

However, empirical measurements of the effectiveness of a target label depend on the label against which the target is compared. Some studies are also supportive of specific, verbal descriptions. In a direct comparison of visual and verbal information for online shoppers, Kim and Lennon (2008) found that while both formats affected attitudes to products, verbal information was more important for purchase intentions. Some research specific to eco-labels also suggests that both verbal and visual information can be effective (Tang et al. 2004) and that specific verbal information engenders greater trust in the information (Atkinson and Rosenthal 2014). Trust is an important factor underlying purchase of eco-labelled products (Thøgersen 2000), and consumers report greater satisfaction with eco-labels they perceive as accurate (D'Souza et al. 2006). More recently, Osburg et al. (2019) found that providing detailed environmental information increased purchase intention provided the information was perceived as useful.

In sum, therefore, previous empirical research does not allow strong conclusions to be drawn about the relative effectiveness of these contrasting eco-label formats, especially in the complex context of grocery shopping where consumers are making many choices and juggling multiple influences. Nevertheless, it does inform the hypotheses that we outline below in relation to time pressure, since we expect a standardized, colour-coded eco-label to be more effective when cognitive demand is increased.

As well as differences in formats, a second dimension on which eco-labels vary is how they are framed. Formats and frames are frequently confused. A format is a way of arranging information for communication, and different formats typically communicate different information (e.g., standardized vs. not). Framing, by contrast, refers to different ways of communicating the same information. A well-documented framing effect concerns the positive or negative valence of an attribute (Levin et al. 1998). For example, people prefer beef labelled “75% lean” to identical beef labelled “25% fat” (Levin and Gaeth 1988).

Framing environmental information with a positive or negative valence can influence pro-environmental behaviour (White et al. 2011) and how marketing messages are received (Olsen et al. 2014; Amatulli et al. 2019). Applied to labelling, for example, an eco-label can display the percentage of a product’s packaging that is recyclable or the percentage that is non-recyclable. Borin et al. (2011) and Grankvist et al. (2004) provide evidence that purchase intentions respond more to negatively than positively balanced environmental attributes, although neither study directly compared situations where only the frame differed. Van Dam and De Jonge (2015) confirm a stronger influence on attitudes and preference formation of negative versus positive framing for the broader concept of “ethical” labelling. In the present study, we manipulate the framing of otherwise identical products, recording not attitudes or intentions, but decisions taken in the course of purchasing from a list. Existing evidence did not allow us to form an advance expectation of how any framing effect might vary with time pressure.

Since we decided to manipulate both the format and framing of eco-labels, a further possibility was that the two might interact. Levin et al. (1998) suggest that attribute framing works because the positive or negative evaluation is primed by embedding the evaluative tone into the attribute description. If so, the effect may be stronger when the consumer reads a specific sentence rather than views a visual scale, so framing may be particularly strong in conjunction with the verbal format.

“Multiple Errands” Shopping Task

Given the above, we set out to develop an experimental task that would allow us to (1) undertake a direct test of eco-labels with a standardized, colour-coded format against a specific, verbal format; (2) manipulate independently whether environmental information was framed with a positive or negative valence; (3) mimic a household grocery shop in requiring participants to juggle different goals, preference criteria, and types of environmental information; and (4) increase time pressure on decisions.

As described above, a substantial family shop is cognitively challenging. It may require elements of executive control, such as planning a sequence, breaking the overall task up into sub-goals, and being cognitively flexible in response to the experience. The inspiration for our experimental design stemmed in part from Shallice and Burgess (1991), who used an open-ended series of tasks performed in a shopping environment to examine executive control in neuropsychological patients (“the multiple errands test”). The fact that a shopping task was deemed suitable for measuring executive control is indicative of the cognitive demands involved. We adopted some of the key principles of the task, including the need to juggle and sequence multiple tasks, to balance one’s own priorities against those of others, and to respond to events as they unfold. The idea was to generate a realistic yet somewhat challenging online shopping experience.

Participants were faced with an online shopping environment and given two shopping lists to complete that consisted of groceries and regularly purchased household items. For one list, their task was to choose the item they would most prefer, while for the other list (hereafter the “directive” list), their task was to choose an item to match a description that a friend or family member had asked for (e.g., “cheap and environmentally-friendly clingfilm”). As well as requiring participants to balance priorities, the use of a directive list had the advantage that it generated an objective measure of mistakes, since some products were unarguably superior to others on the directed dimensions. Thus, the experimental set-up could test not only how eco-labels influenced choices but also how they assisted in identifying environmentally friendly products.

Participants could approach the task as they wished. Trials did not appear sequentially or automatically in isolation, as in a typical choice experiment. Instead, each “trial” was a choice that participants had to make after locating a product in a computerized shopping environment. For each product, they were presented with four different options of which they could choose one. Our main manipulations of format and frame were embedded within these decisions, which related to multiple products and a broad range of environmental attributes. This flexible experimental structure meant that participants were essentially given a series of multi-component, open-ended tasks that involved locating, viewing, and choosing between different multi-attribute options – as when actually shopping for groceries.

One way to induce heuristic or habitual processing of information is to impose a time limit (Park et al. 1989; Wood and Neal 2009). In multi-attribute decision-making, a time limit is likely to lead to decision-makers prioritizing some attributes and disregarding or giving less weight to others (Johnson et al. 1993). A time limit was also used to increase demand on executive processes in the original multiple errands test (Shallice and Burgess 1991). In the present study, participants completed the task both with and without the time limit, allowing us to test whether the influences of format and frame are sensitive to the time pressure experienced by consumers of groceries and household goods.

One final consideration was the generality of the design. Given our focus on groceries and household goods, we set out to look for effects that might be observed across multiple types of product and environmental information. We also wanted to see whether results might be specific to environmental information or might reflect more general effects of how information is presented regardless of its domain. For comparison, therefore, equivalent formats for some non-environmental product information were also tested. Specifically, we tested the same type of disclosure labels when they conveyed information about nutritional content and general product effectiveness.

Hypotheses are summarized in Table 1. Based on the existing literature described above, a number of interactions between factors were hypothesized. Some of these interactions were expected to favour the specific, verbal information and others the standardized, colour-coded information. Overall, one of the label formats could outperform the other—or the differences between them could be entirely driven by interactions and thus suggest that different labels would be effective in different contexts. It is important to understand that while some hypothesized interactions were directional, the general effect of eco-label format on choice of environmentally friendly products was not. For instance, the hypothesis that participants will be more likely to select environmentally friendly products correctly (i.e., according to the directive) under the standardized format is derived from previous work showing that specific, verbal information requires the possession and application of a greater degree of conceptual knowledge. However, for predicting the likelihood of environmentally friendly choices, in general, it was not clear in advance of the study whether more fluent processing of the standardized, colour-coded label would outweigh the potential for greater trust in the information when conveyed by the specific, verbal label.

Table 1 Summary of hypotheses

Methods

Participants

The study was conducted in Ireland. Sixty participants aged 18–70 (30 male, 30 female) were recruited via a market research company. This sample size would be small for a between-subjects test of a single decision, but for a design in which each participant was expected to make 50–60 decisions (the final total across all participants was 3596), it surpasses recommendations for obtaining sufficient statistical power (Brysbaert and Stevens 2018). The sample closely matched the local population distribution by employment status (employed, unemployed, retired, and student) and age (12 aged 18–24, 12 aged 25–34, 13 aged 35–44, 13 aged 45–54, and 10 aged 55–70). Data were collected by the authors at their institution. Participants were paid €15.

Design and Stimuli

The experimental environment resembled a typical supermarket website with sections for food and household products and a “cart” for chosen products. Each section had multiple subsections, and within each of these were multiple product categories. The structure of the shopping environment can be seen in the Supplementary Materials, Appendix A, Table A1. Participants received two types of shopping list: one to be completed according to their own preferences and a “directive” list with specific criteria. This generated two outcome variables. Choices from both lists could be analysed to find whether a participant chose more environmentally friendly products by format or frame. Choices from the second list could be analysed to find whether participants correctly identified products that matched the directive criteria, again by format or frame.

The study consisted of 32 four-alternative choices in each of two blocks. Each participant performed the task twice (without and with the time limit), such that the manipulation of time pressure constituted a within-subjects test. Two visually distinct online shopping environments were created to emphasize that the options on offer in each block were different, though the layout of each environment was the same. The time-limited block always followed the unlimited block. This meant that consumers could become familiar with the shopping environment in the untimed block, so that the effect in the second block would be one of time pressure, and not of lack of familiarity. To allay any effects of fatigue, participants were given a break between the two blocks and offered tea or coffee.

The choice of each product (e.g., a toothbrush) was considered a “trial.” Within each trial were four options (e.g., four different types of toothbrush). The participant could choose to add one of these options to their cart. Trials were not completed in a defined sequence. Consumers could navigate to any product at any time, view a product without making a decision, and revisit it to change their decision. They needed to click on different “tabs” at the top of the screen to view information about each option, so they could also make a choice without viewing information about all four types. The different environments for the two blocks are shown in Figs. 1 and 2. Product categories were the same in each, but the four types (i.e., options) on offer differed. Figure 1 depicts how the shopping environment was navigated. Figure 2 shows how information about each option within a product category was presented. Images used were copyright-free or available via Creative Commons licences. The study was programmed in Python using PsychoPy (Peirce 2007, 2009) and presented on 14-in. laptops, resolution 1366 × 768. All of the products and related attributes are listed in the Supplementary Materials, Appendix A, Tables A2, A3, and A4. A single image was shown in the background, and the product information for each of the four options was displayed on a semi-transparent rectangle over the picture. Unbranded images were sought, but where unbranded copyright-free or Creative Commons–licenced images were unavailable, images with multiple brands unavailable in Ireland were used. Only one image in one block contained a brand that could be considered known. Overall, therefore, effects of brand on choices were designed to be negligible.

Fig. 1 Navigation of the shopping environment. In this example, a participant clicks (depicted by cursor) on the grocery department, fruit and vegetables section, and then carrots. They click on the first type of carrot by clicking on the tab labelled “1” to bring up information about this option. The participant adds option 1 to their cart and clicks to view it. Images used are CC0, CC-BY-SA, or CC-BY-NC-SA licenced. Some images concealed due to ownership ambiguities. CC-BY-licenced images are as follows: Blümer (2011), Chalon Handmade (2005), Collegestudent33 (2016), Lee (2007), Mark (2010), Pasquiel (2008), Sherool (2006), Tesco PLC (2014), and Visitor7 (2013)

Fig. 2 (a) A landing page for a product. (b)–(e) Product information for each type. Image used by Suzette.nu (2009)

In a real shopping situation, each decision might involve a different number of options possessing a different number of attributes. For the present study, as well as always facing four options, each option always possessed four attributes. These were price, average customer review, environmental impact, and nutrition (for grocery products) or effectiveness (for household products) information. The use of four attributes was guided by results showing that while consumers are able to take four product attributes into account simultaneously in a single judgement of value, this stretches cognitive capacity and has a substantial impact on the accuracy of decisions (Lunn et al. 2020). It was not anticipated that this consistency within the design would raise issues for the generality of the results – an issue we return to in the final section. Effectiveness and nutrition information were included as comparators to the environmental information because various sub-attributes could be presented to convey each concept (e.g., environmental – recycling, energy conservation; nutrition – calories, macronutrients; effectiveness – performance in scientific tests, germs killed). In this way, we could examine whether the effects of specific labelling formats and of their interaction with time pressure and frame were universal to attributes of this type or specific to environmental information.

Customer review scores were selected as the fourth attribute as these are typical in online shopping. There were no hypotheses regarding the effects of price or review attributes on consumer decisions; rather, these attributes were included to ensure that participants had to negotiate the trade-offs typically involved in such decisions.

Figure 3 shows the different formats and frames. The design of the standardized format label was influenced by the work of Mudgal et al. (2012). Their survey research suggested a consumer preference for colour-coded scale labels that give an aggregated indication of product performance. Across all four types of a particular product, the format and frame for each attribute remained the same. This can be seen in Figs. 1 and 2. Although Figs. 1 and 2 show products in which one attribute was standardized and the other was specific, attribute frames and formats were counterbalanced across products such that some products had two specific attributes, some had one specific and one standardized, and some had two standardized attributes. Likewise, some had two negative attributes, some had two positive attributes, and some had one of each. Which products had which attributes was pseudo-randomized across participants. Detailed information on attribute creation, counterbalancing, and pseudo-randomization is provided in the Supplementary Materials, Appendix B. These materials include a list of all of the environmental, nutrition, and effectiveness attributes displayed in the experiment. Sixteen different types of information were devised, with four each related to recycling and waste, transportation, energy use, and water and soil. This variety of topics was based on the themes used in the An Taisce Green Schools programme (https://greenschoolsireland.org/themes/). We used a wide variety to overcome any differences in motivation in relation to specific environmental causes and to ensure the generality of findings. The literature was searched to find appropriate attributes. For example, recyclability and toxicity are used in Borin et al. (2011), emissions and recyclability in Hahnel et al. (2015), and CO2 emissions in Schuitema and de Groot (2015). Other attributes came from the examples in the Green Guides of the Federal Trade Commission (2012) or were devised by the authors to ensure a broad range of topics.

Fig. 3 Formats and frames in which information (environmental, nutrition, effectiveness) was presented

For each product, the four types on offer each ranked best on one attribute, second best on another, second worst on another, and worst on another. Attributes were paired, so that the best and second-best products on one attribute would be the second-worst and worst on another, and so on. For example, two types of the same product might be cheap and environmentally friendly, but poorly reviewed and not effective, whereas the other two would be effective and well reviewed but less environmentally friendly and expensive. The most environmentally friendly type offered would be the second cheapest, and the cheapest would be the second most environmentally friendly. As a result, if consumers were directed to find a product that was “environmentally-friendly and cheap,” they would be forced to prioritize one of those attributes over the other – or to use a different attribute as a tie breaker (e.g., to avoid the worst-reviewed product).
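The ranking structure described above can be made concrete with a small sketch. The rank matrix below is one hypothetical assignment that is consistent with the description (0 = worst, 3 = best, with price ranked by cheapness). The option labels A to D, the attribute names, and the helper `top_two_on_both` are illustrative assumptions; they are not the stimuli or the scoring rule actually used in the study.

```python
# Hypothetical rank matrix for one product: 0 = worst, 3 = best.
# "price" is ranked by cheapness, so 3 means the cheapest option.
RANKS = {
    "A": {"environment": 3, "price": 2, "review": 1, "effectiveness": 0},
    "B": {"environment": 2, "price": 3, "review": 0, "effectiveness": 1},
    "C": {"environment": 1, "price": 0, "review": 3, "effectiveness": 2},
    "D": {"environment": 0, "price": 1, "review": 2, "effectiveness": 3},
}
ATTRIBUTES = ["environment", "price", "review", "effectiveness"]


def is_balanced(ranks):
    """Check that each option and each attribute uses every rank 0-3 once."""
    full = {0, 1, 2, 3}
    rows_ok = all(set(option.values()) == full for option in ranks.values())
    cols_ok = all(
        {option[attr] for option in ranks.values()} == full
        for attr in ATTRIBUTES
    )
    return rows_ok and cols_ok


def top_two_on_both(ranks, attr_a, attr_b):
    """Return options ranked best or second best on both named attributes."""
    return sorted(
        name
        for name, option in ranks.items()
        if option[attr_a] >= 2 and option[attr_b] >= 2
    )


print(is_balanced(RANKS))
print(top_two_on_both(RANKS, "environment", "price"))
```

Under this assumed assignment, a directive such as “environmentally-friendly and cheap” leaves two defensible options (A and B), so the participant must prioritize one attribute or use a third attribute as a tie breaker.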

Procedure

The study took place in the authors’ research institute. Participants were run in three groups of 10, two of nine, one of eight, and a final group of four. Participants were informed that they were taking part in a study on how people choose which products to buy based on information given on shopping websites. The study was conducted in line with institutional ethics board guidelines. Participants received an information sheet on what to expect and how their data would be stored and used. Participants initially explored a practice version of the shopping environment. They were told to treat the task as though they were exploring a real “online grocery shop.”

The shopping environment used in block 1 was loaded once participants had explored the practice environment. Participants were each given their two shopping lists. They were instructed to locate the products on each list and to choose according to their own preferences for the list titled “Your shopping list.” For the list titled “Please buy” (the directive trials), they were told to imagine that a friend or family member had asked the participant to get these items for them that week, as a favour, and would pay for everything. Participants were told that their friend or family member did not give them a budget, but had asked for products that met particular criteria, as written on the list. They were instructed to try to choose products that best matched what the friend was looking for, according to what was written on their list. Participants were told that there was no time limit, that they should try to find all of the products, and that they could do so in any order they wanted and could use a pen to keep track of progress.

When all participants had completed block 1, the block 2 environment was loaded and participants received two new shopping lists. They were told that they would repeat the previous task, with new products and lists, and a time limit of 15 minutes. They were told that they should aim to have similar numbers of products for themselves and for their friend in their cart by the time 15 minutes had passed. They were told to choose products that they would really choose in this situation, if they needed to buy the products on the lists, and also to complete as much of the list as possible. They were notified when five minutes and 10 minutes had passed.

Analysis

An initial analysis compared the number of completed trials, and the number of correct directive trials, in block 1 and block 2. The primary analysis pertained to environmental and nutrition/effectiveness scores of chosen product types, and a further analysis related to the probability of correctly selecting an environmentally friendly or nutritious/effective product when directed to do so.

The experiment was designed to generate multiple responses per individual and condition, across a variety of products and environmental dimensions. The primary unit of analysis for the main research questions was a score obtained over multiple decisions clustered by condition. That is, the design generated a main dependent variable that did not correspond to the outcome of each individual decision, but instead to a score for environmental friendliness (or nutrition/effectiveness) of choices over multiple counterbalanced trials that employed the same format and frame (see Footnote 1). This was accomplished as follows. For each type of information (environmental or nutrition/effectiveness) and within the 16 trials on each shopping list (subjective or directive), four trials had each of the possible combinations (2 × 2) of format and frame that constituted the four primary experimental conditions. Thus, the (up to) 64 total responses provided by each participant were converted into 16 scores of choices made when facing each primary condition and shopping list, both with and without a time limit. The equivalent analysis for nutrition/effectiveness scores was also undertaken for comparison. Within each matched group of four trials, the available product options were ranked on each of the four attributes, with 0 for the worst and 3 for the best. Responses were turned into an environmental (or nutrition/effectiveness) score that ranged from a maximum of 12, if the participant chose the most environmentally friendly (nutritious/effective) product on all four trials, to a minimum of zero if the participant always chose the least environmentally friendly (nutritious/effective) product. Where a participant did not complete all four trials, a normalized score out of 12 was calculated from the completed trials.
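The scoring rule described above can be illustrated with a minimal Python sketch. The function name `group_score` is hypothetical, and the normalization for incomplete groups (mean rank of the completed trials rescaled to the 0–12 range) is an assumption, since the text says only that a normalized score out of 12 was calculated.

```python
from typing import Optional, Sequence

MAX_RANK = 3          # ranks run from 0 (worst) to 3 (best), as in the text
TRIALS_PER_GROUP = 4  # each matched group contains four trials


def group_score(ranks: Sequence[Optional[int]]) -> Optional[float]:
    """Score out of 12 for one matched group of four trials.

    `ranks` holds the rank (0-3) of the chosen product on the attribute of
    interest for each trial, or None where the trial was not completed.
    """
    if len(ranks) != TRIALS_PER_GROUP:
        raise ValueError("expected four trials per group")
    completed = [r for r in ranks if r is not None]
    if any(not 0 <= r <= MAX_RANK for r in completed):
        raise ValueError("ranks must lie between 0 and 3")
    if not completed:
        return None  # no completed trials, so no score can be formed
    # Assumed normalization: mean rank of completed trials, rescaled to 0-12.
    return sum(completed) / len(completed) * TRIALS_PER_GROUP


print(group_score([3, 3, 3, 3]))     # best option on all four trials
print(group_score([0, 0, 0, 0]))     # worst option on all four trials
print(group_score([3, 2, None, 1]))  # only three trials completed
```

With all four trials completed, the score is simply the sum of the four ranks, which gives the stated range of 0 to 12.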

The scores were analysed using multi-level linear regression estimated by maximum likelihood, with multiple observations for each participant and a normally distributed random effect specified to account for heterogeneity of overall preference (a “random intercept” model). Equivalent fixed-effects models were also estimated. Individual intercepts passed standard tests for normality and skew, supporting the use of the random intercept models. The dependent variable in model E1 is the environmental score, while in model N1, it is the combined nutrition and effectiveness score. Models E1 and N1 (Table 2) are described in Eq. (1), where \( {\mathrm{Score}}_{ij} \) is the score out of 12 for individual \( i \) and trial group \( j \).

$$ {\mathrm{Score}}_{ij}={\upbeta}_0+{\upbeta}_1{\mathrm{Format}}_{ij}+{\upbeta}_2{\mathrm{Frame}}_{ij}+{\upbeta}_3\mathrm{Format}\times {\mathrm{Frame}}_{ij}+{\upbeta}_4{\mathrm{Time}\ \mathrm{limit}}_{ij}+{\upbeta}_5{\mathrm{Directive}}_{ij}+{u}_i+{\varepsilon}_{ij} $$
(1)

Models E2 and N2 (Table 3) add variables for the interaction with the time limit, to test whether effects of format differed once the time limit was introduced. These are estimated according to Eq. (2)

$$ {\mathrm{Score}}_{ij}={\upbeta}_0+{\upbeta}_1{\mathrm{Format}}_{ij}+{\upbeta}_2{\mathrm{Frame}}_{ij}+{\upbeta}_3{\mathrm{Time}\ \mathrm{limit}}_{ij}+{\upbeta}_4\mathrm{Format}\times {\mathrm{Time}\ \mathrm{limit}}_{ij}+{\upbeta}_5{\mathrm{Directive}}_{ij}+{u}_i+{\varepsilon}_{ij} $$
(2)
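
To make the role of the interaction term in Eq. (2) concrete, the implied contrasts can be written out. This is a sketch under an assumed dummy coding that is not stated in this section: Format = 1 for the standardized format (0 for specific) and Time limit = 1 for block 2 (0 for block 1). The shorthand \( \mu (f,t) \) is introduced here only for illustration and denotes the expected score with Format = \( f \) and Time limit = \( t \), holding frame and directive terms fixed.

$$ \begin{aligned}
\mu (1,0)-\mu (0,0) &= {\upbeta}_1 && \text{(format effect, no time limit)} \\
\mu (1,1)-\mu (0,1) &= {\upbeta}_1+{\upbeta}_4 && \text{(format effect, time limit)} \\
\mu (0,1)-\mu (0,0) &= {\upbeta}_3 && \text{(time limit effect, specific format)} \\
\mu (1,1)-\mu (1,0) &= {\upbeta}_3+{\upbeta}_4 && \text{(time limit effect, standardized format)}
\end{aligned} $$

Under this coding, a positive \( {\upbeta}_4 \) would mean that the advantage of the standardized format is larger under the time limit than without it.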

In these “score” models, a categorical control variable, “Directives contain attribute,” refers to the proportion of completed trials in each group of four in which the participant was directed to select a product that ranked highly on the relevant attribute (environmental friendliness or nutrition/effectiveness for E1 and N1, respectively). It is important to control for this factor because a directive naming the attribute would be expected to affect how the chosen product ranked on that attribute. The reference category is trials that were on the subjective list. The category “0” implies that none of the four trials in the group had a directive asking for the relevant attribute, whereas “All” means that all four did. The other categories refer to intermediate proportions. The proportions are not completely uniform because not all participants completed choices for all products in a group of four. For example, if a participant completed only three trials, of which only one directed them to prioritize the relevant attribute, this would be “\( 1/3 \)” of “directives contain attribute.” The \( 1/3 \) and \( 1/4 \) categories are collapsed into one, as are the \( 2/3 \) and \( 3/4 \) categories.
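The categorization described above can be sketched as follows. The function name and category labels are hypothetical, and the handling of groups with fewer than three completed trials (for example, treating one of two as “1/2”) is an assumption; only the collapsing of the one-quarter with the one-third category, and of the two-thirds with the three-quarters category, comes from the text.

```python
from fractions import Fraction
from typing import Optional


def directive_category(on_directive_list: bool,
                       n_completed: int,
                       n_with_attribute: int) -> Optional[str]:
    """Return the 'Directives contain attribute' category for one trial group.

    n_completed      -- completed trials in the group (0-4)
    n_with_attribute -- completed trials whose directive named the attribute
    """
    if not on_directive_list:
        return "Subjective"  # reference category in the models
    if not 0 <= n_with_attribute <= n_completed <= 4:
        raise ValueError("inconsistent trial counts")
    if n_completed == 0:
        return None  # nothing completed, so the group yields no observation
    share = Fraction(n_with_attribute, n_completed)  # exact, avoids float ties
    if share == 0:
        return "0"
    if share == 1:
        return "All"
    if share < Fraction(1, 2):
        return "1/4 or 1/3"   # collapsed category
    if share == Fraction(1, 2):
        return "1/2"
    return "2/3 or 3/4"       # collapsed category


# The example from the text: three completed trials, one with the attribute.
print(directive_category(True, 3, 1))
```

Exact fractions are used rather than floating-point division so that the comparison with one half cannot be affected by rounding.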

Objectively defined mistakes were possible in directive trials. For instance, one item listed in Fig. 4b asks for an “environmentally-friendly and effective toothpaste.” Of the four options, two toothpastes were superior on both these attributes, i.e., these options dominated the other two. The dependent variable was set to 1 if a dominant option was chosen and 0 if a dominated option was chosen. These responses were analysed by multi-level logistic regression. Models E3 and N3 (Table 4) are estimated according to Eq. (3) for only trials (k) in which the directive shopping list requested an environmentally friendly or nutritious/effective product respectively, i.e., where an incorrect response equated to a failure to integrate the relevant environmental or nutrition/effectiveness information accurately. Thus, these models were estimated at the trial level and the number of observations was half the completed number of directive trials. We report fixed-effects models because the individual fixed effects in model N3 did not pass standard tests for normality and skew (kurtosis: p < 0.05; Shapiro-Wilk: p < 0.05). The median coefficient on the individual fixed effects is used as the reference case for each model.

$$ \ln\ \left(p\ \left(\mathrm{correct}\right)/\left(1-p\ \left(\mathrm{correct}\right)\right)\right)={\upbeta}_0+{\upbeta}_1{\mathrm{Format}}_{ik}+{\upbeta}_2{\mathrm{Time}\ \mathrm{limit}}_{ik}+{\upbeta}_3\mathrm{Format}\times {\mathrm{Time}\ \mathrm{limit}}_{ik}+{\upbeta}_4{\mathrm{Participant}}_i+{\varepsilon}_{ik} $$
(3)
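
The coding of the binary dependent variable for directive trials can be illustrated with a short Python sketch. It assumes that a “dominant” option is one that no other option matches or beats on every requested attribute while beating it on at least one; the exact rule used in the study is not spelled out in this section. The function names and the ranks below are invented for illustration.

```python
from typing import Dict, List, Sequence

Option = Dict[str, int]  # attribute name -> rank (0 = worst, 3 = best)


def dominates(a: Option, b: Option, attributes: Sequence[str]) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    at_least_as_good = all(a[k] >= b[k] for k in attributes)
    strictly_better = any(a[k] > b[k] for k in attributes)
    return at_least_as_good and strictly_better


def correct_options(options: List[Option], attributes: Sequence[str]) -> List[int]:
    """Indices of the options that no other option dominates."""
    return [
        i for i, opt in enumerate(options)
        if not any(dominates(other, opt, attributes)
                   for j, other in enumerate(options) if j != i)
    ]


def code_response(chosen: int, options: List[Option], attributes: Sequence[str]) -> int:
    """Dependent variable: 1 if the chosen option is not dominated, else 0."""
    return int(chosen in correct_options(options, attributes))


# Hypothetical ranks for four toothpastes on the two requested attributes.
toothpastes = [
    {"environment": 3, "effectiveness": 2},
    {"environment": 2, "effectiveness": 3},
    {"environment": 1, "effectiveness": 0},
    {"environment": 0, "effectiveness": 1},
]
requested = ("environment", "effectiveness")

print(correct_options(toothpastes, requested))   # indices of dominant options
print(code_response(0, toothpastes, requested))  # choice of a dominant option
print(code_response(3, toothpastes, requested))  # choice of a dominated option
```

In this invented example the first two toothpastes are the dominant pair, so choosing either is coded 1 and choosing either of the other two is coded 0.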

Fig. 4 Examples of lists given to participants. (a) “Subjective” list of items to choose according to their own preferences. (b) “Directive” list of items to purchase according to instructions given

Results

Data are available in Mendeley data (https://data.mendeley.com/datasets/c5pdcsx6hk/1).

Task Difficulty

In block 1, participants completed a mean of 31.4 trials (standard deviation (SD) = 1.03). Twenty-one participants did not complete all trials. In block 2, participants completed a mean of 28.53 trials (SD = 4.22). Thirty-eight participants did not complete all trials. Significantly fewer trials were completed in block 2, under the time limit, than in block 1 (Wilcoxon signed-rank test, Z = −5.238, p < 0.001), supporting hypothesis 1a. Information on how long it took participants to complete the task can be found in the Supplementary Materials, Appendix C.

t tests were used to analyse errors, that is, trials in which the participant chose one of the two options that did not match the criteria on the directive list (Fig. 4b). Of the 16 trials on the hypothetical friend’s list, participants correctly completed a mean of 12.07 (SD = 3.22) in block 1 and 11.12 (SD = 3.26) under the time limit in block 2 (t(59) = 2.27, p < 0.05, d = 0.3), in line with hypothesis 1b. However, the mean percentage of completed trials that were correct did not differ significantly between block 1 (76.5%, SD = 19.41%) and block 2 (79.33%, SD = 18.09%) (t(59) = 0.99, p = 0.33, d = 0.15).
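As a reading aid for the effect sizes, the short Python sketch below recomputes d from the reported means and SDs. The section does not state how d was calculated. The convention assumed here, the mean difference divided by the root mean square of the two block SDs, is one common choice, and it agrees with the reported values after rounding. The function name is hypothetical.

```python
from math import sqrt


def cohens_d_average_sd(mean_a: float, sd_a: float,
                        mean_b: float, sd_b: float) -> float:
    """Absolute mean difference divided by the root mean square of the two SDs."""
    averaged_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / averaged_sd


# Means and SDs as reported in the text for block 1 and block 2.
d_count = cohens_d_average_sd(12.07, 3.22, 11.12, 3.26)     # correct trials
d_percent = cohens_d_average_sd(76.5, 19.41, 79.33, 18.09)  # percent correct

print(round(d_count, 2), round(d_percent, 2))
```

Agreement with the reported figures does not establish that this was the formula used; other conventions can give similar values for the first comparison.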

Product Choices

The mean rank (from 0 to 3) of the products chosen by each participant was similar across attributes (price: M = 1.53, SD = 0.25; average customer review: M = 1.52, SD = 0.22; environmental friendliness: M = 1.4, SD = 0.17; nutrition/effectiveness: M = 1.54, SD = 0.19). The distributions of scores were smooth and unimodal, with a mean environmental score of selected products, by participant, of 5.57 (SD = 0.69) and a mean nutrition/effectiveness score of 6.17 (SD = 0.78).

With respect to the comparison of formats, hypothesis 2a, model E1 (Table 2) shows that the standardized scale format significantly increased participants’ environmental scores compared to the specific format. There was no significant effect of positive versus negative information frame. This is illustrated in Fig. 5. The interaction between format and frame was also non-significant. Hence, hypotheses 2b and 2c were not supported. These results were mirrored for nutrition/effectiveness attributes in model N1. Coefficients on the control variables across both models indicate that attributes were not weighted differently when the time limit was introduced.

Table 2 Random intercept models for scores (0–12) based on environmental ranking of chosen products (E1) or ranking by nutrition/effectiveness (N1)

Fig. 5 Mean scores across participants by information format, frame, and attribute. Error bars show standard error of the mean score per participant

As described above, the categorical control variable “Directives contain attribute” controls for the proportion of products in each group of four (or fewer) trials that had environmental or nutrition/effectiveness criteria on the “directive” list. The coefficients on this variable are as expected: The more participants were directed to find an environmentally friendly or nutritious/effective product, the higher was their environmental or nutrition/effectiveness score. Relative to the subjective choices, participants chose less environmentally friendly (or nutritious/effective) products when fewer than half of the products in a set had a related directive, and more environmentally friendly (or nutritious/effective) products when more than half had a related directive, but there was no difference when half the products had a related directive. This indicates that participants did not give systematically low or high weight to these attributes in their subjective decisions. We additionally tested for interactions between the format and the presence of a directive, which were non-significant. The possibility that effects of format, frame, and time limit differed between “nutrition” and “effectiveness” attributes is addressed in Appendix C, Table C1.

Models E2 and N2 (Table 3) test for the influence of the time limit under different formats. In model E2, the main effect of format with no time limit is marginally significant (β = 0.38, SE = 0.21, p = 0.07). The time limit significantly reduced the environmental score of chosen products, but the highly significant interaction with the standardized format reveals that it did so only when the format was specific, not when it was standardized. This interaction, which supports hypothesis 3, is illustrated in Fig. 6. Results for model N2 are similar to those for model N1. The interaction between time limit and frame was not investigated because there were no hypotheses relating to this interaction. Nonetheless, a model that includes this interaction is reported in the Supplementary Materials (Appendix C, Table C2) and verifies that its inclusion does not affect the conclusions drawn from models E2 and N2.

Table 3 Random intercept models for scores (0–12) based on environmental ranking of chosen products (E2) or ranking by nutrition/effectiveness (N2), including interactions with session (untimed versus time limit)

Fig. 6 Mean scores for the standardized versus specific information formats, by attribute and session. Error bars show standard error of the mean

Models E3 and N3 (Table 4) examine whether the format affected the likelihood that participants selected a product that satisfied the directive on their hypothetical friend’s list. For the environmental attribute (model E3), the standardized scale increased the probability of selecting a “correct” product type, relative to the specific information. However, while this effect strengthened under the time limit, the interaction was not statistically significant. An equivalent advantage for the standardized scale was not obtained for nutrition/effectiveness labels (model N3). These results therefore partially support hypothesis 4, which predicted an advantage for the standardized scale. Effects of frame are not included in these models as there were no related hypotheses, but models including frame and its interaction with format are available in Supplementary Materials (Appendix C, Table C3).

Table 4 Fixed-effects logit models for correctly selecting a product that matched the environmental (E3) or nutritional/effectiveness directive (N3)

A small number of participants’ responses indicated that they frequently viewed only one option before choosing. Excluding these participants in the above models did not change estimated coefficients by more than one standard error, though it did alter some p values (Appendix C). Notably, the estimated main effect of the standardized label on the environmental score in E2 (Table 3) was slightly strengthened (model SE5, Table C5 in Appendix C).

Discussion

The aim of this study was to compare the impact of different types of eco-label on choices of environmentally friendly products. The study focused on grocery and household shopping. In this context, consumers have to make many multi-attribute decisions, identify sub-goals, respond to events as they unfold, and juggle priorities, including time. We tested different types of eco-label in a task designed to mimic these key contextual elements, including time pressure. Under a time limit, participants unsurprisingly completed fewer items on shopping lists (hypothesis 1a) and obtained fewer items that correctly matched the description given by a hypothetical friend or family member (hypothesis 1b). More importantly, participants chose more environmentally friendly products when the format of the eco-label was standardized and colour coded than when the label provided specific, verbal information (hypothesis 2a). We found no effect of negative framing of the environmental information (hypothesis 2b), nor any interaction between the frame and format of the eco-label (hypothesis 2c). The advantage of the standardized, colour-coded label was more pronounced under the time limit (hypothesis 3). Although this advantage of a standardized, colour-coded label was also found for nutrition and effectiveness information, the interaction with the time limit was only observed for environmental information. The standardized label also made it easier for participants to choose a product that matched a request to obtain an environmentally friendly product (hypothesis 4).

The findings with respect to positive versus negative framing were unanticipated and contrast with previous work (Borin et al. 2011; Grankvist et al. 2004). In these previous studies, participants rated products displayed individually in either a positive or a negative frame. By contrast, participants in our study chose among options that differed on an environmental attribute described in the same frame. They did not give more weight to environmental information described (for all available options) in the negative frame. The result raises two issues. First, there is a need to explain the difference. The present experiment was designed primarily to inform consumer policy rather than to test specific psychological theories regarding how information on eco-labels is processed. Nevertheless, the finding may be suggestive. Janiszewski et al. (2003) propose that framing effects occur because different frames evoke different sets of reference values against which products are assessed. If so, the failure to observe an effect may be due to participants using the four available options to form the reference set rather than recruiting a stored set of reference values against which to rate a product. An alternative possibility is that the demands of integrating multiple attributes simply diminished the framing effect (Bier and Connell 1994). The second issue is what the finding implies for the generalization of such framing effects to routine consumer contexts. Since products often appear in ranges and often require consumers to integrate multiple attributes, framing effects observed in isolated rating tasks may not transfer to many everyday transactions. More research is needed to test these potential limits to the generalization of framing effects, but the present study implies that negative framing of environmental impacts on eco-labels may not alter consumers’ desire to purchase environmentally friendly products.

The advantage of the standardized information format over the specific, verbal format has multiple potential explanations. It may reflect greater salience (Hutchinson and Alba 1991) or noticeability (Dodds et al. 2014) of the eco-label. Alternatively, it may be due to ease of processing, whether through greater fluency (Shah and Oppenheimer 2007) or another advantage of visual presentation (Townsend and Kahn 2014; Yoo and Kim 2014). It could also be suggested that the standardized format was effective because it allowed products’ relative performance to be assessed. However, note that the specific information was also relative – it was identical across product types bar the numerical figure or category (e.g., “The manufacturer of this product offsets 42%/48%/64%/75% of the carbon emissions created during its production”). The primary differences between formats, therefore, were whether the label used colour or allowed visual relative judgement, although the smaller quantity of text might also have reduced processing demand relative to the specific information. Any advantage provided by the standardized, colour-coded label was clearly not outweighed by increased trust in the specific information (Atkinson and Rosenthal 2014). It is also possible that lack of familiarity with environmental concepts (Grunert et al. 2014) made it harder for consumers to identify and evaluate environmentally friendly products from verbal information. That there was no equivalent effect on the ability to select products to match a directive for nutrition and effectiveness is perhaps consistent with this explanation. The interaction between the time limit and format of the eco-label implies that participants also changed how they processed or evaluated environmental information when under time pressure. This fits with the idea that the standardized label permitted greater processing fluency, but does not rule out other explanations.

Before discussing policy implications, some limitations of the present study need to be considered. Experimental studies designed to generate evidence for policy often face a trade-off between the need to hold factors constant to maintain experimental control and the need to make experimental tasks realistic. This was true of the present study. Our results were obtained from an experiment in which all products had four attributes, all choices involved four options, the environmental information for the four available options concerned the same environmental impact (although this differed across products), and participants had a mixture of subjective and directive choices to make. It is logically possible that varying one of these four constant aspects of the design could alter findings. The first three are atypical of most shopping environments, but the fourth is common. Completing the household shop often involves considering the preferences of other household members. In our experiment, it is possible that preferences implied by the directive shopping list had a carryover effect to choices made on the subjective list, perhaps leading to a greater emphasis on environmental information than arises in a typical shop. Even if so, however, it is not clear why this would have a differential effect across types of eco-label, or why it would generate the observed differences between environmental and nutrition/effectiveness information. Similarly, it is not obvious why holding the number of options and attributes constant, or keeping the type of environmental information consistent across options, would advantage one sort of eco-label over another. However, these possibilities cannot be completely ruled out because the relevant aspects were not experimentally manipulated.

It is important to recognize that the present study set out to provide evidence for consumer policy rather than to test a specific policy intervention. Any eco-label introduced as a mandatory disclosure or via voluntary industry agreement would be likely to correspond to a more integrated or holistic measure of environmental impact than we used in this study, where just one environmental impact per product was presented to consumers (e.g., energy efficiency during manufacturing, recyclability of packaging). The purpose of the study was not to design, test, or advocate a particular labelling policy, but to examine how different means of presentation might influence busy consumers. Hence, the results do not indicate whether a particular labelling policy works or not. Rather, they provide evidence for policymakers, researchers, and others involved in the consumer sphere who might wish to design an effective means of displaying environmental impact.

Notwithstanding these limitations, the results have policy implications. They suggest that taking a behaviourally driven approach to developing labels could allow more consumers to integrate environmental information into decisions even when faced with challenging shopping situations. They imply an advantage for standardized, colour-coded eco-labels when shoppers are under cognitive load, such as when completing a large grocery list. The findings indicate that, relatively speaking, consumers may ignore or down-weight more complex verbal descriptions of environmental information when faced with a challenging shopping task. More generally, therefore, the results provide an empirical illustration of the benefits of simplifying complex information for consumers (Sunstein 2011), although doing so naturally involves conveying less precise information. Whether an eco-label is mandatory or voluntary, widespread or industry specific, the present study implies benefits to a simple, standardized, colour-coded design, especially if other aspects of the decision-making context are likely to be challenging for consumers.

The results also question the idea that providing specific environmental information reduces greenwashing because it increases trust. We did not measure trust explicitly, but any positive influence it may have conferred for choosing environmentally friendly products was outweighed by other factors. If consumers do not have resources to analyse specific information, which takes time to read, process, and compare, they may be unable to integrate it into their decisions. Other findings suggest that product packaging limits the extent to which specific information can counter greenwashing (Hahnel et al. 2015).

Conclusions

Grocery and household shopping may be undertaken in a context where consumers must make many multi-attribute decisions and juggle multiple priorities, attending to different product attributes in pursuit of different goals, often under time pressure. In such contexts, eco-labels that aim to promote environmentally friendly shopping habits and decisions are more likely to be effective if presented in a way that makes environmental information easy to integrate into decisions. A standardized, colour-coded eco-label can outperform an eco-label that presents the same comparison via specific, verbal information. By contrast, in contexts where decisions involve multiple products and attributes, consumers may be unaffected by whether environmental information is framed positively or negatively. These empirical results do not indicate which eco-labelling policy is best, nor the extent to which purchase of environmentally friendly products could be increased with standardized labelling, but they do show that when designing eco-labels for everyday products, a simplified format may help consumers to integrate environmental information into their decisions.