Introduction

Over recent decades, firms have started using an increasing number of voluntary social and environmental labels on their products. Some firms have changed their production and distribution practices to reduce their carbon footprint; coffee retailers have introduced new practices into their supply chains to ensure better working conditions for the farmers; developers have incorporated innovative design features into new buildings to reduce consumption of energy, water, and materials. These innovations are often “hidden” and difficult to observe (Terlaak 2007), so firms seek various ways to inform consumers about them. Products might be labeled as “carbon zero,” “fair trade,” or “organic” to communicate the firms’ social and environmental innovations to consumers (Harbaugh et al. 2011; Hartlieb and Jones 2009), and retailers report increased demand for “green” products (TerraChoice 2010). Yet such growth is also accompanied by a growth in green-washing (Delmas and Cuerel-Burbano 2011) and increased confusion related to sustainability claims and their credibility. It has been proposed that the “credibility gap” (Dando and Swift 2003) can be narrowed by the use of third-party independent assurance. A growing number of firms indeed turn to third parties to substantiate their claims with externally verified eco-labels. For instance, organic produce can carry the USDA Organic certification, fair trade coffee can be Fairtrade or UTZ certified, sustainable buildings can have LEED or BREEAM certifications, and sustainable forestry practices can earn certifications from the Forest Stewardship Council (FSC), Sustainable Forestry Initiative (SFI), or the Programme for the Endorsement of Forest Certification (PEFC).

Eco-labels are a form of private regulation (Smith and Fischlein 2010; Henson 2011). They are provided by independent labeling schemes, which act as certification intermediaries and which offer certification services to interested parties. There exist over 400 eco-labels that firms can choose from, according to the reports from portals such as ecolabelindex.com or standardsmap.org. Although many eco-labels involve third-party assurance, the overall assurance process (including the standard-setting and conformity assessment practices associated with individual eco-labels) exhibits striking differences. For instance, an eco-label may or may not involve an open- and consensus-based standard-setting process (Golden et al. 2010), may or may not be under governmental control, and may or may not include a chain-of-custody requirement. On the audit side, certification can be first-, second- or third-party, by verifiers who may or may not be accredited, and may or may not involve field site visits. This variation in assurance practices has led many to conclude that eco-labels are confusing (Harbaugh et al. 2011). In part to help reduce that confusion, various services such as ecolabelindex.com (by Big Room Inc.), standardsmap.org (by the International Trade Centre), and greenerchoices.org (by ConsumerReports) are beginning to compile information allowing users to characterize eco-labels based on the presence or absence of such assurance practices. In this paper, we investigate whether specific assurance practices contribute to good governance of eco-labels. We use the term “governance” broadly, defining it as “the process by which requirements of an eco-label are set and enforced.”

This question is important for several reasons. First, existing literature argues that better-governed eco-labels will be more widely adopted, and hence have greater impact on sustainability outcomes. Well-governed networks ensure that participating firms adhere to program obligations and that they sanction free-riders and shirkers (Prakash and Potoski 2007; Hartlieb and Jones 2009; Schuler and Christmann 2011). Such regulatory consistency ensures that an eco-label can maintain a network of partners, and hence sustain the resources and relationships that are needed to maintain the eco-label’s viability (Smith and Fischlein 2010). Though good governance is not the only factor influencing an eco-label’s impact on sustainability outcomes, a persuasive case can be made that, all else being equal, a better-governed label will have a greater impact on sustainability, which means it is important to know which assurance practices contribute to good governance.

Second, firms, governments, NGOs, and retailers alike use eco-labels for multiple reasons, such as to communicate sustainable attributes of products and services, to set procurement policies, or to mitigate risk. In each case, these stakeholders need to choose which labels to support and adhere to, so they need to be well informed about how well-governed any given eco-label is. At the moment, the emerging information services such as ecolabelindex.com, standardsmap.org, and greenerchoices.org characterize labels based on the presence or absence of various assurance practices. While helpful as a first-order screening device, such binary characterization may not be adequate to fully convey the many variations in how labels implement various assurance practices.

Third, labeling schemes are voluntary standards that are developed by private institutions. They assume the role of a regulator, a role typically assigned to governments (Smith and Fischlein 2010). Yet, unlike in accounting, there is no commonly accepted or legal standard for eco-labels. Efforts are ongoing to define “best” practices in governance and assurance for eco-labels, such as the ISEAL Code of Conduct (ISEAL 2010a), the recently produced ISEAL Credibility Principles (ISEAL 2013), or various ISO standards for eco-labels. Such efforts are in their infancy though, and have very little scholarly literature to draw on to directly inform them. There is a rapidly growing literature on topics such as governance, legitimacy, effectiveness, and impact of eco-labels, but most of that work is still conceptual or focuses on a single label or single sector. In his summary of the articles in a special issue of Agriculture and Human Values derived from a symposium on private agrifood governance, Henson (2011, p. 446) writes, “While some of the articles do compare instances of private governance, the symposium as a whole highlights the need for more comparative analysis. In particular, we need to focus on why these differences exist and their consequences for legitimacy.” This is essentially the question we address here.

To add a broader empirical perspective to the predominantly conceptual or single-sector research on eco-labels, we look at governance of 41 eco-labels, and combine data from three sources. The data on assurance practices of eco-labels are obtained from ecolabelindex.com. We measure the quality of governance of eco-labels using two perspectives: expert opinion and media coverage. We asked 67 experts to rate the 41 eco-labels in terms of their governance. The experts were from France, Germany, Hungary, Ireland, Spain, Sweden, the UK, and the US, and represent large retailers (typically purchasing or sustainability directors, who manage the use of eco-labels in their supply chains), corporate purchasers (who set their firms’ purchasing policies), consumer associations (who act as watchdogs of eco-labels), policy-makers (who set governmental policies related to eco-labels, including purchasing policies for governmental agencies), and consultants from leading accounting and professional service firms (who advise their clients on matters related to eco-labels). These experts have hands-on experience with eco-labels at a senior level and are well positioned to provide an assessment of the quality of governance of eco-labels. For the media perspective, we used a sample of 3,034 articles drawn from the major world publications dataset from Lexis-Nexis, a set of full-text news sources from around the world which are held in high esteem for their content reliability (LexisNexis 2013).

This paper aims to make several contributions. First, it combines the literature on voluntary regulation with that emerging in the accounting and auditing literature, and applies them to the rapidly growing domain of eco-labels, where governance is key. Second, our work suggests that experts and media both find schemes with more external parties involved to be better governed. The only factors that contribute to experts rating eco-labels as well governed are the presence of independent accreditation and governmental control. This suggests that the specifics of the design of an eco-label may be less important than the presence of external parties in the assurance process. Similarly, the only factor that contributes to more positive media coverage of eco-labels is the presence of open- and consensus-based standard-setting. This suggests that again the presence of external stakeholders is more important for positive media coverage than any other specific assurance practice of an eco-label.

Below, we provide more background on assurance and governance of eco-labels. We then formulate our hypotheses, and describe our data and methods in more detail. We then present our results, and end with broader speculative implications of our work. By looking for patterns among a larger-than-usual set of eco-labels, we hope to provide some directions for future work that can go into greater depth on a smaller, more tightly controlled set of labels than we do here.

Voluntary Standards, Eco-Labels, Assurance Practices, and Governance

In this section, we provide some background and review relevant literature on voluntary standards, eco-labels, and on the link between assurance practices and governance. In the next section, we formulate our hypotheses.

Voluntary Standards

The last two decades have seen rapid growth in voluntary regulation and voluntary standards, with corresponding assurance practices. Under voluntary regulation, private institutions (rather than elected governments) create and enforce rules, a form of “soft law” (Mörth 2004). From a political science perspective, voluntary regulation therefore symbolizes a shift of power from governments to global networks of interacting institutions (Richardson 2009). Fuchs et al. (2011) emphasize that when such voluntary standards are developed by large downstream firms, their market power may mean that upstream suppliers have no choice but to accept them, giving such “voluntary” standards the same effective power as governmental regulation, but without the legitimacy that may be associated with governmental action. This creates a need to examine what else voluntary standards (can) do to increase legitimacy. Fuchs et al. (2011) summarize the main areas as participation, transparency, and accountability, all of which fall within our broad definition of governance of an eco-label. Voluntary standards have spread across the globe as well as across industries (most notably forestry, fishing, food, tourism, and retail). They cover a variety of issues, including carbon emissions, labor conditions, pollution, use of chemicals, and energy efficiency.

Voluntary standards contribute to the global order by enforcing the “soft law” and related requirements. They also serve as instruments that enable transactions between organizations and contribute to reducing information asymmetry about products and their (often hidden) characteristics, as for instance Balineau and Dufeu (2010) discuss in more depth. For instance, firms often rely on certifiers to conduct independent audits of their operations, allowing them to then use the results of those independent audits to serve multiple buyers. A good example is the ISO 14001 certification for environmental management systems. Firms seek ISO 14001 certification in the hope of avoiding the need for a separate audit from each buyer, and as a signal to the broader marketplace about their commitment to sustainability. Many scholars also characterize voluntary standards as tools for conveying information to consumers (Anderson and Hansen 2004). Hence, voluntary standards can be seen as part of a market mechanism, of particular value in international business (as we illustrate quantitatively below in the specific context of eco-labels). Voluntary standards may also have an effect on governance of society beyond the scope of their value chain, as Tallontire et al. (2011) discuss; they find that two voluntary standards in the Kenyan agribusiness sector have had an impact on legislative and judicial governance in Kenyan society, but that executive governance is dominated by the private sector participants. Busch (2011) warns that the emergence of voluntary standards in the agribusiness sector may lead to a system where only consumers who can afford to pay more will be able to buy produce that meets clear safety standards. Partzsch (2011) highlights the trade-off between legitimacy and effectiveness of two biofuel certification schemes, while Gregoratti (2011) examines an agrifood initiative in Kenya. Both find that the fact that stakeholders are included in a scheme does not necessarily mean they carry much weight in its actual governance.

Eco-Labels

Eco-labels are a form of voluntary standards. There are currently over 400 eco-labels or, more precisely, eco-labeling schemes, that set these voluntary standards and provide verification and certification services to firms and their supply chains. A recent report by the International Institute for Sustainable Development (Potts et al. 2014) documents the growing influence of eco-labels in the global market. For example, standard-compliant coffee reached a 40 % market share of global production in 2012 (up from 15 % in 2008). Other commodities with significant market shares (in terms of global production) in 2012 include cocoa (22 %, up from 3 % in 2008), palm oil (15 %, up from 2 % in 2008), and tea (12 %, up from 6 % in 2008).

ISO 14024 defines eco-labeling schemes as “voluntary third party programs that award labels based on independent audits” (ISO14024 2001). They are also a form of “soft law.” For instance, FSC certification is a voluntary standard and hence a form of voluntary regulation. It emerged mainly because national governments were unable to address critical environmental issues in forestry. FSC created rules for sustainable forestry, established a certification scheme, and attracted key players in the industry, upstream and downstream, to adopt and follow its requirements. The label facilitates transactions between buyers and sellers who care about sustainable wood, and provides consumers assurance that the product comes from sustainable sources. In the framework of Gimenez and Sierra (2013), eco-labels would be a form of supplier assessment, one of the two mechanisms they identify for making supply chains more sustainable. To further ensure that eco-labels on downstream products have the intended upstream effects, labels frequently also include some form of chain-of-custody requirement, implemented through physical segregation, mass balance, or other means. Regardless of the specifics of an eco-label, it is clear that the process by which compliance is assured is key (Potts et al. 2014).

Assurance Practices

Voluntary standards, including eco-labels, rely on multiple assurance practices to set their standards and enforce them. Auditing is a critical part of social and environmental voluntary standards. Darnall et al. (2009) define environmental auditing as a “management tool that systematically documents and periodically evaluates how well an organization’s management practices and equipment are safeguarding the environment.” There is a wide range of assurance practices among voluntary standards. Audits can be first-, second-, or third-party, referring to whether firms audit themselves or whether an independent party conducts the audit. Audits also vary in depth; in some cases, the auditor only reviews documentation provided by the firm, while in other cases, the audit includes site visits to physically verify compliance with the standards’ requirements. The scope of assurance also varies: an audit can cover a single facility or may involve assuring chain of custody across suppliers and buyers. Some voluntary standards have multiple layers of control (Dranove and Jin 2010), involving independent accreditation bodies overseeing multiple certification agencies. Finally, the process by which a standard is set is also a key assurance practice. Some voluntary standards are created by professional or industry-led associations, in which case standard-setting is mainly driven by members of an association. Other voluntary standards follow an open- and consensus-based standard-setting process, involving multiple stakeholders directly or through a consultation process (Balzarova and Castka 2012; Fransen and Kolk 2007; Tamm Halström and Boström 2010).

There is a rapidly growing literature on governance, legitimacy, effectiveness, and impact of eco-labels. However, most of that work is still conceptual or focuses on a single label or single sector. For instance, Raynolds et al. (2007) compare governance of five labels in the coffee sector; Silva-Castaneda (2011) examines one scheme for palm oil, which Partzsch (2011) compares with another; Dhanda and Hartman (2011) focus on carbon offset schemes; etc. Blackman and Rivera (2011) is an exception, reviewing studies of the effectiveness of a wider range of eco-labels from the producer’s perspective, finding little empirical evidence either way. Henson (2011) calls for more comparative analysis, also commenting (pp. 444–445) that “Analysis of the impacts of private governance is often hampered by a paucity of empirical evidence and/or weak theorizing that make it difficult to generalize across contexts.” A few reports analyze the prevalence of various assurance practices among a wider set of eco-labels (Golden et al. 2010; Potts et al. 2014), but overall, an empirical comparative approach is still largely absent. That is what we provide in this paper.

Governance

The eco-labeling literature addresses various aspects of governance, ranging from conceptual studies that highlight the link between credibility and effectiveness of eco-labels (Schuler and Christmann 2011) to studies that investigate the actual environmental and social impact of eco-labels (Johansson and Lidestav 2011) or those measuring the perceived credibility of eco-labels among consumers as reviewed in Leire and Thidell (2005).

Winters-Lynch (1994) describes the determinants of effectiveness of eco-labeling and certification schemes as consumer awareness, consumer acceptance, consumer behavioral change, and end benefits. Similarly, Delmas et al. (2013) advise managers to evaluate eco-labels in terms of consumer awareness and understanding, consumer confidence, and willingness to pay. Potoski and Prakash (2005) argue that voluntary standards need to be used by firms and need to have an impact to be credible. Schuler and Christmann (2011) agree, arguing that stringency and enforcement of requirements on one hand and promotion of the eco-label on the other hand lead to socially responsible behavior of firms and to consumer demand—hence a credible eco-label. Simpson et al. (2012) also emphasize the importance of selecting the appropriate degree of stringency and accompanying governance mechanisms for a voluntary standard to be effective.

Most related to our research is the work by the ISEAL Alliance, a non-governmental organization whose mission is to strengthen sustainability standards systems for the benefit of people and the environment. Over the last decade, ISEAL has produced various codes of practice for eco-labels, covering standard-setting, governance, and impact assessment of eco-labels (ISEAL 2010a). ISEAL was founded by four key certification organizations: FSC, the International Federation of Organic Agriculture Movements (IFOAM), Fairtrade, and the Marine Stewardship Council (MSC), and has expanded since. Recently, ISEAL produced a set of 10 “credibility principles”: sustainability, improvement, relevance, rigor, engagement, impartiality, transparency, accessibility, truthfulness, and efficiency (ISEAL 2013). No metrics and data exist yet for these attributes, but as metrics are developed and data become available, a larger replication of the current study would be worthwhile, to investigate which dimensions matter most for overall quality of governance.

Hypothesis Development

In this section, we discuss the linkage between assurance practices (in terms of standard-setting and conformity assessment) and governance of eco-labels. We assess governance of eco-labels from two perspectives: eco-labeling experts and media coverage. We first explain why we use these two perspectives, and then discuss each in more detail below.

In the absence of an established framework for governance of eco-labels, we combined insights from various streams of literature (discussed in the previous section) with insights from our interviews with eco-labeling experts (discussed later in the “Data” section) to determine our approach to assessing governance of eco-labels. Two key issues emerged, leading to our choice to focus on experts and media coverage.

First, uptake of eco-labels is driven by B2B networks rather than consumers. For instance, Mike Barry, Head of Sustainable Business at Marks and Spencer, commented during his address at the 2013 ISEAL conference that eco-labels will increasingly address the supply chain rather than consumers, as consumers make their purchase decisions in 30 seconds and retailers cannot put 10 labels on all their products. The description in Ingenbleek and Reinders (2013) of how supermarkets rather than consumers drove adoption of various eco-labels is consistent with this view, and we have heard similar arguments from many other leading practitioners whom we have met at various symposia. Multiple stakeholders are involved in this voluntary regulatory space, each with a distinct role and each influencing the use of eco-labels. Primary stakeholders, such as retailers and manufacturers, produce, distribute, and sell eco-labeled products, while secondary stakeholders, such as NGOs and consumer associations, often monitor eco-labels. Each stakeholder faces a different set of incentives. A consumer may mostly care about whether the label addresses the issues he or she cares about, while a retailer may care more about availability and continuity of supply. Therefore, we chose to gather data on governance of eco-labels from experts representing multiple stakeholder groups rather than just consumers.

Second, we observed that the media perspective is potentially important. Hand in hand with growing demand for sustainable produce is an increase in media attention to firms’ environmental practices. Firms have often been forced to incorporate environmental audits or undergo third-party certification in response to media reports of environmental scandals and labor issues. NGOs often use the media to enforce changes in firms’ environmental or social practices, as in the examples of Danone, BP, and Nike analyzed by Besiou et al. (2013). On several occasions, that led to the establishment of an eco-label, such as when Greenpeace’s media pressure on Home Depot led to the creation of the FSC labeling scheme (Conroy 2007). Media play an integral role in voluntary regulation, one that has been mostly omitted in previous studies. We therefore use the tenor of media coverage as our second measure of governance of eco-labels. Next, we develop our hypotheses on how the assurance practices affect experts’ assessment of governance and media coverage of eco-labels.

Effects of the Assurance Practices on Experts’ Assessment of Governance of Eco-Labels

Bouslah et al. (2010) argue that first- and second-party certifications are not credible, due to inherent conflicts of interest, implying that standards that require third-party certification are superior to those where it is optional. Kollmuss et al. (2008), cited in Dhanda and Hartman (2011), include third-party verification as a requirement for carbon offset standards to be credible. Various studies have demonstrated that voluntary programs deliver better results if they involve external monitoring: Potoski and Prakash (2005) and Darnall and Kim (2012) in the context of ISO 14001 certification, Graffin and Ward (2010) in the context of Baseball Hall of Fame, or Behnam and MacLean (2011) for international accountability standards. Conversely, King and Lenox (2000) demonstrate that industry programs without external monitoring fail to improve facilities’ environmental performance. External assurance also enhances disclosure credibility, as in Hodge (2001) who demonstrates that financial analysts rate audited disclosures higher than non-audited ones; Mercer (2004) makes a similar point. Dranove and Jin (2010) argue that certification intermediaries are crucial for the quality of disclosure. Voluntary standards and eco-labels also need to have effective governance in place to reinforce their standards and to monitor firms that are certified under their program (Potoski and Prakash 2005). If voluntary programs fail to sanction the shirkers, the program may be seen as less credible by stakeholders (Mercer 2004).

Several specific attributes are asserted to be important elements of governance of eco-labels, though such assertions are not based on empirical tests. The introduction to social and environmental labels by ISEAL (2010a) mentions the need for an open- and consensus-based standard-setting process. A survey of eco-labeling thought leaders by ISEAL (2010b) suggests that verification (third-party audits and accredited verifiers), open- and consensus-based standard-setting, and a transparent governance model are considered key to building trust in a scheme. Schepers (2010) and Golden et al. (2010) mention field site visits and chain-of-custody requirements (among others) as differentiating factors between labeling schemes. Mueller et al. (2009) provide a conceptual comparison of four schemes (ISO 14000, SA 8000, FLA, and FSC) using five criteria: inclusivity, discourse, control, supply chain, and transparency, which can be mapped (though not in a one-to-one fashion) to the attributes mentioned previously, i.e., third-party auditing, accreditation of verifiers, open- and consensus-based standard-setting, and to some extent the chain-of-custody requirement. Some schemes are managed by government agencies (e.g., Energy Star, USDA Organic, EU Flower). As long as the governments involved are not perceived as corrupt or weak, governmental control could lead to a label being (perceived as) better governed, as government agencies face less commercial pressure to compromise (though they are also not immune). Comparing eco-labels for organic produce across four countries, Sønderskov and Daugbjerg (2011) find that consumers have higher confidence in labels with substantial government involvement. We predict that, all else being equal, eco-labels which adopt any of the six assurance practices mentioned above will ultimately be seen as better governed, leading to our first hypothesis:

Hypothesis 1

Presence of any of the following assurance practices is associated with an eco-label being considered by experts to be better governed: third-party audits, accredited verifiers, field-site visits, chain-of-custody, open- and consensus-based standard-setting, and governmental control.

We do not claim that these six are the only assurance practices that are relevant, but they are among the key practices and the ones we focus on in this study. We define these practices more precisely in the Data section. We test this hypothesis for each of the assurance practices separately. If, for instance, labels with field site visits were not considered better governed, all else being equal, than labels without such visits, then Hypothesis 1 would be rejected for that specific assurance practice.
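To make the practice-by-practice logic concrete, the following is a minimal Python sketch that compares mean expert ratings between labels with and without each practice. It is an illustration under stated assumptions, not the estimation procedure used in this study: the field names, the helper functions `make_label` and `mean_rating_gap`, and all ratings are hypothetical, and a raw difference in means does not hold the other practices constant in the way the "all else being equal" condition requires.

```python
from statistics import mean

# Hypothetical field names for the six assurance practices in Hypothesis 1.
PRACTICES = [
    "third_party_audit", "accredited_verifiers", "field_site_visits",
    "chain_of_custody", "open_standard_setting", "governmental_control",
]

def make_label(rating, *flags):
    """Build one label record: an expert rating plus six 0/1 practice flags."""
    return {"rating": rating, **dict(zip(PRACTICES, flags))}

def mean_rating_gap(labels, practice):
    """Mean rating of labels with the practice minus mean rating of those without.

    Returns None when every label (or no label) has the practice,
    because no comparison group exists in that case.
    """
    with_p = [l["rating"] for l in labels if l[practice] == 1]
    without_p = [l["rating"] for l in labels if l[practice] == 0]
    if not with_p or not without_p:
        return None
    return mean(with_p) - mean(without_p)

# Hypothetical ratings and flags, invented purely for illustration.
labels = [
    make_label(4.0, 1, 1, 1, 0, 1, 0),
    make_label(3.0, 1, 0, 1, 1, 0, 0),
    make_label(2.0, 0, 0, 1, 0, 0, 0),
    make_label(5.0, 1, 1, 1, 1, 1, 1),
]

# Each practice is examined on its own, one comparison per practice.
for practice in PRACTICES:
    print(practice, mean_rating_gap(labels, practice))
```

In this toy data every label has field site visits, so no gap can be computed for that practice. The same problem of limited variation can arise in any small sample of labels.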

Effects of the Assurance Practices on Tenor of Media Coverage of Eco-Labels

Communication—through annual reports, press releases and media coverage—plays a central role in environmental accountability (Aerts and Cormier 2009). Media coverage is particularly important as it significantly shapes firms’ disclosures; increased media coverage leads to higher propensity of environmental disclosures (Bewley and Li 2000) and firms use various communication strategies to respond to media coverage (Clarkson et al. 2008). Media also play an important control role in environmental accounting. Sustainability reports and voluntary standards operate outside of traditional democratic processes, and media in fact facilitate environmental governance of voluntary regulation by enhancing transparency and legitimacy in areas where traditional democratic processes are not involved (Martinelli and Midttun 2010). This control role can be crucial as we have witnessed in cases of corporate accounting scandals that were uncovered by journalists. Similarly, media sometimes play an important role in eco-labeling, sometimes even leading to the establishment of eco-labels (Conroy 2007).

When firms experience crises or accounting scandals, the media will scrutinize their disclosures, reports, press releases, and audit reports (Andon and Free 2012), so one would expect that they will also scrutinize eco-labels and their assurance practices. The reputation-reality gap (Eccles 2007) is a key reputational risk for a firm, one often picked up by media, so one may similarly infer that the media are more likely to attack labels (or firms certified with those labels) that are poorly governed. Schuler and Christmann (2011) propose that better governance enhances credibility of an eco-label, which would also imply that media coverage of such labels will be more favorable. Dhanda and Hartman (2011) cite negative media coverage of the carbon offset market as something offset providers should be concerned about. Therefore, analogous to our first hypothesis, we expect that media will report more favorably on eco-labels that have multiple assurance practices in place. Anecdotal evidence also exists of media reporting more favorably on some of the assurance practices that we study, such as third-party certified claims and disclosures (Dickson and Eckman 2008) and open- and consensus-based standard setting (Conroy 2007). We therefore hypothesize that:

Hypothesis 2

The presence of any of the following assurance practices is associated with more favorable media coverage: third-party audits, accredited verifiers, field-site visits, chain-of-custody, open- and consensus-based standard-setting, and governmental control.

As for Hypothesis 1, we test this hypothesis separately for each of the assurance practices.

Data

This project started with an exploratory study in which we interviewed leading experts about eco-labels and their governance, to help inform the design of the main study. The main study started by creating a set of eco-labels, drawing from the 336 eco-labels then listed at ecolabelindex.com (as of November 2010). Of the full set, 56 eco-labels had sufficient media coverage in the Lexis-Nexis World Major Publications database. We compiled data on these 56 eco-labels from three sources. Data on assurance practices were obtained from ecolabelindex.com. Data on quality of governance were obtained from a survey of eco-label experts. Data on media coverage were obtained by coding 3,043 articles in Lexis-Nexis. After merging and cleaning, we ended up with a dataset of 41 eco-labels. We describe the exploratory phase and the three data sources in more detail below, including reporting on the various steps we took to assess validity and reliability of our measures; in the following section, we describe how we tested our hypotheses and the results we obtained. A team of MBA students assisted with several parts of the data collection.
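The merging of the three sources can be pictured as an inner join on label name, which keeps only labels observed in every source. The Python sketch below is a hedged illustration: the function name `merge_sources`, the label names, the field names, and all values are hypothetical placeholders rather than our data, and the actual cleaning involved further steps not shown here.

```python
# Illustrative sketch only: all names and values below are hypothetical.

def merge_sources(practices, expert_ratings, media_tenor):
    """Keep only labels present in all three sources (inner join on label name)."""
    common = set(practices) & set(expert_ratings) & set(media_tenor)
    merged = {}
    for label in sorted(common):
        record = dict(practices[label])            # 0/1 assurance-practice flags
        record["expert_rating"] = expert_ratings[label]
        record["media_tenor"] = media_tenor[label]
        merged[label] = record
    return merged

practices = {
    "Label A": {"third_party_audit": 1, "field_site_visits": 1},
    "Label B": {"third_party_audit": 0, "field_site_visits": 1},
    "Label C": {"third_party_audit": 1, "field_site_visits": 0},
}
expert_ratings = {"Label A": 4.2, "Label B": 3.1}   # Label C has no expert ratings
media_tenor = {"Label A": 0.6, "Label B": 0.1, "Label C": -0.2}

dataset = merge_sources(practices, expert_ratings, media_tenor)
print(sorted(dataset))  # → ['Label A', 'Label B']
```

A label missing from any one source, such as the hypothetical Label C, drops out of the merged dataset. This is one way a starting set can shrink during merging.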

The Exploratory Study on Governance of Eco-Labels

The MBA student team conducted 32 in-depth interviews with a wide variety of experts from major retailers, corporate purchasers, government regulatory agencies, NGOs and consumer groups, consultants and other stakeholders, from France, Germany, Hungary, Ireland, Spain, Sweden, the UK and the US. To avoid biases, we did not include any experts from standard-setting and labeling organizations such as Fairtrade International, which owns the Fairtrade certification, or the U.S. Green Building Council, which develops the LEED green building rating system. These interviews had two objectives: first, to verify whether our understanding of key issues in the practice of eco-labeling was consistent with that of well-informed experts, and second, to verify whether using a survey of experts to assess governance of eco-labels was likely to be meaningful.

The interview protocol included questions on the primary motivations for firms to pursue certification, what constitutes a successful eco-label, the role of governance, the challenges involved with eco-labels, and the future evolution of eco-labels. We avoided using the term “credibility” because this term was being used in relation to the subset of eco-labels that are members of ISEAL, which was developing its Credibility Principles during this phase of our study. We drew three key observations from these 32 interviews.

First, our experts generally agreed that acceptance and impact of an eco-label matter the most in defining a successful label. In other words, an eco-label must be able to attract prospective stakeholders (consumers, retailers, producers) and remain credible to them. At the same time, a label must have an actual social or environmental impact. The interviews indicated that a label’s ability to attract key actors and its ability to make an actual impact were both considered part of its governance. In other words, a credible label is well governed, so we used the term “governance” in our subsequent expert survey.

Second, between them, the experts mentioned various assurance practices that matter in eco-labels’ governance, and although several practices were mentioned repeatedly, no single one stood out as being the most important. In other words, “governance” could not be reduced to a narrower, more precisely defined construct.

Third, even though the specific practices that experts mentioned differed, the consistency in the overall way they commented on governance reassured us that surveying a population of experts about a wide range of eco-labels would be meaningful. Also, there was no clear distinction between the experts’ perspectives based on their background. For instance, some experts from retailers and purchasers mentioned that eco-labels need to have an open- and consensus-based standard setting process, a perspective one might more typically associate with NGOs. Conversely, several NGO-based experts mentioned that eco-labels need to be accepted by the marketplace, a view one would more likely expect from retailers. In short, we found that individual experts may differ on the relative importance of specific assurance practices, but there was no evidence of any bias based on their professional background. This observation is similar to that of Highhouse et al. (2009). The quotes in Appendix A illustrate these three observations.

Assurance Practices of Eco-Labels

Our study investigates governance of eco-labels, which we define broadly as the process by which the eco-labels’ requirements are set and enforced. We selected a set of assurance practices that significantly influence eco-labels’ governance. In making this selection, we were guided by the data available from ecolabelindex.com and also by the interviews that we conducted with experts during the exploratory phase. We do not claim that this set of assurance practices covers all aspects of governance, but we do believe they form an important subset. This led to the selection of the following six assurance practices (using some of the terminology as it appears on ecolabelindex.com):

  • Governmental control is a dummy variable to indicate whether ecolabelindex.com lists the organization managing the eco-label as being of type “government.”

  • Third-party audits indicates whether compliance with the eco-label’s standard is ensured by an independent third-party organization (as opposed to by the organization managing the eco-label, which would be second-party certification).

  • Verifiers accredited indicates whether the organizations performing the compliance audit are accredited, whether by the organization managing the eco-label or by an independent organization.

  • Chain-of-custody indicates whether chain-of-custody data are used in the conformity assessment process.

  • Field site visits indicates whether the verifiers perform field site visits during the conformity assessment process.

  • Standard setting indicates whether the standards for the eco-label were developed using an open- and consensus-based process.

We extracted our data on these assurance practices from www.ecolabelindex.org. We also manually verified these by looking at the websites maintained by the organizations behind those eco-labels and corrected some errors. We defined binary variables, where “1” (“0”) signals the presence (absence) of an assurance practice. (We discuss the inevitable limitations of this approach later.) We included a control for “Year Established” to allow for the possibility that older schemes may have more assurance practices in place and may be considered better governed simply because they have operated longer.

Expert Measures

We collected data about perceived quality of governance by inviting experts to rate each of the initial 56 labels on a 5-point scale. We again sought to include experts from five broad stakeholder groups involved in eco-labeling: major retailers, corporate purchasers, government regulatory agencies, NGOs and consumer groups, and consultants. In order to identify appropriate experts, we followed a two-step approach. Many of the experts interviewed agreed to complete the survey. We also reached out to individuals with whom we were familiar, and identified further experts through interviews, referrals, and web search. We aimed to have at least 10 experts in each category, following the advice of Highhouse et al. (2009) that “reasonably stable estimates” are gained with 5 experts, and “little incremental gain” is achieved with more than 10 experts. Anticipating a 20 % response rate, we sought to identify approximately 50 experts in each category. Altogether we invited 312 experts by email to participate in the short survey, which was only accessible using the link in the email. The first 5 experts responded to an email invitation that was not category-specific. After that, we received 20 usable survey responses from the consumer groups and NGO category, 12 from corporate purchasers, 14 from individuals in government positions, 12 from consultants and academics, and 4 from buyers at retailers. In total, we received usable responses from 67 experts, giving a 21 % response rate. In light of the observations (cited above) by Highhouse et al. (2009), this number of responses gives us confidence that the expert measures are reliable; we discuss their validity below.

Because we were asking each expert about 56 eco-labels, we had to keep the questions on each label to an absolute minimum (Footnote 1). The survey asked experts to respond, on a five-point Strongly Disagree to Strongly Agree scale, to the following statement about each eco-label: “This labeling scheme is well-governed.” From the interviews with experts, and from many interactions we have had over the years with a wide range of practitioners, we are confident that our experts interpreted the term “governance” appropriately. We also refrained from using multi-item scales as their incremental information is likely to be extremely small in this case (Drolet and Morrison 2001), especially in light of the large number of eco-labels that each expert was asked about. Rather, we kept the questions distinctive and simple to minimize any halo effect and to maintain experts’ focus. We did not anchor the scales, as we are only interested in the experts’ relative ranking of each label, not the absolute scores. Any anchor that we might have used would have potentially introduced a bias to their responses. The survey was conducted on-line using Qualtrics software. Each scheme appeared individually on the screen together with the three questions and an option to skip the evaluation of the label (“I don’t know the scheme”). The sequence in which the labels were presented was generated randomly for each expert.

While these expert survey data have inevitable limitations relative to a single-label design, the responses do have substantial face validity. For instance, the average governance score for LEED (rated by 40 experts) is 3.93, compared to 2.83 for the Green Building Initiative’s Green Globes (rated by 12 experts), consistent with a widespread view that LEED is the better governed of the two. Similarly, the score for FSC was 4.02 (rated by 40 experts), compared to 3.06–3.28 for CSA (16 experts), PEFC (18 experts), and SFI (27 experts), the other forestry schemes. This variation in number of responses also confirms that experts were comfortable skipping schemes with which they were not sufficiently familiar.

Further reassurance for the validity of the expert data comes from the observation that they rated Fairtrade and Max Havelaar almost identically: 3.53 versus 3.50, with 33 and 15 experts in the final sample. (Using the 17 experts in the original full sample that rated both Fairtrade and Max Havelaar yields a similar comparison: 3.47 vs. 3.53.) Fairtrade is more widely known, especially in the US, but in fact the two standards are identical and mutually accepted as equivalent (see Ingenbleek and Reinders (2013) for a detailed description of the two). The data suggest that the experts who were familiar with Max Havelaar rated it, correctly, the same as Fairtrade, while experts who were not familiar with Max Havelaar did not rate it at all. Altogether, this does indicate that the experts had a common understanding of the term “governance,” despite the lack of a very precise definition, and that the experts’ rating of the governance of eco-labels is a valid measure of their actual overall governance.

Media Coverage Measure

Data on media coverage spanning the period 2005–2010 were drawn from the Lexis-Nexis database of “Major World Publications” (Footnote 2). The search for coverage of our 56 labels yielded 8,486 potentially relevant articles. The most covered eco-label had 1,186 articles, the least covered 10 articles. We sampled from this pool using the following procedure. For each eco-label with fewer than 25 articles, all articles were selected for coding. For eco-labels with more than 25 articles, a total of 25 plus 25 % of the remaining number of articles were selected. To cover the entire 5-year period, the articles on each label were ordered chronologically and we used every 4th article. Some articles were deleted if during the coding they turned out not to describe the eco-label in question. This sampling procedure is commonly used in media research and a sampling fraction of 25 % is well above the usual practice (Deephouse 2000). The sampling provided 3,043 articles for analysis.
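The sampling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as actually implemented: the function names `sample_size` and `every_kth` are hypothetical, rounding the 25 % share up to a whole article is an assumption, and so is treating a label with exactly 25 articles in the same way as the smaller labels.

```python
import math


def sample_size(n_articles, base=25, fraction=0.25):
    """Number of articles to code for one eco-label.

    Labels with at most `base` articles are coded in full; larger labels
    contribute `base` articles plus `fraction` of the remainder.
    Rounding up is an assumption; the text does not specify it.
    """
    if n_articles < 0:
        raise ValueError("n_articles must be non-negative")
    if n_articles <= base:
        return n_articles
    return base + math.ceil(fraction * (n_articles - base))


def every_kth(articles_sorted, k=4):
    """Systematic sample: every k-th item of a chronologically ordered list."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return articles_sorted[k - 1::k]


if __name__ == "__main__":
    # The least and most covered labels reported in the text.
    print(sample_size(10), sample_size(1186))
    print(every_kth(list(range(1, 13))))
```

Taking every 4th article from a chronologically ordered list spreads the sample over the whole period, which is the stated purpose of that step.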

The recording unit of analysis (Weber 1985) is a single article about a labeling scheme. Some articles covered multiple labels, and in these instances, the article was coded for all labels separately. Following common practice in media research, we coded each recording unit as favorable, neutral, or unfavorable (Deephouse 2000; Janis and Fadner 1965; Pollock and Rindova 2003; Pollock et al. 2008). The rating was defined as follows. First, we defined an overarching question: “After reading this article, do you feel substantially more positive or substantially more negative about the scheme?” A recording unit was ranked as favorable or unfavorable when it contained evaluative content, whereas neutral statements purely reported facts (Pollock et al. 2008). Evaluative content took various forms, such as an endorsement (“the most prestigious labels”), assessment of the ‘quality’ of the labeling scheme (“this is the most stringent label”), data demonstrating tangible impact of the scheme (such as the number of adopting organizations or improvements observed in adopting organizations), etc. For articles that contained multiple accounts (Lamertz and Baum 1998), we coded each paragraph as positive, neutral, or negative toward the scheme. Following previous studies (Pollock and Rindova 2003), an article with relatively equal instances of positive and negative references was coded as neutral.

A research team of five MBA students was trained to code the articles. The training was continued until we reached coding consistency of .86 in inter-rater reliability as measured by Cohen’s kappa (Miles and Huberman 1994). Other media research recommends this threshold and reports similar reliability, i.e., .86 (Pollock and Rindova 2003), .91 and .83 (Deephouse 2000), and .86 (Pollock et al. 2008). At the end of the training, we selected the three coders who were the most consistent and reached Cohen’s kappa of .86 or higher. Given the volume of coding required, the 3,043 articles were divided equally among the three coders, rather than have multiple coders for each article. To minimize the effect of any remaining inconsistency in coding, each student was given a random set of articles.
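For readers unfamiliar with the statistic, Cohen’s kappa compares the observed agreement between two coders with the agreement expected by chance. The sketch below is a generic two-rater implementation, offered as an illustration only; the function name `cohens_kappa` and the example labels are hypothetical, and the text does not state how kappa was computed across the five trainees.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["favorable", "favorable", "neutral", "unfavorable"]
    b = ["favorable", "neutral", "neutral", "unfavorable"]
    print(round(cohens_kappa(a, b), 3))
```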

Following common practice in media research, we measured media reputation by overall tenor of media coverage (Deephouse 2000; Janis and Fadner 1965; Pollock and Rindova 2003; Pollock et al. 2008). The tenor was calculated using the Janis-Fadner coefficient of imbalance:

$$ \text{Tenor} = \begin{cases} \left( f^{2} - fu \right)/\left( \text{total} \right)^{2} & \text{if } f > u \\ 0 & \text{if } f = u \\ \left( fu - u^{2} \right)/\left( \text{total} \right)^{2} & \text{if } u > f \end{cases} $$

where f is the number of favorable articles, u the number of unfavorable ones, and total the total volume of articles about each scheme. The range of this variable is [−1, 1], where 1 indicates all positive coverage and −1 all negative. Appendix B includes some examples of favorable and unfavorable mentions.
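As a worked illustration of the coefficient of imbalance, the piecewise definition above can be written as a small Python function. The function name `janis_fadner` and the input checks are our own additions for the sketch; only the three-case formula comes from the text.

```python
def janis_fadner(favorable, unfavorable, total):
    """Janis-Fadner coefficient of imbalance for one labeling scheme.

    `total` counts all coded articles, including neutral ones.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if favorable < 0 or unfavorable < 0 or favorable + unfavorable > total:
        raise ValueError("counts are inconsistent with total")
    f, u = favorable, unfavorable
    if f > u:
        return (f * f - f * u) / total ** 2
    if u > f:
        return (f * u - u * u) / total ** 2
    return 0.0


if __name__ == "__main__":
    # 6 favorable, 2 unfavorable and 2 neutral articles out of 10.
    print(janis_fadner(6, 2, 10))
```

Because neutral articles enter only through the total, a scheme with mostly neutral coverage has a tenor close to zero even when all of its evaluative articles point the same way.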

Validity is less of a potential concern with the media coding than it was with the expert measures. In line with the adage that “perception is reality”, one could argue that if a reader considers a particular article to be unfavorable, then it is in fact unfavorable; there is no underlying “true” tenor beyond that which is perceived by the reader. If different readers interpret the same article differently, that is a reliability issue, which we addressed above. In short, based on the nature of the data, and on our iterative approach to training the raters, we are confident that the data on media coverage are sufficiently valid and reliable.

Final Data Preparation

We eliminated Energy Saving Recommended (as its MediaTenor of 0.4 is an outlier). For the main results that we present below, we only include schemes for which at least 5 experts responded to the survey and for which the MediaTenor was based on coding at least 5 articles. In our robustness tests, we included all experts, but for our main analyses, in order to reduce the possibility of less-informed experts driving the results, we eliminated the experts who deviated most consistently from the mean. To do this, we calculated the mean response across all schemes and experts (Footnote 3). For each expert we calculated, for each scheme, the absolute deviation between their response for that eco-label and the mean response for that eco-label. We added those absolute deviations across all schemes for which the expert responded, and divided by the number of schemes for which the expert responded. In the main results reported here, we eliminated the 20 % of experts with the highest average absolute deviation, and used the value of quality of governance based only on the remaining 80 % of experts. We performed robustness checks by including labels that were evaluated by at least 8 (instead of 5) experts, labels for which at least 10 articles were coded (instead of 5), and by using 90 % or 100 % of experts (instead of 80 %). We discuss these robustness checks in more detail later.
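The expert-screening step can be sketched as follows. This is a minimal illustration under assumptions: ratings are held in a nested dictionary mapping each expert to the schemes they rated, the function names are hypothetical, scheme means are taken over all experts who rated that scheme, and the number of experts to drop is rounded down.

```python
from statistics import mean


def scheme_means(ratings):
    """Mean rating per scheme across all experts who rated it."""
    by_scheme = {}
    for scores in ratings.values():
        for scheme, score in scores.items():
            by_scheme.setdefault(scheme, []).append(score)
    return {scheme: mean(values) for scheme, values in by_scheme.items()}


def average_abs_deviation(ratings):
    """Per expert: mean absolute gap between own rating and the scheme mean."""
    means = scheme_means(ratings)
    return {
        expert: mean([abs(score - means[scheme]) for scheme, score in scores.items()])
        for expert, scores in ratings.items()
        if scores  # experts who rated nothing are left out
    }


def keep_most_consistent(ratings, drop_fraction=0.20):
    """Drop the share of experts with the highest average absolute deviation."""
    deviations = average_abs_deviation(ratings)
    n_drop = int(len(deviations) * drop_fraction)  # rounding down is an assumption
    ranked = sorted(deviations, key=deviations.get)  # most consistent first
    kept = ranked[:len(ranked) - n_drop]
    return {expert: ratings[expert] for expert in kept}


if __name__ == "__main__":
    ratings = {
        "A": {"X": 4, "Y": 3},
        "B": {"X": 4, "Y": 3},
        "C": {"X": 4, "Y": 3},
        "D": {"X": 4, "Y": 3},
        "E": {"X": 1, "Y": 5},
    }
    print(sorted(keep_most_consistent(ratings)))
```

Averaging over only the schemes an expert actually rated keeps experts who skipped unfamiliar labels comparable with those who rated many.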

This yields a set of 41 labels for our main analysis. Table 1 shows the original set of 56 labels and the final set of 41. Descriptive statistics and correlations are shown in Table 2. Table 2 also shows that the variables “third-party audits” and “verifiers accredited” are highly correlated. For that reason, we repeated the analyses with a new variable “third-party audits and verifiers accredited,” equal to 1 if and only if both original variables are equal to 1, to verify that this collinearity does not affect our results (this combined variable is identical to “verifiers accredited,” so this is equivalent to repeating the analysis omitting the variable “third-party audits”).

Table 1 List of eco-labels used in our study
Table 2 Descriptive statistics and correlations

Methodology and Results

The dependent variables in our hypotheses are the quality of governance of an eco-label as perceived by experts (H1) and the tenor of media coverage (H2). We use simple OLS regressions to test our hypotheses. Our dataset has several inevitable limitations, including a relatively small sample size (25–50 eco-labels). Therefore, we focus on simple analyses with substantial robustness checks, using different subsets of the data, rather than using more complex methods that would not be appropriate with our sample size. In this section, we describe our statistical findings; in the next section, we interpret our findings, place them in context, and discuss limitations of our work.

Hypothesis 1 predicts that presence of any of the assurance practices listed will be associated with the scheme being considered better governed. In the OLS regression in Table 3, eco-labels under governmental control, older eco-labels, and those with accredited verifiers are seen as better governed, whether or not one includes the variable “third-party audits.” The variance inflation factors for the latter two variables are below 4, and the other coefficients barely change; both of these facts indicate that multi-collinearity is not a concern (Hair et al. 1998, pp. 191–193). This provides partial support for our first hypothesis, only as it relates to governmental control and accreditation of verifiers; the presence of any of the other assurance practices has no positive effect on the experts’ assessment of governance. We discuss this further in the next section.

Table 3 Results of OLS regression to test Hypothesis 1

The next hypothesis (H2) predicts that more favorable media coverage is associated with the presence of any of the assurance practices we consider. The OLS results in Table 4 show that only open- and consensus-based standard setting is associated with more favorable media coverage. Note that the fit of the OLS models is poor, suggesting that the tenor of media coverage is explained by factors other than those we consider here.

Table 4 Results of OLS regression to test Hypothesis 2

These results were obtained with our main sample, using 80 % of experts, and only including the 41 eco-labels rated by at least 5 experts and covered in at least 5 articles. Highhouse et al. (2009) found that going beyond 8 experts added little further accuracy. Therefore, we re-do our analyses with the 32 eco-labels rated by at least 8 experts. Similarly, we explore the effect of only including the 38 eco-labels covered in at least 10 articles, and the effect of both restrictions combined (30 eco-labels). Table 5 summarizes these robustness checks, which indicate that the results discussed so far are invariant under these changes. If we eliminate fewer (10 %) or none of the most dissonant experts, the noise in the expert assessments inevitably increases, but our results remain largely similar. In most cases, “verifiers accredited” is still significant in predicting the quality of governance, while other factors are still not; “governmental control” is sometimes significant but not consistently. In most cases, “open and consensus-based standard-setting” is associated with higher media tenor scores, while all other factors are not. In summary, we believe that our key findings are robust, though our work has several inevitable limitations which we discuss later.

Table 5 Summary of OLS results with different subsamples

Discussion and Limitations

Discussion

Transcending our specific hypotheses, we can loosely organize our findings into two themes. First, the mere presence of most specific assurance practices has minimal effect on experts’ assessment of the quality of governance, while accreditation of verifiers and governmental control do matter. Second, the media appear more concerned with the participation of external stakeholders in the standard setting process of eco-labels rather than with the other assurance practices. We discuss both of these themes in turn.

The finding that the presence or absence of most individual assurance practices is not associated with experts’ assessment of the overall quality of governance of an eco-label is intriguing. That means, for instance, that an eco-label that requires field site visits is not, by itself, considered better governed than an otherwise identical eco-label that does not require field site visits. Similarly, third-party audits, chain-of-custody requirements, and open- and consensus-based standard setting do not matter by themselves. The only attributes that increase quality of governance in the eyes of the experts are governmental control and the presence of an independent accreditation scheme. Speculatively, we explain this in two ways.

First, each of the other attributes can be implemented well or poorly, and the experts recognize that the quality of implementation of these attributes matters more than their mere presence. For instance, experts may dismiss field site visits as contributing to governance of a particular label if they know that such visits tend to be cursory or not performed by independent auditors. Raynolds et al. (2007, p. 159) provide several examples of such variation in implementation among eco-labels in the coffee value chain. They claim, for instance, that UTZ Kapeh, which identifies itself as involving third-party certification, “resembles a second-party certification, since the NGO base has been created after the fact largely to legitimate a system that appears to cement the power of dominant distributors.” Similarly, still in their view, “Rainforest Alliance, for example, has a strong NGO base, but it excludes small-farmers, workers, and consumers,” indicating that open- and consensus-based standard-setting also comes in multiple shades. The discussion by Silva-Castaneda (2011), in the context of the Roundtable on Sustainable Palm Oil, illustrates the complexity and nuance involved in third-party audits, observing that what counts as “evidence” in an audit tends to favor corporations over local communities. Our work can therefore be seen as an empirical validation, across a wide range of eco-labels, that one needs to go deeper into the nuances of how a label is governed than simply tallying whether certain assurance practices are present or not.

Second, we interpret the positive effect of governmental control and accreditation of verifiers as a search for reassurance: if a trustworthy independent organization is involved with the management of the scheme, whether directly as in the case of government control, or indirectly as in the case of accreditation agencies, one may be more reassured about the overall governance of the scheme. In that view, eco-labels that provide assurance without reassurance would be perceived by experts as less well governed. Of course, governmental control and accreditation can also be implemented well or poorly, but on average the additional layer of oversight is considered reassuring.

The only factor that is associated with more favorable media coverage is whether the eco-label involves open- and consensus-based standard-setting. The fact that the presence of other assurance practices is not associated with more favorable coverage could be for the same reasons as why the experts do not rate those labels as being better governed, recognizing that quality of implementation of a governance practice matters more than its mere presence, or it could be because the media do not know enough about the eco-labels to know whether those dimensions are in fact in place. The fact that there is minimal correlation between the tenor of media coverage and the experts’ rating of governance (0.09 in Table 2) is quite surprising, and points toward the second interpretation, that the media seem to focus more on which organizations are involved in a label than on the specifics of the label itself. If a label is transparent and includes relevant stakeholders, i.e., if it is based on open- and consensus-based standard-setting, the media are more likely to conclude that the label is beneficial. If a label is generally excellent but not open- and consensus-based, the media appear more likely to mistrust it and to report less favorably than such a label deserves based on its actual merits.

One way to interpret our findings is that the experts and the media both agree that schemes with more external parties involved are better governed. For the experts, this translates into linking quality of governance to governmental control and accreditation of the verifiers; for the media, this translates into open- and consensus-based standard-setting, which requires involvement of external stakeholders such as NGOs. In the broader eco-labeling and standards community, it is common to think in terms of “assurance”: a scheme draws up a set of requirements and then provides assurance that certified firms meet those requirements. Our findings suggest that experts and media are less concerned about the exact requirements or about the assurance itself, but are looking for one or preferably multiple layers of “reassurance” that someone is overseeing the labeling scheme and the firms it has certified.

Contributions and Implications

Our study makes several contributions to theory and practice. First, it contributes to an emerging and increasingly important area of private regulation—that of eco-labels. Previous studies have mainly focused on single labels or single industries, often speculating about the effect of assurance practices on governance of eco-labels. In contrast, our study provides evidence that “re-assurance” practices (governmental control, independent accreditation, and open- and consensus-based standard setting) are the most important practices for eco-labels to be considered well governed. This has important ramifications for practice. For instance, an auditing firm that provides assurance on sustainability reports can rely more on information in the report that has been previously verified by eco-labels with “re-assurance” practices than on information verified by eco-labels without such a re-assurance layer. When choosing between competing labels, managers should consider “re-assurance” practices: their mere presence suggests that an eco-label is more likely to be considered well governed.

Second, we contribute to the environmental and social accountability literature by linking governance to several assurance practices. Scholars like Darnall et al. (2009) and Parker (2005) have argued that the accounting literature recognizes the importance of environmental accountability but still lacks studies on environmental auditing. Darnall et al. (2009) also argued that scholars often view environmental audits as a uniform practice and that a more nuanced view on auditing is necessary. The literature has looked neither at specific assurance practices nor at the credibility of assurance practices of certifiers. Our study suggests that external assurance practices contribute to more effective private regulation, which supports those who have argued for multiple layers of control and oversight in private regulation such as Dranove and Jin (2010). We also add to the studies in the accounting and auditing literature that have looked at various facets of perceived credibility, such as analyst credibility (Chen and Tan 2013) or disclosure credibility (Mercer 2004), by providing a perspective on credibility of assurance practices from an expert and media perspective.

Third, like the environmental and social accountability literature, the voluntary standards literature has also highlighted a need to scrutinize the control mechanisms in private regulation as well as the inconsistencies and variations in the quality of assurance practices (Castka and Balzarova 2008a, b; Heras-Saizarbitoria and Boiral 2012). Some researchers have found evidence that the quality and consistency of third-party auditing varies, even within the same certification scheme (Aravind and Christmann 2011; Boiral 2003), and have used those preliminary findings to argue for more research to explain variation in how private regulation is organized. To date, such variation has been explained mainly in the extensive body of literature on public regulation, yet little has been done specifically on variations in private regulation (Short et al. 2013). Research has also done little to investigate the role of specific assurance practices.

Fourth, our finding that most specific assurance practices have no effect on the experts’ overall assessment of quality of governance has methodological implications for comparative research on eco-labeling and other voluntary standards and auditing schemes. For many types of voluntary standard (not just eco-labels), the assurance practices along which eco-labels and voluntary standards differ may be impossible to measure objectively. It may be tempting to treat, for instance, chain-of-custody as a simple binary variable, but in practice there is a wide range of implementation of such a requirement. Similarly, for other voluntary standards, rather than only trying to quantify the assurance practices of each standard, it may be informative to ask a knowledgeable panel of experts for their assessment.

Finally, we have integrated and made contributions to two streams of literature, that on environmental and social accountability and that on voluntary standards, literatures that hold important insights for one another, but that continue to develop in a parallel rather than in a more integrated manner (Short et al. 2013).

Limitations

We consider our results to be robust, but also preliminary due to the limitations inherent in this study. Aiming to cover a much wider set of eco-labels than has been done previously in empirical work brings with it several inevitable limitations. It would not have been possible to obtain the (tentative) findings we put forward here without this broader coverage, which in our view legitimizes the compromises we had to make. Further research could build on our findings in various ways. We hope that follow-up studies will test our findings in more controlled settings, for instance by only considering labels within the same sector. To establish causal links, longitudinal data on assurance practices as well as on media coverage would be helpful. This would help overcome the limitation posed by the cross-sectional nature of our data, which masks the fact that competing labels in the same domain have sometimes converged over time; for instance, eco-labels controlled by industry associations have been increasingly adopting “open and consensus-based standards setting” practices, typical for multi-stakeholder and NGO controlled eco-labels. Henson (2011) gives examples of how several schemes in the agrifood sector have evolved over time, to emphasize the need to study the dynamics of private governance systems.

By requiring that we have enough experts rating each label, and enough media coverage on each label, we had to exclude many labels. The service www.ecolabelindex.com lists 435 eco-labels (as of January 2013), but for many of these, even that website provides only limited information. Other methods of assessing governance that do not limit the sample in this way would help to increase the number, and therefore the variety, of labels included. Including non-English language media outlets would allow more country-specific labels to be included from non-English speaking nations. Earlier, we noted that adoption of eco-labels may be driven more by retailers than by consumers, so another interesting question to explore further is whether retailers and consumers differ in how they evaluate assurance practices of different eco-labels.

Despite these limitations, we hope that our study, being the first to link assurance practices, governance and media coverage of 41 different eco-labels, is a useful first step toward such future work.