Introduction

Protected areas play a predominant role in the global efforts to conserve biodiversity and natural systems (Visconti et al. 2019). The International Union for Conservation of Nature (IUCN) classifies protected areas into six categories (Table 1; Dudley 2008) ranging from strict nature reserves (Category Ia) to areas where resources can be used sustainably (Category VI). The categories—which are primarily defined by their management objectives (Boitani et al. 2008; Dudley and Stolton 2008)—were designed to reflect a gradient of naturalness and permissible human uses (Dudley 2008). For example, protected areas in Category Ia represent natural areas set aside exclusively for the protection of biodiversity and in which “human visitation and use” is strictly prohibited. Conversely, protected areas in Categories V and VI represent areas with higher levels of human presence and in which local communities are allowed to maintain sustainably many of their nonindustrial activities (Table 1).

Table 1 The six categories of protected areas as defined by the International Union for Conservation of Nature (obtained from Dudley 2008)

The current IUCN classification system is the result of years of discussions and negotiations (Dudley et al. 2010; Shafer 2015) and is thought to reflect the present-day view on protected areas and their role in conservation and society (Dudley et al. 2010, 2014). Although protected areas were originally conceived as places to be set aside solely for the preservation of biodiversity and natural systems (Shafer 2015), over the decades, this exclusionary approach has been regarded by many as problematic and ineffective (Mallarach et al. 2008; Berghöfer 2010), because of its negative impact on local communities and the resulting conflicts (West and Brockington 2006; West et al. 2006). Its critics have argued that local communities must have the right to maintain their livelihoods within the protected areas and must not be required to bear the costs of conservation inequitably (West and Brockington 2006; West et al. 2006). In 1994, the IUCN revised and adopted the current classification system (Bishop et al. 2004), which, among other changes, now included Categories V and VI (Table 1; Dudley 2008).

This development, however, was not welcomed by all stakeholders (Dudley et al. 2010; Shafer 2015); groups of conservationists opposed the new system, and particularly the new categories, on the premise that it lessened the emphasis on conservation (Terborgh 2004; Locke and Dearden 2005; Shafer 2020). In an influential paper published in 2005, Locke and Dearden argued that protected areas belonging to Categories V and VI (such as many of the extractive reserves in Brazil and Canada) have little conservation value and therefore should be reclassified into “Sustainable Development Areas” (and should not count towards official targets, e.g., the Aichi Targets). Conversely, the supporters of the current classification system, and in general of the notion that certain sustainable nonindustrial human activities could be part of some of the protected areas, responded by arguing that all categories are necessary to protect biodiversity (Mallarach et al. 2008), and that nature conservation, which is the primary focus of all of them, takes precedence in instances of conflicting management objectives (Dudley and Stolton 2008; Dudley et al. 2010). Some would even argue that certain human activities, when carried out in a sustainable manner, could benefit biodiversity in some cases. One possible example is the low-intensity farming that contributes to the preservation of High Nature Value (HNV) farmlands in Europe (Matthews 2014), on which several habitats and species depend for their persistence (Halada et al. 2011; Anderson and Mammides 2020a).

Yet, despite the significance of the topic, and the associated conservation implications, the reality is that until today much of the discussion regarding the relative effectiveness of the IUCN categories has been centered around assertions for which we still lack important knowledge. On the one hand, the arguments against Categories V and VI (often also referred to as “multiple-use areas”) have been based for the most part on the assumption that those areas are less effective because they allow by design higher levels of human presence (Locke and Dearden 2005). On the other hand, the arguments in favor of those areas—and in general of the current IUCN classification system—have been based for the most part on the notion that the categories themselves have little to do with the effectiveness of the protected areas (Phillips 2007; Dudley and Stolton 2008), and that all categories must and can be effective in maintaining their levels of naturalness (Dudley and Stolton 2008; Mallarach et al. 2008).

In the past, both sides have been able to use case studies to support their arguments (Locke and Dearden 2005; Mallarach et al. 2008; Shafer 2020); however, it is unclear to what extent those cases are representative of the IUCN categories in general. Moreover, the findings in the literature regarding the relative effectiveness of the IUCN categories have been inconclusive, making it challenging to decipher potential differences. For instance, while several studies have found that strictly protected areas are more effective than areas in which multiple human uses are permitted (Scharlemann et al. 2010; Joppa and Pfaff 2011; Carranza et al. 2014), others have found the opposite (Nelson and Chomitz 2011; Porter-Bolland et al. 2012; Blackman et al. 2015; Miranda et al. 2016), and yet others that the two types do not differ (Coetzee et al. 2014; Françoso et al. 2015; Wendland et al. 2015; Anderson and Mammides 2020b). Several factors could be driving these contrasting results.

First, studies have used and compared the IUCN categories in dissimilar ways. Although a small percentage of the studies has examined each category separately (e.g., Leroux et al. 2010; Leberger et al. 2020), most studies have compared protected areas by grouping them into areas that are strictly protected and areas in which multiple human uses are permitted; the exact classification, however, varies considerably from study to study. For instance, while many studies classify Categories III and IV as strictly protected (Nelson and Chomitz 2011; Porter-Bolland et al. 2012), others classify them as multiple-use areas (Scharlemann et al. 2010; Jones et al. 2018; Anderson and Mammides 2020b) and yet others classify Category III as strictly protected and Category IV as multiple-use (Seiferling et al. 2012; Françoso et al. 2015). Second, many of the studies have focused on different geographic regions. However, there is increasing evidence that the effectiveness of the protected areas varies considerably across regions (Geldmann et al. 2019; Leberger et al. 2020). Therefore, it is possible that while strictly protected areas are more effective in some regions (Françoso et al. 2015), in others they are not (Butsic et al. 2017).

Third, studies have compared the effectiveness of the protected areas using different indices of human pressure. For example, while many studies have used changes in forest cover or deforestation rates (Porter-Bolland et al. 2012; Françoso et al. 2015; Bebber and Butt 2017), others have used changes in other land-cover types (Leroux et al. 2010; Jones et al. 2018; Anderson and Mammides 2020b), and yet others have used changes in species presence, abundance, or mortality (Gray et al. 2016; Hill et al. 2020). It is possible, however, that some protected areas are effective in mitigating one threat but not another (Leroux and Kerr 2013); consequently, many of the dissimilarities in the findings in the literature could be also due to differences in the indices used to assess the relative effectiveness of the protected areas.

Fourth, studies have compared the effectiveness of the various types of protected areas using different statistical methods. Not all methods, though, are equally appropriate when comparing protected areas. Previous research has shown that the effectiveness of the protected areas is influenced by multiple confounding factors, because protected areas are not situated randomly across landscapes (Joppa and Pfaff 2009). Therefore, when comparing protected areas, it is essential to account for such biases. An effective way of achieving this is through the use of quasi-experimental methods, e.g., based on propensity score weighting or matching (Agrawal 2014; Ramsey et al. 2019). However, not all studies have used such methods. Even those that have often compare protected areas to areas outside, rather than to each other, and then use the results to evaluate the relative effectiveness of the protected areas. This approach, however, could produce biased results because of the inherent dissimilarities between areas in different IUCN categories (Dudley 2008; Joppa and Pfaff 2009). For example, protected areas in Categories I and II are more likely to be found in remote regions compared to areas in Categories V and VI (Joppa and Pfaff 2009); hence, the human pressure exerted on the various types of protected areas will differ markedly (Nelson and Chomitz 2011), confounding the comparisons (Pfaff et al. 2014).

Because of all of the above-mentioned limitations, we still lack a clear understanding regarding the relative effectiveness of the IUCN categories, including whether Categories V and VI are indeed less effective than Categories I–IV as is often assumed (Locke and Dearden 2005). To address this knowledge gap, we conducted two different but complementary analyses. First, we reviewed the literature on the effectiveness of the protected areas, to understand better the reasons behind the disparities in the reported findings. Then, using the World Database on Protected Areas (WDPA), we conducted our own global analysis to assess the extent to which the relative effectiveness of the various types of protected areas differs. Importantly, we designed our analysis in a way that addresses the four limitations described above: (a) we grouped protected areas into strictly protected and multiple-use areas using the two most common methods in the literature, to assess if results vary according to how the categories are grouped; (b) we ran the analysis separately for each of the world’s six major biogeographic realms (Olson et al. 2001; Fig. 1), to take into account any possible regional differences; (c) we measured the relative effectiveness of the protected areas using two different indices of human pressure, to assess if results vary depending on the index used; and (d) we measured the relative effectiveness of the protected areas using two quasi-experimental methods, and by comparing the protected areas to each other (rather than to areas outside). Moreover, we also assessed the relative effectiveness of the protected areas with no IUCN category. These areas are often excluded from similar analyses (e.g., Leberger et al. 2020); yet, approximately one-third of the world’s protected areas have no category assigned or reported (WDPA; October 2018 version) and therefore it is essential that we understand whether their effectiveness differs systematically from the rest of the protected areas.

Fig. 1

Map showing the six biogeographical realms used in the analysis as well as the percentage of protected areas within each IUCN category in each realm

Methods

Literature review

We used the Web of Science and Google Scholar to search for peer-reviewed studies in which authors had evaluated the effectiveness of protected areas and had included also a comparison of the IUCN categories. For practical reasons, we considered only studies published in English. To identify those studies, we used the following search terms: [“Protected areas” OR “Nature reserves” OR “National parks”] AND [“IUCN Categories” OR “Strictly protected areas” OR “Strict reserves” OR “Multiple-use areas” OR “Multi-use areas” OR “Category V” OR “Category VI”]. Additionally, we searched the reference list of each identified study (Table S1) to find other relevant studies, which we may have missed during our initial search. Since our objective was to assess the relative effectiveness of the various types of protected areas, we did not consider studies that had evaluated protected areas but had combined all categories together, or studies that had focused on only one category (e.g., case studies on particular areas). For each study included in the analysis, we noted the following information: (a) country (or countries) in which the analysis was conducted; (b) index used to measure the effectiveness of the protected areas; (c) method used to group protected areas; and (d) main result, i.e., which type of areas was found to be more effective (Table S1).

Quasi-experimental analyses

As mentioned in the Introduction, protected areas (particularly strictly protected areas) tend to be found in places that are often less useful for other human uses, e.g., landscapes at higher elevations and on steeper slopes (Table S2) (Joppa and Pfaff 2009). Hence, to be able to evaluate correctly the relative effectiveness of the different types of protected areas, it is essential to control for these confounding factors (Joppa and Pfaff 2009). Although this can be potentially achieved by adding the confounding factors into the analyses as covariates, this approach is perhaps not ideal because often there is little overlap in the distribution of those factors among the areas compared (Ramsey et al. 2019). An alternative approach, which is more appropriate in the case of protected areas (Geldmann et al. 2019), is to use a quasi-experimental method (described in more detail in “Propensity score weighting” and “Matching” sections), to create, for example, a counterfactual control group (Geldmann et al. 2019). Although such methods have been frequently used to compare protected areas to areas outside (Joppa and Pfaff 2011; Geldmann et al. 2019; Mammides 2020b), they have been rarely used to compare the various types of protected areas to each other. Yet, the reasons that make them necessary in the first case (i.e., the presence of the confounding factors due to the non-random distribution of the protected areas) are also applicable in the second, due to the inherent dissimilarities between the IUCN categories (Dudley 2008; Joppa and Pfaff 2009; Table S2).

For the purposes of this study, we used two quasi-experimental methods—i.e., propensity score weighting (“Propensity score weighting” section; Mccaffrey et al. 2013) and matching (“Matching” section; Ho et al. 2011)—to compare the relative effectiveness of the three types of protected areas: (1) strictly protected areas; (2) multiple-use areas; and (3) areas with no IUCN Category. Quantifying the effectiveness of protected areas can be a challenging and complicated task because different areas can serve different conservation purposes (even when belonging to the same IUCN category). Ideally, each protected area should be evaluated against its specific conservation target(s). However, such detailed analysis is not possible on a global scale (as this information is rarely available). Hence, for the purposes of this study, we adopted a more general definition of “effectiveness”, which is in line with the guidelines of the IUCN (Dudley 2008) and is also commonly used in other studies (e.g., Jones et al. 2018; Geldmann et al. 2019; Anderson and Mammides 2020a, b). Specifically, when comparing the relative effectiveness of the various types of protected areas, we considered an area to be more effective if it has maintained its levels of naturalness during the period examined, i.e., if it has experienced lower or no increases in human pressure, measured using the human footprint index (Venter et al. 2016b) and the change in forest cover (Hansen et al. 2013; Heino et al. 2015).

When comparing protected areas, we controlled for key confounding variables that have been shown in the literature to determine the levels of human pressure within protected areas: (1) their size; (2) elevation; (3) slope; (4) distance to the nearest major city; and (5) initial levels of human pressure at the beginning of the period examined (Joppa and Pfaff 2009, 2011; Nelson and Chomitz 2011; Jones et al. 2018). Following previous studies (Geldmann et al. 2019), we ran the analysis separately for each of the world’s six major biogeographic realms (Fig. 1); biogeographic realms represent regions with shared biogeography (Olson et al. 2001).

Propensity score weighting

First, we measured the relative effectiveness of the protected areas using generalized boosted models (Mccaffrey et al. 2013) and the “twang” package in R (Burgette et al. 2017). Unlike other quasi-experimental methods, which can only handle binary treatments, generalized boosted models can be used in cases where there are more than two treatment levels (Burgette et al. 2017). Hence, this method was appropriate for our study in which we compared strictly protected areas to multiple-use areas and areas with no IUCN category. Generalized boosted models use propensity scores to weight the relative importance of each sample in the dataset (i.e., each protected area in our case), based on the distribution of the confounding variables (i.e., size, elevation, slope, etc.). Samples that are more dissimilar to the samples to which they are being compared are assigned a lower weight to reduce the differences between the treatment groups (and to make the samples more comparable).
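The weighting idea can be illustrated with a minimal Python sketch. It is not the twang implementation: it assumes that the propensity scores (the estimated probability of an area being strictly protected) have already been obtained by some model, and it uses the common two-group ATT weighting convention in which reference (strictly protected) areas receive a weight of 1 and comparison areas a weight of p / (1 − p). The function names and the example values are hypothetical.

```python
def att_weights(is_strict, propensity):
    """Return ATT-style weights under the assumed convention.

    is_strict  -- list of bools, True for strictly protected (reference) areas
    propensity -- list of estimated probabilities of being strictly protected
    """
    weights = []
    for strict, p in zip(is_strict, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Reference areas keep weight 1; comparison areas that resemble the
        # reference group (high p) are up-weighted, dissimilar ones down-weighted.
        weights.append(1.0 if strict else p / (1.0 - p))
    return weights


def weighted_mean(values, weights):
    """Weighted mean of an outcome, e.g. change in human footprint."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: one strict area and two comparison areas.
example_weights = att_weights([True, False, False], [0.8, 0.5, 0.2])
```

In this sketch a comparison area with a low propensity score (0.2) ends up with a lower weight than one with a moderate score (0.5), which mirrors the down-weighting of dissimilar samples described above.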

To assess whether strictly protected areas were more effective than multiple-use areas and/or areas with no IUCN category, we calculated the “Average Treatment Effects on the Treated” (ATT) (Mccaffrey et al. 2013); in other words, we calculated the average change in human pressure within those two types of protected areas had they been assigned a strict protection status (Burgette et al. 2017). Whenever propensity scores are used to weight samples, it is important that the resulting balance is assessed (Keller and Tipton 2016); the balance is a measure of “the degree of overlap in the distributions of the confounding variables among the treatment groups” (Ramsey et al. 2019); the higher the overlap the more comparable the treatment groups can be considered (Ramsey et al. 2019). A common way of assessing the balance, which we also used in this study, is by calculating the mean standardized difference between the treatment groups for each of the confounding variables (Mccaffrey et al. 2013; Ridgeway et al. 2017).
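As an illustration of the balance check, the following Python sketch computes a standardized mean difference for a single confounding variable. It assumes one common convention for ATT analyses, namely dividing the difference in (weighted) means by the standard deviation of the reference group; the exact formula used by a given package may differ, and the function name and elevation values are hypothetical.

```python
from statistics import mean, stdev


def standardized_mean_difference(reference, comparison, comparison_weights=None):
    """Difference in means between two groups, in reference-group SD units.

    reference          -- covariate values for the reference (strict) group
    comparison         -- covariate values for the comparison group
    comparison_weights -- optional weights for the comparison group
    """
    if comparison_weights is None:
        comparison_weights = [1.0] * len(comparison)
    weighted_comparison_mean = (
        sum(v * w for v, w in zip(comparison, comparison_weights))
        / sum(comparison_weights)
    )
    # Assumption: standardize by the sample SD of the reference group.
    return (mean(reference) - weighted_comparison_mean) / stdev(reference)


# Hypothetical mean elevations (m) of strict and multiple-use areas.
strict_elevation = [1000.0, 1200.0, 1400.0]
multiple_use_elevation = [400.0, 1200.0]

before = standardized_mean_difference(strict_elevation, multiple_use_elevation)
after = standardized_mean_difference(
    strict_elevation, multiple_use_elevation, comparison_weights=[1.0, 3.0]
)
```

With these made-up numbers, up-weighting the comparison area that resembles the strict group moves the standardized difference closer to zero, which is the kind of improvement a balance table is meant to reveal.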

Matching

In addition to the generalized boosted models, we also compared the relative effectiveness of the protected areas using the matching method (Keller and Tipton 2016). Unlike the weighting procedure, which uses all available samples by first assigning them a weight, matching uses only those samples that are most similar to each other (Ho et al. 2011). Matching, however, can only be used to compare two treatment levels; hence, for this part of the analysis, we only compared strictly protected areas to areas in which multiple human uses are allowed. We used the “MatchIt” package in R (Ho et al. 2011) to match strictly protected areas to multiple-use areas, while controlling for the same confounding variables we used in the generalized boosted models. There are several algorithms that can be used to match samples (Ho et al. 2011); a standard approach is to test more than one algorithm and choose the one that achieves the best balance (Schleicher et al. 2019b). In our case, that algorithm was the “nearest neighbor” (using the “mahalanobis” distance; Ho et al. 2011). As before, we assessed the balance using the mean standardized differences between the treatment groups (Ho et al. 2011).
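To make the matching step more concrete, the sketch below shows a greedy one-to-one nearest-neighbor match on the Mahalanobis distance, written in plain Python for just two covariates. It is only an illustration under stated assumptions, not the package's implementation: the covariance matrix is estimated from all areas pooled together, matching is done without replacement in the order the strict areas are listed, and the covariate values (elevation in m, slope in degrees) are hypothetical.

```python
from statistics import mean


def covariance_2d(points):
    """Sample covariance of 2-D points as (var_x, cov_xy, var_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points) - 1
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in points) / n
    return var_x, cov_xy, var_y


def mahalanobis_sq(p, q, cov):
    """Squared Mahalanobis distance between 2-D points p and q."""
    var_x, cov_xy, var_y = cov
    det = var_x * var_y - cov_xy ** 2
    dx, dy = p[0] - q[0], p[1] - q[1]
    # d' S^-1 d, using the closed-form inverse of a 2 x 2 covariance matrix.
    return (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det


def nearest_neighbor_match(strict, multiple_use):
    """Greedily pair each strict area with its closest unused multiple-use area."""
    cov = covariance_2d(strict + multiple_use)
    available = list(range(len(multiple_use)))
    pairs = []
    for s_index, s in enumerate(strict):
        if not available:
            break  # no comparison areas left to match
        best = min(available, key=lambda m: mahalanobis_sq(s, multiple_use[m], cov))
        pairs.append((s_index, best))
        available.remove(best)
    return pairs


# Hypothetical (elevation, slope) pairs.
strict_areas = [(200.0, 10.0), (1800.0, 12.0)]
multiple_use_areas = [(1700.0, 11.0), (250.0, 9.0), (1000.0, 25.0)]
matched_pairs = nearest_neighbor_match(strict_areas, multiple_use_areas)
```

The third multiple-use area in this example is left unmatched, which reflects the point made above that matching keeps only the most similar samples, whereas weighting retains all of them.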

Data collection

We retrieved the spatial boundaries of the terrestrial protected areas from WDPA (October 2018 version), available at protectedplanet.net. Following previous studies (Jones et al. 2018), we removed areas not yet established (i.e., those with a “proposed” status) and areas < 5 km² to avoid errors that could result from the resolution of the human pressure data (i.e., 1 km²; Heino et al. 2015; Venter et al. 2016a). In addition, following the best practice guidelines developed by the curators of WDPA (and available at http://protectedplanet.net/c/calculating-protected-area-coverage), we removed the UNESCO Man and Biosphere Reserves, since their buffer and transition zones are in most cases not protected areas. Lastly, a subset of the protected areas in WDPA are represented by multiple polygons, which overlap partly in some cases; therefore, to avoid overestimating the size of those protected areas we dissolved overlapping polygons (also in accordance with the best practice guidelines). When the overlapping polygons belonged to different IUCN categories (in approximately 5% of the cases), we assigned the strictest category to the whole protected area (Jones et al. 2018).
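The filtering rules described above can be sketched in Python as follows. The record keys (`status`, `area_km2`, `designation`), the substring used to recognise biosphere reserves, and the assumed strictness ordering of the categories (Ia strictest, VI least strict) are illustrative assumptions and do not reproduce the actual WDPA schema or the processing carried out in GIS software.

```python
# Assumed ordering from strictest to least strict category.
STRICTNESS_ORDER = ["Ia", "Ib", "II", "III", "IV", "V", "VI"]


def keep_area(record, min_area_km2=5.0):
    """Return True if a protected-area record passes the three filters."""
    if record["status"].strip().lower() == "proposed":
        return False  # not yet established
    if record["area_km2"] < min_area_km2:
        return False  # too small relative to the 1 km² pressure data
    if "biosphere reserve" in record["designation"].lower():
        return False  # UNESCO Man and Biosphere Reserves are excluded
    return True


def strictest_category(categories):
    """Pick the strictest IUCN category among overlapping polygons.

    Labels that are not one of the seven categories (e.g. "Not Reported")
    are ignored; None is returned when no category is available.
    """
    ranked = [c for c in categories if c in STRICTNESS_ORDER]
    if not ranked:
        return None
    return min(ranked, key=STRICTNESS_ORDER.index)


# Hypothetical records.
records = [
    {"status": "Designated", "area_km2": 12.0, "designation": "National Park"},
    {"status": "Proposed", "area_km2": 50.0, "designation": "Nature Reserve"},
    {"status": "Designated", "area_km2": 3.2, "designation": "Nature Reserve"},
    {"status": "Designated", "area_km2": 900.0, "designation": "UNESCO-MAB Biosphere Reserve"},
]
kept = [r for r in records if keep_area(r)]
```

Only the first hypothetical record survives the filters; a dissolved area whose polygons carry Categories IV and II would be labelled as Category II under the assumed ordering.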

We grouped protected areas into strictly protected and multiple-use areas using the two most common methods in the literature (Table S1). First, we classified Categories I–IV as strictly protected and Categories V–VI as multiple-use. Then, we classified only Categories I–II as strictly protected and the rest of the categories (III–VI) as multiple-use. Since one-third of the world’s protected areas have no IUCN category, we added a third group to our analyses, which we named “areas with no IUCN category”. This group included all protected areas for which the IUCN category was specified as: “not applicable”, “not assigned”, or “not reported” (85% of the areas belonged to the last category). We identified each protected area’s biogeographical realm (Olson et al. 2001; Fig. 1) using the data by the World Wildlife Fund available at www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world.
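The two grouping schemes can be written down compactly, as in the following Python sketch. The scheme names (`"I-IV"` and `"I-II"`) and the group labels returned by the function are hypothetical choices made for this illustration.

```python
ALL_CATEGORIES = {"Ia", "Ib", "II", "III", "IV", "V", "VI"}

# Categories treated as strictly protected under each grouping scheme.
STRICT_BY_SCHEME = {
    "I-IV": {"Ia", "Ib", "II", "III", "IV"},
    "I-II": {"Ia", "Ib", "II"},
}

NO_CATEGORY_LABELS = {"not applicable", "not assigned", "not reported"}


def protection_type(iucn_category, scheme):
    """Map an IUCN category label to one of the three groups compared."""
    label = iucn_category.strip()
    if label.lower() in NO_CATEGORY_LABELS:
        return "no IUCN category"
    if label not in ALL_CATEGORIES:
        raise ValueError(f"unrecognised IUCN category: {iucn_category!r}")
    if label in STRICT_BY_SCHEME[scheme]:
        return "strictly protected"
    return "multiple-use"


# Category III changes group depending on the scheme used.
under_first_scheme = protection_type("III", "I-IV")
under_second_scheme = protection_type("III", "I-II")
```

Categories III and IV are the only ones whose group depends on the scheme, which is why running the analysis under both schemes shows how sensitive the results are to that choice.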

We measured the change in mean human footprint within each protected area using the data developed by Venter et al. (2016a). The human footprint index is a composite measure of the human pressure on natural systems across the whole globe at a resolution of 1 km² (Venter et al. 2016a). Recent studies have shown that increases in human footprint correlate with increases in animal extinction risk (Di Marco et al. 2018) and reduced animal movement (Tucker et al. 2018). The index is based on the following eight human pressures: (1) built-up areas; (2) intensively farmed crop land; (3) pasture land; (4) human population densities; (5) night-time lights; (6) railways; (7) roads; and (8) navigable waterways (Venter et al. 2016a). Each pressure is first standardized according to its relative impact on natural systems (Sanderson et al. 2002) and then summed to obtain a cumulative score ranging from 0 to 50 (Venter et al. 2016a). The index is available for two different years, i.e., 1993 and 2009, and hence suitable for estimating the changes in human pressure within protected areas during the specific 16-year period (Venter et al. 2016b). To avoid any temporal mismatches between the datasets used, we only included in the analysis protected areas established in 1993 or earlier (19 486 protected areas in total; Table 2). A small proportion of the protected areas in WDPA lack an establishment year; following previous protocols (Jones et al. 2018), we randomly assigned to those areas an establishment year based on the years of the rest of the protected areas in their country.
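The handling of missing establishment years and the 1993 cut-off can be sketched in Python as below. The sketch assumes that a missing year is drawn uniformly at random from the known establishment years of the other protected areas in the same country; the text above does not spell out the exact sampling rule, so this is only one plausible reading. The record keys and the seed argument are hypothetical.

```python
import random


def impute_establishment_years(areas, seed=0):
    """Fill missing years by sampling from known years in the same country."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    known_years = {}
    for area in areas:
        if area["year"] is not None:
            known_years.setdefault(area["country"], []).append(area["year"])

    imputed = []
    for area in areas:
        year = area["year"]
        candidates = known_years.get(area["country"])
        if year is None and candidates:
            year = rng.choice(candidates)
        # Copy the record so the input list is left unchanged.
        imputed.append({**area, "year": year})
    return imputed


def established_by(areas, cutoff=1993):
    """Keep areas whose (possibly imputed) establishment year is <= cutoff."""
    return [a for a in areas if a["year"] is not None and a["year"] <= cutoff]


# Hypothetical records: one area in country "A" lacks an establishment year.
areas = [
    {"name": "P1", "country": "A", "year": 1980},
    {"name": "P2", "country": "A", "year": 1990},
    {"name": "P3", "country": "A", "year": None},
    {"name": "P4", "country": "B", "year": 2001},
]
retained = established_by(impute_establishment_years(areas))
```

In this example the area without a year receives either 1980 or 1990 and is therefore retained, while the area established in 2001 is dropped because it postdates the first human footprint map.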

Table 2 Number of protected areas used in the analysis within each IUCN category and realm (n = 19 486). NC = protected areas with no IUCN category

We measured the loss in forest cover within each protected area using the data by Heino et al. (2015). The data are based on the analysis of Hansen et al. (2013) and represent the total change in the percentage of tree cover between the years 2000 and 2012 (also at a resolution of 1 km²); Hansen et al. (2013) defined trees as “all vegetation taller than 5 m in height”. To ensure that the results from the two indices were as comparable as possible, we kept in both analyses the same group of protected areas (i.e., only those established in 1993 or earlier). For each protected area in our dataset, we also measured: (a) its size, in km²; (b) its mean elevation and slope, in meters and degrees respectively (using the data by Amatulli et al. (2018) available at a resolution of 1 km²); and (c) its distance to the nearest major city, in km (using the World Cities dataset made available by the Environmental Systems Research Institute (ESRI) at hub.arcgis.com/datasets).

Results

Literature review

In total, there were 36 studies that had measured the effectiveness of the protected areas and had included also a comparison of the IUCN categories (Table S1). In many of those studies, researchers were primarily interested in understanding whether protected areas were effective at reducing human pressure; the IUCN categories were used as a possible explanatory factor.

Eighteen of the studies were conducted at the global level, while the rest had focused on protected areas in a specific region (n = 18; Table S1). Of those, nine focused on protected areas in Latin America, four in North America, two in Europe, two in Asia, and one in Africa. The most common index used to compare the effectiveness of the protected areas was change in forest cover (Table S1). Several studies had used satellite imagery to measure those changes, while others had relied on the data made available by Hansen et al. (2013). Other indices used to assess the effectiveness of the protected areas included the human footprint index (n = 3), and data on species richness, abundance, and mortality rate (n = 4); one study had used data on forest fires as a proxy of anthropogenic disturbance and another had used illegal hunting pressure (Table S1).

Of the 36 studies reviewed, only ten had compared the IUCN categories separately; the rest of the studies (n = 26) had classified the categories into groups, which corresponded to areas that are strictly protected and areas in which multiple human uses are permitted (Table S1). Although all but two of the studies had classified Categories I and II as strictly protected and all but four had classified Categories V and VI as multiple-use, there was substantial variation in terms of how Categories III and IV were classified: 14 studies classified Categories III and IV as strictly protected, while eight studies classified them as multiple-use. Moreover, two studies classified Category III as strictly protected and Category IV as multiple-use (Table S1).

Thirteen of the studies concluded that strictly protected areas were more effective than multiple-use areas, while six studies concluded the opposite (Table S1). The rest concluded that there was no obvious pattern between the effectiveness of the protected areas and their type (n = 7). Of those that concluded that strictly protected areas were more effective, several had focused on protected areas in Latin America (n = 5), although five of them had included protected areas from across the globe (Table S1). Their analyses were for the most part based on deforestation rates (although some of them had used other indices as well; Table S1). Of the six studies that had concluded that multiple-use areas were more effective, three had focused on protected areas in Latin America, two on protected areas globally, and one on protected areas in Europe; their analyses were also based for the most part on deforestation rates (n = 5). Lastly, of the studies that found no significant difference between the two types of protected areas, most had used protected areas globally and a wider range of indices (Table S1). Overall, the number of protected areas included in the reviewed studies ranged from 12 to almost 200 000 (median and mean = 788 and 13 216 respectively; n = 27 studies that had mentioned sample sizes).

Quasi-experimental analyses

Both methods, i.e., weighting and matching, produced similar results leading to the same conclusions; therefore, in this section we only present the results of the weighting method because it also included areas with no IUCN category. The results of the matching method are presented in the Supplementary Materials (Table S5; Figs. S1–S4). Both methods substantially reduced the mean standardized differences between the confounding factors, resulting in more balanced samples (Figs. S2–S12).

Overall, the relative effectiveness of the protected areas varied considerably (Table 3), even within the three types of protected areas, depending on: (a) the geographic realm examined, (b) the method used to group areas into strictly protected and multiple-use, and (c) the index used to measure their effectiveness (Fig. 2). That said, on several occasions the human pressure had on average increased more within multiple-use areas and/or areas with no IUCN category (Table 3). However, the differences between those areas and strictly protected areas were for the most part small, i.e., < 1 for the human footprint index and < 5% for the loss in forest cover, and not statistically significant (Table 3; Figs. 3 and 4). It is worth noting here that statistical significance is not only determined by the size of the effect, but also by the sample size, i.e., the number of protected areas within each realm (Table 2) (Wasserstein et al. 2019). In some of the realms, e.g., the Nearctic, there were more protected areas, and hence a smaller-sized effect could be detected there; the p-values of our analyses (Tables S3, S4) should be interpreted in light of this fact (Dushoff et al. 2019; Wasserstein et al. 2019).

Table 3 Results of the generalized boosted models showing the relative effectiveness of multiple-use areas and areas with no IUCN category (NC) when compared to strictly protected areas. A positive value indicates higher human pressure, while a negative value indicates the opposite. For example, the percentage of forest cover lost within multiple-use areas in the Afrotropical realm—when strictly protected areas were defined as Categories I–IV—was on average higher by 0.2%
Fig. 2

Change in human footprint (a) and loss in forest cover (b) within each IUCN category and each realm based on the raw data prior to weighting or matching. A positive value indicates an increase in human pressure, while a negative value indicates a decrease. NC = protected areas with no IUCN category. Outliers represent values smaller than Q1–1.5 × IQR or larger than Q3 + 1.5 × IQR, where Q1 and Q3 are the first and the third quartiles, and IQR is the interquartile range (i.e., Q3 − Q1)

Fig. 3

Change in human footprint within: a strictly protected areas (I–IV), b multiple-use areas (V–VI), and c areas with no IUCN category in each realm (during the years 1993–2009). Boxplots are based on the results of the generalized boosted models. An asterisk indicates that the comparison to strictly protected areas in that realm was statistically significant. Significance levels: *p < 0.05, **p < 0.01, ***p < 0.001

Fig. 4

Percentage of forest cover lost within: a strictly protected areas (I–IV), b multiple-use areas (V–VI), and c areas with no IUCN category in each realm (during the years 2000–2012). Boxplots are based on results of the generalized boosted models. An asterisk indicates that the comparison to strictly protected areas in that realm was statistically significant. Significance levels: *p < 0.05, **p < 0.01, ***p < 0.001

Strictly protected areas vs. multiple-use areas

When strictly protected areas were defined as Categories I–IV, the differences in the human footprint between strictly protected and multiple-use areas were always ≤ 0.3 (mean = 0.2; Fig. 3) and not statistically significant (Table 3). In fact, in three out of the six realms (the Afrotropical, the Nearctic, and the Palearctic) the human footprint had increased more within strictly protected areas (Table 3). By contrast, when the same areas were compared using forest cover, in all but one of the realms (the Australasian), the loss in forest cover was higher within multiple-use areas (Fig. 4). However, the differences remained minor, i.e., < 4% (mean = 1.3%; Table 3) and were statistically significant in only two of the realms: the Indomalayan and the Neotropical (2.1% and 0.8% respectively; Table 3).

When strictly protected areas included only Categories I-II, the increases in human footprint were higher within multiple-use areas in all but one of the realms (the Afrotropical; Table 3). However, as before, the differences were small (mean = 0.3; Table 3) and were statistically significant for only three of the realms: the Australasian (0.4), the Indomalayan (0.8), and the Neotropical (0.3). Likewise, the loss in forest cover was higher within multiple-use areas in four of the six realms (Table 3), but statistically significant in only three of those: the Indomalayan (1.8%), the Neotropical (0.8%), and the Palearctic (0.9%). Overall, the differences in the percentage of forest cover loss between these two types of protected areas (i.e., Categories I-II vs. Categories III-VI) were always < 2% (mean = 1.5%) across all realms. The only exception was the Australasian realm; there, forest cover loss was lower within multiple-use areas (−4.3%), showing the opposite pattern to what was found using the human footprint index (Table 3).

Strictly protected areas vs. areas with no IUCN category

When strictly protected areas were compared to areas with no IUCN category, the increases in human footprint were consistently higher within the latter type, regardless of how strictly protected areas were defined. However, as was the case with multiple-use areas, the differences between strictly protected areas and areas with no IUCN category were for the most part minor, i.e., ≤ 0.4 (Fig. 3), and not statistically significant (Table 3). There were two exceptions, though: (a) in the Australasian realm, the increase in human footprint within areas with no IUCN category was substantially higher (i.e., by 1.8), regardless of how strictly protected areas were defined; and (b) in the Indomalayan realm, the corresponding increase was also considerably higher (by 1.4), but only when strictly protected areas included Categories I-II (and not Categories III-IV).

A somewhat different pattern was found when the loss in forest cover was used to compare strictly protected areas to areas with no IUCN category (Table 3). Although in some of the realms the loss was higher within the latter group (e.g., in the Afrotropical and the Neotropical realms), in some other realms the loss was actually higher within strictly protected areas (e.g., in the Australasian). However, the differences were always ≤ 6.2% (mean = 2.8%).

Discussion

Based on the above results, it can be concluded that strictly protected areas are not necessarily more effective in reducing human pressure, as is often assumed (Locke and Dearden 2005; Leroux et al. 2010). In fact, the relative effectiveness of the protected areas varies extensively between and within geographic regions, and also depends on the index used to measure human pressure (Figs. 3 and 4). This pattern was evident not only in our review of the literature but also in our own standardized global analysis (Table 3), which was based on multiple regions and indices of human pressure and a large number of protected areas (> 19 000).

Possible reasons for the dissimilarities reported in the literature

Many of the dissimilarities reported in the literature are likely due to differences in the methods used. For instance, the fact that researchers group protected areas into strictly protected and multiple-use areas using different methods is undoubtedly influencing the results. Ideally, the IUCN categories should be evaluated separately, because each possesses unique characteristics, e.g., in terms of its permitted levels of human presence and management objectives (Tables 1 and S2) (Dudley et al. 2010; Leberger et al. 2020). However, in many studies, like ours, in which the analytical methods used require larger sample sizes than are currently available, such independent and detailed comparisons will not always be possible. Hence, grouping protected areas into meaningful types will often be necessary, and it appears to be a common practice in the literature (Table S1). However, researchers must be aware that the method they choose to group protected areas will influence their findings; hence, our recommendation is to use more than one method, where necessary and applicable, in order to evaluate the robustness of the results.
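As a minimal sketch of this recommendation, the two groupings compared in our Results (strictly protected defined as Categories I-IV or as Categories I-II) can be encoded side by side, so that the same analysis is run under each. The names `STRICT_SCHEMES`, `classify`, and `scheme_sensitive` are hypothetical and chosen for illustration only; this is not the code used in the analysis.

```python
# Category labels follow Table 1; None marks an area with no IUCN category.
ALL_CATEGORIES = ("Ia", "Ib", "II", "III", "IV", "V", "VI")

# Two alternative definitions of "strictly protected" (hypothetical names).
STRICT_SCHEMES = {
    "I-IV": {"Ia", "Ib", "II", "III", "IV"},
    "I-II": {"Ia", "Ib", "II"},
}


def classify(category, scheme):
    """Label one protected area under the chosen grouping scheme."""
    if category is None:
        return "no category"
    if category not in ALL_CATEGORIES:
        raise ValueError(f"unknown IUCN category: {category!r}")
    if category in STRICT_SCHEMES[scheme]:
        return "strict"
    return "multiple-use"


def scheme_sensitive(categories):
    """Return the categories whose label differs between the two schemes."""
    return [
        c for c in categories
        if classify(c, "I-IV") != classify(c, "I-II")
    ]


# Categories III and IV are the ones that switch groups.
switching = scheme_sensitive(ALL_CATEGORIES)
```

Only Categories III and IV change label between the two schemes, so any result that differs between the two runs can be traced to how those categories are treated.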

Another dissimilarity in the methods that is likely affecting the results concerns the statistical methods employed to compare the various types of protected areas. While it has been repeatedly demonstrated that the effectiveness of the protected areas is influenced by multiple confounding factors (Joppa and Pfaff 2009)—and hence better assessed using a quasi-experimental approach (Geldmann et al. 2019; Schleicher et al. 2019b)—only some of the reviewed studies have used such methods, and only rarely to compare the different types of protected areas to each other. We recommend that whenever researchers are interested in making inferences regarding the relative effectiveness of the protected areas, they use analytical approaches that take into account the inherent differences between the IUCN categories (Tables 1 and S2) (Dudley 2008).

Differences revealed by the quasi-experimental analyses

Not all the dissimilarities in the literature, however, are due to methodological issues; some likely reflect real differences in the effectiveness of the protected areas across the various geographic regions (Leberger et al. 2020). Our own standardized global analysis showed that the relative effectiveness of the protected areas varied considerably across the six realms (Table 3). To a large extent, this was because the overall levels of human pressure also varied across the realms (Geldmann et al. 2019; Anderson and Mammides 2020b). In realms in which the human pressure had not increased substantially (Figs. 2 and 3), the differences between the IUCN categories were, as expected, low. For instance, in the Palearctic realm, the increases in the human footprint within the protected areas were overall low (and even negative in many cases; Fig. 2); hence, it was not surprising that in that particular realm there were no substantial differences between strictly protected areas and multiple-use areas (Table 3). In contrast, in the Neotropical and the Indomalayan realms, where natural systems have been under increasing human pressure (Jones et al. 2018; Geldmann et al. 2019), the differences between the protected areas were larger (Table 3); in those realms, strictly protected areas had on average lower rates of deforestation (Fig. 4) and smaller increases in human footprint (Fig. 3). This was especially true when strictly protected areas included only Categories I and II (and not Categories III-IV). Considering that these realms harbor some of the world’s most biodiverse areas (Olson et al. 2001), it is worth investigating further the possible reasons behind these patterns and why the effectiveness of the protected areas in those realms might differ.

Interestingly, we did not find the same pattern for the protected areas in the Afrotropical realm, in which the human pressure has also been increasing overall (Figs. 2 and 3). In fact, in the Afrotropical realm, the human footprint had increased less within multiple-use protected areas (although the difference was not statistically significant regardless of how the areas were grouped; Table 3). A similar finding was also reported by Leberger et al. (2019), but only for West Africa (and not necessarily for the rest of the continent). It should be noted, though, that the results for this particular realm (and for Africa in general) must be interpreted cautiously because > 84% of the protected areas in that region have no IUCN category (Table 2; Fig. 1). Consequently, the sample size available for comparing the protected areas in that region is particularly small. In general, the uneven distribution of the IUCN categories between and within the various regions (Table 2; Fig. 1) is an important limitation that needs to be considered carefully whenever researchers incorporate the IUCN categories into their analyses. Moreover, we caution against pooling protected areas at the global level and making inferences about the categories based on that sample, because the results will be largely driven by the dissimilarities between the regions rather than by the categories themselves. For example, 60% of the protected areas in Category V in our dataset were found in the Nearctic and Palearctic realms (Table 2), in which the human pressure had increased on average the least (Figs. 2 and 3).

An interesting pattern in our results was that in some of the realms (e.g., the Indomalayan, the Neotropical, and the Palearctic) a statistically significant difference between strictly protected areas and the other two types of protected areas was more likely to be found when Categories III and IV were classified as multiple-use areas (along with Categories V and VI) rather than as strictly protected (Table 3). This pattern suggests that perhaps the largest differences in the relative effectiveness of the protected areas lie between Categories I-II and the rest of the categories, rather than between Categories V–VI and the rest, on which much of the debate regarding the protected areas has focused (Locke and Dearden 2005; Dudley et al. 2010; Shafer 2015).

Variations within each IUCN category

It must be recognized, though, that even protected areas that belong to the same IUCN category are likely to differ noticeably within and between regions (Phillips 2007). Countries across the globe have their own definitions of protected areas and management objectives, which do not always match those specified by the IUCN (Table 1) (Dudley 2008; Muñoz and Hausner 2013). For example, Australia alone has more than six types of protected areas, ranging from “Aboriginal Areas” to “Flora Reserves” and “State Conservation Areas”. In the European Union, the majority of the protected areas, called Natura 2000 sites (> 27 800), are designated under a regional system, which classifies areas into “Sites of Community Importance” and “Special Protection Areas” (European Commission 2016); the management objectives (and the restrictions) associated with those two types of protected areas are not always in line with those set by the IUCN (Muñoz and Hausner 2013). In China, many of the protected areas are managed under a zoning system: while human visitation and use are strictly prohibited within the core zone of the protected areas, multiple types of human uses are permitted within their experimental and buffer zones. Consequently, different zones within the same protected area could represent different IUCN categories (Dudley 2008).

On a related note, it is also important to recognize that the IUCN classification system is for the most part voluntary. At the moment, there is no mechanism in place to verify that the protected areas meet the conditions of the IUCN category to which they are assigned (Dudley 2008; Leroux et al. 2010). To give an example, an assessment of the protected areas in Madagascar showed that many of those listed in Category V do not in fact meet the conditions of that category, mainly because of higher levels of human presence (Gardner 2011). Given these discrepancies, it is important that any analyses or policy recommendations centered around the IUCN categories consider and acknowledge the potential mismatch between the protected areas and their assigned IUCN category (Gardner 2011; Muñoz and Hausner 2013). Moreover, we would argue that the current IUCN classification system would be more useful and effective if there were a process in place to verify the category of each protected area (Dudley 2008).

There are two caveats to consider when interpreting the results of our study. First, our literature review was based only on peer-reviewed studies that were published in English in international journals. That said, we have no reason to believe that results published in other languages in regional journals would have led to different conclusions; yet, this is something to be confirmed in future studies. Second, although the two indices we used in our analysis capture many of the human pressures exerted on the protected areas worldwide, they do not capture a few others, which also affect biodiversity. For example, overexploitation and invasive species are well-known threats within protected areas worldwide (Schulze et al. 2018). Unfortunately, it is not possible to know to what extent those pressures vary across the IUCN categories; it may be, for instance, that strictly protected areas are more effective in preventing the overexploitation of natural resources compared to multiple-use areas. This possibility, i.e., that other human pressures could lead to different conclusions regarding the relative effectiveness of the protected areas, must be researched further in the future.

Conclusions

Although the IUCN categories were originally devised as an international tool “to help communications and reporting on protected areas” (Bishop et al. 2004), they are being increasingly used to design and implement conservation policies (Sheppard 2008; Dudley et al. 2010). Considering the growing importance of the categories, it is crucial that we understand whether their relative effectiveness differs. It is often assumed that protected areas that permit higher levels of human presence, particularly those in Categories V and VI, are less effective at curbing human pressure (Locke and Dearden 2005). This assumption has fueled a long and as yet unresolved debate regarding the conservation value of those areas (Shafer 2015, 2020). Our findings, based on our review of the literature and the global analysis, suggest that there is no robust evidence to support this assumption; we found no strong relationship between the effectiveness of the protected areas and their assigned IUCN category. On the contrary, the differences between the various types of protected areas were for the most part small and not statistically significant. Although it is true that the effectiveness of the protected areas worldwide varies (Geldmann et al. 2019; Anderson and Mammides 2020b), other factors, besides their assigned IUCN category, are likely to be responsible for this pattern (Mammides 2020a). For example, previous studies have suggested that socio-economic factors such as human population densities, extent of agriculture (Mammides 2020a), financial resources available (Watson et al. 2014; Coad et al. 2019) and management efficacy (Schleicher et al. 2019a) are more important in determining the effectiveness of the protected areas. Stakeholders interested in improving protected areas should perhaps focus on those factors rather than the categories themselves.

In closing, our finding that multiple-use areas are not necessarily less effective than strictly protected areas has important policy implications: it suggests that governments worldwide could still achieve many of their conservation goals and obligations (Chandra and Idrisova 2011) without necessarily prohibiting all human activities within protected areas, and hence without all the associated social impacts that the more restrictive conservation approaches can sometimes have on the local communities (especially in the less developed regions, in which local people depend more on natural resources for their subsistence and other needs; West et al. 2006; West and Brockington 2006). Although it is true that strictly protected areas will still be necessary in many cases in order to successfully protect biodiversity (Dudley 2008), in other cases, multiple-use areas could be established effectively without compromising conservation efforts (Mallarach et al. 2008).