Introduction

The nonprofit sector has been increasingly acknowledged by scholars and practitioners as a vital and indispensable component of a nation’s economic and social affairs. The nonprofit community serves fundamental social needs, such as caring for the most vulnerable people, educating youth and adults, organizing for social change, and providing spiritual fulfillment. In addition, the nonprofit sector is often the source of innovative solutions to pressing public problems and a major player in how public policy is developed and implemented. Over the past several decades, the nonprofit sector has grown in size, sophistication, and impact in various countries all over the world, including Australia (Lyons 1993), Brazil (Marchesini da Costa 2016), Japan (Kanaya et al. 2015), the Netherlands (Burger and Veldheer 2001), South Korea (Kim and Kim 2015), Spain (Marcuello 1998), and the USA (McKeever 2015).

Despite steady development, the growth pattern of the nonprofit sector and the distribution of nonprofit activities differ significantly across communities, localities, and countries. For example, in the USA, the northeastern part of the country has higher nonprofit density, while the South and the Rocky Mountain areas have a relatively smaller nonprofit sector (Liu 2016). Internationally, nonprofit sector employment (in terms of paid and volunteer workers as a percentage of total national workforce) across countries also varies from a low of 0.9% to a high of 12.7% in Israel, with countries such as Brazil, Kyrgyzstan, and the Czech Republic grouped toward the low end of the spectrum and Australia, Belgium, and New Zealand toward the higher end (Salamon et al. 2013). Indeed, this variation of nonprofit sector size across geographic locations has been documented by many studies, including Bielefeld and Murdoch’s (2004) study of education and human service nonprofits in U.S. metropolitan areas, Grønbjerg and Paarlberg’s (2001) study of the Indiana nonprofit sector, James’s (1987) international comparison of the educational nonprofit sector, Kim and Kim’s (2016b) examination of OECD countries, Marchesini da Costa’s (2016) study of nonprofits across Brazilian municipalities, Marcuello’s (1998) study of nonprofits in Catalonia, Spain, and Okten and Osili’s (2004) study of community organizations in Indonesia, to name just a few.

To account for the variation in nonprofit activities and growth pattern across geographic locations, over the past several decades a number of theories on nonprofit formation and behaviors have been proposed from diverse disciplines and perspectives (e.g., Ben-Ner and Van Hoomissen 1991; DiMaggio and Anheier 1990; Hansmann 1980; Lecy and Van Slyke 2013; Rose-Ackerman 1996; Salamon 1987; Salamon and Anheier 1998; Steinberg 2006; Weisbrod 1988). This body of theories greatly contributes to our understanding of the nature and roles of nonprofits in society. Among them, one of the most influential theories is the government failure theory developed by the economist Weisbrod (1975) to explain why nonprofits exist in a mixed-sector economy (Footnote 1). The basic model of the theory, the demand heterogeneity hypothesis, argues that in heterogeneous communities where citizens’ tastes for public goods are more diverse than what median voters prefer, there will be more nonprofits established to provide public goods to satisfy the demand unmet by government provision. The level of nonprofit activities (hereafter referred to as “nonprofit sector size”) in a geographic location is thus determined by the population heterogeneity of the location. The more diverse the population in a community is, the larger the nonprofit sector we could expect.

Since its inception, Weisbrod’s demand heterogeneity hypothesis has been widely tested and extended (e.g., Corbin 1999; DiMaggio and Anheier 1990; Kingma 2003; Lecy and Van Slyke 2013; Steinberg 2006). Interestingly, this body of literature reaches mixed conclusions about the relationship between population heterogeneity and nonprofit sector size. Although some studies find the expected positive association between the two variables (e.g., Corbin 1999; James 1987; Joassart-Marcelli 2013; Lu 2016; Matsunaga et al. 2010; Wolch and Geiger 1983), many other studies report null or even negative association (e.g., Abzug and Turnheim 1998; Grønbjerg and Paarlberg 2001; Kim and Kim 2016a; Marchesini da Costa 2016; Marcuello 1998; Salamon and Anheier 1998; Van Puyvelde and Brown 2016). These inconsistent findings, although they provide a rich examination of the demand heterogeneity hypothesis from different lenses, cause difficulty in providing a coherent assessment of the demand heterogeneity hypothesis and a consistent knowledge base to guide future nonprofit studies. There is a scholarly need to make sense of this body of existing studies and draw a generalized conclusion from competing findings.

The present study employs a meta-analysis technique to aggregate existing quantitative tests of the demand heterogeneity hypothesis. Through a quantitative systematic review of 37 existing studies with a total of 491 effect sizes, we find a significant and positive association (a correlation coefficient r between .020 and .147) between population heterogeneity and nonprofit sector size in a geographic location. Population heterogeneity tends to be positively related to nonprofit formation and growth, which is basically in line with the demand heterogeneity hypothesis, but the magnitude of the relationship seems to be quite small. Further, this relationship seems to be generalizable across countries. On the research design side, the demand heterogeneity hypothesis is supported by both within-country and cross-country data and by different measurements of nonprofit sector size. In addition, some dimensions of population heterogeneity (including age, education, ethnicity, language, and religion) seem to support the demand heterogeneity hypothesis better than other dimensions (including employment status, gender, household location, income, and origin). Overall, although the demand heterogeneity hypothesis holds empirically, its explanatory power in predicting nonprofit sector size and growth might be limited. Future studies might need to exercise caution when applying the demand heterogeneity hypothesis to explain the size variation of the nonprofit sector.

The remaining sections are arranged as follows. The second section reviews Weisbrod’s government failure theory and its demand heterogeneity hypothesis in particular. Next, we detail the procedures of the meta-analysis in section three. Section four presents the results of the meta-analysis. Finally, we conclude the paper in section five with a discussion of the implications of the findings as well as the limitations of the meta-analysis and existing studies.

Weisbrod’s Demand Heterogeneity Hypothesis Revisited

Weisbrod’s (1975, 1977a, 1986, 1988) government failure theory has been widely acknowledged as a cornerstone in the literature on the origins of nonprofit organizations. The key question that Weisbrod attempted to answer is why nonprofits exist in market economies. Holding an economic lens, Weisbrod started with the notion of government as a response to market failure in providing public goods because of problems such as free riders. He explained that government can also fail to provide sufficient public goods to meet citizen needs in all circumstances, largely because of the tension between demand heterogeneity and median voters. As a society grows more and more demographically diverse, the diversified population would ask for diverse public goods, in terms of quality, style, level, etc. As such, citizens would expect government to be more responsive to the needs of these diverse groups and ideally tailor public goods to their diversified demands.

However, this request creates a dilemma for government. On the one hand, government cannot respond to the preferences of all the constituent groups in local communities. Often, these diversified goods are only desired in accordance with local preferences and consumed in local communities. Unless a particular set of collective goods is demanded by a sizable political constituency, these local collective demands might not be large enough to be addressed by government intervention as a public policy issue. The government’s bureaucratic process and accountability environment would also limit government response to citizen needs (Wilson 2000). On the other hand, probably more fundamentally, the quality and quantity of government provision in a democratic society is decided by a majority of voters through the political voting process. As a result, the majority tends to choose the level of government provision that median voters prefer. Public officials and political leaders, to maximize their chances of re-election, will ensure that public service provision is at the level that most voters prefer, leaving other service needs unsatisfied in the public choice process (Buchanan and Tullock 1962). In this way, “when demand is diverse, though, whatever quantities and qualities of services government provides will oversatisfy some people and undersatisfy others” (Weisbrod 1988, p. 25), resulting in government failure in meeting the needs of all citizens.

Under this circumstance, Weisbrod argued that nonprofit organizations, supported by charitable donations and voluntary activities from individuals who share the same local public interests, can function as “extragovernmental providers of collective-consumption goods” (Weisbrod 1977a, p. 59) to meet the residual and particularistic service needs (Footnote 2). Indeed, several defining features of nonprofits might better position them to act as a supplement to government provision. As grassroots community-based organizations, nonprofits are embedded in local communities to varying degrees and thus have a better understanding of local needs. Additionally, compared to government, nonprofit operations are subject to less bureaucratic control and political oversight and thus could be more flexible and efficient in responding to social and community issues (Douglas 1987). In sum, nonprofits seem to be more readily able to tailor goods to diversified local needs. In this way, the existence of nonprofits can be justified as “an alternative to the private-sector provision of the private-good substitutes for collective goods” (Weisbrod 1977a, pp. 59–60).

Since Weisbrod’s seminal work in the 1970s, the government failure theory has been considered by scholars as a classic on the economics of nonprofit organizations. A major test of this theory is a test of the demand heterogeneity hypothesis (Kingma 2003; Matsunaga et al. 2010) (Footnote 3). As Matsunaga et al. (2010, p. 182) claimed, in Weisbrod’s theory “the core source of government failure is demand heterogeneity, and it is the most influential factor determining the size of the nonprofit sector.” The hypothesis, in Weisbrod’s (1988, p. 27) words, states that “if government responds to the demands of the majority and the nonprofit sector responds to the demands of the undersatisfied, then the greater the diversity of demand the larger the size of the nonprofit sector will be, other things being equal.” Given the difficulty in measuring demand heterogeneity, Weisbrod further suggested that demand heterogeneity can be proxied by population heterogeneity in terms of income, education, age, ethnicity, etc. Weisbrod (1986, p. 37) noted that “the degree of heterogeneity in quantities demanded is positively correlated with the degree of heterogeneity in preferences, and this, in turn, may be proxied by the degree of heterogeneity of the population in the given governmental unit”. In this way, the demand heterogeneity hypothesis becomes: the size of the nonprofit sector in a geographic region is positively related to the population heterogeneity of that region.

Indeed, since its inception, the demand heterogeneity hypothesis has been extensively examined and extended (e.g., Corbin 1999; DiMaggio and Anheier 1990; Kingma 2003; James 1987; Lecy and Van Slyke 2013; Liu 2016; Matsunaga and Yamauchi 2004; Steinberg 2006). A major critique of Weisbrod’s theory is that it focuses on the demand side of nonprofit formation but neglects the supply side (Ben-Ner and Van Hoomissen 1991; DiMaggio and Anheier 1990; James 1987). Indeed, the demand heterogeneity hypothesis seeks to explain the existence of nonprofits as a function of heterogeneous demand brought by population heterogeneity, but it neglects consideration of supply-side stakeholders or entrepreneurs such as consumers, sponsors, and donors who ultimately establish or support nonprofits to respond to residual demands. On its own, the existence of demand for nonprofits does not necessarily guarantee a supply. Rather, as Ben‐Ner and Van Hoomissen (1991, p. 540) forcefully argued, “there must be a confluence between the demand for the organizational form and the ability to provide it in order for a nonprofit organization to be formed.” In this way, although nonprofit formation can be demanded by population heterogeneity, nonprofits will not materialize until stakeholders exist to generate the supply. Therefore, without fully exploring the supply-side effect of population heterogeneity on nonprofit formation, the explanatory power of the demand heterogeneity hypothesis could be greatly undermined.

In fact, a growing body of recent literature documents that population heterogeneity can be detrimental to civic engagement and collective actions. In demographically heterogeneous communities, people display a lower level of trust and social interactions (Alesina and La Ferrara 2002; Costa and Kahn 2003; Putnam 2007), get involved in fewer social and civic activities (Alesina and La Ferrara 2000; Anderson and Paskeviciute 2006; Costa and Kahn 2003), and contribute less to community organizations and public fundraising (Andreoni et al. 2016; Fong and Luttmer 2009; Miguel and Gugerty 2005; Okten and Osili 2004). There are two possible reasons for this negative effect. First, people are inclined to trust and associate with others who are demographically similar to themselves because of shared norms, interests, and empathy (Alesina and La Ferrara 2000, 2002; Glaeser et al. 2000). Second, diverse groups have divergent preferences or agendas, which makes it extremely difficult to bridge competing interests and take collective actions toward a shared goal (Habyarimana et al. 2007; Miguel and Gugerty 2005). Alesina and La Ferrara (2000) employed U.S. locality data and found that population heterogeneity (in terms of income and ethnicity) significantly reduces participation in community activities such as religious, civic, educational, and service groups. Putnam (2007) studied social capital across U.S. communities and concluded that in communities with greater diversity, people are less likely to trust their neighbors, collaborate on community projects, and donate their time and money. In sum, population heterogeneity might limit the supply of social capital and other voluntary resources that are fundamental antecedents to nonprofit formation and development (Ben-Ner and Van Hoomissen 1991; Saxton and Benson 2005).

Therefore, in the consideration of the demand heterogeneity hypothesis, when population heterogeneity leads to diversified needs that call for more nonprofits to fill the gap, it simultaneously undermines the social capital and charitable activities that sponsor nonprofit formation. In this way, the impact of population heterogeneity on nonprofit formation and growth depends on the interaction between its demand-side and supply-side effects (see Fig. 1). If the demand-side and supply-side effects balance each other, the relationship between population heterogeneity and nonprofit sector size might become ambiguous.

Fig. 1 The impact of population heterogeneity on nonprofit formation and growth

Indeed, existing empirical tests of the demand heterogeneity hypothesis reach a mixed conclusion concerning the relationship between population heterogeneity and nonprofit sector size. Some studies find a positive relationship between the two and thus endorse the demand heterogeneity hypothesis (e.g., Corbin 1999; James 1987; Joassart-Marcelli 2013; Lu 2016; Matsunaga et al. 2010; Wolch and Geiger 1983). However, some studies report that population heterogeneity has almost no significant impact on nonprofit sector size (e.g., Abzug and Turnheim 1998; Garrow and Garrow 2014; Salamon and Anheier 1998; Salamon et al. 2000). Still other studies conclude that population heterogeneity actually has a negative association with nonprofit existence and growth (e.g., Grønbjerg and Paarlberg 2001; Kanaya et al. 2015; Kim and Kim 2016a; Marchesini da Costa 2016; Marcuello 1998; Van Puyvelde and Brown 2016), which is the opposite of what the demand heterogeneity hypothesis predicts. Considering all the empirical evidence, there seems to be no consensus as to whether population heterogeneity matters to nonprofit sector size and, if so, in what way.

In sum, although Weisbrod’s demand heterogeneity hypothesis has been widely considered as a classic model on nonprofit formation and behaviors, inconsistent findings from existing empirical studies question the explanatory power of the hypothesis and present a significant barrier to forming a coherent knowledge base to guide future nonprofit research and practice.

Method

We explored the relationship between population heterogeneity and nonprofit sector size by employing a meta-analysis technique. Given that the demand heterogeneity hypothesis has been empirically tested extensively over the past four decades, a more critical scholarly need might be not to add additional empirical studies but to take stock of these existing studies and integrate conflicting findings to establish a generalized understanding. This is where meta-analysis could help. Meta-analysis, as Glass (1976, p. 3) coined it, represents “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings.” As a quantitative research synthesis technique, meta-analysis enables researchers to draw statistical information from existing original studies and then aggregate it to form a more accurate conclusion across different studies. In this way, meta-analysis can build bodies of cumulative knowledge and provide a more precise estimation of the underlying relationship between population heterogeneity and nonprofit sector size, which further contributes to the development of government failure theory and its demand heterogeneity hypothesis in particular.

Literature Search

As a systematic review method, meta-analysis starts with a literature search to identify all the existing studies concerning the impact of population heterogeneity on nonprofit sector size in a geographic area. To promote the comprehensiveness of the literature search, we employed three complementary search strategies in Google Scholar to identify as many relevant studies as possible (Reed and Baxter 2009) (Footnote 4). First, given that studies testing Weisbrod’s demand heterogeneity hypothesis have to cite Weisbrod’s seminal work, we searched all the studies that cite Weisbrod’s original paper (Weisbrod 1975), its subsequent reprints (Weisbrod 1977a, 1986), and books (Weisbrod 1977b, 1988) (Footnote 5). Second, we undertook an ancestor search to examine the references of the studies we found to seek other relevant studies. Third, we conducted a descendant search to identify later studies that cite the studies identified by the previous two search strategies. We iterated the second and third steps until no new relevant studies could be identified. The entire literature search was concluded on December 16, 2016. In most cases, we manually reviewed the abstracts of all the relevant studies. If an article’s eligibility for inclusion was unclear from its abstract, we performed a full-text review to determine whether the study met our criteria.

Inclusion Criteria

The following inclusion criteria were used to identify primary studies to be included in the present meta-analysis: (1) the study includes at least one quantitative test of the effect of population heterogeneity on nonprofit sector size in a locality, (2) the focal predictor, population heterogeneity, represents one or several dimensions of a population’s demographic diversity (e.g., income, age, ethnicity, gender, and education), (3) the dependent variable, nonprofit sector size, refers to the size of the nonprofit sector in a given geographic area (e.g., organization number, employment size, and expenditure amount), and (4) the study provides sufficient statistical information to calculate effect sizes. After detailed screening, we finally included 37 original studies for the meta-analysis, with a wide variety of nonprofit types, countries, and research designs represented in these studies (see Table 1), which promotes the external validity of the present analysis.

Table 1 Studies included in the meta-analysis

Coding Procedures

Information from these accepted studies was then extracted and coded. In particular, two types of information were included in the synthesis: effect size information and study characteristics (Lipsey 2009). Effect size, the key metric in meta-analysis, refers to the standardized association between the focal predictor and the dependent variable. In the present study, effect size represents the standardized association between population heterogeneity and nonprofit sector size. Given that most existing studies measure both population heterogeneity and nonprofit sector size continuously, we employed a correlation-based (r-based) effect size (transformed into Fisher’s z to correct the small bias associated with correlation coefficient r) to represent the relationship between population heterogeneity and nonprofit sector size in original studies (Borenstein 2009; Ringquist 2013) (Footnote 6).
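As an illustration of the transformation described above, the following minimal Python sketch converts a correlation r into Fisher's z and back. The function names are hypothetical and are not taken from the original analysis. The sampling variance 1/(n − 3) is the standard large-sample approximation for Fisher's z; it is assumed here because the text does not state the variance formula that was used.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transform of a correlation r (requires -1 < r < 1)."""
    return math.atanh(r)


def fisher_z_variance(n: int) -> float:
    """Approximate sampling variance of Fisher's z for sample size n (n > 3).

    Assumption: the standard approximation 1 / (n - 3).
    """
    return 1.0 / (n - 3)


def z_to_r(z: float) -> float:
    """Inverse transform: convert Fisher's z back to a correlation r."""
    return math.tanh(z)
```

For small correlations, such as those typical in this literature, z and r are nearly identical, so the transformation matters mainly for the variance used in weighting.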

When calculating correlation-based effect sizes from original studies, we employed the following coding strategies suggested by Borenstein (2009), Fleiss and Berlin (2009), and Ringquist (2013): (1) When original studies report odds ratios (e.g., Garrow and Garrow 2014), odds-based effect sizes were first coded and then converted into correlation-based effect sizes. (2) When original studies only report parameter estimates and indicate their statistical significance levels using asterisks (e.g., Bielefeld et al. 1997; Bielefeld and Murdoch 2004), t-scores or z-scores were estimated as the values at the symbol levels and then converted into correlation-based effect sizes. In this situation, the effect sizes represent lower bound estimates. (3) When original studies only report that the parameters of interest are not statistically significant (e.g., Peck 2008; Salamon and Anheier 1998), the effect sizes were set as 0. In this situation, the effect sizes again represent lower bound estimates. (4) When original studies report multiple effect sizes, mostly because of different variable measurements, model specifications, or sample restrictions (e.g., Kim 2015; Liu 2016; Van Puyvelde and Brown 2016), all relevant effect sizes were coded to maintain within-study variation. In total, 491 effect sizes were coded from 37 original studies. These effect sizes range from −.6815 in Kim and Kim (2015) to .9450 in James (1987). Within this group of effect sizes, there are 265 positive effects, 32 null effects, and 194 negative effects.
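The coding rules above can be sketched in Python as follows. This is an illustrative sketch, not the conversion code used in the original analysis. It assumes common textbook conversions: r = t / sqrt(t² + df) for test statistics, and, for odds ratios, the logit method (d = ln(OR) · √3 / π) followed by r = d / sqrt(d² + a), with a = 4 for equal group sizes. The critical values use the two-tailed normal approximation, and all function names are hypothetical.

```python
import math

# Two-tailed critical values (normal approximation) for common asterisk levels.
CRITICAL_Z = {0.10: 1.645, 0.05: 1.960, 0.01: 2.576, 0.001: 3.291}


def r_from_t(t: float, df: int) -> float:
    """Convert a t (or z) statistic with df degrees of freedom to r."""
    return t / math.sqrt(t * t + df)


def r_from_odds_ratio(odds_ratio: float, a: float = 4.0) -> float:
    """Rule (1): odds ratio -> standardized difference d -> r.

    Assumption: a = 4 corresponds to equally sized groups.
    """
    d = math.log(odds_ratio) * math.sqrt(3.0) / math.pi
    return d / math.sqrt(d * d + a)


def r_from_asterisks(sign: int, p_threshold: float, df: int) -> float:
    """Rule (2): only a significance level is reported.

    The statistic is set to the critical value at that level, so the
    magnitude of the resulting r is a lower bound.
    """
    return r_from_t(sign * CRITICAL_Z[p_threshold], df)


def r_from_nonsignificant() -> float:
    """Rule (3): a result reported only as nonsignificant is coded as 0."""
    return 0.0
```

For example, `r_from_asterisks(-1, 0.05, 100)` codes a negative coefficient marked significant at the .05 level in a model with 100 degrees of freedom.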

In addition to effect size information, we coded study-level descriptors such as characteristics of nonprofits under study and research design. Table 1 provides an overall picture of the distribution of the study descriptors across all 37 studies, which gives us a sense about the studies included in the meta-analysis.

Results and Findings

Table 2 reports the average effect sizes between population heterogeneity and nonprofit sector size under various scenarios. Before combining individual effect sizes across studies, we need to evaluate effect size heterogeneity to choose between a fixed-effects model and a random-effects model in the estimation of average effect size (Geyskens et al. 2009; Ringquist 2013). First, Hedges’ Q test was performed to test the null hypothesis that the variation among the effect sizes could be accounted for by sampling error alone. Second, we calculated the I² statistic to further gauge the proportion of the variability in effect sizes that cannot be attributed to sampling error. After choosing a fixed-effects or a random-effects model, we estimated average effect sizes under different scenarios (Footnote 7). In the estimation of average effect sizes, three methodological treatments are worth noting. First, meta-analysis does not treat the effect sizes from each original study equally, but it aggregates them with weights that are the inverse of the effect size variance (Ringquist 2013; Shadish and Haddock 2009). Effect sizes from larger-sample studies are thus weighted more heavily because such studies tend to produce estimates that are closer to the population value. In this way, we accounted for the potential influence of sample size on the precision of average effect sizes across studies. Second, given that Fisher’s z is not readily interpretable, average effect size was calculated in Fisher’s z and then converted back to Pearson’s r for reporting and interpretation (Borenstein 2009; Ringquist 2013). Third, we also performed z tests (two-tailed) to check whether average effect sizes are significantly different from zero, in order to assess the significance of the effects. The results are discussed as follows.
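The estimation steps described above can be illustrated with a short, self-contained Python sketch. It is not the authors' code. The function name is hypothetical, and the between-study variance is estimated with the DerSimonian-Laird method, which is an assumption because the text does not name the estimator. When the estimated between-study variance is zero, the random-effects result coincides with the fixed-effects result.

```python
import math


def pool_effect_sizes(z_values, variances):
    """Pool Fisher's z effect sizes with inverse-variance weights.

    Returns Q, I^2 (in percent), tau^2, the pooled mean in Fisher's z with
    its standard error and z statistic, and the mean and 95% CI on the r scale.
    """
    k = len(z_values)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effects mean, needed to compute Q.
    fixed_mean = sum(w * z for w, z in zip(weights, z_values)) / sum_w

    # Q: weighted squared deviations from the fixed-effects mean.
    q = sum(w * (z - fixed_mean) ** 2 for w, z in zip(weights, z_values))
    df = k - 1

    # I^2: share of variability not attributable to sampling error.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird tau^2 (assumed estimator), truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights; with tau2 == 0 they equal the fixed-effects weights.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    mean_z = sum(w * z for w, z in zip(re_weights, z_values)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    lower, upper = mean_z - 1.96 * se, mean_z + 1.96 * se

    return {
        "Q": q,
        "I2": i2,
        "tau2": tau2,
        "mean_z": mean_z,
        "se": se,
        "z_stat": mean_z / se,
        # Back-transform to Pearson's r for reporting.
        "mean_r": math.tanh(mean_z),
        "ci_r": (math.tanh(lower), math.tanh(upper)),
    }
```

In this sketch, Q would be compared against a chi-square distribution with k − 1 degrees of freedom, and the z statistic against the standard normal distribution, to obtain the p-values reported in the text.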

Table 2 Average effect sizes under different scenarios

We first estimated the average effect size for all 491 effect sizes coded from the 37 existing studies. The null hypothesis of the Q test was rejected at the .001 level, implying significant variability in effect sizes that is not attributable to sampling error. In addition, an I² statistic of 97% indicates a high level of heterogeneity across effect sizes (Higgins and Thompson 2002). Based on these findings, a random-effects model is more appropriate to synthesize the effect sizes. The weighted average effect size is .034 [95% CI (.018, .050), z = 4.10, p < .001]. This effect size indicates a significant and positive association between population heterogeneity and nonprofit sector size, but the magnitude of the relationship is quite small (Footnote 8). This finding implies two things regarding the demand heterogeneity hypothesis. First, the positive sign of the average effect size supports the hypothesis that higher degrees of population heterogeneity in a locality could be associated with more nonprofit formation and growth. Second, the very small magnitude of the average effect size questions the robustness of the hypothesis, as it shows that population heterogeneity only has a trivial relationship with nonprofit sector size and there must be other factors shaping nonprofit formation and growth in a more forceful manner. Put together, although the demand heterogeneity hypothesis still holds, its explanatory power in predicting nonprofit sector size seems less strong.

Table 2 also reports average effect sizes under other scenarios to check the moderating effects of different research design components on the test of the demand heterogeneity hypothesis. As before, we began with a choice between a fixed-effects and a random-effects model. In most scenarios, random-effects models were used to estimate average effect sizes because there is significant variability among effect sizes. Two exceptions come from effect sizes related to linguistic and origin heterogeneity, where limited effect size variation was found. In these two cases, fixed-effects models were employed to aggregate effect sizes. Then, the weighted average effect size in each scenario was calculated and is discussed as follows.

First, it could be argued that each country has its unique institutional and cultural environment and is thus less comparable to others (Abzug and Turnheim 1998; Salamon and Anheier 1998). In this line of reasoning, whether the U.S.-based demand heterogeneity hypothesis could be applied to non-U.S. countries is also an open question. In fact, existing tests of the demand heterogeneity hypothesis are dominated by studies on U.S. nonprofits. In our meta-analysis, 369 effect sizes come from studies on U.S. nonprofits, and 122 effect sizes are from studies on non-U.S. nonprofits. We estimated average effect sizes for these two groups of studies separately (Footnote 9). In studies specifically looking at U.S. nonprofits, the weighted average effect size is .028 [95% CI (.009, .048), z = 2.86, p < .01], while in studies including non-U.S. nonprofits, the weighted average effect size is .043 [95% CI (.021, .065), z = 3.84, p < .001]. Both average effect sizes point to a significant and positive association between population heterogeneity and nonprofit sector size, even though such an effect is relatively small. In this way, the demand heterogeneity hypothesis seems to be applicable to not only U.S. but also non-U.S. nonprofits. As such, the hypothesis can be supported with less reference to country-specific environments.

Second, it could also be argued that because countries might not be fully comparable, studies employing cross-country data might compare apples and oranges and thus produce different results. In our sample of 491 effect sizes, 428 effect sizes are from studies focusing on one specific country (within-country studies), while 63 effect sizes are from studies relying on cross-country data such as OECD data and United Nations data (cross-country studies). We compared the average effect sizes from these two groups of studies. In within-country studies, the weighted average effect size is .038 [95% CI (.011, .065), z = 3.17, p < .01], and in cross-country studies, the weighted average effect size is .055 [95% CI (.037, .073), z = 3.53, p < .001]. There seems to be no substantial difference between these two effect sizes: both support the demand heterogeneity hypothesis and their magnitudes are rather small. This finding implies that the demand heterogeneity hypothesis can be supported by both within-country and cross-country data.

Third, we explored whether the use of different measurements of nonprofit sector size would influence the test of the demand heterogeneity hypothesis. Among the 37 studies included in the meta-analysis, three predominant measurements of nonprofit sector size are used, namely organizational density (number of nonprofits within a geographic area), financial size (nonprofit revenue, assets, or expenditures), and employment size (full-time equivalent nonprofit employees). We calculated an average effect size for each group of effect sizes with one specific measurement. For effect sizes with the organizational density measurement, the weighted average effect size is .021 [95% CI (.007, .035), z = 2.98, p < .01]; for effect sizes employing the financial size measurement, it is .022 [95% CI (.003, .041), z = 3.36, p < .01]; for effect sizes relying on the employment size measurement, it is .076 [95% CI (.006, .146), z = 2.13, p < .05]; and for effect sizes representing other measurements, it is .077 [95% CI (.049, .105), z = 5.45, p < .001]. Considering this evidence, how nonprofit sector size is measured seems not to challenge the demand heterogeneity hypothesis, for each of the average effect sizes, though small, is significant and positive. In short, the measurement of nonprofit sector size makes no difference in the support of the hypothesis.

Fourth, we examined the validity of each dimension of population heterogeneity in predicting nonprofit sector size. Although Weisbrod suggested that heterogeneity in terms of age, income, education, religion, ethnicity, or other demographic variables can be used as indicators of population heterogeneity to represent demand heterogeneity, he admitted that “we do not have any strong a priori basis for selecting these particular indices of demand heterogeneity” (Lee and Weisbrod 1977, p. 85). Indeed, existing empirical evidence suggests that not every type of population heterogeneity could predict nonprofit growth equally well (see Kim 2015; Matsunaga et al. 2010 for brief reviews of this issue). We calculated average effect sizes for each group of effect sizes representing one specific dimension of population heterogeneity. The finding is twofold. First, there are five indicators of population heterogeneity (age, education, ethnicity, language, and religion) that support the demand heterogeneity hypothesis well, as the average effect size in each group is significant and positive. Second, another five indicators (employment status, gender, household location, income, and origin) fail to support the demand heterogeneity hypothesis because their average effect sizes are either insignificant or negative, contrary to what the hypothesis predicts (Footnote 10). In this way, the demand heterogeneity hypothesis could be supported when population heterogeneity is measured in terms of age, education, ethnicity, language, and religion.

In sum, the weighted average effect sizes underlying the demand heterogeneity hypothesis reported in Table 2 imply the following: (1) the relationship between population heterogeneity and nonprofit sector size is overall significant and positive; (2) the magnitude of the effect is generally very small; (3) the hypothesis has been supported by both U.S. and non-U.S. data, as well as by within-country and cross-country data; (4) the hypothesis is supported when nonprofit sector size is measured by organizational density, financial size, or employment size; and (5) the hypothesis is supported when population heterogeneity is measured in terms of age, education, ethnicity, language, or religion. To conclude, the demand heterogeneity hypothesis is supported by existing empirical evidence, but its explanatory power might be relatively weak.

Discussion and Conclusion

The nonprofit sector plays a vital and unique role in the good governance of a society by addressing various social, economic, and political needs that neither business nor government is prepared to fulfill. Over the past several decades, we have witnessed substantial growth of the nonprofit sector in various countries around the world, in both size and impact. However, despite this unprecedented growth, the size of the nonprofit sector and its growth patterns vary dramatically across localities. What accounts for the size variation of the nonprofit sector, or to frame it differently, why nonprofits exist, has attracted much scholarly attention over the years. Plenty of theories have been developed to address this issue. One dominant theory in this body of literature is the government failure theory proposed by Weisbrod. A key model of the theory suggests that nonprofits exist to fill the service gap left by government provision, largely because of the conflict between diversified needs due to population heterogeneity and government underprovision due to majority voting. As such, there should be a positive relationship between a locality’s degree of population heterogeneity and its nonprofit sector size. This demand heterogeneity hypothesis has received wide recognition as a classic work in nonprofit studies and has been tested extensively over decades.

However, recent studies of the size variation of the nonprofit sector reach inconsistent conclusions concerning the relationship between population heterogeneity and nonprofit sector size. Although some studies do find the expected positive association between the two, many other studies in this line of research show either a null or a negative association and consequently question the explanatory power of Weisbrod’s demand heterogeneity hypothesis. Indeed, these competing findings fail to provide a coherent assessment of the hypothesis and thus pose a significant barrier for future nonprofit studies. We argue in the present study that a pressing scholarly need is to take stock of these existing studies and draw a generalized conclusion from their competing findings.

Toward this goal, we conducted a meta-analysis to aggregate existing literature testing the demand heterogeneity hypothesis. Through a systematic review of 37 existing empirical studies with 491 effect sizes, we find an average correlation between population heterogeneity and nonprofit sector size ranging from .020 to .147. This finding of average effect size has two theoretical implications. First, we could confirm the existence of a positive association between a locality’s degree of population heterogeneity and its nonprofit sector size. Therefore, the size of the nonprofit sector will be larger where the degree of population heterogeneity is higher. In this way, the present study provides meta-analytical support for Weisbrod’s demand heterogeneity hypothesis. Indeed, although the hypothesis has been criticized for ignoring the supply-side effect of population heterogeneity and there is empirical evidence showing that population heterogeneity can be detrimental to nonprofit formation, the positive net effect we found here implies that the positive demand-side effect of population heterogeneity on nonprofit formation seems to outweigh its negative supply-side effect. Overall, population heterogeneity would still push nonprofits to take root and grow.

Second, the magnitude of the association indicates that population heterogeneity might have only a limited relationship with nonprofit sector size, which calls the robustness of the hypothesis into question. Admittedly, meta-analysis cannot delve into causal factors, and thus we are not able to determine the causality between population heterogeneity and nonprofit sector size in this study. However, the weak association between the two at least implies that population heterogeneity might not be a strong factor shaping nonprofit sector size; other, more important factors are likely to drive nonprofits’ establishment and growth. In this way, although the meta-analytical result generally supports Weisbrod’s line of reasoning, the power of this hypothesis in explaining and predicting nonprofit sector size might be limited. Future studies need to be cautious when applying the government failure theory to explore the size variation of the nonprofit sector by locality and should give more attention to other theories of nonprofit formation and behavior.

Further, we explored a number of methodological factors to examine whether they would influence empirical support for the demand heterogeneity hypothesis. First, the hypothesis seems to be applicable to both U.S. and non-U.S. nonprofits and thus can be generalized across countries. Second, the hypothesis is supported by both within- and cross-country data, and thus data type does not seem to make a difference. Third, different measurements of nonprofit sector size do not undermine the validity of the hypothesis. As such, organizational density, financial size, or employment size can be used to gauge nonprofit sector size. Fourth, some indicators of population heterogeneity (age, education, ethnicity, language, and religion) seem to support the hypothesis better than other indicators (employment status, gender, household location, income, and origin). Future studies might rely more on the first five proxies. In sum, these findings should inform future applications of the demand heterogeneity hypothesis in the exploration of nonprofit sector size variation.

Despite these contributions, the current research is subject to some limitations that could be addressed by future studies. On the theoretical side, first, existing studies are dominated by studies of U.S. nonprofits. Although the present analysis finds the average effect size from U.S. studies to be roughly the same as that from non-U.S. studies as a whole, we cannot guarantee the demand heterogeneity hypothesis’ applicability to every non-U.S. country (see Footnote 11). In this sense, more studies on non-U.S. countries and cross-national comparisons could contribute to our understanding of the topic. Until then, generalization to other non-U.S. countries should still proceed on a case-by-case basis, considering each country’s specific environment. Second, the present analysis explores the association between population heterogeneity and nonprofit sector size without including government provision as an intervening variable. A more thorough test of Weisbrod’s model might take the mediating effect of government social spending into consideration (see Footnote 12). Third, existing studies generally use a single demographic dimension (such as age, race, or education) to measure population heterogeneity, even though it is a multidimensional construct. Although we find that some dimensions support the demand heterogeneity hypothesis better than others, future studies might consider developing a more comprehensive index to capture the multidimensional nature of population heterogeneity (e.g., Lu 2016). Further, observed population heterogeneity is usually used as a proxy for unobserved demand heterogeneity, but the validity of this measurement remains uncertain.

On the methodological side, it should first be noted that any meta-analysis is limited by the availability of empirical studies. Therefore, the current meta-analysis only adjusted for the effect of sample size in the aggregation of average effect sizes using a traditional meta-analysis method (Ringquist 2013; Shadish and Haddock 2009). Although the effects of other statistical and methodological artifacts such as measurement reliability and variable ranges have been identified (Schmidt and Hunter 2015), existing studies generally do not provide sufficient information in these regards, which makes it impossible for us to make additional adjustments. Second, existing studies rely heavily on cross-sectional data, and thus the limitations of cross-sectional analysis apply to the meta-analysis here (see Footnote 13). Clearly, without longitudinal analysis of the relationship between population heterogeneity and nonprofit sector size, we cannot account for the interaction between the two variables over time. Third, many existing studies focus on the entire nonprofit sector or aggregate several subsectors together without accounting for subsector variation. Some studies do demonstrate the existence of subsector variation due to different policy and resource environments (e.g., Bielefeld et al. 1997; Grønbjerg and Paarlberg 2001). In this way, within-subsector analysis and cross-subsector comparison deserve more scholarly attention.