Introduction

Since the 1980s, there has been a rapid growth of research addressing the relationship between religion/spirituality (R/S) and health outcomes. Regardless of whether R/S is assessed as religious attendance, religiosity, or belief in god(s), greater levels of R/S are associated with higher levels of life satisfaction (Garssen et al., 2021; Habib et al., 2018; Kortt et al., 2015), better self-rated health (Barkan & Greenwood, 2003; Ellison, 2001; Mukerjee & Venugopal, 2018; Musick, 1996; Musick et al., 2004; cf. Cragun et al., 2016; Walker et al., 2021), improved mental health (Schieman et al., 2013; cf. Stroope & Baker, 2018), and longevity (Oman, 2018), among other outcomes. This eclectic set of findings has been described as the belief-as-benefit-effect (BABE), which serves as a shorthand for the idea that R/S promotes wellness (Schuurmans-Stekhoven, 2017, 2019, 2020). BABE does not describe a singular finding, but instead a pattern of findings within the broader literature. However, a closer examination of BABE reveals a collection of issues, omissions, and errors that plague much of the existing field. To be clear, this is not a criticism of any one study addressing BABE, but a criticism of practices endemic to the BABE field.

Disaggregating BABE Across Nonreligious Groups

While BABE research has been criticized for its frequent omission of nonreligious respondents (Brewster et al., 2014; Galen, 2015; Hwang et al., 2011; Weber et al., 2017), this issue has improved over the past decade, particularly with the advent of large datasets (Balazka et al., 2021). Recent research has included nonreligious respondents, which is unambiguously positive, but there has been a substantive failure of this literature to meaningfully engage with subgroups within the nonreligious. There is only a small body of research addressing atheists, and there is a profound shortage of research addressing agnosticism. While atheists or agnostics may be included within general samples of the nonreligious, analyses rarely allow those groups to have their own estimates for wellness. To be fair, there are valid reasons for grouping nonreligious identities together (e.g., sample size restrictions), but the BABE field has largely been reluctant to investigate different ‘flavors’ of nonreligion. Some research will notably group atheists and agnostics together (Barringer & Gay, 2017; Zimpel et al., 2019), which is problematic as they often have different health outcomes (Baker et al., 2018). In a similar vein, researchers will mistake low-R/S activity as being indicative of secularism (Dilmaghani, 2018) or will assume a broad homogeneity amongst all nonreligious respondents (Brunoni et al., 2015; Rudalevičienė et al., 2010).

Even when researchers elect to disaggregate nonreligious groups, a similar failure of disaggregation may appear when discussing the salutary effects of R/S beliefs or behaviours (e.g., religious attendance, prayer, religiosity). In other words, researchers will model religious attendance, prayer, religiosity, etc., but will not investigate whether these relationships vary across R/S groups (e.g., Bergan & McConatha, 2001; Bosco-Ruggiero, 2020; Bernardelli et al., 2020; Lim & Putnam, 2010; Kort et al., 2015). The implicit assumption is that Muslims, atheists, agnostics, Evangelical Christians, etc., report the same relationship between R/S beliefs and behaviours and health outcomes. This analytical assumption is unusual because the grouping variable (R/S identity) is intrinsically connected to the predictor of interest (R/S beliefs and behaviours). To be clear, there are valid reasons to ignore these types of interaction effects; but when the goal of a study is to determine the salutary effects of R/S beliefs and behaviours, this decision becomes problematic for interpretation. Interestingly, when researchers allow for nonreligious groups to report their own unique relationship between R/S variables and health outcomes, these nonreligious minorities tend to report no benefit from R/S beliefs and behaviours (Speed, 2017; Speed & Fowler, 2016; Speed & Hwang, 2017) or will report negative relationships (Speed & Fowler, 2017).

BABE Fails Simple Tests of Validation

The central premise of BABE is, for all intents and purposes, that R/S is health-promoting. Yes, there are variations across race (Assari, 2013; Krause, 2003), region (Stavrova et al., 2013; Stroope & Baker, 2018), sex (Maselko & Kubzansky, 2006), etc., but at its core, higher levels of R/S are presumed to be healthy. Logically, if R/S is positively associated with better health outcomes, then it stands to reason that people who are very high on R/S will be better off than people who are very low on R/S (Farias & Coleman III, 2021). On this note, atheists provide a logical avenue of testing this inference of BABE, as they are among the least religious of all groups (Baker et al., 2009; Speed, 2021). However, researchers have been slow to explicitly test atheists (Brewster et al., 2014; Galen, 2015; Hwang, 2008; Hwang et al., 2011; Zuckerman, 2009), which is unusual as they show strong potential for confirming the BABE hypothesis.

On this note, several studies that have examined atheism and health draw conclusions that are perhaps supportive of BABE. Giannini et al. (2018) found that atheists reported poorer perceived social support and optimism than Buddhists. Wilhelm et al. (2018) showed that German atheists tended to have lower scores on body image relative to veiled Muslims, while having lower BMI. Sawyer and Brewster (2019) found that bereaved atheists reported higher levels of distress relative to religious groups. Finally, Kuentzel et al. (2012) found that atheists were more likely to have a non-suicidal self-injury relative to other religious groups. Simultaneously though, it would be fair to point out that these same studies and other studies (Caldwell-Harris et al., 2011; Moore & Leach, 2016; Sillick et al., 2016) often find no differences between atheists and comparison groups. However, given the small scale of these studies, perhaps the null findings are a product of poor statistical power and should be given less weight in the BABE discussion.

When studies use population-level data or large-scale surveys, the evidence for BABE becomes noticeably more impoverished, particularly within the USA. Zuckerman, Galen, and Pasquale (2016) note that while much of the narrative surrounding secularism and health is negative, a more critical lens does not support this preconception. Baker et al. (2018) found that atheists reported either comparable health or better health than all other religious groups (Evangelicals, Mainline Protestants, Black Protestant, Catholics, Jews, Other Religions, Agnostic, and Nones). Speed et al. (2018) found that atheists and theists reported similar levels of nihilism and fatalism and Speed (2017; Speed & Fowler, 2016) found that atheists reported comparable levels of self-rated health and happiness to various religious comparators. Hayward et al. (2016) found that atheists reported better health than religiously affiliated people on three outcomes (BMI, number of chronic conditions, and physical functionality), no differences on fifteen other outcomes, and worse health on eight outcomes. Finally, Walker et al. (2021), examining multiple waves of the General Social Survey, found that atheists reported comparable or better health than the other religious groups.

BABE in Canada

While there is a scarcity of American atheism research, there is a downright drought of Canadian atheism research, which is surprising given the geographical closeness of the two nations. Because nonreligion is growing in Canada, there is an impetus to better understand BABE, particularly in a Canadian context. Wilkins-Laflamme (2022) reported that Canada ranks 69th for religious salience (cf. 42nd USA), that 16.4% of Canadians attend religious service monthly or more (cf. 39.1% USA), that 41.5% of Canadians are atheists (cf. 17.9% USA), and that only 35.5% of Canadians pray weekly or more (cf. 62.9% USA). However, while nonreligion is a larger phenomenon in Canada than in the USA, R/S still plays an important role in the lives of Canadians: 46.0% identify as Christian (cf. 48.5% USA; Wilkins-Laflamme, 2022), religiosity consistently predicts conservative positions on political matters (Ang & Petrocik, 2012), and despite growth in nonreligious identities from the 1980s through the 2000s, several provinces did not see a concomitant shrinkage in committed religious members (Wilkins-Laflamme, 2014). Functionally, while nonreligion is on the rise in Canada, religion does not appear to be going anywhere any time soon. Despite this, only one study could be found that addressed atheism and health in a Canadian context (Speed, 2021), and its results were somewhat ambiguous. Speed notes that atheists reported health parity when compared to religious groups, except that religious groups reported higher levels of social well-being (similar results were reported in Shiah et al., 2016). However, Speed did not control for R/S beliefs and behaviours, which means that a BABE-related finding may have been missed due to potential heterogeneity of the comparison groups.

The Current Study

The current study will explore how religious attendance, prayer, religiosity, and R/S identities predict wellness in a representative sample of Canadians. I am specifically interested in exploring whether atheists report worse health than other R/S identities, and whether religious attendance, prayer, and religiosity have a uniform relationship with wellness outcomes across R/S identities. My choice of outcome variables is deliberately banal; each has been explored at some length within the BABE literature: self-rated stress, self-rated physical health, life satisfaction, and self-rated mental health (Brewer-Smyth & Koenig, 2014; Dilmaghani, 2018; Graham et al., 2001; Nooney, 2005; Pargament & Park, 1997; Ritter et al., 2014; Speed, 2021; Stavrova, 2015; Stroope & Baker, 2018). The underlying logic of the current study is that if R/S is salutary then I should find that atheists (especially highly nonreligious atheists) would report worse health than the religiously affiliated (especially when compared to highly religious members). Conversely, if health differences are not found between atheists and religious groups, this represents an important null finding.

First, I examined the health differences between atheists, agnostics, Nones, Catholics, Protestants, and Eastern religion practitioners, while allowing each group to report its own relationship between R/S beliefs and behaviours and health outcomes. This approach compared the health of the averagely religious atheist, the averagely religious agnostic, the averagely religious None, the averagely religious Catholic, the averagely religious Protestant, and the averagely religious Eastern Religion practitioner. Second, while ‘atheist’ is a good comparator group, it is possible to increase its potency by selecting only the most nonreligious members of that group, and then comparing them to the most religious elements of religious groups. Functionally, by eliminating atheists who attend religious service, pray, or value religiosity, and by eliminating religious members who fail to attend religious service, pray, or value religiosity, differences across religious groups can be potentiated. In effect, this approach compared maximally nonreligious Canadians to maximally religious Canadians. In summary, these dual approaches will explore BABE across the average member of an R/S identity, as well as in the context of a ‘dedicated’ member of an R/S identity.

Method

Data

I accessed data from Cycle 29 of the General Social Survey (GSS) that was collected in the 2015 and 2016 calendar years. The GSS is produced by the Social and Aboriginal Statistics Division of Statistics Canada and provides data on social issues. The sampling frame for the GSS used telephone numbers and property registries and had an overall response rate of 38.2%. While Statistics Canada released a data file for Cycle 29, because the variable values for religious groups were compressed, it was impossible to access information on whether a respondent was an atheist. Consequently, I applied to gain access to the appropriate master file so that the study could proceed. Because the master file contained potentially sensitive data, the study was not permitted to release bivariate statistics (e.g., correlations) as there was a possibility of residual data disclosure. To further reduce the risk of disclosure, random digit rounding was employed (to multiples of 5) for the number of respondents per category (see Table 1). For example, if there were 3,212 people in a category, this would be randomly rounded down to 3,210 people 50% of the time, and randomly rounded up to 3,215 people 50% of the time. These rounding decisions were only applied to count values and were not applied to statistical values. I did not seek institutional ethics review for the current study as I was accessing public data under the stewardship of a government organization (this is exempted under S2.2 of the Tri-Council Policy Statement). I only retained respondents who were ≥ 18 years old and spoke at least one official language of Canada (English or French).
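The random rounding rule described above can be illustrated with a minimal Python sketch. This is an illustration of the rule as worded in this section, not the procedure actually run in the Research Data Centre; the function name `random_round_to_base` is hypothetical, and leaving exact multiples of 5 unchanged is an assumption that the text does not state.

```python
import random


def random_round_to_base(count, base=5, rng=random):
    """Randomly round a non-negative count to a neighbouring multiple of `base`.

    Hypothetical helper: follows the 50/50 rule described in the text.
    """
    remainder = count % base
    if remainder == 0:
        # Assumption: counts that are already multiples of `base` are left as-is.
        return count
    lower = count - remainder   # nearest multiple below the count
    upper = lower + base        # nearest multiple above the count
    # Equal chance of rounding down or up, as in the 3,212 example.
    return lower if rng.random() < 0.5 else upper


if __name__ == "__main__":
    rng = random.Random(2015)   # seeded only so the demonstration is repeatable
    print([random_round_to_base(3212, rng=rng) for _ in range(5)])
```

Each call returns either 3210 or 3215 for an input of 3212, so published counts never reveal the exact cell size.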

Table 1 Descriptive statistics for covariates, religious/spiritual beliefs and behaviours, and outcomes by religious/spiritual categories

Measures

Covariates

All models included covariates for sex (female = base), age (measured continuously), age², household income from 1 (< $20,000) to 8 ($140,000 +), marital status (married/common-law = base; widowed or separated or divorced; single), language (English = base; French; English and French), minority status (white = base), education level (≤ high school = base; college or trade school; bachelor’s or certificate; post-bachelor’s), and region (Atlantic = base; Quebec; Ontario; Prairies; British Columbia).

R/S Identities

I used the question, “What is your religion? Specify one denomination or religion only, even if you are not currently a practicing member of that group.”, which required respondents to self-identify as a given religious group. I had three nonreligious groups in the current study (atheists, agnostics, and Nones) and four religious groups in the current study (Catholics, Protestants, Eastern Religions, and Other Religions). In all comparisons, atheists were the base category, as they were assumed to be the ‘most nonreligious’ group. Please note that while I include ‘Other Religions’ as a category, I will not discuss them in the results because they are an analytically useless catchall (e.g., Satanists, Druids, and Muslims would be considered members of the same group).

R/S Beliefs and Behaviours

I explored religious attendance from 1 (Not at all) to 5 (Weekly or more), prayer from 1 (Not at all) to 6 (Daily or more), and religiosity from 1 (Not at all important) to 4 (Very important). I used interaction terms to investigate if different R/S identities reported different linear effects for R/S beliefs and behaviours. This allowed me to investigate whether the relationship between religious attendance, prayer, and religiosity and health was the same for atheists, as it was for agnostics, Catholics, etc. All interaction terms were mean-centred, and group comparisons were made at the mean values for R/S beliefs and behaviours per R/S category.
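To make the centring and interaction step concrete, the following Python sketch shows one way such terms could be built. It is not the Stata code used for the analyses: the helper names, the dictionary-based rows, and the toy values are all hypothetical, the means are unweighted (the actual models were survey-weighted), and centring on each identity's own mean is an assumption based on the statement that comparisons were made at the mean values per R/S category.

```python
from collections import defaultdict


def centre_within_group(rows, group_key, value_key):
    """Return copies of `rows` with `value_key` centred on its group mean.

    The centred value is stored under `value_key + "_c"`.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    means = {group: totals[group] / counts[group] for group in totals}

    centred = []
    for row in rows:
        new_row = dict(row)
        new_row[value_key + "_c"] = row[value_key] - means[row[group_key]]
        centred.append(new_row)
    return centred


def interaction_terms(row, group_key, groups, base_group, centred_key):
    """Build identity-by-predictor product terms for every non-base group."""
    return {
        f"{group}_x_{centred_key}": (row[centred_key] if row[group_key] == group else 0.0)
        for group in groups
        if group != base_group
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    data = [
        {"identity": "atheist", "attendance": 1},
        {"identity": "atheist", "attendance": 3},
        {"identity": "Catholic", "attendance": 4},
        {"identity": "Catholic", "attendance": 5},
    ]
    for row in centre_within_group(data, "identity", "attendance"):
        print(row, interaction_terms(row, "identity",
                                     ["atheist", "Catholic"], "atheist",
                                     "attendance_c"))
```

With atheists as the base category, each product term captures how far a group's slope departs from the atheist slope.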

Maximally Nonreligious and Maximally Religious

There is variability in the religiousness of nonreligious groups, and there is variability in the religiousness of religious groups. Some members of nonreligious groups (e.g., atheists) may attend religious service, pray, or place importance on religion. Similarly, some members of religious groups (e.g., Catholics) may not attend religious service, pray, or view religion as important. Utilizing a min/max approach described by Speed, Barry, et al. (2020) and Speed, Coleman, et al. (2020), I isolated maximally nonreligious individuals and maximally religious individuals. Maximally nonreligious individuals were atheists, agnostics, or Nones who never attended religious services, never prayed, and viewed religion as unimportant. Maximally religious individuals were Catholics, Protestants, Eastern Religion practitioners, or Other Religions who attended religious service weekly or more, prayed daily or more, and viewed religion as very important. An added benefit of this approach was my circumvention of the wording issue for R/S identity; specifically, respondents were asked to indicate their religious membership regardless of whether they were practicing. Functionally, this analytical addition ensured that people who identified as religious were indeed practicing members of a given religion.
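The selection rule above can be written as a short Python sketch. The function name, the identity labels, and the return values are hypothetical; the numeric codes follow the response scales given in the R/S Beliefs and Behaviours section, and treating "viewed religion as unimportant" as the lowest religiosity category (1, Not at all important) is an assumption.

```python
# Hypothetical identity labels; the GSS master file uses its own coding.
NONRELIGIOUS = {"atheist", "agnostic", "none"}
RELIGIOUS = {"catholic", "protestant", "eastern", "other"}


def classify_min_max(identity, attendance, prayer, religiosity):
    """Return 'max_nonreligious', 'max_religious', or None if excluded.

    Scales (from the Measures section): attendance 1-5, prayer 1-6,
    religiosity 1-4, with higher values meaning more religious.
    """
    lowest = attendance == 1 and prayer == 1 and religiosity == 1
    highest = attendance == 5 and prayer == 6 and religiosity == 4
    if identity in NONRELIGIOUS and lowest:
        return "max_nonreligious"
    if identity in RELIGIOUS and highest:
        return "max_religious"
    # Everyone else (e.g., a praying atheist or a non-attending Catholic)
    # is dropped from the min/max comparison.
    return None


if __name__ == "__main__":
    print(classify_min_max("atheist", 1, 1, 1))      # max_nonreligious
    print(classify_min_max("catholic", 5, 6, 4))     # max_religious
    print(classify_min_max("protestant", 2, 6, 4))   # None
```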

Some critics of the min/max approach may argue that BABE is potentially curvilinear rather than strictly linear with respect to health outcomes. In other words, the extremes of nonreligion and religion are similar, while moderate levels of religion report different health outcomes (Galen & Kloet, 2011; Wei & Liu, 2013). However, this potential objection misses the underlying thrust of the current paper: if a curvilinear relationship is the ‘true’ relationship, then BABE fails because it assumes a linear effect. On this point, literature on BABE tends not to extol the benefits of committed nonreligion, despite it being, according to a curvilinear hypothesis, comparable to committed religion. If a min/max approach is inadequate, then one must also accept that the ‘R/S is healthy’ narrative is incorrect. Granted, some may point out that there are exceptions to the general BABE finding where people high on R/S report poorer health (Ellison & Lee, 2010; Krause & Wulff, 2004; Mannheimer & Hill, 2015). However, as a pre-emptive counter to this criticism, these negative outcomes reference specific subsets of religious adherents as opposed to the general experience of all active religious members. Moreover, there is little evidence that these predictors override the ostensible benefits of higher R/S (e.g., Ellison & Lee, 2010). To summarize, a min/max approach is a viable and robust method to test BABE.

Health Outcomes

I explored four single-item response measures as health outcomes in the current study. Admittedly, while these are not as informative as multi-item scales on specific topics, they are a staple of R/S-health literature and have been used extensively. I assessed self-rated stress (SRS), “Thinking about the amount of stress in your life, would you say that most days are…?”, which was answered on a 5-point scale 1 (Not at all stressful) to 5 (Extremely stressful). I used the question “Using a scale of (1) to (11) where 1 means "Very dissatisfied" and 11 means "Very satisfied", how do you feel about your life as a whole right now?”, to assess satisfaction with life (SWL). I assessed self-rated health (SRH), “In general, would you say your health is…?”, which was answered on a 5-point scale from 1 (Poor) to 5 (Excellent). Finally, I assessed self-rated mental health (SRMH) with the question, “In general, would you say your mental health is…?”, which was answered on a 5-point scale from 1 (Poor) to 5 (Excellent).

Data Analysis

I used Stata 15 (StataCorp) for all analyses and Microsoft PowerPoint for the figure. All analyses took place in a Statistics Canada secure Research Data Centre. The GSS is released with a person-level weight and bootstrap weights, which correct for point estimates and error estimates, respectively. Survey-weighted linear regression was employed as the primary analytical tool along with postestimation marginal means comparisons. All models showed acceptable tolerance and all models used bootstrap repeated replicate standard errors, which addressed issues with the non-simple random sample approach used by Statistics Canada to collect data. My nominal Type I error rate was set at α = 0.05, two-tailed significance values were used for each comparison, and I discuss effect sizes in the context of Cohen’s d. Conventionally, d < 0.20 is trivial, d < 0.50 is small, d < 0.80 is medium, and d ≥ 0.80 is large. There will also be a discussion on relative effect sizes and how other areas of research may treat the magnitude of findings for the current study.
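As a small illustration of the effect-size conventions listed above, the following Python sketch maps a Cohen's d value to its conventional label. The function name is hypothetical, and using the absolute value (so that the sign of d is ignored) is an assumption about how the thresholds are applied.

```python
def label_effect_size(d):
    """Label a Cohen's d using the cut-offs stated in the text."""
    magnitude = abs(d)  # assumption: direction does not affect the label
    if magnitude < 0.20:
        return "trivial"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "medium"
    return "large"


if __name__ == "__main__":
    for d in (0.04, 0.22, 0.25, -0.37, 0.80):
        print(d, label_effect_size(d))
```

Under these cut-offs, values such as d = 0.22, d = 0.25, and d = -0.37 are all labelled small.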

Recall, the purpose of the current study is to determine if atheists differ in health with respect to other religious groups. When comparing R/S identities at their mean-centered level of R/S beliefs and behaviours, the minimum power levels to detect simple small effects (Cohen’s d = 0.20) between atheists and other R/S identities were: agnostics (0.68), Nones (0.96), Catholics (0.96), Protestants (0.96), and Eastern Religion practitioners (0.89). Except for agnostics, the current study was unlikely to make a Type II error when detecting the presence of small health differences between atheists and all other groups (NB: power levels for detecting medium effects for all groups were > 0.99). When comparing R/S identities using a min/max approach, the current study had reduced power across all categories because of the lower n, and there will be a stronger focus on effect sizes for these comparisons to compensate for the potential power issue.
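For readers who want to see how power for a two-group comparison behaves, the following Python sketch uses a normal approximation. It is not the calculation behind the power values reported above: it assumes simple random sampling with equal variances (ignoring the survey weights and bootstrap standard errors used in this study), the function name is hypothetical, and the group sizes in the demonstration are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(d, n1, n2, alpha=0.05):
    """Approximate two-tailed power to detect a standardized difference d.

    Normal approximation; assumes simple random samples and equal variances.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is d.
    ncp = d * sqrt(n1 * n2 / (n1 + n2))
    # Probability of landing in either rejection region.
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Hypothetical group sizes, chosen only to show how power moves with n.
    for n_small_group in (100, 400, 1600):
        print(n_small_group, round(two_group_power(0.20, n_small_group, 5000), 3))
```

The sketch shows why the smallest comparison group limits power: holding d = 0.20 fixed, power rises quickly as the smaller group grows.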

Exploratory Hypothesis

Do R/S identities report unique linear effects for R/S beliefs and behaviours predicting wellness? The results will depict the unique linear effects of religious attendance, prayer, and religiosity, per R/S identity for each health outcome. Because there are 21 interactions tested per model, I will use Bonferroni–Holm corrections to address the familywise error rate.
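The Bonferroni–Holm procedure mentioned above can be sketched in a few lines of Python. This illustrates the standard step-down correction, not the code run for this study; the function name and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm (step-down Bonferroni) adjusted p-values, in input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Invented p-values for illustration only.
    print(holm_adjust([0.01, 0.04, 0.03]))
```

A test is declared significant when its adjusted p-value falls below α = 0.05. With a family of 21 tests, the smallest raw p-value is multiplied by 21, so it would need to fall below roughly 0.0024 to survive correction.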

Hypothesis 1

When mean-centered interactions are present in the model, atheists will be less healthy than agnostics, Nones, Catholics, Protestants, and Eastern religion practitioners.

As a follow-up analysis, I will compare maximally nonreligious atheists, agnostics, and Nones to maximally religious Catholics, Protestants, Eastern Religion practitioners, and Other Religions.

Hypothesis 2

Maximal atheists will be less healthy than maximal agnostics, maximal Nones, maximal Catholics, maximal Protestants, and maximal Eastern religion practitioners.

Results

Comparing the Averagely Nonreligious and the Averagely Religious

SRS was regressed onto covariates in Block 1, F(16, 499) = 111.53, p < 0.001, R² = 0.136, R/S beliefs and behaviours in Block 2, F(3, 499) = 10.59, p < 0.001, R² = 0.139, ΔR² = 0.003, and R/S identities in Block 3, F(6, 499) = 1.03, p = 0.402, R² = 0.140, ΔR² = 0.001. Interaction terms were then entered and removed for religious attendance, F(6, 499) = 0.25, p = 0.957, R² = 0.140, ΔR² = 0.000, prayer, F(6, 499) = 1.20, p = 0.306, R² = 0.141, ΔR² = 0.001, and religiosity, F(6, 499) = 0.75, p = 0.610, R² = 0.141, ΔR² = 0.000. All 21 interaction terms were entered simultaneously (Table 2), but their respective estimates were approximately b = 0.00, and they only netted an extra 0.2% of the total variability in SRS (compared to Block 3). Notably, prayer in Catholics predicted greater stress, but the effect was trivial. When investigating if different R/S identities reported different linear effects of R/S beliefs and behaviours, results were uniformly null. In other words, atheists, agnostics, Nones, Catholics, Protestants, and Eastern Religion practitioners all reported equivalent relationships between religious attendance, prayer, and religiosity and SRS, and this relationship was quite often flat. I then turned my attention to H1, which asserted that atheists would be less healthy than other R/S identities. However, I found no evidence of substantial stress differences between atheists and other R/S identities, so I rejected this hypothesis (see Table 2).

Table 2 Relationships between religious attendance, prayer, and religiosity per religious category, predicting stress, satisfaction with life, self-rated health, and self-rated mental health

SWL was regressed onto covariates in Block 1, F(16, 499) = 35.84, p < 0.001, R² = 0.057, R/S beliefs and behaviours in Block 2, F(3, 499) = 7.98, p < 0.001, R² = 0.060, ΔR² = 0.003, and R/S identities in Block 3, F(6, 499) = 4.79, p < 0.001, R² = 0.064, ΔR² = 0.004. Interaction terms were entered and removed sequentially for religious attendance, F(6, 499) = 1.06, p = 0.387, R² = 0.064, ΔR² = 0.001, prayer, F(6, 499) = 0.99, p = 0.434, R² = 0.064, ΔR² = 0.001, and religiosity, F(6, 499) = 0.75, p = 0.609, R² = 0.064, ΔR² = 0.001. When entering all interactions simultaneously (Table 2), those terms only explained 0.2% more variability in SWL than did Block 3. Religious attendance predicted greater SWL in Protestants, prayer predicted lowered SWL in Protestants, and religiosity predicted greater SWL in Catholics, but each of these effects was trivial. Importantly, none of the interaction terms were significant, suggesting that the linear effects for attendance, prayer, and religiosity were comparable across all groups. In other words, each group reported overlapping estimates for the relationship between R/S beliefs and behaviours and SWL, but because some groups were much larger (and thus had lower error estimates) they reached statistical significance for their specific estimate, despite reporting very small effects. When exploring differences in life satisfaction across R/S identities, I found that atheists reported similar levels of SWL to all other R/S identities (Table 2), which failed to support H1.

In the third model, SRH was regressed onto covariates in Block 1, F(16, 499) = 26.87, p < 0.001, R² = 0.042, R/S beliefs and behaviours in Block 2, F(3, 499) = 4.42, p = 0.004, R² = 0.044, ΔR² = 0.002, and R/S identities in Block 3, F(6, 499) = 2.62, p = 0.016, R² = 0.046, ΔR² = 0.002. Interaction terms for religious attendance, F(6, 499) = 1.30, p = 0.254, R² = 0.047, ΔR² = 0.001, prayer, F(6, 499) = 1.56, p = 0.156, R² = 0.047, ΔR² = 0.001, and religiosity, F(6, 499) = 0.36, p = 0.906, R² = 0.046, ΔR² = 0.000, were entered and removed in turn. As can be seen in Table 2, when all interaction terms were entered simultaneously (ΔR² = 0.003, relative to Block 3) most relationships were non-significant. Religious attendance positively predicted SRH for Protestants and prayer negatively predicted SRH for Catholics, but these relationships were trivial. Somewhat surprisingly, agnostics reported a positive relationship between religiosity and SRH, t = 2.16, p = 0.031, b = 0.23, 95% CI [0.02, 0.45], which was a small effect. Given the novelty of this last relationship and the number of uncorrected comparisons made in models, this finding should perhaps be regarded with skepticism. I then investigated whether the unique linear effects for religious attendance, prayer, and religiosity (Table 2) differed across R/S identities. But after applying Bonferroni–Holm corrections, I found that atheists, agnostics, Nones, Catholics, Protestants, and Eastern Religion practitioners reported the same ‘benefits’ for religious attendance, prayer, and religiosity. When testing H1, I found no evidence to support the contention that atheists reported worse SRH than other R/S identities, thus I rejected this hypothesis.

In the fourth model, SRMH was regressed onto covariates in Block 1, F(16, 499) = 19.45, p < 0.001, R² = 0.028, R/S beliefs and behaviours in Block 2, F(3, 499) = 4.74, p = 0.003, R² = 0.029, ΔR² = 0.002, and R/S identities in Block 3, F(6, 499) = 1.81, p = 0.096, R² = 0.031, ΔR² = 0.002. Next, I explored if R/S identities moderated the relationship that R/S beliefs and behaviours had with subjective mental well-being. However, neither attendance interactions, F(6, 499) = 1.03, p = 0.402, R² = 0.031, ΔR² = 0.001, prayer interactions, F(6, 499) = 0.84, p = 0.539, R² = 0.031, ΔR² = 0.001, nor religiosity interactions, F(6, 499) = 1.13, p = 0.341, R² = 0.032, ΔR² = 0.001, improved the overall model. When all 21 unique interaction terms were entered simultaneously (see Table 2), an additional 0.5% of variability was explained in comparison with Block 3 (ΔR² = 0.005); however, the bulk of the relationships between religious attendance, prayer, and religiosity were non-significant. Catholics and Protestants reported a positive relationship between religious attendance and SRMH, and Catholics reported a negative relationship between prayer and SRMH, but each of these effects was trivial. Interestingly, atheists reported a negative relationship between religiosity and SRMH, but this effect was also trivial. When I investigated whether the unique linear effects for R/S beliefs and behaviours differed across groups, after using Bonferroni–Holm corrections there were no differences between atheists, agnostics, Nones, Catholics, Protestants, and Eastern Religion practitioners (i.e., all interaction terms were nonsignificant). When investigating H1, I found no differences across atheists and any other R/S identities, except for agnostics who reported worse SRMH relative to atheists. Overall, this finding was non-supportive of the general BABE hypothesis.

In summary, I allowed atheists, agnostics, Nones, Catholics, Protestants, and practitioners of Eastern Religions to report their own unique relationships between religious attendance, prayer, and religiosity and the four health outcomes (SRS, SWL, SRH, and SRMH). Overall, there were two findings to note. First, tested relationships often showed no association between religious attendance, prayer, and religiosity and any of the four health outcomes. While there were sporadic significant relationships, these tended to be small and inconsistent (e.g., Catholics reported a significant, albeit trivial effect between religiosity and SWL, but not for stress, SRH, or SRMH). Second, while I predicted that atheists would report worse health than non-atheists, this was resolutely not the case: atheists reported health parity with other R/S identities. Granted, these null findings may be the product of the heterogeneity present in R/S identities, which is addressed in the following section.

Comparing the Maximally Nonreligious and the Maximally Religious

The preceding analyses are not without conceptual flaws, namely that BABE may be weakened by heterogeneity in R/S identities. Perhaps a subgroup of atheists reports very high levels of religious attendance, prayer, and religiosity, and this rogue group of nonbelievers is artificially inflating the wellness of atheists. Alternatively, perhaps a subgroup of Protestants is quite unreligious and is artificially deflating the overall wellness of Protestants as a whole. Using the previously discussed min/max approach, the most nonreligious atheists, agnostics, and Nones and the most religious Catholics, Protestants, Eastern Religion practitioners, and Other religious practitioners were retained for further analysis. If BABE is correct, then I expect the largest possible difference in wellness to emerge between the maximally nonreligious and the maximally religious groups. Again, because atheists are among the least religious of any group, they were used as the base group in all comparisons. While these analyses had fewer respondents than previous models, they still represented > 5,000,000 Canadians.

As can be seen in Table 3, there was consistent health parity between atheists and their comparator groups, with several exceptions. Protestants reported higher levels of SWL relative to atheists, t = 2.43, p = 0.015, b = 0.42, 95% CI [0.08, 0.76], Cohen's d = 0.25, and Protestants reported higher levels of SRH relative to atheists, t = 2.36, p = 0.019, b = 0.21, 95% CI [0.04, 0.39], Cohen's d = 0.22. The other exception to this trend was agnostics, who reported worse SRMH than atheists did, t = − 2.28, p = 0.023, b = − 0.41, 95% CI [− 0.77, − 0.06], Cohen's d = − 0.37 (see Fig. 1). Excluding the heterogeneous ‘Other Religions’ group, there were 12 comparisons made between maximal atheists and maximal religious group members. Only two of these comparisons were significant and were associated with small effect sizes. The 10 other comparisons were non-significant and not associated with practical effect sizes. Atheists reported similar health to Nones and reported better health than agnostics in one case. Overall, there was no substantial support for H2, and it was rejected in each of the tested models.

Table 3 Comparing maximum atheists, agnostics, and Nones to maximum Catholics, Protestants, Eastern Religions, and Other Religions
Fig. 1 Health differences across maximally nonreligious atheists, agnostics, and Nones, and maximally religious Catholics, Protestants, and Eastern Religion practitioners

Discussion

To reiterate, BABE describes a positive association between R/S and health outcomes. The current study examined one exploratory hypothesis and two directional hypotheses, all of which failed to garner support for BABE in a Canadian context. The exploratory hypothesis predicted that different R/S identities would report unique relationships between religious attendance, prayer, and religiosity (R/S beliefs and behaviours) and self-rated stress (SRS), satisfaction with life (SWL), self-rated health (SRH), and self-rated mental health (SRMH). Overwhelmingly though, it was found that atheists, agnostics, Nones, Catholics, Protestants, and Eastern Religion practitioners reported the same (often null) relationship between R/S beliefs and behaviours. Both H1 and H2, which explored whether there was a health-related penalty for atheists, were unsupported. If BABE was correct, I could have reasonably expected to see that atheists were less well-off than Catholics, Protestants, and Eastern Religion practitioners. However, there was no widespread evidence of an ‘atheism penalty’, despite efforts to maximize group differences. These null findings are important as a stepping-off point for a broader criticism of the BABE field.

At this juncture, it is beneficial to remind the reader that the current study examined the unique linear effects of all R/S beliefs and behaviours by group, per outcome. This allowed the ‘Catholic relationship between religious attendance and stress’ to vary from the ‘atheist relationship between religious attendance and stress’, and so on (see Table 2). While research by Speed and Fowler (2016, 2017) suggested that different R/S identities reported unique relationships between religious attendance, prayer, or religiosity and health outcomes, I could not find evidence of this (e.g., atheists and Protestants reported the same relationship between religiosity and SRMH). Granted, some of these null relationships may have been a product of a zealous effort to guard against Type I error (e.g., Bonferroni–Holm corrections were used), but frequently the differences across linear effects were nonsignificant regardless of whether corrections were applied. While this set of findings ostensibly undermines the idea that R/S identities report unique relationships between R/S beliefs and behaviours and health outcomes, the bigger issue is that most measured relationships were flat or nearly so (b ≅ 0.00). For example, the relationship that Protestants reported between religious service attendance and life satisfaction was b = 0.08; that is, for each one-unit increase in religious attendance, SWL would increase by 0.08 units (d ≅ 0.04). This linear effect so closely bordered zero that the current study was underpowered to detect significant interaction effects. It was, in a sense, trying to split statistical hairs. Granted, stalwart defenders of BABE may point out that some of the tested relationships were significant and may continue to argue that BABE is genuine. However, I will explore this idea closely with respect to religious attendance, which was the strongest of the statistically significant relationships in Table 2.

Statistical Significance Does Not Mean We Should Care

First, it is important to note that religious attendance, which requires a person to be physically capable of leaving their home, travelling, and sitting for periods of time, is intrinsically entangled in subjective wellbeing (Speed & Lamont, 2021). In a sense, it would be remarkable if I was unable to find an association between ‘being at least somewhat physically well’ and ‘some form of subjective physical well-being’, as they appear to be tapping into related constructs. However, I will ignore this confound for the moment and acknowledge that the relationships between religious attendance and health outcomes were, occasionally, not equal to zero. Further, I will assume that going to religious service causally ‘boosts’ health and I will assume that people not going to religious service are not boosting their health in other ways. With these assumptions, I will re-examine the relationship between religious attendance and SWL for Protestants, as this was one of the strongest relationships found in the current study. Imagine that a Protestant who never attended religious service started to attend religious service weekly. How much of a ‘boost’ would they report with respect to SWL? Their hypothetical increase would be d ≅ 0.16 (R² ≅ 0.006), which is conventionally seen as trivial (Cohen, 1992). Let us improve this to d = 0.20 (R² = 0.01) and work with these slightly more favourable values: how does the existing literature treat this size of effect for this type of outcome? In other words, how does the broad social science literature react when a model can improve estimates of crude subjective wellness by ~ 1%?
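
As a rough check on the effect sizes quoted above, the step from Cohen's d to variance explained can be sketched as follows. This is only an illustration: it assumes the common equal-group-size conversion r = d / sqrt(d^2 + 4), which the text does not name as the method actually used, and the function name is hypothetical.

```python
import math


def d_to_r_squared(d: float) -> float:
    """Convert Cohen's d to variance explained (r squared).

    Assumption: equal group sizes, so r = d / sqrt(d^2 + 4).
    The source does not state which conversion was applied.
    """
    r = d / math.sqrt(d ** 2 + 4)
    return r ** 2


# The two effect sizes discussed in the paragraph above.
for d in (0.16, 0.20):
    print(f"d = {d:.2f} -> R^2 = {d_to_r_squared(d):.4f}")
```

Under this assumption, d = 0.16 corresponds to roughly 0.6% of variance explained and d = 0.20 to roughly 1%, in line with the values reported in the text.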

From a longitudinal UK study, McKay et al. (2020) analyzed data from ~ 5000 students to determine how sociodemographic variables predicted wellness. In their study, McKay and colleagues collected data on mental well-being (among other outcomes) using the Warwick-Edinburgh Mental Well-being Scale. Let me generously assume that the single-item SWL question from the current study is on par with the validated scale for mental wellness in McKay et al. (2020). In their study, McKay et al. found that knowing whether a student ‘ever smoked cannabis’ (in the past two years), explained about 1% of the variability in that student’s mental wellbeing. McKay and colleagues (2020) implied that this relationship was not noteworthy because it fell outside of the cutoff for a practical effect.

In fairness, perhaps McKay et al. (2020) had more stringent standards than other researchers in the field. Maybe there are counterexamples where a 1% increase in the prediction of subjective wellbeing is seen as an impressive finding. Fine, let us talk about the underlying concepts instead of the specific numerical values. After controlling for demographic covariates, a binary indicator of marijuana consumption in the past two years was a better predictor of mental wellness in a student than attending religious service on a weekly basis (rather than not at all) was for a Protestant. To reiterate this point, the amount of variability in mental wellness explained for a Protestant who gets out of bed, gets dressed, travels to religious service, and sits through religious service every single week is smaller than the amount of variability in mental wellness that is explained by whether a high-schooler tried pot.

When looking at self-rated health, the same underlying approach can be used. If a Protestant started to attend religious service weekly (after not attending religious service at all), their SRH would improve by 0.12 units (d ≅ 0.11; OR ≅ 1.22). I struggled to find a study where an adjusted effect of this size was discussed at length within the literature. A study by Eriksen et al. (2013) looked at the relationship between dichotomized SRH (very good and good, vs. fair, poor, and very poor) and physical activity in ~ 10,000 Danes. Eriksen and colleagues found that women who reported being in the ‘light physical activity’ category were more likely to report better SRH relative to the ‘sedentary’ category (OR = 1.58). The ‘health boost’ associated with moving from the lowest level of physical activity to the second lowest level of physical activity was ~ twice as large as the ‘health boost’ associated with a Protestant regularly attending religious service (compared to a ‘never-attender’).
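
The comparison above can be reproduced approximately with a short sketch. It assumes the standard logistic conversion ln(OR) = d * pi / sqrt(3) and compares the two effects on the log-odds scale; neither choice is stated in the text, so both are labeled here as assumptions, and the function and variable names are hypothetical.

```python
import math


def d_to_odds_ratio(d: float) -> float:
    """Convert Cohen's d to an odds ratio.

    Assumption: the logistic conversion ln(OR) = d * pi / sqrt(3).
    The source does not say which conversion produced OR = 1.22.
    """
    return math.exp(d * math.pi / math.sqrt(3))


attendance_or = d_to_odds_ratio(0.11)  # Protestant weekly vs. never attendance
activity_or = 1.58                     # light vs. sedentary activity (Eriksen et al., 2013)

# Assumption: odds ratios are compared on the log-odds scale.
log_odds_ratio = math.log(activity_or) / math.log(attendance_or)

print(f"Attendance OR: {attendance_or:.2f}")
print(f"Activity effect relative to attendance effect: {log_odds_ratio:.1f}x")
```

On these assumptions the attendance effect converts to an odds ratio of about 1.22, and the physical activity effect is a little more than twice as large on the log-odds scale.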

Granted, these are not exact comparisons and there are statistical nuances that are being ignored for simplicity. However, while these are imperfect comparators, they are still useful comparators for contextualizing the R/S-health relationship. The associations between R/S and health outcomes were not literally zero in the current study, but one would be hard-pressed to explain how they differed remarkably from zero. In either discussed example, it is evident that the best-case scenario for BABE is that the observed effects are so tiny, that other areas of social science research would simply ignore them. Please note, I am not saying that explaining 1% of a health outcome is necessarily unimpressive; 1% of variability explained in infant mortality, cancer survivability, delay in dementia, etc., would each be important findings, but a < 1% explanation of variance in a proxy measure of mental wellness—that is something entirely different.

Why No Atheism Penalty?

According to the reasoning championed by proponents of BABE, being an atheist should be associated with poorer health. Hypothesis 2 provided the strongest test of this rationale, so I will devote my attention to that instead of the comparatively weaker test in Hypothesis 1. When comparing maximally nonreligious atheists, agnostics, and Nones, against maximally religious Catholics, Protestants, and Eastern Religion practitioners, there were few differences between atheists and other groups. Figure 1 provides a visual representation of the effect sizes in terms of Cohen’s d. Using effect sizes only and ignoring statistical significance, we can see four instances where there were health differences between atheists and other groups (Cohen’s d ≥ 0.20). Two of these instances involved comparing atheists to agnostics, and in both cases atheists reported better health outcomes (SWL and SRMH). These results, although the effects are small (d = 0.27 and d = 0.37), are consistent with the existing literature suggesting atheists fare slightly better than agnostics. The other two instances were Protestants reporting better levels of SWL and SRH compared to atheists (d = 0.25 and d = 0.22, respectively), but again these effects are small. While one could point to these differences as evidence for BABE, this argument is unimpressive.

To make this point explicit: I gamed the analyses so that Catholics, Protestants, and Eastern Religion practitioners were advantaged to report a health-related effect. Recall that Protestants being compared to atheists in Fig. 1 and Table 3 must have attended religious service at least weekly. Consequently, Protestants who were unable to attend religious service—due to being bedridden or otherwise infirm—would have been automatically dropped from these analyses. Moreover, atheists, agnostics, or Nones who attended religious service with family or friends (sources of social support), would have also been automatically excluded from analyses. In other words, the min/max comparison systematically ‘cherry-picked’ healthier religious participants and potentially excluded socially integrated nonreligious participants. Even with these selection biases, the differences to emerge were small and are likely overestimates given that I could not control for social support, which is heavily connected with R/S (Speed, Barry, et al., 2020; Speed, Coleman, et al., 2020).

Overall, I made a diligent effort to ‘find’ BABE using a representative sample of Canadians. The theoretical framing of the research question and hypotheses used the underlying logic of BABE without making any grandiose assumptions or caveats. My measures of health (SRS, SWL, SRH, SRMH), R/S beliefs and behaviours (religious attendance, prayer, religiosity), and grouping variable (R/S identities), have been used repeatedly in the literature. My models were well-powered for H1, and I relied solely on a discussion of effect size for H2 to circumvent Type II error issues. Despite these efforts, evidence for BABE simply did not emerge. Atheists reported comparable wellness to Catholics, Protestants, and Eastern Religion practitioners for H1, and while there were some differences between atheists and Protestants for H2, these were largely unimpressive and likely mired in a host of selection and omitted covariate confounds. My question is this—if R/S is associated with better health, why have I failed so spectacularly at detecting it?

Other Explanations for Null Findings

Some people may use R/S as a coping mechanism (Graham et al., 2001; Krause et al., 2001; Pargament, 2011; Pargament & Park, 1997), and it is possible that the current null findings can be explained by this. Perhaps sickly unbelievers will stop being atheists and will become religiously affiliated to cope with their illness. If true, this would artificially inflate the wellness of atheists as its sickest members would select out of atheism. In Table 2, the difference in SRH between atheists and Protestants was, t = 1.69, p = 0.091, b = 0.10, 95% CI [-0.02, 0.21], Cohen's d = 0.10, which was the largest health difference for the unrestricted analyses. Importantly, this difference would have to quintuple to reach a medium effect size (d = 0.10 * 5 = 0.50). By coincidence the b and d values are similar, so a mean difference of b = 0.50 will correspond to a Cohen’s d of a similar size. While I cannot test an ‘atheist exodus’ explanation directly (the data are cross-sectional), I can use a back-of-the-envelope approach to estimate how many sickly atheists I would have to add to the atheist group before the difference between atheists and Protestants increased from d = 0.10 to d = 0.50.

Let us assume as a starting point that atheists’ adjusted \(M_{\mathrm{SRH}}\) = 3.54 and that Protestants’ adjusted \(M_{\mathrm{SRH}}\) = 3.64 (b = 0.10). To hit a medium effect size, atheists’ adjusted \(M_{\mathrm{SRH}}\) would have to be reduced to 3.14 (making the mean difference b = 0.50). If we only added atheists with ‘Poor’ health to the atheist group (i.e., SRH = 1), we would have to introduce 85 sickly atheists to make atheists’ \(M_{\mathrm{SRH}}\) = 3.14. In other words, we would have to increase the size of the atheist group by ~ 20% to make the ‘atheist exodus’ explanation workable. I used a weighted mean formula (Vogt, 2015) to estimate this, with x as the number of added atheists: \(3.14 = 3.54\left(\frac{455}{455 + x}\right) + 1\left(\frac{x}{455 + x}\right)\), which gives \(x \approx 85\). This approach made simplifying assumptions (e.g., coefficients would be the same), but is a good starting point to assess the plausibility of this explanation. Please note that this is a conservative estimate: it assumed that each of these 85 atheists reported SRH = 1; if instead all of the added atheists reported ‘Fair’ health (SRH = 2), then ~ 160 new atheists would be needed to produce the same effect. It is worth noting that self-identified atheists converting to religion for health benefits has not been studied: research on coping typically addresses religious individuals using R/S to assist in times of hardship. Also, if we are to take this explanation seriously, we would have to consider that some religious individuals may disaffiliate or experience an erosion of faith when ill (Ramfelt et al., 2002; Roger & Hatala, 2018; Rogers et al., 2012), which is something that would artificially inflate the wellness of those R/S groups. Overall, an ‘atheist exodus’ seems unlikely as an explanation for the observed results.
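
The back-of-the-envelope calculation above can be written out as a short sketch. It solves the weighted mean equation for x, using the values given in the text (a group of 455 with a mean of 3.54 and a target mean of 3.14). The function name is hypothetical, and treating 455 as the size of the atheist group is an inference from the formula rather than something stated outright.

```python
def added_respondents_needed(n: int, current_mean: float,
                             target_mean: float, added_value: float) -> float:
    """Solve the weighted mean equation for x.

    target_mean = current_mean * n / (n + x) + added_value * x / (n + x)
    rearranges to
    x = n * (current_mean - target_mean) / (target_mean - added_value).
    """
    return n * (current_mean - target_mean) / (target_mean - added_value)


N_ATHEISTS = 455  # assumed size of the atheist group, taken from the formula in the text

poor = added_respondents_needed(N_ATHEISTS, 3.54, 3.14, added_value=1)  # all 'Poor' (SRH = 1)
fair = added_respondents_needed(N_ATHEISTS, 3.54, 3.14, added_value=2)  # all 'Fair' (SRH = 2)

print(f"Added atheists needed if all report SRH = 1: {poor:.0f}")
print(f"Added atheists needed if all report SRH = 2: {fair:.0f}")
print(f"Increase in group size (SRH = 1 case): {poor / N_ATHEISTS:.0%}")
```

With these inputs the sketch returns about 85 and about 160 added respondents, and an increase in group size of about 19%, consistent with the figures quoted in the text.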

Another potential explanation for the observed null effects is that the R/S-health relationship may be curvilinear (i.e., U-shaped) instead of strictly linear. In these cases, people who are at the low or high ends of R/S tend to be comparable, while people in the middle values of R/S tend to report health differences. Interestingly, whether moderate levels of R/S are associated with better or worse health varies between studies. In some cases moderate levels of R/S are associated with worse health (Eliassen et al., 2005; Galen & Kloet, 2011; Wei & Liu, 2013) and other times moderate levels of R/S are associated with better health (Schnittker, 2001). It is plausible that the curvilinear relationship described is related to confidence—the extent to which a person expresses conviction in their (non)beliefs. Kitchens and Phillips (2021) found that people at the low and high ends of God beliefs tended to have greater clarity in self-concept. Although it is not definitive, atheists likely represent a more committed position than either agnosticism or simple nonreligion, and from this, the findings from the current study are perhaps unsurprising. For example, Baker et al. (2018) found that atheists tended to fare better than agnostics and unaffiliated theists with respect to wellness outcomes. Speed (2017) found that confident theists were more likely to glean benefits from R/S beliefs and behaviours than non-confident individuals. McCaffree (2019) notes that people who are more purposive in their nonreligious identity tend to be better-off than people who are incidentally nonreligious. Finally, Schnell and Keenan (2011) note that ‘low commitment atheists’ were more likely to experience crises of meaning relative to other non-believing or religious groups.

So, can the BABE hypothesis be rescued by invoking this curvilinear relationship? Well, no. Admittedly, the curvilinear hypothesis is intriguing because it can account for a variety of null findings including the ones in this study. Atheists are highly committed on the low-R/S end, hence their comparable wellness to committed Protestants, committed Catholics, and committed Eastern Religion practitioners. However, the thing ‘driving’ the health outcomes is not a specific R/S belief per se; it is the confidence in any specific R/S belief. This does not rescue the BABE hypothesis because it makes it abundantly clear that the underlying predictor of wellness is a commitment to a worldview, not the content of said worldview. While R/S may be an incidental feature of many worldviews, it is not the key salutary aspect.

While my failure to find support for BABE may be a bizarre series of Type II errors, the current study combined with the broader literature discussed in the Introduction (Baker et al., 2018; Speed, 2021; Hayward et al., 2016; Walker et al., 2021) suggests a more radical explanation: BABE does not exist in any meaningful sense. Alternatively, the extent to which BABE does exist is so unremarkable or narrow that to treat R/S as broadly salutary is perplexing. While there may be very specific health benefits associated with R/S in specific cases (substance abuse perhaps), the framing of R/S as being generally salutary is unwarranted. Ignoring everything else, the fact that atheists routinely report comparable health outcomes to religious groups suggests that the current understanding of ‘R/S is salutary’ is problematic.

Study Limitations

The current study has several limitations that should be explicitly noted. First, the use of archival data meant that I was limited in what outcomes I could assess. While SRS, SWL, SRH, and SRMH have been studied previously, using fully fledged measures of wellness would strengthen the conclusions made. Second, while R/S identities were parsed out, there would still be heterogeneity within them (e.g., Roman Catholics, Eastern Catholics, etc., would be grouped together), meaning that subdividing religious identities further may be more illuminating. Third, I was unable to control for social support or personality, both of which may be substantial contributors to BABE. However, this should have only strengthened BABE in the current study, rather than masked it (i.e., BABE is likely smaller than I have estimated). Fourth, the data from the current study are for Canadians and should not be generalized to other populations (e.g., Americans), but the criticisms of the BABE field are not limited to a Canadian context and are reflective of a more international discourse on the topic.

Conclusions

I will acknowledge that I may seem like a contrarian with respect to my position on BABE. After all, there are thousands of studies addressing the relationship between R/S and health, many of which conclude that salutary effects are present. Arguably, the current study did an adequate job defining, measuring, and testing the relationships between R/S and various health outcomes. Respondents self-identified into categories, R/S beliefs and behaviours were typical of the BABE field, and efforts were taken to maximize any potential differences between the nonreligious and the religious. Moreover, when exploring H1, there were high levels of power to detect less-than-small effects, particularly when comparing atheists to Protestants and Catholics. When exploring H2, there was a deliberate ‘gaming of the system’ to find support for BABE using a min/max approach. But, like other studies (Baker et al., 2018; Walker et al., 2021), my findings largely showed wellness parity across R/S categories. If atheists and nonatheists are comparably healthy, perhaps it is time to throw BABE out with the bathwater.