Key messages

  • Online convenience samples are susceptible to substantial sampling bias; however, the nature of this bias, and how it can best be adjusted for analytically, are two questions that research has yet to fully answer.

  • One bias in COVID-19 research is that people who participate may be more concerned about COVID-19 and more favourably inclined towards prevention measures (e.g., show higher vaccine acceptance) than those who do not.

  • Adjusting analyses for demographic variables (e.g., sex, age, education) is a common and theoretically useful strategy to deal with sampling bias (e.g., when informed by causal theory), but most research uses an atheoretical/unstructured approach to select covariates.

  • Adjusting analyses for demographic variables in an atheoretical/unstructured way may be an unreliable means of accounting for sampling bias; in nearly 17,000 models, we found covariates to reduce bias (i.e., discrepancies between convenience/representative samples) in 55% of cases and to increase bias in 45% of cases.

  • Researchers who use convenience samples should consider multiverse-style covariate analyses (i.e., extended sensitivity analyses)—as demonstrated in this paper—to examine how covariate selection impacts findings.

Introduction

In many research areas, the gold standard for recruiting participants is to use probability-based sampling to draw representative inferences for a given population [1,2,3]. Unfortunately, such efforts are often costly or unfeasible. Other methods, such as convenience-based sampling, are useful alternatives but can risk introducing significant sampling bias [4]. This concern has been particularly salient during the COVID-19 pandemic, as most COVID-19 research has relied on non-representative (e.g., convenience-based) observational samples [5,6,7].

A common, and in theory valid, approach to reduce the impact of sampling bias is to adjust analyses using covariates thought to influence study participation (e.g., adding covariates to a regression, using propensity scores, or sample weights) [8,9,10]. However, there remains substantial uncertainty about which factors drive participation (or lack thereof) and, therefore, how to adequately account for sampling bias. Commonly, researchers default to adjusting analyses for select demographic variables (e.g., sex, age, education), but the extent to which this practice has been successful is unknown. We address these ideas theoretically and empirically within the context of online COVID-19 behavioural and public health research.

Sampling bias: a non-technical explanation

Sampling bias occurs when different members of a population have unequal probabilities of being included in a study. This can occur for many reasons, such as when recruitment strategies have unequal reach across groups, or when groups, once reached, differ in their response rates. Sampling bias can distort estimates of prevalence/incidence rates as well as estimates of associations between exposures and outcomes. To understand sampling bias, and how to counter it, we can represent the phenomenon using causal diagrams such as Panel A of Fig. 1 [11, 12].

Fig. 1

Examples of sampling bias and the roles of covariates. The black square around selection indicates that analyses are limited to individuals who participated (either through selection by a study’s design or through self-selection). Selection is a collider, a common effect of other variables. Panel A is an example of sampling bias shown through a causal diagram. Here, recruitment strategy (convenience vs. probability), along with an exposure (sex) and an outcome (vaccine acceptance) each influence a person’s selection into a study. Though the three variables are not causally linked, conditioning on selection (a collider) leads them to be associated through a process known as collider bias. Panel B is a simulated example of the dynamic in Panel A where 50% of a population is accepting of vaccination, and this ratio is equivalent for male and female individuals. However, both vaccine acceptance and sex predict selection. Having data only from people who participate will lead an analyst to overestimate vaccine acceptance and see a spurious association between sex and vaccine acceptance such that female (vs. male) participants show lower levels of acceptance (also see Box 1). Panel C provides example roles covariates can play in an association between an outcome (vaccine acceptance) and selection. Adjusting analyses for a mediator (confirmation seeking) or a confounder (education) can both reduce sampling bias. However, adjusting for another collider (employment) can introduce further collider bias. Thus, analysts must be mindful of the causal role of covariates in relation to their exposure-outcome links of interest

Using Fig. 1, consider an illustrative example. Imagine we are conducting research to estimate rates of COVID-19 vaccine acceptance (i.e., our outcome, defined as vaccine receipt or intentions to get vaccinated) in a community, and wish to explore how sex (our exposure) influences acceptance. In this study, we compare two recruitment strategies: a convenience-based and a probability-based sampling method. Importantly, our analyses are restricted to the selection of responses we obtain (we have no data on non-respondents). How might we expect sampling to affect our findings under these circumstances?

First, recruitment strategies will influence the selection of responses we obtain (path p1). In an ideal probability-based design, all population members would have an equal likelihood of being reached, and efforts would be made (e.g., using incentives) to ensure high participation rates. In contrast, in convenience-based samples, reach is usually skewed towards certain groups (e.g., social media users for a study advertised on social media) and small/absent incentives can skew participation further [13, 14]. Second, participant characteristics can influence the responses we obtain, either in conjunction with or independently of the recruitment strategy. For our example, research has shown that female (vs. male) individuals are more likely to volunteer for research (path p2) [15, 16], and we could anticipate that people are more likely to participate in vaccine-related research if they hold favourable attitudes towards vaccines (due to the human tendency to seek information in line with pre-existing beliefs; path p3) [17, 18].

The result of these forces (of paths p1, p2, p3) is that we will observe a series of biased findings if we attempt to use the convenience sample to draw inferences about the population (compared to using the probability sample). Specifically, we will overestimate the degree to which participants are female and accepting of vaccines (paths p4 and p5) and will also spuriously find that vaccine acceptance is lower among female (vs. male) participants (path p6)—even if, in the overall population, no such association exists (see Box 1). These three findings are spurious manifestations of sampling bias.

Box 1 Three ways of understanding how collider bias creates spurious findings

Generally, sampling bias emerges through a process known as collider bias, whereby an association is induced (or distorted) between two variables because analyses are conditioned on a common outcome of those variables—known as a collider [5, 11, 12, 19]. In Fig. 1, selection is a collider (a common outcome) of recruitment strategy, sex, and vaccine acceptance. Conditioning analyses on selection—by limiting analyses to participants—is at the root of the biased observations (paths p4, p5 and p6). An example of this dynamic is provided in Panel B of Fig. 1, demonstrating how limiting analyses to participants induces a spurious (and inverse) association between sex and vaccine acceptance (path p6). Given the importance of collider bias for understanding sampling bias, Box 1 provides three ways of conceiving/understanding this concept.
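To make this dynamic concrete, the following base-R sketch reproduces the logic of Panel B under assumed participation probabilities (the effect sizes and baseline participation rate are illustrative assumptions, not the values used to generate the figure).

```r
# Illustrative simulation of collider bias (all numbers are assumptions).
set.seed(2020)
n <- 100000

# In the population, vaccine acceptance is 50% and unrelated to sex.
female     <- rbinom(n, 1, 0.5)
acceptance <- rbinom(n, 1, 0.5)

# Sex and acceptance each raise the probability of taking part (paths p2 and
# p3); the intercept stands in for the recruitment strategy (path p1).
p_select <- plogis(-2 + 1.0 * female + 1.5 * acceptance)
selected <- rbinom(n, 1, p_select) == 1

# Population truth vs. what an analyst sees among participants only.
mean(acceptance)             # ~0.50 in the population
mean(acceptance[selected])   # inflated among participants (path p5)
mean(female[selected])       # inflated among participants (path p4)

# Conditioning on selection induces a spurious, negative sex-acceptance
# association (path p6), even though none exists in the population.
coef(glm(acceptance ~ female, family = binomial))["female"]                    # ~0
coef(glm(acceptance ~ female, family = binomial, subset = selected))["female"] # < 0
```

Restricting the analysis to participants is the only difference between the last two lines; that restriction alone produces the spurious association.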

How to reduce/eliminate sampling bias

Of central interest to researchers is the question: how can we reduce or eliminate (the effects of) sampling bias? One way is to rely on representative sampling, but this will often be unfeasible and sometimes even undesirable [20, 21]. Alternatively, we can disrupt the dynamic that leads to sampling bias analytically by using covariates within statistical models (e.g., adding covariates to a regression, or by using propensity scoring) [8,9,10]. For instance, in Fig. 1, the spurious path p6 (between sex and vaccine acceptance) occurs because analyses are conditioned on selection (Box 1). If we can analytically keep selection from acting as a collider, we can eliminate this bias. To do so, we can disrupt path p3, so that there is no effect from vaccine acceptance to selection (i.e., in the absence of p3, selection is no longer a common effect of sex and vaccine acceptance), or disrupt path p2 so that there is no effect from sex to selection. Likewise, we can also eliminate the spurious path p4 by disrupting the causal effect p1 or p2, and the spurious path p5 by disrupting p1 or p3. Unfortunately, identifying covariates for these tasks is easier said than done. In practice, covariates play a multitude of causal roles, each of which has unique implications for disrupting/amplifying paths leading to selection.

Panel C of Fig. 1 demonstrates this complexity. If a causal link exists between an outcome and a collider (p3 from Panel A), adjusting for a variable that accounts for this causal link (a mediator) can reduce sampling bias. In our example, we reasoned that vaccine acceptance would cause self-selection because people seek attitude-confirming information. Thus, we could measure and adjust for confirmation-seeking behaviour. To fully disrupt the association between an outcome and self-selection, however, we should also adjust for confounders. In Panel C, higher education promotes participation in research [15, 22] and greater vaccine acceptance [23]; education should therefore be adjusted for. That said, one should also avoid adjusting for additional colliders as doing so can introduce further collider bias. For example, if vaccine mandates exist for employment [24, 25] (i.e., vaccination predicts employment) and certain personality factors like conscientiousness facilitate both survey participation [26] and employment [27], then adjusting for employment may increase bias. Consequently, researchers must be very careful in their choice of covariates (and similar cautions could be made for disrupting any causal pathway in Fig. 1; i.e., p1, p2, or p3).
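A small, hypothetical extension of the earlier simulation illustrates why the causal role of a covariate matters. Here education is treated as a confounder of acceptance and participation, and employment as a collider (affected both by acceptance, via mandates, and by a participation-promoting trait); all variable names and effect sizes are our assumptions.

```r
# Hypothetical roles of covariates (names and effect sizes are assumptions).
set.seed(2021)
n <- 200000

education  <- rbinom(n, 1, 0.4)                                         # confounder
conscient  <- rnorm(n)                                                  # unmeasured trait
acceptance <- rbinom(n, 1, plogis(-0.2 + 1.2 * education))
employment <- rbinom(n, 1, plogis(1.5 * acceptance + 1.0 * conscient))  # collider

# Convenience participation is driven by education and conscientiousness;
# the comparison sample is a simple random draw (tiny overlap is ignored).
in_convenience <- rbinom(n, 1, plogis(-3 + 1.5 * education + 1.0 * conscient)) == 1
in_panel       <- runif(n) < 0.03

dat <- rbind(
  data.frame(acceptance, education, employment, convenience = 1)[in_convenience, ],
  data.frame(acceptance, education, employment, convenience = 0)[in_panel, ]
)

# Discrepancy between sample types (an index of sampling bias):
coef(lm(acceptance ~ convenience, data = dat))["convenience"]              # biased upward
coef(lm(acceptance ~ convenience + education, data = dat))["convenience"]  # ~0 (confounder blocked)
coef(lm(acceptance ~ convenience + education + employment,                 # moves away from 0 again
        data = dat))["convenience"]                                        # (collider bias)
```

In this toy setup, adjusting for the confounder removes the discrepancy, whereas additionally adjusting for the collider reintroduces one; the same adjustment can therefore help or hurt depending on the (often unverifiable) causal structure.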

These concerns are not novel, and many articles give guidance on how to use causal theory/diagrams to select covariates [4, 5, 11, 12, 19]. Unfortunately, systematic reviews find that it remains rare for research to adequately justify covariate selection choices, especially by using a causal perspective [28,29,30,31,32]. Instead, researchers frequently rely on heuristics/norms (e.g., always adjusting for demographic variables like sex, age, socioeconomic status), focus on variables for which population data are readily accessible (also typically demographic variables), use all available covariates in their data, or rely on simple statistical rules such as controlling for any covariate known to relate to either the exposure or the outcome [30,31,32,33,34]—with each of these criteria failing to distinguish between confounders, mediators, and colliders [11, 12].

Researchers also vary widely in their selection of covariates even when examining similar research questions [32,33,34,35]. For instance, nutritional epidemiology studies examining the same outcomes rarely adjust for the same sets of covariates [32]. This issue was particularly well captured in two methodological studies [34, 35], which recruited 29 and 120 research teams, respectively, and tasked each team with independently answering the same research question using the exact same dataset. In both studies, most teams opted for unique selections of covariates (distinct from all other teams). Clearly, there is much uncertainty as to which covariates investigators should and should not include in analyses and, relatedly, as to whether most covariate choices in the literature are useful for attenuating bias.

Goals of the current study

Being able to identify and adjust for sampling bias is an important goal for science. This is particularly true in contexts like the COVID-19 pandemic, when urgency in decision-making can allow biased findings to have undue repercussions on scientific/public discourse and on policy making [5,6,7]. With this in mind, we set out with two primary goals.

First, we sought to inform future efforts to attenuate sampling bias by characterizing who gets recruited through online convenience sampling in COVID-19 research. Given research on selective exposure to attitude-congruent information [17, 18], we hypothesised that participants recruited using convenience methods (versus those recruited through more representative means) would display higher levels of concern about COVID-19, hold beliefs that prevention behaviours are more important, and show greater adherence to behavioural recommendations (e.g., social distancing, mask wearing, vaccination).

Second, given that adjusting analyses for demographic covariates (e.g., adding variables in a regression) is a common method for addressing sampling bias, we sought to evaluate the frequency with which this technique successfully accounts for and attenuates sampling bias within online surveys. To account for how researchers make different choices on which covariates to adjust for, we made use of multiverse analyses [36, 37], an analytical perspective that urges analysts to evaluate how all plausible study choices can influence their results (i.e., by running and reporting results for all analytic choices they could have justifiably made). In our case, this entailed evaluating the degree to which all combinations of a set of plausible and common demographic covariates (e.g., sex, age, education) were successful in attenuating sampling bias in a set of convenience samples.

Methods

This project (e.g., hypotheses, analyses) was registered a priori on the Open Science Framework (https://osf.io/f2pj6), and a project page hosts supplemental files (https://osf.io/dp9kq/).

Data source

We used three online convenience samples (N = 3225; 884; 609) and three largely representative web-panel samples (N = 3003; 3005; 3005) of Canadians recruited over three time periods in 2020 (summarized in relation to the pandemic in Fig. 2). These data represent cross-sectional surveys that were deployed as part of the International COVID-19 Awareness and Responses Evaluation (iCARE; www.icarestudy.com) Study [38]. The convenience-based samples consisted of unpaid volunteers recruited using a combination of online advertising (by iCARE team members) and snowball sampling (e.g., encouraging participants to share the survey within their own networks). In contrast, web panel participants were paid and recruited through Léger, a polling and marketing firm that is commonly employed by researchers aiming to recruit representative samples of Canadians [39, 40]. Participants were drawn from Léger’s LEO panel, a panel of over 400,000 Canadians that was predominantly constructed using probability-based sampling methods (e.g., random-digit dialling) [41]. Additional details on the recruitment/sampling used for the current project are available in the supplemental files (Section 1), as well as through other iCARE-related publications [38, 42, 43].

Fig. 2

Contextualized timeline for our six samples, describing date (x-axis) and the number of COVID-19 cases detected in Canada (y-axis). T1 = Time 1; T2 = Time 2; T3 = Time 3. Surveys were conducted in 2020. Survey distribution began during the first wave of COVID-19 infections in Canada, and the third set of surveys occurred during an early portion of the second wave. Data to plot cases were obtained from the Government of Canada’s Public Health Infobase

Measures

Our predictor variable of interest was the type of sample participants were recruited from (convenience vs. web panel). We analysed differences between samples on the 11 outcome variables summarized in Table 1. These were selected and registered in line with the first goal of this article and included participants’: pandemic-related concerns (e.g., about getting infected, losing one’s ability to earn income); adherence to various preventative behaviours (e.g., mask wearing); and intentions to get vaccinated against COVID-19. For our multiverse analyses, we examined the influence of nine covariates that were consistently measured across surveys. These included participants’: province of residence; age; sex; highest education level attained; employment status pre-COVID; student status; parent status; perceived relative household income; and ethnic identity. These were selected because each of these factors has previously been associated with sampling bias in online research [15, 16, 44,45,46,47]. A detailed account of how each outcome and covariate was assessed is provided in the supplemental files.

Table 1 Summary of Outcome Measures Evaluated (Full Measures in Supplemental Files)

Analyses

Sampling bias was operationalized as the discrepancy in results between the convenience samples and the web-panel samples. We conducted simple (unadjusted) linear regressions to identify such discrepancies on each outcome variable per time point. An alpha of 0.01 was chosen to be conservative when making inferences (see registration for rationale). Given that some outcomes were assessed using single Likert-type items, we also computed ordered logistic regressions; these yielded results equivalent to those of the linear models and are reported in the supplemental files.
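For concreteness, the sketch below shows the form these unadjusted models take for a single outcome at one time point; survey_data and the column names are placeholders rather than the project’s actual objects.

```r
# Unadjusted discrepancy test for one outcome at one time point
# (survey_data, mask_wearing, and convenience are placeholder names).
library(MASS)   # polr() for ordered logistic regression

# Linear model: the coefficient on the 0/1 convenience indicator is the
# estimated discrepancy, judged against the registered alpha of 0.01.
summary(lm(mask_wearing ~ convenience, data = survey_data))

# Robustness check for single Likert-type items: ordered logistic regression.
summary(polr(factor(mask_wearing, ordered = TRUE) ~ convenience,
             data = survey_data, Hess = TRUE))
```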

Change in bias due to covariate adjustment was operationalized as the reduction/increase in the discrepancy between the sample types (convenience vs. web panel) in adjusted models compared to their unadjusted counterparts. We employed specification curves, a type of multiverse analysis that uses caterpillar plots and other visual tools to examine how data-analytic choices impact estimates of interest [48]. In our case, for each outcome (at each time point), 512 unique models could be specified. These ranged from regressions with no covariate-based adjustments to regressions that adjusted for all covariates. To reflect how covariate adjustment typically operates in practice, we specified our models according to normative practice in the field: we used an alpha of 0.05 to compute inferential statistics and refrained from modelling higher-order terms (e.g., interactions between covariates). Although we limit our analyses to regression models, our procedure should generally produce results convergent with those of other common methods for dealing with sampling bias, such as the use of sample weights derived from the same set of covariates (e.g., using raking or propensity score-based methods [49,50,51,52]).
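The registered analyses used the specr package (noted below); purely as an illustration of the underlying logic, the base-R sketch that follows enumerates all 512 covariate specifications for one outcome and tallies how often adjustment shrinks the convenience-versus-panel discrepancy. Object and column names are placeholders, and treating “smaller in absolute value than the unadjusted estimate” as a reduction in bias is one simple operationalization.

```r
# Illustrative enumeration of the 512 covariate specifications for one outcome
# (placeholder names; not the registered specr-based analysis code).
covariates <- c("province", "age", "sex", "education", "employment",
                "student", "parent", "income", "ethnicity")

# All non-empty subsets of the nine covariates: 2^9 - 1 = 511 adjusted models,
# plus the unadjusted model below, for 512 specifications in total.
subsets <- unlist(lapply(seq_along(covariates),
                         function(k) combn(covariates, k, simplify = FALSE)),
                  recursive = FALSE)

# Discrepancy = coefficient on the 0/1 convenience indicator (web panel = 0).
discrepancy <- function(controls, data) {
  f <- reformulate(c("convenience", controls), response = "vaccine_intentions")
  coef(lm(f, data = data))[["convenience"]]
}

unadjusted <- discrepancy(character(0), survey_data)
adjusted   <- vapply(subsets, discrepancy, numeric(1), data = survey_data)

# Share of adjusted models whose discrepancy is smaller (in absolute value)
# than the unadjusted estimate, i.e., the quantity reported per panel in
# Figs. 4 and 5.
mean(abs(adjusted) < abs(unadjusted))
```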

All analyses were conducted using R version 4.1.0 [53]. Specification curves used the specr and rdfanalysis packages [54, 55]. Our analysis code is available on our project page.

Results and interpretations

Sample demographics

Table 2 presents demographic information on our samples and compares them to the 2016 Canadian census. Overall, the web panels were generally similar in composition to the Canadian population—unsurprising, as Léger panels were explicitly designed to reflect the Canadian population on attributes like sex, age, and region. However, there was some overrepresentation of individuals who were more educated, English-speaking, and of European descent or White. As expected, discrepancies between the census and the convenience samples were considerably larger. The convenience samples consistently and strongly overrepresented participants who were from Quebec, female, French-speaking, highly educated, and of European descent or White.

Table 2 Demographic Distribution of the Samples (Presented as Percentages)

Evaluating overall bias on each outcome

Figure 3 presents a forest plot of our inferential results (i.e., unadjusted regressions), evaluating the overall discrepancy between the convenience and web-panel surveys on each outcome. Overall, 24 of 33 tests (73%) indicated significant discrepancies between the samples. Several outcomes were consistent in the direction of these discrepancies over time, with participants in the convenience sample reporting prevention measures as more important, being less concerned about the economy and their personal livelihood, being more likely to self-quarantine, and having higher intentions to get vaccinated. For other variables, the direction of the discrepancy shifted across time points—e.g., participants in the convenience sample reported wearing masks at a lower frequency at Time 1, but at a higher frequency at Times 2 and 3. Section 6 of the supplemental files presents the distribution of responses for each outcome and can be used to contextualize the effects shown in Fig. 3.

Fig. 3

Inferential results (unadjusted regression models) evaluating sampling discrepancies between the convenience and web-panel surveys on each outcome (reference group is the web panel). N = Sample size; Est = unstandardized estimate; CI = confidence interval; d = Cohen’s d; R2 = coefficient of determination; T1 = Time 1; T2 = Time 2; T3 = Time 3; C = Concerns; B = Behaviour. Plot created using the forestplot package in R [72]

How frequently did covariates reduce sampling discrepancies?

Figures 4 and 5 summarize our specification curve analyses and display how discrepancies between the convenience and web panel surveys varied as a function of 512 combinations of covariates. Each plot (i.e., panel) within Figs. 4 and 5 indicates findings for one outcome at a given time point. Each plot also indicates the percent of adjusted models (those that control for covariates) that found smaller estimated discrepancies (i.e., our index of sampling bias) relative to their corresponding unadjusted models. Overall, adjusted models reduced sampling discrepancies 55% of the time, and increased discrepancies 45% of the time. However, there was substantial variation across outcomes. We organize these into three patterns (denoted by circled numbers in Figs. 4 and 5).

Fig. 4

Ordered caterpillar plots summarizing specification curve analyses. Plots were created using the specr package in R [54]. Instructions for reading the plots are provided at the bottom

Fig. 5

Ordered caterpillar plots summarizing specification curve analyses (continued). Plots were created using the specr package in R [54]. Figure 4 provides instructions for reading the plots

Pattern 1 For 33% of cases (i.e., 11 of the 33 panels across Figs. 4 and 5, with each panel indicating a particular outcome at a given time point), fewer than 25% of adjusted models showed smaller sampling discrepancies relative to their unadjusted counterparts. This pattern was most apparent for hand washing across all three time points (Fig. 4). For these cases, a large majority of covariate combinations increased sampling discrepancies (i.e., bias), frequently leading what was initially a non-significant discrepancy (in unadjusted models) to become significant.

Pattern 2 For 39% of cases (three panels in Fig. 4 and nine panels in Fig. 5), between 25 and 75% of adjusted models showed reduced sampling discrepancies relative to their unadjusted counterparts. This was especially apparent for vaccine intentions across all three time points (Fig. 5). For these, the inclusion of covariates could frequently reduce or increase sampling discrepancies, but often made little difference in changing the significance level from that observed in the unadjusted models (e.g., the convenience sample displayed substantially higher vaccine intentions than the web-panel sample regardless of which covariates were adjusted for).

Pattern 3 Finally, for only 27% of cases (two panels in Fig. 4 and eight panels in Fig. 5) did 75% or more of adjusted models lead to reduced sampling discrepancies relative to their unadjusted counterparts. This applied to avoiding social gatherings across all three time points (Fig. 5). For these, covariates could reduce an initially significant discrepancy (in unadjusted models) to nonsignificance, but discrepancies also frequently persisted across many combinations of covariates.

Were there covariates that consistently reduced discrepancies?

In addition to Figs. 4 and 5, our supplemental file (Section 7) provides plots that depict which covariates were adjusted for in any given model. Our project page complements this with tables of results for all 16,896 models computed. Using these, we examined the consistency with which each covariate decreased/increased discrepancies.

When each covariate was adjusted for in isolation, income reduced discrepancies in 73% of cases, but increased discrepancies for the remaining 27%. Other covariates increased estimated discrepancies for 45% (province and education), 48% (ethnicity and sex), 55% (age), 64% (employment status), 67% (parental status), and 73% (student status) of cases. If we consider any combination that includes a given covariate (e.g., income by itself or with any combination of other covariates), each covariate increased discrepancies in 40–45% of cases. When all nine covariates were adjusted for simultaneously, performance was better, but discrepancies still increased in 33% of cases.

Notably, a given covariate could have drastic and inconsistent effects on estimates across outcomes and time points. For example, if we examine concerns about the economic impact of the pandemic at Time 2 (Fig. 5), we see a sudden shift in the plot such that half of the models showed substantially larger negative estimates than the other half. This is almost entirely attributable to whether province was adjusted for: the smaller (less negative) half of estimates adjusted for province, whereas the larger (more negative) half did not. Now, consider hand washing at Time 2 (Fig. 4). Here, the reverse pattern occurred: the larger (more negative) half of estimates adjusted for province, whereas the smaller (less negative) half did not. We also observed notable shifts within the same outcome across time points. For instance, when examining concerns about oneself being infected at Time 1, close to half of the estimates are non-significant and close to half are significant and negative; the former generally adjust for province and the latter do not. The reverse is generally true at Time 2: the significant models adjust for province and the non-significant models do not.

Discussion

In this work, we sought to: (1) better understand the effects of sampling bias in online COVID-19 research; and (2) examine the degree to which adjusting analyses for demographic covariates can successfully attenuate such bias. What did we find?

Convenience participants were more favourably disposed towards engaging in COVID-19 prevention behaviours

Significant discrepancies emerged between the online convenience and web-panel surveys on over two thirds of outcomes (averaging d = 0.21). For example, vaccine intentions were considerably higher in the convenience sample at all three time points relative to the web panel, with 13–18% more participants indicating they would be “extremely likely” to get the vaccine. Such discrepancies are of a meaningful magnitude and are larger than many effects presented as take-away messages from studies using convenience samples (e.g., differences in intentions between subgroups [56, 57]). This highlights the importance of taking care not to overgeneralize when using convenience samples and provides valuable information on how researchers can temper their inferences (e.g., by recognizing that convenience-sample-based estimates of vaccine intentions could be inflated).

Documenting these descriptive patterns is useful, but how do we make sense of them? At the onset of this project, we reasoned that participants recruited using convenience-based methods would show more positive dispositions towards COVID-19 prevention measures than would participants recruited using more representative means. Indeed, participants in the convenience samples rated prevention measures as more important, engaged in more social distancing, self-quarantining, and avoidance of gatherings, and displayed stronger intentions to get vaccinated. These effects all align well with our hypotheses and with the notion that individuals engage in selective exposure when deciding which studies to engage in (e.g., volunteering for topics they approve of [17]). This suggests that, to better reduce sampling bias, studies should assess and account for these associations. This could involve assessing people’s attitudes and adjusting for them statistically, or altering study designs to disrupt selective-exposure effects (e.g., by making a study’s key topic less obvious during recruitment). In making such choices, researchers should also consider carefully which variables act as mediators and confounders of the link between prevention behaviours and study participation. For example, participants who engage more frequently in prevention behaviours may be healthier and in a better position to engage in research—see the healthy user bias [58].

In contrast to our findings on behavioural outcomes, sampling discrepancies for participants’ concerns towards the pandemic were more varied. Participants in the convenience sample endorsed higher concerns about others being infected (in line with expectations) but lower concerns about the economy and their personal livelihoods. These latter findings were unexpected, but could have arisen due to unmeasured confounds. For example, those with a neurotic personality may experience greater concerns but be less disposed to engage in research [26]. Additionally, affluent (e.g., White, educated) participants are more likely to participate in research (as evidenced in our samples), but should generally be less concerned about their finances/livelihood. These possibilities highlight the complex nature of sampling bias, and further emphasize the need for researchers to think more carefully about confounders, mediators, and colliders when adjusting for sampling bias (i.e., Fig. 1, Panel C).

The performance of demographic covariates in attenuating sampling discrepancies was often poor and variable

The use of demographic covariates in analytic models is a common technique to account for sampling bias. However, across nearly 17,000 models, we found that the inclusion of demographic covariates reduced sampling discrepancies only 55% of the time—barely above chance level. Further, no individual covariate (used either in isolation or in combination with others) consistently reduced discrepancies. In fact, the effects of covariates were highly variable even within outcomes, and there were many cases (e.g., vaccine intentions) for which no combination of covariates was sufficient to meaningfully attenuate sampling discrepancies. Certain demographic covariates even increased sampling bias in a systematic way (e.g., student status increased sampling discrepancies substantially more often than it decreased them).

These findings suggest that consistently following rules of thumb for covariate selection (e.g., always adjusting for sex or age) or simply including a subset of demographic characteristics that happen to be measured in a study are likely unreliable strategies for reducing sampling bias. General caution, along with a critical outlook, is therefore advised when using demographic variables as covariates.

That said, we do not suggest that efforts to adjust for sampling bias using covariates are a profitless endeavour. Indeed, although we found that including all nine covariates increased sampling discrepancies 33% of the time, this was a better performance than that of most models adjusting for fewer covariates. Consequently, it is possible that adjusting for demographics could become more successful when a very large number of covariates is included in models. Future research could examine this possibility, along with whether modelling higher-order effects (e.g., interaction terms between covariates) could also help attenuate sampling bias. It will also be important for research to examine the degree to which the patterns we report vary when using other types of sampling methods (e.g., in-person recruitment methods), as sampling methods may interact in unique ways with participant characteristics (e.g., online studies may underrepresent those with less technological expertise, whereas in-person studies may underrepresent individuals with reduced physical mobility). While such studies are underway, researchers can consider several other tools at their disposal to deal with sampling bias.

Recommendations for dealing with sampling bias

One way to reduce sampling bias is through design-based methods. One may, for instance, use probability-based sampling to improve reach within a population. However, as noted in our introduction, such methods are not always feasible or optimal (e.g., some populations are better reached through non-probability methods [59, 60]), and certain research goals can supersede the need for representativeness (e.g., a researcher may choose purposive sampling when the goal is maximizing diversity of views/experiences) [20, 21]. Other tools may include reducing selective participation through stronger monetary incentives or by mandating participation [13, 14, 61], but both methods can also have barriers and drawbacks to consider [62, 63].

On the analytic side, causal diagrams (e.g., Fig. 1) are a tool that has, over the last few years, emerged as a gold standard for understanding and determining how to best analytically handle bias in research (including sampling bias) [4, 5, 11, 12, 19]. Importantly, causal diagrams can help researchers pinpoint which covariates can maximize the validity of inferences, while also helping them better plan studies in the design phase. An important insight from the use of causal diagrams is that there is likely no single “correct” set of covariates that can be used across all analyses. Each outcome (and outcome-exposure link) should have its own covariates (and causal diagram) to avoid introducing error and bias (e.g., see discussions on the Table 2 Fallacy, unnecessary adjustment, and overadjustment) [64,65,66]. In addition, our findings suggest that analysts may wish to explicitly account for time-specific influences, as we found covariate effects to differ substantially across time points even for the same outcome. Adding this type of specificity to causal diagrams could help researchers further reduce the effects of sampling bias.
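Causal diagrams also need not remain purely conceptual. As a sketch, the dagitty package in R can encode a simplified version of Fig. 1 (Panel C) and list which covariate sets block the non-causal paths between vaccine acceptance and selection; the node names and arrows below are our assumptions, and the answer changes whenever the assumed diagram does.

```r
# A simplified encoding of Fig. 1, Panel C (arrows reflect our assumptions).
library(dagitty)

g <- dagitty("dag {
  education -> acceptance
  education -> selection
  acceptance -> confirmation
  confirmation -> selection
  acceptance -> employment
  conscientiousness -> employment
  conscientiousness -> selection
}")

# Covariate sets that block the non-causal paths between acceptance and
# selection; for this diagram, the minimal set is education alone, so the
# confounder is adjusted for while the mediator (confirmation seeking) and
# the collider (employment) are left out.
adjustmentSets(g, exposure = "acceptance", outcome = "selection")
```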

Unfortunately, in many research areas (e.g., in the medical and behavioural sciences), it is often difficult for theories to outline causal factors in enough detail to delineate complete causal diagrams. In such cases, researchers can consider a final option: examining the robustness of their findings using multiverse-type analyses—as demonstrated in the current work—as a form of extended sensitivity analysis. Multiverse analyses are explicitly designed to help researchers handle and understand ambiguities in analytic decisions, the development of new multiverse-type tools/perspectives remains an area of burgeoning methodological advancement, and many resources now exist for interested readers to learn more about these approaches [36, 37, 48, 67]. In relying on multiverse analyses, however, it is important to remember that, unlike causal diagrams, they cannot indicate which estimate is the most causally valid. Rather, this approach is used to verify that one’s inferences are not limited to only a subset of possible analyses, and to quantify the degree to which largely arbitrary choices (between plausible alternatives) influence inferences.

Strengths and limitations

There are a few constraints that warrant consideration when interpreting our findings. First, our study was conducted in a very specific context: Canada during the COVID-19 pandemic. Examining sampling bias in other countries and contexts is therefore warranted. Second, although Léger constructs its web panels using methods such as random-digit dialling, the samples we obtained from these panels were not fully representative of the Canadian population, and this could have skewed our findings—e.g., if similar but less pronounced biases (as observed in the convenience samples) affected the web panels, our results may generally underestimate bias. Third, we acknowledge that many methods exist to obtain convenience samples and that our analyses were specific to volunteer-based online recruitment methods. Other methods (e.g., in-person or print-based recruitment techniques) can have idiosyncratic biases [68], such that specific variables (e.g., sex, age, health beliefs) may vary in how they operate to generate (and reduce) bias. Future work will need to parse out such patterns. Fourth, our data were cross-sectional, and our analyses were ultimately still conditioned on self-selection into the study. Consequently, care should be taken when inferring causation; for instance, we cannot infer that vaccine intentions cause self-selection into studies (e.g., as in Fig. 1’s path p3), nor can we infer that the effects of adjusting for covariates operated through causal links. That said, our unadjusted models can still provide good estimates of sampling bias if sampling bias is taken to be entirely spurious associations between sampling and outcomes (akin to path p5 in Fig. 1). Lastly, our multiverse analyses treated obtaining an accurate estimate from a convenience sample (one equal in magnitude/direction to the estimate from a representative sample) as the goal when reducing sampling bias. This was a simplification: although removing sampling bias would achieve this, so would aggregating divergent biases that happen to average out to the population value. Our analyses cannot tease these scenarios apart.

Finally, our study also has several strengths to consider. Notably, this is the first empirical study to use multiverse-style analyses to understand how covariate selection influences estimates produced across sampling methods. Our analyses were also registered a priori, and we used large samples across three distinct time points. This contrasts with previous empirical work on sampling bias, which has not been registered, has relied on smaller samples collected at single time points, and has usually examined the influence of a single set of covariates at a time [15, 16, 22, 26, 69,70,71]. Consequently, our findings are more likely to generalize than those of past efforts.