Abstract
COVID-19 research has relied heavily on convenience-based samples, which—though often necessary—are susceptible to important sampling biases. We begin with a theoretical overview and introduction to the dynamics that underlie sampling bias. We then empirically examine sampling bias in online COVID-19 surveys and evaluate the degree to which common statistical adjustments for demographic covariates successfully attenuate such bias. This registered study analysed responses to identical questions from three convenience and three largely representative samples (total N = 13,731) collected online in Canada within the International COVID-19 Awareness and Responses Evaluation Study (www.icarestudy.com). We compared samples on 11 behavioural and psychological outcomes (e.g., adherence to COVID-19 prevention measures, vaccine intentions) across three time points and employed multiverse-style analyses to examine how 512 combinations of demographic covariates (e.g., sex, age, education, income, ethnicity) impacted sampling discrepancies on these outcomes. Significant discrepancies emerged between samples on 73% of outcomes. Participants in the convenience samples held more positive views of, and engaged more frequently in, COVID-19 prevention behaviours. Covariates attenuated sampling differences in only 55% of cases and increased differences in 45%. No covariate performed reliably well. Our results suggest that online convenience samples may display more positive dispositions towards the COVID-19 prevention behaviours being studied than would samples drawn using more representative means. Adjusting results for demographic covariates frequently increased rather than decreased bias, suggesting that researchers should be cautious when interpreting adjusted findings. Using multiverse-style analyses as extended sensitivity analyses is recommended.
Key messages
-
Online convenience samples are susceptible to important sampling bias; however, the nature of this bias, and how it can be best adjusted for analytically, are two questions that research has yet to fully answer.
-
One bias in COVID-19 research is that participants may be more concerned about COVID-19 and hold more positive inclinations towards prevention measures (e.g., show higher vaccine acceptance) than those who do not participate.
-
Adjusting analyses for demographic variables (e.g., sex, age, education) is a common and theoretically useful strategy to deal with sampling bias (e.g., when informed by causal theory), but most research uses an atheoretical/unstructured approach to select covariates.
-
Adjusting analyses for demographic variables in an atheoretical/unstructured way may be an unreliable means of accounting for sampling bias; in nearly 17,000 models, we found covariates to reduce bias (i.e., discrepancies between convenience/representative samples) in 55% of cases and to increase bias in 45% of cases.
-
Researchers who use convenience samples should consider multiverse-style covariate analyses (i.e., extended sensitivity analyses)—as demonstrated in this paper—to examine how covariate selection impacts findings.
Introduction
In many research areas, the gold standard for recruiting participants is to use probability-based sampling to draw representative inferences for a given population [1,2,3]. Unfortunately, such efforts are often costly or unfeasible. Other methods, such as convenience-based sampling, are useful alternatives but can risk introducing significant sampling bias [4]. This concern has been particularly salient during the COVID-19 pandemic, as most COVID-19 research has relied on non-representative (e.g., convenience-based) observational samples [5,6,7].
A common, and in theory valid, approach to reduce the impact of sampling bias is to adjust analyses using covariates thought to influence study participation (e.g., adding covariates to a regression, using propensity scores, or sample weights) [8,9,10]. However, there remains substantial uncertainty about which factors drive participation (or lack thereof) and, therefore, how to adequately account for sampling bias. Commonly, researchers default to adjusting analyses for select demographic variables (e.g., sex, age, education), but the extent to which this practice has been successful is unknown. We address these ideas theoretically and empirically within the context of online COVID-19 behavioural and public health research.
Sampling bias: a non-technical explanation
Sampling bias occurs when different members of a population have unequal probabilities of being included in a study. This can occur for many reasons, such as when recruitment strategies have unequal reach for different groups, or when groups, once reached, differ in their response rates. Sampling bias can impact estimates of prevalence/incidence rates as well as of the link between exposure-outcome pairs. To understand sampling bias, and how to counter it, we can represent the phenomenon using causal diagrams such as Panel A of Fig. 1 [11, 12].
Using Fig. 1, consider an illustrative example. Imagine we are conducting research to estimate rates of COVID-19 vaccine acceptance (i.e., our outcome, defined as vaccine receipt or intentions to get vaccinated) in a community, and wish to explore how sex (our exposure) influences acceptance. In this study, we compare two recruitment strategies: a convenience-based and a probability-based sampling method. Importantly, our analyses are restricted to the selection of responses we obtain (we have no data on non-respondents). How might we expect sampling to affect our findings under these circumstances?
First, recruitment strategies will influence the selection of responses we obtain (path p1). In an ideal probability-based design, all population members would have an equal likelihood of being reached and efforts would be made (e.g., using incentives) to ensure high participation rates. In contrast, in convenience-based samples, reach is usually skewed towards certain groups (e.g., social media users for a study advertised on social media) and small/absent incentives can skew participation further [13, 14]. Second, participant characteristics can impact responses, either in conjunction with, or independently from, recruitment strategy. For our example, research has shown that female (vs. male) individuals are more likely to volunteer for research (path p2) [15, 16] and we could anticipate that people are more likely to participate in vaccine-related research if they hold favourable attitudes towards vaccines (due to human tendencies to seek information in line with pre-existing beliefs; path p3) [17, 18].
The result of these forces (of paths p1, p2, p3) is that we will observe a series of biased findings if we attempt to use the convenience sample to draw inferences about the population (compared to using the probability sample). Specifically, we will overestimate the degree to which participants are female and accepting of vaccines (paths p4 and p5) and will also spuriously find that vaccine acceptance is lower among female (vs. male) participants (path p6)—even if, in the overall population, no such association exists (see Box 1). These three biased findings are spurious and are manifestations of sampling bias.
Generally, sampling bias emerges through a process known as collider bias, whereby an association is induced (or distorted) between two variables because analyses are conditioned on a common outcome of those variables—known as a collider [5, 11, 12, 19]. In Fig. 1, selection is a collider (a common outcome) of recruitment strategy, sex, and vaccine acceptance. Conditioning analyses on selection—by limiting analyses to participants—is at the root of the biased observations (paths p4, p5 and p6). An example of this dynamic is provided in Panel B of Fig. 1, demonstrating how limiting analyses to participants induces a spurious (and inverse) association between sex and vaccine acceptance (path p6). Given the importance of collider bias for understanding sampling bias, Box 1 provides three ways of conceiving/understanding this concept.
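The collider dynamic described above can be reproduced in a few lines of simulation. The sketch below is a hypothetical illustration (not the study's analysis code): sex and vaccine acceptance are generated independently in the population, but because both raise the probability of participating (paths p2 and p3), restricting analyses to participants induces a spurious negative association between them (path p6):

```python
import random
import statistics

random.seed(42)

def corr(xs, ys):
    # Pearson correlation: covariance divided by the product of SDs
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

N = 200_000
# In the full population, sex and vaccine acceptance are independent.
female = [random.random() < 0.5 for _ in range(N)]
accept = [random.random() < 0.5 for _ in range(N)]

# Participation depends on BOTH variables (illustrative path strengths).
selected = [random.random() < 0.1 + 0.4 * f + 0.4 * a
            for f, a in zip(female, accept)]

pop_r = corr([int(f) for f in female], [int(a) for a in accept])  # near zero

# Conditioning on the collider: keep only participants.
sel_f = [int(f) for f, s in zip(female, selected) if s]
sel_a = [int(a) for a, s in zip(accept, selected) if s]
sel_r = corr(sel_f, sel_a)  # clearly negative: collider bias (path p6)

print(f"population r = {pop_r:+.3f}")
print(f"selected   r = {sel_r:+.3f}")
```

Intuitively, among participants, knowing someone is male makes it more likely they were recruited because of high vaccine acceptance, which manufactures the inverse association.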
How to reduce/eliminate sampling bias
Of central interest to researchers is the question: how can we reduce or eliminate (the effects of) sampling bias? One way is to rely on representative sampling, but this will often be unfeasible and sometimes even undesirable [20, 21]. Alternatively, we can disrupt the dynamic that leads to sampling bias analytically by using covariates within statistical models (e.g., adding covariates to a regression, or by using propensity scoring) [8,9,10]. For instance, in Fig. 1, the spurious path p6 (between sex and vaccine acceptance) occurs because analyses are conditioned on selection (Box 1). If we can analytically keep selection from acting as a collider, we can eliminate this bias. To do so, we can disrupt path p3, so that there is no effect from vaccine acceptance to selection (i.e., in the absence of p3, selection is no longer a common cause of sex and vaccine acceptance) or disrupt path p2 so that there is no effect from sex to selection. Likewise, we can also eliminate the spurious paths p4 (or p5) by disrupting the causal effects p1 and p2 (or p1 and p3). Unfortunately, identifying covariates for these tasks is easier said than done. In practice, covariates play a multitude of causal roles, each of which has unique implications for disrupting/amplifying paths leading to selection.
Panel C of Fig. 1 demonstrates this complexity. If a causal link exists between an outcome and a collider (p3 from Panel A), adjusting for a variable that accounts for this causal link (a mediator) can reduce sampling bias. In our example, we reasoned that vaccine acceptance would cause self-selection because people seek attitude-confirming information. Thus, we could measure and adjust for confirmation-seeking behaviour. To fully disrupt the association between an outcome and self-selection, however, we should also adjust for confounders. In Panel C, higher education promotes participation in research [15, 22] and greater vaccine acceptance [23]; education should therefore be adjusted for. That said, one should also avoid adjusting for additional colliders as doing so can introduce further collider bias. For example, if vaccine mandates exist for employment [24, 25] (i.e., vaccination predicts employment) and certain personality factors like conscientiousness facilitate both survey participation [26] and employment [27], then adjusting for employment may increase bias. Consequently, researchers must be very careful in their choice of covariates (and similar cautions could be made for disrupting any causal pathway in Fig. 1; i.e., p1, p2, or p3).
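To make the adjustment logic concrete, the hypothetical sketch below simulates education as a confounder that raises both study participation and vaccine acceptance. The naive participant mean then overestimates population-level acceptance, while direct standardization—estimating acceptance within education strata among participants and reweighting by the population's education mix (e.g., census margins)—recovers the population value. Note the simplifying assumption: acceptance influences selection only through education here, so adjusting for education removes the bias entirely; with a direct acceptance-to-selection path, it would only partially do so.

```python
import random
import statistics

random.seed(1)
N = 200_000

# Education (a confounder) raises both acceptance and participation.
educ = [random.random() < 0.4 for _ in range(N)]
accept = [random.random() < (0.7 if e else 0.3) for e in educ]
selected = [random.random() < (0.7 if e else 0.2) for e in educ]

true_mean = statistics.fmean(accept)  # population acceptance (~0.46)

sel = [(e, a) for e, a, s in zip(educ, accept, selected) if s]
naive = statistics.fmean(a for _, a in sel)  # inflated by selection

# Direct standardization: stratum-specific means among participants,
# reweighted by the population's (e.g., census) education distribution.
strata = {}
for e, a in sel:
    strata.setdefault(e, []).append(a)
p_educ = statistics.fmean(educ)
adjusted = (p_educ * statistics.fmean(strata[True])
            + (1 - p_educ) * statistics.fmean(strata[False]))

print(f"true {true_mean:.3f}  naive {naive:.3f}  adjusted {adjusted:.3f}")
```

The same logic underlies regression adjustment and post-stratification weighting: within education strata, selection is no longer informative about acceptance.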
These concerns are not novel, and many articles give guidance on how to use causal theory/diagrams to select covariates [4, 5, 11, 12, 19]. Unfortunately, systematic reviews find that it remains rare for research to adequately justify covariate selection choices, especially by using a causal perspective [28,29,30,31,32]. Instead, researchers frequently rely on heuristics/norms (e.g., always adjusting for demographics variables like sex, age, socioeconomic status), focus on variables for which population-data is readily accessible (also typically demographic variables), use all available covariates in their data, or rely on simple statistical rules such as controlling for any covariate known to relate to either the exposure or the outcome [30,31,32,33,34]—with each of these criteria failing to distinguish between confounders, mediators, and colliders [11, 12].
Researchers also vary widely in their selection of covariates even when examining similar research questions [32,33,34,35]. For instance, nutritional epidemiology studies examining the same outcomes rarely adjust for the same sets of covariates [32]. This issue was particularly well-captured in two methodological studies [34, 35] which recruited 29 and 120 research teams, respectively, and tasked teams to independently answer the same research question using the exact same dataset. In both studies, most teams opted for unique selections of covariates (distinct from all other teams). Clearly, there is much uncertainty as to which covariates investigators should and shouldn’t include in analyses, and relatedly, as to whether most covariate choices in the literature are useful for attenuating bias.
Goals of the current study
Being able to identify and adjust for sampling bias is an important goal for science. This is particularly true in contexts like the COVID-19 pandemic, when urgency in decision-making can allow biased findings to have undue repercussions on scientific/public discourse and on policy making [5,6,7]. With this in mind, we set out with two primary goals.
First, we sought to inform future efforts to attenuate sampling bias by qualifying who gets recruited through online convenience sampling in COVID-19 research. Given research on selective-exposure to attitude-congruent information [17, 18], we hypothesised that participants recruited using convenience methods (versus those recruited through more representative means) would display higher levels of concerns about COVID-19, hold beliefs that prevention behaviours are more important, and show greater adherence to behavioural recommendations (e.g., social distancing, mask wearing, vaccination).
Second, given that adjusting analyses for demographic covariates (e.g., adding variables in a regression) is a common method for addressing sampling bias, we sought to evaluate the frequency with which this technique successfully accounts for and attenuates sampling bias within online surveys. To account for how researchers make different choices on which covariates to adjust for, we made use of multiverse analyses [36, 37], an analytical perspective that urges analysts to evaluate how all plausible study choices can influence their results (i.e., by running and reporting results for all analytic choices they could have justifiably made). In our case, this entailed evaluating the degree to which all combinations of a set of plausible and common demographic covariates (e.g., sex, age, education) were successful in attenuating sampling bias in a set of convenience samples.
Methods
This project (e.g., hypotheses, analyses) was registered a priori on the Open Science Framework (https://osf.io/f2pj6), and a project page hosts supplemental files (https://osf.io/dp9kq/).
Data source
We used three online convenience samples (N = 3225; 884; 609) and three largely representative web-panel samples (N = 3003; 3005; 3005) of Canadians recruited over three time periods in 2020 (summarized in relation to the pandemic in Fig. 2). These data represent cross-sectional surveys that were deployed as part of the International COVID-19 Awareness and Responses Evaluation (iCARE; www.icarestudy.com) Study [38]. The convenience-based samples consisted of unpaid volunteers recruited using a combination of online advertising (by iCARE team members) and snowball sampling (e.g., encouraging participants to share the survey within their own networks). In contrast, web panel participants were paid and recruited through Léger, a polling and marketing firm that is commonly employed by researchers aiming to recruit representative samples of Canadians [39, 40]. Participants were drawn from Léger’s LEO panel, a panel of over 400,000 Canadians that was predominantly constructed using probability-based sampling methods (e.g., random-digit dialling) [41]. Additional details on the recruitment/sampling used for the current project are available in the supplemental files (Section 1), as well as through other iCARE-related publications [38, 42, 43].
Measures
Our predictor variable of interest was the type of sample participants were recruited from (convenience vs. web panel). We analysed differences between samples on the 11 outcome variables summarized in Table 1. These were selected and registered in line with the first goal of this article and included participants’: pandemic-related concerns (e.g., about getting infected, losing one’s ability to earn income); adherence to various preventative behaviours (e.g., mask wearing); and intentions to get vaccinated against COVID-19. For our multiverse analyses, we examined the influence of nine covariates that were consistently measured across surveys. These included participants’: province of residence; age; sex; highest education level attained; employment status pre-COVID; student status; parent status; perceived relative household income; and ethnic identity. These were selected as each of these factors has previously been associated with sampling bias in online research [15, 16, 44,45,46,47]. A detailed account of how each outcome and covariate was assessed is provided in the supplemental files.
Analyses
Sampling bias was operationalized as the discrepancy in results between the convenience samples and the web-panel samples. We conducted simple (unadjusted) linear regressions to identify such discrepancies on each outcome variable per time point. An alpha of 0.01 was chosen to be conservative when making inferences (see registration for rationale). Given that some outcomes were assessed using single Likert-type items, we also computed ordered logistic regressions; the results were equivalent to the regression-based models and are reported in the supplemental files.
Change in bias due to covariate adjustments was operationalized as reductions/increases in the discrepancy between the sample types (convenience vs. web panel) in adjusted models compared to their unadjusted counterparts. We employed specification curves, a type of multiverse analysis that uses caterpillar plots and other visual tools to examine how data-analytic choices impact estimates of interest [48]. In our case, for each outcome (at each time point), 512 unique models could be specified. These ranged from regressions with no covariate-based adjustments to regressions that adjusted for all covariates. To reflect how covariate adjustment typically operates in practice, we further specified our models according to normative practice in the field: we used an alpha of 0.05 to compute inferential statistics, and refrained from modelling higher-level terms (e.g., interactions) between covariates. Although we limit our analyses to regression models, our procedure should generally produce convergent results with other common methods to deal with sampling bias, such as the use of sample weights derived from the same set of covariates (e.g., using raking or propensity score-based methods [49,50,51,52]).
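The size of this specification space follows directly from the covariate set: each of the nine covariates is either included or excluded, giving 2^9 = 512 models per outcome per time point, and 512 × 11 outcomes × 3 time points = 16,896 models overall. A minimal enumeration sketch (the covariate names are illustrative labels, not the dataset's actual variable names):

```python
from itertools import combinations

covariates = ["province", "age", "sex", "education", "employment",
              "student", "parent", "income", "ethnicity"]

# All subsets, from the empty set (the unadjusted model) to the full
# set of nine covariates: 2^9 = 512 specifications per outcome/time point.
subsets = [list(c) for r in range(len(covariates) + 1)
           for c in combinations(covariates, r)]

n_specs = len(subsets)          # 512
n_models = n_specs * 11 * 3     # 16,896 models across outcomes and waves

print(n_specs, n_models)
```

In the full analysis, each subset would define the covariate terms of one regression of the outcome on sample type; the resulting 512 estimates per panel are what the specification curve plots.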
All analyses were conducted using R version 4.1.0 [53]. Specification curves used the specr and rdfanalysis packages [54, 55]. Our analysis code is available on our project page.
Results and interpretations
Sample demographics
Table 2 presents demographic information on our samples and compares them to the 2016 Canadian census. Overall, the web panels were generally similar in composition to the Canadian population—unsurprising, as Léger panels were explicitly designed to reflect the Canadian population on attributes like sex, age, and region. However, there was some overrepresentation of individuals that were more educated, English-speaking, and of European descent or White. As expected, discrepancies between the census and the convenience samples were considerably larger. The convenience samples consistently and strongly overrepresented participants who were from Quebec, female, French-speaking, highly educated, and of European descent or White.
Evaluating overall bias on each outcome
Figure 3 presents a forest plot of our inferential results (i.e., unadjusted regressions), evaluating the overall discrepancy between the convenience and web-panel surveys on each outcome. Overall, 24 of 33 tests (73%) indicated significant discrepancies between the samples. Several outcomes were consistent in the direction of these discrepancies over time, with participants in the convenience sample reporting prevention measures as more important, being less concerned about the economy and their personal livelihood, being more likely to self-quarantine, and having higher intentions to get vaccinated. Other variables shifted in the direction of the discrepancy across time points—e.g., participants in the convenience sample reported wearing masks at a lower frequency at Time 1, but at a higher frequency at times 2 and 3. Section 6 of the supplemental file presents the distribution of responses for each outcome and can be used to contextualize effects from Fig. 3.
How frequently did covariates reduce sampling discrepancies?
Figures 4 and 5 summarize our specification curve analyses and display how discrepancies between the convenience and web panel surveys varied as a function of 512 combinations of covariates. Each plot (i.e., panel) within Figs. 4 and 5 indicates findings for one outcome at a given time point. Each plot also indicates the percent of adjusted models (those that control for covariates) that found smaller estimated discrepancies (i.e., our index of sampling bias) relative to their corresponding unadjusted models. Overall, adjusted models reduced sampling discrepancies 55% of the time, and increased discrepancies 45% of the time. However, there was substantial variation across outcomes. We organize these into three patterns (denoted by circled numbers in Figs. 4 and 5).
Pattern 1 For 33% of cases (i.e., 11 of the 33 panels across Figs. 4 and 5, with each panel indicating a particular outcome at a given time point), fewer than 25% of adjusted models showed smaller sampling discrepancies relative to their unadjusted counterparts. This pattern was most apparent for hand washing across all three time points (Fig. 4). For these cases, a large majority of covariate combinations increased sampling discrepancies (i.e., bias), frequently leading what was initially a non-significant discrepancy (in unadjusted models) to become significant.
Pattern 2 For 39% of cases (three panels in Fig. 4 and nine panels in Fig. 5), between 25–75% of adjusted models showed reduced sampling discrepancies relative to their unadjusted counterparts. This was especially apparent for vaccine intentions across all three time points (Fig. 5). For these, the inclusion of covariates could frequently reduce or increase sampling discrepancies, but often made little difference in changing the significance level from that observed in the unadjusted models (e.g., the convenience sample displayed substantially higher vaccine intentions than the web-panel sample regardless of which covariates were adjusted for).
Pattern 3 Finally, for only 27% of cases (two panels in Fig. 4 and eight panels in Fig. 5) did 75% or more of adjusted models lead to reduced sampling discrepancies relative to their unadjusted counterparts. This applied to avoiding social gatherings across all three time points (Fig. 5). For these, covariates could reduce an initially significant discrepancy (in unadjusted models) to nonsignificance, but discrepancies also frequently persisted across many combinations of covariates.
Were there covariates that consistently reduced discrepancies?
In addition to Figs. 4 and 5, our supplemental file (Section 7) provides plots that depict which covariates were adjusted for in any given model. Our project page complements this with tables of results for all 16,896 models computed. Using these, we examined the consistency with which each covariate decreased/increased discrepancies.
When each covariate was adjusted for in isolation, income reduced discrepancies in 73% of cases, but increased discrepancies for the remaining 27%. Other covariates increased estimated discrepancies for 45% (province and education), 48% (ethnicity and sex), 55% (age), 64% (employment status), 67% (parental status), and 73% (student status) of cases. If we consider any combination that includes a given covariate (e.g., income by itself or with any combination of other covariates), each covariate increased discrepancies in between 40–45% of cases. When all nine covariates were adjusted for simultaneously, this performed better, but still increased discrepancies in 33% of cases.
Notably, a given covariate could have drastic and inconsistent effects on estimates across outcomes and time points. For example, if we examine concerns about the economic impact of the pandemic at Time 2 (Fig. 5), we see a sudden shift in the plot such that half of models showed substantially larger negative estimates than the other half. This is almost entirely attributable to province being adjusted for: the smaller (less negative) half of estimates adjusted for province, whereas the larger (more negative) half did not. Now, consider hand washing at Time 2 (Fig. 4). Here, the reverse pattern occurred: the larger (more negative) half of estimates adjusted for province, whereas the smaller (less negative) half did not. We also see marked shifts within outcomes. For instance, when examining concerns about oneself being infected at Time 1, close to half of estimates are non-significant and close to half are significant and negative. The former generally adjust for province and the latter do not. The reverse is generally true at Time 2: the significant models adjust for province and non-significant models do not.
Discussion
In this work, we sought to: (1) better understand the effects of sampling bias in online COVID-19 research; and (2) examine the degree to which adjusting analyses for demographic covariates can successfully attenuate such bias. What did we find?
Convenience participants were more favourably disposed towards engaging in COVID-19 prevention behaviours
Significant discrepancies emerged between the online convenience and web-panel surveys on over two thirds of outcomes (averaging d = 0.21). For example, vaccine intentions were considerably higher in the convenience sample at all three time points relative to the web-panel, with 13–18% more participants indicating they would be “extremely likely” to get the vaccine. Such discrepancies are of an important magnitude and are larger than many effects listed as take-away messages from studies using convenience samples (e.g., differences in intentions between subgroups [56, 57]). This highlights the importance of taking care not to overgeneralize when using convenience samples and provides valuable information on how researchers can restrain their inferences (e.g., by recognizing that convenience-sample-based estimates of vaccine intentions could be inflated).
Documenting these descriptive patterns is useful, but how do we make sense of them? At the onset of this project, we reasoned that participants recruited using convenience-based methods would show more positive dispositions towards COVID-19 prevention measures than would participants recruited using more representative means. Indeed, participants in the convenience samples rated prevention measures as more important, engaged in more social distancing, self-quarantining, and avoidance of gatherings, and displayed stronger intentions to get vaccinated. These effects all align well with our hypotheses and the notion that individuals engage in selective exposure when deciding which studies to engage in (e.g., volunteering for topics they approve of [17]). This would suggest that to better reduce sampling bias, studies should assess and account for these associations. This could involve assessing people’s attitudes and adjusting for them statistically, or altering study designs to disrupt selective-exposure effects, such as by making the key topic of a study less obvious during marketing. In making such choices, researchers should also consider carefully which variables act as mediators and confounders of the link between prevention behaviours and study participation. For example, participants who engage more frequently in prevention behaviours may be healthier and in a better position to engage in research—see the healthy user bias [58].
In contrast to our findings on behavioural outcomes, sampling discrepancies for participants’ concerns towards the pandemic were more varied. Participants in the convenience sample endorsed higher concerns for others being infected (in line with expectations) but fewer concerns about the economy and their personal livelihoods. These latter findings were unexpected, but could have arisen due to unmeasured confounds. For example, those with a neurotic personality may experience greater concerns but be less disposed to engage in research [26]. Additionally, affluent (e.g., White, educated) participants are also more likely to participate in research (as evidenced in our samples), but should generally be less concerned about their finances/livelihood. These possibilities highlight the complex nature of sampling bias, and further emphasize the need for research to think more carefully about confounds, mediators, and colliders when adjusting for sampling bias (i.e., Fig. 1, Panel C).
The performance of demographic covariates in attenuating sampling discrepancies was often poor and variable
The use of demographic covariates in analytic models is a common technique to account for sampling bias. However, across nearly 17,000 models, we found that the inclusion of demographic covariates reduced sampling discrepancies only 55% of the time—barely above chance level. Further, no individual covariate (used either in isolation or in combination with others) consistently reduced discrepancies. In fact, the effects of covariates were highly variable even within outcomes and there were many cases (e.g., vaccine intentions) for which no combination of covariates was sufficient to meaningfully attenuate sampling discrepancies. Certain demographic covariates even increased sampling bias in a systematic way (e.g., student status substantially more frequently increased than decreased sampling discrepancies).
These findings suggest that consistently following rules of thumb for covariate selection (e.g., always adjusting for sex or age) or simply including a subset of demographic characteristics that happen to be measured in a study are likely unreliable strategies for reducing sampling bias. General caution, along with a critical outlook, is therefore advised when using demographic variables as covariates.
That said, we do not suggest that efforts to reliably adjust for sampling bias using covariates are a profitless endeavour. Indeed, although we found that including all nine covariates increased sampling discrepancies 33% of the time, this was a better performance than most models adjusting for fewer covariates. Consequently, it is possible that adjusting for demographics could become more successful when a very large number of covariates are included in models. Future research could examine this possibility, along with whether modelling higher order effects (e.g., interaction terms between covariates) could also help attenuate sampling bias. It will also be important for research to examine the degree to which the patterns we report vary when using other types of sampling methods (e.g., in-person recruitment methods) as sampling methods may often interact in unique ways with participant characteristics (e.g., online studies may underrepresent those with less technological expertise, whereas in-person studies may underrepresent individuals with reduced physical mobility). While such studies are underway, researchers can consider several other tools at their disposal to deal with sampling bias.
Recommendations for dealing with sampling bias
One way to reduce sampling bias is through design-based methods. One may, for instance, use probability-based sampling to improve reach within a population. However, as noted in our introduction, such methods are not always feasible or optimal (e.g., some populations are better reached through non-probability methods [59, 60]), and certain research goals can supersede the need for representativeness (e.g., a researcher may choose purposive sampling when the goal is maximizing diversity of views/experiences) [20, 21]. Other tools may include reducing selective participation through stronger monetary incentives or by mandating participation [13, 14, 61], but both methods can also have barriers and drawbacks to consider [62, 63].
On the analytic side, causal diagrams (e.g., Fig. 1) are a tool that has, over the last few years, emerged as a gold standard for understanding and determining how best to analytically handle bias in research (including sampling bias) [4, 5, 11, 12, 19]. Importantly, causal diagrams can help researchers pinpoint which covariates will maximize the validity of inferences, while also helping to better plan studies in the design phase. An important insight from the use of causal diagrams is that there is likely no single “correct” set of covariates that can be used across all analyses. Each outcome (and outcome–exposure link) should have its own covariates (and causal diagram) to avoid introducing error and bias (e.g., see discussions on the Table 2 Fallacy, unnecessary adjustment, and overadjustment) [64,65,66]. Building on this, our findings suggest that analysts may also wish to explicitly account for time-specific influences, as we found the effects of covariates to differ substantially across time points even for the same outcome. Adding this type of specificity to causal diagrams could help researchers further reduce the effects of sampling bias.
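To make the core mechanism concrete, the following minimal simulation (an illustrative sketch, not an analysis from this study; all variable names are hypothetical) shows why selection into a sample acts as a collider in such diagrams: when participation depends on two traits that are independent in the population, a spurious association between them appears among participants.

```python
import random

def pearson(a, b):
    """Pearson correlation, computed from scratch for portability."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va ** 0.5 * vb ** 0.5)

random.seed(42)
n = 20000

# Two traits that are independent in the population
# (e.g., an exposure and an outcome).
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

# Participation depends on BOTH traits: a collider node,
# analogous to self-selection into an online survey.
participates = [xi + yi + random.gauss(0, 1) > 0 for xi, yi in zip(x, y)]

xs = [xi for xi, p in zip(x, participates) if p]
ys = [yi for yi, p in zip(y, participates) if p]

print(round(pearson(x, y), 2))    # near zero in the full population
print(round(pearson(xs, ys), 2))  # clearly negative among participants
```

Because every estimate from a self-selected sample is implicitly conditioned on the participation node, the induced association cannot be removed simply by measuring more outcomes; it must be handled through design or through covariates identified from the diagram.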
Unfortunately, in many research areas (e.g., the medical and behavioural sciences), it is often difficult for theories to outline causal factors in enough detail to delineate complete causal diagrams. In such cases, researchers can consider a final option: examining the robustness of their findings using multiverse-type analyses, as demonstrated in the current work, as a form of extended sensitivity analysis. Multiverse analyses are explicitly designed to help researchers handle and understand ambiguities in analysis-based decisions, the development of new multiverse-type tools and perspectives remains an area of burgeoning methodological advancement, and many resources now exist for interested readers to learn more about these approaches [36, 37, 48, 67]. However, in relying on multiverse analyses, it is important to remember that, unlike causal diagrams, multiverse analyses cannot indicate which estimate is the most causally valid. Rather, the approach verifies that one's inferences are not limited to only a subset of possible analyses, and quantifies the degree to which largely arbitrary choices (between plausible alternatives) influence inferences.
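The combinatorial core of such an analysis can be sketched in a few lines. The fragment below (an illustrative sketch; the covariate names, outcome, and formula-building helper are hypothetical, and the study's own analyses were run in R) enumerates every subset of nine demographic covariates, yielding the 2^9 = 512 model specifications of the kind examined in this study:

```python
from itertools import combinations

# Hypothetical demographic covariates; nine covariates yield 2^9 = 512 models.
COVARIATES = ["sex", "age", "education", "income", "ethnicity",
              "employment", "marital_status", "household_size", "region"]

def covariate_specifications(covariates):
    """Enumerate every subset of covariates: the 'multiverse' of models."""
    specs = []
    for k in range(len(covariates) + 1):
        for subset in combinations(covariates, k):
            specs.append(list(subset))
    return specs

def model_formula(outcome, sample_indicator, covariates):
    """Build an R-style formula string for one specification."""
    terms = [sample_indicator] + covariates
    return f"{outcome} ~ " + " + ".join(terms)

specs = covariate_specifications(COVARIATES)
print(len(specs))  # 512 specifications, from no covariates up to all nine
print(model_formula("vaccine_intentions", "sample_type", specs[1]))
```

In a full analysis, each formula would be fit to the data and the distribution of the sample-type coefficient across all 512 specifications inspected, which is the role played by packages such as specr and rdfanalysis in R [48, 67].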
Strengths and limitations
There are a few constraints that warrant consideration when interpreting our findings. First, our study was conducted in a very specific context: Canada during the COVID-19 pandemic. Examining sampling bias in other countries and contexts is therefore warranted. Second, although Léger constructs its web panels using methods such as random-digit dialling, the samples we obtained from these panels were not fully representative of the Canadian population, and this could have skewed our findings; for example, if similar but less pronounced biases (as observed in the convenience samples) affected the web panels, our results may generally underestimate bias. Third, we acknowledge that many methods exist to obtain convenience samples and that our analyses were specific to volunteer-based online recruitment methods. Other methods (e.g., in-person or print-based recruitment techniques) can have idiosyncratic biases [68], such that specific variables (e.g., sex, age, health beliefs) may vary in how they operate to generate (and reduce) bias. Future work will need to parse out such patterns. Fourth, our data were cross-sectional, and our findings ultimately remained conditioned on self-selection into the study. Consequently, care should be taken when inferring causation; for instance, we cannot infer that vaccine intentions cause self-selection into studies (e.g., as in Fig. 1's path p3), nor can we infer that the effects of adjusting for covariates operated through causal links. That said, our unadjusted models can still provide good estimates of sampling bias if sampling bias is taken to be entirely spurious associations between sampling and outcomes (akin to path p5 in Fig. 1). Lastly, our multiverse analyses treated obtaining an accurate estimate from a convenience sample (one equal in magnitude and direction to an estimate from a representative sample) as the goal when reducing sampling bias. This was a simplification.
Although removing sampling bias would achieve this, so would aggregating divergent biases that happen to average to the population value. Our analyses cannot tease these scenarios apart.
Finally, our study also has several strengths. Notably, this is the first empirical study to use multiverse-style analyses to understand how covariate selection influences estimates produced across sampling methods. Our analyses were also registered a priori, and we used large samples across three distinct time points. This contrasts with previous empirical work on sampling bias, which has not been registered, has relied on smaller samples collected at single time points, and has usually examined the influence of a single set of covariates at a time [15, 16, 22, 26, 69,70,71]. Consequently, our findings are more likely to generalize than those of past efforts.
Data availability & study materials
A project page is available through the Open Science Framework (https://osf.io/dp9kq/), which provides open access to our supplemental materials, data analysis script files, and registration. The data underlying this article can be made available by the Montreal Behavioural Medicine Centre upon reasonable request (https://mbmc-cmcm.ca/covid19/apl/).
References
Tyrer S, Heyman B. Sampling in epidemiological research: issues, hazards and pitfalls. BJPsych Bulletin. 2016;40:57–60. https://doi.org/10.1192/pb.bp.114.050203.
Sarstedt M, Bengart P, Shaltoni AM, Lehmann S. The use of sampling methods in advertising research: A gap between theory and practice. Int J Advert. 2018;37:650–63. https://doi.org/10.1080/02650487.2017.1348329.
Kennedy EB, Jensen EA, Jensen AM. Methodological considerations for survey-based research during emergencies and public health crises: Improving the quality of evidence & science communication. Front Commun. 2021;6:226.
Elwert F, Winship C. Endogenous selection bias: the problem of conditioning on a collider variable. Ann Rev Sociol. 2014;40:31–53. https://doi.org/10.1146/annurev-soc-071913-043455.
Griffith GJ, Morris TT, Tudball MJ, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun. 2020;11:1–12. https://doi.org/10.1038/s41467-020-19478-2.
Shen C, VanGennep D, Siegenfeld AF, Bar-Yam Y. Unraveling the flaws of estimates of the infection fatality rate for COVID-19. J Travel Med. 2021;28:1–3. https://doi.org/10.1093/jtm/taaa239.
Zhao Q, Ju N, Bacallado S, Shah RD. BETS: The dangers of selection bias in early analyses of the coronavirus disease (COVID-19) pandemic. Ann Appl Stat. 2021;15:363–90. https://doi.org/10.1214/20-AOAS1401.
Steiner PM, Cook TD, Shadish WR, Clark MH. The importance of covariate selection in controlling for selection bias in observational studies. Psychol Methods. 2010;15:250. https://doi.org/10.1037/a0018719.
Starks H, Diehr P, Curtis JR. The challenge of selection bias and confounding in palliative care research. J Palliat Med. 2009;12:181–7. https://doi.org/10.1089/jpm.2009.9672.
Wirth KE, Tchetgen EJT. Accounting for selection bias in association studies with complex survey data. Epidemiology. 2014;25:444. https://doi.org/10.1097/EDE.0000000000000037.
Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.
Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–25. https://doi.org/10.1097/01.ede.0000135174.63482.43.
Smith MG, Witte M, Rocha S, Basner M. Effectiveness of incentives and follow-up on increasing survey response rates and participation in field studies. BMC Med Res Methodol. 2019;19:1–13. https://doi.org/10.1186/s12874-019-0868-8.
Barón JD, Breunig RV, Cobb-Clark DA, Gørgens T, Sartbayeva A. Does the effect of incentive payments on survey response rates differ by income support history? J Off Stat. 2009;25:483–507.
Ganguli M, Lytle ME, Reynolds MD, Dodge HH. Random versus volunteer selection for a community-based study. J Gerontol Ser A: Biol Sci Med Sci. 1998;53:M39–46. https://doi.org/10.1093/gerona/53a.1.m39.
Owen JE, Bantum EOC, Criswell K, Bazzo J, Gorlick A, Stanton AL. Representativeness of two sampling procedures for an internet intervention targeting cancer-related distress: a comparison of convenience and registry samples. J Behav Med. 2014;37:630–41. https://doi.org/10.1007/s10865-013-9509-6.
Hart W, Albarracín D, Eagly AH, Brechan I, Lindberg MJ, Merrill L. Feeling validated versus being correct: a meta-analysis of selective exposure to information. Psychol Bull. 2009;135:555–88. https://doi.org/10.1037/a0015701.
Meppelink CS, Smit EG, Fransen ML, Diviani N. “I was right about vaccination”: confirmation bias and health literacy in online health information seeking. J Health Commun. 2019;24:129–40. https://doi.org/10.1080/10810730.2019.1583701.
Cole SR, Platt RW, Schisterman EF, et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010;39:417–20. https://doi.org/10.1093/ije/dyp334.
Rothman KJ, Gallacher JE, Hatch EE. Why representativeness should be avoided. Int J Epidemiol. 2013;42:1012–4. https://doi.org/10.1093/ije/dys223.
Richiardi L, Pizzi C, Pearce N. Commentary: Representativeness is usually not necessary and often should be avoided. Int J Epidemiol. 2013;42:1018–22. https://doi.org/10.1093/ije/dyt103.
Hultsch DF, MacDonald SW, Hunter MA, Maitland SB, Dixon RA. Sampling and generalisability in developmental research: comparison of random and convenience samples of older adults. Int J Behav Dev. 2002;26:345–59. https://doi.org/10.1080/01650250143000247.
Malik AA, McFadden SM, Elharake J, Omer SB. Determinants of COVID-19 vaccine acceptance in the US. EClinicalMedicine. 2020;26: 100495. https://doi.org/10.1016/j.eclinm.2020.100495.
Rothstein MA, Parmet WE, Reiss DR. Employer-Mandated Vaccination for COVID-19. Am J Public Health. 2021;111:1061–4. https://doi.org/10.2105/AJPH.2020.306166.
Gostin LO, Salmon DA, Larson HJ. Mandating COVID-19 vaccines. JAMA. 2021;325:532–3. https://doi.org/10.1001/jama.2020.26553.
Lönnqvist JE, Paunonen S, Verkasalo M, Leikas S, Tuulio-Henriksson A, Lönnqvist J. Personality characteristics of research volunteers. Eur J Pers. 2007;21:1017–30. https://doi.org/10.1002/per.655.
De Fruyt F, Mervielde I. RIASEC types and Big Five traits as predictors of employment status and nature of employment. Pers Psychol. 1999;52:701–27. https://doi.org/10.1111/j.1744-6570.1999.tb00177.x.
Guertin JR, Conombo B, Langevin R, et al. A systematic review of methods used for confounding adjustment in observational economic evaluations in cardiology conducted between 2013 and 2017. Med Decis Making. 2020;40:582–95. https://doi.org/10.1177/0272989X20937257.
Ali MS, Groenwold RHH, Belitser SV, et al. Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review. J Clin Epidemiol. 2015;68:122–31. https://doi.org/10.1016/j.jclinepi.2014.08.011.
Ponkilainen VT, Uimonen M, Raittio L, Kuitunen I, Eskelinen A, Reito A. Multivariable models in orthopaedic research: a methodological review of covariate selection and causal relationships. Osteoarthr Cartil. 2021;29:939–45. https://doi.org/10.1016/j.joca.2021.03.020.
Wright N, Ivers N, Eldridge S, Taljaard M, Bremner S. A review of the use of covariates in cluster randomized trials uncovers marked discrepancies between guidance and practice. J Clin Epidemiol. 2015;68:603–9. https://doi.org/10.1016/j.jclinepi.2014.12.006.
Zeraatkar D, Cheung K, Milio K, et al. Methods for the selection of covariates in nutritional epidemiology studies: a meta-epidemiological review. Curr Dev Nutr. 2019;3:104. https://doi.org/10.1093/cdn/nzz104.
Hyatt CS, Owens MM, Crowe ML, Carter NT, Lynam DR, Miller JD. The quandary of covarying: a brief review and empirical examination of covariate use in structural neuroimaging studies on psychological variables. Neuroimage. 2020;205: 116225. https://doi.org/10.1016/j.neuroimage.2019.116225.
Silberzahn R, Uhlmann EL, Martin DP, et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv Methods Pract Psychol Sci. 2018;1:337–56. https://doi.org/10.1177/2515245917747646.
The MARP Team. A many-analysts approach to the relation between religiosity and well-being. PsyArXiv, https://doi.org/10.31234/osf.io/pbfye, 21 January 2022, preprint: not reviewed.
Steegen S, Tuerlinckx F, Gelman A, Vanpaemel W. Increasing transparency through a multiverse analysis. Perspect Psychol Sci. 2016;11:702–12. https://doi.org/10.1177/1745691616658637.
Del Giudice M, Gangestad SW. A traveler’s guide to the multiverse: promises, pitfalls, and a framework for the evaluation of analytic decisions. Adv Methods Pract Psychol Sci. 2021;4:2515245920954925. https://doi.org/10.1177/2515245920954925.
Bacon SL, Lavoie KL, Boyle J, Stojanovic J, Joyal-Desmarais K. Team is International assessment of the link between COVID-19 related attitudes, concerns and behaviours in relation to public health policies: optimising policy strategies to improve health, economic and quality of life outcomes (the iCARE Study). BMJ Open. 2021;11:e046127. https://doi.org/10.1136/bmjopen-2020-046127.
Perez S, Shapiro GK, Tatar O, Joyal-Desmarais K, Rosberger Z. Development and validation of the human papillomavirus attitudes and beliefs scale in a National Canadian sample. Sex Transm Dis. 2016;43:626–32. https://doi.org/10.1097/OLQ.0000000000000506.
MacDougall D, Halperin B, MacKinnon-Cameron D, et al. The challenge of vaccinating adults: attitudes and beliefs of the Canadian public and healthcare providers. BMJ Open. 2015;5: e009062. https://doi.org/10.1136/bmjopen-2015-009062.
Leger Opinion. Born from research panel book: data collection 360. (2020). Accessed on June 13, 2022 from: https://leger360.com/services/legeropinion-leo/
Lavoie K, Gosselin-Boucher V, Stojanovic J, et al. Understanding national trends in COVID-19 vaccine hesitancy in Canada: results from five sequential cross-sectional representative surveys spanning April 2020–March 2021. BMJ Open. 2022;12: e059411. https://doi.org/10.1136/bmjopen-2021-059411.
Stojanovic J, Boucher VG, Gagne M, et al. Global trends and correlates of COVID-19 vaccination hesitancy: findings from the iCARE study. Vaccines. 2021;9:661. https://doi.org/10.3390/vaccines9060661.
Andreeva VA, Salanave B, Castetbon K, et al. Comparison of the sociodemographic characteristics of the large NutriNet-Santé e-cohort with French Census data: the issue of volunteer bias revisited. J Epidemiol Community Health. 2015;69:893. https://doi.org/10.1136/jech-2014-205263.
Jeong M, Zhang D, Morgan JC, et al. Similarities and differences in tobacco control research findings from convenience and probability samples. Ann Behav Med. 2019;53:476–85.
Prah P, Hickson F, Bonell C, et al. Men who have sex with men in Great Britain: comparing methods and estimates from probability and convenience sample surveys. Sex Transm Infect. 2016;92:455–63.
Yank V, Agarwal S, Loftus P, Asch S, Rehkopf D. Crowdsourced health data: comparability to a US national survey, 2013–2015. Am J Public Health. 2017;107:1283–9.
Simonsohn U, Simmons JP, Nelson LD. Specification curve analysis. Nat Hum Behav. 2020;4:1208–14. https://doi.org/10.1038/s41562-020-0912-z.
D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265–81. https://doi.org/10.1002/(SICI)1097-0258(19981015)17:19%3c2265::AID-SIM918%3e3.0.CO;2-B.
Mercer A, Lau A, Kennedy C. For weighting online opt-in samples, what matters most? 2018. Accessed 13 June 2022. https://www.pewresearch.org/methods/2018/01/26/for-weighting-online-opt-in-samples-what-matters-most/
Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol. 2005;58:550–9. https://doi.org/10.1016/j.jclinepi.2004.10.016.
Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol. 2006;59:437.e1-e24. https://doi.org/10.1016/j.jclinepi.2005.07.004.
R Core Team. R: A Language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria; 2017. https://www.R-project.org/
Masur PK, Scharkow M. Specr: conducting and visualizing specification curve analyses: R Package. 2020. https://research.vu.nl/en/publications/specr-conducting-and-visualizing-specification-curve-analyses-r-p
Gassen J. Rdfanalysis: researcher degrees of freedom analysis, a package to explore and document your degrees of freedom. 2020. https://joachim-gassen.github.io/rdfanalysis/
Ali KF, Whitebridge S, Jamal MH, Alsafy M, Atkin SL. Perceptions, knowledge, and behaviors related to COVID-19 among social media users: Cross-sectional study. J Med Internet Res. 2020;22: e19913. https://doi.org/10.2196/19913.
Yahia AIO, Alshahrani AM, Alsulmi WGH, et al. Determinants of COVID-19 vaccine acceptance and hesitancy: a cross-sectional study in Saudi Arabia. Hum Vaccines Immunother. 2021. https://doi.org/10.1080/21645515.2021.1950506.
Shrank WH, Patrick AR, Brookhart MA. Healthy user and related biases in observational studies of preventive interventions: a primer for physicians. J Gen Intern Med. 2011;26:546–50. https://doi.org/10.1007/s11606-010-1609-1.
Hequembourg AL, Panagakis C. Maximizing respondent-driven sampling field procedures in the recruitment of sexual minorities for health research. SAGE Open Med. 2019;7:2050312119829983. https://doi.org/10.1177/2050312119829983.
Piperato SM. Comparative Effectiveness of Conventional and Novel Sampling Methods for the Recruitment of Sexual Minority Identified Women. (Doctoral dissertation). 2018. Retrieved from https://scholarcommons.sc.edu/etd/4800
Cheung KL, ten Klooster PM, Smit C, de Vries H, Pieterse ME. The impact of non-response bias due to sampling in public health studies: a comparison of voluntary versus mandatory recruitment in a Dutch national survey on adolescent health. BMC Public Health. 2017;17:276. https://doi.org/10.1186/s12889-017-4189-8.
Gelinas L, Largent EA, Cohen IG, Kornetsky S, Bierer BE, Fernandez LH. A framework for ethical payment to research participants. N Engl J Med. 2018;378:766–71. https://doi.org/10.1056/NEJMsb1710591.
Rothstein MA, Shoben AB. Does consent bias research? Am J Bioeth. 2013;13:27–37. https://doi.org/10.1080/15265161.2013.767955.
VanderWeele TJ. On the relative nature of overadjustment and unnecessary adjustment. Epidemiology. 2009;20:496–9. https://doi.org/10.1097/EDE.0b013e3181a82f12.
Schisterman EF, Cole SR, Platt RW. Overadjustment bias and unnecessary adjustment in epidemiologic studies. Epidemiology. 2009;20:488–95. https://doi.org/10.1097/EDE.0b013e3181a819a1.
Westreich D, Greenland S. The Table 2 Fallacy: Presenting and interpreting confounder and modifier coefficients. Am J Epidemiol. 2013;177:292–8. https://doi.org/10.1093/aje/kws412.
Patel CJ, Burford B, Ioannidis JP. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. J Clin Epidemiol. 2015;68:1046–58. https://doi.org/10.1016/j.jclinepi.2015.05.029.
Morley KC, Teesson M, Sannibale C, Haber PS. Sample bias from different recruitment strategies in a randomised controlled trial for alcohol dependence. Drug Alcohol Rev. 2009;28:222–9. https://doi.org/10.1111/j.1465-3362.2008.00022.x.
Arnett B, Rikli R. Effects of method of subject selection (volunteer vs random) and treatment variable on motor performance. Res Q Exerc Sport. 1981;52:433–40. https://doi.org/10.1080/02701367.1981.10607888.
Dollinger SJ, Leong FTL. Volunteer bias and the five-factor model. J Psychol. 1993;127:29–36. https://doi.org/10.1080/00223980.1993.9915540.
Pruchno RA, Brill JE, Shands Y, et al. Convenience samples and caregiving research: how generalizable are the findings? Gerontologist. 2008;48:820–7. https://doi.org/10.1093/geront/48.6.820.
Gordon M, Lumley T. Advanced Forest Plot Using 'grid' Graphics. 2021. https://cran.r-project.org/web/packages/forestplot/forestplot.pdf
iCARE Study Team
Kim L. Lavoie (University of Quebec at Montreal and CIUSSS-NIM), Simon L. Bacon (Concordia University and CIUSSS-NIM), Zahir Vally (United Arab Emirates University), Nora Granana (Hospital Durand), Analía Verónica Losada (University of Flores), Jacqueline Boyle (Monash University), Joanne Enticott (Monash University), Shajedur Rahman Shawon (Centre for Big Data Research in Health, UNSW Medicine), Shrinkhala Dawadi (Monash University), Helena Teede (Monash University), Alexandra Kautzky-Willer (Medizinische Universität Wien), Arobindu Dash (International University of Business, Agriculture & Technology), Marilia Estevam Cornelio (University of Campinas), Marlus Karsten (Universidade do Estado de Santa Catarina), Darlan Lauricio Matte (Universidade do Estado de Santa Catarina), Felipe Reichert (Universidade), Ahmed Abou-Setta (University of Manitoba), Shawn Aaron (Ottawa Hospital Research Institute), Angela Alberga (Concordia University), Tracie Barnett (McGill University), Silvana Barone (Université de Montréal), Ariane Bélanger-Gravel (Université Laval), Sarah Bernard (Université Laval), Lisa Maureen Birch (Université Laval), Susan Bondy (University of Toronto—Dalla Lana School of Public Health), Linda Booij (Concordia University); Roxane Borgès Da Silva (Université de Montréal) Jean Bourbeau (McGill University), Rachel Burns (Carleton University), Tavis Campbell (University of Calgary), Linda Carlson (University of Calgary), Étienne Charbonneau (École nationale d'administration publique), Kim Corace (University of Ottawa), Olivier Drouin (CHU Sainte-Justine/Université de Montréal), Francine Ducharme (Université de Montréal), Mohsen Farhadloo (Concordia University), Carl Falk (McGill University), Richard Fleet (Université Laval), Michel Fournier (Direction de la Santé Publique de Montréal), Gary Garber (University of Ottawa/Public Health Ontario), Lise Gauvin (Université de Montréal), Jennifer Gordon (University of Regina); Roland Grad (McGill University), Samir 
Gupta (University of Toronto), Kim Hellemans (Carleton University), Catherine Herba (UQAM), Heungsun Hwang (McGill University), Jack Jedwab (Canadian Institute for Identities and Migration and the Association for Canadian Studies), Keven Joyal-Desmarais (Concordia University), Lisa Kakinami (Concordia University), Eric Kennedy (York University), Sunmee Kim (University of Manitoba), Joanne Liu (McGill University), Colleen Norris (University of Alberta), Sandra Pelaez (Université de Montréal), Louise Pilote (McGill University), Paul Poirier (Université Laval), Justin Presseau (University of Ottawa), Eli Puterman (University of British Columbia), Joshua Rash (Memorial University), Paula AB Ribeiro (MBMC), Mohsen Sadatsafavi (University of British Columbia), Paramita Saha Chaudhuri (McGill University), Jovana Stojanovic (Concordia University), Eva Suarthana (Université de Montréal/McGill University), Sze Man Tse (CHU Sainte-Justine), Michael Vallis (Dalhousie University), Nicolás Bronfman Caceres (Universidad Andrés Bello), Manuel Ortiz (Universidad de La Frontera), Paula Beatriz Repetto (Universidad Católica de Chile), Mariantonia Lemos-Hoyos (Universidad EAFIT), Angelos Kassianos (University of Cyprus), Naja Hulvej Rod (University of Copenhagen), Mathieu Beraneck (Université de Paris; CNRS), Gregory Ninot (Université de Montpellier), Beate Ditzen (Heidelberg University), Thomas Kubiak (Mainz University), Sam Codjoe (University of Ghana), Lily Kpobi (University of Ghana), Amos Laar (University of Ghana), Theodora Skoura (Aretaieio Hospital Athens University), Delfin Lovelina Francis (Vinayaka Mission's Dental College), Naorem Kiranmala Devi (University of Delhi), Sanjenbam Meitei (Manipur University), Suzanne Tanya Nethan (School of Preventive Oncology), Lancelot Pinto (Hinduja Hospital and Medical Research Centre), Kallur Nava Saraswathy (University of Delhi), Dheeraj Tumu (World Health Organization), Silviana Lestari (Universitas Indonesia), Grace Wangge (SEAMEO 
Regional Center for Food and Nutrition), Molly Byrne (National University of Ireland, Galway), Hannah Durand (National University of Ireland, Galway), Jennifer McSharry (National University of Ireland, Galway), Oonagh Meade (National University of Ireland, Galway), Gerry Molloy (National University of Ireland, Galway), Chris Noone (National University of Ireland, Galway), Hagai Levine (Hebrew University), Anat Zaidman-Zait (Tel-Aviv University), Stefania Boccia (Università Cattolica del Sacro Cuore), Ilda Hoxhaj (Università Cattolica del Sacro Cuore), Stefania Paduano (University of Modena and Reggio Emilia), Valeria Raparelli (Sapienza—University of Rome), Drieda Zaçe (Università Cattolica del Sacro Cuore), Ala’S Aburub (Isra University), Daniel Akunga (Kenyatta University), Richard Ayah (University of Nairobi, School Public Health), Chris Barasa (University of Nairobi, School Public Health), Pamela Miloya Godia (University of Nairobi), Elizabeth W. Kimani-Murage (African Population and Health Research Center), Nicholas Mutuku (University of Kenya), Teresa Mwoma (Kenyatta University), Violet Naanyu (Moi University), Jackim Nyamari (Kenyatta University), Hildah Oburu (Kenyatta University), Joyce Olenja (University of Nairobi), Dismas Ongore (University of Nairobi), Abdhalah Ziraba (African Population and Health Research Center), Chiwoza Bandawe (University of Malawi), Loh Siew Yim (Faculty of Medicine, University of Malaya), Ademola Ajuwon (University of Ibadan), Nisar Ahmed Shar (National Center in Big Data & Cloud Computing), Bilal Ahmed Usmani (University of Engineering and Technology), Rosario Mercedes Bartolini Martínez (Instituto de Investigacion Nutricional), Hilary Creed-Kanashiro (Instituto de Investigacion Nutricional), Paula Simão (Pneumologia de Matosinhos), Pierre Claver Rutayisire (University Rwanda), Abu Zeeshan Bari (Taibah University), Katarina Vojvodic (University of Belgrade), Iveta Nagyova (Safarik University – UPJS), Jason Bantjes (University 
of Stellenbosch), Brendon Barnes (University of Johannesburg), Bronwyne Coetzee (University of Stellenbosch), Ashraf Khagee (University of Stellenbosch), Tebogo Mothiba (University of Limpopo), Rizwana Roomaney (University of Stellenbosch), Leslie Swartz (University of Stellenbosch), Juhee Cho (Sungkyunkwan University), Man-gyeong Lee, (Sungkyunkwan University), Anne Berman (Karolinska Institutet), Nouha Saleh Stattin (Karolinska Institutet), Susanne Fischer (University of Zurich), Debbie Hu (Tainan Municipal Hospital), Yasin Kara (Kanuni Sultan Süleyman Training and Research Hospital, Istanbul), Ceprail Şimşek (Health Science University), Bilge Üzmezoğlu (University of Health Science), John Bosco Isunju (Makerere University School of Public Health), James Mugisha (University of Uganda), Lucie Byrne-Davis (University of Manchester), Paula Griffiths (Loughborough University), Joanne Hart (University of Manchester), Will Johnson (Loughborough University), Susan Michie (University College London), Nicola Paine (Loughborough University), Emily Petherick (Loughborough University); Lauren Sherar (Loughborough University), Robert M. Bilder (University of California, Los Angeles), Matthew Burg (Yale), Susan Czajkowski (NIH—National Cancer Institute), Ken Freedland (Washington University), Sherri Sheinfeld Gorin (University of Michigan), Alison Holman (University of California, Irvine), Jiyoung Lee (University of Alabama), Gilberto Lopez (Arizona State University and University of Rochester Medical Center), Sylvie Naar (Florida State University), Michele Okun (University of Colorado, Colorado Springs), Lynda Powell (Rush University), Sarah Pressman (University of California, Irvine), Tracey Revenson (University of New York City), John Ruiz (University of Arizona), Sudha Sivaram (NIH, Center for Global Health), Johannes Thrul (Johns Hopkins), Claudia Trudel-Fitzgerald (Harvard T.H. 
Chan School of Public Health), Abehaw Yohannes (Azusa Pacific University), Rhea Navani (Monash University), Kushnan Ranakombu (Monash University), Daisuke Hayashi Neto (Unicamp), Tair Ben-Porat (Tel Aviv University), Anda Dragomir (University of Quebec at Montreal and CIUSSS-NIM), Amandine Gagnon-Hébert (UQAM), Claudia Gemme (UQAM), Vincent Gosselin Boucher (University of Quebec at Montreal and CIUSSS-NIM), Mahrukh Jamil (Concordia University and CIUSSS-NIM), Lisa Maria Käfer (McGill University), Ariany Marques Vieira (Concordia University), Tasfia Tasbih (Concordia University and CIUSSS-NIM), Robbie Woods (Concordia University), Reyhaneh Yousefi (Concordia University and CIUSSS-NIM), Tamila Roslyakova (Université de Montpellier), Lilli Priesterroth (Mainz University), Shirly Edelstein (Hebrew University-Hadassah School of Public Health), Ruth Snir (Hebrew University-Hadassah School of Public Health), Yifat Uri (Hebrew University-Hadassah School of Public Health), Mohsen Alyami (University of Auckland), Comfort Sanuade, Olivia Crescenzi, Kyle Warkentin, Katya Grinko, Lalita Angne, Jigisha Jain, Nikita Mathur (Syncorp Clinical Research), Anagha Mithe, Sarah Nethan (Community Empowerment Lab).
Funding
This work was supported by the Canadian Institutes of Health Research (CIHR: MM1-174903; MS3-173099; SMC-151518); the Canada Research Chairs Program (950-232522, Chair holder: Dr. Kim L. Lavoie); the Fonds de recherche du Québec—santé (FRQ-S: 251618; 34757); the Fonds de recherche du Québec – Société et culture (FRQSC: 2019-SE1-252541); and the Ministère de l'Économie et de l’Innovation du Québec (2020-2022-COVID-19-PSOv2a-51754). Study sponsors had no role in conducting the research.
Author information
Contributions
The current project was conceptualized by KJD, JS, VGB, EBK, and JE. KJD prepared data for analyses, along with support from JS and VGB. KJD conducted all analyses, and all authors contributed to data interpretation. KJD prepared the first draft. All authors contributed to revisions and approved the final draft. Beyond the author team, we thank Ms. Mariam Atoui and Ruth Bruno for their assistance with data preparation.
Ethics declarations
Ethical approval
The iCARE study was approved by the Comité d’éthique de recherche du Centre intégré universitaire de santé et de services sociaux du Nord-de-l’île-de-Montréal, approval: 2020-2099/25-03-2020.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Competing interests
Kim Lavoie has served on the advisory board for Schering‐Plough, Takeda, AbbVie, Almirall, Janssen, GSK, Novartis, Boehringer Ingelheim (BI), and Sojecci Inc, and has received sponsorship for investigator‐generated research grants from GlaxoSmithKline (GSK) and AbbVie, speaker fees from GSK, Astra‐Zeneca, Astellas, Novartis, Takeda, AbbVie, Merck, Boehringer Ingelheim, Bayer, Pfizer, Xfacto, and Air Liquide, and support for educational materials from Merck. Urška Košir has received speaker fees from Merck. None of these engagements are related to the current article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Joyal-Desmarais, K., Stojanovic, J., Kennedy, E.B. et al. How well do covariates perform when adjusting for sampling bias in online COVID-19 research? Insights from multiverse analyses. Eur J Epidemiol 37, 1233–1250 (2022). https://doi.org/10.1007/s10654-022-00932-y