Introduction

Web-based tools are increasingly used in epidemiological studies. They may facilitate participation in population-based studies because they do not require physical presence in the study center. However, systematic recruitment of participants directly over the Internet is not feasible as no defined sampling frame exists (Fan and Yan 2010). Studies using web-based tools have often been conducted in specific population groups, e.g. university students (Pealer et al. 2001), doctors and nurses (Raziano et al. 2001) or self-recruited users of the Internet (Owen et al. 2013).

In recent years, a number of studies have investigated the effect of the recruitment approach on the participation percentage in web-based studies with formal population-based sampling. Different methods were used to obtain a sample that is representative of the general population, e.g. random digit dialling (Midanik and Greenfield 2010), a random sample from a large probability-based database with potential web respondents (Nagelhout et al. 2010), or a stratified random sample from population registries (Turner et al. 2005). These studies found that participation percentages differed depending on the recruitment approach. However, participants were invited to answer the web-based questionnaire only once. Whether compliance with longitudinal web-based data collection differs depending on the mode of recruitment cannot be inferred from these results. Bexelius et al. (2010) investigated the possibility of using web-based questionnaires in population-based infectious disease surveillance, but as participants only had to fill in the questionnaire when they had symptoms of infectious disease, the setting differs from a web-based longitudinal study.

Furthermore, only limited data about differences between participants and non-participants of web-based studies are available (Russell et al. 2010). This question can be addressed by embedding a web-based data collection in an existing study with a different data collection mode (a second-stage study). So far, data on second-stage participation are only available for paper-based questionnaires (Boshuizen et al. 2006; van Loon et al. 2003; Volken 2013). These studies found associations between second-stage participation and sex, socio-economic status, current employment, and subjective health. A web-based study embedded in an ongoing large study in Germany that includes physical examinations offers the relatively rare opportunity to study second-stage participation in this context.

We integrated a 3-month longitudinal web-based data collection on acute infections into a study that required visits to a study center and was based on a sample drawn from the local population registries. In addition, we recruited persons from a sample with a similar age distribution directly for participation in the web-based study. We aimed to compare participation percentages and compliance with the longitudinal data collection among those who (a) participated in the original study including the assessments in the study center (Group A), (b) did not participate in the original study, but completed the non-responder questionnaire and in this context agreed to participate in the web-based study (Group B), and (c) were invited to the web-based study only (Group C).

Methods

This study was part of the pretest studies of the German National Cohort (GNC) (Wichmann et al. 2012) and was conducted during Pretest 2. In brief, the GNC is a large multicenter prospective population-based study beginning in 2014 to examine risk factors for common diseases (e.g. cardiovascular, neoplastic, metabolic, pulmonary, neuro-psychiatric and infectious diseases). The study will include about 200,000 individuals between 20 and 69 years of age in Germany. The older age groups will be oversampled (20 % of individuals younger than 40 years and 80 % aged 40 years and more). The participants will be recruited in 18 study centers across Germany. During the Pretest 2 phase, 2896 participants were examined in the study centers and various feasibility aspects of the final study were investigated. In all participating study centers, random samples were drawn from the population registries in the corresponding municipalities. Potential participants were contacted via land mail. Up to two reminders and up to ten telephone calls (provided that phone numbers could be identified) were used to contact non-responders.

Three study centers (Bremen, Hamburg, and Hannover) implemented the web-based data collection on acute symptoms of infections. The response proportions to the initial study among the GNC cohort were 19 % (study center Bremen), 22 % (study center Hamburg), and 16 % (study center Hannover). Subjects who agreed to participate in the pretest of the GNC (including a 4-hour examination at the study center) and had an e-mail address were asked during their visit to the study center to participate in the prospective web-based study (Group A). Those who could not be recruited for the examination in the study center received a non-responder questionnaire, which included an invitation to participate in the web-based study (Group B). Finally, persons from a population-based sample (with the same sex and age distribution as in the GNC) in Hannover who had not initially been contacted regarding participation in the pretest of the GNC were directly invited by land mail to participate in the web-based study (Group C). Group C was invited once by land mail and did not receive any reminders. The flow chart of the recruitment process is shown in Fig. 1.

Fig. 1 Flow chart of the recruitment process in the prospective web-based study on acute infections conducted in three study centers of the German National Cohort from February to August 2013

Web-based data collection

The web-based data collection lasted 3 months (February to April 2013 in Groups A and B and June to August 2013 in Group C). The open-source online survey application LimeSurvey (http://www.limesurvey.org/; Schmitz 2007) was used for this purpose. Weekly emails containing the link to the web questionnaire were sent to the participants on Mondays. They were asked whether they had developed new symptoms of respiratory or gastrointestinal tract infections or conjunctivitis during the preceding week. If so, they were asked to provide additional information. Participation was defined as filling in the online questionnaire at least once during the study period.

Statistical analysis

Participation percentages stratified by sex and age were calculated separately for all three groups and for the combined Groups A and B. To examine variables associated with second-stage participation, participants and non-participants in the web-based study among those who attended the study center were described using percentages for categorical variables and medians for continuous variables. Differences between groups were tested with the Chi-square test (for categorical variables) and the Mann–Whitney U test (for continuous variables). All tests were two-sided. In addition, multivariable logistic regression was performed to identify independent associations. Compliance with the web-based data collection was defined as providing a response about the presence of symptoms in a given week, and multivariable logistic regression for correlated data based on generalized estimating equations (GEE) was used to identify variables associated with compliance (Hosmer and Lemeshow 2000).

Furthermore, trajectories of compliance over the study period were studied using semi-parametric group-based modelling implemented in PROC TRAJ in SAS 9.2 (Jones et al. 2001). PROC TRAJ models unobserved groups within the sample that have distinct patterns of change in compliance over time. The trajectory of each group is described by a polynomial function, and participants are classified as belonging to one of the groups based on their individual trajectories. We started with a single trajectory and subsequently increased the number of trajectories; the final number of trajectories was selected using the Bayesian Information Criterion (BIC). BIC values were −2278.22 for one trajectory, −1918.19 for two trajectories, −1891.60 for three trajectories, −1895.10 for four trajectories, and −1904.57 for five trajectories. We selected the model with three trajectories as it had the best (i.e. highest, least negative) BIC value (Jones et al. 2001).
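The model selection step can be illustrated with the BIC values reported above. The following is a minimal sketch, assuming the sign convention under which the largest (least negative) BIC marks the preferred model, which is consistent with the choice of three trajectories. The function name `select_best_model` is illustrative and not part of the original analysis, which was run with PROC TRAJ in SAS.

```python
# BIC values reported in the text, keyed by the number of trajectory groups.
bic_by_groups = {
    1: -2278.22,
    2: -1918.19,
    3: -1891.60,
    4: -1895.10,
    5: -1904.57,
}


def select_best_model(bic_values):
    """Return the number of groups with the highest (least negative) BIC.

    Assumption: larger BIC values indicate a better trade-off between
    fit and model complexity.
    """
    return max(bic_values, key=bic_values.get)


best_k = select_best_model(bic_by_groups)
print(best_k)
```

With the reported values, adding a fourth or fifth trajectory lowers the BIC again, so the three-group model is retained.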

Results

Participation percentages

In total, 570 persons agreed to participate in the web-based study, but only 477 actually filled in at least one web questionnaire during the study period (Table 1). Given that the percentage of those who responded to only a few weekly questionnaires was low and that the majority provided more complete records, we defined participation as at least one completed questionnaire. The participation percentage varied across the three study groups: 61 % in Group A, 18 % in Group B if considering only those who returned the non-responder questionnaire and 3 % if all non-responders were considered, and 11 % in Group C (Table 2). Since Groups A and B were recruited stepwise from the same initial sample, their combined participation percentage, to be compared with participation in Group C, was 15 %.
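As a minimal arithmetic sketch of the participation definition, the two counts given above can be turned into the share of consenting persons who completed at least one questionnaire. The group-level percentages require the per-group numbers of invited persons, which are not reproduced here; the helper name `percentage` is illustrative.

```python
def percentage(numerator, denominator):
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * numerator / denominator, 1)


agreed = 570     # persons who agreed to participate (from the text)
responded = 477  # persons who completed at least one web questionnaire (from the text)

# Share of those who agreed that actually participated, and the number lost.
share_participating = percentage(responded, agreed)
never_responded = agreed - responded

print(share_participating)
print(never_responded)
```

Under these counts, about 84 % of those who agreed went on to participate, and 93 persons agreed but never responded.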

Table 1 Descriptive overview on the distribution of the number of weeks with responses in the web-based study
Table 2 Participation percentages stratified by sex and age group in the prospective web-based study on acute infections conducted in three study centers of the German National Cohort from February to August 2013 (analyses based on Groups A, B, and C)

Second-stage participation

Among participants of the original study with a personal assessment in the study center, we were able to assess second-stage participation in the web-based study because socio-demographic information was available from the original study (Table 3). Those not participating in the additional web-based study had lower education and were more frequently not employed (most of those not employed were of retirement age).

Table 3 Socio-demographic characteristics of the second-stage participants and non-participants in the prospective web-based study on acute infections, stratified by response to the pretest study of the German National Cohort (analyses based on Group A)

Compliance

The average proportion of web questionnaires completed in each week was similar across Groups A, B, and C, i.e. 83, 85, and 81 %, respectively.
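Compliance as defined in the Methods (a response in a given week) can be summarised per week and then averaged over the study period. The following sketch shows one way to do this; the participant IDs and 0/1 response flags are hypothetical toy data, not study data, and the function names are illustrative.

```python
def weekly_compliance(responses):
    """Proportion of participants who responded, for each week.

    `responses` maps a participant ID to a list of 0/1 flags, one per week
    (1 = the weekly questionnaire was answered).
    """
    n_weeks = len(next(iter(responses.values())))
    n_participants = len(responses)
    return [
        sum(flags[week] for flags in responses.values()) / n_participants
        for week in range(n_weeks)
    ]


def mean_compliance(responses):
    """Average of the weekly compliance proportions over the study period."""
    weekly = weekly_compliance(responses)
    return sum(weekly) / len(weekly)


# Hypothetical toy data: 4 participants, 4 weeks (not study data).
toy = {
    "p1": [1, 1, 1, 1],
    "p2": [1, 0, 1, 1],
    "p3": [0, 1, 1, 1],
    "p4": [1, 1, 0, 1],
}

print(weekly_compliance(toy))  # [0.75, 0.75, 0.75, 1.0]
print(mean_compliance(toy))    # 0.8125
```

Computing this summary separately for each recruitment group yields group-level averages of the kind compared above.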

Multivariable logistic regression for correlated data suggested that age and employment status influenced compliance, but compliance was not associated with recruitment strategy (Table 4). Compared to individuals aged 40–49 years, younger (20–39 years) and older (60–69 years) individuals showed poorer compliance.

Table 4 Factors that influence compliance in the prospective web-based study on acute infections conducted in three study centers of the German National Cohort from February to August 2013 (analyses based on Groups A, B, and C)

Semi-parametric group-based modelling identified three distinct compliance trajectories over the study period: “poor compliance”, “improving compliance” and “very good compliance” (Fig. 2). Most participants (78 %) were in the “very good compliance” trajectory, in which compliance increased slightly at the beginning of the study and remained high over the study period. In the “improving compliance” trajectory (14 %), compliance was low at the beginning of the study but increased sharply. Finally, in the “poor compliance” trajectory (8.5 %), compliance was moderate at the beginning and declined over the study period. The distribution of the three compliance trajectory groups was similar across Groups A, B, and C (Table 5).

Fig. 2 Compliance over the study period with the prospective web-based data collection on acute infections conducted in three study centers of the German National Cohort from February to August 2013

Table 5 Differences across the three compliance trajectory groups and the Groups A, B, and C

Discussion

We examined participation percentages and compliance in a population-based study that used web-based data collection among participants recruited via different modes. Participation in the web-based study was highest among those who participated in the original study (Group A) and lowest among those who did not agree to participate in the original study (Group B). Other studies also report a higher participation percentage among people recruited from large panels (i.e. people who indicated their willingness to participate in research) than among people who had not (Duffy et al. 2005; Nagelhout et al. 2010), indicating that some people are generally more willing to participate in studies. In our current study, it seems that participation in the initial study might even have enhanced willingness to participate in the additional web-based study, resulting in a higher participation in the combined Groups A + B than in Group C. We also showed that it was possible to recruit some participants for the web-based study among those who could not be recruited for the initial study including medical examinations in the study center. However, the participation percentage in the additional data collection among non-responders to the original study was only 3 %. This suggests that while examinations in the study center can be the reason for non-participation in some cases, the proportion of those who can additionally be recruited for a study not involving an appointment in the study center but using web-based data collection is quite low.

Response in the initial study with physical examination was 16–22 %, and 61 % of this group participated in the web-based study; by comparison, the response in the primarily web-based study (Group C) was only 11 %. Even if lack of access to the internet might play some role, interest in participating in the initial study appears to be somewhat higher, possibly because the initial study was perceived as more interesting or because of the feedback about health status offered to its participants. However, the proportions are not directly comparable, since potential participants of the initial study received up to two reminder letters and ten phone calls while the recruitment of Group C only included one invitation letter.

With respect to second-stage participation, we found that participants of the web-based study were more educated and less likely to be not employed (with most persons who were not employed being of retirement age) than those who did not agree to participate in the additional study. These findings are consistent with the results of second-stage participation analyses in paper-based studies (Boshuizen et al. 2006; van Loon et al. 2003; Volken 2013).

Despite the differences in participation percentages, participants in all three groups showed similarly high compliance with web-based data collection. Even in Group B, where the participation percentage was low, compliance was as high as in the other groups. This is generally a positive finding that can encourage the creation of web-based cohorts or panel studies. Taken together, these results suggest that different recruitment approaches can be used and the collected data can be combined to achieve greater sample sizes for longitudinal web-based studies.