Introduction

For large segments of the population, the internet has become a primary source of information on a broad range of health issues and a means of communication. This development provides a unique opportunity for epidemiologic research, in particular the conduct of prospective cohort studies. The attraction of using the internet for recruitment of an epidemiologic cohort stems mainly from cost efficiency and convenience for study participants (e.g., flexibility regarding time of questionnaire completion). By avoiding the expense of physical encounters, postal fees, telephone recruiters, or other more expensive approaches, a study that recruits participants and communicates with them over the internet could theoretically be conducted much more efficiently with respect to the cost of the information required. Cost savings that come at the expense of important biases would be less attractive, however. A major obstacle to validity in any cohort study is the threat posed by incomplete follow-up. In the past, long-duration prospective cohort studies, such as the Nurses’ Health Study [1], the cancer prevention studies I and II [2–5], and the Black Women’s Health Study [6], have relied mainly on mail and telephone contact to enhance retention. Because the marginal expense of using the internet for contact is much lower than the more traditional methods, the opportunity for improving the cost efficiency of cohort studies while maintaining validity appears attractive. Although there have been efforts to take advantage of the communication facility that the internet provides for experimental [7, 8] and non-experimental research [9–11], the ability to recruit and maintain contact with study participants over a long period of time using the internet and the efficiency of the approach have not yet been evaluated.

We conducted a pilot study to assess the feasibility of recruitment, success of follow-up, and cost efficiency in a prospective cohort study conducted solely through the internet. Aside from assessing the web-based approach, another aim of the pilot study was to evaluate several hypotheses regarding lifestyle and behavioral factors for delayed conception; these results will be reported elsewhere. We chose women attempting pregnancy as our study population for this pilot study. Because preconception couples rarely congregate in medical or educational institutions [12] and often do not make their intentions known, recruiting women at the time they begin attempting pregnancy is a recognized challenge of such studies. Consequently, most research on fecundability has evaluated retrospectively the duration of unprotected intercourse in women who are already pregnant [12]. Problems with this approach include exclusion of the extremes of the fertility spectrum and misclassification of exposure status, both of which increase the potential for bias [13]. An internet-based study may overcome these challenges by reaching the target population more broadly, because women attempting pregnancy are likely to search the internet for information about fertility or healthy pre-pregnancy behaviors.

Methods

Internet-based pregnancy planning study: ‘Snart-Gravid’

The design of the ‘Snart-Gravid’ (‘Soon Pregnant’) study has been described in detail elsewhere [14]. Briefly, the study was initiated in June 2007. To be eligible, a woman had to be 18–40 years of age, a resident of Denmark, in a stable relationship with a male partner, attempting to conceive for ≤12 months, and not using any type of fertility treatment.

Participant recruitment took place through an advertisement placed on a well-known Danish health-related website (www.netdoktor.dk), enhanced by a coordinated media strategy that included a press release and attracted attention from print media, online news sites, television and radio. Enrolment and primary exposure data collection were achieved solely via the study website and e-mail. The aim was to recruit 2,500 participants over a 6-month period. This target recruitment was based on the number of unique users expected to visit the website in a given year and the proportion of visitors expected to click onto the site when they see a particular advertisement (“click rate”).

Potential participants visiting the study website were required to read a consent form and complete a screening questionnaire to confirm eligibility before enrolment. They also had to provide their civil personal registration (CPR) number—a unique 10-digit personal identification number assigned to each resident in Denmark by the Central Office of Civil Registration [15]—and an e-mail address. Women using contraception at the time of the screening who planned to discontinue contraception within the next 6 months to attempt pregnancy were given the option to provide their e-mail address for later recruitment.

After completing the baseline questionnaire, which collected detailed information on socio-demographics, reproductive and medical history, lifestyle and other factors, participants were contacted bimonthly by e-mail for 12 months or until they reported that conception had occurred. If necessary, up to two e-mail reminders were sent at each follow-up. The follow-up questionnaires assessed changes in exposures, potential confounders, frequency and timing of intercourse, and whether or not conception occurred. Women who conceived were asked to complete one questionnaire during early pregnancy to collect data on prenatal exposures, after which active follow-up ceased. Women who indicated they were no longer trying to conceive were censored at their date of last response. The study website also provided information on how a participant could leave the study and not receive further e-mails.

Through the CPR number, data obtained from these self-reported questionnaires can also be linked to data from a number of nationwide registries. This linkage allows for the collection of additional data on potential confounders and outcomes of interest, as well as the independent assessment of the validity of some of the self-reported data, including prescription drug use, reproductive history, and socio-demographic variables.

Alternative approach for the pregnancy planning study

To permit assessment of the relative cost efficiency of the web-based approach in the absence of contemporaneous cost data from a comparable non-internet-based study, we compared the study costs with those of an alternative, hypothetical approach. The design of the alternative study approach was informed by considerations of feasibility given the target population of interest and by experience in other large cohort studies such as the Black Women’s Health Study. Under this approach, study information would be displayed in medical offices and pharmacies, along with postage-paid postcards that interested women could pick up and mail to the study coordinating center. A study brochure and screener questionnaire would be mailed to women who sent in a postcard. The baseline questionnaire would subsequently be mailed to women eligible to enroll based on information provided in the screener questionnaire. During follow-up, questionnaires would be mailed to participants. In the case of non-response, a second mailing would be sent.

Evaluation of feasibility and costs

Feasibility was assessed by examining participant accrual over time as well as questionnaire-specific response rates. To evaluate losses to follow-up, we calculated the proportion of subjects for whom the outcome status (pregnant or not) was unknown at 6 and 12 months.

We followed women until their outcome was known (i.e., pregnancy, or stopped trying to conceive) or until 1 year, whichever occurred first. In order to broaden the applicability of our findings to other types of studies, we also investigated what losses to follow-up would have occurred in a hypothetical study that attempted to follow all subjects for an entire year (as opposed to ending follow-up at the time of pregnancy). For that analysis, we constructed a Kaplan–Meier survival curve of time to loss to follow-up over the 12-month study period. We censored women when they reported a pregnancy or indicated they were no longer trying to conceive, and we considered women to be lost to follow-up after their last completed questionnaire (before the end of the study period) or when they actively resigned from the study.
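The product-limit calculation described above can be sketched in Python as follows. This is only an illustrative sketch, not the study's analysis code: the function name `kaplan_meier`, the toy follow-up times, and the convention that losses are counted before censorings at tied times are all assumptions made for the example.

```python
from collections import Counter


def kaplan_meier(durations, lost):
    """Product-limit estimate of the probability of remaining under follow-up.

    durations: time (e.g., months) at which each participant left the risk set.
    lost:      True if the participant was lost to follow-up at that time,
               False if censored (pregnancy, stopped trying, end of study).
    Returns a list of (time, retained_probability) at each time with a loss.
    """
    losses = Counter(t for t, is_lost in zip(durations, lost) if is_lost)
    exits = Counter(durations)  # losses and censorings combined
    at_risk = len(durations)
    retained = 1.0
    curve = []
    for t in sorted(exits):
        d = losses.get(t, 0)
        if d:
            # Assumption: at tied times, losses occur before censorings,
            # so censored participants still count in the risk set at t.
            retained *= 1 - d / at_risk
            curve.append((t, retained))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months of follow-up), NOT the Snart-Gravid data.
toy_durations = [2, 4, 4, 6, 8, 12, 12, 12]
toy_lost = [True, False, True, False, True, False, False, False]
toy_curve = kaplan_meier(toy_durations, toy_lost)
```

In this toy example, women censored at pregnancy stop contributing to the risk set but do not count as losses, which is how the hypothetical "follow everyone for a year" retention curve differs from the crude proportion with known outcome.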

Cost estimates for the internet-based approach were based on our actual study budget. Costs associated with the hypothetical approach were obtained from a study of similar scope previously conducted in Denmark by one of the authors (EMM) using mailed questionnaires (i.e., costs associated with setting up a logistics program to track mailings, scanning questionnaires, and record linkage) [16], supplemented with (1) detailed cost estimates for questionnaire development, printing, mail processing and postage obtained from a survey services organization, and (2) actual expenditures for the internet-based study where applicable (i.e., research personnel cost). All costs are reported in 2008 US$ and exclude institutional indirect costs. Results of sensitivity analyses exploring the change in cost as a function of the number of study participants and the length of follow-up are also presented.

Results

Feasibility

Enrolment started on June 1, 2007, when the study was introduced in the media with a press release, followed 2 weeks later by an advertisement placed on the health-related website www.netdoktor.dk. Ten weeks into the study, an article featuring the study appeared in a large Danish women’s magazine, which caused a spike in enrolment. No further recruitment-related events took place during the remainder of the pilot phase and enrolment stabilized at about 40–50 new participants per week. After 6 months, a total of 2,288 participants had been enrolled, which represented 92% of the target. Motivated by this success, we decided to continue enrolment beyond the pilot phase, aiming for about 4,000 participants after 12 months. To this end, another press release was issued in February 2008, which again resulted in accelerated enrolment for several weeks. Aside from the advertisement on the ‘netdoktor’ site, no further recruitment initiatives have taken place and recruitment has been steady at around 20–30 new participants per week. After 1 year, 3,358 women had enrolled in the study (Fig. 1).

Fig. 1 Study recruitment over time (*recruitment-related events: week 0: press release; week 2: advertisement on health-related website www.netdoktor.dk; week 10: publication of article featuring the study in large Danish women’s magazine; week 38: press release)

Table 1 shows the response rates for the 2,288 subjects enrolled during the first 6 months. Cycle-specific response remained well above 85% in all cycles. For all questionnaire cycles, the majority (88%) of responders completed the questionnaire within 7 days of receiving the initial invitation. An additional 10% of responders on average completed the questionnaire after receiving one reminder (7–16 days after the initial invitation); the remaining 2% after a second reminder. Follow-up information in terms of whether a woman became pregnant was available for 87.3% of women after 6 months, and for 81.6% at 1 year. After 1 year, 63.4% of women (1,451) had become pregnant, 4.4% (102) were no longer trying to conceive and 13.8% (315) were not yet pregnant. The remaining 18.4% (420) were lost to follow-up: 4 women were lost before the start of follow-up, 376 were lost due to non-response during follow-up, and 40 women actively resigned from the study. These numbers correspond to the totals provided in Table 1.
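As a quick arithmetic check of how the 1-year figures above fit together, the following Python sketch recomputes the proportion of women with known outcome status from the counts reported in the text. The variable names are illustrative assumptions; the counts are those given in the paragraph above.

```python
# Outcome status at 1 year among the 2,288 women enrolled in the first
# 6 months, using the counts reported in the text.
outcomes = {
    "pregnant": 1451,
    "no longer trying to conceive": 102,
    "not yet pregnant": 315,
    "lost to follow-up": 420,
}

# Reported breakdown of the women lost to follow-up.
lost_breakdown = {
    "lost before start of follow-up": 4,
    "non-response during follow-up": 376,
    "actively resigned": 40,
}

total = sum(outcomes.values())
known = total - outcomes["lost to follow-up"]
known_pct = round(100 * known / total, 1)  # share with known outcome status
```

The categories are mutually exclusive, so the share with known outcome status is simply the complement of the share lost to follow-up.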

Table 1 Response rates at each follow-up questionnaire, based on the 2,288 participants recruited during the first 6 months of enrollment

Figure 2 illustrates the expected cumulative response rates for a hypothetical study that attempts to follow participants for a 12-month study period. Given our observations, we project a cumulative response rate of 88.5% after 6 months and 76.1% after 1 year.

Fig. 2 Survival curve of time to loss to follow-up in a hypothetical study that attempts to follow all subjects for an entire year, based on the observations in Snart-Gravid

Costs

The cost of conducting the Snart-Gravid pilot study enrolling 2,500 women was estimated at around $400,000 (all figures are in US dollars; Table 2). Time spent designing the study, monitoring its progress, analyzing data and reporting results represents the largest proportion of this cost (67%). Study set-up costs amount to $71,342 (18% of total) and comprise the costs associated with design and testing of the on-line questionnaire, website construction and development of a semi-automated system of e-mail reminders. Costs associated with recruitment are those of the targeted advertisements at a health-related website and the coordinated media strategy and reflect 10% of the study budget. Finally, costs associated with website maintenance and support during follow-up are small, representing about 2.5% of the total cost. This study budget translates to a cost of $160 per enrolled subject.

Table 2 Costs of web-based pregnancy planning study: 6 months recruitment and 12 months follow-up

Less than half (45%) of this cost can be considered fixed, while the remainder will vary depending on the study size and, to a lesser degree, duration of follow-up. Figure 3 illustrates how the study cost is expected to change with increasing number of study participants, assuming that the variable costs change linearly with increasing number of participants. Although clearly an oversimplification, we considered this assumption reasonable given that it likely overestimates costs for some components, while underestimating costs for others.
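The linear assumption described above can be written out as a small Python sketch. It uses only the rounded figures quoted in the text (about $400,000 in total, 2,500 participants, 45% fixed), so its output is approximate and is not a reproduction of Table 2 or Fig. 3; the function name `projected_cost` and its parameters are hypothetical.

```python
def projected_cost(n_participants,
                   total_cost=400_000,   # rounded total from the text (US$)
                   n_reference=2_500,    # participants assumed in that total
                   fixed_share=0.45):    # share of cost treated as fixed
    """Projected study cost if variable costs scale linearly with study size."""
    fixed = fixed_share * total_cost
    variable_per_participant = (total_cost - fixed) / n_reference
    return fixed + variable_per_participant * n_participants


# Cost per enrolled subject at the reference size and at double that size.
per_subject_reference = projected_cost(2_500) / 2_500
per_subject_doubled = projected_cost(5_000) / 5_000
```

Under these assumptions the cost per enrolled subject falls as the cohort grows, because the fixed component is spread over more participants.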

Fig. 3 Estimated change in study cost with increasing number of study subjects (Panel A) and increasing length of follow-up (Panel B)

We estimated the cost of conducting the same study using a conventional non-internet-based approach at $805,000 (or $322 per enrolled subject; Table 3). Close to half of this budget is related to the production and distribution of paper-based questionnaires. The time spent by research personnel—conservatively estimated to be the same as for the internet-based approach—now represents only a third of the total budget. The budget also includes the scanning of the completed data forms, a cost component that is completely avoided when using the internet. The cost associated with recruitment of subjects through medical offices was estimated at $115,508 (14%). This cost could be avoided if the study were conducted in Denmark and the Danish Civil Registration System was used to identify women 18–40 years of age who are living with a male partner. Since a larger proportion of the cost for the conventional approach can be considered variable (75%), the total cost is expected to increase at a faster rate with increasing number of study participants (Fig. 3—Panel A). Figure 3—Panel B illustrates the expected change in cost with varying length of follow-up. Given the ease with which bi-monthly questionnaires can be sent via the internet, the incremental cost of each additional cycle is modest using the web-based approach ($10,700). In contrast, each additional cycle can be expected to generate an extra cost of $35,400 using the conventional approach, owing to the costs associated with preparation, distribution, and processing of paper-based questionnaires.
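To make the follow-up length comparison concrete, the Python sketch below extrapolates both budgets using the per-cycle increments quoted above. It assumes that the base budgets cover six follow-up cycles (one questionnaire every 2 months over 12 months) and that each extra cycle adds a constant amount; both are simplifying assumptions for illustration, and the names `APPROACHES` and `cost_for_cycles` are hypothetical.

```python
BASE_CYCLES = 6  # assumption: questionnaires every 2 months for 12 months

# Rounded totals and per-cycle increments as quoted in the text (US$).
APPROACHES = {
    "web-based": {"base_cost": 400_000, "per_cycle": 10_700},
    "conventional": {"base_cost": 805_000, "per_cycle": 35_400},
}


def cost_for_cycles(approach, n_cycles):
    """Estimated total cost when follow-up runs for n_cycles questionnaires."""
    params = APPROACHES[approach]
    extra_cycles = n_cycles - BASE_CYCLES
    return params["base_cost"] + extra_cycles * params["per_cycle"]


# Example: extending follow-up from 12 to 18 months (three extra cycles).
web_18_months = cost_for_cycles("web-based", 9)
conventional_18_months = cost_for_cycles("conventional", 9)
```

Because the conventional increment is more than three times the web-based one, the absolute cost gap between the two approaches widens with every additional cycle.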

Table 3 Costs of conventional approach to the pregnancy planning study, based on mailings of questionnaires: 6 months recruitment and 12 months follow-up

Discussion

Large-scale prospective follow-up studies of geographically dispersed populations tend to use postal questionnaires or telephone interviews to recruit and follow study participants over time. While effective, these approaches are expensive. For example, the cost of the Danish National Birth Cohort, a cohort study that recruited over 100,000 women in early pregnancy for long-term follow-up of themselves and their offspring (primarily using routine health registers) and collected exposure information through four telephone interviews, has been estimated to be more than €15,000,000 [17, 18]. This cost would likely be much higher in countries such as the US that do not routinely collect registry-based data on birth outcome and childhood health. Apart from the expense, conventional epidemiologic design strategies that rely heavily on telephone use have become more challenging with the ever-increasing use of mobile phones by large segments of the population and with the use of caller ID [19].

Experience with the Snart-Gravid study has shown that use of the internet as a means for recruiting and following subjects over time offers a promising alternative. We estimated the cost of conducting the web-based pregnancy planning study to be about half of what it would have been had we conducted the same study using a more traditional approach. Recruitment rates were relatively high and remained quite steady over time. The cost of recruitment-related events (i.e., advertisement on a health-related website, personnel time related to press releases and resulting media coverage) was small, representing about 10% of the study costs, and these events temporarily boosted enrollment, suggesting that they can be used effectively to accelerate recruitment at a small incremental cost. The apparent decrease in the average number of women enrolled per week during the last few months is likely explained by the fact that the advertisement on the health-related website was not shown as frequently during the later recruitment period. Retention was high and compared favorably with what has been reported for other large volunteer cohort studies. For example, the Black Women’s Health Study reported a response rate slightly above 81% among the actively followed participants for the 2007 questionnaire, using up to seven mailings and numerous telephone calls to enhance response rates (C. Russell, personal communication). Bonde et al. [20] reported an overall response rate of 79% for the monthly questionnaires in their 6-month follow-up study of 430 Danish pregnancy planners, with couples receiving one reminder. Furthermore, the high response rates in the web-based study were achieved with minimal effort; that is, up to two automatically generated reminder e-mails. These rates could likely be boosted further by sending additional, personalized reminder e-mails or by posting a newsletter with study updates on the study website, all of which can be accomplished at a relatively small incremental cost.
One possible concern might be that e-mail addresses are likely to change more often than postal addresses, and frequent moving is known to be a strong correlate of non-response [6]. As long as participants are provided a simple way to communicate address changes and an alternative way for contacting participants remains available (e.g., telephone), this problem should not be prohibitive.

Internet-based recruitment of volunteers has raised concerns among critics because the demographics (e.g., age, socio-economic status) of those with ready internet access differ from those without it. Furthermore, among those with internet access, those who choose to volunteer for studies may differ considerably in lifestyle and health from those who decline [21, 22]. Volunteering to be studied via the internet does not, however, introduce new concerns about validity beyond those already present in other studies using volunteers. In any cohort study, the central issue determining validity is not differences between study participants and nonparticipants [23], but rather comparability between the sub-cohorts that are included. In a randomized trial, random assignment provides comparability between those who receive the study intervention and those who receive the comparison intervention. In nonexperimental cohort studies, investigators employ judicious comparisons and adjustment for baseline differences in covariates to simulate the balance achieved by random assignment. Validity concerns in such studies stand in stark contrast to the issues that arise, for example, in a cross-sectional population survey that aims to infer through a sample the characteristics or preferences of a population. Such a study would have doubtful validity if recruitment were conducted using volunteers, because the validity of a cross-sectional survey depends entirely on the representativeness of the sample. But in a cohort study of volunteers, the fact that those who participate may differ from those who do not is of secondary importance because the study aims to measure health outcomes that have not yet occurred and compare rates of occurrence across study groups that have all volunteered in similar circumstances. 
The primary concern should therefore be to select study groups for homogeneity with respect to important confounders, for highly cooperative behavior, and for availability of accurate information, rather than attempt to be representative of a natural population [24].

Scientific generalization of valid estimates of effect (i.e., external validity) does not require representativeness of the study population in a survey-sampling sense either. Despite differences between volunteers and non-participants, volunteer cohorts are often as satisfactory for scientific generalization as demographically representative cohorts, because of the nature of the questions that epidemiologists study. The relevant issue is whether the factors that distinguish studied groups from other groups somehow modify the effect in question [24]. For example, Hammond and Horn [2] studied the health effects of tobacco smoking in a prospective cohort study of volunteers. Their findings were considered to have external validity because it was implausible that factors related to volunteering for their study would lead to an effect of tobacco smoke on death rates that would differ from that among non-participants. Similarly, volunteers recruited through the internet should provide a perfectly reasonable study population for a prospective epidemiologic cohort study, even if the participants differ from the source population with regard to some characteristics. Etter and Perneger [10] discussed this issue in their evaluation of smokers recruited via the internet and via mail for a smoking cessation trial. They found that although smokers self-recruited through the internet were younger, more educated, more motivated to quit smoking, and smoked more cigarettes per day, the different distributions of these study variables did not imply that the associations measured for study variables would differ from the corresponding values among non-participants. In contrast, cross-sectional and case–control studies, in which the exposure and outcome information are both known at the time of volunteering, would be subject to great concerns about their validity. 
In the context of a cohort of volunteers recruited through the internet, generalizability will depend on whether the biologic relations studied are expected to differ for those with and without internet access. Knowledge from different branches of science, not just epidemiologic data, will contribute to this assessment [24].

There were several advantages to conducting this study in Denmark. First, Denmark currently has the highest prevalence of internet access in the world [25]. Second, the extensive system of registries [15] enables us to collect follow-up data on all study participants, even if they dropped out of the study, to collect information on additional covariates, and to validate data on several exposures, such as medication use and past reproductive history. Finally, there is a high prevalence of infertility and subfecundity in Denmark [26, 27], and the population is highly motivated to participate in reproductive health studies. The approach still needs to be evaluated in countries where internet use is lower and where there might be more reluctance to share personal information over the internet.

In conclusion, the successful conduct of this pilot study indicates that the internet may be a useful tool to recruit and follow subjects in prospective cohort studies, suggesting good potential in terms of the range of hypotheses that could be evaluated and the breadth of the populations that could be reached in a cost-efficient manner.