Introduction

In the digital age, health information is often cited as a resource necessary for staying well, preventing and managing disease, and making other decisions related to health and health care. It provides a rationale for guiding appropriate health behaviors, treatments, and decisions. As such, growing numbers of consumers are connecting to the Internet to seek out health information. There is currently, however, no standardized mechanism to ensure equal access to the benefits the Internet can offer. It has been argued, in fact, that in the United States at least, a series of technology initiatives in the late 1990s largely eliminated Web access disparities, or the “Digital Divide” [1]. Yet little evidence is available to accurately identify the size and makeup of the digitally underserved consumer population, which may still lack equal access to health information. To address the Divide, health care managers and policymakers will soon need to incorporate Web resources into patient education and shared decision-making programs, as part of a strategy or intervention to help underserved groups gain access to information relevant to their health care.

Studies in this area to date have focused mainly on comparisons of the use or nonuse of the Internet for finding general health information. Cotten and Gupta [2], for example, examined differences in characteristics between online health information seekers and individuals who search for health information from offline sources. They argued that the primary characteristics of online and offline health information seekers must be known to better recognize their needs, highlight improvements in information quality and availability, and understand factors that discriminate between those who seek online vs. offline health information. Their study examined factors that differentiated online and offline health information seekers (n = 385) from the 2000 General Social Survey, and found that the majority of both online and offline health information seekers still relied upon health care professionals as a source of health information. They hypothesized that key factors, such as age, income, and education, act to discriminate between online and offline health information seekers, suggesting that general “Digital Divide” characteristics influence where health information is sought. Healthier consumers were also less likely than others to look exclusively offline for health information.

In a related study, Skinner et al. [3] noted the existence of inconsistent Internet access measures, at least among early adopters of the technology. Their study used an inductive qualitative research design, employing 27 focus groups in Ontario, Canada, and examined young Internet users’ perspectives on the value of the Internet to obtain health information. They found that variation in the quality of Internet access influenced young people's ability to obtain health information and resources. They proposed that quality of Internet access was affected by four key factors: privacy, gate-keeping, timeliness, and functionality, and noted that privacy was particularly relevant to these young people in getting access to sensitive health information (e.g. sexual activities). Variation in Internet access also impacted participation in mutual support or social networks, and affected gaining access to specific health questions. Their results highlighted the limitations of using Internet penetration statistics alone as a measure of access, suggesting the need to improve measures of access in order to fully evaluate the potential of e-health. This is imperative for addressing the digital divide affecting populations both within countries, and globally between countries. Though their limited sample size (210) and relatively young population limit generalizability, findings here highlight the persistence of information access barriers, even for Internet-savvy users.

In a contrasting study, Kakai [4] identified disparities in access to information between Asian and Caucasian groups. Using a qualitative correspondence analysis of cancer patients in Hawaii, they focused on differences in ethnicity and educational levels and health information selection. They identified three clusters of health information pertinent to three ethnic groups: Caucasian, Japanese, and Asian non-Japanese. The results of this study revealed that Caucasian patients preferred objective, scientific, and updated information obtained through medical journals or newsletters from research institutions, telephone information services, and the Internet. Japanese patients relied on media and commercial sources, including television, newspapers, books, magazines, and CAM providers. Non-Japanese Asians and Pacific Islanders used information sources involving person-to-person communication with their physicians, social groups, and other cancer patients. Higher educational levels were associated with a preference for objective, scientific, and updated health information, while lower education was associated with personally communicated information. These ethnocentric patterns of health information preference remained relatively stable at different educational levels, implying that patients’ ethnicity influenced information preference more than education did. They suggested that these differences highlighted the importance of recognizing culturally developed world views when seeking to understand patients’ health information seeking behavior.

These and related parallel studies have sought to identify the existence or extent of a digital divide, or alternatively have focused on differences within specific technology applications. We suggest, however, that given the persistence of a digital divide for health consumers, there is a need to examine distinctions within the digitally underserved groups, using targeted strategies tailored to the needs of subpopulations, rather than attempting to categorize the digital gap as a single entity. As such, consumer subgroups and their differential information behaviors need to be delineated and examined in greater detail.

Overall, few of the above have investigated, in depth, whether demographic subgroups had a differentiated pattern of access to and use of the Internet, and if so, why and to what extent.

Research question

The purpose of this study was to identify (1) the demographic characteristics of online health information seeking, (2) the main factors motivating or impeding Internet users to seek health information, and (3) the relationships of health information seeking activity with other online activities.

Method

Sample and variables

Data for this study were obtained from the 2002 Tracking Survey Data of the Pew Internet and American Life Project (N = 2463), with details reported elsewhere. Of the interviewees, 1494 (60.66%) reported that they had ever gone online to access the Internet or World Wide Web. These 1494 interviewees were defined as “Internet users” in the study (N Internet users = 1494). The Internet users were further classified into health information seekers (N health information seekers = 987) and non-health information seekers (N non-health information seekers = 506). The health information seekers were defined as Internet users who, at any time, sought any kind of health information on the Web. To identify the demographic characteristics of online health information seekers, 13 categorical variables were selected. The variables were gender, age, education level, race and ethnicity, Internet experience, the frequency of Internet access, marriage status, income level, Internet connection method, self-reported health status, health service utilization within 1 year, health conditions (chronic disease or disability) of interviewees, and health conditions of dependents. Each variable was coded categorically; for example, (1) gender as male/female and (2) race and ethnicity as White, African, Asian, or Hispanic. To facilitate further analysis of health information seekers, relationships among health information types and their likelihood to seek online health information were also included in the study.

Ten online activities were included in the study. The activities were (1) send or read e-mail; (2) get news online; (3) look for information about a service or product you are thinking about buying; (4) look for health or medical information; (5) buy a product online, such as books, music, toys, or clothing; (6) buy or make a reservation for a travel service, like an airline ticket, hotel room, or rental car; (7) participate in an online auction; (8) make a donation to a charity online; (9) look for information about domestic violence; and (10) go to web sites that provide information or support for people interested in a specific medical condition or personal situation. An activity was coded as “yes” if the interviewee reported ever having done it online. All “no response” or “refused” data were excluded from the analysis in order to minimize variation.

Data analysis

Chi-square tests, logistic regression, and a pairwise distance matrix (p-distance) with the unweighted pair group method with arithmetic mean (UPGMA) were employed to identify the characteristics of online health information seekers, to predict the probability of being a seeker, and to investigate the similarity/dissimilarity among the 10 online activities, respectively. The data analysis for this paper was generated using SAS software, Version 8.02 of the SAS System for Windows. UPGMA was performed with MEGA2 (version 2.1) [5].

As a preliminary step, chi-square tests were conducted to examine the independence between categorical variables and the homogeneity of probability of each cell in a contingency table. In the independence test, the null hypothesis (H0) was that health information seeking activity is independent of a categorical variable (x). The alternative hypothesis (Ha) was that health information seeking activity is dependent on a categorical variable (x). In the homogeneity test, H0 was that the ratio of classes in a categorical variable (x) is homogeneous, and Ha was that the relative ratio is significantly heterogeneous. Pearson chi-square tests and Mantel–Haenszel chi-square tests were applied to test these conditions. Logistic regression with a binary response (yes/no) was then employed to predict the probability of health information seeking and to estimate the factors which influence health information seeking activity. If an Internet user ever sought health information on the Web, 1 was assigned to the response variable, and 0 otherwise. The fitted response therefore indicates the probability of being a health information seeker at different classes of the categorical variables. In addition, the estimated coefficients and odds ratios of each predictor provided statistical evidence to identify the main factors affecting health information seeking activity.
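The independence test described above can be sketched for the simplest case, a 2 × 2 table. This is a minimal illustration, not the SAS procedure used in the study: the counts are hypothetical (they are not survey data), and the function name `pearson_chi_square_2x2` is an assumption introduced here.

```python
import math

def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (row and column variables independent)
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical counts, NOT taken from the survey:
# rows = female / male, columns = seeker / non-seeker
example_table = [[60, 40], [40, 60]]
statistic, p_value = pearson_chi_square_2x2(example_table)
print(f"chi-square = {statistic:.4f}, p = {p_value:.4f}")
```

A p-value below 0.05 would lead to rejecting H0 at the level used in this study. Tables with more than two classes per variable need the general chi-square distribution, which this sketch does not cover.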

The p-distance matrix and UPGMA tree diagrams were also employed here to measure and visualize the similarity/dissimilarity among online activities. In the online activity analysis, the p-distance between activities A and B was calculated by counting the number of interviewees reporting a contrary response to activities A and B (activity A is “yes” but activity B is “no,” or activity A is “no” but activity B is “yes”). The counted number was divided by the total number of interviewees responding that they ever do activity A and/or activity B. That is, the p-distance in this study is the rate of different answers (yes or no) for a pair of activities. A smaller p-distance for a pair A and B indicates that a high proportion of interviewees do both activities A and B, so the two activities are more similar than other pairs of activities whose p-distance is larger. On the basis of this similarity/dissimilarity, a UPGMA tree was constructed.
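The p-distance definition and the UPGMA clustering step can be sketched as follows. This is a minimal stand-in for the MEGA2 analysis, written under the reading that the denominator counts respondents who answered “yes” to at least one of the two activities (under that reading the measure coincides with the Jaccard distance). The response vectors, activity labels, and function names are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Share of discordant yes/no answers among respondents doing A and/or B."""
    discordant = sum(1 for x, y in zip(a, b) if x != y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return discordant / either if either else 0.0

def upgma_merge_order(dist, names):
    """Return UPGMA merges as (cluster_a, cluster_b, distance) tuples."""
    clusters = {name: (name,) for name in names}  # label -> member leaves
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        left, right = sorted(pair)
        merges.append((left, right, d[pair]))
        new_label = f"({left},{right})"
        n_left, n_right = len(clusters[left]), len(clusters[right])
        new_d = {}
        for other in clusters:
            if other in pair:
                continue
            # Size-weighted mean equals the arithmetic mean over all leaf pairs
            new_d[frozenset((new_label, other))] = (
                n_left * d[frozenset((left, other))]
                + n_right * d[frozenset((right, other))]
            ) / (n_left + n_right)
        d = {k: v for k, v in d.items() if not (k & pair)}
        d.update(new_d)
        clusters[new_label] = clusters.pop(left) + clusters.pop(right)
    return merges

# Hypothetical yes(1)/no(0) answers from six respondents, NOT survey data
responses = {
    "email":   [1, 1, 1, 1, 1, 0],
    "news":    [1, 1, 1, 1, 0, 0],
    "auction": [0, 0, 1, 0, 0, 1],
}
names = list(responses)
dist = {
    frozenset((a, b)): p_distance(responses[a], responses[b])
    for a, b in combinations(names, 2)
}
merges = upgma_merge_order(dist, names)
for left, right, height in merges:
    print(f"merge {left} + {right} at p-distance {height:.4f}")
```

In this toy data the two activities with the most overlapping “yes” respondents are joined first, mirroring how similar activities end up on neighboring branches of the tree.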

Results

Demographic characteristics of online health information seekers

Chi-square tests provided evidence that health information seeking activity is associated with gender (p < 0.0001), age (p < 0.0001), race and ethnicity (p = 0.0386), Internet experience (p = 0.0001), the frequency of Internet access (p = 0.0484), marriage status (p < 0.0001), health service utilization within 1 year (p < 0.0001), and health conditions of interviewees (p < 0.0001) at the 0.05 level. On the other hand, education level (p = 0.2069), income level (p = 0.1383), Internet connection methods (p = 0.0940), self-reported health status (p = 0.1837), and health conditions of dependents (p = 0.0976) were not significantly related to health information seeking activity (Table 1).

Table 1 Association between health information seeking activity and selected demographic variables
Table 2 Association between health information seeking activity and selected demographic variables

Frequency of health information seeking

The majority (76.30%) of health information seekers reported that they searched for health or medical information only once every few months, or less often. Only 2.12% of the seekers searched for health information every day. The frequency of health information seeking was statistically associated with education level (p = 0.0012), Internet experience (p = 0.0278), frequency of Internet access (p < 0.0001), marriage status (p = 0.0410), health service utilization within 1 year (p = 0.0410), and health conditions of dependents (p = 0.0100). In particular, Internet users with higher education levels (college graduate) and chronic disease patients were more likely to seek health information on the Web every day (Table 2).

Searching on behalf of another

Over one-half (51.71%) of health information seekers searched for health information on behalf of someone other than themselves. The tendency was significantly associated with gender (p = 0.0269), age (p = 0.0345), self-reported health status (p < 0.0001), health service utilization (p = 0.0006), and health condition of interviewees (p < 0.0001). A Mantel–Haenszel chi-square test demonstrated that the willingness to search for information for someone else became stronger when the Internet users were female or highly educated (some college), or when the self-reported health status was favorable (excellent or good), as shown in Table 3.

Table 3 Association between information beneficiary (self or another) and selected demographic variables

The impact of the internet on health

A total of 77.30% of health information seekers reported that the Internet had improved their health or medical information and services. A similar favorable response was seen across all categorical variables. In particular, the frequency of Internet access, the Internet connection method, and the frequency of health information seeking were strongly associated with a similarly positive response at the 0.05 significance level (p = 0.0048, 0.0126, and <0.0001, respectively). Internet users who accessed the Internet daily, those who connected through DSL or cable, and those who frequently sought health information on the Web were more likely to report that the Internet had a favorable impact on their health and medical information and services (Table 4).

Table 4 Association between favorable response regarding health information improvement and selected demographic variables

Main factors motivating/impeding internet users to seek health information

Using stepwise logistic regression, our model identified that gender (p < 0.0001), age (p = 0.0023), Internet experience (p = 0.0126), frequency of Internet access (p = 0.0053), and the health conditions of interviewees (p = 0.0054) were significantly associated with the binary response (yes/no) of health information seeking activity at the 0.05 level. Assuming the five categorical variables were independent of each other, the logit regression model is summarized by the equation

$$\begin{aligned}
E[Y] &= \frac{e^{X}}{1 + e^{X}} = \left(1 + e^{-X}\right)^{-1} \\
X(\text{yes} = 1) &= 0.5730 - 0.7051\,x_{\text{gender}\,(\text{male})} \\
&\quad - 0.3964\,x_{\text{age}\,(18\text{--}29)} + 0.2181\,x_{\text{age}\,(30\text{--}49)} \\
&\quad - 0.7774\,x_{\text{experience}\,(<1)} - 0.2526\,x_{\text{experience}\,(2\text{--}3)} \\
&\quad + 0.8639\,x_{\text{frequency}\,(\text{daily})} + 0.6336\,x_{\text{frequency}\,(\text{weekly})} \\
&\quad + 0.8005\,x_{\text{health condition}\,(\text{yes})}
\end{aligned}$$
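As a worked illustration of the fitted equation, the sketch below evaluates E[Y] for two respondent profiles. It assumes each x is a 0/1 indicator with the omitted classes (female, age 50 or more, longer Internet experience, less frequent access, no health condition) serving as the reference; the source does not state the coding scheme, so the resulting probabilities are illustrative only. The dictionary keys and function name are hypothetical labels.

```python
import math

# Intercept and coefficients copied from the fitted equation above.
# Key names are hypothetical labels chosen for this sketch.
INTERCEPT = 0.5730
COEFFICIENTS = {
    "gender_male": -0.7051,
    "age_18_29": -0.3964,
    "age_30_49": 0.2181,
    "experience_lt_1": -0.7774,
    "experience_2_3": -0.2526,
    "frequency_daily": 0.8639,
    "frequency_weekly": 0.6336,
    "health_condition_yes": 0.8005,
}

def predicted_probability(active_indicators):
    """E[Y] = 1 / (1 + exp(-X)), where X sums the active 0/1 indicators."""
    x = INTERCEPT + sum(COEFFICIENTS[name] for name in active_indicators)
    return 1.0 / (1.0 + math.exp(-x))

# Profile A (assumed 0/1 coding): reference classes except daily access
# and a reported health condition
p_a = predicted_probability({"frequency_daily", "health_condition_yes"})

# Profile B: male, aged 18-29, less than 1 year of Internet experience
p_b = predicted_probability({"gender_male", "age_18_29", "experience_lt_1"})

print(f"Profile A: {p_a:.3f}")
print(f"Profile B: {p_b:.3f}")
```

Under these assumptions the first profile has a much higher predicted probability of seeking health information online than the second, consistent with the signs of the coefficients.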

The estimated coefficients in the above regression equation illustrate that health condition of the respondent is the most crucial factor motivating health information seeking, and lack of Internet experience (less than 1 year) was the strongest predictor of impeded health information seeking (Table 5). The overall model displayed a Wald statistic equal to 58.7412, significant at the 0.05 level (p < 0.0001). The max-rescaled R² was 0.0868.

The logit regression analysis provided the estimated odds ratio for each predictor. The odds ratio illustrates how much the odds increase or decrease per unit change of the associated predictor, with all other predictors held constant. The odds ratio for the gender coefficient (male versus female) was 0.4941, with a 95% confidence interval of [0.3702, 0.6593]. This suggests that the odds of seeking health information on the Web were about one-half as high for male Internet users as for female Internet users. The odds ratios for age demonstrated that younger (18–29) users were 0.6728 times as likely to seek health information as those aged 50 years or more (at p < 0.05). The odds ratios of middle-aged users (30–49) and older users (over age 50) were observed to be 0.8903 and 1.7373, respectively (at p < 0.05). There was little difference in health information seeking activity between the two age groups. When examining Internet use experience, users who had less Internet experience (less than 1 year) rarely sought health information (monthly or less); those without any disability, handicap, or chronic disease were also less likely to seek health information on the Web (at p < 0.05), as seen in Table 6.
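As a check on how the coefficients and odds ratios relate, the gender odds ratio can be recovered by exponentiating the gender coefficient from the fitted equation, assuming that coefficient is the log-odds of male relative to female respondents:

$$\mathrm{OR}_{\text{male vs. female}} = e^{\hat{\beta}_{\text{gender}\,(\text{male})}} = e^{-0.7051} \approx 0.4941$$

This agrees with the reported value of 0.4941.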

Discussion

As seen here, increased health information seeking was associated with gender, age, race and ethnicity, as well as level of Internet experience, frequency of Internet access, marriage status, and health conditions of interviewees. On the other hand, education level (p = 0.2069), income level (p = 0.1383), Internet connection methods (p = 0.0940), self-reported health status (p = 0.1837), and health conditions of dependents (p = 0.0976) were not significantly associated with greater health information seeking activity. Search behavior therefore varies depending on type of information sought, reasons for searching, and experience levels. More importantly, the results further demonstrate that segments of the population remain underserved in their search for health information.

Table 5 Logistic regression: analysis of maximum likelihood estimates
Table 6 Estimated odds ratios in logistic regression model

Few studies to date have investigated, in depth, whether certain populations have disparate levels of access to and use of the Internet, and if so, why and to what extent. A few have suggested that use of the Internet in health care can bring various benefits, such as improved equity in access to health information, effective dissemination of new information, enhanced communication, and closer interaction between patients and physicians [6–12]. These did not examine, however, specific information-seeking behaviors, and the degree to which they are associated with specific subgroups.

Implications

From both a managerial and policy perspective, we emphasize here and elsewhere [13] the degree to which income levels influenced distribution patterns and diffusion trends in access to computers, the Internet, and online health information. Use of computers was strongly associated with the level of income in both 2000 and 2002, despite late 1990s government programs designed to eliminate the digital divide. Adults with lower incomes had fewer opportunities to use computers than those with higher incomes. Most of the low income groups never or only occasionally used computers, whereas the use of computers by medium and high income groups was over 80% in 2000 and increased in later years. Computer utilization rates of low, medium, and high income populations increased by 3.8%, 5.6%, and 4.1%, respectively, between 2000 and 2002. The digital divide between low and high income populations thus persists and has not narrowed, although the overall availability of computers and Internet access in the United States has increased somewhat. It appears, then, that national initiatives aimed at reducing the digital divide have had little effect in providing low income adult populations with opportunities to use computers. Further study, focused on similar underserved groups, is needed to ascertain the effectiveness of computer training or community-based computer centers for adult populations overall.

We suggest that to ensure all groups enjoy the potential benefits of the Internet, it is necessary to first identify characteristics of all underserved subpopulations who may lack equal access to health information. On the basis of this identification, new strategies and interventions are needed to target underserved populations and help them develop search skills relevant to their health care.

Conclusion

For consumers of health information, the technology initiatives of the late 1990s across the United States appear to have had little effect in eliminating the digital divide. As the National Health Information Infrastructure begins to take shape, access to information will be an essential part of the consumer-centric, shared decision-making framework outlined in the NHII strategic plan.