Keywords

1 Introduction

In early 2021, the number of active Internet users worldwide reached 4.66 billion, nearly 60% of the world's population [1]. With this massive spread, the Internet has become one of the main sources of information, because of its ease of access and the large amount of information published on it through websites and social networks.

The health domain accounts for a large share of the information published on the Internet. Reportedly, over 70,000 websites specialize in providing medical information to online users, making the Internet an important data source [2]. Many other websites that do not specialize in health also provide information in this field. About 7% of daily searches on the Google search engine alone are related to health [3]. Some researchers argue that sharing health information between users is very important for improving consumers’ health experience [4].

Despite the many benefits of the availability of health information over the Internet, it also has a dark side: false and misleading information spreads easily, which may cause significant problems and lead to harm or even death [5, 6]. It has been reported that over 800 people died because of misinformation about Covid-19, and another 5800 were taken to hospital [5]. Others reported further effects of fake information and misinformation on communities, such as effects on economies, the marketing of untrusted products, and many others [6].

Due to the rapid spread of misinformation, the World Health Organization (WHO) indicated that the information epidemic regarding Covid-19 spread in parallel with the virus and contributed to deaths and injuries [7].

Because of these problems, many researchers have focused on finding the factors that determine how users trust websites that provide health information. Identifying and understanding these factors will enhance our ability to identify websites that provide misinformation and to gauge the level of trust in these websites, and thus may help decrease their spread and their effects on Internet users.

Our work aims to understand how users trust websites that provide health information, and which factors affect their trust. These factors and their importance can be used to build a trust model that measures the level of trust of websites and, consequently, of the information they provide. The work focuses on Covid-19 as a case study, because of the global concern about the massive amount of misinformation published in the last few years.

2 Related Work

Fake information is fabricated news or information published with the intent to deceive people in order to achieve the publisher's goals [9]. Such misinformation is published on social media, websites, and other information sources.

Several researchers have focused on measuring the factors of trust in websites that provide health information, to determine whether to publish information or not. Sillence et al. [10] used an online survey of 1123 users (625 USA, 498 UK) to find the indicators that affect user trust in websites that provide health information. The authors focused on four types of indicators in their questionnaire: personal experiences, credibility and impartiality, and privacy and familiarity. They found credibility and impartiality to be the main factors directly affecting user trust.

Gunther et al. [11], on the other hand, conducted experiments in which participants were asked to undertake specific tasks searching for health information and were then interviewed. Twenty-one users, who had previously searched for health information on the Internet, participated in this experiment. The experiments showed that most of the participants focused on several factors, including source or provider, website design, usability, language, and scientific appearance. However, no participant searched for information about the website or its parent organization. Similarly, the authors of [12] found that interface design and aesthetics influence user trust, in addition to the main factor they identified, familiarity with the website.

The study in [13] found that parents rely heavily on health information from the Internet to care for their children. The authors interviewed 15 parents of children aged between 1.5 and 21 years. To trust the information, parents compared information from different sources, such as other websites, experts, or other sources. Another study found that parents’ trust is affected by the title and description of the websites, but that they often did not consider the sources [14].

The study in [15] found that trust in online health services is affected by separate factors, the most important of which is trust in offline services (e.g., parent organization and medical team). These factors have a direct impact on users’ trust in online health services. The 93 participants in this study were users of a hospital's e-service. In [16], the authors found that the website’s origin affected users’ trust, in addition to other factors such as ease of use, familiarity, language, references, commercial interest, and others. Although the participants in the experiment reported in [17] expressed distrust of online health information in general, they noted that organizational authority and clear language affected their trust positively.

In the study [18], after interviewing a small group of people, the authors found differences between participants' opinions about the factors that affect their trust in online health information. In general, however, information style and website design were identified as factors that have an impact on trust. The study in [19] confirmed that people with little health experience differ in their opinions and rely on inaccurate factors to assess the “goodness” of health information provided on websites, so they identified the validity of the information less accurately than people with health experience. Yalin et al. [20] summarized 37 research papers in this field, published from 2000 to 2019, as a systematic literature review. They found many factors that affect users’ trust in online health information, with different levels of importance, such as trustworthiness, expertise, objectivity, familiarity, and others.

Trust is a difficult phenomenon to study, especially because people may not always be good at assessing the factors that really affect their perceptions of trustworthiness [22]. Also, using people’s perception to determine trust factors may not always provide very reliable results, because people often lie, although not necessarily consciously [22]. However, based on the above studies, understanding and determining how people perceive trust and how they make their trust judgements is a valid and noteworthy approach to trust modelling.

3 Our Approach: User Perception-Based Trust Model for Websites

This work aims to develop a trust model that calculates the level of trust of websites based on a number of factors and their importance. To identify the significant factors and their importance, we conducted a quasi-experiment on a set of pre-identified key factors, chosen as the most significant and most frequently reported in the literature, and a set of manually pre-selected websites, used as a gold standard against which to compare participant responses. This helps determine the importance of each factor based on users’ perception of how it affects their level of trust in websites and the information they provide.

3.1 Trust Factors

This section describes the factors identified for the study and the way each affects user trust. The factors were selected from the significant factors identified in the literature and from those relevant to the characteristics of the websites manually identified for the quasi-experiment.

The identified factors were divided into three types: those related to the characteristics of the website itself, to the content provider, and to the content itself. These factors are described below in more detail.

Website Quality:

  1. Website design: this falls under the main factor of aesthetics. First impressions have an important effect on user trust, and one of the main factors that shape the user's first impression is the website design [21]. Many previous studies mention website design as one of the main factors of user trust in websites that provide health information [11, 12, 18].

  2. Website performance: good performance helps a website meet user requirements and provide a good user experience, which raises user trust in the information it provides. Performance is a main factor in the success of a website and can be measured by response time, loading time, page size, and other measures.

  3. Website global rank: many algorithms are used to rank websites in search engines, such as PageRank, HITS, and others, but each measures the value of websites in a different way. We used the rank provided by SimilarWeb (footnote 1), because it calculates the popularity of websites by measuring the number of monthly unique visitors and the number of page views.

  4. Website domain: a string that defines a realm of administrative autonomy, authority, or control on the Internet. Some domains, such as certain top-level domains (e.g., .int, .edu, and others), are subject to special conditions. As reported, a website domain increases trust and prevents distrust [23].

Website Origin:

Privacy policy: a legal document that shows how an organization or website collects and uses users’ data. Many researchers reported that the privacy policy affects the level of trust users need to provide personal information [20].

Logo: research indicates a relationship between the logo and its familiarity to users, which decreases the need to verify the information provided on the website [10].

Parent organization type (profit, nonprofit): according to [20], websites published by nonprofit organizations that provide information about celiac disease have a higher rank on Google than websites published by commercial organizations. That may be because other websites view them as reliable sources of information [20].

Location of parent organization: Richard et al. [24] found that the location (domestic or international) of e-commerce organizations affects user trust in their websites. This research studies whether this finding applies to websites that provide health information.

Country of parent organization: many users are affected by the country of the website that provides health information. In [16], some of the participants reported that they trust information from America more than information from elsewhere.

Content Quality:

References: studies reported that citing references in the content, both online and offline, had a significant impact on increasing user confidence in online health information [11].

Scientific and official touch: scientific content can be recognized through several characteristics, the most important of which are good writing, i.e., writing correctness and style, and showing the author’s information, both of which increase the user’s trust in the information provided [11].

Multi-language: it has been reported that offering multiple languages helps increase trust in websites that provide information or services for large societies [20].

The exact factors considered for further study in our work, with their abbreviations and descriptions, are listed in Table 1 below.

Table 1. Trust affecting factors under study.

3.2 Quasi-experiment Design

After reviewing the literature, we developed a set of questions to measure the level of importance of the identified factors and to study how they are perceived by the participants in our experiment. The quasi-experiment used an online questionnaire of 12 questions that represented the selected trust factors. Each participant was required to complete the questionnaire separately for each of the pre-selected websites. Six websites were carefully selected by hand, ranging from highly trusted to low trusted, as described in more detail below.

First, we asked the participants to go through specific articles on each of the six websites that provided information about Covid-19. Second, they rated their level of trust in the website out of 10. Third, they answered 12 questions on a 5-point Likert scale, divided into three categories, one for each type of factor above: the characteristics of the website, the provider, and the content. These three steps were repeated for each of the six websites. We added the global rank, from SimilarWeb, to the description of every website as a potential indicator.

3.2.1 Websites Selection Criteria

The websites were manually chosen so that they carry different degrees of trust. Two websites have a high degree of trust: they are officially recognized worldwide as main sources of health information and belong to official organizations (e.g., WHO, Harvard University). Two other websites have a lesser degree of trust: one belongs to a commercial company in the health industry (e.g., Pfizer), and the other allows users to talk about their health experiences. The last two websites were chosen based on press reports indicating that they are not reliable [3]. After a careful manual check, we found that they publish some misleading articles that contradict WHO reports.

Ideally, selecting more than six websites would provide more accurate results, but because of the long time each participant needed to complete the experiment, six websites were deemed sufficient to measure the user perception of each factor’s importance. Table 2 lists the selected websites, ordered by degree of trust according to the opinion of the researchers.

Table 2. Websites for experiment

3.2.2 Participants Selection

To reach a more comprehensive view of the perception of common Internet users, participants should ideally include non-expert users. However, based on the results of previous studies [18, 19], users with a low level of health experience find it difficult to decide whether online health information is true or not. Therefore, to obtain more accurate results, more than half of the participants (56%) were people with work experience in the medical field. These participants included health experts from three national hospitals in two different cities. The rest of the participants were common users from different specialties (e.g., computer science (CS), teaching, and others). Figure 1 shows the work experience of the participants.

Fig. 1. Participants distribution.

3.2.3 Platform for Data Collection

We collected the data by designing the questionnaire as a Google Form, which made it possible to conduct the experiment online and provided ease of use and easy access. Additionally, because participants came from different locations, Google Forms enabled us to conduct the experiment remotely.

4 Experimentation and Results

This section describes the collected data and discusses the proposed model. The first step in the data analysis is to calculate the average degree of trust from participant responses for the six websites. As shown in Table 2, we found that most participants were able to distinguish highly trusted and medium trusted websites, but were less able to distinguish untrusted websites. Table 3 shows the websites and the average degree of trust for both health expert and non-health-expert participants. An ANOVA test shows that the results are statistically significant (p-value < 0.00001).

Table 3. Trust degree for all participants

To obtain more distinctive results between participant responses, the collected data was reduced to a 3-point Likert scale: disagree (strongly disagree and disagree), neutral, and agree (agree and strongly agree). All participant responses, for all factors and all six studied websites, are shown in Fig. 2 (for each factor, the responses of the 30 participants for the six websites are combined, giving 180 responses per factor). From participant responses, we identified the level of importance of each of the factors. We found that most of the participants consider the use of references, the location of the parent organization, and writing correctness (scholarship) very important for trusting health information provided by websites. On the other hand, most of them care much less about website performance (response/speed), privacy policy, and the global rank of websites.
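The reduction from the 5-point to the 3-point scale can be illustrated with a minimal sketch. The function name `collapse_responses`, the label spellings, and the sample answers are assumptions made for illustration only; they are not taken from the study's data or tooling.

```python
from collections import Counter

# Mapping from the 5-point scale to the reduced 3-point scale described above.
COLLAPSE = {
    "strongly disagree": "disagree",
    "disagree": "disagree",
    "neutral": "neutral",
    "agree": "agree",
    "strongly agree": "agree",
}


def collapse_responses(responses):
    """Count the responses for one factor per reduced category."""
    counts = Counter(COLLAPSE[r.strip().lower()] for r in responses)
    # Report all three categories, even when a count is zero.
    return {cat: counts.get(cat, 0) for cat in ("disagree", "neutral", "agree")}


# Hypothetical answers for a single factor (not data from the study).
sample = ["Strongly agree", "Agree", "Neutral", "Disagree", "Agree"]
print(collapse_responses(sample))
```

Applied to the 180 combined responses of a factor, the `agree` count would correspond to the number of positive responses for that factor.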

We found that the level of importance participants gave to the same factor differed distinctly across types of websites. For example, website design received less attention for trusted websites (rank 8), more attention for medium-trusted ones (rank 6), and the most attention for untrusted ones (rank 2). Table 4 shows the factors ordered by the degree of importance perceived by participants, from high to low, for websites categorized as trusted, medium-trusted, and untrusted (low-trusted).

To improve the accuracy of the results, we set a threshold and accepted only the factors that received more than 100 positive responses from participants (i.e., arbitrarily set to at least 75% of the highest total of Agree responses) to define user trust in websites that provide health information. From this, we can deduce that the most important factors to consider are (numbered from most to least important): 1: References, 2: Location of the parent organization, 3: Article writing (scholarship), 4: Country of the website, 5: Author information, 6: Website design, 7: Website domain.
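As a rough sketch of this selection step, the filter below keeps the factors whose Agree count exceeds the threshold and orders them by count. The function name `select_factors` and the counts are hypothetical; only the threshold of 100 comes from the text.

```python
def select_factors(agree_counts, threshold=100):
    """Return the factors with more than `threshold` Agree responses, most agreed first."""
    kept = {factor: n for factor, n in agree_counts.items() if n > threshold}
    return sorted(kept, key=kept.get, reverse=True)


# Hypothetical Agree counts out of 180 responses per factor (illustrative only).
agree_counts = {
    "references": 150,
    "website design": 120,
    "performance": 100,  # exactly 100 is not "more than 100", so it is dropped
    "privacy policy": 80,
}
print(select_factors(agree_counts))
```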

Fig. 2. Participants’ answers for all indicator questions.

Table 4. Arrangement of the factors based on the degree of acceptance of participants for websites divided into three categories

5 Proposed Trust Model

5.1 Trust Score Model

Based on the literature and our quasi-experiment, we developed a trust score model to assess the degree of trust of websites that provide health information. The importance of each factor is derived from the participants' opinion or perception of trust. For example, the results show that participants perceive the use of references as more important than the website domain. Thus, to compute the weight of each factor in our model from user responses, we use the following equation:

$$FW=\frac{FAG}{ALLAG} \tag{1}$$

where FW is the factor weight, FAG is the number of Agree responses for the factor, and ALLAG is the total number of Agree responses for all seven identified factors. The calculated weights, or trust scores, are shown in Table 5 (the values shown are approximate).
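Equation (1) can be illustrated with a short sketch. The function name `factor_weights` and the Agree counts are assumptions for illustration, not values from the study; the scaling to points out of 100 is likewise an assumed convention for presenting the weights.

```python
def factor_weights(agree_counts):
    """Compute FW = FAG / ALLAG for each factor, as in Eq. (1)."""
    all_agree = sum(agree_counts.values())  # ALLAG: Agree responses over all factors
    if all_agree == 0:
        raise ValueError("no Agree responses to normalize by")
    return {factor: fag / all_agree for factor, fag in agree_counts.items()}


# Hypothetical Agree counts for two factors (illustrative only).
weights = factor_weights({"references": 60, "website domain": 40})
# Expressed as points out of 100 (an assumed presentation of the weights).
points = {factor: round(100 * w, 1) for factor, w in weights.items()}
print(points)
```

Because every weight is divided by the same total, the weights always sum to 1, so the corresponding points sum to 100.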

Table 5. Trust score model

Based on the trust scores in Table 5, a fully trusted website is assigned 100 points and a fully untrusted website is assigned 0 points. To achieve additional accuracy, or a different weighting for each factor, additional conditions were considered when calculating the points for each factor in the model. The following conditions are derived from the obtained results and the literature:

  1. Reference score: websites that use fewer than 3 references achieve 5 points, those with 4–6 references achieve 10 points, and those with more than 6 achieve 16 points. Note that the reference score may change depending on the article.

  2. Location of parent organization score: the website must belong to an official organization. If the organization is global it achieves 15 points, if regional 10 points, and if local 5 points.

  3. Article writing (scholarship) score: the article must be written correctly, with few writing and grammar errors. With no errors it achieves 14.5 points, with 1–3 errors 10 points, with 4–6 errors 5 points, and with more than 6 errors 0 points. To find writing errors, we use the Grammarly extension for Google Chrome.

  4. Country of website score: the website's country must be among the top 20 countries by number of published scientific articles to achieve 14 points.

  5. Author information score: the website must show the article author's information to achieve 14 points.

  6. Website design score: to measure the user interface value, we use the WAVE tool (footnote 2), a web accessibility evaluation tool. To achieve 13.5 points, a website must have fewer than 20 Design Errors and Contrast Errors.

  7. Website domain score: to achieve 13 points, a website must be under a top-level domain with allocation restrictions (.int, .edu, .gov, .mil).

Accordingly, the proposed trust model is expressed as an algorithm that calculates the trust score. The developed algorithm is included below.

Algorithm 1.
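The seven scoring conditions listed above can be sketched in code as follows. This is an illustrative reading of those conditions, not the authors' algorithm: the `Website` container and its field names are hypothetical, and several choices are assumptions where the text is silent (exactly 3 references fall in the lowest tier, a website that does not belong to an official organization gets 0 location points, unmet yes/no conditions score 0, and Design Errors and Contrast Errors are counted together).

```python
from dataclasses import dataclass

RESTRICTED_TLDS = (".int", ".edu", ".gov", ".mil")
LOCATION_POINTS = {"global": 15.0, "regional": 10.0, "local": 5.0}


@dataclass
class Website:
    """Hypothetical container for the manually collected inputs."""
    hostname: str
    n_references: int
    official_org: bool
    org_scope: str          # "global", "regional" or "local"
    writing_errors: int     # e.g. as counted by a grammar checker
    country_in_top20: bool  # top 20 by published scientific articles
    shows_author: bool
    design_errors: int      # as reported by an accessibility checker
    contrast_errors: int


def factor_points(site):
    """Return the points per factor under the conditions listed above."""
    if site.n_references > 6:
        refs = 16.0
    elif site.n_references >= 4:
        refs = 10.0
    else:
        refs = 5.0  # assumption: exactly 3 references is treated as the lowest tier

    # Assumption: a non-official organization earns no location points.
    location = LOCATION_POINTS.get(site.org_scope, 0.0) if site.official_org else 0.0

    if site.writing_errors == 0:
        writing = 14.5
    elif site.writing_errors <= 3:
        writing = 10.0
    elif site.writing_errors <= 6:
        writing = 5.0
    else:
        writing = 0.0

    # Assumption: the two WAVE error types are summed before comparing with 20.
    design_ok = site.design_errors + site.contrast_errors < 20

    return {
        "references": refs,
        "location": location,
        "scholarship": writing,
        "country": 14.0 if site.country_in_top20 else 0.0,
        "author": 14.0 if site.shows_author else 0.0,
        "design": 13.5 if design_ok else 0.0,
        "domain": 13.0 if site.hostname.lower().endswith(RESTRICTED_TLDS) else 0.0,
    }


# Hypothetical website that satisfies every condition.
best = Website("www.example.int", 7, True, "global", 0, True, True, 3, 4)
print(sum(factor_points(best).values()))  # 100.0
```

The maximum points per factor (16, 15, 14.5, 14, 14, 13.5, 13) add up to 100, which matches the 100 points of a fully trusted website.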

6 Validation and Testing

To validate the developed trust score model, the algorithm is first applied manually to the same six websites used in the quasi-experiment, to compare the results of the trust score model with the results obtained from the participants. The purpose is to evaluate the model manually against the defined gold standard, whose trust degrees approximate the opinion of the researchers and of the participants in the quasi-experiment. To calculate the trust score for a website, we use the following equation:

$$TS(ws)=\frac{\sum FW(ws)}{10} \tag{2}$$

where TS is the trust score for website ws, and FW is the factor weight for website ws, summed over the seven identified factors. FW for each of the factors is calculated according to the conditions set in Algorithm 1.
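A minimal sketch of Eq. (2) follows. The function name `trust_score` and the dictionary keys are illustrative; the point values are the per-factor maxima stated in the scoring conditions. Dividing the 100-point total by 10 presumably puts the score on the same 0 to 10 scale that participants used to rate trust, although the text does not state this explicitly.

```python
def trust_score(factor_points):
    """Compute TS(ws) = (sum of FW(ws) over the factors) / 10, as in Eq. (2)."""
    return sum(factor_points.values()) / 10


# Maximum points per factor, as stated in the scoring conditions.
full_marks = {
    "references": 16,
    "location": 15,
    "scholarship": 14.5,
    "country": 14,
    "author": 14,
    "design": 13.5,
    "domain": 13,
}
print(trust_score(full_marks))  # 10.0
```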

The resulting scores generated by the model are then compared to the researchers’ opinion in Table 2 and to the opinion of the health expert participants only, i.e., those with health experience in Table 3 (health experts are selected to increase the accuracy of the outcome). The results are shown in Table 6. As shown, the proposed model achieves relatively good results with an acceptable error rate. It evaluates websites and their information and distinguishes between them effectively, i.e., it discovers reliable and unreliable websites.

Table 6. Validation and testing result

7 Limitations and Future Work

The authors recognize that this research has some limitations, but believe that it provides a good basis for an approach that helps users identify websites that provide misinformation. We note some limitations that may constitute threats to the validity of the results. Firstly, our result is based on human judgment. This implies that people's opinions may differ and that they may focus on different factors when trusting websites that provide health information; the results also may not scale to websites providing different types of information. Secondly, a larger number of websites to assess and a larger sample of participants in the quasi-experiment would have improved accuracy and reduced error. However, finding a large number of health experts willing to spend a long time as participants can be a real challenge.

As future work, scaling up the research and the evaluation of the trust model to a larger number of participants and to larger and different types of information and websites would make it possible to create a scalable solution. Implementing the trust model algorithm as an automatic computation in an Internet browser, e.g., as a plug-in, may prove a valuable tool for helping users filter out invalid information.

8 Conclusion

This paper developed an approach to identify online sources that provide medical misinformation. It developed a trust model that provides trust scores for websites. The model assesses the degree of trust of websites that provide health information, using trust factors identified from user perception of how users trust information provided by online sources or websites. To achieve this, 12 relevant factors were identified from the literature, and a quasi-experiment was conducted to derive user-perceived opinions of the importance of the factors that affect the level of trust in websites providing health information, using Covid-19 as an exemplar.

The results identified 7 factors, out of the 12 studied, that are most important to users in determining trust in health websites. Using these results, a trust model was developed to calculate a trust score for websites. The developed model was manually validated against a set of gold standard websites and health expert opinion. As shown, the proposed model achieves good results, with an acceptable error rate of 15%–19%.