Background

In 2003, the National Assessment of Adult Literacy Survey reported that older adults, some minority groups, and adults with chronic health problems are more likely to have limited literacy skills [1]. These groups are also at higher risk for cancer and are more likely to need cancer-related information. Therefore, it is important to assess factors that influence comprehension of cancer education materials.

Focusing exclusively on “reading level” to evaluate print and Web-based cancer education materials for US adults is not enough; this strategy misses important factors that can influence reading comprehension [2–6] and perpetuates the gap between what we know we should do and what we actually do when developing cancer education materials. A systematic review of “readability and comprehension” instruments found limitations in those most commonly used, specifically the inability to take into account sentence structure, prior knowledge, and the effects of illustrations [2]. Thus, materials meeting a recommended reading level, e.g., sixth grade, may not be as comprehensible as the developers might believe; conversely, materials written above the arbitrary sixth-grade cutoff are not necessarily too difficult for low-literacy audiences. Other researchers also suggest that layout [3, 7–9], use of graphics and illustrations [3, 8, 10, 11], learning stimulation and motivation [3, 6], and cultural appropriateness [3, 9] may improve reading comprehension and a patient’s ability to apply health information.

“Suitability” as defined by Doak, Doak, and Root covers six categories of these factors: content, literacy demand, graphics, layout/typography, learning stimulation and motivation, and cultural appropriateness. The purpose of this review is to describe the use of instruments to evaluate categories of suitability in cancer education materials in the published literature. This review will (1) describe the instruments used to assess at least one category of suitability beyond reading level, (2) identify which categories are most frequently measured, and (3) summarize the findings.

Methods

Inclusion Criteria

We used Doak et al.'s categories of suitability [3] to create the search strategy for published studies reporting assessments of cancer-specific education materials for categories of suitability. Studies were included if they reported original research, were published in English, and evaluated print or Web-based cancer-specific education material. Studies were included in the final sample if they measured reading level and at least one category of suitability.

Search Strategy

The search was designed by health sciences research librarians and last updated in June 2009. Ovid Medline, EBSCO CINAHL, and Ovid PsycINFO were searched for peer-reviewed articles published since 1996 using combinations of terms for the categories of suitability; measure, assessment, formula, or evaluate; patient education; forms of print materials; and cancer. The search was limited to the English language (the full search strategy and flowchart of article selection are available from the corresponding author). We also reviewed reference lists of selected articles and the Harvard School of Public Health’s Literacy Studies website (http://www.hsph.harvard.edu/healthliteracy/materials.html) for additional assessment instruments and materials.

Study Selection

Titles and abstracts of studies found through the search were reviewed independently for inclusion by three authors (RF, TF, SKL), who then reviewed the full text of articles deemed possible candidates for inclusion in the review. The authors discussed discrepancies until agreement was reached.

Coding

Studies were coded for study purpose, intended audience, study design, the number and cancer focus of materials assessed, Web or print form, and findings for reading level and suitability categories. One author (RF) searched references and sent email requests for evaluation instruments to the corresponding author of each included study. Instruments were coded for purpose, categories measured, results, and any reported information about reliability and validity. Findings for each instrument were abstracted based on instrument-specific terms. For instance, studies using the Suitability Assessment of Materials (SAM) reported results as “superior”, “adequate”, or “not suitable”. For studies not using the SAM, we summarized the score for each category of suitability as “not suitable” or “adequate” based on the original study results.

Two coders abstracted data from the full text of each eligible article. One author (RF) coded all of the selected studies and two additional coders (TF and VG) independently coded half of the selected studies. Discrepancies in coding were resolved by consensus; the data were entered into an Access database (Windows 97-2003).

Results

Identification and Description of Eligible Studies

The database searches yielded 636 unique potentially eligible articles. We discarded 618 after reviewing titles and abstracts. Two authors received the full text of the remaining 18 articles. Studies were excluded if they did not formally assess the written materials [12, 13] or if they only evaluated reading level or categories outside the scope of this review, e.g., accuracy and usability [14–18]. Eleven articles met our inclusion criteria; together, they evaluated 432 pieces of cancer education material (262 print, 170 webpages, not necessarily unique; Table 1).

Table 1 Study characteristics

Overview of Instruments

Seven instruments were used to assess categories of suitability (Table 2). Four assessed surface structure elements of cultural appropriateness (appropriate graphics, language, and physical appearance) [3, 19–21]. Three assessed use of illustrations or graphics outside of cultural appropriateness and literacy demand beyond reading level [3, 19, 22]. Four assessed content (evident purpose, inclusion of behavioral guides, scope of the material, and inclusion of summary of information) [3, 19, 21, 22], while one assessed an additional category: accuracy of the information [20]. Five assessed layout and typography [3, 19–22], and four assessed learning stimulation and motivation (quizzes and suggested action) [3, 22, 23]. Seven studies assessed inter-rater reliability of instruments [19, 21, 22, 24–27]. Two of the seven instruments were described as having any evidence of validity [3, 22]. The following sections describe the results of each of the studies, grouped according to the instrument used.

Table 2 Instrument characteristics

Suitability Assessment of Materials

The three studies using the SAM assessed all six categories of suitability. Two of these studies focused on prostate cancer print materials and found most materials to be “adequate” or “superior” [25, 26]; in contrast, the study of colorectal cancer webpages [24] found a majority of webpages to be “not suitable”. The most commonly failed factors for both print and Web-based materials were (1) presentation of information in a behavior-related context (giving patients behavioral guides), (2) summary of key ideas, (3) use of illustrations, (4) use of interactive features (e.g., quizzes), and (5) use of culturally appropriate visuals (specific data not shown).
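The three SAM ratings can be made concrete with a short sketch. It assumes the scoring scheme commonly described for the SAM: each factor is rated 0, 1, or 2; factors that do not apply are dropped; and the resulting percentage is rated superior (70 to 100%), adequate (40 to 69%), or not suitable (0 to 39%). These cutoffs and the function name `sam_rating` are assumptions for illustration and are not taken from the studies reviewed here.

```python
def sam_rating(factor_scores):
    """Return (percent, label) for a list of SAM factor scores.

    Each entry is 0, 1, or 2, or None when the factor does not apply.
    The cutoffs below are assumed, not reported in this review.
    """
    rated = [s for s in factor_scores if s is not None]
    if not rated:
        raise ValueError("at least one applicable factor is required")
    # Percentage of the maximum possible score (2 points per rated factor).
    percent = 100 * sum(rated) / (2 * len(rated))
    if percent >= 70:
        label = "superior"
    elif percent >= 40:
        label = "adequate"
    else:
        label = "not suitable"
    return percent, label


# Hypothetical material: 22 factors, one of which does not apply.
scores = [2] * 10 + [1] * 10 + [0, None]
percent, label = sam_rating(scores)
print(f"{percent:.1f}% -> {label}")
```

Because not-applicable factors are removed from both the numerator and the denominator in this sketch, a print brochure and a webpage can be rated on the same percentage scale even when different factors apply to each.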

Suitability and Comprehensibility Assessment of Materials (SAM + CAM)

Helitzer and colleagues [22] modified the SAM by adding other variables that address comprehensibility: presentation of numerical information, use of behavioral theory, and use of framing and tone of messages. They used this modified version to assess the comprehensibility of webpages, newspaper and magazine articles, health education brochures, and revised insurance and health system forms. These reviewers found most cervical cancer and human papillomavirus materials were "adequate" (68%) or "superior" (20%). Webpages and health education brochures were ranked in the "superior" category more often than other materials.

Readability Assessment Instrument

The study using the Readability Assessment INstrument (RAIN) to evaluate cancer brochures [19] reported that all ten assessed materials met RAIN criteria for pronoun references, connectives, unity, color, and highlighting of titles and subtitles. Brochures varied in terms of using signaling devices, print style, and adjunct questions. Few brochures used substitutions or illustrations. None of the assessed brochures met RAIN criteria for sentence structure, audience appropriateness, writing style, use of illustrations, or print size.

Cultural Sensitivity Assessment Tool

The Cultural Sensitivity Assessment Tool (CSAT) was developed to assess the cultural appropriateness of materials intended for African Americans, which is still its primary use [20]. The four studies using CSAT evaluated materials intended for racial and ethnic populations. One of the reviews found that the print materials focused on breast cancer were not culturally sensitive [28]. Another found just over half of the print materials on breast cancer were culturally sensitive, whereas the majority of print materials on prostate cancer were not [29]. The third review found some prostate cancer prevention webpages scored in the culturally sensitive range despite “not mentioning high-risk racial or ethnic groups” [30]. The fourth study only found two materials (specifically written for African Americans) to be culturally sensitive [31]. Most materials were reported to have low scores in the visual category because they did not present images of the intended racial or ethnic group.

Cultural Sensitivity Checklist

Two studies included in this review used the Cultural Sensitivity Checklist (CSC) in addition to the CSAT to assess what Resnicow and colleagues [32] classify as deeper aspects of cultural sensitivity, such as beliefs and perceptions about dying, symbolic representations of health and illness, traditional medicine, latent messages, and themes [23]. Both studies [30, 31] reported that fewer than 25% of reviewed materials addressed (1) perceptions of cancer risk within the racial or ethnic group, (2) cultural beliefs about health, or (3) traditional or alternative medicine as methods of cancer prevention or treatment.

Bloch’s Ethnic/Cultural Assessment

One study [27] adapted four questions from Bloch’s Ethnic/Cultural Assessment Guide [33] to assess the cultural appropriateness of information targeted to ethnic/racial groups on CancerNet’s webpages: (1) Did the written information identify or target particular groups? (2) Did the written information contain statements about the target groups’ beliefs toward life, death, and illness? (3) Was the written information presented in the language(s) of the target group(s)? (4) Did the written information address cultural healing systems or practices? The results showed that although the materials targeted seven different ethnic groups, information for each ethnic group was presented in the same way without regard to culture. The only exception was that materials targeted to Hispanics were available in Spanish and English.

Masset’s Checklist

The study using Masset’s Checklist found that the majority of the 26 assessed materials did not meet recommended criteria in any area [21]. Eighty-one percent of materials did not repeat key messages. None of the materials included a review section. Most materials (85%) did not define unfamiliar terms. Materials with graphics consisted of simple line drawings containing “extraneous background detail” (84%) and did not represent Hispanic persons (the intended audience) or settings. Typographic factors (font size, white space, bulleting format, and all caps) were generally outside of recommended guidelines.

Discussion

The purpose of this systematic review was to describe instruments that measure categories of suitability and summarize their published results in evaluating print and Web-based cancer education materials. Among the 11 included studies, we found seven distinct instruments that were used to evaluate as many as 432 cancer education materials. These instruments most frequently assessed the cultural appropriateness of the materials, and most materials failed the criteria for this category. Most studies assessed inter-rater reliability. A surprising finding was that only two of the instruments included in this review (SAM and SAM + CAM) were described as having any evidence of validity, and this evidence was limited to content validity [3, 22]. The RAIN and CSAT have been used to evaluate patient education materials in multiple studies, but no indication of validity of any kind was reported for these instruments.

We found differences in reading level scores that appeared to reflect the varying bases of the instruments used. For example, studies using the Simplified Measure of Gobbledygook (SMOG) to assess reading level reported higher mean scores than studies using the Flesch-Kincaid (FK). One study that used both the SMOG and FK reported a difference of two grade levels between the two scores [30]. This is consistent with research reporting that the FK (the “readability” formula used in Microsoft Word) tends to score written materials at a lower reading level than other reading level formulas [2]. This variability in scoring provides further support for not using reading level alone to evaluate the literacy demand of cancer education materials.
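To illustrate how the two formulas can diverge, the sketch below computes both grade levels from the same set of text counts. It uses the commonly published forms of the Flesch-Kincaid grade-level and SMOG formulas; the counts are hypothetical values chosen for illustration and are not drawn from any study in this review.

```python
import math


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from word, sentence, and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_grade(polysyllables, sentences):
    """SMOG grade from the count of words with three or more syllables.

    The count is normalized to a 30-sentence sample, as in the
    commonly published form of the formula.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for a 30-sentence sample of a brochure.
words, sentences, syllables, polysyllables = 450, 30, 720, 60

fk = flesch_kincaid_grade(words, sentences, syllables)
smog = smog_grade(polysyllables, sentences)
print(f"FK grade: {fk:.1f}, SMOG grade: {smog:.1f}")
```

The two formulas draw on different features of the text: FK depends on average sentence length and average syllables per word, whereas SMOG depends only on the density of polysyllabic words. Text that is heavy in long medical terms can therefore receive noticeably different grades from each.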

Cultural appropriateness was frequently found to be only “adequate” or “not suitable” because materials did not present images of the intended group or lacked images in general [21, 26–29]. One study using the CSAT reported that even webpages rated as culturally sensitive did not present images of or mention the intended minority group (a limitation of CSAT’s scoring process), while results from the CSC point out deep structure characteristics (cultural, social, historical, and environmental factors) that were missing from webpages [30]. Similar to the CSAT, the SAM scores materials with “neutral” images or that do not present images of the intended minority group in the “adequate” category [3]. To improve the usefulness of print and Web-based cancer education materials, health resources must be created with deep structure characteristics in mind. Cultural health beliefs, practices, and communication preferences differ among ethnic groups [34]. Therefore, it is important to consider these factors when designing cancer education materials.

Other researchers have suggested that some patients prefer formats such as audio or video [8, 35–37]. These formats can be used alone or to supplement reading materials and can be personalized to the audience [38, 39]. Suitability measures need to be created or adapted before they can be used for audiovisual media, however.

In this review, we found that some cancer education materials might actually be adequate in terms of categories of suitability even though the reading levels are too high. This may be particularly true in specialized areas, including cancer education. In such areas, readability formulas may overestimate the difficulty of commonly used medical terms based on word length alone. Moreover, the most frequently used readability formulas do not take into account the use of glossaries when scoring a material. Thus, improving reading level alone will not guarantee that patients will understand or use education materials; other categories of suitability should be taken into consideration when developing or updating print or Web-based materials.

Strengths and Limitations

We believe that our review is strengthened by the use of Doak, Doak, and Root’s definition of categories of suitability. These categories include many of the elements that influence comprehension of written materials. A limitation of restricting our definition to six categories of suitability was that we missed studies that evaluated other important aspects of suitability, such as the quality (accuracy or up-to-dateness) of information and the usability of websites. Others have conducted studies to assess the quality of health education materials [18, 40, 41] and found that materials lacked information on treatment options, contained inaccurate or out-of-date information, or were missing information regarding effectiveness of treatment, among other issues. The US Department of Health and Human Services has developed a guidebook for research-based approaches to Web design (http://usability.gov/pdfs/guidelines_book.pdf), and the World Wide Web Consortium has released a set of Web accessibility standards and guidelines (http://www.w3.org/WAI/guid-tech.html). Quality, usability, and accessibility issues were outside the scope of this review but are important points to consider when creating suitable print or Web-based materials.

The seven instruments have not been compared to one another directly, and our review is limited because we were unable to determine how much the scoring of categories of suitability differs between instruments. Using the studies to compare the instruments indirectly was hampered because we could not reliably determine the degree of overlap between the sets of materials evaluated across studies, although the search methods described by the authors do suggest that most studies assessed representative samples of materials assembled from national organizations.

Conclusions

Assessment of the six categories of suitability has not been widely reported in the published literature evaluating cancer education print and Web-based materials. Findings from studies included in this review indicate that there are numerous shortcomings in materials related to suitability, the most frequently reported being in literacy demand and cultural appropriateness. Developers of cancer education materials need to be aware of these shortcomings and consider including an assessment of categories of suitability when reviewing current materials and when developing new materials. Future research is needed to determine whether improving materials based on the categories of suitability goes beyond increasing appeal among users to also increase knowledge and promote desired behavior change.

Finally, reliable and valid assessment tools are important to the accurate assessment of suitability. Five of the instruments reported any evidence of reliability (e.g., inter-rater reliability) and only two reported content validity. Therefore, further study of the categories of suitability, including testing the measures for reliability and validity, is needed. Only then can we develop, evaluate, and present materials that patients can dependably understand.