Introduction

Low health literacy is a public health concern whose extent varies with age, education level, and socioeconomic status. According to the US Department of Health and Human Services, 77 million adults have basic or below basic health literacy [1]. People in this group are at risk of adverse outcomes such as more frequent hospitalization and higher mortality, even after accounting for confounders such as age, sex, and insurance status [2,3,4].

Patients increasingly use Internet resources to supplement their medical knowledge [1]. Electronic health information therefore needs to be conveyed effectively to patients of all literacy levels. According to the American Medical Association (AMA), the majority of Americans read at or below the 8th grade level. Consequently, the AMA and the National Institutes of Health (NIH) recommend that patient education resources be written at a 3rd–7th grade level [5, 6]. However, previous studies have shown that patient resources for radiology tests and procedures are written at a much higher reading level [7,8,9,10]. For example, Hansberry et al. found that articles on RadiologyInfo.org, a patient-targeted resource sponsored by the American College of Radiology (ACR) and the Radiological Society of North America (RSNA), were written at the 10th–14th grade level [7].

A wide range of interventional musculoskeletal procedures are performed for both diagnostic and therapeutic purposes. To our knowledge, no prior study has comprehensively evaluated the readability of information on musculoskeletal procedures. Because patients generally want more information before undergoing a procedure, it is important that accessible online resources be written at an appropriate level [11]. The purpose of this study was to assess the readability of patient-targeted online information on musculoskeletal radiology procedures. We hypothesized that the reading grade level of online material would exceed the AMA and NIH recommendation.

Materials and methods

Internet data extraction

Several search terms were used to identify common musculoskeletal radiology procedures (Table 1). Each term was entered into the Google, Yahoo!, and Bing search engines, which together account for 98% of Internet searches performed in the USA [12]. Uniform resource locators (URLs) from the first 3 pages of results on each search engine were recorded. All searches were performed on the same day to avoid day-to-day variation in the results.

Table 1 Search terms used for each procedure type

Website selection

The first 3 pages of search results for each term yielded (mean ± standard deviation) 29.5 ± 0.76 websites from Google, 29.6 ± 0.99 from Yahoo!, and 34.5 ± 2.66 from Bing. Each website was manually screened using the exclusion criteria shown in Fig. 1. Patient-targeted websites were often labeled as such, for example with words like “for patients” in their header. Websites were also classified as patient-targeted when their content was specifically directed at patients, as indicated by frequent use of “you” as the subject, phrases such as “your doctor,” and question-and-answer formatting or subheadings using the pronoun “I,” such as “how do I prepare for my procedure.” Mainstream media articles were classified as patient-targeted because their audience is broad and they are not directed at physicians. Some websites linked to multiple separate webpages on the same topic; in these cases, all of the linked webpages were combined into one text. One website contained videos but was included because transcripts were available. This process yielded 384 unique patient-targeted websites for analysis, which were then classified as academic or non-academic sources. Academic websites included those produced by academic centers or by organizations such as the American College of Radiology (ACR). A fellowship-trained musculoskeletal radiologist with 8 years of radiology experience independently verified that the included websites were appropriate.
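Pooling the recorded URLs across search engines and search terms into a set of unique websites can be sketched in Python. This is a minimal sketch under our own assumptions: the paper does not say how duplicate URLs were matched, so the normalization rule used here (ignore the scheme, a leading "www.", trailing slashes, and #fragments, but keep query strings), the function names `url_key` and `unique_urls`, and the example.org URLs are all hypothetical.

```python
from urllib.parse import urlsplit

def url_key(url: str) -> tuple[str, str, str]:
    """Comparison key for a URL (hypothetical normalization rule)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    # The scheme (http/https) and any #fragment are ignored; the query string is kept.
    return (host, path, parts.query)

def unique_urls(results: dict[tuple[str, str], list[str]]) -> list[str]:
    """Pool result lists keyed by (engine, search term), keeping the first URL seen per key."""
    seen: dict[tuple[str, str, str], str] = {}
    for urls in results.values():
        for url in urls:
            seen.setdefault(url_key(url), url)
    return list(seen.values())

# Hypothetical example: the same page returned by two engines in different forms.
results = {
    ("Google", "arthrogram"): ["https://www.example.org/arthrogram/", "https://example.org/faq#top"],
    ("Bing", "arthrogram"): ["http://example.org/arthrogram"],
}
print(unique_urls(results))  # ['https://www.example.org/arthrogram/', 'https://example.org/faq#top']
```

Screening against the exclusion criteria in Fig. 1 is not modeled; the sketch only collapses repeated URLs.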

Fig. 1 Website selection criteria

Readability analysis

The text from each of the 384 websites was copied into an individual document (Microsoft Word, Office Professional Plus 2016). All references, headings, and formatting were removed, and punctuation was added to the end of every bullet point. Each document was then evaluated using a readability text analysis program (Readability Studio 2015, Oleander Software). Readability scores were calculated using 6 validated metrics: the Flesch reading ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog index (GFI), Fry score, Raygor estimate, and simple measure of gobbledygook (SMOG) grading [13,14,15,16,17,18]. All of these metrics measure text readability, but each weights word and sentence characteristics differently.
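The bullet-point step can be illustrated with a small Python helper. It is a sketch under our own assumptions, since the paper does not describe exactly how punctuation was added: bullets are taken to start with "-", "*", "•", or a number followed by "." or ")"; the marker is dropped, and a period is appended when the item has no terminal punctuation, presumably so that a readability program counts each item as a sentence. The name `punctuate_bullets` is hypothetical.

```python
import re

# Hypothetical bullet pattern: "-", "*", "•", or "1." / "1)" at the start of a line.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")

def punctuate_bullets(text: str) -> str:
    """Strip bullet markers and end each bullet item with a period if it lacks terminal punctuation."""
    lines = []
    for line in text.splitlines():
        match = BULLET.match(line)
        if match:
            item = line[match.end():].rstrip()
            if item and item[-1] not in ".!?":
                item += "."
            lines.append(item)
        else:
            lines.append(line)  # non-bullet lines are left unchanged
    return "\n".join(lines)

print(punctuate_bullets("Before the test:\n- Remove jewelry\n- Wear loose clothing.\n1) Do not eat after midnight"))
# Before the test:
# Remove jewelry.
# Wear loose clothing.
# Do not eat after midnight.
```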

The FRE is calculated from the number of syllables per 100 words (word length) and the average number of words per sentence (sentence length).

$$ \mathrm{FRE}=206.835-\left(0.846\ast \mathrm{word}\ \mathrm{length}\right)-\left(1.015\ast \mathrm{sentence}\ \mathrm{length}\right) $$

The FRE scale falls between 0 (most difficult, lowest readability) and 100 (high readability, easy to read) [13]. The FKGL is a modification of the FRE formula which reports the grade level of the text rather than a 100-point scale [14].

$$ \mathrm{FKGL}=\left(0.39\ast \mathrm{sentence}\ \mathrm{length}\right)+\left(11.8\ast \frac{\mathrm{word}\ \mathrm{length}}{100}\right)-15.59 $$
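The two Flesch formulas can be checked with a short Python sketch that computes both scores from raw counts of words, sentences, and syllables. It assumes the caller has already counted syllables (readability software such as Readability Studio applies its own syllable rules) and uses Flesch's coefficients: 0.846 per syllable per 100 words for the FRE, and 11.8 per syllable per word with 15.59 subtracted for the FKGL. The function names are illustrative.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    word_length = 100 * syllables / words  # syllables per 100 words
    sentence_length = words / sentences    # average words per sentence
    return 206.835 - 0.846 * word_length - 1.015 * sentence_length

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    sentence_length = words / sentences
    syllables_per_word = syllables / words  # equals word length / 100
    return 0.39 * sentence_length + 11.8 * syllables_per_word - 15.59

# Example: 100 words in 5 sentences containing 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 1))   # 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # 9.9
```

Adding syllables or lengthening sentences lowers the FRE and raises the FKGL, which is why the two scales move in opposite directions.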

The GFI combines the average sentence length with the percentage of complex words (words with three or more syllables).

$$ \mathrm{GFI}=0.4\ast \left(\mathrm{average}\ \mathrm{sentence}\ \mathrm{length}+\mathrm{percentage}\ \mathrm{of}\ \mathrm{complex}\ \mathrm{words}\right) $$

Like the FKGL, the resulting score is expressed as a reading grade level [15]. The Fry score is based on a readability graph that uses the average number of sentences and the average number of syllables per 100 words; plotting these values on the graph gives an estimated grade level [16]. The Raygor estimate is based on a graph of the average number of sentences and the average number of words with 6+ characters per 100 words [17]. Finally, SMOG grading is calculated by selecting 30 sentences from the text, 10 each from the beginning, middle, and end, and counting the complex words (3 or more syllables) in these sentences [18].

$$ \mathrm{SMOG}=\sqrt{\mathrm{complex}\ \mathrm{words}}+3. $$
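The GFI and SMOG calculations can be sketched in Python as follows. This is a simplified illustration, not the Readability Studio implementation: Gunning's original rules exclude some words, such as proper nouns, from the complex-word count, and McLaughlin's hand method rounds the count to the nearest perfect square, whereas this sketch counts all complex words and uses the continuous form of the equation above. The way the 10 middle sentences are located (centered on the midpoint) and the function names are our assumptions.

```python
import math

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """GFI = 0.4 * (average sentence length + percentage of complex words)."""
    return 0.4 * (words / sentences + 100 * complex_words / words)

def smog_sample(sentences: list[str]) -> list[str]:
    """Pick 10 sentences each from the beginning, middle, and end (needs at least 30)."""
    if len(sentences) < 30:
        raise ValueError("SMOG sampling needs at least 30 sentences")
    mid = len(sentences) // 2 - 5  # start of a 10-sentence window centered on the midpoint
    return sentences[:10] + sentences[mid:mid + 10] + sentences[-10:]

def smog_grade(complex_words: int) -> float:
    """SMOG = sqrt(complex words in the 30-sentence sample) + 3."""
    return math.sqrt(complex_words) + 3

print(gunning_fog(100, 5, 10))  # 12.0
print(smog_grade(49))           # 10.0
```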

As additional measures of text complexity, the following were computed automatically for each webpage: the percentage of words with 3+ syllables, the percentage of words with 6+ characters, the percentage of sentences with 22+ words (difficult sentences), and the average number of words per sentence (sentence length).
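These complexity measures, together with the per-100-word rates that serve as inputs to the Fry and Raygor graphs, can be computed from raw text as sketched below. This is a simplified illustration, not the Readability Studio algorithm: the vowel-group syllable counter is a rough heuristic of our own (it counts "biopsies" as 2 syllables, for example), sentences are split naively at ".", "!", and "?", and the Fry and Raygor rates are taken over the whole text rather than averaged over 100-word samples. The function names are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and treat a final 'e' as silent (assumption)."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def complexity_measures(text: str) -> dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = [WORD.findall(s) for s in sentences]
    words = [w for ws in words_per_sentence for w in ws]
    n_words, n_sentences = len(words), len(sentences)
    syllables = [count_syllables(w) for w in words]
    long_words = sum(len(w.replace("'", "")) >= 6 for w in words)
    return {
        "pct_words_3plus_syllables": 100 * sum(s >= 3 for s in syllables) / n_words,
        "pct_words_6plus_characters": 100 * long_words / n_words,
        "pct_sentences_22plus_words": 100 * sum(len(ws) >= 22 for ws in words_per_sentence) / n_sentences,
        "sentence_length": n_words / n_sentences,
        # Rates used to place a text on the Fry and Raygor graphs.
        "syllables_per_100_words": 100 * sum(syllables) / n_words,
        "sentences_per_100_words": 100 * n_sentences / n_words,
        "long_words_per_100_words": 100 * long_words / n_words,
    }

print(complexity_measures("The cat sat. Investigation continues tomorrow."))
```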

Statistical comparisons were made using independent t tests in Excel 2019 MSO (version 2003, Microsoft Corporation, Redmond, WA), with significance set at p ≤ 0.05.
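Excel's T.TEST function returns only a p-value; the sketch below spells out the calculation for a two-sample, equal-variance (Student's) t test in plain Python, assuming that variant was used, since the paper states only "independent t test". The two-sided p-value is obtained from the regularized incomplete beta function, evaluated with the continued fraction described in Numerical Recipes. In practice, `scipy.stats.ttest_ind(a, b)` performs the same pooled-variance test by default. The helper names are ours.

```python
import math
import statistics

def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def independent_t_test(sample1: list[float], sample2: list[float]) -> tuple[float, int, float]:
    """Two-sided Student's t test with pooled variance; returns (t, df, p)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    p = _betai(df / 2, 0.5, df / (df + t * t))  # P(|T| >= |t|) for df degrees of freedom
    return t, df, p
```

Given the reading grade levels of two groups of websites (for example, first-page vs. later-page results), a returned `p` of 0.05 or less would meet the significance threshold stated above.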

Results

Reading grade levels for all websites, pooled across procedure types, are shown for 5 of the metrics (FKGL, Fry, GFI, Raygor estimate, and SMOG) in Fig. 2. For each of these metrics, the mean reading grade level was above the AMA and NIH recommendation. Results for individual websites on the Fry and Raygor metrics are shown in Figs. 3 and 4, respectively, where each website is plotted as a single point with its corresponding reading grade level. On both metrics, the majority of websites had text written above the AMA and NIH recommendation.

Fig. 2 Readability scores for all websites. The green shaded area denotes the AMA- and NIH-recommended grade level

Fig. 3 Fry readability graph for all websites. Each website is plotted as a single point. The green shaded area denotes the AMA- and NIH-recommended grade level. The gray zones represent two extremes: long words in the top right corner (words with more syllables) and long sentences in the bottom left corner (fewer sentences per 100 words). Points plotted in the gray zones are invalid for analysis [13]

Fig. 4 Raygor readability estimate for all websites. Each website is plotted as a single point. The green shaded area denotes the AMA- and NIH-recommended grade level. The gray zones represent two extremes: long words in the bottom right corner (words with more syllables) and long sentences in the top left corner (fewer sentences per 100 words). Points plotted in the gray zones are invalid for analysis

The 6th metric, the FRE, is shown in Fig. 5. Unlike the other measures, the FRE yields a score from 0 to 100 for each website rather than a reading grade level, with lower values indicating more difficult text. A score below 70 corresponds to a reading level above the 7th grade, so the majority of websites also exceeded the AMA and NIH recommendation on this metric.
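The link between FRE scores and grade levels comes from Flesch's interpretation table, which can be expressed as a simple lookup. The bands below follow that table as it is commonly reproduced (descriptors vary slightly between sources), and the check treats scores of 70 or more, which map to the 7th grade or easier, as meeting the 3rd–7th grade recommendation. Both function names are hypothetical.

```python
# Lower bound of each FRE band and its approximate grade level
# (Flesch's table as commonly reproduced; exact labels vary between sources).
FRE_BANDS = [
    (90, "5th grade"),
    (80, "6th grade"),
    (70, "7th grade"),
    (60, "8th-9th grade"),
    (50, "10th-12th grade"),
    (30, "college"),
    (0, "college graduate"),
]

def fre_grade_band(score: float) -> str:
    """Map an FRE score to an approximate grade level."""
    for lower_bound, grade in FRE_BANDS:
        if score >= lower_bound:
            return grade
    return FRE_BANDS[-1][1]  # very dense text can score below 0

def meets_recommendation(score: float) -> bool:
    """True when the FRE corresponds to the 7th grade or easier (score >= 70)."""
    return score >= 70

print(fre_grade_band(65), meets_recommendation(65))  # 8th-9th grade False
```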

Fig. 5 Flesch reading ease for all websites. The green shaded area denotes the AMA- and NIH-recommended grade level (values of 70 and below correlate to 7th grade and above reading level), i.e., the lower the index value, the more difficult the text

Table 2 shows detailed readability data for each procedure, including a single composite mean reading grade level. For every procedure, the mean reading grade level was at or above the high school level (10th to 14th grade). Only 21 websites (5.5%) were written at the AMA- and NIH-recommended level (Supplement A). Prolotherapy had the highest reading grade level (~ 14th grade), and arthrogram had the lowest (~ 10th grade).

Table 2 Mean readability for all metrics

Across all procedures, the mean reading grade level of websites on the 1st page of search results did not differ from that of websites on the 2nd and 3rd pages (11.9 ± 1.4 vs. 11.6 ± 1.2, p = 0.47). When academic and non-academic websites were compared, nerve block websites had a significantly higher mean reading grade level on non-academic sites (11.6 ± 2.4 vs. 10.0 ± 1.2, p = 0.025). There was no difference for the remaining procedures (Table 3).

Table 3 Comparison of mean readability grade level of academic vs. non-academic websites

In terms of text complexity, prolotherapy websites had the highest percentages of words with 3+ syllables and 6+ characters (complex words) and of sentences with 22+ words (complex sentences). Data for the other procedures are shown in Table 4.

Table 4 Word and sentence complexity for all websites

Discussion

This investigation showed that patient-targeted online information on musculoskeletal radiology procedures is written at the 10th–14th grade level, much higher than the AMA- and NIH-recommended 3rd–7th grade level. Although the readability of information on minimally invasive procedures has received limited attention, the problem is not exclusive to musculoskeletal radiology. Our results are comparable to those of a study that found online procedure material in interventional radiology to be written at a 10th–15th grade level [19]. They are also similar to those of another study that reported an 11th–12th grade reading level for information on breast lesions requiring biopsy or surgery [8]. More broadly within medicine, the same trends have been found in other specialties, including orthopedics, pediatrics, and ophthalmology [20,21,22].

A total of 36% of the US adult population has basic or below basic health literacy, which can limit their ability to obtain appropriate health information [23]. This problem is compounded when materials are written at too high a reading level. Low health literacy has been associated with significant negative outcomes, including increased mortality [2,3,4]. Given the minimally invasive nature of musculoskeletal procedures, more relevant negative outcomes to consider include more post-procedural complications, longer recovery time, and higher anxiety levels [24].

For nearly all procedures, the reading level did not depend on whether the website came from an academic source. This is consistent with a prior study which found that the readability of online materials for pediatric patients was similar across authors and institution types, both academic and non-academic [22]. There are two potential explanations. First, non-academic materials may draw on academic sources without adapting the language to the patient level. Second, content producers may be generally unaware of the need to write material at a lower reading level. In addition, reading levels were similarly high across the first three pages of search results. Taken together, these findings highlight the difficulty patients may have in finding suitable educational material, even when motivated to perform a comprehensive search.

Our results suggest that simple interventions could bring the reading grade level of patient information closer to the AMA and NIH recommendation. The webpages we analyzed had a high proportion of complex words (3+ syllables or 6+ characters) and difficult sentences (22+ words). Readability would improve if one- or two-syllable words, shorter words, and shorter sentences were used more often. The AMA and the Centers for Disease Control and Prevention (CDC) have advocated similar interventions and other strategies for improving communication with patients, including use of the active voice, limiting content to what is necessary, and avoiding complex tables and graphs [5, 25]. Other techniques for improving comprehension include greater use of color and illustrations [26].
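A writer applying these interventions could use a small checker that flags sentences of 22+ words and words of 3+ syllables as candidates for rewriting. The sketch below is a hypothetical helper, not a tool used in this study; its vowel-group syllable counter is a rough heuristic and will misjudge some words, so flagged items are suggestions only.

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (assumption); real syllabifiers are more accurate."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1  # treat a final "e" as silent
    return max(n, 1)

def flag_for_revision(text: str, max_words: int = 21,
                      max_syllables: int = 2) -> list[tuple[str, list[str]]]:
    """Return (sentence, complex words) for each sentence that is too long or has 3+ syllable words."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", sentence)
        hard = [w for w in words if count_syllables(w) > max_syllables]
        if len(words) > max_words or hard:
            flagged.append((sentence, hard))
    return flagged

print(flag_for_revision("Rest at home. Avoid activity for several days."))
# [('Avoid activity for several days.', ['activity', 'several'])]
```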

The following example shows how these interventions can help. The sentence below, taken from one of the websites in the study, has a mean reading grade level of 14.7 across the 6 metrics we used:

“Bone biopsies may be used to confirm the diagnosis of a bone disorder, investigate an abnormality, determine the cause of pain or infection, or distinguish bone tumor from other conditions.” [27].

The same information can be rewritten with a reduction in the number of syllables and characters in words and decreased sentence length, resulting in a mean reading grade level of 6.7:

“A bone biopsy is done to look at bone diseases. It gives your doctor information to find out if there is something abnormal. It can help to find the cause of pain or test for cancer and other problems.”
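The effect of this rewrite on one of the formulas can be checked with a short Python sketch that applies the Flesch-Kincaid formula to both versions of the bone biopsy excerpt. It relies on a rough vowel-group syllable counter (our assumption, not the syllabification used by Readability Studio), so its numbers will not reproduce the 6-metric means reported above; it only shows the direction and approximate size of the change produced by shorter sentences and fewer syllables per word.

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (our assumption, not Readability Studio's syllabifier)."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1  # treat a final "e" as silent
    return max(n, 1)

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level from naive sentence, word, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

original = ("Bone biopsies may be used to confirm the diagnosis of a bone disorder, "
            "investigate an abnormality, determine the cause of pain or infection, "
            "or distinguish bone tumor from other conditions.")
rewritten = ("A bone biopsy is done to look at bone diseases. It gives your doctor "
             "information to find out if there is something abnormal. It can help to "
             "find the cause of pain or test for cancer and other problems.")
print(f"original: {fkgl(original):.1f}, rewritten: {fkgl(rewritten):.1f}")
```

Most of the drop comes from the average sentence length falling from 30 words to 13, with a smaller contribution from fewer syllables per word.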

As another example, the following excerpt from a description of an arthrogram has a mean reading grade level of 14.2:

“The procedure is often used to help diagnose persistent, unexplained joint pain or discomfort. In some cases, local anesthetic medications or steroids may be injected into the joint along with the contrast material.” [28].

Using the same principles, this can be rewritten to a mean reading grade level of 6.5:

“This test helps to find out what is causing the joint pain that bothers you. Sometimes, numbing or steroid medications are put into the joint with the dye fluid.”

Revising reading materials using some of these strategies has significantly improved their readability [21]. More importantly, patients have shown better understanding of the revised materials [21, 26, 29]. Although similar initiatives have taken place in radiology, Bange et al. recently showed that some reading materials remain at too high a level, even after a 5-year interval [7, 30].

When deciding whether to revise patient education materials, it is important to consider that not all patients benefit from lower reading grade levels. Lower levels may not increase comprehension and satisfaction for everyone [21, 31, 32]. Simplifying written text may remove details and nuances that some patients want, and patients with higher levels of education or health literacy may prefer more complex reading material. To account for this, some authors have advocated making two sets of documents available: one for easier comprehension and another with more extensive detail [33].

Our study has clinical relevance and notable strengths. First, we evaluated material from the Internet, where a large proportion of the population seeks information [1, 34]. Second, although more than 90% of Internet users do not look past the first page of results on any given search engine [34, 35], we took a comprehensive approach by analyzing websites from the first three pages of results on three separate search engines.

We also acknowledge some limitations. First, the webpages we evaluated were not necessarily representative of all reading material for each procedure, and not all types of procedures were included. We used several search terms to capture as many unique websites as possible, but variations in the keywords could have produced different results. Second, although there is evidence that visual aids improve recall, we were unable to assess most of the videos and pictures or their role in improving patient comprehension [36, 37]. Third, each reading level metric has its own limitations. For example, the metrics differ in how much weight they give to word and sentence length, and they assume that reading difficulty increases linearly with word or sentence length [13,14,15,16,17,18, 38]. We partly mitigated this by using 6 separate validated metrics, all of which yielded similar results. Finally, and most importantly, we did not directly assess patient understanding, which is a critical outcome that deserves further study.

In conclusion, patient-targeted websites on musculoskeletal radiology procedures are written at the 10th–14th grade reading level, well beyond the AMA and NIH recommendation (3rd–7th grade). Their reading grade level could be lowered by reducing text complexity, that is, by limiting words with many syllables and shortening words and sentences.